Resources
Database Credentialed Federated
MIMIC-IV-Ext-22MCTS: A 22 Millions-Event Temporal Clinical Time-Series Dataset with Relative Timestamp
Clinical risk prediction based on machine learning algorithms plays a vital role in modern healthcare. A crucial component in developing a reliable prediction model is a high-quality dataset with time series clinical events. In this work, we release…
clinical event annotation mimic time series temporal annotation
Published: Sept. 29, 2025. Version: 1.0.0 | DOI: 10.13026/dkj6-r828
Database Credentialed Federated
Multimodal Clinical Monitoring in the Emergency Department (MC-MED)
Emergency department (ED) patients often present with undiagnosed complaints, and can exhibit rapidly evolving physiology. Therefore, data from continuous physiologic monitoring, in addition to the electronic health record, is essential to understan…
Published: Sept. 25, 2025. Version: 1.0.1 | DOI: 10.13026/wvyw-g663
Database Credentialed Federated
RadVLM Instruction Dataset
We release the RadVLM instruction dataset, a large-scale resource used to train the RadVLM model on diverse radiology tasks. The dataset contains 1,115,021 image–instruction pairs spanning five task families: (i) report generation from frontal…
vision-language models medical ai chest x-rays
Published: Sept. 25, 2025. Version: 1.0.0 | DOI: 10.13026/et5g-h222
Database Credentialed Federated
MIMIC-Ext-DrugDetection
This project shares a large, annotated drug detection dataset created from MIMIC-III/IV discharge summaries. The dataset was developed to address the challenge of identifying substance use behaviors in Electronic Health Records (EHRs), where critica…
prescription opioid misuse cannabis benzodiazepine misuse ehr injection drug use heroin methamphetamine substance use multi-label cocaine drug detection polysubstance use mimic-iv mimic-iii clinical notes
Published: Sept. 25, 2025. Version: 1.0.0 | DOI: 10.13026/0kyx-r485
Database Restricted Federated
EchoNext: A Dataset for Detecting Echocardiogram-Confirmed Structural Heart Disease from ECGs
This dataset contains a de-identified collection of 100,000 12-lead electrocardiograms (ECGs) with paired structural heart disease (SHD) labels derived from echocardiography, collected at Columbia University Irving Medical Center. Each ECG is provid…
aortic stenosis deep learning health equity cardiovascular screening valvular heart disease heart failure digital health ecg machine learning ai model deployment left ventricular dysfunction artificial intelligence clinical decision support ai in healthcare population health electrocardiogram transthoracic echocardiogram structural heart disease
Published: Sept. 16, 2025. Version: 1.1.0 | DOI: 10.13026/3ykd-bf14
Database Credentialed Federated
RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports
Radiology reports are essential for clinical care but pose challenges for automated processing due to their unstructured nature. Existing datasets like RadGraph-1.0 focus narrowly on chest X-rays (CXR), limiting their applicability. We introduce Rad…
Published: Sept. 12, 2025. Version: 1.0.0 | DOI: 10.13026/j8e7-pr22
Database Open Federated
Myocardial perfusion scintigraphy image database
This database provides a collection of myocardial perfusion scintigraphy images in DICOM format with all metadata and segmentations (masks) in NIfTI format. The images were obtained from patients undergoing scintigraphy examinations to investigate c…
myocardial perfusion systems modeling myocardial perfusion scintigraphy dicom metadata artificial intelligence ventricular walls coronary artery disease convolutional neural networks automated segmentation clinical diagnosis anonymization nifti
Published: Sept. 10, 2025. Version: 1.0.0 | DOI: 10.13026/ce2z-dw74
Database Restricted Federated
mcPHASES: A Dataset of Physiological, Hormonal, and Self-reported Events and Symptoms for Menstrual Health Tracking with Wearables
Individuals who menstruate are frequently led to believe that there is a standard menstrual cycle, typically characterized as 28 days in length with predictable and uniform patterns. This framing often emphasizes cycle dates as the only relevant met…
hormones menstrual health multimodal health wearables health sensor data womens health
Published: Sept. 10, 2025. Version: 1.0.0 | DOI: 10.13026/zx6a-2c81
Database Credentialed Federated
MIMIC-IV-Ext-Instr: A Dataset of 450K+ EHR-Grounded Instruction-Following Examples
Large language models (LLMs) have shown impressive capabilities in solving a wide range of tasks based on human instructions. However, developing a conversational AI assistant for electronic health record (EHR) data remains challenging due to the la…
medical question answering large language models instruction tuning
Published: Sept. 9, 2025. Version: 1.0.0 | DOI: 10.13026/e5bq-pr14
Database Restricted Federated
HYAMD High-Resolution Fundus Image Dataset for age related macular degeneration (AMD) Diagnosis
The Hillel Yaffe Age Related Macular Degeneration (HYAMD) longitudinal dataset comprises 1,560 Digital Fundus Images (DFIs) of 325 patients examined at the Hillel Yaffe Medical Center (Hadera, Israel, Helsinki approval number 0048-24-HYMC) provid…
Published: Sept. 9, 2025. Version: 1.0.0 | DOI: 10.13026/ydf1-z238