Resources
Database Credentialed Federated
MIMIC-Ext-DrugDetection
This project shares a large, annotated drug detection dataset created from MIMIC-III/IV discharge summaries. The dataset was developed to address the challenge of identifying substance use behaviors in Electronic Health Records (EHRs), where critica…
prescription opioid misuse cannabis benzodiazepine misuse ehr injection drug use heroin methamphetamine substance use multi-label cocaine drug detection polysubstance use mimic-iv mimic-iii clinical notes
Published: Sept. 25, 2025. Version: 1.0.0 | DOI: 10.13026/0kyx-r485
Database Credentialed Federated
RadVLM Instruction Dataset
We release the RadVLM instruction dataset, a large-scale resource used to train the RadVLM model on diverse radiology tasks. The dataset contains 1,115,021 image–instruction pairs spanning five task families: (i) report generation from frontal…
vision-language models medical ai chest x-rays
Published: Sept. 25, 2025. Version: 1.0.0 | DOI: 10.13026/et5g-h222
Database Restricted Federated
EchoNext: A Dataset for Detecting Echocardiogram-Confirmed Structural Heart Disease from ECGs
This dataset contains a de-identified collection of 100,000 12-lead electrocardiograms (ECGs) with paired structural heart disease (SHD) labels derived from echocardiography, collected at Columbia University Irving Medical Center. Each ECG is provid…
aortic stenosis deep learning health equity cardiovascular screening valvular heart disease heart failure digital health ecg machine learning ai model deployment left ventricular dysfunction artificial intelligence clinical decision support ai in healthcare population health electrocardiogram transthoracic echocardiogram structural heart disease
Published: Sept. 16, 2025. Version: 1.1.0 | DOI: 10.13026/3ykd-bf14
Database Credentialed Federated
RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports
Radiology reports are essential for clinical care but pose challenges for automated processing due to their unstructured nature. Existing datasets like RadGraph-1.0 focus narrowly on chest X-rays (CXR), limiting their applicability. We introduce Rad…
Published: Sept. 12, 2025. Version: 1.0.0 | DOI: 10.13026/j8e7-pr22
Database Open Federated
Myocardial perfusion scintigraphy image database
This database provides a collection of myocardial perfusion scintigraphy images in DICOM format with all metadata and segmentations (masks) in NIfTI format. The images were obtained from patients undergoing scintigraphy examinations to investigate c…
myocardial perfusion systems modeling myocardial perfusion scintigraphy dicom metadata artificial intelligence ventricular walls coronary artery disease convolutional neural networks automated segmentation clinical diagnosis anonymization nifti
Published: Sept. 10, 2025. Version: 1.0.0 | DOI: 10.13026/ce2z-dw74
Database Restricted Federated
mcPHASES: A Dataset of Physiological, Hormonal, and Self-reported Events and Symptoms for Menstrual Health Tracking with Wearables
Individuals who menstruate are frequently led to believe that there is a standard menstrual cycle, typically characterized as 28 days in length with predictable and uniform patterns. This framing often emphasizes cycle dates as the only relevant met…
hormones menstrual health multimodal health wearables health sensor data womens health
Published: Sept. 10, 2025. Version: 1.0.0 | DOI: 10.13026/zx6a-2c81
Database Restricted Federated
HYAMD High-Resolution Fundus Image Dataset for age related macular degeneration (AMD) Diagnosis
The Hillel Yaffe Age Related Macular Degeneration (HYAMD) longitudinal dataset comprises 1,560 Digital Fundus Images (DFIs) of 325 patients examined at the Hillel Yaffe Medical Center (Hadera, Israel, Helsinki approval number 0048-24-HYMC) provid…
Published: Sept. 9, 2025. Version: 1.0.0 | DOI: 10.13026/ydf1-z238
Database Credentialed Federated
MIMIC-IV-Ext-Instr: A Dataset of 450K+ EHR-Grounded Instruction-Following Examples
Large language models (LLMs) have shown impressive capabilities in solving a wide range of tasks based on human instructions. However, developing a conversational AI assistant for electronic health record (EHR) data remains challenging due to the la…
medical question answering large language models instruction tuning
Published: Sept. 9, 2025. Version: 1.0.0 | DOI: 10.13026/e5bq-pr14
Database Open Federated
MIMIC-IV Clinical Database Demo on FHIR
Interoperability of healthcare data has become increasingly important given the growing deployment of data-driven algorithms in clinical settings. The Fast Healthcare Interoperability Resources (FHIR) standard has emerged as a promising mechanis…
electronic health records fhir mimic
Published: Aug. 27, 2025. Version: 2.1.0
Database Restricted Federated
Community-Acquired Pneumonia, Endotypes and Phenotypes (NACef): Prospective, observational cohort study of Translational Medicine
Community-Acquired Pneumonia (CAP) remains a prominent infectious process associated with elevated in-hospital morbidity and mortality rates. Through the exploration of phenotypes, endotypes, and biomarkers, it becomes feasible to identify individua…
Published: Aug. 22, 2025. Version: 2.0.1 | DOI: 10.13026/4y3t-pq44