Resources


Database Restricted Federated

TN-Mammo: A Multi-view Mammography Dataset for Breast Density Classification

Source: PhysioNet

Breast cancer is one of the most common types of cancer among women, leading to a growing and essential need for early and precise detection. A variety of machine learning techniques have been demonstrating great promise in improving diagnostic accu…

Published: Oct. 4, 2025. Version: 1.0.0 | DOI: 10.13026/1kx0-xc60


Database Restricted Federated

Organ Retrieval and Collection of Health Information for Donation (ORCHID)

Source: PhysioNet

There are well-documented inefficiencies and inequities in the current system of deceased donor organ transplantation. While much prior research has focused on designing better allocation systems to distribute donated organs, more can be done to stu…

organ procurement organizations, organ transplantation

Published: Sept. 30, 2025. Version: 2.1.1 | DOI: 10.13026/rfeq-j318


Database Open Federated

MIMIC-IV demo data in the Medical Event Data Standard (MEDS)

Source: PhysioNet

This dataset is an automated ETL conversion of the MIMIC-IV Clinical Database Demo into the Medical Event Data Standard (MEDS). MEDS is a data schema for storing streams of medical events such as those sourced from Electronic Health Records or …

electronic health record, mimic, meds, machine learning, critical care, medical event data standard, ehr

Published: Sept. 30, 2025. Version: 0.0.1 | DOI: 10.13026/t2y8-ea41


Database Credentialed Federated

MIMIC-IV-Ext-22MCTS: A 22 Millions-Event Temporal Clinical Time-Series Dataset with Relative Timestamp

Source: PhysioNet

Clinical risk prediction based on machine learning algorithms plays a vital role in modern healthcare. A crucial component in developing a reliable prediction model is a high-quality dataset with time series clinical events. In this work, we release…

clinical event annotation, mimic, time series, temporal annotation

Published: Sept. 29, 2025. Version: 1.0.0 | DOI: 10.13026/dkj6-r828


Database Credentialed Federated

RadVLM Instruction Dataset

Source: PhysioNet

We release the RadVLM instruction dataset, a large-scale resource used to train the RadVLM model on diverse radiology tasks. The dataset contains 1,115,021 image–instruction pairs spanning five task families: (i) report generation from frontal…

vision-language models, medical ai, chest x-rays

Published: Sept. 25, 2025. Version: 1.0.0 | DOI: 10.13026/et5g-h222


Database Credentialed Federated

Multimodal Clinical Monitoring in the Emergency Department (MC-MED)

Source: PhysioNet

Emergency department (ED) patients often present with undiagnosed complaints, and can exhibit rapidly evolving physiology. Therefore, data from continuous physiologic monitoring, in addition to the electronic health record, is essential to understan…

Published: Sept. 25, 2025. Version: 1.0.1 | DOI: 10.13026/wvyw-g663


Database Credentialed Federated

MIMIC-Ext-DrugDetection

Source: PhysioNet

This project shares a large, annotated drug detection dataset created from MIMIC-III/IV discharge summaries. The dataset was developed to address the challenge of identifying substance use behaviors in Electronic Health Records (EHRs), where critica…

prescription opioid misuse, cannabis, benzodiazepine misuse, ehr, injection drug use, heroin, methamphetamine, substance use, multi-label, cocaine, drug detection, polysubstance use, mimic-iv, mimic-iii, clinical notes

Published: Sept. 25, 2025. Version: 1.0.0 | DOI: 10.13026/0kyx-r485


Database Restricted Federated

EchoNext: A Dataset for Detecting Echocardiogram-Confirmed Structural Heart Disease from ECGs

Source: PhysioNet

This dataset contains a de-identified collection of 100,000 12-lead electrocardiograms (ECGs) with paired structural heart disease (SHD) labels derived from echocardiography, collected at Columbia University Irving Medical Center. Each ECG is provid…

aortic stenosis, deep learning, health equity, cardiovascular screening, valvular heart disease, heart failure, digital health, ecg, machine learning, ai model deployment, left ventricular dysfunction, artificial intelligence, clinical decision support, ai in healthcare, population health, electrocardiogram, transthoracic echocardiogram, structural heart disease

Published: Sept. 16, 2025. Version: 1.1.0 | DOI: 10.13026/3ykd-bf14


Database Credentialed Federated

RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports

Source: PhysioNet

Radiology reports are essential for clinical care but pose challenges for automated processing due to their unstructured nature. Existing datasets like RadGraph-1.0 focus narrowly on chest X-rays (CXR), limiting their applicability. We introduce Rad…

Published: Sept. 12, 2025. Version: 1.0.0 | DOI: 10.13026/j8e7-pr22


Database Restricted Federated

mcPHASES: A Dataset of Physiological, Hormonal, and Self-reported Events and Symptoms for Menstrual Health Tracking with Wearables

Source: PhysioNet

Individuals who menstruate are frequently led to believe that there is a standard menstrual cycle, typically characterized as 28 days in length with predictable and uniform patterns. This framing often emphasizes cycle dates as the only relevant met…

hormones menstrual health multimodal health wearables health sensor data womens health

Published: Sept. 10, 2025. Version: 1.0.0 | DOI: 10.13026/zx6a-2c81