Database Credentialed Access

COVID-19 Epidemiology and Vaccination Dataset

January Adams

Published: June 6, 2022. Version: 1.0.0

When using this resource, please cite:
Adams, J. (2022). COVID-19 Epidemiology and Vaccination Dataset (version 1.0.0). Health Data Nexus.

Additionally, please cite the original publication:

Berry, I., O’Neill, M., Sturrock, S.L. et al. A sub-national real-time epidemiological and vaccination database for the COVID-19 pandemic in Canada. Sci Data 8, 173 (2021).


This dataset is a daily snapshot of the Canadian COVID-19 epidemiological dataset collected by the COVID-19 Canada Open Data Working Group. A team of volunteers has collected daily data from governmental and non-governmental sources, including a line list of cases and time series of COVID-19 recoveries, testing, and vaccine doses.

Due to the changing nature of COVID-19 coverage in Canada, the dataset was retired on May 4th, 2022 and replaced with a separate data stream, found here. This dataset contains the final snapshot of the original source, found on GitHub here.

This dataset is associated with the publication "A sub-national real-time epidemiological and vaccination database for the COVID-19 pandemic in Canada" (Berry et al., 2021).


The COVID-19 pandemic began in December 2019 and spread rapidly throughout the world. In the early days of the pandemic, the need for access to large quantities of high-quality data became apparent. The COVID-19 Canada Open Data Working Group (CCODWG) was created to collect COVID-19 case and mortality (and later, vaccination) data at the national and subnational level. This data was collected daily and updated on GitHub.

At the end of 2021 and the beginning of 2022, many jurisdictions changed their policies around reporting COVID-19. As a result, the CCODWG changed its approach and discontinued live updates to the dataset on May 4th, 2022.


The CCODWG is a volunteer group that collects COVID-19 data from publicly available sources, including government reports and verified news media. Quoting from the paper by Berry et al. (2021):

"First, official government sources (such as press releases/press conferences from regional ministries of health) are reviewed in full and any COVID-19-related case and mortality announcements are identified as the gold standard for data inclusion. Second, additional information is identified using purposive search methods for COVID-19-related articles and online reports from accredited national and local news agencies. Finally, we identify updates reported by official social media accounts (e.g., Twitter) that are verifiably linked to governmental or public health institutions (e.g., ministries of health, chief medical officers of health), and these are included if no alternative sources are found as well as to supplement existing information. To improve data quality and auditability, the corresponding sources are required to be included as a reference for each data entry. Aggregated provincial/territorial recovery, testing, and vaccine dose distribution and administration data are also identified using this hierarchical process."

Data Description

The data includes three levels of geographic resolution: country, province, and health region, the last being a subdivision of a province. Health regions are also referred to as zones or public health units. The borders of public health units can change, often when one health region is subdivided into smaller regions. Cases and mortality in the dataset are retroactively updated to these finer boundaries where possible; where this is not possible, the larger aggregated regions are used for the time series.
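The relationship between the three geographic levels can be illustrated with a minimal sketch that rolls health-region counts up to province and country totals. The record layout, field order, and values below are hypothetical and chosen only for illustration; they are not the dataset's actual schema, which is described in the CCODWG Technical Report.

```python
from collections import defaultdict

# Hypothetical records: (date, province, health_region, new_cases).
# Field names and values are illustrative, not the dataset's real schema.
records = [
    ("2021-01-01", "ON", "Toronto", 10),
    ("2021-01-01", "ON", "Ottawa", 4),
    ("2021-01-01", "QC", "Montréal", 7),
    ("2021-01-02", "ON", "Toronto", 12),
]


def roll_up(rows):
    """Sum health-region counts into province and country totals per date."""
    province = defaultdict(int)
    country = defaultdict(int)
    for date, prov, _region, cases in rows:
        province[(date, prov)] += cases
        country[date] += cases
    return dict(province), dict(country)


province_totals, country_totals = roll_up(records)
```

Summing upward like this assumes every case is assigned to exactly one health region; rows with an unassigned region would still count toward province and country totals in this sketch.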

A detailed description of the dataset is included in the CCODWG Technical Report.

Usage Notes

This dataset benefits from having data available at a relatively fine geographic level. This resolution allows users to identify COVID-19 hotspots and look at the changes in hotspots over time. Additionally, the long time horizon over which the data is collected allows for the exploration of trends over time, such as the introduction of vaccinations and non-pharmaceutical interventions.
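One way to explore hotspots over time is to compute a trailing incidence rate per health region and flag days above a cutoff. The sketch below is only an assumption about how a user might do this: the function names, the 7-day window, the population figure, and the threshold are all hypothetical and do not come from the dataset or the original publication.

```python
def rolling_sum(values, window=7):
    """Trailing rolling sum; early entries use only the days available so far."""
    out = []
    total = 0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            # Drop the value that just left the window.
            total -= values[i - window]
        out.append(total)
    return out


def hotspot_days(daily_cases, population, threshold_per_100k, window=7):
    """Return indices of days whose trailing incidence per 100,000 meets the threshold."""
    sums = rolling_sum(daily_cases, window)
    return [
        i
        for i, s in enumerate(sums)
        if s / population * 100_000 >= threshold_per_100k
    ]


# Hypothetical daily case counts for one health region of 100,000 people.
flagged = hotspot_days([0, 0, 50, 50, 0], population=100_000,
                       threshold_per_100k=100, window=2)
```

Running the same function per health region and comparing the flagged days across regions gives a rough picture of how hotspots move. Because reporting practices changed in late 2021 and early 2022, results for that period should be read with caution.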

A notable shortcoming is the lack of accurate information on patient demographics, because few jurisdictions collect this data. In addition, changes in how jurisdictions collected data, particularly at the end of 2021 and the beginning of 2022, may lead to inaccuracies in case data for this period. Case data may be much less valuable than hospital and vaccination data during this period.


Release Notes

The authors declare no competing interests. All datasets were collected from existing publicly available data.



Conflicts of Interest

No conflicts of interest were presented by the authors of the dataset, and no conflicts of interest are present in uploading it.


Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Contributor Review Health Data License 1.5.0

Data Use Agreement:
T-CAIREM Data Use Agreement

Required training:
CITI Data or Specimens Only Research
