Our complete COVID-19 dataset is a collection of the COVID-19 data maintained by Our World in Data. It is updated daily and includes data on confirmed cases, deaths, and testing, as well as other variables of potential interest.
We will continue to publish up-to-date data on confirmed cases, deaths, and testing throughout the duration of the COVID-19 pandemic.
- Confirmed cases and deaths: our data comes from the European Centre for Disease Prevention and Control (ECDC). We discuss how and when the ECDC collects and publishes this data here. The cases & deaths dataset is updated daily. Note: the number of cases or deaths reported by any institution—including the ECDC, the WHO, Johns Hopkins and others—on a given day does not necessarily represent the actual number on that date. This is because of the long reporting chain that exists between a new case/death and its inclusion in statistics. This also means that negative values in cases and deaths can sometimes appear when a country sends a correction to the ECDC, because it had previously overestimated the number of cases/deaths.
- Testing for COVID-19: this data is collected by the Our World in Data team from official reports; you can find further details in our post on COVID-19 testing, including our checklist of questions to understand testing data, information on geographical and temporal coverage, and detailed country-by-country source information. The testing dataset is updated around twice a week.
- Other variables: this data is collected from a variety of sources (United Nations, World Bank, Global Burden of Disease, Blavatnik School of Government, etc.). More information is available in our codebook.
Our complete COVID-19 dataset is available in CSV, XLSX, and JSON formats, and includes all of our historical data on the pandemic up to the date of publication.
The CSV and XLSX files follow a format of 1 row per location and date. The JSON version is split by country ISO code, with static variables and an array of daily records.
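As a rough illustration of the two layouts, the sketch below reads a tiny CSV sample and flattens a tiny JSON sample into the same one-row-per-location-and-date shape. The sample values, the `XXA`/`Exampleland` entity, and the helper names are made up for illustration. The `data` key holding the daily records is an assumption, since the description above only says "an array of daily records".

```python
import csv
import io
import json

# Made-up placeholder samples that mimic the two layouts; not real figures.
SAMPLE_CSV = """iso_code,location,date,total_cases,new_cases,population
XXA,Exampleland,2020-06-15,100,10,1000
XXA,Exampleland,2020-06-16,110,10,1000
"""

# Assumption: the array of daily records sits under a "data" key.
SAMPLE_JSON = """{
  "XXA": {
    "location": "Exampleland",
    "population": 1000,
    "data": [
      {"date": "2020-06-15", "total_cases": 100, "new_cases": 10},
      {"date": "2020-06-16", "total_cases": 110, "new_cases": 10}
    ]
  }
}"""


def rows_from_csv(text):
    """Return one dict per (location, date) row; every value is a string."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_json(text, records_key="data"):
    """Flatten the per-country layout into one dict per location and date."""
    rows = []
    for iso_code, country in json.loads(text).items():
        # Everything except the daily records is treated as a static variable.
        static = {k: v for k, v in country.items() if k != records_key}
        for record in country.get(records_key, []):
            rows.append({"iso_code": iso_code, **static, **record})
    return rows


csv_rows = rows_from_csv(SAMPLE_CSV)
json_rows = rows_from_json(SAMPLE_JSON)
print(csv_rows[0]["date"], csv_rows[0]["total_cases"])
print(json_rows[1]["iso_code"], json_rows[1]["date"], json_rows[1]["population"])
```

Note that the CSV reader yields strings, while the JSON loader keeps numeric types, so a real pipeline would likely need a conversion step on the CSV side.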
The variables represent all of our main data related to confirmed cases, deaths, and testing, as well as other variables of potential interest.
As of 16 June 2020, the columns are: iso_code, continent, location, date, total_cases, new_cases, total_deaths, new_deaths, total_cases_per_million, new_cases_per_million, total_deaths_per_million, new_deaths_per_million, total_tests, new_tests, new_tests_smoothed, total_tests_per_thousand, new_tests_per_thousand, new_tests_smoothed_per_thousand, tests_units, stringency_index, population, population_density, median_age, aged_65_older, aged_70_older, gdp_per_capita, extreme_poverty, cvd_death_rate, diabetes_prevalence, female_smokers, male_smokers, handwashing_facilities, hospital_beds_per_thousand, life_expectancy.
A full codebook is available, with a description and source for each variable in the dataset.
If you are interested in the individual files that make up the complete dataset, or in more detailed information, other files can be found in the subfolders:
- ecdc: data from the European Centre for Disease Prevention and Control, related to confirmed cases and deaths;
- testing: data from various official sources, related to COVID-19 tests performed in each country. This folder contains two files with more detailed information:
  - covid-testing-all-observations.csv includes, for each historical observation, the source of the individual data point, and sometimes notes on data collection;
  - covid-testing-latest-data-source-details.csv includes, for each country in our testing dataset, the latest figures and a detailed description of how the country’s data is collected.
- who: data from the World Health Organization, related to confirmed cases and deaths. We stopped using and updating this data on 18 March 2020.
- Up until 17 March 2020, we were using WHO data manually extracted from their daily situation report PDFs.
- From 19 March 2020, we started relying on data published by the European CDC. We wrote about why we decided to switch sources.
- On 3 April 2020, we added country-level time series on COVID-19 tests.
- On 16 April 2020, we made available a complete dataset of all of our main variables related to confirmed cases, deaths, and tests.
- On 25 April 2020, we added rows for "World" and "International" to our complete dataset. The iso_code column for "International" is blank, and for "World" we use OWID_WRL.
- On 9 May 2020, we added new variables related to demographic, economic, and public health data to our complete dataset.
- On 19 May 2020, we added 2 variables related to testing: new_tests_smoothed and new_tests_smoothed_per_thousand. To generate them, we assume that testing changed equally on a daily basis over any period in which no data was reported (as not all countries report testing data on a daily basis). This produces a complete series of daily figures, which is then averaged over a rolling 7-day window.
- On 23 May 2020, we added a JSON version of our complete dataset.
- On 4 June 2020, we added a continent column to our complete dataset.
- On 1 July 2020, we changed the format of the JSON version of our complete dataset to normalize the data and reduce file size.
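The smoothing described for new_tests_smoothed can be sketched roughly as follows. This is one reading of that description, not the actual script used to build the dataset: it assumes the input is a cumulative test count on the days that were reported, fills each gap with equal daily increments, and then takes a trailing 7-day mean. The function names and sample values are hypothetical, and skipping days that lack a full 7-day window is a choice made here for simplicity.

```python
from datetime import date, timedelta


def interpolate_cumulative(observed):
    """Fill unreported days by spreading each gap's increase equally per day.

    `observed` maps a date to the cumulative number of tests reported that day.
    """
    days = sorted(observed)
    if not days:
        return {}
    filled = {}
    for start, end in zip(days, days[1:]):
        gap = (end - start).days
        step = (observed[end] - observed[start]) / gap
        for offset in range(gap):
            filled[start + timedelta(days=offset)] = observed[start] + step * offset
    filled[days[-1]] = observed[days[-1]]
    return filled


def daily_changes(cumulative):
    """Turn a gap-free cumulative series into day-on-day changes."""
    days = sorted(cumulative)
    return {d: cumulative[d] - cumulative[p] for p, d in zip(days, days[1:])}


def rolling_mean(series, window=7):
    """Trailing mean over `window` days; days without a full window are skipped."""
    days = sorted(series)
    out = {}
    for i in range(window - 1, len(days)):
        values = [series[d] for d in days[i - window + 1 : i + 1]]
        out[days[i]] = sum(values) / window
    return out


# Hypothetical country that reported cumulative totals on only three days.
reported = {
    date(2020, 5, 1): 0,
    date(2020, 5, 4): 300,
    date(2020, 5, 8): 1100,
}
new_tests = daily_changes(interpolate_cumulative(reported))
new_tests_smoothed = rolling_mean(new_tests)
print(new_tests_smoothed)
```

In this sample the interpolated daily figures are 100 tests per day for the first gap and 200 per day for the second, so the single smoothed value is the mean of those seven daily figures.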
- We standardize names of countries and regions. Because these names differ between data sources, we convert all of them to the Our World in Data standard entity names.
- We may correct or discard inconsistencies that we detect in the original data.
- Testing data is collected from many different sources. Detailed documentation for each country is available in our post on COVID-19 testing.
- Where we collect multiple time series for a given country in our testing data (for example, for the United States we collect data from both the CDC and the COVID Tracking Project), our complete COVID-19 dataset includes only the most complete series or, if the series are equally complete, the one reporting the number of people tested rather than the number of tests/samples/swabs processed. The list of 'secondary' test series (those removed) is located in scripts/input/owid/secondary_testing_series.csv.
The /public path of this repository is hosted at https://covid.ourworldindata.org/. For example, you can access the CSV for the complete dataset at https://covid.ourworldindata.org/data/owid-covid-data.csv.
We aim to keep all stable URLs working, even when we have to restructure this repository. If you need regular updates, please consider using the covid.ourworldindata.org URLs rather than pointing to GitHub.
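A minimal sketch of pulling regular updates from the stable URL might look like the following. It uses only the Python standard library and the CSV URL quoted above. The helper names are hypothetical, and two things are assumed rather than stated in this document: that the file is UTF-8 encoded, and that the date column uses a format that sorts correctly as text (such as YYYY-MM-DD).

```python
import csv
import io
import urllib.request

# Stable URL for the complete dataset, as quoted in the text above.
CSV_URL = "https://covid.ourworldindata.org/data/owid-covid-data.csv"


def fetch_rows(url=CSV_URL, opener=urllib.request.urlopen):
    """Download the CSV and return one dict per location and date.

    `opener` can be swapped for a stub so the parsing can be tried offline.
    """
    with opener(url) as response:
        text = response.read().decode("utf-8")  # assumption: UTF-8 encoding
    return list(csv.DictReader(io.StringIO(text)))


def latest_by_location(rows):
    """Keep the most recent row per location.

    Assumes dates compare correctly as strings (for example YYYY-MM-DD).
    """
    latest = {}
    for row in rows:
        current = latest.get(row["location"])
        if current is None or row["date"] > current["date"]:
            latest[row["location"]] = row
    return latest


# Example (performs a network request, so it is left commented out):
# rows = fetch_rows()
# print(len(latest_by_location(rows)), "locations")
```

Passing the opener as a parameter keeps the download step separate from the parsing, which makes the function easy to exercise against a local sample before pointing it at the live file.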
All of Our World in Data is completely open access and all work is licensed under the Creative Commons BY license. You have permission to use, distribute, and reproduce this work in any medium, provided the source and authors are credited.
This data has been collected, aggregated, and documented by Diana Beltekian, Daniel Gavrilov, Charlie Giattino, Joe Hasell, Bobbie Macdonald, Edouard Mathieu, Esteban Ortiz-Ospina, Hannah Ritchie, and Max Roser.
The mission of Our World in Data is to make data and research on the world’s largest problems understandable and accessible. Read more about our mission.