These scripts are intended to help analyze data from several sources, including:
- http://www.data.gouv.fr (French government Open Data)
- data appearing on EU Open Data Portal (EU ODP) https://data.europa.eu/
- now obsolete data from USA at https://covidtracking.com
This contains:
- a small library for pulling and updating data, managing a cache of the most recent data,
- some data analysis tools using Python's `pandas`, `matplotlib` and `seaborn` packages,
- additional tools based on scikit-learn,
- Jupyter notebook(s).
The included Jupyter notebooks work with COVID-19 / SARS-CoV-2 data provided by:
- Santé publique France and INSEE, which include:
- EU Open Data Portal
- Data for USA from https://covidtracking.com
See README-gallery.md for more, and README-Vaccine.md concerning vaccination data.
Example figures:
- Comparison between largest 'départements' (from 'data.gouv.fr')
- Vaccination: impact of vaccination on hospitalizations and deaths
- Comparison between some European countries (from 'data.europa.eu')
- Jupyter notebook(s): display the data. They automatically use the latest version of the data provided, which is cached locally and synchronized with the remote site automatically after a prescribed time interval.
- Python modules:
  - manage a local repository of files by handling the file version/timestamp in the file name.
    - This is managed as a cache, where only a specified number of versions of each file are kept.
    - Cache size is limited by erasing older versions.
  - automate the transfer of files, using information/directories located on the remote site:
    - identified by badge or tag on doc.data.gouv.fr. This uses the API (HTTP-based) documented at
      - More details in README-Bug-X-DataFr.md.
    - identified by a SPARQL filtering regular expression on the https://data.europa.eu/ SPARQL entry point, using the
  - permit some queries on the downloaded/cached metadata describing the data loaded from the remote site.
  - `figureHelpers.py` module:
    - some convenience tools to facilitate/automate making `matplotlib` figures (`seaborn` support is also planned, after some wait).
  - perform some data analyses (model parameter fitting), see: ./JupySessions/FIT-Data-FromGouv.ipynb
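The cache policy described in the list above (timestamp embedded in the file name, bounded number of versions per file, refresh after a time interval) can be sketched as follows. This is a minimal illustration, not the library's actual implementation: the naming scheme `name-YYYYmmddTHHMMSS.ext` and the helpers `versioned_name`, `list_versions`, `prune_versions` and `is_stale` are assumptions made for this example.

```python
"""Sketch of a timestamp-in-file-name cache; all names here are hypothetical."""
import re
import tempfile
from datetime import datetime, timedelta
from pathlib import Path

STAMP_FMT = "%Y%m%dT%H%M%S"                  # assumed timestamp layout
STAMP_RE = re.compile(r"-(\d{8}T\d{6})$")    # stamp at the end of a file stem


def versioned_name(filename: str, when: datetime) -> str:
    """Return e.g. 'data-20210101T120000.csv' for 'data.csv'."""
    p = Path(filename)
    return f"{p.stem}-{when.strftime(STAMP_FMT)}{p.suffix}"


def list_versions(cache_dir: Path, filename: str) -> list:
    """Return (timestamp, path) pairs for one logical file, newest first."""
    p = Path(filename)
    found = []
    for path in cache_dir.glob(f"{p.stem}-*{p.suffix}"):
        m = STAMP_RE.search(path.stem)
        # Reject look-alikes such as 'data-extra-<stamp>.csv'.
        if m and path.stem[: m.start()] == p.stem:
            found.append((datetime.strptime(m.group(1), STAMP_FMT), path))
    return sorted(found, key=lambda item: item[0], reverse=True)


def prune_versions(cache_dir: Path, filename: str, keep: int) -> list:
    """Erase all but the `keep` newest versions; return the removed paths."""
    removed = []
    for _, path in list_versions(cache_dir, filename)[keep:]:
        path.unlink()
        removed.append(path)
    return removed


def is_stale(cache_dir: Path, filename: str, max_age: timedelta, now: datetime) -> bool:
    """True when nothing is cached or the newest version is older than `max_age`."""
    versions = list_versions(cache_dir, filename)
    return not versions or now - versions[0][0] > max_age


# Usage: four daily versions are created, then only the two newest are kept.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    start = datetime(2021, 1, 1, 12, 0, 0)
    for day in range(4):
        (cache / versioned_name("data.csv", start + timedelta(days=day))).touch()
    removed = prune_versions(cache, "data.csv", keep=2)
    kept = [path.name for _, path in list_versions(cache, "data.csv")]
    stale = is_stale(cache, "data.csv", timedelta(days=1), now=start + timedelta(days=5))
```

Keeping the timestamp in the file name means the cache can be inspected and pruned without a separate index file, at the cost of having to parse names.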
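As an illustration of selecting datasets with a SPARQL regular-expression filter, the sketch below builds a query and encodes it for an HTTP GET request. Nothing is sent over the network. The endpoint URL, the choice of the DCAT vocabulary (`dcat:Dataset`, `dct:title`) and the helpers `sparql_literal`, `build_query` and `build_url` are assumptions made for this example, not the library's own code.

```python
"""Sketch: build a SPARQL query with a regex filter; no request is sent."""
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; check the portal documentation before relying on it.
ENDPOINT = "https://data.europa.eu/sparql"


def sparql_literal(text: str) -> str:
    """Quote `text` as a SPARQL string literal (escape backslashes and quotes)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_query(title_regex: str, limit: int = 10) -> str:
    """Select datasets whose title matches `title_regex`, case-insensitively."""
    return (
        "PREFIX dcat: <http://www.w3.org/ns/dcat#>\n"
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?dataset ?title WHERE {\n"
        "  ?dataset a dcat:Dataset ;\n"
        "           dct:title ?title .\n"
        f'  FILTER regex(?title, {sparql_literal(title_regex)}, "i")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_url(query: str) -> str:
    """Encode the query in the `query` parameter of a GET request."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = build_query(r"covid.*19")
url = build_url(query)
# Decoding the URL gives the original query text back.
round_trip = parse_qs(urlsplit(url).query)["query"][0]
```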
- For more information on changes (and bugs), see the git log.
- Concerning the French site `data.gouv.fr`, support is evolving towards a larger subset of the API.
- Concerning the European site `data.europa.eu`:
  - an issue has been corrected, see README-Bug-X-EuRDF.md
  - a change in the format of the collected data files (introducing weekly data) is in progress. More in README-Data.md.
- TBD: reduce redundancy (same files with different extensions) in cached data; for now, caches take more space than really needed.
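To gauge the redundancy mentioned in the TBD item, one option is to group cached files by base name and list those stored under several extensions. The sketch below only compares file names, not contents; the helper `redundant_groups` and the file names are hypothetical and not part of the library.

```python
"""Sketch: list cached files stored under several extensions (names only)."""
import tempfile
from collections import defaultdict
from pathlib import Path


def redundant_groups(cache_dir: Path) -> dict:
    """Map each base name found with several extensions to its sorted file names."""
    groups = defaultdict(list)
    for path in cache_dir.iterdir():
        if path.is_file():
            groups[path.stem].append(path.name)
    return {stem: sorted(names) for stem, names in groups.items() if len(names) > 1}


# Usage with made-up file names: only 'cases' exists in two formats.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    for name in ("cases.csv", "cases.json", "deaths.csv"):
        (cache / name).touch()
    report = redundant_groups(cache)
```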
- This requires Python 3, and has been tested on:

  | Python version | Operating system | Period |
  |---|---|---|
  | Python 3.6.5 | Ubuntu 18.04 LTS | before 3Q2020 |
  | Python 3.8.2 | Ubuntu 20.04 LTS | before 1Q2021 |
  | Python 3.8.6 | Ubuntu 20.10 | Current |
- In the current version, the library is dependent on some features from the IPython package, which comes with Jupyter. This constraint may be removed in the future.
This is used with Jupyter as natively integrated / installed in the Ubuntu 20.04 LTS distribution.
pip install -U -r requirements.txt
This is provided as is, see the LICENSE file. Development is ongoing.
- https://github.com/alichnewsky/covid : basic script to work with the Novel Coronavirus (COVID-19) cases dataset provided by JHU CSSE
- in France:
  - https://www.academie-sciences.fr/fr/ : many references
  - https://www.eficiens.com/coronavirus-statistiques/#evolution-contamination-france : well-presented statistics
- other Covid related sites/developments: