% Data Science Show & Tell
% Enrico Spinielli
% June 9, 2016

{data-background-image="images/mindmap.jpg" data-background-size="1100px" data-background-position="bottom"}

Given:

  • life is short
  • I am lazy
  • You should not lie
  • Humans are intelligent (w/ caveats ;-)
  • ... and not all of them are working at Eurocontrol

...it follows

  • I'll (procrastinate on boring stuff and only) work on useful/fun projects
  • Automation saves me from repeating boring and/or forgotten tasks
  • I'll be open to letting others criticize/scrutinize/learn
  • ...and I'll learn back from them
  • I'll strive to produce truthful explanations/visualizations

Let's Do It!

The Axioms (IMHO) {data-background-image="images/mindmap.jpg" data-background-size="600px" data-background-position="bottom"}

  • Value of data --> visualization
  • Visualization --> WWW
  • Make data available
- *no Web*: then you do not exist, i.e. EC/PRB/PRU
- *no boring stuff*: enough of it, do better
- *truthful*: no evil
- *visualization*: human perception & best practices!
- data availability!

The Plan (Jan 2015)

  • Generate a (static) website for the PRU
  • Version control it all
  • Automate!
- *static*: no need for a server, no authentication, no hacks!
- *version control*: done by systems, not by humans relying on naming conventions in folders...
- *automation*: the only way to scale

Now, one and a half years later {data-background-image="images/website-home.png" data-background-size="850px" data-background-position="bottom"}

Sections {data-background-image="images/website-bar.png" data-background-size="900px"}

Graphs {data-background-video="media/graphs.mp4" data-background-video-loop="true"}

Data {data-background-image="images/website-data.png" data-background-size="1100px"}

Metadata {data-background-image="images/website-metadata1.png" data-background-size="900px"}

...Metadata {data-background-image="images/website-metadata2.png" data-background-size="900px"}

...still Metadata {data-background-image="images/website-metadata3.png" data-background-size="900px"}

Studies {data-background-video="media/flows.mp4" data-background-video-loop="true"}

- inspired by [Global Migration of People](http://www.global-migration.info/)
- ...which was inspired by [Circos](http://circos.ca/)

(interactive) Maps {data-background-video="media/map.mp4" data-background-video-loop="true"}

Features

Editing

  • easy, i.e. textual (ASCII, no HTML): separate content from style
  • nice Math (via MathJax): $$f(x)=\sum_{n=0}^\infty\frac{f^{(n)}(a)}{n!}(x-a)^n$$
  • bibliography: cite and style
  • templates for different kinds of pages (Definitions, list of ANSPs, RNs)

Markdown

No need to edit in HTML: we (mainly) use Markdown (from Pandoc)

```markdown
## Methodology

[Horizontal en-route flight efficiency methodology](/r/m/hfe_pi.html)
is fully consistent with the Single European Sky (SES)
Performance Scheme [see {% cite pru-hfe-pi --file aviation %}].

## Column naming and types

### HFE data

{:.metatable}
| Column name | Src | Label     | Column description    | Example |
|-------------|-----|-----------|-----------------------|---------|
| YEAR        | NM  | YEAR      | Reference year        | 2014    |
| MONTH_NUM   | NM  | MONTH_NUM | Month (numeric)       | 9       |
| MONTH_MON   | NM  | MONTH_MON | Month (3-letter code) | JAN     |
```

Biblio {data-background-image="images/website-biblio.png" data-background-size="820px"}

Versioning

Git and GitHub

...and web serving

naming convention (GitHub Pages): `<user>.github.io`

Tech Docs

Workflows

Branching scheme (using Git)

Release

Pull Request from GitHub

Bugs

Issues from GitHub

Generation

  • from DB queries to website: scripts
  • Jekyll: MD -> HTML
  • Pandoc: MD -> PDF
  • some from R Markdown/knitr in the near future
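A minimal sketch of the first step (DB queries to files the site can serve), assuming Python's built-in `sqlite3` as a stand-in for the production database and a hypothetical `hfe` table with the HFE columns; the real scripts may well look different:

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, query, out):
    """Run `query` on `conn` and write the result, with a header row, to `out`."""
    cur = conn.execute(query)
    writer = csv.writer(out, lineterminator="\n")
    # Column names come from the cursor description (first item of each tuple)
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)


# Usage: an in-memory SQLite DB stands in for the real (hypothetical here) database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hfe (YEAR INTEGER, MONTH_NUM INTEGER, MONTH_MON TEXT)")
conn.executemany(
    "INSERT INTO hfe VALUES (?, ?, ?)",
    [(2014, 9, "SEP"), (2014, 10, "OCT")],
)
buf = io.StringIO()
export_query_to_csv(
    conn, "SELECT YEAR, MONTH_NUM, MONTH_MON FROM hfe ORDER BY MONTH_NUM", buf
)
print(buf.getvalue())
```

The CSV could then sit next to the Markdown sources, so that `jekyll build` picks it up for the HTML and Pandoc (e.g. `pandoc report.md -o report.pdf`, file names hypothetical) produces the PDF.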

CI

Automatic builds and deployment using Travis CI

- But we **NEED MORE** to scale: for example, data consistency checks
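As a hedged illustration of such a check, here is a sketch that validates rows shaped like the HFE data columns (YEAR, MONTH_NUM, MONTH_MON). The CSV layout, the hypothetical `check_hfe_rows` helper and the rule that MONTH_MON must be the upper-case 3-letter code of MONTH_NUM are all assumptions:

```python
import csv
import io

# Upper-case 3-letter month codes (assumed convention, e.g. JAN)
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def check_hfe_rows(text):
    """Return a list of problems found in CSV text with YEAR, MONTH_NUM, MONTH_MON."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    # Data rows start on line 2 of the file (line 1 is the header)
    for line_no, row in enumerate(reader, start=2):
        try:
            int(row["YEAR"])
            month = int(row["MONTH_NUM"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: YEAR/MONTH_NUM not numeric")
            continue
        if not 1 <= month <= 12:
            problems.append(f"line {line_no}: MONTH_NUM {month} out of range")
        elif row["MONTH_MON"] != MONTHS[month - 1]:
            problems.append(
                f"line {line_no}: MONTH_MON {row['MONTH_MON']!r} "
                f"does not match MONTH_NUM {month}"
            )
    return problems


sample = "YEAR,MONTH_NUM,MONTH_MON\n2014,9,SEP\n2014,10,NOV\n2014,13,DEC\n"
for problem in check_hfe_rows(sample):
    print(problem)
# line 3: MONTH_MON 'NOV' does not match MONTH_NUM 10
# line 4: MONTH_NUM 13 out of range
```

In CI, a script like this could exit non-zero (`sys.exit(1)`) when the list is not empty, so that a bad data drop breaks the build instead of reaching the site.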

Demo time

the site (running locally)

the official PRU site, http://ansperformance.eu

ToDo's

DB {data-background-image="images/mindmap.jpg" data-background-size="600px" data-background-position="bottom"}

  • new schema for production: PRUPROD
  • use current ones for development (PRUDEV) and testing (PRUTEST)
  • version control the [PL/]SQL code, i.e. track which code was used to produce which indicators
  • version control the DB used for prod: regulatory repository
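One possible way to record which code produced which indicator, sketched under my own assumptions: store a SHA-256 hash of the SQL text, and optionally the git commit, next to each published indicator. `provenance_record` is a hypothetical helper, not an existing tool:

```python
import hashlib
import json


def provenance_record(indicator, sql_text, commit=None):
    """Build a small record tying an indicator to the SQL that produced it."""
    # Normalise line endings and outer whitespace so equivalent files hash the same
    normalised = sql_text.replace("\r\n", "\n").strip()
    digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
    record = {"indicator": indicator, "sql_sha256": digest}
    if commit is not None:
        # e.g. the output of `git rev-parse HEAD` in the SQL repository
        record["git_commit"] = commit
    return record


rec = provenance_record("HFE", "SELECT * FROM hfe;\r\n")
print(json.dumps(rec, indent=2))
```

Hashing the SQL text itself means the record still says something useful even when the query was run outside a clean git checkout.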

Data

  • improve the meta part: definitions, methodology
  • add more data and a (web) API (see ICAO iSTARS)
  • generate the spreadsheets if CSV files/API are not enough
- Metadata is there for transparency and to avoid confusion, i.e. define what you name/use (delay, trajectory, FIR)
- the API is there to make the data available: remember, we are not the only smart ones around
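Since the site is static, a (web) API could start as plain JSON files served alongside the pages. A minimal sketch, assuming CSV input with the HFE columns and a hypothetical `csv_to_json_records` helper:

```python
import csv
import io
import json


def csv_to_json_records(text, numeric=()):
    """Convert CSV text into a list of dicts, casting the given columns to int."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in numeric:
            row[col] = int(row[col])  # keep numbers as numbers in the JSON
        records.append(row)
    return records


csv_text = "YEAR,MONTH_NUM,MONTH_MON\n2014,9,SEP\n"
records = csv_to_json_records(csv_text, numeric=("YEAR", "MONTH_NUM"))
print(json.dumps(records))
# [{"YEAR": 2014, "MONTH_NUM": 9, "MONTH_MON": "SEP"}]
```

Writing `records` with `json.dump` into the site sources (e.g. a hypothetical `api/hfe.json`) would make the data downloadable by any client, with no server needed.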

Viz {data-background-video="media/bullets.mp4" data-background-video-loop="true"}

More Viz

  • more Studies/Articles w/ interactivity (see NYT, WP)
  • more thinking about what is worth plotting
  • more graphs in Graphs
  • a one-year-old experiment: www or (8989)
  • a recent one w/ STATFOR: www or (8990)
# Wild thoughts

* personally, I am not interested in BI or industrial-like dashboards
* I know that little of our NMIR is used

Just my own

  • PRR live on the website, with the PDF generated from the source in the git repo
  • add Jupyter notebooks to the website for case studies

Conclusions {.conclusions}

We want you! {.slide: data-background="images/we-want-you.png" data-background-transition="zoom" data-state="wewantyou"}

<style>html.wewantyou .backgrounds {opacity: 0.15;}</style>
  • Share knowledge (or lack of)
  • Learn from and know each other
  • Discover internal and external datasets
  • criticize & propose alternatives
  • point out things you have seen and would like to see implemented on our site
    For example NYT, Bloomberg (1, 2), WP, ProPublica, The Guardian, Financial Times ... have fantastic infographics

We hear you!

  • emails with questions or proposals are a good start
  • you are always welcome to come and chat (but bring your coffee)
  • present at the next Show & Tell

Don't wait...
Do it!

{data-background="images/wordcloud.svg" data-background-transition="zoom"}

References and Inspirations

Tools

* Google Charts cannot be run offline
* GCharts makes your life difficult if you want to load data locally, i.e. CSV instead of Google Spreadsheets

Social

Books

Yes, you still have to study!

  • Tufte, Edward
  • Cairo, Alberto
  • Few, Stephen

Credit Where it is Due

Trivia

Automation 1

xkcd 1319 and explanation

Title text: 'Automating' comes from the roots 'auto-' meaning 'self-', and 'mating', meaning 'screwing'.

Automation 2

xkcd 1205 and explanation

Title text: Don't forget the time you spend finding the chart to look up what you save. And the time spent reading this reminder about the time spent. And the time trying to figure out if either of those actually make sense. Remember, every second counts toward your life total, including these right now.

Correlation

xkcd 552 and explanation

Title text: Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.

Data Accuracy 1

Dilbert 2008-05-07

Data Accuracy 2

Dilbert 2008-05-08

Convincing

xkcd 833 and explanation

Title text: And if you labeled your axes, I could tell you exactly how MUCH better.