This repository has been archived by the owner on Apr 14, 2023. It is now read-only.

[WIP] Refactor aggregation script + new features #55

Open

allejo wants to merge 7 commits into master from feature/aggregate-script-46
Conversation


@allejo allejo commented Mar 15, 2017

DO NOT MERGE

To Do:

  • Refactor and reorganize all of the code in the aggregation script
  • Add more unit tests + enable those unit tests in Travis
  • Add aggregation bit to generate new file for top domains
  • Offer either JSON or CSV files for all downloadable data (some are currently only CSV in the downloads section); see the JSON-to-CSV sketch after this list
  • Final testing (get live JSON files and run them through this script to make sure the results are the same)
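
A minimal sketch of the JSON-to-CSV idea, assuming each report is a list of flat dicts sharing the same keys; the function name and paths are illustrative, not the script's actual API:

    import csv
    import json

    def json_to_csv(json_path, csv_path):
        # Assumption: the JSON report is a list of flat dicts that
        # all share the same keys, which become the CSV header row.
        with open(json_path) as f:
            records = json.load(f)

        with open(csv_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=records[0].keys())
            writer.writeheader()
            writer.writerows(records)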

@allejo allejo self-assigned this Mar 15, 2017
@allejo allejo requested a review from thekaveman March 15, 2017 00:05

allejo commented Mar 15, 2017

@thekaveman can you take a look at the new aggregation script? It'd be best to just look at the current script instead of trying to make sense of the diff.

Also, is there anything else I should add to the to do list?

@thekaveman
Contributor

I will take a look! Thanks dude.

@allejo allejo force-pushed the feature/aggregate-script-46 branch from 1177e88 to 546b6dd on March 15, 2017 00:21
allejo and others added 6 commits October 20, 2017 22:48
- Add ignore rules for common compression formats used for backed-up
  data during development
- Ignore Python's cache
- Ignore pyenv definition
- Ignore Visual Studio Code's project settings
- Rewrite and optimize the entire script
- Add basic unit tests for the core functions of the script that handle
  the actual data manipulation
The Makefile's 'fetch' command should respect the REALTIME variable to replicate the prod env
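
A hypothetical sketch of the idea behind that commit: the REALTIME name comes from the commit message, but the fetch entry point and flag below are assumptions, not the repo's actual interface:

    import os
    import subprocess

    # Mirror prod: only fetch realtime data when REALTIME is set.
    realtime = os.environ.get("REALTIME", "").lower() in ("1", "true", "yes")

    cmd = ["python", "fetch_data.py"]  # hypothetical fetch entry point
    if realtime:
        cmd.append("--realtime")       # hypothetical flag

    subprocess.run(cmd, check=True)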
@allejo allejo force-pushed the feature/aggregate-script-46 branch from 2eb7dc8 to 60d487d on October 21, 2017 05:50
Relying on data in the filesystem is only safe when working with a
clean slate, because deleting a website leaves its old data behind.
Instead of scanning the filesystem, use the Jekyll-generated files for
their actual purpose: being the authoritative source of sites & reports.

Fixes #57
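
A minimal sketch of that approach, assuming Jekyll emits a JSON manifest of sites; the _site/sites.json path and record shape are assumptions:

    import json

    # Treat the Jekyll-generated manifest, not the directory tree, as
    # the authoritative list, so stale files left behind by deleted
    # websites are never picked up.
    with open("_site/sites.json") as f:
        sites = json.load(f)

    domains = [site["domain"] for site in sites]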
except IOError:
    pass
# Reports that will not be aggregated by this script
ignored_reports = []
Contributor

I know we had this in the previous version - we haven't used it though, right? And it isn't being populated from any environment variables or elsewhere. Do we need this right now?

If not, we can also clean up L176-177 below.
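
If the list is kept, one illustrative way to populate it (the IGNORED_REPORTS variable is an assumption; the script does not currently read it):

    import os

    # Hypothetical: comma-separated list, e.g. IGNORED_REPORTS="realtime,language"
    ignored_reports = [
        name.strip()
        for name in os.environ.get("IGNORED_REPORTS", "").split(",")
        if name.strip()
    ]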

}

for k in aggregationDefinitions:
    v = aggregationDefinitions[k]
Contributor

for k, v in aggregationDefinitions.items():
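
Iterating with .items() yields each key/value pair directly, so each value no longer needs a separate aggregationDefinitions[k] lookup; behavior is otherwise unchanged.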

@allejo allejo removed their assignment Mar 26, 2020