This repository has been archived by the owner on Apr 14, 2023. It is now read-only.

[WIP] Refactor aggregation script + new features #55

Open

allejo wants to merge 7 commits into master from feature/aggregate-script-46
Conversation


@allejo allejo commented Mar 15, 2017

DO NOT MERGE

To Do:

  • Refactor and reorganize all of the code in the aggregation script
  • Add more unit tests + enable those unit tests in Travis
  • Add aggregation bit to generate new file for top domains
  • Offer either JSON or CSV files for all downloadable data (some are currently only CSV in the downloads section); see the JSON-to-CSV sketch after this list
  • Final testing (get live JSON files and run them through this script to make sure the results are the same)
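
A minimal sketch of the JSON-to-CSV idea, assuming each report is a list of flat dicts sharing the same keys; the function name and paths are illustrative, not the script's actual API:

    import csv
    import json

    def json_to_csv(json_path, csv_path):
        # Assumption: the JSON report is a list of flat dicts that
        # all share the same keys, which become the CSV header row.
        with open(json_path) as f:
            records = json.load(f)

        with open(csv_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=records[0].keys())
            writer.writeheader()
            writer.writerows(records)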

@allejo allejo self-assigned this Mar 15, 2017
@allejo allejo requested a review from thekaveman March 15, 2017 00:05

allejo commented Mar 15, 2017

@thekaveman can you take a look at the new aggregation script? It'd be best to just look at the current script instead of trying to make sense of the diff.

Also, is there anything else I should add to the to do list?

@thekaveman
Contributor

I will take a look! Thanks dude.

@allejo allejo force-pushed the feature/aggregate-script-46 branch from 1177e88 to 546b6dd on March 15, 2017 00:21
allejo and others added 6 commits October 20, 2017 22:48
- Add ignore rules for common compression formats used for backed-up
  data during development
- Ignore Python's cache
- Ignore pyenv definition
- Ignore Visual Studio Code's project settings
- Rewrite and optimize the entire script
- Add basic unit tests for the core functions of the script that handle
  the actual data manipulation
The Makefile's 'fetch' command should respect the REALTIME variable to replicate the prod env
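
A hypothetical sketch of the idea behind that commit: the REALTIME name comes from the commit message, but the fetch entry point and flag below are assumptions, not the repo's actual interface:

    import os
    import subprocess

    # Mirror prod: only fetch realtime data when REALTIME is set.
    realtime = os.environ.get("REALTIME", "").lower() in ("1", "true", "yes")

    cmd = ["python", "fetch_data.py"]  # hypothetical fetch entry point
    if realtime:
        cmd.append("--realtime")       # hypothetical flag

    subprocess.run(cmd, check=True)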
@allejo allejo force-pushed the feature/aggregate-script-46 branch from 2eb7dc8 to 60d487d on October 21, 2017 05:50
Relying on data in the filesystem is only safe when working with a
clean slate, because deleting a website leaves its old data behind.
Instead of scanning the filesystem, use the Jekyll-generated files for
their actual purpose: being the authoritative source of sites & reports.

Fixes #57
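
A minimal sketch of that approach, assuming Jekyll emits a JSON manifest of sites; the _site/sites.json path and record shape are assumptions:

    import json

    # Treat the Jekyll-generated manifest, not the directory tree, as
    # the authoritative list, so stale files left behind by deleted
    # websites are never picked up.
    with open("_site/sites.json") as f:
        sites = json.load(f)

    domains = [site["domain"] for site in sites]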
except IOError:
    pass
# Reports that will not be aggregated by this script
ignored_reports = []
Contributor

I know we had this in the previous version - we haven't used it though, right? And it isn't being populated from any environment variables or elsewhere. Do we need this right now?

If not, we can also clean up L176-177 below.
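
If the list is kept, one illustrative way to populate it (the IGNORED_REPORTS variable is an assumption; the script does not currently read it):

    import os

    # Hypothetical: comma-separated list, e.g. IGNORED_REPORTS="realtime,language"
    ignored_reports = [
        name.strip()
        for name in os.environ.get("IGNORED_REPORTS", "").split(",")
        if name.strip()
    ]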

}

for k in aggregationDefinitions:
    v = aggregationDefinitions[k]
Contributor

for k, v in aggregationDefinitions.items():
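
Iterating with .items() yields each key/value pair directly, so each value no longer needs a separate aggregationDefinitions[k] lookup; behavior is otherwise unchanged.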

@allejo allejo removed their assignment Mar 26, 2020