Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: scytl connector #737

Merged
merged 17 commits into from
Oct 20, 2022
Merged

Conversation

agreenspan24
Copy link
Contributor

Scytl, or Clarity Elections, is a company that creates a tool for publishing election results in real-time. It's used in the U.S. by several states and over 800 counties, including Georgia, Colorado, Arkansas, and Dallas County, Texas.

For each participating election administrator, they publish a site with results that can be published on election night. Unfortunately, while that site contains downloadable data, that data is formatted in a complex way, making it difficult for developers to fetch election results. In general, their results either come zipped in either an unformatted text file or a complex XML document. Summary results come as a zipped CSV, but just contain top-line results. The JSON that powers the site results is even more complicated.

This connector provides methods to download the latest election results from their site and formats them into readable lists of dictionaries, which can easily be converted into a Parsons Table or Pandas dataframe.

Because this connector is can be useful for live reporting, it also contains a polling feature. As long as the class is instantiated, it will only fetch results that are new since the previous fetch of that method. To skip this feature, set force_update to true on any of the fetch methods.

Note: please do not confuse with Clarity Campaign Labs

Example microsite: https://results.enr.clarityelections.com/CO/105975/web.276603/

@shaunagm
Copy link
Collaborator

Hi @agreenspan24! Thanks so much for this contribution. I haven't had a chance to go through the whole PR yet but I'm hoping to get to it and leave a review later this week.

I appreciate you having thorough tests for you connector! I am a little hesitant about the 25-ish test files though. Are they all necessary? Might they cause issues for long-term maintainability if there are formatting changes?

@agreenspan24
Copy link
Contributor Author

Hi @agreenspan24! Thanks so much for this contribution. I haven't had a chance to go through the whole PR yet but I'm hoping to get to it and leave a review later this week.

I appreciate you having thorough tests for you connector! I am a little hesitant about the 25-ish test files though. Are they all necessary? Might they cause issues for long-term maintainability if there are formatting changes?

Hi Shauna! Makes total sense - was trying to keep the 18 county files to match the real results, but I think you're right to reduce them. I cut out all but two and adjusted the tests - run a good bit faster now too!

For maintainability, those files are just direct copies of real election results as Scytl holds them. If they change their format, I'm hoping we can just drop newer versions of those files into the tests and adjust the parser accordingly. But let me know your thoughts!

Copy link
Collaborator

@shaunagm shaunagm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Mostly docs issues here.

docs/scytl.rst Show resolved Hide resolved
ex: Clarke
"""

def __init__(self, state: str, election_id: str, county=""):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unusual for anything other than auth info to get passed in to a Parsons connector, although it certainly does happen, and it makes sense to pass in state and election_id here so they don't have to be passed in with get_summary_results, etc.

That said, county in particular confuses me. How does the county parameter here fit with the counties passed in with get_detailed_results_for_participating_counties? Does this county parameter get used at any other point? It seems like the other two are always state level but maybe I'm misunderstanding. Does passing in the county on init mean that get_summary_results and get_county_results apply only to the county?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the setup is kinda confusing, would love your help navigating it.

Essentially, they have a concept of an election (at the state or county level), which can have summary results and detailed results. The detailed results show results by county for state level elections, while they show precinct results for a county-level election.

State-level elections can also have sub-elections in each county, and link to each of those county elections, which each have summary and detailed results. So, a county-level election won't have participating counties and the method won't make sense.

The main reason I wanted to add the state and election_id at the class level instead of the method is so the class could track the previous version- I have a feeling people would use this for polling and I want to make that as easy as possible for them to see if results have updated.

One idea might be to split this up into a StateScytlElection and CountyScytlElection class, which would have some of the same implementation, but not all of it. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that was a bit of a word salad. Does it make sense to you to split up the state-wide election and county-wide election into two separate connectors, since they have slightly different functions and outputs?

Separately, if we're using this for polling, to prevent fetching unnecessary results, does it make sense to you to store the previous version in the class or return it with the results and have them pass it back in the function?

parsons/scytl/scytl.py Outdated Show resolved Hide resolved
@shaunagm
Copy link
Collaborator

@agreenspan24 I think your test fixes look good!

parsons/scytl/scytl.py Outdated Show resolved Hide resolved
Copy link
Contributor

@alxmrs alxmrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few notes.

parsons/scytl/scytl.py Outdated Show resolved Hide resolved
parsons/scytl/scytl.py Show resolved Hide resolved
parsons/scytl/scytl.py Outdated Show resolved Hide resolved
parsons/scytl/scytl.py Outdated Show resolved Hide resolved
parsons/scytl/scytl.py Outdated Show resolved Hide resolved
test/test_scytl/test_scytl.py Outdated Show resolved Hide resolved
scy = Scytl(TEST_STATE, TEST_ELECTION_ID)
result = scy.get_summary_results()

with open(f'{_dir}/114729_summary_expected.csv', 'r') as expected:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it hard to understand if the data is correct since it's hard to compare the source and destination data files in this diff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a reasonable point. I think it's important to use the zip files since unzipping is a big part of what we're testing.

I guess it's somewhat helpful to just read the expected CSV and make sure all the data looks good and reasonable, but not ideal.

test/test_scytl/test_scytl.py Outdated Show resolved Hide resolved
test/test_scytl/test_scytl.py Outdated Show resolved Hide resolved
test/test_scytl/test_scytl.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@agreenspan24 agreenspan24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alxmrs Thanks for the review! Still pretty new to proper python conventions, so I appreciate the deeper look! A couple of ideas here sound interesting to me, like typing and dataclasses, but I don't want to introduce new patterns without approval from maintainers, so @shaunagm let us know your thoughts!

parsons/scytl/scytl.py Outdated Show resolved Hide resolved
parsons/scytl/scytl.py Show resolved Hide resolved
parsons/scytl/scytl.py Outdated Show resolved Hide resolved
parsons/scytl/scytl.py Outdated Show resolved Hide resolved
parsons/scytl/scytl.py Show resolved Hide resolved
test/test_scytl/test_scytl.py Outdated Show resolved Hide resolved
scy = Scytl(TEST_STATE, TEST_ELECTION_ID)
result = scy.get_summary_results()

with open(f'{_dir}/114729_summary_expected.csv', 'r') as expected:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a reasonable point. I think it's important to use the zip files since unzipping is a big part of what we're testing.

I guess it's somewhat helpful to just read the expected CSV and make sure all the data looks good and reasonable, but not ideal.

parsons/scytl/scytl.py Outdated Show resolved Hide resolved
parsons/scytl/scytl.py Show resolved Hide resolved
parsons/scytl/scytl.py Outdated Show resolved Hide resolved
@shaunagm
Copy link
Collaborator

shaunagm commented Sep 19, 2022

Still pretty new to proper python conventions, so I appreciate the deeper look! A couple of ideas here sound interesting to me, like typing and dataclasses, but I don't want to introduce new patterns without approval from maintainers, so @shaunagm let us know your thoughts!

Responded re: dataclasses inline. The question of typing is a bigger one. We currently don't require any sort of type annotations and certainly not any strict enforcement, but perhaps that's something we could at least ask of contributors going forward. We don't have capacity to retroactively add typing to the existing codebase but incremental improvements are better than nothing.

Thanks for raising these questions @agreenspan24 & @alxmrs

@agreenspan24
Copy link
Contributor Author

@shaunagm any idea what's causing docs-build to fail? No clear answer from CircleCI

@shaunagm
Copy link
Collaborator

I don't think it's an issue with your PR. @SorenSpicknall? This and a couple other open PRs are failing with this same error Error: [Errno 2] No such file or directory: '/home/circleci/project/venv/bin/python3'

@SorenSpicknall
Copy link
Collaborator

I don't think it's an issue with your PR. @SorenSpicknall? This and a couple other open PRs are failing with this same error Error: [Errno 2] No such file or directory: '/home/circleci/project/venv/bin/python3'

I'm working on this now; definitely looks like a CircleCI config breakage. If we can't figure it out in the next day or so, we may just merge some of these newer PRs without the additional testing step in order to not pause ongoing work, since the PRs generally test green aside from the CI issue.

@shaunagm
Copy link
Collaborator

@SorenSpicknall I know you submitted a PR to fix the CI issue (#752), what do we need to do to get this PR passing? Do we need to rebase it onto the updated main branch?

@SorenSpicknall
Copy link
Collaborator

@SorenSpicknall I know you submitted a PR to fix the CI issue (#752), what do we need to do to get this PR passing? Do we need to rebase it onto the updated main branch?

Yes, changes from main will need to be merged in or the PR will need to be recreated from scratch based on that newly updated main branch. Sorry about the annoyance of that, @agreenspan24 - you just got unlucky with timing of a weird cache issue with our CI.

@shaunagm
Copy link
Collaborator

Let me know if you need help rebasing, @agreenspan24. It can be tricky.

@shaunagm
Copy link
Collaborator

Hi @agreenspan24! Just checking in. I know a lot of folks are busy with the election, so I wanted to offer a few different options here:

(a) I can rebase your PR for you (I should be able to as a maintainer)
(b) we can pair and I can walk you through the process (don't know if you've rebased before, but it can be intimidating)
(c) we can just wait for you to have a chance to do it yourself

Totally up to you!

@agreenspan24
Copy link
Contributor Author

Hi @agreenspan24! Just checking in. I know a lot of folks are busy with the election, so I wanted to offer a few different options here:

(a) I can rebase your PR for you (I should be able to as a maintainer) (b) we can pair and I can walk you through the process (don't know if you've rebased before, but it can be intimidating) (c) we can just wait for you to have a chance to do it yourself

Totally up to you!

@shaunagm so sorry I missed this request, I can find some time in the next few days to rebase

@agreenspan24
Copy link
Contributor Author

@shaunagm all done!

@shaunagm shaunagm added the new connector Work type - creating a new Parsons connector for a tool label Oct 20, 2022
@shaunagm shaunagm merged commit ba9cadd into move-coop:main Oct 20, 2022
@shaunagm
Copy link
Collaborator

Thank you so much for the effort you put into this PR! Excited to finally get to merge it. :)

@agreenspan24 agreenspan24 deleted the ag-scytl-connector branch October 20, 2022 22:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new connector Work type - creating a new Parsons connector for a tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants