Lookup tables should be maintained directly in SQLite #17

Closed
simonw opened this issue Nov 28, 2017 · 3 comments

Comments

simonw (Owner) commented Nov 28, 2017

When evaluating `-c` (extracting a column into a separate lookup table), we currently use a temporary lookup table maintained in Python space:

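```python
import pandas as pd

# Method on the lookup-table class: returns the integer id for a value,
# assigning the next id (and recording it in both dicts) if the value is new.
def id_for_value(self, value):
    # Missing values (None/NaN) are not stored; they map to NULL.
    if pd.isnull(value):
        return None
    try:
        return self.value_to_id[value]
    except KeyError:
        new_id = self.next_id
        self.id_to_value[new_id] = value
        self.value_to_id[value] = new_id
        self.next_id += 1
        return new_id
```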

For handling larger CSV files (#16), this would work much better as a SQLite table that is queried and updated as we process the data. It would also make lookup tables reusable across multiple CSVs and across several runs of the command.
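A minimal sketch of what a SQLite-backed version of `id_for_value` could look like, using Python's standard `sqlite3` module. The class name `SQLiteLookupTable` and the `(id INTEGER PRIMARY KEY, value TEXT UNIQUE)` schema are assumptions for illustration, not the eventual csvs-to-sqlite implementation. Because the ids live in the database, a second run against the same file picks up where the first one left off.

```python
import math
import sqlite3


class SQLiteLookupTable:
    """Hypothetical lookup table whose value -> id mapping lives in SQLite."""

    def __init__(self, conn, table_name):
        self.conn = conn
        self.table_name = table_name
        # Reuse the table if an earlier run already created it.
        conn.execute(
            'CREATE TABLE IF NOT EXISTS "{}" '
            "(id INTEGER PRIMARY KEY, value TEXT UNIQUE)".format(table_name)
        )

    def id_for_value(self, value):
        # Missing values (None or float NaN) map to NULL, as in the pandas version.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return None
        row = self.conn.execute(
            'SELECT id FROM "{}" WHERE value = ?'.format(self.table_name), (value,)
        ).fetchone()
        if row is not None:
            return row[0]
        # New value: let SQLite assign the next id via INTEGER PRIMARY KEY.
        cursor = self.conn.execute(
            'INSERT INTO "{}" (value) VALUES (?)'.format(self.table_name), (value,)
        )
        return cursor.lastrowid
```

Usage: `SQLiteLookupTable(sqlite3.connect("/tmp/ny.db"), "candidate").id_for_value("Alice")`. Issuing one `SELECT` per value is simpler than the dict approach but slower; an in-memory cache could be layered on top if that turns out to matter.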

simonw (Owner, Author) commented Nov 28, 2017

Also relevant for #10 and #14

simonw added a commit that referenced this issue Jan 23, 2018

simonw (Owner, Author) commented Jan 23, 2018

Still to do: test that this behaves correctly against an existing SQLite database that already contains a partially populated lookup table.
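One way to exercise that case, sketched with the standard `sqlite3` module: pre-populate a lookup table as if an earlier run had left it behind, then check that existing values keep their ids and new values get fresh ones. The `candidate` table, its `id`/`value` columns and the `id_for_value` helper (which uses `INSERT OR IGNORE` instead of a separate existence check) are all hypothetical.

```python
import sqlite3


def id_for_value(conn, table, value):
    # Hypothetical helper: insert the value if it is new, then look up its id.
    conn.execute('INSERT OR IGNORE INTO "{}" (value) VALUES (?)'.format(table), (value,))
    row = conn.execute(
        'SELECT id FROM "{}" WHERE value = ?'.format(table), (value,)
    ).fetchone()
    return row[0]


def check_partial_lookup_table():
    conn = sqlite3.connect(":memory:")
    # Simulate a database left behind by an earlier run: the lookup table
    # already holds some values with ids assigned.
    conn.execute("CREATE TABLE candidate (id INTEGER PRIMARY KEY, value TEXT UNIQUE)")
    conn.executemany(
        "INSERT INTO candidate (id, value) VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob")],
    )
    # Existing values must keep their ids...
    assert id_for_value(conn, "candidate", "Bob") == 2
    # ...and new values must get fresh ids without colliding.
    assert id_for_value(conn, "candidate", "Carol") == 3
    assert id_for_value(conn, "candidate", "Alice") == 1
    return conn
```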

simonw (Owner, Author) commented Jan 23, 2018

Tested like this:

```shell
csvs-to-sqlite openelections-data-ny/2016/20161108__ny__general__franklin__precinct.csv /tmp/ny.db -c candidate
csvs-to-sqlite openelections-data-ny/2016/20161108__ny__general__schuyler__precinct.csv /tmp/ny.db -c candidate
csvs-to-sqlite openelections-data-ny/2016/20161108__ny__general__saratoga__precinct.csv /tmp/ny.db -c candidate
datasette /tmp/ny.db
```
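After loading several CSVs into the same database, one quick sanity check is that no candidate name appears twice in the lookup table. A sketch, assuming the lookup table is called `candidate` and stores names in a `value` column (the exact names csvs-to-sqlite uses are an assumption here):

```python
import sqlite3


def duplicate_lookup_values(conn, table="candidate"):
    """Return (value, count) pairs for values stored more than once.

    The table name and its "value" column are assumptions for illustration.
    """
    return conn.execute(
        'SELECT value, COUNT(*) FROM "{}" GROUP BY value HAVING COUNT(*) > 1'.format(table)
    ).fetchall()
```

Usage: `duplicate_lookup_values(sqlite3.connect("/tmp/ny.db"))` should return an empty list if each run extended the existing lookup table rather than re-inserting names it already held.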
