-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get v1 data, partial implementation #18
Conversation
…xamples with date arguments
…s and download data
Good stuff @e-kotov, thanks for the clear statement of changes and options, will test in due course, likely on Friday. |
…dir for successful tests
@Robinlovelace great! Take your time. Meanwhile, I added some last minute additions to finally pass the R CMD check. We now have a snapshot of metadata bundled with the package (compressed xmls are tiny and |
Sounds good, more soon! |
OK, going to give this a quick test and hopefully review now. Have quite limited time, ~15 minutes, due to family commitments later in the day. |
Update here: looking great! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an epic set of changes Egor, many thanks! Big +1, all tested on README and no further comment in limited time I have today. Let's get it merged to keep ball rolling but will leave that to you.
Ok, merging, but new big updates are coming up) |
Partially fulfils #5 |
1 version down, 1 to go! |
Hi @Robinlovelace , I have finished some of the work on v1 data and tried to envision how it would be united with current code for v2, as well as how I would suggest the v2 code may be changed and improved upon. Could you please review what I have for v1 (and minor changes to v2) below?
Setup:
Available data / metadata
V1 Data
Fetch available data for version 1 with an optional check for local files:
This function changes the default paths of the MITMA data locally to arrange it in a hive style for faster filtering and/or import with
duckdb/arrow
.Function Wrapping Suggestion
Consider wrapping both functions below into a single function with a version argument, for example,
spod_available_data(ver = 1)
. Alternatively, you could name themspod_available_data_v1()
andspod_available_data_v2()
, orspod_get_metadata_v1()
andspod_get_metadata_v2()
. Another option could bespod_metadata_v1()
andspod_metadata_v2()
. The current function handles the XML file cache by checking when it was last downloaded and fetches it automatically if it is older than one day. It might make sense to add an argument to load the latest cached XML for offline use.V2 Data
Fetch metadata for version 2:
I have not modified this function much, but I plan to add the path-changing feature for v2 data as well if it is appreciated in v1. The few changes I have made include updating the XML file name so it differs from v1, renaming the file mask to
data_links
, and removing thecurrent_timestamp
argument from the XML fetching function, as the current timestamp should be used automatically.The naming of both functions is still up for discussion, and I am open to suggestions.
Implemented Functions for V1 Data
Get Latest V1 File List
This is only really for internal use or for advanced users, so I would have this explicit long name and change to similar for v2, or even combine the functions into one.
Available Data for V1
Zones Loader
The zones loader with English parameter names and their variations that are converted to Spanish using internal helper functions. These functions load the shapefiles, check the geometry validity, and replace the uppercase ID with lowercase 'id'. In the future, I plan to attach metadata such as municipality names.
V2 Zones
For v2 zones, more adjustments are needed, such as attaching accompanying names and population numbers metadata that are available alongside the files. This is a TODO.
Download Function
The new download function handles most non-spatial data, automatically checks for the required data version, accepts multiple types of zoning, and processes date arguments automatically.
To download any two dates without interpreting them as a range, use the following. More than two dates are always interpreted as an arbitrary sequence and not converted to a range.
I think this advanced handling of dates would make the package more accessible for beginners, as I do not expect everyone to easily grasp what a regex is and they may be more comfortable using simpler dates argument.
There are also "testthat" tests for the underlying function that handles the magic behind the dates argument. That includes tests for data incompatibility (i.e. the helper functions check if the requested date range is spanning across multiple data versions and stops with a warning).
Next Steps
The next step is to implement an equivalent of
spod_get_zones()
and similar functions for v1 data that would rely on the already downloaded data or download it automatically if it is missing, this will use the same mechanics that I implemented for the download function as they would simply wrap it. I would appreciate your feedback on the current implementation and naming of the functions, as well as my suggestions on the overall structure of the package.We can discuss during our next scheduled meeting.