-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multigeometries #255
Open
yyu1
wants to merge
4
commits into
perrygeo:master
Choose a base branch
from
yyu1:multigeometries
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Multigeometries #255
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
cc1f5b8
added option to process multi-polygon cases by reading in individual …
6b25e4b
update to remove deprecated part in Shapely 1.8
cd8a88e
Copy over old variable. numpy.append does not append in-place
39fe278
changed variable name to follow PEP8 convention
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused as to how this saves any memory. My interpretation is this code is an alternate method for creating the
masked
array, but in both cases they will be same size regardless of thesplit_multi_geom
value. In other words, where do the memory savings actually occur?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be easier to explain with a diagram but it seems I can't upload a picture here, so let me try with description.
I'll use the example of a raster with 4 polygons that are each the size of 1 pixel, making this easier to imagine.
Let's say the raster is a 10 x 10 raster, with the 4 polygons falling on the pixels (0,0), (9,0), (0,9), and (9,9). So each polygon is at the corner of the raster. Even though the polygons only cover 4 total pixels
geom_bounds = tuple(geom.bounds)
will return the bounds of (0,0), (9,0), (0,9), and (9,9), because it will find a rectangle that includes all 4 polygons. Thenrast.read(bounds=geom_bounds, boundless=boundless)
is going to read the entire 10x10 raster into a 10x10 array, and it will also create a 10x10 mask array. This will end up using 2x(size of raster) amount of memory.If you break the multipolygon into 4 separate polygons, then each call to
geom_bounds = tuple(geom.bounds)
will only return 1 pixel. Then you will end up having 4x(1 pixel) amount of memory to hold the raster of the polygons, and a 4x(1 pixel) amount of memory to hold the mask.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finally got a chance to look at this more closely and I see the optimization now, thanks for the explanation @yyu1!
I'd suggest adding a test to make sure the results are equal. This almost works, there are some floating point differences that you'll have to account for but otherwise they are equivalent:
Our test multipolygon actually makes a great example due to the null space between the two parts.
The difference in memory is roughly 4x for this multipolygon! screenshot from QGIS:
The only problem with this approach is that it changes the shape of the masked array. While this doesn't affect the built-in stats, there are two other parts of the API that could break here:
zone_func
custom statistic, the logic could break in subtle ways - for instance if a custom stat used the array dimensions in some calculation.raster_out
option (which outputs raster datasets based onmasked
) it would be broken as this requires a 2D array with geospatial referencing.So we have a few options to proceed. How about this?
add_stats
argument andsplit_multi_geom == True
, send a warning that your custom stats function can't assume anything about the masked array dimensions. This way it works but the user is aware of the limitation.raster_out == True
andsplit_multi_geom == True
, we should raise an Exception saying that these two options are mutually incompatible.Of course a test to assert the above behavior would also be appreciated. I can help with that if needed.