nodata count includes non-overlapping cells #105

Open
2 tasks
perrygeo opened this issue Dec 18, 2015 · 10 comments

@perrygeo
Owner

For each feature, there are three types of cells: data, nodata, and non-overlapping. Nodata and non-overlapping get lumped together in the nodata statistic.

Was this always the case? Was the bug introduced in an attempt to optimize?

  • write a failing test case for the partial overlap case
  • fix it
@perrygeo perrygeo added the bug label Dec 18, 2015
@perrygeo perrygeo modified the milestone: 0.11 Sep 3, 2016
@perrygeo
Owner Author

perrygeo commented Sep 3, 2016

Also, let's consider nans.

Here's an example, crudely drawn. The raster is 3 pixels tall with a nodata value of 0; the geometry covers 6 pixels.
[Image: photo on 9-3-16 at 10 19 am]

So the breakdown of cell counts should be

  • 1 valid data cell (count)
  • 1 nodata cell (nodata)
  • 1 nan cell
  • 3 non-overlapping cells

The problem currently is that (a) we don't track nan cell counts at all and (b) the non-overlapping ones get lumped into nodata.

Still debating if it's better to do

{'count': 1, 'nodata': 5} 
# or
{'count': 1, 'nodata': 1, 'nan': 1, 'no_overlap': 3}

IOW is there really a need to distinguish between the three types of invalid data cells?
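
The breakdown above can be reproduced with a small dependency-free sketch. The grid layout is an assumption, since the drawing is not reproduced here: a one-column, three-row raster with a nodata value of 0, and a geometry window two columns wide so that three of its six cells fall outside the raster. `classify_cells` and the `None` marker for off-raster cells are hypothetical names for illustration, not part of rasterstats.

```python
import math

NODATA = 0

# Geometry window of 3 rows x 2 columns. The raster supplies only the
# first column; None marks cells that fall outside the raster extent.
window = [
    [7.0, None],
    [NODATA, None],
    [float("nan"), None],
]

def classify_cells(window, nodata):
    """Count cells by category: valid data, nodata, nan, and no_overlap."""
    counts = {"count": 0, "nodata": 0, "nan": 0, "no_overlap": 0}
    for row in window:
        for value in row:
            if value is None:
                counts["no_overlap"] += 1
            elif isinstance(value, float) and math.isnan(value):
                counts["nan"] += 1
            elif value == nodata:
                counts["nodata"] += 1
            else:
                counts["count"] += 1
    return counts

separate = classify_cells(window, NODATA)
print(separate)  # {'count': 1, 'nodata': 1, 'nan': 1, 'no_overlap': 3}
```

Summing the three invalid categories gives the lumped figure of 5 from the first option.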

@sgoodm
Contributor

sgoodm commented Sep 3, 2016

i would track each type and then let the user decide how they are returned. group them by default with an option to return them separately
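
One way to read this suggestion, as a minimal sketch: track every category internally and collapse them only when formatting the result. The function name `format_invalid_counts` and the `separate` flag are hypothetical, not an existing rasterstats option.

```python
def format_invalid_counts(counts, separate=False):
    """Return cell counts, lumping invalid categories unless separate=True."""
    if separate:
        # Hand back a copy so callers cannot mutate the tracked counts.
        return dict(counts)
    invalid = counts["nodata"] + counts["nan"] + counts["no_overlap"]
    return {"count": counts["count"], "nodata": invalid}

# Counts tracked per category for the example discussed above.
tracked = {"count": 1, "nodata": 1, "nan": 1, "no_overlap": 3}

grouped = format_invalid_counts(tracked)                  # default: grouped
detailed = format_invalid_counts(tracked, separate=True)  # opt-in: separate
print(grouped)   # {'count': 1, 'nodata': 5}
print(detailed)  # {'count': 1, 'nodata': 1, 'nan': 1, 'no_overlap': 3}
```

Keeping the grouped form as the default would leave existing callers' output unchanged.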

@perrygeo perrygeo modified the milestones: 0.11, 0.12 Oct 1, 2016
@perrygeo
Owner Author

perrygeo commented Oct 1, 2016

Implementation is a little trickier than expected; deferring this to the 0.12 milestone.

@XiaopingDu

[Image]
I do not think this is a problem, and we need not count non-overlapping pixels. As I understand it, the non-overlapping area is not part of the raster. In the picture above, the raster is inside the red box and black is nodata. The blue box is the geometry, which covers 8 pixels. So:
{'count': 2, 'nodata': 6}
The white pixels do not belong to the raster.
In your example, I prefer this result:
{'count': 1, 'nodata': 1, 'nan': 1}

We should not get a result like this:
{'count': 1, 'nodata': 5}

@sgoodm
Contributor

sgoodm commented Mar 22, 2017

looks like the non-overlapping areas get lumped into nodata due to the boundless options (the boundless_array func in rasterstats for arrays, and equivalent steps in rasterio for rasters), which either initialize the boundless output with the nodata value (boundless_array) or fill it with the nodata value before it gets returned (rasterio)

this seems to be when it was introduced for arrays, rasters were probably earlier
e50d89b
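
The effect described here can be illustrated with a simplified stand-in for a boundless read. This is not the actual `boundless_array` code from rasterstats; `boundless_read` is a hypothetical pure-Python helper that pads a window with a fill value, and the window is assumed to be anchored at the raster's top-left corner.

```python
def boundless_read(raster, n_rows, n_cols, fill):
    """Read an n_rows x n_cols window anchored at the raster's top-left
    corner, padding cells beyond the raster extent with `fill`."""
    out = [[fill] * n_cols for _ in range(n_rows)]
    for r, row in enumerate(raster[:n_rows]):
        for c, value in enumerate(row[:n_cols]):
            out[r][c] = value
    return out

NODATA = 0
raster = [[7], [NODATA], [9]]  # 3 rows x 1 column, one true nodata cell

# Padding with the nodata value makes off-raster cells indistinguishable
# from real nodata cells.
window = boundless_read(raster, 3, 2, fill=NODATA)
nodata_count = sum(v == NODATA for row in window for v in row)
print(window)        # [[7, 0], [0, 0], [9, 0]]
print(nodata_count)  # 4: one true nodata cell plus three padded cells
```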

@sgoodm
Contributor

sgoodm commented Mar 22, 2017

this might work: if the user wants no_overlap count returned, they need to provide a second, unused "fake" nodata value. this value is passed to rasterio as the raster's nodata value which results in the true nodata value being treated as just another value. we then have distinct values for no_overlap vs nodata and just need to update our mask accordingly

in the case when the user does not request the no_overlap field, it just behaves exactly as it currently does

should be able to tweak the boundless_array func to do the same thing for np arrays
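
A rough sketch of the sentinel idea, using the same kind of simplified pure-Python stand-in rather than rasterio or the real `boundless_array`. `SENTINEL`, `boundless_read`, and `count_cells` are hypothetical names, and the sketch assumes the sentinel differs from the true nodata value and never occurs in the data.

```python
NODATA = 0
SENTINEL = -9999  # hypothetical "fake" nodata; assumed absent from the data

def boundless_read(raster, n_rows, n_cols, fill):
    """Read a window anchored at the raster's top-left corner, padding
    cells beyond the raster extent with `fill`."""
    out = [[fill] * n_cols for _ in range(n_rows)]
    for r, row in enumerate(raster[:n_rows]):
        for c, value in enumerate(row[:n_cols]):
            out[r][c] = value
    return out

def count_cells(raster, n_rows, n_cols, nodata, sentinel):
    """Count valid, nodata, and non-overlapping cells in the window."""
    if any(v == sentinel for row in raster for v in row):
        raise ValueError("sentinel value occurs in the raster data")
    # Pad with the sentinel so true nodata stays "just another value".
    window = boundless_read(raster, n_rows, n_cols, fill=sentinel)
    cells = [v for row in window for v in row]
    no_overlap = sum(v == sentinel for v in cells)
    nodata_cells = sum(v == nodata for v in cells)
    return {
        "count": len(cells) - no_overlap - nodata_cells,
        "nodata": nodata_cells,
        "no_overlap": no_overlap,
    }

raster = [[7], [NODATA], [9]]  # 3 rows x 1 column
counts = count_cells(raster, 3, 2, NODATA, SENTINEL)
print(counts)  # {'count': 2, 'nodata': 1, 'no_overlap': 3}
```

The check at the top of `count_cells` guards the main weakness of this approach: if the sentinel collides with a real data value, those cells would be miscounted as non-overlapping.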

@sgoodm
Contributor

sgoodm commented Mar 24, 2017

PR #146 adds 'nan' stat option

@sgoodm
Contributor

sgoodm commented Mar 24, 2017

rasterio issue on shapes extending beyond raster extent
rasterio/rasterio#995

@perrygeo perrygeo mentioned this issue Mar 26, 2017
@sgoodm
Contributor

sgoodm commented Mar 27, 2017

@perrygeo almost done with a no overlap stat proof of concept that works with array rasters. but i think there may be some limitations to getting it working with rasters opened via rasterio

@princemathur

My shapefile polygon is not snapped to the raster, and when I run zonal_stats the results do not exactly match the ArcGIS result. I am using Python 3.7 and pygeoprocessing 1.5.
Is there any way to match the ArcGIS result?
