nodata count includes non-overlapping cells #105

perrygeo · 2015-12-18T10:53:48Z

For each feature, there are three types of cells: data, nodata, and non-overlapping. Nodata and non-overlapping get lumped together in the nodata statistic.

Was this always the case? Was the bug introduced in an attempt to optimize?

write a failing test case for the partial overlap case
fix it
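Below is a minimal pure-Python sketch of that partial-overlap test. It makes several assumptions. The raster is a list of rows. The feature footprint is a rectangular window `(row_off, col_off, height, width)` that may extend past the raster edge, and every cell in the window is treated as covered by the geometry, which simplifies real rasterization. The cell values 5.0, 0.0 and NaN are made up for illustration. `count_cells` is a hypothetical helper, not the rasterstats API.

```python
import math
import unittest


def count_cells(raster, window, nodata):
    """Hypothetical helper: classify every cell in `window`
    (row_off, col_off, height, width) as data, nodata, nan, or no_overlap."""
    row_off, col_off, height, width = window
    n_rows, n_cols = len(raster), len(raster[0])
    counts = {"count": 0, "nodata": 0, "nan": 0, "no_overlap": 0}
    for r in range(row_off, row_off + height):
        for c in range(col_off, col_off + width):
            if not (0 <= r < n_rows and 0 <= c < n_cols):
                counts["no_overlap"] += 1  # cell lies outside the raster
                continue
            value = raster[r][c]
            if isinstance(value, float) and math.isnan(value):
                counts["nan"] += 1  # NaN never equals nodata, so check it first
            elif value == nodata:
                counts["nodata"] += 1
            else:
                counts["count"] += 1
    return counts


class PartialOverlapTest(unittest.TestCase):
    def test_outside_cells_are_not_nodata(self):
        # 3x1 raster, nodata = 0; the 3x2 window has one column off the right edge
        raster = [[5.0], [0.0], [float("nan")]]
        counts = count_cells(raster, (0, 0, 3, 2), nodata=0)
        self.assertEqual(counts["nodata"], 1)
        self.assertEqual(counts["no_overlap"], 3)
```

A reader that pads out-of-bounds cells with the nodata value would make the same assertion see 4 nodata cells and fail. That is the behavior this issue describes. The test class runs under any unittest-compatible runner, such as pytest.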

The text was updated successfully, but these errors were encountered:

perrygeo · 2016-09-03T14:26:36Z

Also, let's consider nans.

Here's an example, crudely drawn. The raster is 3 pixels tall with a nodata value of 0; the geometry covers 6 pixels.

So the breakdown of cell counts should be:

1 valid data cell (count)
1 nodata cell (nodata)
1 nan cell
3 non-overlapping cells

The problem currently is that (a) we don't track NaN cell counts at all, and (b) the non-overlapping cells get lumped into nodata.
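NaN needs separate tracking partly because it compares unequal to everything, including itself. An equality check against the nodata value never matches it, so a count based only on `value != nodata` treats NaN cells as valid data. A plain-Python illustration with made-up cell values (NumPy's element-wise `np.isnan` follows the same IEEE 754 rules):

```python
import math

nodata = 0
cells = [5.0, 0.0, float("nan")]  # illustrative values: data, nodata, NaN

# Equality against nodata only finds the explicit nodata cell (nan == 0 is False).
nodata_by_equality = sum(1 for v in cells if v == nodata)

# A naive "valid" test lets NaN through, because nan != 0 is True.
naive_valid = sum(1 for v in cells if v != nodata)

# Tracking NaN explicitly separates it from both data and nodata.
nan_cells = sum(1 for v in cells if math.isnan(v))
valid = sum(1 for v in cells if v != nodata and not math.isnan(v))

print(nodata_by_equality, naive_valid, nan_cells, valid)  # 1 2 1 1
```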

Still debating whether it's better to do:

{'count': 1, 'nodata': 5} 
# or
{'count': 1, 'nodata': 1, 'nan': 1, 'no_overlap': 3}

IOW is there really a need to distinguish between the three types of invalid data cells?

sgoodm · 2016-09-03T20:05:42Z

I would track each type and then let the user decide how they are returned: group them by default, with an option to return them separately.
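Here is a sketch of the "group by default, split on request" behavior. It assumes the per-type counts have already been computed. `format_counts` and its `split_invalid` flag are hypothetical names used for illustration, not existing rasterstats options.

```python
def format_counts(counts, split_invalid=False):
    """Hypothetical: collapse nodata, nan and no_overlap into a single
    'nodata' total unless the caller asks for them separately."""
    if split_invalid:
        return dict(counts)
    invalid = counts["nodata"] + counts["nan"] + counts["no_overlap"]
    return {"count": counts["count"], "nodata": invalid}


counts = {"count": 1, "nodata": 1, "nan": 1, "no_overlap": 3}
print(format_counts(counts))                      # {'count': 1, 'nodata': 5}
print(format_counts(counts, split_invalid=True))  # {'count': 1, 'nodata': 1, 'nan': 1, 'no_overlap': 3}
```

Keeping the separate counts internally costs little. The grouped output then becomes a presentation choice, not a loss of information.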

perrygeo · 2016-10-01T11:28:35Z

Implementation is a little trickier than expected; deferring this to the 0.12 milestone.

XiaopingDu · 2017-03-20T09:41:14Z

I don't think it's a problem, and we need not count non-overlapping pixels. As I understand it, the non-overlapping area is not part of the raster. In the picture above, the raster is inside the red box and black is nodata. The blue box is the geometry, which covers 8 pixels. So:
{'count': 2, 'nodata': 6}
The white pixels do not belong to the raster.
In your example, I prefer this result:
{'count': 1, 'nodata': 1, 'nan': 1}

We should not get a result like this:
{'count': 1, 'nodata': 5}

sgoodm · 2017-03-22T20:00:18Z

It looks like the non-overlapping areas get lumped into nodata because of the boundless options (the boundless_array func in rasterstats for arrays, and the equivalent steps in rasterio for rasters). These either initialize the boundless output with the nodata value (boundless_array) or fill it with the nodata value before it is returned (rasterio).
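Here is a toy reproduction of that mechanism. It uses a hypothetical pure-Python `boundless_window` that pads out-of-range cells with a fill value, as the comment describes. This is not the signature of rasterstats' `boundless_array`, and the raster values are made up. Once the padding uses the nodata value, padded cells and genuine nodata cells cannot be told apart.

```python
def boundless_window(raster, window, fill):
    """Hypothetical toy boundless read: cells of `window`
    (row_off, col_off, height, width) outside the raster get `fill`."""
    row_off, col_off, height, width = window
    n_rows, n_cols = len(raster), len(raster[0])
    out = []
    for r in range(row_off, row_off + height):
        row = []
        for c in range(col_off, col_off + width):
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            row.append(raster[r][c] if inside else fill)
        out.append(row)
    return out


raster = [[5], [0], [7]]  # 3x1 raster, nodata = 0
window = boundless_window(raster, (0, 0, 3, 2), fill=0)  # pad with nodata
nodata_count = sum(v == 0 for row in window for v in row)

print(window)        # [[5, 0], [0, 0], [7, 0]]
print(nodata_count)  # 4: one real nodata cell plus three padded cells
```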

This seems to be when it was introduced for arrays; for rasters it was probably earlier:
e50d89b

sgoodm · 2017-03-22T21:20:50Z

This might work: if the user wants the no_overlap count returned, they need to provide a second, unused "fake" nodata value. This value is passed to rasterio as the raster's nodata value, so the true nodata value is treated as just another value. We then have distinct values for no_overlap vs. nodata and just need to update our mask accordingly.

When the user does not request the no_overlap field, it behaves exactly as it currently does.

We should be able to tweak the boundless_array func to do the same thing for NumPy arrays.
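Here is a sketch of the "fake nodata" idea for arrays. It uses a hypothetical pure-Python `boundless_window` toy that pads out-of-range cells with a chosen fill value; this is not rasterstats' actual `boundless_array`. `FAKE_NODATA = -9999` is an assumed sentinel that must not occur in the real data. Padding with it keeps the no_overlap cells separate from genuine nodata cells.

```python
def boundless_window(raster, window, fill):
    """Hypothetical toy boundless read: cells of `window`
    (row_off, col_off, height, width) outside the raster get `fill`."""
    row_off, col_off, height, width = window
    n_rows, n_cols = len(raster), len(raster[0])
    return [
        [raster[r][c] if 0 <= r < n_rows and 0 <= c < n_cols else fill
         for c in range(col_off, col_off + width)]
        for r in range(row_off, row_off + height)
    ]


NODATA = 0
FAKE_NODATA = -9999  # assumed sentinel; must be absent from the real data

raster = [[5], [0], [7]]  # 3x1 raster with one genuine nodata cell
assert all(FAKE_NODATA not in row for row in raster), "sentinel collides with data"

window = boundless_window(raster, (0, 0, 3, 2), fill=FAKE_NODATA)
cells = [v for row in window for v in row]

# The real nodata value is now just another value, so each type is masked explicitly.
counts = {
    "count": sum(v not in (NODATA, FAKE_NODATA) for v in cells),
    "nodata": sum(v == NODATA for v in cells),
    "no_overlap": sum(v == FAKE_NODATA for v in cells),
}
print(counts)  # {'count': 2, 'nodata': 1, 'no_overlap': 3}
```

The sentinel only works if it lies outside the range of values the raster can actually hold. For that reason the sketch checks for a collision before reading.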

sgoodm · 2017-03-24T19:24:50Z

PR #146 adds 'nan' stat option

sgoodm · 2017-03-24T20:54:36Z

rasterio issue on shapes extending beyond raster extent
rasterio/rasterio#995

sgoodm · 2017-03-27T18:39:03Z

@perrygeo I'm almost done with a no_overlap stat proof of concept that works with array rasters, but I think there may be some limitations to getting it working with rasters opened via rasterio.

princemathur · 2018-12-21T05:59:50Z

My shapefile polygon is not snapped to the raster, and when I run zonal_stats the results do not exactly match the ArcGIS results. I am using Python 3.7 and pygeoprocessing 1.5.
Is there any way to match the ArcGIS result?

perrygeo added the bug label Dec 18, 2015

perrygeo modified the milestone: 0.11 Sep 3, 2016

perrygeo modified the milestones: 0.11, 0.12 Oct 1, 2016

perrygeo mentioned this issue Dec 13, 2016: CLI flag for nodata #138 (Closed)

perrygeo mentioned this issue Mar 26, 2017: Nan counts #146 (Merged)

sgoodm mentioned this issue Mar 28, 2017: No overlap counts #150 (Draft)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nodata count includes non-overlapping cells #105

nodata count includes non-overlapping cells #105

perrygeo commented Dec 18, 2015

perrygeo commented Sep 3, 2016 •

edited

Loading

sgoodm commented Sep 3, 2016

perrygeo commented Oct 1, 2016 •

edited

Loading

XiaopingDu commented Mar 20, 2017

sgoodm commented Mar 22, 2017

sgoodm commented Mar 22, 2017

sgoodm commented Mar 24, 2017

sgoodm commented Mar 24, 2017

sgoodm commented Mar 27, 2017

princemathur commented Dec 21, 2018

nodata count includes non-overlapping cells #105

nodata count includes non-overlapping cells #105

Comments

perrygeo commented Dec 18, 2015

perrygeo commented Sep 3, 2016 • edited Loading

sgoodm commented Sep 3, 2016

perrygeo commented Oct 1, 2016 • edited Loading

XiaopingDu commented Mar 20, 2017

sgoodm commented Mar 22, 2017

sgoodm commented Mar 22, 2017

sgoodm commented Mar 24, 2017

sgoodm commented Mar 24, 2017

sgoodm commented Mar 27, 2017

princemathur commented Dec 21, 2018

perrygeo commented Sep 3, 2016 •

edited

Loading

perrygeo commented Oct 1, 2016 •

edited

Loading