Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Pycnophylactic Function #30

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

bransonf
Copy link
Contributor

@bransonf bransonf commented Jun 9, 2020

Function is not ready for merge to master, but opening PR for discussion/further development. This is on a separate branch of my fork, as not to conflict with #27

This PR will resolve #1 as well

Sorry if I haven't seemed active lately. It took me about a week to understand and implement Tobler's Pycnophylactic method. It's pretty rad though!

Same considerations, we need to handle errors for user input, edge cases, etc. I still need to verify that this is the absolute correct implementation of this algorithm.

We will need to add dependencies for raster and fasterize but these are imminent anyway for an implementation of the 3-class dasymetric regression.

Still investigating ways to speed this up, although it is already faster than the pycno package. Also need to add a convergence parameter, likely stop <- max(m) * 10^(-converge) where max(m) is the largest value in the matrix.

TO DOs

  • Handle NSE or Remove
  • Speed up by setting Immutable Indices (Rather than computing in each for)
  • Implement Convergence Parameter
  • Add Verbose for diagnosis/stats on convergence?
  • Handle Errors in User Input
  • Check for Edge Cases
  • Better Documentation
  • Full Test Coverage

@chris-prener
Copy link
Owner

nice @bransonf!

R/aw_pycno.R Show resolved Hide resolved
@chris-prener
Copy link
Owner

is there a problem with returning NaN values?

@bransonf
Copy link
Contributor Author

@chris-prener Short answer: yes.

Long answer:
In the edge case that we divide by 0, NaN is produced. When we correct based on NaN every cell in the source inherits NaN. The smoothing function relies on the mean of adjacent cells, and inevitably a NaN gets introduced and all the cells of that source become NaN as well. In a sufficient number of iterations, all of the data becomes NaN. This will also ruin the arithmetic to check for convergence, meaning the while statement never breaks.

It's a very specific edge case, but one that results in an infinite loop.

@chris-prener
Copy link
Owner

Gotcha... what if we assign a population of .1?

@bransonf
Copy link
Contributor Author

@chris-prener This is the correction factor, so the only actual solution is correct <- ifelse(is.nan(correct), 0, correct)

The real issue here is actually if some data had a negative value originally, in which case it would lead to the infinite loop. Basically a serious punishment for a not-unlikely data error. (We could add an error handler for supplying negative populations, but there could theoretically be an edge case still not handled by this alone)

@chris-prener
Copy link
Owner

I think error handling is the way to go then... though I'd like to know how it would be possible for the error handling to miss it?

@bransonf
Copy link
Contributor Author

@chris-prener So there are two possible edge cases. (I'm sparing some technical details here)

There is one in which there are negative populations. We can catch this with an error because there shouldn't be negative populations as a matter of theory. If all the cells of a source end up negative after the smoothing process, they will become zeros. Hence, the sum of the cells in the source will be 0, and produce NaN.

But, there is another case in which the user provides populations that are 0, which is valid as a matter of theory. So we can't catch these with an error handler. In the correct set of conditions (Namely, this region is surrounded by more 0 sum regions) all the cells of the source will remain zeros. Same issue, division by zero.

@chris-prener
Copy link
Owner

We could add an argument for 0 handling - by default it errors, but allow users to explicitly override

@bransonf
Copy link
Contributor Author

@chris-prener I don't think there's any reason the user should have to intervene. 0s are perfectly valid from a theoretical perspective. Besides that, it's a very specific situation containing 0s that would trigger the edge case.

I've committed changes that prevent the edge case from producing an error.

@chris-prener
Copy link
Owner

OK sounds good - happy to follow your lead on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pycnophylactic interpolation method
2 participants