Should Poisson[0,0] be handled specially? #293
Comments
I think this is a good idea. The distribution with lam = 0 is just a point mass at k = 0; the mean and variance are both equal to lam, so both are 0. The Poisson is implemented this way (requiring only a non-negative rate parameter) in R, which should give some confidence that it isn't an obviously silly thing to do. It appears in tests that scipy behaves this way too, although it's not clearly documented. |
My first hunch is that we would relegate this to however the underlying probability libraries handle it, e.g. what do tfp.Poisson and torch.distributions.Poisson do? I could also see us using a (configurable) cutoff / clamping. E.g. through the interpolations used in HF we might anyway get into situations where the rate becomes negative. Should we in this case cut off the rate by clamping it from below (something like the sketch below), where cutoff is 1e-16 or similar? Do you care if it's Pois(n|lam = 0) vs Pois(n|lam = 1e-16), or do you see any issues with this approach? |
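A minimal sketch of the clamping idea (the cutoff value and the use of NumPy/SciPy here are illustrative, not pyhf's actual implementation):
# clamp_sketch.py -- illustrative only
import numpy as np
from scipy.stats import poisson

def clamped_poisson_logpmf(n, rate, cutoff=1e-16):
    # Clamp the (possibly zero or slightly negative) rate from below
    # so that log(rate) stays finite.
    safe_rate = np.clip(rate, cutoff, None)
    return poisson.logpmf(n, safe_rate)

print(clamped_poisson_logpmf(0, 0.0))  # ~ -1e-16, i.e. Pois(0 | ~0) ~ 1
print(clamped_poisson_logpmf(1, 0.0))  # ~ -36.84, i.e. Pois(1 | ~0) ~ 1e-16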
I don't see any practical problem. For the sake of argument, even if you have a histogram with a zero-rate bin, the tiny cutoff changes nothing in practice. Minor thing: wouldn't a slightly different form of the cutoff be better? |
yes it would be better :) |
Consider the following example for PyTorch:
>>> import torch
>>> from torch.distributions import Poisson
>>>
>>> rate_lambda = torch.tensor([0.0])
>>> observed_k = torch.tensor([0.0, 1.0])
>>>
>>> degenerate_pmf = Poisson(rate_lambda)
>>> degenerate_pmf.log_prob(observed_k)
tensor([nan, -inf])
>>> torch.exp(degenerate_pmf.log_prob(observed_k))
tensor([nan, 0.])
and TensorFlow:
>>> import tensorflow as tf
>>> import tensorflow_probability as tfp
>>>
>>> rate_lambda = tf.convert_to_tensor([0.0])
>>> observed_k = tf.convert_to_tensor([0.0, 1.0])
>>>
>>> degenerate_pmf = tfp.distributions.Poisson(rate_lambda)
>>> degenerate_pmf.log_prob(observed_k)
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([ nan, -inf], dtype=float32)>
>>> degenerate_pmf.prob(observed_k)
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([nan, 0.], dtype=float32)>
This behavior obviously should be avoided.
While this is fine from the numerical perspective, there is also the question of whether this is a good idea to implement at all from a physics perspective. If you "observe" zero Monte Carlo events in a bin, this does not mean that the rate parameter is actually zero. I would think that the most responsible thing to do is to loudly warn the user that their model is unphysical and suggest that they rebin or reconsider the model. What could be possible is that we implement a global flag so that by default this behavior happens, but if a user passes the flag then the zero bins are masked (a rough sketch of what that could look like follows). |
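A hypothetical illustration of the mask-zero-bins idea (plain NumPy/SciPy, with a made-up function and flag name; this is not pyhf API):
# mask_zero_bins_sketch.py -- illustrative only
import numpy as np
from scipy.stats import poisson

def poisson_loglik(observed, expected, allow_zero_rate=False):
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    if np.any(expected == 0) and not allow_zero_rate:
        raise ValueError(
            "Model predicts zero yield in some bins; rebin or pass allow_zero_rate=True"
        )
    # Keep only bins with a positive expected rate in the Poisson term
    mask = expected > 0
    return poisson.logpmf(observed[mask], expected[mask]).sum()

print(poisson_loglik([0.0, 5.0], [0.0, 4.5], allow_zero_rate=True))  # zero bin masked out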
When lambda is an MLE estimate of a rate parameter from an MC simulation, why does an MLE estimate of 0 for that rate parameter indicate that the model is unphysical? |
It could be good to catch models that predict zero yields in a bin and surface an easier-to-understand error message than the nan / -inf results shown above. |
From a practical perspective, what do you think about adding a warning at runtime when using models with 0 yield predicted in some bins, and then letting the different backends implement the behavior in any way they like? I assume that usually when such a model is encountered, the model should be changed anyway. If not, the user can take care to pick the backend implementing the behavior they are looking for. The cutoff suggestion also sounds good to me, ideally accompanied by a warning message. |
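A minimal sketch of the proposed runtime warning (the helper name and message are illustrative assumptions, not an existing pyhf API):
# zero_yield_warning_sketch.py -- illustrative only
import warnings
import numpy as np

def warn_on_zero_yield(expected_yields):
    expected_yields = np.asarray(expected_yields, dtype=float)
    zero_bins = np.flatnonzero(expected_yields == 0)
    if zero_bins.size:
        warnings.warn(
            f"Model predicts zero yield in bins {zero_bins.tolist()}; "
            "the Poisson term there is backend-dependent, consider rebinning.",
            RuntimeWarning,
        )
    return expected_yields

warn_on_zero_yield([3.2, 0.0, 7.1])  # emits a RuntimeWarning for bin 1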
>>> import pyhf
>>> from pyhf import tensorlib
>>> import torch
>>> torch.__version__
'1.10.0+cu102'
>>> pyhf.set_backend("pytorch")
>>> tensorlib.poisson(tensorlib.astensor([0, 0, 1, 1]), tensorlib.astensor([0, 1, 0, 1]))
tensor([1.0000, 0.3679, 0.0000, 0.3679])
so the backends might be deciding this for us. I'll need to check the other behaviours. |
This sounds pretty reasonable. |
I've checked, and all the backends agree on this now: the zero rate is allowed in the default public API across all of them.
# degenerate_pmf.py
print("\nSciPy:")
import numpy as np
from scipy.stats import poisson
rate_lambda = np.asarray([0.0])
observed_k = np.asarray([0.0, 1.0])
print(f"# degenerate pmf log prob: {np.log(poisson.pmf(observed_k, rate_lambda))}")
print(f"# degenerate pmf prob: {poisson.pmf(observed_k, rate_lambda)}")
print("\nJAX:")
import jax.numpy as jnp
import jax.scipy.stats.poisson as jax_poisson
rate_lambda = jnp.asarray([0.0])
observed_k = jnp.asarray([0.0, 1.0])
print(f"# degenerate pmf log prob: {jnp.log(jax_poisson.pmf(observed_k, rate_lambda))}")
print(f"# degenerate pmf prob: {jax_poisson.pmf(observed_k, rate_lambda)}")
print("\nPyTorch:")
import torch
from torch.distributions import Poisson
rate_lambda = torch.tensor([0.0])
observed_k = torch.tensor([0.0, 1.0])
degenerate_pmf = Poisson(rate_lambda)
print(f"# degenerate pmf log prob: {degenerate_pmf.log_prob(observed_k)}")
print(f"# degenerate pmf prob: {torch.exp(degenerate_pmf.log_prob(observed_k))}")
print("\nTensorFlow:")
import tensorflow as tf
import tensorflow_probability as tfp
rate_lambda = tf.convert_to_tensor([0.0])
observed_k = tf.convert_to_tensor([0.0, 1.0])
degenerate_pmf = tfp.distributions.Poisson(rate_lambda)
print(f"# degenerate pmf log prob: {degenerate_pmf.log_prob(observed_k)}")
print(f"# degenerate pmf prob: {degenerate_pmf.prob(observed_k)}")
So it probably makes sense to implement this default behavior. We should check what ROOT does as well, and then consider the cutoff behavior. |
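For the ROOT comparison, a quick PyROOT check might look like this (assuming a ROOT build with Python bindings; the output is deliberately not quoted here since it has not been verified in this thread):
# root_poisson_check.py -- illustrative only
# TMath::Poisson(x, par) is ROOT's Poisson density.
import ROOT

for n in (0.0, 1.0):
    print(f"TMath::Poisson({n}, 0.0) = {ROOT.TMath.Poisson(n, 0.0)}")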
Question
Filing on behalf of @wiso for now, since we need an issue to move and focus the discussion better. The initial PR #277 added a feature to use scipy.special.xlogy, which was reverted in #280 because we started seeing divergence in calculations among the backends around the special lambda=0 result. That said, the Poisson is not well defined for lambda=0, so it's not clear whether we should define Poisson[0,0] at all. @wiso's discussion here (#277 (comment)) is copied/pasted verbatim below, after a brief aside on xlogy.
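Aside (not part of the original issue text): scipy.special.xlogy follows the convention xlogy(0, y) = 0 for any y, including y = 0, which is what makes the log-pmf finite at the Pois(0 | 0) corner. A minimal sketch:
# xlogy_sketch.py -- illustrative only
import numpy as np
from scipy.special import xlogy, gammaln

def poisson_logpmf(n, lam):
    # log Pois(n | lam) = n*log(lam) - lam - log(n!)
    # xlogy(0, 0) is defined to be 0, so Pois(0 | 0) comes out as exp(0) = 1
    return xlogy(n, lam) - lam - gammaln(n + 1.0)

print(poisson_logpmf(np.array([0.0, 1.0]), 0.0))          # [  0. -inf]
print(np.exp(poisson_logpmf(np.array([0.0, 1.0]), 0.0)))  # [1. 0.]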
By the way, if I understood correctly, the implementation of the Poisson in the other backends is a Gaussian, e.g.
https://github.com/diana-hep/pyhf/blob/cbb899fc134b7d156550dfdc9130154f264a5188/pyhf/tensor/tensorflow_backend.py#L198-L200
So I don't think you want to implement such a check.
I don't know how the nan vs 1 problem can affect the optimization (probably you should reason in that way, and not from the mathematical point of view, e.g. the Poisson not being defined for lam=0, in the same way as you are already using a form of continuous Poissonian). By the way, if we observe n=0 then the formula becomes exp(-lam), and its limit is 1 for lam -> 0, which is also the maximum. This is true also for the (discrete) Poisson distribution.
Here is a stupid case where having a number for Poisson[0, 0] may be useful.
With this branch, the result is 10 as expected. Also, if I put NOBS=0, I get 0 as a result, without any RuntimeWarning. With the master branch (just pulled), it does not work with any numbers (I tried 10 or 0): it works only if I put a non-zero background.
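@wiso's original script is not reproduced in the thread; a hypothetical minimal reconstruction of that kind of setup (one bin, a signal sample scaled by a normfactor mu, and an exactly zero background; names and values are illustrative) could look like this:
# zero_background_sketch.py -- illustrative only, not the original script
import pyhf

spec = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [1.0],
                    "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
                },
                {"name": "background", "data": [0.0], "modifiers": []},
            ],
        }
    ]
}
model = pyhf.Model(spec)

NOBS = 0  # NOBS = 0 probes the Pois(0 | 0) corner; NOBS = 10 is the "result is 10" fit case
data = [float(NOBS)] + model.config.auxdata

# Evaluating the likelihood at mu = 0 gives a zero expected yield in the bin,
# so whether this is finite (pmf = 1) or nan/-inf depends on the backend
# behavior discussed in this issue.
print(model.logpdf([0.0], data))
# best_fit = pyhf.infer.mle.fit(data, model)  # with NOBS = 10 one would expect mu_hat near 10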