
Should Poisson[0,0] be handled specially? #293

Closed · kratsg opened this issue Oct 4, 2018 · 10 comments · Fixed by #1657

Labels: bug, follow up, help wanted, question, research

Comments

@kratsg (Contributor) commented Oct 4, 2018

Question

Filing on behalf of @wiso for now, since we need an issue to move and better focus the discussion. The initial PR #277 added a feature to use scipy.special.xlogy, which was then reverted in #280 because we started seeing the backends diverge in their calculations around the special lambda=0 result.

That said, the Poisson distribution is not well-defined for lambda=0, so it's not clear whether we should define Poisson[0,0] at all. @wiso's discussion from #277 (comment) follows from this point on.
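
For context on why xlogy changes the lambda=0 behavior: the log-space pmf in the numpy backend (seen in the traceback below) evaluates n * log(lam) as 0 * -inf = nan when n = lam = 0, while xlogy(0, 0) is defined to be 0. A minimal sketch assuming SciPy, not the actual pyhf implementation:

import numpy as np
from scipy.special import gammaln, xlogy

def poisson_pmf_naive(n, lam):
    # n * np.log(lam) evaluates to 0 * -inf = nan when n = lam = 0
    return np.exp(n * np.log(lam) - lam - gammaln(n + 1.0))

def poisson_pmf_xlogy(n, lam):
    # xlogy(0, 0) is defined to be 0, so Poisson[0, 0] = exp(0) = 1
    return np.exp(xlogy(n, lam) - lam - gammaln(n + 1.0))

print(poisson_pmf_naive(0.0, 0.0))  # nan (plus RuntimeWarnings)
print(poisson_pmf_xlogy(0.0, 0.0))  # 1.0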


By the way, if I understood correctly, the Poisson implementation in the other backends is a Gaussian approximation, e.g.

https://github.com/diana-hep/pyhf/blob/cbb899fc134b7d156550dfdc9130154f264a5188/pyhf/tensor/tensorflow_backend.py#L198-L200

So I don't think you want to implement such a check:

In [2]: pyhf.tensorlib.poisson(3, 2)
Out[2]: 0.18044704431548356

In [3]: pyhf.set_backend(pyhf.tensor.mxnet_backend())

In [4]: pyhf.tensorlib.poisson(3, 2)
Out[4]: 

[0.21969564]
<NDArray 1 @cpu(0)>

In [5]: pyhf.set_backend(pyhf.tensor.pytorch_backend())

In [6]: pyhf.tensorlib.poisson(3, 2)
Out[6]: tensor([0.2197])
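
The difference between the exact and approximate values above can be reproduced directly; a minimal check assuming SciPy, where the normal approximation uses mean $\lambda$ and standard deviation $\sqrt{\lambda}$:

import numpy as np
from scipy.stats import norm, poisson

# Exact Poisson pmf, as in the numpy backend
print(poisson.pmf(3, 2))  # 0.18044704431548356

# Normal approximation, matching the mxnet/pytorch outputs above
print(norm.pdf(3, loc=2, scale=np.sqrt(2)))  # ~0.21969564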

I don't know how the nan-vs-1 problem affects the optimization (you should probably reason from that angle rather than from the mathematical point of view, e.g. the Poisson not being defined for lam=0, in the same way as you are already using a continuous form of the Poissonian). By the way, if we observe n=0 then the formula becomes exp(-lam), whose limit is 1 for lam->0, which is also its maximum. This is also true for the (discrete) Poisson distribution.
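
Explicitly, for $n = 0$:

$$P(0 \mid \lambda) = \frac{\lambda^0 e^{-\lambda}}{0!} = e^{-\lambda} \xrightarrow{\lambda \to 0} 1$$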

Here is a simple case where having a number for Poisson[0, 0] may be useful.

from pyhf import get_backend
from pyhf.utils import loglambdav
import pyhf.simplemodels

_, optimizer = get_backend()

# The background is exactly zero, so its uncertainty should not matter here
pdf = pyhf.simplemodels.hepdata_like(signal_data=[1], bkg_data=[0.],
                                     bkg_uncerts=[10])

init_pars = pdf.config.suggested_init()
par_bounds = pdf.config.suggested_bounds()

NOBS = 10
print(optimizer.unconstrained_bestfit(loglambdav, [NOBS] + pdf.config.auxdata,
                                      pdf, init_pars, par_bounds)[0])

With this branch, the result is 10 as expected. Also, if I set NOBS=0 I get 0 as a result, without any RuntimeWarning.

With the master branch (just pulled), for any of the numbers I tried (10 or 0):

/home/turra/pyhf/pyhf/tensor/numpy_backend.py:186: RuntimeWarning: divide by zero encountered in log
  return np.exp(n * np.log(lam) - lam - gammaln(n + 1.))
/home/turra/pyhf/pyhf/tensor/numpy_backend.py:186: RuntimeWarning: invalid value encountered in multiply
  return np.exp(n * np.log(lam) - lam - gammaln(n + 1.))
     fun: nan
     jac: array([nan, nan])
 message: 'Iteration limit exceeded'
    nfev: 1404
     nit: 101
    njev: 101
  status: 9
 success: False
       x: array([nan, nan])
Traceback (most recent call last):
  File "mytest.py", line 15, in <module>
    pdf, init_pars, par_bounds)[0])
  File "/home/turra/pyhf/pyhf/optimize/opt_scipy.py", line 14, in unconstrained_bestfit
    assert result.success
AssertionError

It works only if I use a non-zero background.

@kratsg added the bug, help wanted, question, follow up, and research labels on Oct 4, 2018
@andrewfowlie commented May 12, 2020

I think this is a good idea. The $\lambda \to 0$ limit of a Poisson seems to me well-defined and intuitive:

$$P(n \mid \lambda = 0) = \begin{cases} 1 & \text{if } n = 0 \\ 0 & \text{otherwise} \end{cases}$$

The mean and variance are $E(n) = Var(n) = \lambda = 0$ as expected.

The Poisson is implemented this way (requiring only a non-negative rate parameter) in R, which should give some confidence that it isn't an obviously silly thing to do.

It appears in tests that scipy behaves this way too, although it's not clearly documented.
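
For example (this matches the cross-backend check later in this thread):

>>> from scipy.stats import poisson
>>> poisson.pmf(0, 0)
1.0
>>> poisson.pmf(1, 0)
0.0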

@lukasheinrich (Contributor) commented

My first hunch is that we would delegate this to however the underlying probability libraries handle it, e.g. what do tfp.Poisson and torch.distributions.Poisson do?

I could also see us using a (configurable) cutoff / clamping. E.g. through the interpolations used in HistFactory we might in any case get into situations where the rate becomes negative. Should we in that case clamp the rate with something à la

np.where(rate <= 0, cutoff, rate)

where cutoff is 1e-16 or similar? Do you care if it's Pois(n|lam = 0) vs Pois(n|lam = 1e-16), or do you see any issues with this approach?
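
A minimal sketch of the clamping idea, assuming NumPy (the clamp_rate helper is hypothetical, for illustration only):

import numpy as np

def clamp_rate(rate, cutoff=1e-16):
    # Replace non-positive rates with the cutoff so that log(rate)
    # stays finite during optimization
    rate = np.asarray(rate, dtype=float)
    return np.where(rate <= 0, cutoff, rate)

print(clamp_rate([0.0, -0.5, 2.0]))  # [1.e-16 1.e-16 2.e+00]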

@andrewfowlie commented

I don't see any practical problem. For the sake of argument, if you have a histogram with n bins and in each bin you observed 0 events, I would be able to produce a difference between the correct behaviour for $\lambda \to 0$ and $\lambda$ clipped at 1e-16 by increasing n. It will however take $n \sim 10^{16}$ bins before the effect becomes noticeable 😄
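
Spelled out, with cutoff $\epsilon = 10^{-16}$ and $n$ zero-observation bins, the shift in the log-likelihood is

$$\Delta \ln L = \sum_{i=1}^{n} \left[ \ln P(0 \mid 0) - \ln P(0 \mid \epsilon) \right] = \sum_{i=1}^{n} \left[ 0 - (-\epsilon) \right] = n \epsilon,$$

which only becomes $\mathcal{O}(1)$ once $n \sim 10^{16}$.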

Minor thing: wouldn't np.where(rate <= cutoff, cutoff, rate) be better, though?

@lukasheinrich (Contributor) commented

yes it would be better :)

@matthewfeickert (Member) commented

> I don't see any practical problem.

> e.g. what do tfp.Poisson, torch.distributions.Poisson do?

Consider the following example for PyTorch:

>>> import torch
>>> from torch.distributions import Poisson
>>>
>>> rate_lambda = torch.tensor([0.0])
>>> observed_k = torch.tensor([0.0, 1.0])
>>>
>>> degenerate_pmf = Poisson(rate_lambda)
>>> degenerate_pmf.log_prob(observed_k)
tensor([nan, -inf])
>>> torch.exp(degenerate_pmf.log_prob(observed_k))
tensor([nan, 0.])

and TensorFlow:

>>> import tensorflow as tf
>>> import tensorflow_probability as tfp
>>>
>>> rate_lambda = tf.convert_to_tensor([0.0])
>>> observed_k = tf.convert_to_tensor([0.0, 1.0])
>>>
>>> degenerate_pmf = tfp.distributions.Poisson(rate_lambda)
>>> degenerate_pmf.log_prob(observed_k)
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([ nan, -inf], dtype=float32)>
>>> degenerate_pmf.prob(observed_k)
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([nan,  0.], dtype=float32)>

This behavior obviously should be avoided.

> I could also see us using a (configurable) cut off / clamping.

While this is fine from a numerical perspective, there is also the question of whether this is a good idea to implement at all from a physics perspective.

If you "observe" zero Monte Carlo events in a bin, this does not mean that the rate parameter is actually 0 — this gives us \hat{λ} not λ. If you had 100 times as many Monte Carlo events the number could be different. If the rate parameter is actually zero in the model then nothing physical is being modeled.

I would think that the most responsible thing to do is to loudly warn the user that their model is unphysical and suggest that they rebin or reconsider the model. One possibility is to implement a global flag so that this warning behavior happens by default, but if a user passes the flag then the zero bins are masked.
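
A hypothetical sketch of what that could look like, assuming NumPy (the function name, flag, and behavior are illustrative, not pyhf API):

import warnings
import numpy as np

def check_zero_rate_bins(rates, mask_zero_bins=False):
    # Warn loudly about unphysical zero-rate bins; mask them only on explicit opt-in
    rates = np.asarray(rates, dtype=float)
    zero_bins = rates == 0.0
    if zero_bins.any():
        warnings.warn(
            f"Model predicts zero rate in bins {np.flatnonzero(zero_bins).tolist()}; "
            "consider rebinning or revising the model.",
            RuntimeWarning,
        )
        if not mask_zero_bins:
            raise ValueError(
                "Zero-rate bins encountered; rebin the model or opt in to masking."
            )
    # Boolean mask of bins to keep in the likelihood
    return ~zero_bins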

@andrewfowlie commented

When lambda is an MLE estimate of a rate parameter from an MC simulation, why does an MLE estimate of 0 for that rate parameter indicate that the model is unphysical?

@alexander-held (Member) commented

It could be good to catch models that predict zero yields in a bin with an easier-to-understand error message than

File "[...]/pyhf/src/pyhf/modifiers/staterror.py", line 115, in finalize
    assert len(sigmas[sigmas > 0]) == pdfconfig.param_set(mod).n_parameters
AssertionError

@alexander-held (Member) commented

From a practical perspective, what do you think about adding a warning at runtime when using models with 0 yield predicted in some bins, and then letting the different backends implement the behavior in any way they like? I assume that usually when such a model is encountered, the model should be changed anyway. If not, the user can take care to pick the backend implementing the behavior they are looking for.

The cutoff suggestion also sounds good to me, ideally accompanied by a warning message.

@matthewfeickert (Member) commented Oct 21, 2021

torch v1.10.0 now also allows a zero rate, from pytorch/pytorch#61511:

>>> import pyhf
>>> from pyhf import tensorlib
>>> import torch
>>> torch.__version__
'1.10.0+cu102'
>>> pyhf.set_backend("pytorch")
>>> tensorlib.poisson(tensorlib.astensor([0, 0, 1, 1]), tensorlib.astensor([0, 1, 0, 1]))
tensor([1.0000, 0.3679, 0.0000, 0.3679])

so the backends might be deciding this for us. I'll need to check the other behaviours.

> From a practical perspective, what do you think about adding a warning at runtime when using models with 0 yield predicted in some bins, and then letting the different backends implement the behavior in any way they like?

This sounds pretty reasonable.

@matthewfeickert (Member) commented

I've checked, and all the backends now agree on this: a zero rate is allowed in the default public API across all of them.

# degenerate_pmf.py
print("\nSciPy:")
import numpy as np
from scipy.stats import poisson

rate_lambda = np.asarray([0.0])
observed_k = np.asarray([0.0, 1.0])

print(f"# degenerate pmf log prob: {np.log(poisson.pmf(observed_k, rate_lambda))}")
print(f"# degenerate pmf prob: {poisson.pmf(observed_k, rate_lambda)}")

print("\nJAX:")
import jax.numpy as jnp
import jax.scipy.stats.poisson as jax_poisson

rate_lambda = jnp.asarray([0.0])
observed_k = jnp.asarray([0.0, 1.0])

print(f"# degenerate pmf log prob: {jnp.log(jax_poisson.pmf(observed_k, rate_lambda))}")
print(f"# degenerate pmf prob: {jax_poisson.pmf(observed_k, rate_lambda)}")

print("\nPyTorch:")
import torch
from torch.distributions import Poisson

rate_lambda = torch.tensor([0.0])
observed_k = torch.tensor([0.0, 1.0])

degenerate_pmf = Poisson(rate_lambda)
print(f"# degenerate pmf log prob: {degenerate_pmf.log_prob(observed_k)}")
print(f"# degenerate pmf prob: {torch.exp(degenerate_pmf.log_prob(observed_k))}")

print("\nTensorFlow:")
import tensorflow as tf
import tensorflow_probability as tfp

rate_lambda = tf.convert_to_tensor([0.0])
observed_k = tf.convert_to_tensor([0.0, 1.0])

degenerate_pmf = tfp.distributions.Poisson(rate_lambda)
print(f"# degenerate pmf log prob: {degenerate_pmf.log_prob(observed_k)}")
print(f"# degenerate pmf prob: {degenerate_pmf.prob(observed_k)}")

$ python degenerate_pmf.py

SciPy:
/home/feickert/Code/GitHub/pyhf/degenerate_pmf.py:8: RuntimeWarning: divide by zero encountered in log
  print(f"# degenerate pmf log prob: {np.log(poisson.pmf(observed_k, rate_lambda))}")
# degenerate pmf log prob: [  0. -inf]
# degenerate pmf prob: [1. 0.]

JAX:
# degenerate pmf log prob: [-4.768373e-07          -inf]
# degenerate pmf prob: [0.9999995 0.       ]

PyTorch:
# degenerate pmf log prob: tensor([0., -inf])
# degenerate pmf prob: tensor([1., 0.])

TensorFlow:
# degenerate pmf log prob: [  0. -inf]
# degenerate pmf prob: [1. 0.]

So it probably makes sense to implement this default behavior. We should check what ROOT does as well, and then consider the cutoff behavior.
