
First attempt at wrapping ABCpy - work-in-progress #4

Draft
wants to merge 9 commits into main

Conversation

LoryPack

Hello,

I thought I'd open this pull request to keep track of my progress and ask for feedback (even if I don't think this is ready to merge yet).

I've made a first attempt at wrapping ABCpy. For now I've wrapped the Rejection ABC algorithm (I know you have that already, but it was the easiest one to do).

It seems to work (I've added a try_ABCpy.py file if you want to experiment with it), but I have made quite a few simplifications, as I did not know how to do some things. Specifically:

  • It looks like you have two separate parameters: num_simulations, the total number of simulations allowed, and num_samples, the number of posterior samples returned; if I am not wrong, you generate the latter with a KDE fit to the ABC samples (at least this is what happens in the pyABC wrap). In my wrap of ABCpy, for now I run Rejection ABC for num_simulations simulations and then simply return as posterior samples the ones within the distance quantile defined by quantile, or the num_top_samples closest ones. I am therefore not returning the specified num_samples. Do you suggest using your KDE code in this case as well? I believe it should be easily doable; I just did not know what the precise aim of it was.
  • In the pyABC wrap, you choose one of num_top_samples, quantile, eps. I have not yet used the eps one, but will add code for that soon.
  • In ABCpy, we have the option of generating more than one simulation per parameter value and averaging the estimate of the distance; I have not included that option for now. Do you think it is useful?
  • I've not yet added SASS option (even if that is implemented in ABCpy); also, we do not have postprocessing LRA in ABCpy.
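For reference, a minimal numpy sketch of the selection described in the first point above (toy distances and parameters, not the actual ABCpy code):

```python
import numpy as np

rng = np.random.default_rng(0)
num_simulations = 1000
distances = rng.random(num_simulations)         # toy ABC distances
params = rng.normal(size=(num_simulations, 2))  # corresponding parameters

# Option A: keep the samples below the distance cut-off given by `quantile`
# (roughly num_simulations * quantile of them).
quantile = 0.1
eps = np.quantile(distances, quantile)
accepted_quantile = params[distances <= eps]

# Option B: keep exactly the `num_top_samples` closest simulations.
num_top_samples = 100
accepted_top = params[np.argsort(distances)[:num_top_samples]]

print(accepted_top.shape)  # (100, 2)
```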

Please let me know what you think of my attempt.

@jan-matthis
Contributor

jan-matthis commented Feb 1, 2021

Hi,

first off, this is great, thanks a lot for the PR!

To answer your questions:

I am therefore not returning the specified num_samples. Do you suggest using your KDE code in this case as well? I believe it should be easily doable, only did not know what was the precise aim of it.

To compare algorithms we mostly use 2-sample tests, e.g. C2ST. Let's say we run rejection ABC at a simulation budget of 1k simulations and use a quantile of 0.1, so we end up with 100 top samples. Typically, we compute C2STs against 10k reference posterior samples and would like to compute C2ST on a balanced dataset (i.e., containing as many reference as approximate posterior samples). The num_samples keyword exists for this reason -- it will typically be 10_000 when we run algorithms, in order to get back the desired number of samples for calculating metrics.

We could either resample from the population of 100 samples to obtain 10k samples or fit a KDE to obtain more samples -- in the manuscript's appendix, we compare both and found that doing a KDE fit slightly improves performance for some tasks. So it'd be great to have the KDE option for ABCpy as well.
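For illustration, a minimal sketch of the KDE option, using scipy's gaussian_kde as a stand-in (the actual sbibm implementation may differ):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
top_samples = rng.normal(size=(100, 2))  # e.g. 100 accepted ABC samples

# Fit a KDE to the accepted samples, then draw the number of samples
# needed for a balanced C2ST against the reference posterior.
kde = gaussian_kde(top_samples.T)  # gaussian_kde expects shape (dim, n)
num_samples = 10_000
posterior_samples = kde.resample(num_samples, seed=1).T  # back to (n, dim)
print(posterior_samples.shape)  # (10000, 2)
```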

(Side note: We also checked computing C2STs using 1k instead of 10k samples -- it did not make much of a difference in terms of overall results -- we opted for 10k to be on the safe side).

In the pyABC wrap, you choose one of num_top_samples, quantile, eps. I have not yet used the eps one, but will add code for that soon.

We only used quantiles / num_top_samples in the manuscript in the end, but I agree it would be nice to have in case it is easy to do.

In ABCpy, we have the option of generating more than one simulation for parameter value and average the estimate of the distabce; I have not put that option for now, do you think that is useful?

This sounds very useful to me -- it would allow exploring whether splitting the budget to do multi-simulations per parameter is helpful on a given task.

I've not yet added SASS option (even if that is implemented in ABCpy); also, we do not have postprocessing LRA in ABCpy.

SASS and LRA were for some experiments we report in the appendix, it would be very nice to have but I think it's totally fine if they are not supported in the first version.

Let me know if more questions come up, or something was unclear in my explanations above.

Best, Jan-Matthis

@jan-matthis jan-matthis linked an issue Feb 1, 2021 that may be closed by this pull request
@LoryPack
Author

LoryPack commented Feb 1, 2021

Hi Jan-Matthis,

Thanks for your reply, that was very clear. I'll add the KDE, eps and the multiple simulations per parameter value in the next iteration, and update once that is done. I'll skip the SASS and LRA for now.

Thanks for your help

@LoryPack
Author

LoryPack commented Mar 5, 2021

Hello, I have finally had time to work on this (sorry for the long gap since my last update). As suggested, I have done the following:

  • added the KDE and the resampling to obtain the correct number of samples, following the implementation for the other algorithms
  • added an option to directly fix the ABC threshold value (eps), as an alternative to either quantile or num_top_samples. Directly fixing eps is tricky, however, as it can result in no acceptances among the simulated datasets
  • added the option of using multiple simulations for every parameter value in order to better estimate the ABC distance; as you said, this allocates the budget across a smaller number of parameter values while attempting to estimate the ABC distance better
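A toy sketch of the multi-simulation option (the Gaussian simulator and variable names are made up for illustration, this is not the ABCpy code):

```python
import numpy as np

rng = np.random.default_rng(0)
num_simulations = 1000
num_simulations_per_param = 5
# The budget is split: fewer parameter values, several simulations each.
num_params = num_simulations // num_simulations_per_param

observation = 0.0
thetas = rng.normal(size=num_params)

# Toy Gaussian simulator: several datasets per parameter value, then the
# per-simulation distances are averaged to reduce their variance.
sims = thetas[:, None] + rng.normal(size=(num_params, num_simulations_per_param))
distances = np.abs(sims - observation).mean(axis=1)
print(thetas.shape, distances.shape)  # (200,) (200,)
```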

As a quick remark, I noticed that some trials raise the following warning when selecting the kernel bandwidth for the KDE with cross-validation:

UserWarning: One or more of the test scores are non-finite: [  -11.71028102   -19.40130322   -41.9697046  
 -137.58079662  -1501.8660018             nan            nan            nan            nan            nan]
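For what it's worth, a similar warning can be reproduced with scikit-learn alone when the bandwidth grid contains very small values: held-out points then get essentially zero density, so the test log-likelihood is -inf and the scores are reported as non-finite. This is a hypothetical reproduction, not the exact sbibm code:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 2))  # stand-in for accepted ABC samples

# Tiny bandwidth candidates assign ~zero density to held-out points,
# giving -inf cross-validation scores (reported as non-finite).
grid = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    {"bandwidth": np.logspace(-4, 1, 10)},
    cv=5,
)
grid.fit(samples)
print(grid.best_params_)
```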

Please let me know if there is anything you'd like to be changed in some way.

Contributor

@jan-matthis jan-matthis left a comment


Hi, thanks a lot, great progress!

I have done a quick first pass over the PR and left a few comments here and there. Perhaps @janfb can have a look as well?

Regarding the KDE warning, could you post a short code snippet to reproduce it (ideally with a fixed random seed)? We'll have a closer look.

Review comments were left on the following files:

  • sbibm/algorithms/abcpy/abcpy_utils.py
  • sbibm/algorithms/abcpy/rejection_abc.py
  • try_ABCpy.py
@LoryPack
Author

Thanks for that; I will work on incorporating the updates you suggested.

Once this is OK for the simple Rejection ABC, I am thinking of wrapping some of the other ABC algorithms which we have in ABCpy, as well as the Synthetic Likelihood ones.

The issue, however, is that for some ABC algorithms you cannot know a priori how many simulations they will require; I will therefore start by wrapping the ones which have a fixed simulation budget.

Contributor

@janfb janfb left a comment


Looks great! I added a comment and a question.

sbibm/algorithms/abcpy/rejection_abc.py
)
journal_standard_ABC = sampler.sample(
[[np.array(observation)]],
n_samples=num_simulations // num_simulations_per_param,
Contributor


The n_samples is a fixed budget, right? There is in principle no way it can be exceeded, and if it were, we would get a SimulationBudgetExceeded exception because of the max_calls passed above, no?

Author


Basically, Rejection ABC in ABCpy fixes a number of samples with distance from the observation below a given threshold and simulates from the model until that number of samples is accepted. As a workaround to get a fixed simulation budget, I used an extremely large epsilon so that all simulations are accepted. The results are then post-processed, and the ones with the smallest distances are kept.

I realize it's not the cleanest implementation ever, but it should work here. Since we rarely use RejectionABC for practical purposes in ABCpy, we never improved the implementation to allow a fixed simulation budget.
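In plain numpy, the workaround amounts to something like this (toy Gaussian simulator, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
observation = 2.0
num_simulations = 1000  # fixed budget: every draw is "accepted"

# With an effectively infinite epsilon, all simulations are kept and
# their distances recorded...
thetas = rng.normal(size=num_simulations)
sims = thetas + rng.normal(size=num_simulations)
distances = np.abs(sims - observation)

# ...then post-processing keeps only the closest ones.
num_top_samples = 100
posterior = thetas[np.argsort(distances)[:num_top_samples]]
print(posterior.shape)  # (100,)
```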

@jan-matthis jan-matthis marked this pull request as draft November 10, 2021 11:36
@jan-matthis jan-matthis added the enhancement New feature or request label Feb 18, 2023

Successfully merging this pull request may close these issues.

Adding ABCpy package