
First attempt at wrapping ABCpy - work-in-progress #4

Draft
wants to merge 9 commits into main

Conversation

LoryPack

Hello,

I thought I'd open this pull request to keep track of my progress and ask for feedback (even if I don't think this is ready to merge yet).

I've made a first attempt at wrapping ABCpy. For now I've wrapped the Rejection ABC algorithm (I know you have that already, but it was the easiest one to do).

It seems to work (I've added a try_ABCpy.py file if you want to experiment with it), but I have made quite a few simplifications, as I did not know how to do some things. Specifically:

  • It looks like you have two separate parameters: num_simulations, the total number of simulations allowed, and num_samples, the number of posterior samples returned; if I am not wrong, you generate the latter with a KDE fit to the ABC samples (at least this is what happens in the pyABC wrap). In my wrap of ABCpy, for now I run Rejection ABC for num_simulations simulations and then simply return as posterior samples the ones within the distance quantile defined by quantile, or the num_top_samples closest ones. I am therefore not returning the specified num_samples. Do you suggest using your KDE code in this case as well? I believe it should be easily doable; I just did not know what the precise aim of it was.
  • In the pyABC wrap, you choose one of num_top_samples, quantile, eps. I have not yet used the eps one, but will add code for that soon.
  • In ABCpy, we have the option of generating more than one simulation per parameter value and averaging the estimate of the distance; I have not included that option for now. Do you think it is useful?
  • I've not yet added SASS option (even if that is implemented in ABCpy); also, we do not have postprocessing LRA in ABCpy.
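For reference, a minimal numpy sketch of the selection described in the first point above (toy distances and parameters, not the actual ABCpy code):

```python
import numpy as np

rng = np.random.default_rng(0)
num_simulations = 1000
distances = rng.random(num_simulations)         # toy ABC distances
params = rng.normal(size=(num_simulations, 2))  # corresponding parameters

# Option A: keep the samples below the distance cut-off given by `quantile`
# (roughly num_simulations * quantile of them).
quantile = 0.1
eps = np.quantile(distances, quantile)
accepted_quantile = params[distances <= eps]

# Option B: keep exactly the `num_top_samples` closest simulations.
num_top_samples = 100
accepted_top = params[np.argsort(distances)[:num_top_samples]]

print(accepted_top.shape)  # (100, 2)
```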

Please let me know what you think of my attempt.

@jan-matthis
Contributor

jan-matthis commented Feb 1, 2021

Hi,

first off, this is great, thanks a lot for the PR!

To answer your questions:

I am therefore not returning the specified num_samples. Do you suggest using your KDE code in this case as well? I believe it should be easily doable, only did not know what was the precise aim of it.

To compare algorithms we mostly use 2-sample tests, e.g. C2ST. Let's say we run rejection ABC at a simulation budget of 1k simulations and use a quantile of 0.1, so we end up with 100 top samples. Typically, we compute C2STs against 10k reference posterior samples and would like to compute C2ST on a balanced dataset (i.e., containing as many reference as approximate posterior samples). The num_samples keyword exists for this reason -- it will typically be 10_000 when we run algorithms, in order to get back the desired number of samples for calculating metrics.

We could either resample from the population of 100 samples to obtain 10k samples or fit a KDE to obtain more samples -- in the manuscript's appendix, we compare both and found that doing a KDE fit slightly improves performance for some tasks. So it'd be great to have the KDE option for ABCpy as well.
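For illustration, a minimal sketch of the KDE option, using scipy's gaussian_kde as a stand-in (the actual sbibm implementation may differ):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
top_samples = rng.normal(size=(100, 2))  # e.g. 100 accepted ABC samples

# Fit a KDE to the accepted samples, then draw the number of samples
# needed for a balanced C2ST against the reference posterior.
kde = gaussian_kde(top_samples.T)  # gaussian_kde expects shape (dim, n)
num_samples = 10_000
posterior_samples = kde.resample(num_samples, seed=1).T  # back to (n, dim)
print(posterior_samples.shape)  # (10000, 2)
```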

(Side note: We also checked computing C2STs using 1k instead of 10k samples -- it did not make much of a difference in terms of overall results -- we opted for 10k to be on the safe side).

In the pyABC wrap, you choose one of num_top_samples, quantile, eps. I have not yet used the eps one, but will add code for that soon.

We only used quantiles / num_top_samples in the manuscript in the end, but I agree it would be nice to have in case it is easy to do.

In ABCpy, we have the option of generating more than one simulation for parameter value and average the estimate of the distabce; I have not put that option for now, do you think that is useful?

This sounds very useful to me -- it would allow exploring whether splitting the budget to do multi-simulations per parameter is helpful on a given task.

I've not yet added SASS option (even if that is implemented in ABCpy); also, we do not have postprocessing LRA in ABCpy.

SASS and LRA were for some experiments we report in the appendix, it would be very nice to have but I think it's totally fine if they are not supported in the first version.

Let me know if more questions come up, or something was unclear in my explanations above.

Best, Jan-Matthis

@jan-matthis jan-matthis linked an issue Feb 1, 2021 that may be closed by this pull request
@LoryPack
Author

LoryPack commented Feb 1, 2021

Hi Jan-Matthis,

Thanks for your reply, that was very clear. I'll add the KDE, eps and the multiple simulations per parameter value in the next iteration, and update once that is done. I'll skip the SASS and LRA for now.

Thanks for your help

@LoryPack
Author

LoryPack commented Mar 5, 2021

Hello, I have finally had time to work on this (sorry for the long gap since my last update). As suggested, I have done the following:

  • added the KDE and the resampling to obtain the correct number of samples, following the implementation for the other algorithms
  • added an option to directly fix the ABC threshold value (eps), as an alternative to either quantile or num_top_samples. Directly fixing eps is tricky, however, as it can result in no acceptances among the simulated datasets
  • added the option of using multiple simulations for every parameter value in order to better estimate the ABC distance; as you said, this allocates the budget across a smaller number of parameter values while attempting to estimate the ABC distance better
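A toy sketch of the multi-simulation option (the Gaussian simulator and variable names are made up for illustration, this is not the ABCpy code):

```python
import numpy as np

rng = np.random.default_rng(0)
num_simulations = 1000
num_simulations_per_param = 5
# The budget is split: fewer parameter values, several simulations each.
num_params = num_simulations // num_simulations_per_param

observation = 0.0
thetas = rng.normal(size=num_params)

# Toy Gaussian simulator: several datasets per parameter value, then the
# per-simulation distances are averaged to reduce their variance.
sims = thetas[:, None] + rng.normal(size=(num_params, num_simulations_per_param))
distances = np.abs(sims - observation).mean(axis=1)
print(thetas.shape, distances.shape)  # (200,) (200,)
```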

As a quick remark, I noticed that some trials raise the following warning when selecting the kernel bandwidth for the KDE with cross-validation:

UserWarning: One or more of the test scores are non-finite: [  -11.71028102   -19.40130322   -41.9697046  
 -137.58079662  -1501.8660018             nan            nan            nan            nan            nan]
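For what it's worth, a similar warning can be reproduced with scikit-learn alone when the bandwidth grid contains very small values: held-out points then get essentially zero density, so the test log-likelihood is -inf and the scores are reported as non-finite. This is a hypothetical reproduction, not the exact sbibm code:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 2))  # stand-in for accepted ABC samples

# Tiny bandwidth candidates assign ~zero density to held-out points,
# giving -inf cross-validation scores (reported as non-finite).
grid = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    {"bandwidth": np.logspace(-4, 1, 10)},
    cv=5,
)
grid.fit(samples)
print(grid.best_params_)
```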

Please let me know if there is anything you'd like to be changed in some way.

Contributor

@jan-matthis jan-matthis left a comment


Hi, thanks a lot, great progress!

I have done a quick first pass over the PR and left a few comments here and there. Perhaps @janfb can have a look as well?

Regarding the KDE warning, could you post a short code snippet to reproduce it (ideally with a fixed random seed)? We'll have a closer look.

Review comments were left on the following files:

  • sbibm/algorithms/abcpy/abcpy_utils.py
  • sbibm/algorithms/abcpy/rejection_abc.py
  • try_ABCpy.py
@LoryPack
Author

Thanks for that; I will work on incorporating the updates you suggested.

Once this is OK for the simple Rejection ABC, I am thinking of wrapping some of the other ABC algorithms which we have in ABCpy, as well as the Synthetic Likelihood ones.

The issue, however, is that for some ABC algorithms you cannot know a priori how many simulations they will require; I will therefore start by wrapping the ones which have a fixed simulation budget.

Contributor

@janfb janfb left a comment


Looks great! I added a comment and a question.

sbibm/algorithms/abcpy/rejection_abc.py
)
journal_standard_ABC = sampler.sample(
[[np.array(observation)]],
n_samples=num_simulations // num_simulations_per_param,
Contributor


The n_samples is a fixed budget, right? There is in principle no way it can be exceeded, and if it were, we would get a SimulationBudgetExceeded exception because of the max_calls passed above, no?

Author


Basically, Rejection ABC in ABCpy fixes a number of samples with distance from the observation below a given threshold and simulates from the model until that number of samples is accepted. As a workaround to get a fixed simulation budget, I used an extremely large epsilon so that all simulations are accepted. The results are then post-processed, and the ones with the smallest distances are kept.

I realize it's not the cleanest implementation ever, but it should work here. Since we rarely use RejectionABC for practical purposes in ABCpy, we never improved the implementation to allow a fixed simulation budget.
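In plain numpy, the workaround amounts to something like this (toy Gaussian simulator, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
observation = 2.0
num_simulations = 1000  # fixed budget: every draw is "accepted"

# With an effectively infinite epsilon, all simulations are kept and
# their distances recorded...
thetas = rng.normal(size=num_simulations)
sims = thetas + rng.normal(size=num_simulations)
distances = np.abs(sims - observation)

# ...then post-processing keeps only the closest ones.
num_top_samples = 100
posterior = thetas[np.argsort(distances)[:num_top_samples]]
print(posterior.shape)  # (100,)
```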

@jan-matthis jan-matthis marked this pull request as draft November 10, 2021 11:36
@jan-matthis jan-matthis added the enhancement New feature or request label Feb 18, 2023

Successfully merging this pull request may close these issues.

Adding ABCpy package