First attempt at wrapping ABCpy - work-in-progress #4
Hi, first off, this is great, thanks a lot for the PR! To answer your questions:
To compare algorithms we mostly use 2-sample tests, e.g. C2ST. Let's say we run rejection ABC at a simulation budget of 1k simulations and use a quantile of 0.1, so we end up with 100 top samples. Typically, we compute C2STs against 10k reference posterior samples and would like to compute C2ST on a balanced dataset (i.e., containing as many reference as approximate posterior samples). We could either resample from the population of 100 samples to obtain 10k samples or fit a KDE to obtain more samples -- in the manuscript's appendix, we compared both and found that doing a KDE fit slightly improves performance for some tasks. So it'd be great to have the KDE option. (Side note: We also checked computing C2STs using 1k instead of 10k samples -- it did not make much of a difference in terms of overall results -- we opted for 10k to be on the safe side.)
We only used quantiles / num_top_samples in the manuscript in the end, but I agree it would be nice to have in case it is easy to do.
This sounds very useful to me -- it would allow exploring whether splitting the budget to do multi-simulations per parameter is helpful on a given task.
SASS and LRA were for some experiments we report in the appendix; it would be very nice to have them, but I think it's totally fine if they are not supported in the first version. Let me know if more questions come up, or something was unclear in my explanations above. Best, Jan-Matthis
Hi Jan-Matthis, Thanks for your reply, that was very clear. I'll add the KDE, eps and the multiple simulations per parameter value in the next iteration, and update once that is done. I'll skip the SASS and LRA for now. Thanks for your help
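The two ways of turning 100 accepted samples into 10k, as discussed in the reply above, can be sketched as follows. This is a minimal plain-Python illustration for 1-D parameters, not the code used in the manuscript: the function names `resample` and `sample_from_kde`, the Gaussian stand-in data, and the Scott's-rule default bandwidth are all assumptions made for the example.

```python
import random
import statistics


def resample(samples, num_samples, rng):
    """Draw num_samples points from the accepted samples, with replacement."""
    return rng.choices(samples, k=num_samples)


def sample_from_kde(samples, num_samples, rng, bandwidth=None):
    """Draw from a 1-D Gaussian KDE fitted to the accepted samples.

    Sampling from a Gaussian KDE amounts to picking a data point at random
    and adding Gaussian noise whose standard deviation is the bandwidth.
    """
    if bandwidth is None:
        # Scott's rule of thumb for 1-D data (an assumption of this sketch).
        bandwidth = statistics.stdev(samples) * len(samples) ** (-1 / 5)
    centers = rng.choices(samples, k=num_samples)
    return [rng.gauss(center, bandwidth) for center in centers]


rng = random.Random(0)
# Stand-in for the 100 top samples kept by rejection ABC.
top_samples = [rng.gauss(1.0, 0.5) for _ in range(100)]

resampled = resample(top_samples, 10_000, rng)
smoothed = sample_from_kde(top_samples, 10_000, rng)

# Resampling can only repeat the 100 original values; the KDE draws new ones.
print(len(set(resampled)), len(set(smoothed)))
```

Plain resampling yields at most 100 distinct values repeated many times, whereas the KDE produces new points around them, which may be one reason a classifier-based test treats the two differently.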
Hello, I have finally had time to work on this (sorry for the long gap since my last update). As suggested, I have done the following:
Just as a quick remark, I noticed that some trials raise the following warning when selecting the kernel bandwidth for KDE with cross-validation:
Please let me know if there is anything you'd like to be changed in some way.
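For readers unfamiliar with the bandwidth selection mentioned above, here is a minimal sketch of choosing a KDE bandwidth by cross-validation. It uses leave-one-out log-likelihood on a 1-D Gaussian KDE in plain Python; the function names, the candidate grid, and the stand-in data are assumptions for illustration and do not reproduce the implementation used in this PR.

```python
import math
import random


def loo_log_likelihood(samples, bandwidth):
    """Leave-one-out log-likelihood of a 1-D Gaussian KDE."""
    n = len(samples)
    norm = 1.0 / ((n - 1) * bandwidth * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, x in enumerate(samples):
        # Density at x from all other points (x itself is held out).
        density = sum(
            math.exp(-0.5 * ((x - y) / bandwidth) ** 2)
            for j, y in enumerate(samples)
            if j != i
        )
        # Guard against log(0) when no other point is within reach of x.
        total += math.log(max(norm * density, 1e-300))
    return total


def select_bandwidth(samples, candidates):
    """Return the candidate bandwidth with the highest held-out likelihood."""
    return max(candidates, key=lambda h: loo_log_likelihood(samples, h))


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100)]
# Log-spaced grid from 0.01 to 10.
candidates = [10 ** (k / 4) for k in range(-8, 5)]
best = select_bandwidth(samples, candidates)
print(best)
```

Very small bandwidths are penalised because held-out points fall where the density is nearly zero, and very large ones because the estimate becomes flat, so the selected value normally lies inside the grid.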
Hi, thanks a lot, great progress!
I have done a quick first pass over the PR and left a few comments here and there. Perhaps @janfb can have a look as well?
Regarding the KDE error, could you post a short code snippet to reproduce it (ideally with a fixed random seed)? We'll have a closer look.
Thanks for that, will work on incorporating the updates you suggested. Once this is OK for the simple Rejection ABC, I am thinking of wrapping some of the other ABC algorithms which we have in ABCpy, as well as the Synthetic Likelihood ones. The issue, however, is that for some ABC algorithms you cannot know a priori how many simulations they will require; I will therefore start by wrapping the ones which have a fixed simulation budget.
Remove random seed and small fix
Looks great! I added a comment and a question.
)
journal_standard_ABC = sampler.sample(
    [[np.array(observation)]],
    n_samples=num_simulations // num_simulations_per_param,
The n_samples is a fixed budget, right? There is in principle no way it can be exceeded, and if it was, then we would get a SimulationBudgetExceeded exception because of the max_calls passed above, no?
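The budget guard referred to in this question can be pictured with the following sketch. Only the names `max_calls` and `SimulationBudgetExceeded` come from the comment above; the `BudgetedSimulator` wrapper and its behaviour are hypothetical and are not taken from the benchmark's actual code.

```python
class SimulationBudgetExceeded(Exception):
    """Raised when the simulator is called more often than allowed."""


class BudgetedSimulator:
    """Hypothetical wrapper that counts calls and enforces max_calls."""

    def __init__(self, simulator, max_calls):
        self.simulator = simulator
        self.max_calls = max_calls
        self.num_calls = 0

    def __call__(self, theta):
        if self.num_calls >= self.max_calls:
            raise SimulationBudgetExceeded(
                f"simulation budget of {self.max_calls} calls exhausted"
            )
        self.num_calls += 1
        return self.simulator(theta)


# Toy simulator: the identity map stands in for a real model.
simulator = BudgetedSimulator(lambda theta: theta, max_calls=10)
outputs = [simulator(float(i)) for i in range(10)]  # uses the full budget

try:
    simulator(0.0)  # the 11th call goes over budget
    exceeded = False
except SimulationBudgetExceeded:
    exceeded = True
```

With a guard like this, a sampler that draws exactly `num_simulations // num_simulations_per_param` parameters and simulates each one `num_simulations_per_param` times never triggers the exception.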
Basically, Rejection ABC in ABCpy fixes the number of samples whose distance from the observation must fall below a given threshold, and simulates from the model until that many samples have been accepted. As a workaround to get a fixed simulation budget, I used an extremely large epsilon so that all simulations are accepted. The results are then post-processed, and the ones with the smallest distances are kept.
I realize it's not the cleanest implementation ever, but it should work here. As we rarely use RejectionABC for practical purposes in ABCpy, we never improved the implementation to allow a fixed simulation target.
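The workaround described above can be sketched as follows. This is a plain-Python toy rather than ABCpy code: the Gaussian simulator, the uniform prior, the absolute-difference distance, averaging distances over repeated simulations, and the function name `rejection_abc_fixed_budget` are all assumptions made for illustration.

```python
import random


def simulator(theta, rng):
    # Hypothetical toy model: one noisy observation centred on theta.
    return rng.gauss(theta, 1.0)


def rejection_abc_fixed_budget(
    observation, num_simulations, quantile, rng, num_simulations_per_param=1
):
    """Spend a fixed budget, accept everything, then keep the closest draws."""
    # Number of parameter draws the budget allows.
    num_params = num_simulations // num_simulations_per_param
    draws = []
    for _ in range(num_params):
        theta = rng.uniform(-5.0, 5.0)  # hypothetical uniform prior
        # Average the distance over the repeated simulations for this theta.
        distances = [
            abs(simulator(theta, rng) - observation)
            for _ in range(num_simulations_per_param)
        ]
        # Equivalent to epsilon = infinity: every draw is kept at this stage.
        draws.append((sum(distances) / len(distances), theta))

    # Post-processing: keep the fraction of draws with the smallest distance.
    draws.sort(key=lambda pair: pair[0])
    num_top_samples = max(1, round(quantile * num_params))
    return [theta for _, theta in draws[:num_top_samples]]


rng = random.Random(0)
posterior = rejection_abc_fixed_budget(
    observation=0.0, num_simulations=1000, quantile=0.1, rng=rng
)
print(len(posterior))
```

Because the acceptance step happens after all simulations have run, the number of simulator calls is known in advance, which is what the benchmark's fixed budget requires.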
Hello,
I thought I'd open this pull request to keep track of my progress and ask for feedback (even if I don't think this is ready to merge yet).
I've done my first attempt at wrapping ABCpy. For now I've wrapped the Rejection ABC algorithm (I know you have that already, but that was the easiest one to do).
It seems to work (I've added a try_ABCpy.py file if you want to experiment with it), but I have made quite a few simplifications as I did not know how to do some things. Precisely:
- num_simulations, which is the total number of simulations which you allow, and then num_samples, which is the returned number of posterior samples; if I am not wrong, you generate those with KDE from the ABC samples (at least this is what happens in the pyABC wrap). In my wrap of ABCpy, I have for now run Rej-ABC for num_simulations times, and then simply returned as posterior samples the ones obtained when considering the quantile of distances defined by quantile, or the num_top_samples ones. I am therefore not returning the specified num_samples. Do you suggest using your KDE code in this case as well? I believe it should be easily doable, I just did not know what the precise aim of it was.
- num_top_samples, quantile, eps. I have not yet used the eps one, but will add code for that soon.
Please let me know what you think of my attempt.