
New to Inform - Have a few questions #84

Open · aleksejs-fomins opened this issue Dec 3, 2019 · 9 comments

@aleksejs-fomins

Dear Developers,

I currently make active use of software from your colleagues called JIDT/IDTxl; I'm sure you are familiar with it. My code works well, but it has a prohibitively high computation time and memory footprint (even after I parallelized it across multiple cores). I am looking for potential alternatives, and your library appears to be implemented in C, which gives me hope that it might be a good fit for my tasks.

I am interested in computing Entropy, Mutual Information and Transfer Entropy for 3D matrices of [Channel x Time x Trial], where Trial stands for repetitions of the same experiment.

I have read through your documentation and some of the source code, and still have unanswered questions. Would you be so kind as to answer them, or direct me to the correct place to ask:

  • Is it currently possible to use real data? All examples seem to be using integer time-series
  • In the source code I have only found histogram-based estimators. Are there currently other estimators available (such as Kraskov)? Is the histogram estimator bias-corrected?
  • What exactly does block-entropy for k>1 do? Does it split time-steps into subsets of length k, or does it sweep the timesteps with a window of length k?
  • I am not able to figure out from the documentation what an initial condition is. Could you explain this concept or direct me to literature? Is this the same as what I call a Trial? In that case, is it possible to, for example, find mutual information between two variables for which only one time step, but many trials, are given?
  • Transfer Entropy operates with lags. Questions of interest are "what is TE for X->Y at lag=7" or "what is TE for X->Y given all lags={1,2,3,4,5}". Can a lag parameter be provided? What is the current convention?
  • JIDT provides multivariate TE estimators, which allow (to some extent) eliminating spurious connections such as those due to common ancestry or an intermediate link. Is such functionality present or foreseen in the near future?
  • For TE and MI, another super valuable feature is a test against zero. Currently, JIDT performs such tests and returns p-values along with the estimates, allowing the user to judge whether there is any relationship at all between the variables above chance. Is such functionality implemented or intended?

In principle, I would be interested in contributing if I can achieve my goals with your toolbox within a few weeks of coding.

Best,
Aleksejs

@mwibral commented Dec 3, 2019 via email

@aleksejs-fomins (Author)

Dear Michael,

Thank you very much for your answers. Would you be so kind as to elaborate on a few of them?

First, a bit of detail:
We are trying to compute dynamic changes in functional connectivity (FC) by sweeping the dataset with a rather small time window, relying on many repetitions of the same experiment to extract the FC. A single dataset is approximately (Nodes, Samples, Trials) = (12, 200, 400), which we sweep with a window of 4 time steps, resulting in 197 runs of network analysis on data of shape (12, 4, 400). Naturally, we also parallelize over targets. My main problem is not so much the speed as the memory use: a single call to analyse_single_target with the Kraskov estimator can exceed 4 GB of RAM for the data shape above, which prevents me from efficiently running the parallelized code on the cluster. I have already contacted Joseph and Patricia about it, and we will try to sort it out.

Questions then:

  • Is it possible to accelerate the testing procedure by extracting the surrogates only once from the entire dataset, and then testing each window against the same set of surrogates? Or, for example, if the same experiment has been performed on multiple days, by extracting the surrogates from all days simultaneously? I understand that, theoretically, the most rigorous approach is to assume that the probability distribution, and hence the testing threshold, is different for every time step and every day. But that way there are also the fewest points from which to estimate that probability distribution and that threshold. I am wondering whether there is a way to strike a compromise somewhere in between?
  • I do not doubt that the developers of JIDT have spent a lot of time optimizing it. But, if I recall correctly, the choice of Java over C was originally motivated by a compromise between speed and portability. Do you believe there is merit in accelerating entropy estimation by moving its implementation to a lower-level language such as C? It seems to me that portability is not much of an issue, since installing a Python or R wrapper for C code that does not require any exotic prerequisite libraries is typically a one-line command.

@aleksejs-fomins (Author)

Ah, yes, while I am here, another tiny question. I went through the IDTxl and JIDT documentation and mostly saw estimators for MI and TE. Can I use JIDT/IDTxl to just estimate the (conditional) entropy of my data? For example, one question I want to address is the information capacity of each channel excluding its immediate past. Given that we are using calcium imaging neuronal data at 20 Hz, there is a certain concern that the signals may not change fast enough to provide information above the noise. I would like to estimate the size of this effect.

@mwibral commented Dec 3, 2019 via email

@aleksejs-fomins (Author)

Dear Michael,

I don't know either whether 1600 data points is enough. On the one hand, more data is better. On the other, the macroscopic FC in the mouse brain seems to change quite rapidly, so for large windows I get quite dense matrices. There is probably an optimum somewhere, so I am playing around with different window lengths.

Thanks for recommending the paper, I will have a look.

Yes, I compute single targets and fuse the results. The shocking thing is that when I run analyse_single_target in a loop alongside a memory profiler, the memory consumption increases over time. I have done lots of benchmarking and can guarantee it is inside IDTxl: when I generate dummy results and comment out IDTxl, I only use 20 MB of memory for the same task. Even weirder, it does not behave like a proper memory leak: the per-run increase actually shrinks over time. The first run of analyse_single_target reserves most of the memory, the second a little less, the third even less. The total memory consumption approaches a certain maximum, after which it no longer grows. The memory does not get freed if I close the JVM, but it does get freed if I close Python. If I were to guess, it looks like some sort of dynamic memory allocator doing anticipatory allocation based on current use. I am working on a minimal example right now, and then I will send it around.

@dglmoore (Contributor) commented Dec 3, 2019

Hey now, that's enough talk about JIDT and IDTxl in Inform's issues section. 😜

@aleksejs-fomins, thanks for your interest in Inform. I'll try to answer your questions, but right out of the gate I need to say that JIDT and IDTxl are much more developed than Inform is. Inform has high hopes, but still has a long way to go. To my knowledge, Inform has no features in common with IDTxl; it's much closer to — though still a long way from — JIDT.

My code works well, but it has a prohibitively high computation time and memory footprint (even after I parallelized it across multiple cores). I am looking for potential alternatives, and your library appears to be implemented in C, which gives me hope that it might be a good fit for my tasks.

It will largely depend on your particular use case. You might get a bit of a performance bump, but it probably won't be earth-shattering — maybe a 1-3x speedup and slightly better memory overhead. We included a (very naive) benchmark comparison of Inform and JIDT in our conference paper. JIDT and Inform are pretty evenly matched until you get to longer history lengths (though you probably shouldn't be looking at history lengths that long anyway). As these things go, I find the only way to know for sure is to give it a try. My hunch is that Inform will be a bit speedier and lighter on memory, but whether it's worth the effort of converting your code from Java to C is for you to decide. Inform is fairly low-level: it doesn't offer anything like network inference, as IDTxl does, and you'd have to implement any parallel processing manually.

On to your questions and comments!

I am interested in computing Entropy, Mutual Information and Transfer Entropy for 3D matrices of [Channel x Time x Trial], where Trial stands for repetitions of the same experiment.

We have all three implemented, though you'll need to be careful with the array formatting. Inform expects the time series to be a contiguous array. I can elaborate on this if you'd like.
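
For instance (a minimal sketch with made-up data, following the trial-concatenated layout used in the examples below), the trials of each channel are stored back-to-back, with each trial's time steps contiguous:

int xs[8] = {0, 1, 1, 0,   // Trial 1 occupies indices 0-3
             1, 0, 0, 1};  // Trial 2 occupies indices 4-7
int ys[8] = {0, 0, 1, 1,
             1, 1, 0, 0};

double te = inform_transfer_entropy(xs, ys,
                                    NULL,  // no conditions
                                    0,
                                    2,     // 2 trials (initial conditions)
                                    4,     // 4 time steps per trial
                                    2,     // base-2 time series
                                    1,     // history length k = 1
                                    NULL); // ignore any errors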

I have read through your documentation and some of the source code, and still have unanswered questions.

Sorry the documentation is lacking. I'll update it to reflect these questions and answers.

  • Is it currently possible to use real data? All examples seem to be using integer time-series
  • In the source code I have only found histogram-based estimators. Are there currently other estimators available (such as Kraskov)? Is the histogram estimator bias-corrected?

Well, a lot of the "real" data we deal with in-house is discrete, so that's what Inform has focused on. We do not currently have continuously-valued estimators (though we plan to some day; see #21). We do have functions for binning data (a rough sketch of the idea follows below), though of course binning isn't always appropriate. The histogram estimators are not bias-corrected, though I'm sure it'd be easy enough to implement bias correction.
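
I won't reproduce the exact signatures of Inform's binning utilities here, but uniform-width binning is simple enough to sketch by hand. Everything in this snippet is hand-rolled for illustration, not Inform's API:

#include <stddef.h>

// Hand-rolled uniform-width binning (not Inform's API): map each real
// value into one of b equal-width bins spanning [min, max].
static void bin_uniform(double const *series, size_t n, int b, int *binned)
{
    double min = series[0], max = series[0];
    for (size_t i = 1; i < n; ++i)
    {
        if (series[i] < min) min = series[i];
        if (series[i] > max) max = series[i];
    }
    double width = (max - min) / b;
    for (size_t i = 0; i < n; ++i)
    {
        int bin = (width > 0.0) ? (int)((series[i] - min) / width) : 0;
        if (bin >= b) bin = b - 1;  // clamp the maximum into the top bin
        binned[i] = bin;
    }
}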

  • What exactly does block-entropy for k>1 do? Does it split time-steps into subsets of length k, or does it sweep the timesteps with a window of length k?

It sweeps. I don't know of many use cases wherein you'd want to split the time series, but I think you could use inform_black_box to accomplish that. That function seems to confuse people, but I think it's one of the nicest features of Inform. It turns out to be very useful for all kinds of things; e.g., you can use it together with any of the information measures to make the measure multivariate. Again, I'm happy to elaborate.
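
To make the sweeping concrete: a window of length k = 2 over {0,1,1,0,1} visits the m - k + 1 = 4 blocks (0,1), (1,1), (1,0), (0,1). A minimal sketch, assuming inform_block_entropy's argument order matches Inform's other time-series measures (series, number of initial conditions, time steps per initial condition, base, block size, error):

int series[5] = {0, 1, 1, 0, 1};
double h = inform_block_entropy(series,
                                1,      // 1 initial condition
                                5,      // 5 time steps
                                2,      // base-2 time series
                                2,      // block size k = 2
                                NULL);  // ignore any errors
// The block distribution is {01: 2/4, 11: 1/4, 10: 1/4},
// so h should come out to 1.5 bits.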

  • I am not able to figure out from the documentation what an initial condition is. Could you explain this concept or direct me to literature? Is this the same as what I call a Trial? In that case, is it possible to, for example, find mutual information between two variables for which only one time step, but many trials, are given?

Yeah, I'm pretty sure we're using the same word for the same thing. Since there is no history length involved in mutual information, and because we estimate the distributions based on all trials (as JIDT does, I believe — @mwibral might know better than I do), I think you'd get the same answer regardless of how many trials you have. A better example would be something like transfer entropy:

int xs[6] = {0, 1,   // Trial 1
             1, 1,   // Trial 2
             0, 1};  // Trial 3
int ys[6] = {0, 1,
             1, 0,
             1, 1};
double te = inform_transfer_entropy(xs,     // source data
                                    ys,     // target data
                                    NULL,   // the 3rd argument is series against which to condition
                                    0,      // no conditions
                                    3,      // 3 trials
                                    2,      // 2 time steps per trial
                                    2,      // base-2 time series
                                    1,      // k = 1
                                    NULL);  // Ignore any errors because this is just an example and I'm lazy
assert(te == 0.6666666666666666);

  • Transfer Entropy operates with lags. Questions of interest are "what is TE for X->Y at lag=7" or "what is TE for X->Y given all lags={1,2,3,4,5}". Can a lag parameter be provided? What is the current convention?

Unfortunately, we don't have this implemented in Inform. It would be very simple to do, but we just haven't had the time. That said, if your time series only have one trial in them, then you can simply offset the arrays, something like

int xs[8] = {0, 0, 1, 0, 0, 1, 0, 0};
int ys[9] = {0, 0, 1, 1, 0, 1, 1, 0, 0};

inform_transfer_entropy(xs, ys + 1, // this is the same as lagging xs by 1
                        NULL, 0, 1, 8, // your time series are now 1 element shorter
                        2, 1, NULL);

Of course, this is a bit unsatisfying. We've been toying with ideas for how to implement it. The issue we have is that C is not the nicest language for API design: the transfer entropy function takes 9 arguments! That's a lot of information to keep straight. The direction I'm leaning is to create a function, similar to inform_black_box, which would introduce lags. You could then pass the resulting time series as arguments to the various information functions. I really like this kind of composable design, and it helps us avoid monolithic and redundant implementations (though there's still plenty of that in Inform 😉 ).
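
Just to make the idea concrete, such a helper might look something like the following. This is purely hypothetical (nothing named lag_series exists in Inform); it simply packages the pointer-offset trick from above:

#include <stddef.h>

// Hypothetical helper (NOT part of Inform's API): drop the first
// `lag` observations of a single-trial series so the result can be
// passed straight to the information measures. Multi-trial series
// would need each trial trimmed and re-packed individually.
static int const *lag_series(int const *series, size_t m, size_t lag,
                             size_t *out_m)
{
    if (lag >= m) return NULL;  // nothing left after lagging
    *out_m = m - lag;           // the series is now `lag` elements shorter
    return series + lag;
}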

  • JIDT provides multivariate TE estimators, which allow (to some extent) eliminating spurious connections such as those due to common ancestry or an intermediate link. Is such functionality present or foreseen in the near future?

Well, you could mean one of two things here: multivariate in the sense of "what is the TE for (X × Y) → Z", or conditional in the sense of "what is the TE for X → Z conditioned on Y". Inform can do both, though the former requires inform_black_box. For the multivariate case, you can do something like...

int xs_ys[18] = {1, 0, 0, 1, 0, 1, 0, 0, 1,  // variable X
                 0, 1, 1, 0, 1, 1, 0, 0, 1}; // variable Y
int xs_times_ys[9];
inform_black_box(xs_ys,         // variables to black-box/coarse-grain/product
                 2,             // the number of variables
                 1,             // the number of trials per variable
                 9,             // the number of observations per trial
                 (int[2]){2,2}, // the base of each variable (both binary)
                 NULL,          // do not use a history length
                 NULL,          // do not use a future length
                 xs_times_ys,   // the array into which to place X × Y
                 NULL);         // Ignore any errors

// xs_times_ys == {2, 1, 1, 2, 1, 3, 0, 0, 3}

int zs[9] = {1, 0, 1, 0, 0, 1, 1, 0, 0};

// (X × Y) → Z
double te = inform_transfer_entropy(xs_times_ys, // source data
                                    zs,          // target data
                                    NULL,        // the 3rd argument is series against which to condition
                                    0,           // no conditions
                                    1,           // 1 trial per variable
                                    9,           // 9 time steps per trial
                                    4,           // base-4 timeseries (the largest base of the inputs)
                                    1,           // k = 1
                                    NULL);       // Ignore any errors because this is just an example and I'm lazy
printf("(X × Y) → Z: %.16lf\n", te);
assert(te == 0.9056390622295665);

// Z → (X × Y)
te = inform_transfer_entropy(zs, xs_times_ys, NULL, 0, 1, 9, 4, 1, NULL);
printf("Z → (X × Y): %.16lf\n", te);
assert(te == 0.5943609377704335);

Conditional transfer entropy (which we just fixed in #78) is a little simpler:

int xs[9] = {1, 0, 0, 1, 0, 1, 0, 0, 1};
int ys[9] = {0, 1, 1, 0, 1, 1, 0, 0, 1};
int zs[9] = {1, 0, 1, 0, 0, 1, 1, 0, 0};

// X → Y | Z
double cte = inform_transfer_entropy(xs,    // source data
                                     ys,    // target data
                                     zs,    // condition data
                                     1,     // 1 condition
                                     1,     // 1 trial per variable
                                     9,     // 9 time steps per trial
                                     2,     // base-2 timeseries
                                     1,     // k = 1
                                     NULL); // Ignore any errors because this is just an example and I'm lazy
printf("X → Y | Z: %.16lf\n", cte);
assert(cte == 0.25);

// X → Z | Y
cte = inform_transfer_entropy(xs, zs, ys, 1, 1, 9, 2, 1, NULL);
printf("X → Z | Y: %.16lf\n", cte);
assert(cte == 0.25);

// Z → X | Y
cte = inform_transfer_entropy(zs, xs, ys, 1, 1, 9, 2, 1, NULL);
printf("Z → X | Y: %.16lf\n", cte);
assert(cte == 0.34436093777043353);

  • For TE and MI, another super valuable feature is a test against zero. Currently, JIDT performs such tests and returns p-values along with the estimates, allowing the user to judge whether there is any relationship at all between the variables above chance. Is such functionality implemented or intended?

This isn't currently implemented in Inform, but we have already done it in-house for a few different measures. Permutation tests are actually pretty easy to do, and they will probably be one of the next features added.
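
A rough sketch of the idea, for a single-trial series. The shuffling and p-value bookkeeping are entirely hand-rolled on the caller's side (they are not Inform APIs); only the inform_transfer_entropy call is real:

#include <stdlib.h>
#include <inform/transfer_entropy.h>

// Fisher-Yates shuffle: destroys the temporal structure of the source
// while preserving its marginal distribution.
static void shuffle(int *xs, size_t n)
{
    for (size_t i = n - 1; i > 0; --i)
    {
        size_t j = (size_t)rand() % (i + 1);
        int tmp = xs[i]; xs[i] = xs[j]; xs[j] = tmp;
    }
}

// One-sided permutation p-value for TE(X -> Y) against zero; xs is
// shuffled in place, so the observed value is computed first.
static double permutation_pvalue(int *xs, int const *ys, size_t m,
                                 int b, size_t k, size_t nperm)
{
    double observed = inform_transfer_entropy(xs, ys, NULL, 0, 1, m, b, k, NULL);
    size_t count = 0;
    for (size_t p = 0; p < nperm; ++p)
    {
        shuffle(xs, m);  // surrogate source
        double surrogate =
            inform_transfer_entropy(xs, ys, NULL, 0, 1, m, b, k, NULL);
        if (surrogate >= observed) ++count;
    }
    return (double)(count + 1) / (double)(nperm + 1);
}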

Anyway, I hope this helps. Given that your data is real-valued, I suspect JIDT/IDTxl is the best fit for you — at least for now. That said, if you are still interested in Inform, we'd welcome any contributions you might have: code, comments, and suggestions alike.

I personally like Inform, but I'm biased 😀.

@mwibral commented Dec 3, 2019 via email

@dglmoore (Contributor) commented Dec 3, 2019

@mwibral Ha. No worries. I was just giving you a hard time. The details might be different, but I think your comments are useful for Inform users too.

@aleksejs-fomins (Author)

Dear Douglas,

Thanks a lot for your broad reply. I apologise for using the phrase "real data": I meant "real-valued", not "real-world" :D. I would in principle consider contributing some continuous-variable estimators, not with the purpose of competing with JIDT/IDTxl, but more as an exercise for myself to understand how they really work.

Cheers,
Aleksejs
