Using C++11s Distributions in Shogun #1998

cameo54321 · 2014-03-14T19:14:59Z

The recent addition in shogun for a probability distribution was:
http://www.shogun-toolbox.org/doc/en/3.0.0/classshogun_1_1CProbabilityDistribution.html
with implementation
http://www.shogun-toolbox.org/doc/en/3.0.0/classshogun_1_1CGaussianDistribution.html

The above implementation uses SIMD-oriented Fast Mersenne Twister (SFMT) pseudorandom number generator for random number generation and Eigen3 for generating the Gaussian Distribution from these samples.

The C++11 pseudo-random number library (as pointed out by @vigsterkr) has many built-in distributions and random number generators which can be used for generating distributions in Shogun. So instead of implementing each distribution ourselves, can we use C++11's distributions?

What if the classes for issue #1929 were written with C++11's distributions in mind instead of (or along with) SFMT? What would be a good direction to achieve that?
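To make the question concrete, here is a minimal sketch of what sampling with the C++11 `<random>` header looks like. The helper names `sample_normal` and `sample_gamma` are made up for illustration and are not part of Shogun; only `std::mt19937`, `std::normal_distribution` and `std::gamma_distribution` come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not Shogun API): draw n samples from N(mean, stddev^2)
// using only the C++11 standard library.
std::vector<double> sample_normal(std::size_t n, double mean, double stddev,
                                  unsigned seed)
{
    assert(stddev > 0.0);
    std::mt19937 engine(seed);  // 32-bit Mersenne Twister engine
    std::normal_distribution<double> dist(mean, stddev);
    std::vector<double> out(n);
    for (auto& x : out)
        x = dist(engine);
    return out;
}

// Same pattern with another built-in distribution: Gamma(shape, scale).
std::vector<double> sample_gamma(std::size_t n, double shape, double scale,
                                 unsigned seed)
{
    assert(shape > 0.0 && scale > 0.0);
    std::mt19937 engine(seed);
    std::gamma_distribution<double> dist(shape, scale);
    std::vector<double> out(n);
    for (auto& x : out)
        x = dist(engine);
    return out;
}
```

Note that the standard fixes the output sequence of the engines but not the algorithms used by the distributions, so the same seed may give different samples on different standard library implementations.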

@karlnapf @vigsterkr Comments please?

karlnapf · 2014-03-16T13:27:23Z

I agree, we should try to use as much C++11 as possible. This is well-tested and reliable code and we don't even add a new dependency.

So we can do most of the sampling using c++11 methods. For some multivariate distributions, we will need some amount of linear algebra operations. Those should be done against our Shogun internal (soon to exist) framework, see #1930 and #1973. Or, since the latter interface is not yet complete, use eigen3 (as the multivariate Gaussian does).

In addition, and maybe more important than sampling, we need to be able to evaluate the pdf/cdf of those densities. AFAIK this is not supported by c++11. Also, one can do lots of things inefficiently or even wrong. An example is again the Gaussian: the Cholesky decomposition of the covariance should only be computed once in the beginning (if not already specified) so that evaluating the pdf of samples does not have to do that again. For many distributions, evaluating the pdf of many points at once costs little more than evaluating a single point, since the expensive setup is shared (see again Gaussian). Another interesting feature would be to compute quantiles of a given number of points (see https://github.com/karlnapf/kameleon-mcmc).
For univariate distributions, pdf/cdf functions usually depend on table lookups or numerical integration (see for example CStatistics::gamma_cdf). Currently, we borrowed a couple of those implementations from ALGLIB. Apart from the problem that this is GPL, integrating such code is not the best idea due to the overhead it creates and the impossibility of maintaining it (bugs/changes), since nobody understands the code. I would much rather depend on http://www.boost.org/doc/libs/1_55_0/libs/math/doc/html/dist.html which is mature, tested, etc. We can make the dependency optional since most Shogun packages don't need complicated distributions. Using external libs should always be done in a way that we have minimal dependency.
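The Cholesky point can be illustrated with a small sketch. This is not Shogun's CGaussianDistribution and it deliberately avoids Eigen3 to stay self-contained; the class name `GaussianSketch` and its methods are assumptions made up for this example. The factorisation and log-determinant are computed once in the constructor, and every later log-pdf evaluation only needs a forward substitution.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: multivariate Gaussian with a cached Cholesky factor.
class GaussianSketch
{
public:
    using Vec = std::vector<double>;
    using Mat = std::vector<Vec>;  // dense, row-major

    GaussianSketch(const Vec& mean, const Mat& cov)
        : m_mean(mean), m_L(cov.size(), Vec(cov.size(), 0.0)), m_log_det(0.0)
    {
        const std::size_t d = mean.size();
        assert(cov.size() == d);
        // Cholesky factorisation cov = L * L^T, done exactly once.
        for (std::size_t i = 0; i < d; ++i)
        {
            assert(cov[i].size() == d);
            for (std::size_t j = 0; j <= i; ++j)
            {
                double s = cov[i][j];
                for (std::size_t k = 0; k < j; ++k)
                    s -= m_L[i][k] * m_L[j][k];
                if (i == j)
                {
                    assert(s > 0.0);  // cov must be positive definite
                    m_L[i][i] = std::sqrt(s);
                }
                else
                    m_L[i][j] = s / m_L[j][j];
            }
            m_log_det += 2.0 * std::log(m_L[i][i]);  // log det(cov)
        }
    }

    // Log density at a single point; reuses the cached factor.
    double log_pdf(const Vec& x) const
    {
        const std::size_t d = m_mean.size();
        assert(x.size() == d);
        Vec z(d);
        double quad = 0.0;  // (x-mean)^T cov^{-1} (x-mean) = |z|^2
        for (std::size_t i = 0; i < d; ++i)
        {
            double s = x[i] - m_mean[i];
            for (std::size_t k = 0; k < i; ++k)
                s -= m_L[i][k] * z[k];
            z[i] = s / m_L[i][i];  // forward substitution: L z = x - mean
            quad += z[i] * z[i];
        }
        const double log_two_pi = std::log(2.0 * std::acos(-1.0));
        return -0.5 * (d * log_two_pi + m_log_det + quad);
    }

    // Log density at many points; no refactorisation per point.
    Vec log_pdf_multiple(const std::vector<Vec>& xs) const
    {
        Vec out;
        out.reserve(xs.size());
        for (const auto& x : xs)
            out.push_back(log_pdf(x));
        return out;
    }

private:
    Vec m_mean;
    Mat m_L;           // lower-triangular Cholesky factor
    double m_log_det;  // cached log-determinant of the covariance
};
```

A real implementation would presumably hand the factorisation and triangular solves to the linear algebra backend rather than hand-rolled loops; the loops here only show where the one-off cost sits.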

Stan can serve as

- inspiration for how to represent distributions
- a source of code we could borrow for complicated PDFs
- inspiration for auto-diff (which is out of scope for the mcmc GSoC project though)

karlnapf · 2014-03-16T13:28:18Z

Final point: We want a unified interface in Shogun, so merge all the existing probability classes into one:

- maximum likelihood learning
- some other methods, see CDistribution

vigsterkr · 2014-03-16T17:24:05Z

regarding SFMT: no... we've just added SFMT. before even suggesting something like that, it'd be better to investigate the performance between SFMT/dSFMT and c++11
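One way such a comparison could be set up is sketched below. This is only a rough harness, not a proper benchmark: `BenchResult` and `time_engine` are hypothetical names, and SFMT itself is not included because it is not part of the standard library. The assumption is that an SFMT/dSFMT wrapper exposing `operator()` could be passed in the same way as the standard engines.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>

struct BenchResult
{
    double seconds;          // wall-clock time for all draws
    std::uint64_t checksum;  // XOR of all outputs, so the loop is not optimised away
};

// Time n_draws raw draws from any engine callable as engine().
template <class Engine>
BenchResult time_engine(Engine engine, std::size_t n_draws)
{
    assert(n_draws > 0);
    std::uint64_t checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n_draws; ++i)
        checksum ^= static_cast<std::uint64_t>(engine());
    const auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(stop - start).count(), checksum};
}
```

For example, `time_engine(std::mt19937(123u), 10000000)` and `time_engine(std::mt19937_64(123u), 10000000)` give a first impression of the standard engines; meaningful numbers would need repeated runs, optimised builds and the actual SFMT code on the same machine.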

karlnapf · 2014-03-16T22:42:57Z

yeah very good point! In fact, let's just stay with the current one and only add things from c++11 if there is no existing implementation. This is not really about re-doing normal numbers but about interfaces of probability distributions

vigsterkr · 2014-03-16T22:49:39Z

yeah we had a discussion about this with @cameo54321 on IRC. i've tried to suggest that it would be good to check out other libraries (e.g. stan, boost) to get some good ideas how our probability distrib API should look...

karlnapf · 2014-03-16T23:37:18Z

Yeah, so just to summarise a few things

- This is first of all about API aka how to represent distributions in Shogun. Unified interface!
- This might involve interfaces for creating random numbers. This task's priority is not on this. We just should keep things open here. It should be noted that c++11 could possibly help us there, but we don't know. Backends have to be investigated. We currently have a working one, so no need to touch this as long as we don't need to.
- Way more important are the (log) pdf/cdf functions. We need a way of getting rid of this ugly ALGLIB GPL code in CStatistics. Those things are messy (lookup tables, integration, etc) and important (small mistakes fuck up things) and therefore have to be tested properly. Stealing code and integrating it into Shogun with copy/paste (the current state) is not what we want there. This is where I suggest a possible boost backend. This is independent of the interface, there could be multiple implementations. Boost is the only serious one that I know, I am open to any suggestions here. But I really don't want any re-implementations of such things, it's way too much work, much harder than generating random numbers. But again, this is first about interfaces.
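As a rough illustration of "interface first, backends behind it", the following sketch shows one possible shape. All names (`DistributionSketch`, `NormalSketch`) are hypothetical and not Shogun's actual API. The normal distribution is used only because its cdf can be written with `std::erfc` from C++11; for harder cases (gamma, t, etc.) a Boost-backed class could presumably implement the same interface, which is not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical unified interface for univariate distributions.
class DistributionSketch
{
public:
    virtual ~DistributionSketch() = default;
    virtual double log_pdf(double x) const = 0;
    virtual double cdf(double x) const = 0;
    virtual double sample(std::mt19937& engine) const = 0;
};

// One possible implementation using only the C++11 standard library.
class NormalSketch : public DistributionSketch
{
public:
    NormalSketch(double mean, double stddev) : m_mean(mean), m_stddev(stddev)
    {
        assert(stddev > 0.0);
    }

    double log_pdf(double x) const override
    {
        const double z = (x - m_mean) / m_stddev;
        const double log_two_pi = std::log(2.0 * std::acos(-1.0));
        return -0.5 * z * z - std::log(m_stddev) - 0.5 * log_two_pi;
    }

    double cdf(double x) const override
    {
        // Phi(z) = 0.5 * erfc(-z / sqrt(2))
        const double z = (x - m_mean) / m_stddev;
        return 0.5 * std::erfc(-z / std::sqrt(2.0));
    }

    double sample(std::mt19937& engine) const override
    {
        // Sampling backend is swappable; here it is C++11 <random>.
        std::normal_distribution<double> dist(m_mean, m_stddev);
        return dist(engine);
    }

private:
    double m_mean;
    double m_stddev;
};
```

Code written against `DistributionSketch` does not need to know whether the pdf/cdf comes from the standard library, Boost or something else, which is the separation the summary asks for. Taking `std::mt19937&` in `sample` is a simplification; the random number source would itself sit behind an interface if the SFMT backend is kept.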

Please see the kameleon-mcmc distribution framework on my github. This is in the direction of what we want.

vigsterkr · 2017-07-09T01:56:55Z

feature/refactor-random is on this path :)

karlnapf · 2017-07-09T15:23:14Z

FYI: ALGLIB already got pushed out a while ago (it was GPL code, replaced with cdflib)

cameo54321 mentioned this issue Mar 18, 2014

Initial Commit for Unifying Distributions #2026

Closed

karlnapf mentioned this issue Sep 20, 2016

add t distribution and robust EP #3450

Open

vigsterkr closed this as completed Jul 9, 2017
