
Strided containers #166

Open
superbobry opened this issue Aug 10, 2016 · 6 comments

@superbobry
Contributor

superbobry commented Aug 10, 2016

Have you considered making Vector.t strided (cf. NumPy)?

The major benefits of this approach are:

  • Memory efficiency. No pointer chasing for whole-matrix operations.

  • Code reuse between Vectors and Matrices, e.g.

    (** Converts a matrix to a row-major vector in O(1) time. *)
    let flatten : t -> Vector.t = ...
    
    let add m1 m2 = Vector.add (flatten m1) (flatten m2)

Update: I probably should've added this as a comment to #27.
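
For concreteness, here is one way a strided view could look. This is a hypothetical sketch in the spirit of NumPy's ndarray, not Oml's actual API; the type and function names are illustrative, and `flatten` is only valid for a contiguous row-major view:

```ocaml
(* A strided matrix view over a flat float array. Several views can
   share the same buffer; indexing is computed from offset + strides. *)
type strided = {
  data       : float array;  (* underlying flat buffer, shared by views *)
  offset     : int;          (* index of the first element *)
  rows       : int;
  cols       : int;
  row_stride : int;          (* elements to skip to reach the next row *)
  col_stride : int;          (* elements to skip to reach the next column *)
}

let get m i j = m.data.(m.offset + i * m.row_stride + j * m.col_stride)

(* Transposing is O(1): swap the dimensions and the strides, no copy. *)
let transpose m =
  { m with rows = m.cols; cols = m.rows;
           row_stride = m.col_stride; col_stride = m.row_stride }

(* A contiguous row-major matrix flattens to a 1 x (rows * cols) vector
   in O(1): same buffer, no copy. *)
let flatten m =
  { m with rows = 1; cols = m.rows * m.cols;
           row_stride = m.rows * m.cols; col_stride = 1 }
```

With such a representation, `add m1 m2 = Vector.add (flatten m1) (flatten m2)` becomes a zero-copy operation, which is the code-reuse benefit described above.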

@rleonid
Owner

rleonid commented Aug 10, 2016

The short answer is: not Vector.t specifically. Vector.t (and Matrices) is a stopgap that is woefully insufficient for what I actually want from a set of data structures supporting linear algebra and statistical analysis.

The long answer is yes, and I was working on implementing this in Bau. It wasn't a showstopper, but I quickly arrived at the conclusion that one needed ppx rewriters to achieve fast performance on Bigarray iteration. My next step was going to be creating a variant layer that would manage data representation with minimal copying. But that is a very big task that I didn't (and unfortunately still don't) have the bandwidth to tackle.

Perhaps this is as good a spot as any to write down my thoughts and experiences on the subject. I think there are several issues in this space, some of which interact in various ways:

  1. Data representation layer. I chose float arrays out of convenience, but depending upon the situation (size, convenience) one can make arguments for Bigarrays or float lists. I'm still not certain whether the library should remain flexible (operate over different inputs), strict (one type, with everything converted into that type), or intelligent (take lots of different types as input, and copy when/if needed). My inclination is toward the "intelligent" approach (my naming of it exposes my bias), but how to balance the features (and speed) found in a vector-based language (ex. J) against those of one expressive of statistical problems (ex. R) is still unknown to me. The other complications in this area are:
    • Modular implicits - would obviate functorizing operations, but the time frame is unknown.
    • Flambda - seems like the way of the future; how should it affect the design?
    • Oml's target audience - there is some interest in separating the library's capabilities from the low-level Bigarray/Lacaml machinery. To me this seems pressing, and perhaps a good argument for sticking with float arrays at the moment.
  2. Data operations. Linear algebra and statistical analysis (which is usually just folds over a set of floats). I want these to have more of an OCaml/EDSL style (transposing a matrix twice should be a no-op), and they should be agnostic to the actual representation (computing the mean should be an operation that can take a float array, a float list, etc.).
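
One minimal way to make such operations representation-agnostic is to abstract over the container's fold. This is an illustrative sketch, not Oml's API; the names `mean_of_fold`, `mean_array`, and `mean_list` are hypothetical:

```ocaml
(* Compute the mean through any container's fold: the caller supplies
   the fold, so the same logic works for arrays, lists, Bigarrays, etc. *)
let mean_of_fold fold c =
  let n, sum = fold (fun (n, s) x -> (n + 1, s +. x)) (0, 0.) c in
  sum /. float_of_int n

(* Specializations for the two concrete representations mentioned above. *)
let mean_array a = mean_of_fold (fun f init a -> Array.fold_left f init a) a
let mean_list  l = mean_of_fold (fun f init l -> List.fold_left f init l) l
```

Modular implicits (or, today, functors or first-class modules) would let the right fold be selected automatically instead of being passed explicitly.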

All of this shouldn't discourage you from submitting a PR; working code is infinitely more valuable than no code. I just want to share my thinking on the subject.

@dbuenzli
Contributor

A bit OT, but note that things like flambda are not available in bytecode, which is what js_of_ocaml starts from.

Which leads me to something I've wanted to ask for a long time: how much of oml can be compiled to the browser with js_of_ocaml? E.g. bigarrays are available there, but I doubt the lacaml stuff can be brought over.

@rleonid
Owner

rleonid commented Aug 10, 2016

I did not realize that bigarrays are available on js_of_ocaml! But you're also pointing at another subtle issue: bytecode vs. native.

I've reserved a bit of time this week and next to see how much of oml I can build without lacaml and lbfgs; an OCaml-only oml. This will precipitate even more build-system hacking, but hopefully an elegant solution will materialize.

@rleonid
Owner

rleonid commented Aug 19, 2016

@dbuenzli If you have a chance, take a look at oml-lite. Some of the big stuff (ex. multivariate and logistic regression) is missing, but it is viable, and hopefully the framework for iteration has been laid.

@dbuenzli
Contributor

@rleonid Caveat: I have little knowledge of how oml is structured (I've never used it, yet). However, I do have the impression that extracting oml-lite the way you did, through cppo, means 1) you are going to live a miserable build life in the long term, and 2) other third-party libraries wanting to build on top of oml will have to choose between oml-lite and oml, and the final user of the third-party library may not agree with the choice the latter made.

My approach would rather be to reorganize the API so that oml-lite shows up naturally as a single library, with other libraries gradually mixing in the C and Fortran dependencies. The whole could then be distributed as a single package, the sub-libraries letting end users control the amount of C or Fortran they want to bring into their code base.

@rleonid
Owner

rleonid commented Aug 22, 2016

@dbuenzli Thank you for your feedback. I've replied here #173, so maybe we can keep these discussions organized.
