
Oml vs Oml-lite split #173

Closed
rleonid opened this issue Aug 22, 2016 · 13 comments


@rleonid
Owner

rleonid commented Aug 22, 2016

@dbuenzli wrote:

> @rleonid Caveat: I have little knowledge of how oml is structured (I haven't used it yet). However, I do have the impression that extracting oml-lite the way you did, through cppo, means 1) you are going to live a miserable build life in the long term, and 2) other third-party libraries wanting to build on top of oml will have to choose between oml-lite and oml, and the final user of the third-party library may not agree with the choice the latter made.

I agree about 1), though I am already living a miserable build life dealing with testing. Currently, the choice between oml-lite and oml is about binding to C code or not (or, as others have described it, "waiting half an hour for fortran to compile"). I think it would be very difficult to get around this issue without addressing the build.

I also think that your second point is valid. But it is true of all software. I face similar issues with regard to core and lots of other downstream packages. It is ultimately up to the second-party developer to make smart choices.

> My approach would rather be to try to reorganize the API so that oml-lite shows up naturally as a single library, with other libraries gradually mixing in the C and Fortran dependencies. The whole could then be distributed through a single package, the sub-libraries letting end users control the amount of C or Fortran they want to bring into their code base.

Do you mean something along the lines of distributing each of the sub-packs separately (i.e. Statistics, Classification, etc.)? I generally agree (and think this is the long-term goal), but at the moment I am hesitant to start separating them because there are many inter-dependencies, both in how one thinks about the algorithms (e.g. LDA can be used for Classification or Unsupervised learning) and in how the code is written. Furthermore, one of the problems oml is trying to address directly is grouping lots of separate functionality that might otherwise not be thoughtfully integrated. For example, the result of a regression analysis should have hypothesis tests easily associated with it.

This issue is certainly not closed. I hope that you can give oml (or oml-lite) a try and we can talk specifics.

@dbuenzli
Contributor

dbuenzli commented Aug 22, 2016

> I also think that your second point is valid. But it is true of all software. I face similar issues with regard to core and lots of other downstream packages. It is ultimately up to the second-party developer to make smart choices.

Not really... she can't do anything about it. Suppose she has a lib that depends only on oml-lite; the right move is to depend on that one.

Now suppose a user of lib wants to use the full oml: that user is stuck. It forces the second-party developer to replicate the lib-lite/lib structure, effectively imposing the build mess on everyone.

This wouldn't happen if you had a properly structured library where oml depends on oml-lite (note: whether you distribute them separately in OPAM or not is unrelated; the problem is that, with what you have now added to OPAM, oml does not reuse oml-lite).
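To make the reuse point concrete, here is a minimal sketch under a hypothetical assumption: the lite library exposes an abstract type. The names `Sample`, `Oml_copy` and `Third_party` are illustrative only, not oml's actual API. When the full library reuses the lite module, its type is the same type to the checker; when it ships its own copy (as a cppo-generated duplicate would), the abstract types are unrelated, so values cannot flow into a library written against the lite flavour.

```ocaml
(* Hypothetical "lite" library exposing an abstract type. *)
module Oml_lite : sig
  module Sample : sig
    type t
    val of_array : float array -> t
    val mean : t -> float
  end
end = struct
  module Sample = struct
    type t = float array
    let of_array a = Array.copy a
    let mean a = Array.fold_left ( +. ) 0.0 a /. float_of_int (Array.length a)
  end
end

(* Full library that REUSES the lite module: Oml.Sample.t = Oml_lite.Sample.t. *)
module Oml = struct
  module Sample = Oml_lite.Sample
end

(* Full library that ships its OWN copy: a distinct abstract type. *)
module Oml_copy : sig
  module Sample : sig
    type t
    val of_array : float array -> t
  end
end = struct
  module Sample = struct
    type t = float array
    let of_array a = Array.copy a
  end
end

(* A hypothetical third-party library written against the lite API only. *)
module Third_party = struct
  let describe (s : Oml_lite.Sample.t) = Oml_lite.Sample.mean s
end

(* Compiles: values built with the full library are accepted. *)
let from_full = Third_party.describe (Oml.Sample.of_array [| 1.0; 2.0; 3.0 |])

(* Would NOT compile: Oml_copy.Sample.t is unrelated to Oml_lite.Sample.t.
   let broken = Third_party.describe (Oml_copy.Sample.of_array [| 1.0 |]) *)
```

Uncommenting the last binding produces a type error, which is exactly the "user is stuck" situation described above.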

> Do you mean something along the lines of distributing each of the sub-packs separately (ie. Statistics, Classification ... etc)?

Not necessarily. It is unclear to me what exactly is provided by the third-party libraries, so it's difficult for me to answer the question.

@hammer
Contributor

hammer commented Aug 23, 2016

> oml does not reuse oml-lite

@rleonid how hard would it be for oml to reuse oml-lite?

@rleonid
Owner Author

rleonid commented Aug 23, 2016

@hammer The way things are currently structured it would be difficult.

I am hesitant to say impossible, but I actually don't know how to do it without restructuring the project. The main obstacle is the packing logic that gives oml its light namespace hierarchy (Classification, Statistics, etc.). I value this hierarchy because it allows us to semantically catalogue operations: Classification.Descriminant vs Unsupervised.Descriminant (to be implemented), or Statistics.Hypothesis_test vs Regression.Test (also to be implemented). Furthermore, if we were to actually separate oml into separate packages, I think these namespaces would make more sense to depend upon than a C vs non-C split.

While I think that @dbuenzli is making a good point about wanting oml to depend on oml-lite, it is an idealized point. At the moment, the set of people who could need a library that uses oml-lite and then also need oml is much smaller than the set of people who might be attracted to using oml-lite for calculations, or who might want to contribute code.

@dbuenzli
Contributor

> the main obstacle being the packing logic that gives oml the light namespace hierarchy (Classification, Statistics ... etc). I value this hierarchy because it allows us to semantically catalogue operations;

I don't see what prevents you from having Oml_lite and Oml namespacing modules, designed to be opened, which define the same toplevel modules, with Oml simply including those of Oml_lite and adding more.

It seems you are underusing the naming and structuring capabilities of the module system. You would not even need to publish more than one package: you can simply use opam's depopts to build Oml's library conditionally.
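A minimal sketch of "namespacing modules designed to be opened", with hypothetical contents: `Statistics.mean` stands for pure-OCaml code and `Statistics.variance` stands in for functionality that would really need C or Fortran. Client code is written once against the shared toplevel names and picks a flavour with a single `open`; because `Oml` includes everything from `Oml_lite`, switching the open does not break the client.

```ocaml
(* Hypothetical lite namespace: only pure-OCaml functionality. *)
module Oml_lite = struct
  module Statistics = struct
    let mean a = Array.fold_left ( +. ) 0.0 a /. float_of_int (Array.length a)
  end
end

(* Hypothetical full namespace: same toplevel names, Oml_lite's contents
   included, plus extra functions. *)
module Oml = struct
  module Statistics = struct
    include Oml_lite.Statistics

    (* Stand-in for a function that would need the C dependency. *)
    let variance a =
      let m = mean a in
      Array.fold_left (fun acc x -> acc +. ((x -. m) *. (x -. m))) 0.0 a
      /. float_of_int (Array.length a - 1)
  end
end

(* Client code choosing the lite flavour. *)
let lite_mean =
  let open Oml_lite in
  Statistics.mean [| 1.0; 2.0; 3.0; 4.0 |]

(* The same expression compiles unchanged after switching the open,
   and the extra functions become available. *)
let full_mean, full_var =
  let open Oml in
  (Statistics.mean [| 1.0; 2.0; 3.0; 4.0 |],
   Statistics.variance [| 1.0; 2.0; 3.0; 4.0 |])
```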

> While I think that @dbuenzli is making a good point about wanting oml to depend on oml-lite, it is an idealized point.

I wouldn't call that an idealized point. You asked for feedback about the approach, so here it is: what you are doing is simply anti-modular.

In a library ecosystem like opam, where dependency cones grow quickly, the problem could show up sooner than you'd think. There are ways to provide oml-lite without introducing these problems into the ecosystem, and I don't think they would need much restructuring at the API level beyond introducing the aforementioned names and a few opens in client code.

@rleonid
Copy link
Owner Author

rleonid commented Aug 23, 2016

@dbuenzli How would you address the Functions module?

Writing special mathematical functions is tricky since it requires balancing the trade-offs between convergence, performance and accuracy. Rather than rewriting all of them in pure OCaml, I wrapped Cephes, a suitable C library (ocephes). There will also be functions that are easily implemented in OCaml (e.g. softmax). Now, AFAIK, since I made the compilation of Functions conditional on the presence of ocephes, the module will have different signatures in oml and oml-lite (i.e. in a library with or without this C dependency), which prevents loading one given the other.

Would you recommend splitting up Functions entirely?
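As an example of a function that is easy to write in pure OCaml, here is a minimal softmax sketch. The optional `temperature` argument mirrors the `softmax` signature used later in this thread; subtracting the maximum before exponentiating is a standard numerical-stability choice. This is an illustration, not oml's implementation.

```ocaml
(* Numerically stable softmax with an optional temperature:
   ?temperature:float -> float array -> float array *)
let softmax ?(temperature = 1.0) (weights : float array) : float array =
  if Array.length weights = 0 then invalid_arg "softmax: empty array";
  if temperature <= 0.0 then invalid_arg "softmax: temperature must be positive";
  (* Shift by the maximum so that exp never overflows. *)
  let m = Array.fold_left max neg_infinity weights in
  let exps = Array.map (fun w -> exp ((w -. m) /. temperature)) weights in
  let total = Array.fold_left ( +. ) 0.0 exps in
  Array.map (fun e -> e /. total) exps

(* A lower temperature sharpens the distribution, a higher one flattens it. *)
let example = softmax ~temperature:0.5 [| 1.0; 2.0; 3.0 |]
```

Because nothing here touches C, a function like this could live in the lite implementation and be shared unchanged by the full library.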

@dbuenzli
Contributor

dbuenzli commented Aug 23, 2016

> @dbuenzli How would you address the Functions module?

```ocaml
module Oml_lite : sig
  module Functions : sig
    val softmax : ?temperature:float -> float array -> float array
  end
end = struct
  module Functions = Oml_lite_functions
end
module Oml : sig
  module Functions : sig
    include module type of Oml_lite.Functions
    val gamma : float -> float
    ...
  end
end = struct
  module Functions = struct
    include Oml_lite_functions
    include Oml_functions
  end
end
```
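The sketch above leaves the implementation modules abstract. Below is a self-contained version that compiles on its own, assuming hypothetical stand-ins: `Oml_lite_functions` holds pure-OCaml code, and `Oml_functions` uses a pure-OCaml Lanczos approximation of `gamma` in place of the ocephes binding that would really back it. The elided `...` from the sketch is simply omitted.

```ocaml
(* Stand-in for the pure-OCaml implementation unit. *)
module Oml_lite_functions = struct
  let softmax ?(temperature = 1.0) weights =
    let m = Array.fold_left max neg_infinity weights in
    let exps = Array.map (fun w -> exp ((w -. m) /. temperature)) weights in
    let total = Array.fold_left ( +. ) 0.0 exps in
    Array.map (fun e -> e /. total) exps
end

(* Stand-in for the C-backed unit: Lanczos approximation (g = 7, 9 terms)
   instead of a real ocephes binding. *)
module Oml_functions = struct
  let lanczos =
    [| 0.99999999999980993; 676.5203681218851; -1259.1392167224028;
       771.32342877765313; -176.61502916214059; 12.507343278686905;
       -0.13857109526572012; 9.9843695780195716e-6; 1.5056327351493116e-7 |]

  let rec gamma x =
    if x < 0.5 then
      (* Reflection formula for the left half of the real line. *)
      Float.pi /. (sin (Float.pi *. x) *. gamma (1.0 -. x))
    else begin
      let x = x -. 1.0 in
      let a = ref lanczos.(0) in
      for i = 1 to Array.length lanczos - 1 do
        a := !a +. lanczos.(i) /. (x +. float_of_int i)
      done;
      let t = x +. 7.5 in
      sqrt (2.0 *. Float.pi) *. (t ** (x +. 0.5)) *. exp (-. t) *. !a
    end
end

module Oml_lite : sig
  module Functions : sig
    val softmax : ?temperature:float -> float array -> float array
  end
end = struct
  module Functions = Oml_lite_functions
end

module Oml : sig
  module Functions : sig
    include module type of Oml_lite.Functions
    val gamma : float -> float
  end
end = struct
  module Functions = struct
    include Oml_lite_functions
    include Oml_functions
  end
end
```

The signature ascription on `Oml` hides the helper `lanczos` and forces `Oml.Functions` to keep providing everything `Oml_lite.Functions` declares.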

@rleonid
Owner Author

rleonid commented Aug 23, 2016

Right, but Functions is packed into Statistics; an unfortunate place for it at the moment, but some sub-pack namespace is unavoidable, or should I get rid of those?
Otherwise I would need two levels of the previous split: Oml_statistics and Oml_lite_statistics, which contain Stats_functions and Stats_lite_functions.

@dbuenzli
Copy link
Contributor

dbuenzli commented Aug 23, 2016

> Right, but Functions is packed into Statistics; unfortunate place for it at the moment, but some sub-pack namespace is unavoidable, or should I get rid of those?

There's no problem in adapting the thing for more than one level. You don't need to introduce modules for the other levels if they contain only modules. This means that you can simply make the split lite/non-lite at the lowest level. You can then design your names in the namespacing module:

```ocaml
module Oml : sig ...
end = struct
  module Statistics = struct
    module Functions = struct
      include Oml_lite_functions
      include Oml_functions
    end
    ...
  end
end
```
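Applying this "split at the lowest level" advice to the two-level case might look like the following sketch. Apart from `Oml`, `Oml_lite`, `Statistics` and `Functions`, every module and function name here is hypothetical, and `softplus` merely stands in for something that would really require the C dependency. `Statistics` contains only modules, so it is a plain namespace in both flavours; only the `Functions` leaf is assembled differently.

```ocaml
(* Hypothetical pure-OCaml implementation units, one per leaf module. *)
module Oml_lite_descriptive = struct
  let mean a = Array.fold_left ( +. ) 0.0 a /. float_of_int (Array.length a)
end

module Oml_lite_functions = struct
  let logistic x = 1.0 /. (1.0 +. exp (-. x))
end

(* Hypothetical extras that would need the C dependency in the real library;
   [softplus] is only a stand-in so the example runs. *)
module Oml_functions = struct
  let softplus x = log1p (exp x)
end

(* Lite namespace. *)
module Oml_lite = struct
  module Statistics = struct
    module Descriptive = Oml_lite_descriptive
    module Functions = Oml_lite_functions
  end
end

(* Full namespace: same names, split only at the leaf. *)
module Oml = struct
  module Statistics = struct
    (* Leaves without C code are shared with Oml_lite, not copied. *)
    module Descriptive = Oml_lite_descriptive
    (* Only the leaf that gains C-backed functions differs. *)
    module Functions = struct
      include Oml_lite_functions
      include Oml_functions
    end
  end
end
```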

> but some sub-pack namespace is unavoidable, or should I get rid of those?

In general, avoid deeply nested hierarchies: beyond two levels (not counting the toplevel namespace), things become a bit annoying to read and write, and people then tend to open them or define their own aliases, which is bad for readability.

@rleonid
Owner Author

rleonid commented Aug 23, 2016

There will be a multi-level split for things such as Classification (which contains Naive_bayes, which must be split).

Regardless, you're advocating for manually packing the namespace modules. This is the approach that I wanted to avoid from the beginning. My main reservation against it is that it will become much more difficult to maintain as inter-module dependencies grow, and I think the solution will not be elegant.

@dbuenzli How about a deal: if I re-implement oml_lite with all of these manual packs, then you'll contribute new functionality to oml (or oml_lite)?

@dbuenzli
Contributor

> My main reservation against it is that it will become much more difficult to maintain as inter-module dependencies will grow, and I think the solution will not be elegant.

I'm unconvinced by the maintenance argument. If you are principled, you can keep the mapping between the namespacing module and the implementations obvious. As far as elegance goes, I personally find a language-based approach much more elegant than pre-processing ifdefs... it may also make your build system simpler.

> @dbuenzli How about a deal: if I re-implement oml_lite with all of these manual packs, then you'll contribute new functionality to oml (or oml_lite)?

Ha! For once I wanted to be a user... so that I can throw away toy stuff like this.

@rleonid
Owner Author

rleonid commented Aug 23, 2016

> I'm unconvinced about the maintenance argument.

Says the guy asking someone else to do the maintenance. 😈

> Ha! For once I wanted to be a user... So that I can throw away toy stuff like this.

Do we have a deal? I'll let you loosely interpret the terms and what you think is a comparable contribution.

@dbuenzli
Contributor

dbuenzli commented Aug 24, 2016

> Says the guy asking someone else to do the maintenance. 😈

As someone who publishes a lot of packages, I really care about maintenance costs; I'm telling you what I think is best here and what I would actually do myself. Compared to the pre-processing approach, you get help from the compiler, which ensures that oml and oml-lite do not drift apart, something that #ifdef spaghetti makes much easier to happen.
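To illustrate the compiler-help argument, here is a sketch in which all names are hypothetical: the lite interface is written once, and the full interface is declared as an extension of it. Any change to the lite signature must then be reflected in the full module or the program stops compiling, and code written against the lite interface can be instantiated with either flavour. Two copies generated textually by a pre-processor are compiled independently, so nothing relates their signatures.

```ocaml
(* The lite interface, written once. *)
module type LITE_FUNCTIONS = sig
  val logistic : float -> float
end

(* The full interface is an extension of the lite one. *)
module type FULL_FUNCTIONS = sig
  include LITE_FUNCTIONS
  val softplus : float -> float
end

module Lite_functions : LITE_FUNCTIONS = struct
  let logistic x = 1.0 /. (1.0 +. exp (-. x))
end

(* Reuses the lite implementation; dropping or changing [logistic]
   here would be a type error against FULL_FUNCTIONS. *)
module Full_functions : FULL_FUNCTIONS = struct
  include Lite_functions
  let softplus x = log1p (exp x)
end

(* Code written against the lite interface accepts either flavour,
   because FULL_FUNCTIONS is a subtype of LITE_FUNCTIONS. *)
module Classify (F : LITE_FUNCTIONS) = struct
  let predict threshold x = F.logistic x >= threshold
end

module Lite_classify = Classify (Lite_functions)
module Full_classify = Classify (Full_functions)
```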

> Do we have a deal? I'll let you loosely interpret the terms and what you think is a comparable contribution.

I'm afraid I won't have time in the foreseeable future, so I can't promise anything. Do what you think is best for the project...

@rleonid
Owner Author

rleonid commented Nov 2, 2016

Resolved with #174

rleonid closed this as completed Nov 2, 2016

3 participants