WIP: README suggestions #2

Merged
merged 2 commits into from
Oct 30, 2018

@kohr-h kohr-h commented Oct 22, 2018

Here are some suggestions, WIP, currently reached the option 3 header.

Remarks/questions:

General

  • Less jargon would look less scary to beginners (like JNI, interop, refactoring, release tag, native deps etc.).
  • "native dependencies" is a bit abstract, maybe associate it with a concrete, tangible artifact like "MXNet core", if that's the correct way of looking at it?

Introduction

  • What are the "needed tools"?
  • Which APIs are exposed in Clojure? Aren't the real low-level ones only available in C++, and the rest either "intermediate" or "high" level?
  • GANs and natural language support are not in the same category as the aspects (tools, building blocks) mentioned before them -- they're rather constructs one can build out of the available blocks. I'd find it more compelling to be more concrete and mention that the Module API is supported and the Gluon API is work in progress (with links to both so users can check). Then advanced applications and network structures can be mentioned as use cases.
  • What does "interop" mean?
  • I totally don't get the part with the JNI bindings. Why are we suddenly talking about refactoring? Is this part important? I think this part is better suited for a technical intro for developers.

Current State and Plans

  • The text doesn't say anything about the current state or about plans 😄 .
  • As a potential user, I'd like to know:
    • Is the package stable, beta, alpha, somewhat usable, barely working or totally broken?
    • How long can I expect it to take until I hit the first crippling bug?
  • I think readers would like something more concrete and not have to click on a link to get concrete answers.
  • Why not write that the best way to get involved is to install the package, run the examples, play around etc., and get back to the devs in case something isn't working as expected. And then would follow a description of how to best get involved (a brief explanation of the Slack channel, the mailing list, and the GH issue tracker). Probably it's a good idea to have a separate section "Getting Involved" for this purpose.

Getting Started

  • C++ rather than C?

  • OS specificity: does it matter only for the core library or also for the language bindings? Do we have to be that specific here?

  • Three ways to get started: I think I understood the second item after reading it 3 times. Is a release tag a Git release tag? What is in source form, what comes as a jar?

  • I think the important cases to cover here are

    1. Everything prebuilt
    2. Only Clojure package from source
    3. Everything from source

    And all cases should be annotated with the expected difficulty (as they already are).

    People can probably interpolate additional options.

  • How about CUDA instructions on Ubuntu? We only have them for Arch.

  • Is it really necessary to match the version of the Scala package with the Git tag or is it fine to just go with Git master?

* Checkout the MXNet project from a release tag using the Scala jars with native deps. This is also a pretty easy way to get started.
* Build from the MXNet project master. This option can be used to build the whole project yourself.
1. Install [prebuilt Clojure jars](https://search.maven.org/search?q=clojure%20mxnet) with the native dependencies baked in. This is the quickest way to get going.
2. Install the Clojure package from source, but use prebuilt jars for the native dependencies. Choose this option if you want pre-release features of the Clojure package but don't want to build (compile native dependencies yourself.
Author

Suggested change
2. Install the Clojure package from source, but use prebuilt jars for the native dependencies. Choose this option if you want pre-release features of the Clojure package but don't want to build (compile native dependencies yourself.
2. Install the Clojure package from source, but use prebuilt jars for the native dependencies. Choose this option if you want pre-release features of the Clojure package but don't want to build (compile) native dependencies yourself.

Owner

That's great. I think there will be some merge conflicts with the current version that puts anchors in.

@gigasquid
Owner

First - thanks so much for your thoughtful feedback. I really like the direction.

I've provided answers to your questions in italics below:

General

Less jargon would look less scary to beginners (like JNI, interop, refactoring, release tag, native deps etc.).
Yes! This is the goal to make it beginner friendly. I'm up for any refactoring to make that better.

"native dependencies" is a bit abstract, maybe associate it with a concrete, tangible artifact like "MXNet core", if that's the correct way of looking at it?
That's a good way of looking at it. The other types of dependencies are the stuff that the Core depends on like OpenCV

Introduction

What are the "needed tools"?
I like the way you reworded this. How about more along the lines of "including low-level and high-level ways to create network layers that can do many things including image recognition, natural language task, and Generative Adversarial Networks (GANs)."

Which APIs are exposed in Clojure? Aren't the real low-level ones only available in C++, and the rest either "intermediate" or "high" level?
The lowest level ones mirror the ones available in the core. Those are NDArray and Symbol. The Scala API generates those with a Macro by looking at the C++ code. The high level api is the Module API. This is on a similar level to what you would see with Keras.

GANs and natural language support are not in the same category as the aspects (tools, building blocks) mentioned before them -- they're rather constructs one can build out of the available blocks. I'd find it more compelling to be more concrete and mention that the Module API is supported and the Gluon API is work in progress (with links to both so users can check). Then advanced applications and network structures can be mentioned as use cases.
Agree. Gluon hasn't been tackled yet but is on the board.

What does "interop" mean?
Interop here means the Clojure api was built using the underlying Java functions that have Scala data structures, so interop is needed to convert back and forth to clojure.

I totally don't get the part with the JNI bindings. Why are we suddenly talking about refactoring? Is this part important? I think this part is better suited for a technical intro for developers.

Agree. It is more of a holdover from the rationale for the development direction the package took.

Current State and Plans

The text doesn't say anything about the current state or about plans 😄 .
As a potential user, I'd like to know:
Is the package stable, beta, alpha, somewhat usable, barely working or totally broken?
How long can I expect it to take until I hit the first crippling bug?
I think readers would like something more concrete and not have to click on a link to get concrete answers.
Good point. An important thing to mention here is that the Clojure package has been brought into the MXNet project as contrib. It can graduate out of contrib after some period of stabilization and feedback from users. That being said, I would argue that it is stable and production ready because it wraps the existing, mature Scala package. However, there may be bugs, especially in the "interop", that could be encountered. These are generally fixed very quickly once an issue is opened in the project. Another advantage is that the Clojure package is now part of an Apache project that has longevity and compatibility (no breaking changes) as part of its core values.

Why not write that the best way to get involved is to install the package, run the examples, play around etc., and get back to the devs in case something isn't working as expected. And then would follow a description of how to best get involved (a brief explanation of the Slack channel, the mailing list, and the GH issue tracker). Probably it's a good idea to have a separate section "Getting Involved" for this purpose.
Agree - that sounds great. I started a version for "Need Help" in the updated PR but I also like specifically calling out Getting Involved.

Getting Started

C++ rather than C?
Yes. Honestly, that area is definitely not my expertise :)

OS specificity: does it matter only for the core library or also for the language bindings? Do we have to be that specific here?
The generated jars are different for OSX vs Linux and cpu vs gpu, so it matters on all levels.

Three ways to get started: I think I understood the second item after reading it 3 times. Is a release tag a Git release tag? What is in source form, what comes as a jar?
Yes, the release tag is a git release tag. This is important because the scala/clojure jars expect the core to be in a certain state. If new functions are added or something changes, there will be a mismatch. The build-from-source artifacts are as follows: the core build produces lib/libmxnet.so. The scala build produces the scala jar, which is built from libmxnet.so (and packages it inside the jar too). The clojure jar depends on the scala jar.

I think the important cases to cover here are

Everything prebuilt
Only Clojure package from source
Everything from source
And all cases should be annotated with the expected difficulty (as they already are).

Yes. I tried to call that out in the latest version of the README.

People can probably interpolate additional options.

How about CUDA instructions on Ubuntu? We only have them for Arch.
I included the CUDA instructions on Arch because the main documentation site didn't have any Arch instructions but they do have Ubuntu https://mxnet.incubator.apache.org/install/ubuntu_setup.html. (Eventually we should see about adding Clojure install on that site too)

Is it really necessary to match the version of the Scala package with the Git tag or is it fine to just go with Git master?
Sometimes it is fine (if there are no underlying changes in the core) and sometimes it gives a user a confusing error. I don't know how to best word that without saying "Danger!" :)

@kohr-h
Author

kohr-h commented Oct 23, 2018

What are the "needed tools"?
I like the way you reworded this. How about more along the lines of "including low-level and high-level ways to create network layers that can do many things including image recognition, natural language task, and Generative Adversarial Networks (GANs)."

👍 That sounds good! Perhaps just write natural language processing, which may be slightly narrower than you like, but a technical term people will recognize.

"native dependencies" is a bit abstract, maybe associate it with a concrete, tangible artifact like "MXNet core", if that's the correct way of looking at it?
That's a good way of looking at it. The other types of dependencies are the stuff that the Core depends on like OpenCV

OS specificity: does it matter only for the core library or also for the language bindings? Do we have to be that specific here?
The generated jars are different for OSX vs Linux and cpu vs gpu, so it matters on all levels.

So is it correct that the native MXNet code sits in the core package, and that both the Scala and Clojure packages don't deal with native code at all? And the only reason they get "tainted" is by building a jar that contains the native core? Or does the Scala package produce some native binaries as well? Sorry if I'm digging into technicalities -- that's not really my intent, I'd just like to understand the overall construct correctly.

Which APIs are exposed in Clojure? Aren't the real low-level ones only available in C++, and the rest either "intermediate" or "high" level?
The lowest level ones mirror the ones available in the core. Those are NDArray and Symbol. The Scala API generates those with a Macro by looking at the C++ code. The high level api is the Module API. This is on a similar level to what you would see with Keras.

Ah okay, that makes sense. I thought the symbolic stuff was part of the Module API, but that's not the case.
The way you just explained it to me, this piece of information would definitely be valuable to newcomers. So maybe you'd like to weave some of that into the part that mentions "low level" and "high level" APIs?

What does "interop" mean?
Interop here means the Clojure api was built using the underlying Java functions that have Scala data structures, so interop is needed to convert back and forth to clojure.

Ok. In my mind, this goes onto the same "technical details" pile as the JNI bindings. On a high level, Clojure talks to Scala talks to C++ (or native code), which is all that matters in the beginning (I think).

Three ways to get started: I think I understood the second item after reading it 3 times. Is a release tag a Git release tag? What is in source form, what comes as a jar?
Yes, the release tag is a git release tag. This is important because the scala/clojure jars expect the core to be in a certain state. If new functions are added or something changes, there will be a mismatch. The build-from-source artifacts are as follows: the core build produces lib/libmxnet.so. The scala build produces the scala jar, which is built from libmxnet.so (and packages it inside the jar too). The clojure jar depends on the scala jar.

Is it really necessary to match the version of the Scala package with the Git tag or is it fine to just go with Git master?
Sometimes it is fine (if there are no underlying changes in the core) and sometimes it gives a user a confusing error. I don't know how to best word that without saying "Danger!" :)

Hm, if it's like this, how much flexibility does one really have with Git revisions on the various levels? To remove a bit of FUD on the user side, I think it would be important to give some guidelines as to which combinations of versions/revisions are "safe" to use.
For instance, if I install the Scala package from jar in version 1.3.0, can I use the Clojure package's Git master revision or do I have to expect breakage? When do I have to expect breakage? How will it break (roughly)? Is it only the core that introduces breakage, or is it possible that the Clojure package master revision relies on features in the Scala package master revision, such that it's broken if the Scala package remains at the latest release?

This is similarly important for the third option where everything is built from source. Is there anything I can do to make sure that I build a working combination? Maybe there's a CI log that will show the last working combination?

How about CUDA instructions on Ubuntu? We only have them for Arch.
I included the CUDA instructions on Arch because the main documentation site didn't have any Arch instructions but they do have Ubuntu https://mxnet.incubator.apache.org/install/ubuntu_setup.html. (Eventually we should see about adding Clojure install on that site too)

Alright, I guess it makes sense not to duplicate that information. Do we have that link somewhere, together with the names of the relevant sections? I vaguely remember something, but I'm not sure.


I'll try to get to the rest of the README and the merge conflicts asap.

@gigasquid
Owner

That sounds good! Perhaps just write natural language processing, which may be slightly narrower than you like, but a technical term people will recognize.

I'm fine with that 👍

So is it correct that the native MXNet code sits in the core package, and that both the Scala and Clojure packages don't deal with native code at all? And the only reason they get "tainted" is by building a jar that contains the native core? Or does the Scala package produce some native binaries as well? Sorry if I'm digging into technicalities -- that's not really my intent, I'd just like to understand the overall construct correctly.

The Scala package uses the JNI bindings and macros to create the API. For example, this macro would be used to create an NDArray function that would eventually be wrapped by the Clojure package.

The Scala build process to actually build the jars is system specific and packages it up. The Clojure package just requires one of these jars as its dependency in project.clj
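
To make the dependency chain concrete, here is a rough sketch of how a `project.clj` might select one system-specific Scala jar. The project name, the `1.3.0` version, and the exact Maven coordinates are assumptions for illustration and should be checked against the published artifacts; only one of the three jars should be active at a time.

```clojure
;; Illustrative project.clj sketch, not the package's actual file.
;; The project name and all coordinates/versions below are assumptions.
(defproject example/clojure-mxnet-demo "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.9.0"]
                 ;; Pick exactly ONE Scala jar matching your OS and CPU/GPU.
                 ;; Each jar bundles the native MXNet core (libmxnet.so) built for that system.
                 [org.apache.mxnet/mxnet-full_2.11-linux-x86_64-cpu "1.3.0"]
                 ;; [org.apache.mxnet/mxnet-full_2.11-linux-x86_64-gpu "1.3.0"]
                 ;; [org.apache.mxnet/mxnet-full_2.11-osx-x86_64-cpu "1.3.0"]
                 ])
```

Swapping the active line is all that changes between systems in this sketch; the Clojure code on top stays the same.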

Ah okay, that makes sense. I thought the symbolic stuff was part of the Module API, but that's not the case.

The Module API does use the symbolic api. It just builds on it.

The way you just explained it to me, this piece of information would definitely be valuable to newcomers. So maybe you'd like to weave some of that into the part that mentions "low level" and "high level" APIs?

Sure :)

Hm, if it's like this, how much flexibility does one really have with Git revisions on the various levels? To remove a bit of FUD on the user side, I think it would be important to give some guidelines as to which combinations of versions/revisions are "safe" to use.
For instance, if I install the Scala package from jar in version 1.3.0, can I use the Clojure package's Git master revision or do I have to expect breakage? When do I have to expect breakage? How will it break (roughly)? Is it only the core that introduces breakage, or is it possible that the Clojure package master revision relies on features in the Scala package master revision, such that it's broken if the Scala package remains at the latest release?

If you use a 1.3.0 jar on master it will break because there was an underlying operator change. This may not always be the case. We could say: if you run into errors, there may have been an incompatible change in the core, so try checking out the tag....

The Clojure package always autobuilds the interface based on the Scala jars it consumes. It is possible that there is a Clojure change that is not compatible. For example, in master, someone added a Clojure function that depended on a new operator and the published jars did not have that yet.

This is similarly important for the third option where everything is built from source. Is there anything I can do to make sure that I build a working combination? Maybe there's a CI log that will show the last working combination?

Running lein test will provide you with an assurance that everything is working. CI also runs on master for the Clojure package. http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12927/1/pipeline/1009

Alright, I guess it makes sense not to duplicate that information. Do we have that link somewhere, together with the names of the relevant sections? I vaguely remember something, but I'm not sure.

I'm not sure I know what you are referring to ....


Thanks again for helping collaborate with me on making this better!

@kohr-h
Author

kohr-h commented Oct 30, 2018

Done some more work on the 3rd of the options. I also tried to incorporate some of the discussion above. There are still some rough edges, though.
And I haven't really incorporated the upstream modifications from the reviews. Not sure if it makes sense everywhere, but I'll check what the changes are and how they would apply to this PR.

@kohr-h
Author

kohr-h commented Oct 30, 2018

Oh, and I haven't actually tested the instructions yet 😉

@gigasquid
Owner

Great. Thanks again for all your work! 💯 We can make any refinements in the main PR.

@gigasquid gigasquid merged commit 86f88f8 into gigasquid:improve-clojure-readme Oct 30, 2018
@kohr-h kohr-h deleted the clj-readme branch October 30, 2018 18:02
gigasquid pushed a commit that referenced this pull request Apr 12, 2019
…e#10792)

* Kvstore strkey (#2)

* support string type for kvstore key in cpp-package

* make lines short

* fix build

* add kvstore testcase

* no rand() use

* fix cpplint sanity check

* support string type for kvstore key in cpp-package

* make lines short

* fix build

* print error log

* Update test_kvstore.cpp

* update

* add gpu unittest

* check gpu count

* fix sanity check