WIP: README suggestions #2

kohr-h · 2018-10-22T22:37:46Z

Here are some suggestions. This is still a work in progress; so far I've reached the "option 3" header.

Remarks/questions:

General

Less jargon (JNI, interop, refactoring, release tag, native deps, etc.) would make the README look less scary to beginners.
"native dependencies" is a bit abstract, maybe associate it with a concrete, tangible artifact like "MXNet core", if that's the correct way of looking at it?

Introduction

What are the "needed tools"?
Which APIs are exposed in Clojure? Aren't the real low-level ones only available in C++, and the rest either "intermediate" or "high" level?
GANs and natural language support are not in the same category as the aspects (tools, building blocks) mentioned before them -- they're rather constructs one can build out of the available blocks. I'd find it more compelling to be more concrete and mention that the Module API is supported and the Gluon API is work in progress (with links to both so users can check). Then advanced applications and network structures can be mentioned as use cases.
What does "interop" mean?
I totally don't get the part with the JNI bindings. Why are we suddenly talking about refactoring? Is this part important? I think this part is better suited for a technical intro for developers.

Current State and Plans

The text doesn't say anything about the current state or about plans 😄 .
As a potential user, I'd like to know:
- Is the package stable, beta, alpha, somewhat usable, barely working or totally broken?
- How long can I expect it to take until I hit the first crippling bug?
I think readers would like something more concrete and not have to click on a link to get concrete answers.
Why not write that the best way to get involved is to install the package, run the examples, play around, etc., and get back to the devs if something isn't working as expected? This could be followed by a description of how best to get involved (a brief explanation of the Slack channel, the mailing list, and the GH issue tracker). It's probably a good idea to have a separate "Getting Involved" section for this purpose.

Getting Started

C++ rather than C?
OS specificity: does it matter only for the core library or also for the language bindings? Do we have to be that specific here?
Three ways to get started: I think I understood the second item after reading it 3 times. Is a release tag a Git release tag? What is in source form, what comes as a jar?
I think the important cases to cover here are
1. Everything prebuilt
2. Only Clojure package from source
3. Everything from source
And all cases should be annotated with the expected difficulty (as they already are).

People can probably interpolate additional options.
How about CUDA instructions on Ubuntu? We only have them for Arch.
Is it really necessary to match the version of the Scala package with the Git tag or is it fine to just go with Git master?

kohr-h · 2018-10-22T22:38:55Z

contrib/clojure-package/README.md

-* Checkout the MXNet project from a release tag using the Scala jars with native deps. This is also a pretty easy way to get started.
-* Build from the MXNet project master. This option can be used to build the whole project yourself.
+1. Install [prebuilt Clojure jars](https://search.maven.org/search?q=clojure%20mxnet) with the native dependencies baked in. This is the quickest way to get going.
+2. Install the Clojure package from source, but use prebuilt jars for the native dependencies. Choose this option if you want pre-release features of the Clojure package but don't want to build (compile native dependencies yourself.


Suggested change

2. Install the Clojure package from source, but use prebuilt jars for the native dependencies. Choose this option if you want pre-release features of the Clojure package but don't want to build (compile native dependencies yourself.

2. Install the Clojure package from source, but use prebuilt jars for the native dependencies. Choose this option if you want pre-release features of the Clojure package but don't want to build (compile) native dependencies yourself.

That's great. I think there will be some merge conflicts with the current version that puts anchors in.

gigasquid · 2018-10-23T21:33:50Z

First - thanks so much for your thoughtful feedback. I really like the direction.

I've provided answers to your questions below, each one following the question it answers:

General

Less jargon would look less scary to beginners (like JNI, interop, refactoring, release tag, native deps etc.).
Yes! The goal is to make it beginner-friendly. I'm up for any refactoring that makes it better.

"native dependencies" is a bit abstract, maybe associate it with a concrete, tangible artifact like "MXNet core", if that's the correct way of looking at it?
That's a good way of looking at it. The other kind of native dependency is the libraries that the core itself depends on, such as OpenCV.

Introduction

What are the "needed tools"?
I like the way you reworded this. How about something along the lines of "including low-level and high-level ways to create network layers that can do many things, including image recognition, natural language tasks, and Generative Adversarial Networks (GANs)."

Which APIs are exposed in Clojure? Aren't the real low-level ones only available in C++, and the rest either "intermediate" or "high" level?
The lowest-level ones mirror those available in the core: NDArray and Symbol. The Scala API generates them with a macro based on the C++ code. The high-level API is the Module API, which is at a level similar to what you would see with Keras.

GANs and natural language support are not in the same category as the aspects (tools, building blocks) mentioned before them -- they're rather constructs one can build out of the available blocks. I'd find it more compelling to be more concrete and mention that the Module API is supported and the Gluon API is work in progress (with links to both so users can check). Then advanced applications and network structures can be mentioned as use cases.
Agree. Gluon hasn't been tackled yet but is on the board.

What does "interop" mean?
Here, "interop" means that the Clojure API is built on the underlying Java-callable functions of the Scala package, which use Scala data structures, so interop is needed to convert back and forth between those and Clojure data structures.

I totally don't get the part with the JNI bindings. Why are we suddenly talking about refactoring? Is this part important? I think this part is better suited for a technical intro for developers.

Agree. It is more of a holdover from the rationale for the development direction the package took.

Current State and Plans

The text doesn't say anything about the current state or about plans 😄 .
As a potential user, I'd like to know:
- Is the package stable, beta, alpha, somewhat usable, barely working or totally broken?
- How long can I expect it to take until I hit the first crippling bug?
I think readers would like something more concrete and not have to click on a link to get concrete answers.
Good point. Important things to mention here are that the Clojure package has been brought into the project as a contrib package, and that it can graduate out of contrib after a period of stabilization and feedback from users. That being said, I would argue that it is stable and production-ready because it wraps the existing, mature Scala package. However, there may be bugs, especially in the interop layer. These are generally fixed very quickly once an issue is opened in the project. Another advantage is that the Clojure package is now part of an Apache project that counts longevity and compatibility (no breaking changes) among its core values.

Why not write that the best way to get involved is to install the package, run the examples, play around etc., and get back to the devs in case something isn't working as expected. And then would follow a description of how to best get involved (a brief explanation of the Slack channel, the mailing list, and the GH issue tracker). Probably it's a good idea to have a separate section "Getting Involved" for this purpose.
Agree - that sounds great. I started a version for "Need Help" in the updated PR but I also like specifically calling out Getting Involved.

Getting Started

C++ rather than C?
Yes. Honestly, that area is definitely not my expertise :)

OS specificity: does it matter only for the core library or also for the language bindings? Do we have to be that specific here?
The generated jars are different for OSX vs. Linux and CPU vs. GPU, so it matters at all levels.

Three ways to get started: I think I understood the second item after reading it 3 times. Is a release tag a Git release tag? What is in source form, what comes as a jar?
Yes, the release tag is a Git release tag. This is important because the Scala/Clojure jars expect a specific version of the core: if functions are added or something else changes, there will be a mismatch. The artifacts built from source are as follows: the core build produces lib/libmxnet.so; the Scala build produces the Scala jar, which is built from libmxnet.so and also packages it; the Clojure jar depends on the Scala jar.

I think the important cases to cover here are

1. Everything prebuilt
2. Only Clojure package from source
3. Everything from source
And all cases should be annotated with the expected difficulty (as they already are).

Yes. I tried to call that out in the latest version of the README.

People can probably interpolate additional options.

How about CUDA instructions on Ubuntu? We only have them for Arch.
I included the CUDA instructions on Arch because the main documentation site didn't have any Arch instructions but they do have Ubuntu https://mxnet.incubator.apache.org/install/ubuntu_setup.html. (Eventually we should see about adding Clojure install on that site too)

Is it really necessary to match the version of the Scala package with the Git tag or is it fine to just go with Git master?
Sometimes it is fine (if there are no underlying changes in the core) and sometimes it gives a user a confusing error. I don't know how to best word that without saying "Danger!" :)

kohr-h · 2018-10-23T22:34:27Z

What are the "needed tools"?
I like the way you reworded this. How about something along the lines of "including low-level and high-level ways to create network layers that can do many things, including image recognition, natural language tasks, and Generative Adversarial Networks (GANs)."

👍 That sounds good! Perhaps just write "natural language processing". It may be slightly narrower than you'd like, but it is a technical term people will recognize.

"native dependencies" is a bit abstract, maybe associate it with a concrete, tangible artifact like "MXNet core", if that's the correct way of looking at it?
That's a good way of looking at it. The other kind of native dependency is the libraries that the core itself depends on, such as OpenCV.

OS specificity: does it matter only for the core library or also for the language bindings? Do we have to be that specific here?
The generated jars are different for OSX vs. Linux and CPU vs. GPU, so it matters at all levels.

So is it correct that the native MXNet code sits in the core package, and that both the Scala and Clojure packages don't deal with native code at all? And the only reason they get "tainted" is by building a jar that contains the native core? Or does the Scala package produce some native binaries as well? Sorry if I'm digging into technicalities -- that's not really my intent, I'd just like to understand the overall construct correctly.

Which APIs are exposed in Clojure? Aren't the real low-level ones only available in C++, and the rest either "intermediate" or "high" level?
The lowest-level ones mirror those available in the core: NDArray and Symbol. The Scala API generates them with a macro based on the C++ code. The high-level API is the Module API, which is at a level similar to what you would see with Keras.

Ah okay, that makes sense. I thought the symbolic stuff was part of the Module API, but that's not the case.
The way you just explained it to me, this piece of information would definitely be valuable to newcomers. So maybe you'd like to weave some of that into the part that mentions "low level" and "high level" APIs?

What does "interop" mean?
Here, "interop" means that the Clojure API is built on the underlying Java-callable functions of the Scala package, which use Scala data structures, so interop is needed to convert back and forth between those and Clojure data structures.

Ok. In my mind, this goes onto the same "technical details" pile as the JNI bindings. On a high level, Clojure talks to Scala talks to C++ (or native code), which is all that matters in the beginning (I think).

Three ways to get started: I think I understood the second item after reading it 3 times. Is a release tag a Git release tag? What is in source form, what comes as a jar?
Yes, the release tag is a Git release tag. This is important because the Scala/Clojure jars expect a specific version of the core: if functions are added or something else changes, there will be a mismatch. The artifacts built from source are as follows: the core build produces lib/libmxnet.so; the Scala build produces the Scala jar, which is built from libmxnet.so and also packages it; the Clojure jar depends on the Scala jar.

Is it really necessary to match the version of the Scala package with the Git tag or is it fine to just go with Git master?
Sometimes it is fine (if there are no underlying changes in the core) and sometimes it gives a user a confusing error. I don't know how to best word that without saying "Danger!" :)

Hm, if it's like this, how much flexibility does one really have with Git revisions on the various levels? To remove a bit of FUD on the user side, I think it would be important to give some guidelines as to which combinations of versions/revisions are "safe" to use.
For instance, if I install the Scala package from jar in version 1.3.0, can I use the Clojure package's Git master revision or do I have to expect breakage? When do I have to expect breakage? How will it break (roughly)? Is it only the core that introduces breakage, or is it possible that the Clojure package master revision relies on features in the Scala package master revision, such that it's broken if the Scala package remains at the latest release?

This is similarly important for the third option where everything is built from source. Is there anything I can do to make sure that I build a working combination? Maybe there's a CI log that will show the last working combination?

How about CUDA instructions on Ubuntu? We only have them for Arch.
I included the CUDA instructions on Arch because the main documentation site didn't have any Arch instructions but they do have Ubuntu https://mxnet.incubator.apache.org/install/ubuntu_setup.html. (Eventually we should see about adding Clojure install on that site too)

Alright, I guess it makes sense not to duplicate that information. Do we have that link somewhere, together with the names of the relevant sections? I vaguely remember something, but I'm not sure.

I'll try to get to the rest of the README and the merge conflicts asap.

gigasquid · 2018-10-24T00:01:26Z

That sounds good! Perhaps just write "natural language processing". It may be slightly narrower than you'd like, but it is a technical term people will recognize.

I'm fine with that 👍

So is it correct that the native MXNet code sits in the core package, and that both the Scala and Clojure packages don't deal with native code at all? And the only reason they get "tainted" is by building a jar that contains the native core? Or does the Scala package produce some native binaries as well? Sorry if I'm digging into technicalities -- that's not really my intent, I'd just like to understand the overall construct correctly.

The Scala package uses the JNI bindings and macros to create the API. For example, a macro of this kind is used to create an NDArray function that is eventually wrapped by the Clojure package.

The Scala build process that actually builds the jars is system-specific and packages the native library into them. The Clojure package just declares one of these jars as its dependency in project.clj.

Ah okay, that makes sense. I thought the symbolic stuff was part of the Module API, but that's not the case.

The Module API does use the symbolic API; it just builds on top of it.

The way you just explained it to me, this piece of information would definitely be valuable to newcomers. So maybe you'd like to weave some of that into the part that mentions "low level" and "high level" APIs?

Sure :)

Hm, if it's like this, how much flexibility does one really have with Git revisions on the various levels? To remove a bit of FUD on the user side, I think it would be important to give some guidelines as to which combinations of versions/revisions are "safe" to use.
For instance, if I install the Scala package from jar in version 1.3.0, can I use the Clojure package's Git master revision or do I have to expect breakage? When do I have to expect breakage? How will it break (roughly)? Is it only the core that introduces breakage, or is it possible that the Clojure package master revision relies on features in the Scala package master revision, such that it's broken if the Scala package remains at the latest release?

If you use a 1.3.0 jar with master, it will break because there was an underlying operator change. This may not always be the case. We could say: if you run into errors, there may have been an incompatible change in the core, so try checking out the matching release tag.

The Clojure package always generates its interface automatically from the Scala jars it consumes. Even so, a Clojure change can be incompatible. For example, someone added a Clojure function on master that depended on a new operator, and the published jars did not include that operator yet.

This is similarly important for the third option where everything is built from source. Is there anything I can do to make sure that I build a working combination? Maybe there's a CI log that will show the last working combination?

Running lein test will give you reasonable assurance that everything is working. CI also runs the Clojure package tests, for example: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12927/1/pipeline/1009

Alright, I guess it makes sense not to duplicate that information. Do we have that link somewhere, together with the names of the relevant sections? I vaguely remember something, but I'm not sure.

I'm not sure what you are referring to.

Thanks again for helping collaborate with me on making this better!

kohr-h · 2018-10-30T00:06:46Z

I've done some more work on the third option and tried to incorporate some of the discussion above. There are still some rough edges, though.
I haven't yet incorporated the upstream modifications from the reviews. I'm not sure they make sense everywhere, but I'll check what the changes are and how they would apply to this PR.

kohr-h · 2018-10-30T00:11:35Z

Oh, and I haven't actually tested the instructions yet 😉

gigasquid · 2018-10-30T00:15:27Z

Great. Thanks again for all your work! 💯 We can make any refinements in the main PR.


kohr-h requested a review from gigasquid as a code owner October 22, 2018 22:37


gigasquid mentioned this pull request Oct 26, 2018

Improve the Clojure Package README to Make it Easier to Get Started apache/mxnet#12881 (merged)

kohr-h added 2 commits October 30, 2018 01:02
- WIP: update readme (56ef5ce)
- WIP: readme option 3 (7dd2205)

gigasquid merged commit 86f88f8 into gigasquid:improve-clojure-readme Oct 30, 2018

kohr-h deleted the clj-readme branch October 30, 2018 18:02
