From 9a35cca53590a058e09d80699955dfea9bcd7cdb Mon Sep 17 00:00:00 2001
From: Ziyi Mu
Date: Tue, 7 Jan 2020 19:50:00 +0000
Subject: [PATCH 1/9] add tutorial readme

---
 example/extensions/lib_custom_op/README.md | 69 ++++++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100644 example/extensions/lib_custom_op/README.md

diff --git a/example/extensions/lib_custom_op/README.md b/example/extensions/lib_custom_op/README.md
new file mode 100644
index 000000000000..25ffd350f8c3
--- /dev/null
+++ b/example/extensions/lib_custom_op/README.md
@@ -0,0 +1,69 @@
+CustomOp Example and Tutorial
+====
+
+## Getting Started
+
+## Have MXNet Ready:
+
+First you should install MXNet either by compiling from source code or by downloading a nightly build. It doesn’t matter if the build comes with CUDA or MKLDNN. The custom operator doesn’t intervene with the execution of other native MXNet operators.
+
+## Run An Example:
+
+You can start getting familiar with custom operator by running some examples we provide in the *example/extensions/lib_custom_op* directory. There are 2 examples: a simple 2D gemm operator, a subgraph operator, and a Makefile.
+
+Let’s start with gemm operator. Go to that directory and follow the steps:
+
+1. run *make gemm_lib*, the Makefile will generate a dynamic library libgemm_lib.so compiled from gemm_lib.cc. This is the library you are going to load that contains everything of the custom gemm operator.
+2. run *python test_gemm.py*, and it’ll first load the above .so library, find operators, register them in the MXNet backend, and print "Found x operators"; then invoke the operator like a regular MXNet operator and print the result.
+
+## Basic Files For GEMM Library:
+
+* lib_custom_op/gemm_lib.cc: This file has source code implementation of all required components of a custom operator, as well as the registration of the custom operator.
+
+* lib_custom_op/Makefile: Compile source code to a dynamic shared library, with a header file include/mxnet/lib_api.h from MXNet source code. Currently the custom operator is compatible with C++11 onwards.
+
+* lib_custom_op/test_gemm.py: This file calls mx.library.load('libgemm_lib.so') to load custom operator, invoke the operator using both ndarray and symbol API, and print outputs of forward and backward pass. The outputs should be the same as the regular MXNet gemm operator.
+
+## Writing Custom Operators:
+
+## Regular Custom Operator:
+
+There are several basic building blocks for making a (stateless) custom operator:
+
+* parseAttrs - Attributes Parser: This function specifies the number of input and output tensors for the custom operator.
+
+* inferType - Type Inference: This function specifies how custom operator infers output data types using input data types
+
+* inferShape - Shape Inference: This function specifies how custom operator infers output tensor shape using input shape
+
+* forward - Forward function: This function specifies the computation of forward pass of the operator
+
+* REGISTER_OP(my_op_name) Macro: This macro registers custom operator to all MXNet APIs by its name, and you need to call setters to bind the above functions to the registered operator.
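+
+For example, binding the functions above to a registered operator looks like the following minimal sketch (the setter names here follow the registration code used in gemm_lib.cc):
+
+    REGISTER_OP(my_op_name)
+    .setParseAttrs(parseAttrs)
+    .setInferType(inferType)
+    .setInferShape(inferShape)
+    .setForward(forward);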
+
+Also there are some operational functions you can specify:
+
+* backward - Backward Gradient function: This function specifies the computation of backward pass of the operator
+
+* mutateInputs - Mutate Input Mark: This function allows you to mark some inputs to be mutate inputs, useful when using aux parameters for BatchNorm-like operators
+
+Let’s take a closer look at those registry functions:
+
+* parseAttrs: This function takes 3 parameters. 1st parameter is an input, which is the attributes passed all the way from Python code. When user calls mx.nd.my_op_name(s,t,keyword=1), the keyword is passed to the attributes as an entry of the map. 2nd & 3rd parameters are outputs, and you need to assign num_in/num_out values to those placeholders. If the number of input and output tensors are fixed, you can use hard-coded numbers. Otherwise you can get the keyword value to determine the num_in and num_out.
+
+* inferType: This function takes 3 parameters. 1st parameter is the attributes. 2nd parameter is the a list of input data type enum corresponding to the data types of input tensors. 3rd parameter is the placeholder for output tensor data types you need to assign. For example, if this operator has 1 input and 1 output and data type doesn’t change, then you can do outtypes[0] = intypes[0]; to populate the data type.
+
+* inferShape: This function is similar to inferType function, except it is used for populating the output data shapes. You need to figure out the shapes of each output tensors for this computation.
+
+* forward: This function is doing the main forward computation. It also takes 3 parameters. 1st parameter is the attributes. 2nd parameter is the a list of input MXTensors which stores all data and info of input ndarrays. 3rd parameter is the output MXTensors. You need to do the forward computing given the input tensors and data types, and write the result back to the output tensor data pointer. Additionally you can use dltensor tensor structor stored in MXTensor as a more standardized data structure for computing.
+
+* backward: This function is doing the backward gradient computation. It will be similar to forward function. And you need to figure out the formula of backward.
+
+* mutateInputs: This function is for marking mutate inputs. It takes 2 parameters. 1st parameter is the attributes. 2nd parameter is a list of indices of mutate inputs among all input tensors. It is useful when some inputs are auxiliary model parameters and might be altered during forward/backward computation. Remember the index number of input_indices should not exceed the number of inputs.
+
+## Stateful Custom Operator:
+
+Stateful operator is useful when a forward/backward call needs some data or ‘state’ from the previous forward/backward call. Idiomatically we create a class and make instance variables store the state used for computing or caching.
+
+Most of the building blocks for making stateful custom operator is the same as regular custom operator, except it’ll register *createOpState* instead of forward for the computation.
+
+* createOpState: This function takes 2 parameters. 1st parameter is attributes. 2nd parameter is a placeholder for CustomStatefulOp object. You must define a class that inherits CustomStatefulOp and override the forward function. Then you need to create an instance and assign it to the placeholder, in this way all the forward/backward calls will use the same methods in that instance and the instance is able to keep the state.
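+
+As an illustration, a stateful operator can be structured like this minimal sketch. The method capitalization and the MX_SUCCESS return constant are assumptions for illustration; check include/mxnet/lib_api.h for the exact interface:
+
+    class MyStatefulOp : public CustomStatefulOp {
+     public:
+      explicit MyStatefulOp(int initial) : count(initial) {}
+      MXReturnValue Forward(std::vector<MXTensor> inputs,
+                            std::vector<MXTensor> outputs,
+                            OpResource res) {
+        ++count;  // 'count' is the state carried across forward calls
+        // ... compute outputs from inputs here, as in a regular forward ...
+        return MX_SUCCESS;
+      }
+     private:
+      int count;  // instance variable holding the operator state
+    };
+
+    MXReturnValue createOpState(std::map<std::string, std::string> attrs,
+                                CustomStatefulOp** op_inst) {
+      *op_inst = new MyStatefulOp(0);  // this instance is reused by later calls
+      return MX_SUCCESS;
+    }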
From 2d733af8fe1d12e06c7b31e58d8484ce9da8c56e Mon Sep 17 00:00:00 2001
From: Ziyi Mu
Date: Thu, 9 Jan 2020 21:49:48 +0000
Subject: [PATCH 2/9] refine doc

---
 example/extensions/lib_custom_op/README.md | 66 +++++++++++++---------
 1 file changed, 40 insertions(+), 26 deletions(-)

diff --git a/example/extensions/lib_custom_op/README.md b/example/extensions/lib_custom_op/README.md
index 25ffd350f8c3..d792e9f1e5b0 100644
--- a/example/extensions/lib_custom_op/README.md
+++ b/example/extensions/lib_custom_op/README.md
@@ -5,24 +5,22 @@ CustomOp Example and Tutorial

 ## Getting Started

 ## Have MXNet Ready:

-First you should install MXNet either by compiling from source code or by downloading a nightly build. It doesn’t matter if the build comes with CUDA or MKLDNN. The custom operator doesn’t intervene with the execution of other native MXNet operators.
+First you should install MXNet either by compiling from source code or by downloading a nightly build. It doesn’t matter if the build comes with CUDA or MKLDNN. The custom operator doesn’t interact with the execution of other native MXNet operators.

 ## Run An Example:

-You can start getting familiar with custom operator by running some examples we provide in the *example/extensions/lib_custom_op* directory. There are 2 examples: a simple 2D gemm operator, a subgraph operator, and a Makefile.
+You can start getting familiar with custom operator by running some examples we provide in the **example/extensions/lib_custom_op** directory. Let’s start with gemm (Generalized Matrix Multiplication) operator, a common linear algebra operator. Go to that directory and follow the steps:

-Let’s start with gemm operator. Go to that directory and follow the steps:
-
-1. run *make gemm_lib*, the Makefile will generate a dynamic library libgemm_lib.so compiled from gemm_lib.cc. This is the library you are going to load that contains everything of the custom gemm operator.
-2. run *python test_gemm.py*, and it’ll first load the above .so library, find operators, register them in the MXNet backend, and print "Found x operators"; then invoke the operator like a regular MXNet operator and print the result.
+1. run `make gemm_lib`, the Makefile will generate a dynamic library **libgemm_lib.so** compiled from gemm_lib.cc. This is the library you are going to load that contains everything of the custom gemm operator.
+2. run `python test_gemm.py`, and it’ll first load the above .so library, find operators, register them in the MXNet backend, print "Found x operators"; then invoke the operator like a regular MXNet operator and output the result.

 ## Basic Files For GEMM Library:

-* lib_custom_op/gemm_lib.cc: This file has source code implementation of all required components of a custom operator, as well as the registration of the custom operator.
+* **lib_custom_op/gemm_lib.cc**: This file has source code implementation of all required components of a custom operator, as well as the registration of the custom operator.

-* lib_custom_op/Makefile: Compile source code to a dynamic shared library, with a header file include/mxnet/lib_api.h from MXNet source code. Currently the custom operator is compatible with C++11 onwards.
+* **lib_custom_op/Makefile**: Compile source code to a dynamic shared library, with a header file **include/mxnet/lib_api.h** from MXNet source code. Currently the custom operator is compatible with C++11 onwards.
-* lib_custom_op/test_gemm.py: This file calls mx.library.load('libgemm_lib.so') to load custom operator, invoke the operator using both ndarray and symbol API, and print outputs of forward and backward pass. The outputs should be the same as the regular MXNet gemm operator.
+* **lib_custom_op/test_gemm.py**: This file calls `mx.library.load('libgemm_lib.so')` to load the library containing the custom operator, invoke the operator using both ndarray and symbol API, and print outputs of forward and backward pass. The outputs should be the same as the regular MXNet gemm operator.

 ## Writing Custom Operators:

@@ -30,40 +28,56 @@ Let’s start with gemm operator. Go to that directory and follow the steps:

-## Regular Custom Operator:
+### Regular Custom Operator:

 There are several basic building blocks for making a (stateless) custom operator:

-* parseAttrs - Attributes Parser: This function specifies the number of input and output tensors for the custom operator.
+* [parseAttrs](./gemm_lib.cc#L118) - Attribute Parser:
+  * `MXReturnValue parseAttrs(std::map<std::string, std::string> attrs, int* num_in, int* num_out)`
+  * This function specifies the number of input and output tensors for the custom operator; also this is where a custom operator can validate the attributes (i.e. options) specified by the user.

-* inferType - Type Inference: This function specifies how custom operator infers output data types using input data types
+* [inferType](./gemm_lib.cc#L124) - Type Inference:
+  * `MXReturnValue inferType(std::map<std::string, std::string> attrs, std::vector<int> &intypes, std::vector<int> &outtypes)`
+  * This function specifies how custom operator infers output data types using input data types.

-* inferShape - Shape Inference: This function specifies how custom operator infers output tensor shape using input shape
+* [inferShape](./gemm_lib.cc#L143) - Shape Inference:
+  * `MXReturnValue inferShape(std::map<std::string, std::string> attrs, std::vector<std::vector<unsigned int>> &inshapes, std::vector<std::vector<unsigned int>> &outshapes)`
+  * This function specifies how custom operator infers output tensor shape using input shape.

-* forward - Forward function: This function specifies the computation of forward pass of the operator
+* [forward](./gemm_lib.cc#L56) - Forward function:
+  * `MXReturnValue forward(std::map<std::string, std::string> attrs, std::vector<MXTensor> inputs, std::vector<MXTensor> outputs, OpResource res)`
+  * This function specifies the computation of forward pass of the operator.

-* REGISTER_OP(my_op_name) Macro: This macro registers custom operator to all MXNet APIs by its name, and you need to call setters to bind the above functions to the registered operator.
+* [REGISTER_OP(my_op_name) Macro](./gemm_lib.cc#L169):
+  * This macro registers custom operator to all MXNet APIs by its name, and you need to call setters to bind the above functions to the registered operator.

-Also there are some operational functions you can specify:
+Also there are some optional functions you can specify:

-* backward - Backward Gradient function: This function specifies the computation of backward pass of the operator
+* [backward](./gemm_lib.cc#L90) - Backward Gradient function:
+  * `MXReturnValue backward(std::map<std::string, std::string> attrs, std::vector<MXTensor> inputs, std::vector<MXTensor> outputs, OpResource res)`
+  * This function specifies the computation of backward pass of the operator.
-* mutateInputs - Mutate Input Mark: This function allows you to mark some inputs to be mutate inputs, useful when using aux parameters for BatchNorm-like operators
+* [mutateInputs](./gemm_lib.cc#L214) - Specify mutable input:
+  * `MXReturnValue mutateInputs(std::map<std::string, std::string> attrs, std::vector<int> &input_indices)`
+  * This function allows you to mark some inputs to be mutable inputs, useful when using aux parameters for BatchNorm-like operators.

 Let’s take a closer look at those registry functions:

-* parseAttrs: This function takes 3 parameters. 1st parameter is an input, which is the attributes passed all the way from Python code. When user calls mx.nd.my_op_name(s,t,keyword=1), the keyword is passed to the attributes as an entry of the map. 2nd & 3rd parameters are outputs, and you need to assign num_in/num_out values to those placeholders. If the number of input and output tensors are fixed, you can use hard-coded numbers. Otherwise you can get the keyword value to determine the num_in and num_out.
+* **parseAttrs**: This function takes 3 arguments. 1st argument is an input, which is the attributes passed all the way from Python code. When user calls `mx.nd.my_op_name(s,t,keyword=1)`, the keyword is passed to the attributes as an entry of the map. 2nd & 3rd arguments are outputs, and you need to set number of inputs and outputs values to those placeholders. If the number of input and output tensors are fixed, you can use hard-coded numbers. Otherwise you can get the user-specified attributes to determine the number of inputs and outputs.

-* inferType: This function takes 3 parameters. 1st parameter is the attributes. 2nd parameter is the a list of input data type enum corresponding to the data types of input tensors. 3rd parameter is the placeholder for output tensor data types you need to assign. For example, if this operator has 1 input and 1 output and data type doesn’t change, then you can do outtypes[0] = intypes[0]; to populate the data type.
+* **inferType**: This function takes 3 arguments. 1st argument is the attributes (same as above). 2nd argument is the a list of input data types corresponding to the input tensors. 3rd argument is the placeholder for output tensor data types you need to assign. For example, if this operator has 1 input and 1 output and data type doesn’t change, then you can do `outtypes[0] = intypes[0]` to populate the data type.

-* inferShape: This function is similar to inferType function, except it is used for populating the output data shapes. You need to figure out the shapes of each output tensors for this computation.
+* **inferShape**: This function is similar to inferType function, except it is used for populating the output data shapes. You need to figure out the shapes of each output tensors for this computation.
-* forward: This function is doing the main forward computation. It also takes 3 parameters. 1st parameter is the attributes. 2nd parameter is the a list of input MXTensors which stores all data and info of input ndarrays. 3rd parameter is the output MXTensors. You need to do the forward computing given the input tensors and data types, and write the result back to the output tensor data pointer. Additionally you can use dltensor tensor structor stored in MXTensor as a more standardized data structure for computing.
+* **forward**: This function executes the main forward computation. It also takes 4 arguments. 1st argument is the attributes. 2nd argument is the input MXTensors which stores all data and info of input ndarrays. 3rd argument is the output MXTensors. 4th argument is OpResource object for memory allocation and other utilities. Additionally you can use dltensor tensor structure stored in MXTensor as a more standardized data structure for computing.

-* backward: This function is doing the backward gradient computation. It will be similar to forward function. And you need to figure out the formula of backward.
+* **backward**: This function is doing the backward gradient computation. It will be similar to forward function. And you need to figure out the formula of backward.

-* mutateInputs: This function is for marking mutate inputs. It takes 2 parameters. 1st parameter is the attributes. 2nd parameter is a list of indices of mutate inputs among all input tensors. It is useful when some inputs are auxiliary model parameters and might be altered during forward/backward computation. Remember the index number of input_indices should not exceed the number of inputs.
+* **mutateInputs**: This function is for marking mutable inputs. It takes 2 arguments. 1st argument is the attributes. 2nd argument is a list of input indices that are mutable among all input tensors. It is useful when some inputs are auxiliary model parameters and might be altered during forward/backward computation. Remember the index number of input_indices should not exceed the number of inputs.

 ## Stateful Custom Operator:

-Stateful operator is useful when a forward/backward call needs some data or ‘state’ from the previous forward/backward call. Idiomatically we create a class and make instance variables store the state used for computing or caching.
+Stateful operator is useful when a forward/backward call needs some data or ‘state’ from previous forward/backward calls. Normally we create a class and make instance variables store the states used for computing or caching.

-Most of the building blocks for making stateful custom operator is the same as regular custom operator, except it’ll register *createOpState* instead of forward for the computation.
+Most of the building blocks for making stateful custom operator is the same as regular custom operator, except it’ll register **createOpState** instead of forward function for the computation.

-* createOpState: This function takes 2 parameters. 1st parameter is attributes. 2nd parameter is a placeholder for CustomStatefulOp object. You must define a class that inherits CustomStatefulOp and override the forward function. Then you need to create an instance and assign it to the placeholder, in this way all the forward/backward calls will use the same methods in that instance and the instance is able to keep the state.
+* [createOpState](./gemm_lib.cc#L204) - Create stateful operator instance:
+  * `MXReturnValue createOpState(std::map<std::string, std::string> attrs, CustomStatefulOp** op_inst)`
+  * This function takes 2 arguments. 1st argument is attributes. 2nd argument is a placeholder for CustomStatefulOp object. You must [define a class that inherits CustomStatefulOp](./gemm_lib.cc#L178) and override the forward function (optionally the backward function), then you need to create an instance of your class and assign it to the placeholder. In this way all the forward/backward calls will use the same methods in that instance, and the instance is able to keep the state of the operator.
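+
+To make the backward and shape-inference descriptions above concrete, here are two minimal sketches for the 2D gemm case, where C = A * B with A of shape (n,k) and B of shape (k,m). These are illustrative assumptions rather than the exact gemm_lib.cc code: the input ordering and the MXTensor accessors used here should be checked against the real implementation. The backward formulas are dA = dC * B^T and dB = A^T * dC:
+
+    MXReturnValue backward(std::map<std::string, std::string> attrs,
+                           std::vector<MXTensor> inputs,
+                           std::vector<MXTensor> outputs,
+                           OpResource res) {
+      // assumed input ordering: inputs[0] = dC, inputs[1] = A, inputs[2] = B
+      float* dC = inputs[0].data<float>();
+      float* A  = inputs[1].data<float>();
+      float* B  = inputs[2].data<float>();
+      float* dA = outputs[0].data<float>();
+      float* dB = outputs[1].data<float>();
+      int64_t n = inputs[1].shape[0], k = inputs[1].shape[1], m = inputs[2].shape[1];
+      for (int64_t i = 0; i < n; i++)      // dA = dC * B^T -> shape (n,k)
+        for (int64_t j = 0; j < k; j++) {
+          dA[i*k+j] = 0;
+          for (int64_t l = 0; l < m; l++)
+            dA[i*k+j] += dC[i*m+l] * B[j*m+l];
+        }
+      for (int64_t i = 0; i < k; i++)      // dB = A^T * dC -> shape (k,m)
+        for (int64_t j = 0; j < m; j++) {
+          dB[i*m+j] = 0;
+          for (int64_t l = 0; l < n; l++)
+            dB[i*m+j] += A[l*k+i] * dC[l*m+j];
+        }
+      return MX_SUCCESS;
+    }
+
+And a matching shape-inference sketch, which checks that the inner dimensions agree before writing the output shape:
+
+    MXReturnValue inferShape(std::map<std::string, std::string> attrs,
+                             std::vector<std::vector<unsigned int>> &inshapes,
+                             std::vector<std::vector<unsigned int>> &outshapes) {
+      unsigned int n = inshapes[0][0], k = inshapes[0][1];
+      unsigned int kk = inshapes[1][0], m = inshapes[1][1];
+      if (k != kk) return MX_FAIL;  // inner gemm dimensions must match
+      outshapes[0] = {n, m};       // C = A * B has shape (n,m)
+      return MX_SUCCESS;
+    }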
From 3ece00b1acf3e5ca2bbf46d6eaf36ae900cd7666 Mon Sep 17 00:00:00 2001
From: rondogency
Date: Fri, 10 Jan 2020 00:05:21 -0800
Subject: [PATCH 3/9] improve doc format

---
 example/extensions/lib_custom_op/README.md | 59 +++++++++++++++++-----
 1 file changed, 47 insertions(+), 12 deletions(-)

diff --git a/example/extensions/lib_custom_op/README.md b/example/extensions/lib_custom_op/README.md
index d792e9f1e5b0..5a6547d498b1 100644
--- a/example/extensions/lib_custom_op/README.md
+++ b/example/extensions/lib_custom_op/README.md
@@ -3,18 +3,18 @@ CustomOp Example and Tutorial

 ## Getting Started

-## Have MXNet Ready:
+### Have MXNet Ready:

 First you should install MXNet either by compiling from source code or by downloading a nightly build. It doesn’t matter if the build comes with CUDA or MKLDNN. The custom operator doesn’t interact with the execution of other native MXNet operators.

-## Run An Example:
+### Run An Example:

 You can start getting familiar with custom operator by running some examples we provide in the **example/extensions/lib_custom_op** directory. Let’s start with gemm (Generalized Matrix Multiplication) operator, a common linear algebra operator. Go to that directory and follow the steps:

 1. run `make gemm_lib`, the Makefile will generate a dynamic library **libgemm_lib.so** compiled from gemm_lib.cc. This is the library you are going to load that contains everything of the custom gemm operator.
 2. run `python test_gemm.py`, and it’ll first load the above .so library, find operators, register them in the MXNet backend, print "Found x operators"; then invoke the operator like a regular MXNet operator and output the result.

-## Basic Files For GEMM Library:
+### Basic Files For Gemm Library:

 * **lib_custom_op/gemm_lib.cc**: This file has source code implementation of all required components of a custom operator, as well as the registration of the custom operator.

@@ -24,40 +24,72 @@ You can start getting familiar with custom operator by running some examples we

-## Regular Custom Operator:
+### Regular Custom Operator:

 There are several basic building blocks for making a (stateless) custom operator:

 * [parseAttrs](./gemm_lib.cc#L118) - Attribute Parser:
-  * `MXReturnValue parseAttrs(std::map<std::string, std::string> attrs, int* num_in, int* num_out)`
   * This function specifies the number of input and output tensors for the custom operator; also this is where a custom operator can validate the attributes (i.e. options) specified by the user.

+        MXReturnValue parseAttrs(
+            std::map<std::string, std::string> attrs,
+            int* num_in,
+            int* num_out)
+
 * [inferType](./gemm_lib.cc#L124) - Type Inference:
-  * `MXReturnValue inferType(std::map<std::string, std::string> attrs, std::vector<int> &intypes, std::vector<int> &outtypes)`
   * This function specifies how custom operator infers output data types using input data types.

+        MXReturnValue inferType(
+            std::map<std::string, std::string> attrs,
+            std::vector<int> &intypes,
+            std::vector<int> &outtypes)
+
 * [inferShape](./gemm_lib.cc#L143) - Shape Inference:
-  * `MXReturnValue inferShape(std::map<std::string, std::string> attrs, std::vector<std::vector<unsigned int>> &inshapes, std::vector<std::vector<unsigned int>> &outshapes)`
   * This function specifies how custom operator infers output tensor shape using input shape.

+        MXReturnValue inferShape(
+            std::map<std::string, std::string> attrs,
+            std::vector<std::vector<unsigned int>> &inshapes,
+            std::vector<std::vector<unsigned int>> &outshapes)
+
 * [forward](./gemm_lib.cc#L56) - Forward function:
-  * `MXReturnValue forward(std::map<std::string, std::string> attrs, std::vector<MXTensor> inputs, std::vector<MXTensor> outputs, OpResource res)`
   * This function specifies the computation of forward pass of the operator.
+        MXReturnValue forward(
+            std::map<std::string, std::string> attrs,
+            std::vector<MXTensor> inputs,
+            std::vector<MXTensor> outputs,
+            OpResource res)
+
 * [REGISTER_OP(my_op_name) Macro](./gemm_lib.cc#L169):
   * This macro registers custom operator to all MXNet APIs by its name, and you need to call setters to bind the above functions to the registered operator.

+        REGISTER_OP(my_op_name)
+        .setForward(forward)
+        .setParseAttrs(parseAttrs)
+        .setInferType(inferType)
+        .setInferShape(inferShape);
+
 Also there are some optional functions you can specify:

 * [backward](./gemm_lib.cc#L90) - Backward Gradient function:
-  * `MXReturnValue backward(std::map<std::string, std::string> attrs, std::vector<MXTensor> inputs, std::vector<MXTensor> outputs, OpResource res)`
   * This function specifies the computation of backward pass of the operator.

+        MXReturnValue backward(
+            std::map<std::string, std::string> attrs,
+            std::vector<MXTensor> inputs,
+            std::vector<MXTensor> outputs,
+            OpResource res)
+
 * [mutateInputs](./gemm_lib.cc#L214) - Specify mutable input:
-  * `MXReturnValue mutateInputs(std::map<std::string, std::string> attrs, std::vector<int> &input_indices)`
   * This function allows you to mark some inputs to be mutable inputs, useful when using aux parameters for BatchNorm-like operators.

+        MXReturnValue mutateInputs(
+            std::map<std::string, std::string> attrs,
+            std::vector<int> &input_indices)
+
 Let’s take a closer look at those registry functions:

 * **parseAttrs**: This function takes 3 arguments. 1st argument is an input, which is the attributes passed all the way from Python code. When user calls `mx.nd.my_op_name(s,t,keyword=1)`, the keyword is passed to the attributes as an entry of the map. 2nd & 3rd arguments are outputs, and you need to set number of inputs and outputs values to those placeholders. If the number of input and output tensors are fixed, you can use hard-coded numbers. Otherwise you can get the user-specified attributes to determine the number of inputs and outputs.

@@ -72,12 +104,15 @@ Let’s take a closer look at those registry functions:

-## Stateful Custom Operator:
+### Stateful Custom Operator:

 Stateful operator is useful when a forward/backward call needs some data or ‘state’ from previous forward/backward calls. Normally we create a class and make instance variables store the states used for computing or caching.

 Most of the building blocks for making stateful custom operator is the same as regular custom operator, except it’ll register **createOpState** instead of forward function for the computation.

 * [createOpState](./gemm_lib.cc#L204) - Create stateful operator instance:
-  * `MXReturnValue createOpState(std::map<std::string, std::string> attrs, CustomStatefulOp** op_inst)`
   * This function takes 2 arguments. 1st argument is attributes. 2nd argument is a placeholder for CustomStatefulOp object. You must [define a class that inherits CustomStatefulOp](./gemm_lib.cc#L178) and override the forward function (optionally the backward function), then you need to create an instance of your class and assign it to the placeholder. In this way all the forward/backward calls will use the same methods in that instance, and the instance is able to keep the state of the operator.
+
+        MXReturnValue createOpState(
+            std::map<std::string, std::string> attrs,
+            CustomStatefulOp** op_inst)

From fbd0a37ac4418628c0a45b865f1afe906e81f2c0 Mon Sep 17 00:00:00 2001
From: rondogency
Date: Thu, 23 Jan 2020 11:28:17 -0800
Subject: [PATCH 4/9] resolve aaron comments

---
 example/extensions/lib_custom_op/README.md | 95 ++++++++++++++--------
 1 file changed, 62 insertions(+), 33 deletions(-)

diff --git a/example/extensions/lib_custom_op/README.md b/example/extensions/lib_custom_op/README.md
index 5a6547d498b1..69542f261c0e 100644
--- a/example/extensions/lib_custom_op/README.md
+++ b/example/extensions/lib_custom_op/README.md
@@ -1,34 +1,63 @@
 CustomOp Example and Tutorial
-====
+=============================
+
+## Introduction
+
+Adding new operators in MXNet requires an understanding of MXNet backend operator registration and recompiling MXNet with all its dependencies. Users can use the old Python custom operator to add new operators, but it is slow, complicated, and has a poor adoption rate. So our approach for adding custom operators is to enable dynamic loading of C++ custom operators compiled in external libraries at runtime.
+
+Custom operators (CustomOp) enable users to write new operators without compiling against all of MXNet’s header files and dependencies. When a library containing custom operators is loaded dynamically, the operators found in the library will be re-registered in MXNet so that users can call those operators natively just like other built-in operators.

 ## Getting Started

-### Have MXNet Ready:
+### Have MXNet Ready

 First you should install MXNet either by compiling from source code or by downloading a nightly build. It doesn’t matter if the build comes with CUDA or MKLDNN. The custom operator doesn’t interact with the execution of other native MXNet operators.

 ### Run An Example:

-You can start getting familiar with custom operator by running some examples we provide in the **example/extensions/lib_custom_op** directory. Let’s start with gemm (Generalized Matrix Multiplication) operator, a common linear algebra operator. Go to that directory and follow the steps:
+You can start getting familiar with custom operators by running some examples provided in the **example/extensions/lib_custom_op** directory. Start with a common linear algebra operator like `gemm` (Generalized Matrix Multiplication). Go to the `lib_custom_op` directory and follow these steps:

-1. run `make gemm_lib`, the Makefile will generate a dynamic library **libgemm_lib.so** compiled from gemm_lib.cc. This is the library you are going to load that contains everything of the custom gemm operator.
-2. run `python test_gemm.py`, and it’ll first load the above .so library, find operators, register them in the MXNet backend, print "Found x operators"; then invoke the operator like a regular MXNet operator and output the result.
+1. Run `make gemm_lib`. The Makefile will generate a dynamic library **libgemm_lib.so** compiled from `gemm_lib.cc`. This is the library you are going to load that contains everything for the custom gemm operator.
+2. Run `python test_gemm.py`. It’ll first load the above .so library, find the operators, register them in the MXNet backend, print "Found x operators", then invoke the operator like a regular MXNet operator and output the result.

 ### Basic Files For Gemm Library:

-* **lib_custom_op/gemm_lib.cc**: This file has source code implementation of all required components of a custom operator, as well as the registration of the custom operator.
+* **lib_custom_op/gemm_lib.cc**: This file has a source code implementation of all required components of a custom operator, as well as the registration of the custom operator.
+
+* **lib_custom_op/Makefile**: Compile source code to a dynamic shared library, with a header file `include/mxnet/lib_api.h` from MXNet source code. Currently the custom operator is compatible with C++11 onwards.
+
+* **lib_custom_op/test_gemm.py**: This file calls `mx.library.load('libgemm_lib.so')` to load the library containing the custom operator, invokes the operator using both NDArray and Symbol APIs, and prints outputs of the forward and backward passes. The outputs should be the same as the regular MXNet `gemm` operator.
+
+## Writing Custom Operator Library:
+
+For building a library containing your own custom operator, compose a C++ source file like `myop_lib.cc`, include the `lib_api.h` header file, and write your custom operator implementation with those essential functions:
+- `initialize` - Library Initialization Function
+- `REGISTER_OP` - Operator Registration Macro
+- `parseAttrs` - Attribute Parser
+- `inferType` - Type Inference
+- `inferShape` - Shape Inference
+- `forward` - Forward Computation (can be replaced with `createOpState`, see below for details)
+
+Then compile it to `libmyop_lib.so` dynamic library using the following command
+
+    g++ -shared -fPIC -std=c++11 myop_lib.cc -o libmyop_lib.so -I ../../../include/mxnet
+
+Finally you can write a python script to load the library and run your custom operator
+
+    import mxnet as mx
+    mx.library.load('libmyop_lib.so')
+    mx.nd.my_op(...)

 ### Writing Regular Custom Operator:

 There are several essential building blocks for making a (stateless) custom operator:

+* [initialize](./gemm_lib.cc#L227):
+  * This function is the library initialization function necessary for any dynamic libraries. It checks if you are using a compatible version of MXNet. Note that this `version` parameter is passed from MXNet when the library is loaded.
+
+        MXReturnValue initialize(int version)
+
 * [parseAttrs](./gemm_lib.cc#L118):
   * This function specifies the number of input and output tensors for the custom operator; also this is where a custom operator can validate the attributes (i.e. options) specified by the user.

         MXReturnValue parseAttrs(
             std::map<std::string, std::string> attrs,
             int* num_in,
             int* num_out)

-* [inferType](./gemm_lib.cc#L124) - Type Inference:
-  * This function specifies how custom operator infers output data types using input data types.
+* [inferType](./gemm_lib.cc#L124):
+  * This function specifies how the custom operator infers output data types using input data types.
        MXReturnValue inferType(
            std::map<std::string, std::string> attrs,
            std::vector<int> &intypes,
            std::vector<int> &outtypes)

-* [inferShape](./gemm_lib.cc#L143) - Shape Inference:
-  * This function specifies how custom operator infers output tensor shape using input shape.
+* [inferShape](./gemm_lib.cc#L143):
+  * This function specifies how the custom operator infers output tensor shape using input shape.

        MXReturnValue inferShape(
            std::map<std::string, std::string> attrs,
            std::vector<std::vector<unsigned int>> &inshapes,
            std::vector<std::vector<unsigned int>> &outshapes)

-* [forward](./gemm_lib.cc#L56) - Forward function:
-  * This function specifies the computation of forward pass of the operator.
+* [forward](./gemm_lib.cc#L56):
+  * This function specifies the computation of the forward pass of the operator.

        MXReturnValue forward(
            std::map<std::string, std::string> attrs,
            std::vector<MXTensor> inputs,
            std::vector<MXTensor> outputs,
            OpResource res)

-* [REGISTER_OP(my_op_name) Macro](./gemm_lib.cc#L169):
-  * This macro registers custom operator to all MXNet APIs by its name, and you need to call setters to bind the above functions to the registered operator.
+* [REGISTER_OP(my_op_name)](./gemm_lib.cc#L169):
+  * This macro registers the custom operator and its properties to MXNet NDArray and Symbol APIs by its name.

        REGISTER_OP(my_op_name)
        .setForward(forward)
        .setParseAttrs(parseAttrs)
        .setInferType(inferType)
        .setInferShape(inferShape);

 Also there are some optional functions you can specify:

-* [backward](./gemm_lib.cc#L90) - Backward Gradient function:
-  * This function specifies the computation of backward pass of the operator.
+* [backward](./gemm_lib.cc#L90) - Backward gradient function:
+  * This function specifies the computation of the backward pass of the operator.

        MXReturnValue backward(
            std::map<std::string, std::string> attrs,
            std::vector<MXTensor> inputs,
            std::vector<MXTensor> outputs,
            OpResource res)

 * [mutateInputs](./gemm_lib.cc#L214) - Specify mutable input:
-  * This function allows you to mark some inputs to be mutable inputs, useful when using aux parameters for BatchNorm-like operators.
+  * This function allows you to mark some inputs to be mutable inputs. It is useful when using aux parameters for BatchNorm-like operators.

        MXReturnValue mutateInputs(
            std::map<std::string, std::string> attrs,
            std::vector<int> &input_indices)

 Let’s take a closer look at those registry functions:

-* **parseAttrs**: This function takes 3 arguments. 1st argument is an input, which is the attributes passed all the way from Python code. When user calls `mx.nd.my_op_name(s,t,keyword=1)`, the keyword is passed to the attributes as an entry of the map. 2nd & 3rd arguments are outputs, and you need to set number of inputs and outputs values to those placeholders. If the number of input and output tensors are fixed, you can use hard-coded numbers. Otherwise you can get the user-specified attributes to determine the number of inputs and outputs.
+* **parseAttrs**: This function takes three arguments. The 1st argument is an input, which is the attributes passed all the way from Python code. When a user calls `mx.nd.my_op_name(s,t,keyword=1)`, the keyword is passed to the attributes as an entry of the map. The 2nd & 3rd arguments are outputs, and you need to set the number of inputs and outputs on those placeholders. If the number of input and output tensors is fixed, you can use hard-coded numbers. Otherwise you can read the user-specified attributes to determine the number of inputs and outputs (see the sketch after this list).
-* **inferType**: This function takes 3 arguments. 1st argument is the attributes (same as above). 2nd argument is the a list of input data types corresponding to the input tensors. 3rd argument is the placeholder for output tensor data types you need to assign. For example, if this operator has 1 input and 1 output and data type doesn’t change, then you can do `outtypes[0] = intypes[0]` to populate the data type.
+* **inferType**: This function takes three arguments. The 1st argument is the attributes (same as above). The 2nd argument is a list of input data types corresponding to the input tensors. The 3rd argument is the placeholder for output tensor data types you need to assign. For example, if this operator has one input and one output, and the data type doesn’t change, then you can do `outtypes[0] = intypes[0]` to populate the data type.

-* **inferShape**: This function is similar to inferType function, except it is used for populating the output data shapes. You need to figure out the shapes of each output tensors for this computation.
+* **inferShape**: This function is similar to the `inferType` function, except it is used for populating the output data shapes. You need to figure out the shape of each output tensor for this computation. For example, if the inputs are images with shape (224,224,3) and you write a padding operator to make 10px borders for the images, then your output shape will be (234,234,3).

-* **forward**: This function executes the main forward computation. It also takes 4 arguments. 1st argument is the attributes. 2nd argument is the input MXTensors which stores all data and info of input ndarrays. 3rd argument is the output MXTensors. 4th argument is OpResource object for memory allocation and other utilities. Additionally you can use dltensor tensor structure stored in MXTensor as a more standardized data structure for computing.
+* **forward**: This function executes the main forward computation. It takes four arguments. The 1st argument is the attributes. The 2nd argument is the input `MXTensors`, which store all data and info of the input ndarrays. The 3rd argument is the output `MXTensors`. The 4th argument is the `OpResource` object for memory allocation and other utilities. Additionally, you can use a `dltensor` tensor structure stored in the `MXTensor` as a more standardized data structure for computing.

-* **backward**: This function is doing the backward gradient computation. It will be similar to forward function. And you need to figure out the formula of backward.
+* **backward**: This function does the backward gradient computation. It is similar to the forward function, and you need to figure out the formula of the backward gradient computation.

-* **mutateInputs**: This function is for marking mutable inputs. It takes 2 arguments. 1st argument is the attributes. 2nd argument is a list of input indices that are mutable among all input tensors. It is useful when some inputs are auxiliary model parameters and might be altered during forward/backward computation. Remember the index number of input_indices should not exceed the number of inputs.
+* **mutateInputs**: This function is for marking mutable inputs. It takes two arguments. The 1st argument is the attributes. The 2nd argument is a list of input indices that are mutable among all input tensors. It is useful when some inputs are auxiliary model parameters and might be altered during forward/backward computation. Remember, the index number of `input_indices` should not exceed the number of inputs.
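+
+As a concrete sketch of the flexible case mentioned under **parseAttrs**, a parser can read a user-specified attribute to decide the operator arity. The attribute name "num_inputs" here is an illustrative assumption, not something MXNet defines:
+
+        MXReturnValue parseAttrs(
+            std::map<std::string, std::string> attrs,
+            int* num_in, int* num_out) {
+          *num_in = 2;   // default: two input tensors
+          if (attrs.count("num_inputs") > 0)
+            *num_in = std::stoi(attrs["num_inputs"]);
+          *num_out = 1;  // one output tensor
+          return MX_SUCCESS;
+        }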
-### Stateful Custom Operator:
+### Writing Stateful Custom Operator:

-Stateful operator is useful when a forward/backward call needs some data or ‘state’ from previous forward/backward calls. Normally we create a class and make instance variables store the states used for computing or caching.
+A stateful custom operator is useful when a forward/backward call needs some data or ‘state’ from previous forward/backward calls. Normally we create a class, and make instance variables store the states used for computing or caching.

-Most of the building blocks for making stateful custom operator is the same as regular custom operator, except it’ll register **createOpState** instead of forward function for the computation.
+Most of the building blocks for making a stateful custom operator are the same as for a regular custom operator, except it’ll register `createOpState` instead of a `forward` function for the computation.

 * [createOpState](./gemm_lib.cc#L204) - Create stateful operator instance:
-  * This function takes 2 arguments. 1st argument is attributes. 2nd argument is a placeholder for CustomStatefulOp object. You must [define a class that inherits CustomStatefulOp](./gemm_lib.cc#L178) and override the forward function (optionally the backward function), then you need to create an instance of your class and assign it to the placeholder. In this way all the forward/backward calls will use the same methods in that instance, and the instance is able to keep the state of the operator.
+  * This function takes two arguments. The 1st argument is attributes. The 2nd argument is a placeholder for the `CustomStatefulOp` object. You must [define a class that inherits CustomStatefulOp](./gemm_lib.cc#L178) and override the forward function (optionally the backward function). Then you need to create an instance of your class and assign it to the placeholder. In this way, all of the forward/backward calls will use the same methods in that instance, and the instance is able to keep the state of the operator.
        MXReturnValue createOpState(
            std::map<std::string, std::string> attrs,
            CustomStatefulOp** op_inst)

From cdcb098d17a7a1c525cbe790883d017e4597979a Mon Sep 17 00:00:00 2001
From: rondogency
Date: Thu, 23 Jan 2020 15:04:56 -0800
Subject: [PATCH 5/9] add license

---
 example/extensions/lib_custom_op/README.md | 39 ++++++++++++++++------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/example/extensions/lib_custom_op/README.md b/example/extensions/lib_custom_op/README.md
index 69542f261c0e..cb5e535bcb9b 100644
--- a/example/extensions/lib_custom_op/README.md
+++ b/example/extensions/lib_custom_op/README.md
@@ -1,3 +1,20 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
 CustomOp Example and Tutorial
 =============================

 ## Introduction

@@ -30,7 +47,7 @@ You can start getting familiar with custom operators by running some examples pr

 ## Writing Custom Operator Library:

-For building a library containing your own custom operator, compose a C++ source file like `myop_lib.cc`, include the `lib_api.h` header file, and write your custom operator implementation with those essential functions:
+For building a library containing your own custom operator, compose a C++ source file like `myop_lib.cc`, include the `lib_api.h` header file, and write your custom operator implementation with these essential functions:
 - `initialize` - Library Initialization Function
 - `REGISTER_OP` - Operator Registration Macro
 - `parseAttrs` - Attribute Parser
 - `inferType` - Type Inference
 - `inferShape` - Shape Inference
 - `forward` - Forward Computation (can be replaced with `createOpState`, see below for details)

-Then compile it to `libmyop_lib.so` dynamic library using the following command
-
-    g++ -shared -fPIC -std=c++11 myop_lib.cc -o libmyop_lib.so -I ../../../include/mxnet
-
-Finally you can write a python script to load the library and run your custom operator
-
-    import mxnet as mx
-    mx.library.load('libmyop_lib.so')
-    mx.nd.my_op(...)
+Then compile it to `libmyop_lib.so` dynamic library using the following command:
+```bash
+g++ -shared -fPIC -std=c++11 myop_lib.cc -o libmyop_lib.so -I ../../../include/mxnet
+```
+
+Finally, you can write a Python script to load the library and run your custom operator:
+```python
+import mxnet as mx
+mx.library.load('libmyop_lib.so')
+mx.nd.my_op(...)
+```

From 57dad0a34ef1213890009e0ff18c9063598f206f Mon Sep 17 00:00:00 2001
From: rondogency
Date: Fri, 24 Jan 2020 15:38:41 -0800
Subject: [PATCH 6/9] retrigger ci

From 8f7e6116cac56cbf81084ea7fc209539d0e867b4 Mon Sep 17 00:00:00 2001
From: rondogency
Date: Sat, 25 Jan 2020 20:04:26 -0800
Subject: [PATCH 7/9] retrigger ci

From 9f5afcd2ea78374d964003b7d59b0f9c66a53a85 Mon Sep 17 00:00:00 2001
From: rondogency
Date: Sun, 26 Jan 2020 17:53:12 -0800
Subject: [PATCH 8/9] retrigger ci

From f4f7f20df0ddd2e58798c54eb311148a2c1d66c1 Mon Sep 17 00:00:00 2001
From: rondogency
Date: Mon, 27 Jan 2020 10:42:02 -0800
Subject: [PATCH 9/9] retrigger ci