diff --git a/docs/python_docs/python/tutorials/getting-started/crash-course/1-ndarray.md b/docs/python_docs/python/tutorials/getting-started/crash-course/1-ndarray.md index d825eccab944..453cc35264c1 100644 --- a/docs/python_docs/python/tutorials/getting-started/crash-course/1-ndarray.md +++ b/docs/python_docs/python/tutorials/getting-started/crash-course/1-ndarray.md @@ -15,107 +15,97 @@ -# Manipulate data with `ndarray` +# Step 1: Manipulate data with NP on MXNet -We'll start by introducing the `NDArray`, MXNet’s primary tool for storing and transforming data. If you’ve worked with `NumPy` before, you’ll notice that an NDArray is, by design, similar to NumPy’s multi-dimensional array. +This getting started exercise introduces the `np` package, which is similar to Numpy. For more information, please see [Differences between NP on MXNet and NumPy](/api/python/docs/tutorials/getting-started/deepnumpy/deepnumpy-vs-numpy.html). -## Get started +## Import packages and create an array -To get started, let's import the `ndarray` package (`nd` is a shorter alias) from MXNet. -```{.python .input n=1} -# If you haven't installed MXNet yet, you can uncomment the following line to -# install the latest stable release -# !pip install -U mxnet +To get started, run the following commands to import the `np` package together with the NumPy extensions package `npx`. Together, `np` with `npx` make up the NP on MXNet front end. -from mxnet import nd +```{.python .input n=1} +from mxnet import np, npx +npx.set_np() # Activate NumPy-like mode. ``` -Next, let's see how to create a 2D array (also called a matrix) with values from two sets of numbers: 1, 2, 3 and 4, 5, 6. This might also be referred to as a tuple of a tuple of integers. +In this step, create a 2D array (also called a matrix). The following code example creates a matrix with values from two sets of numbers: 1, 2, 3 and 4, 5, 6. This might also be referred to as a tuple of a tuple of integers. ```{.python .input n=2} -nd.array(((1,2,3),(5,6,7))) +np.array(((1,2,3),(5,6,7))) ``` -We can also create a very simple matrix with the same shape (2 rows by 3 columns), but fill it with 1s. +You can also create a very simple matrix with the same shape (2 rows by 3 columns), but fill it with 1s. ```{.python .input n=3} -x = nd.ones((2,3)) +x = np.ones((2,3)) x ``` -Often we’ll want to create arrays whose values are sampled randomly. For example, sampling values uniformly between -1 and 1. Here we create the same shape, but with random sampling. +You can create arrays whose values are sampled randomly. For example, sampling values uniformly between -1 and 1. The following code example creates the same shape, but with random sampling. ```{.python .input n=15} -y = nd.random.uniform(-1,1,(2,3)) +y = np.random.uniform(-1,1, (2,3)) y ``` -You can also fill an array of a given shape with a given value, such as `2.0`. - - -```{.python .input n=16} -x = nd.full((2,3), 2.0) -x -``` - -As with NumPy, the dimensions of each NDArray are accessible by accessing the `.shape` attribute. We can also query its `size`, which is equal to the product of the components of the shape. In addition, `.dtype` tells the data type of the stored values. +As with NumPy, the dimensions of each ndarray are shown by accessing the `.shape` attribute. As the following code example shows, you can also query for `size`, which is equal to the product of the components of the shape. In addition, `.dtype` tells the data type of the stored values. 
```{.python .input n=17} (x.shape, x.size, x.dtype) ``` -## Operations +## Performing operations on an array -NDArray supports a large number of standard mathematical operations, such as element-wise multiplication: +An ndarray supports a large number of standard mathematical operations. Here are three examples. You can perform element-wise multiplication by using the following code example. ```{.python .input n=18} x * y ``` -Exponentiation: +You can perform exponentiation by using the following code example. ```{.python .input n=23} -y.exp() +np.exp(y) ``` -And transposing a matrix to compute a proper matrix-matrix product: +You can also find a matrix’s transpose to compute a proper matrix-matrix product by using the following code example. ```{.python .input n=24} -nd.dot(x, y.T) +np.dot(x, y.T) ``` -## Indexing +## Indexing an array -MXNet NDArrays support slicing in all the ridiculous ways you might imagine accessing your data. Here’s an example of reading a particular element, which returns a 1D array with shape `(1,)`. +The ndarrays support slicing in many ways you might want to access your data. The following code example shows how to read a particular element, which returns a 1D array with shape `(1,)`. ```{.python .input n=25} y[1,2] ``` -Read the second and third columns from `y`. +This example shows how to read the second and third columns from `y`. ```{.python .input n=26} y[:,1:3] ``` -and write to a specific element. +This example shows how to write to a specific element. ```{.python .input n=27} y[:,1:3] = 2 y ``` -Multi-dimensional slicing is also supported. +You can perform multi-dimensional slicing, which is shown in the following code example. ```{.python .input n=28} y[1:2,0:2] = 4 y ``` -## Converting between MXNet NDArray and NumPy +## Converting between MXNet ndarrays and NumPy ndarrays -Converting MXNet NDArrays to and from NumPy is easy. The converted arrays do not share memory. +You can convert MXNet ndarrays to and from NumPy ndarrays, as shown in the following example. The converted arrays do not share memory. ```{.python .input n=29} a = x.asnumpy() @@ -123,5 +113,9 @@ a = x.asnumpy() ``` ```{.python .input n=30} -nd.array(a) +np.array(a) ``` + +## Next steps + +Learn how to construct a neural network with the Gluon module: [Step 2: Create a neural network](2-nn.md). diff --git a/docs/python_docs/python/tutorials/getting-started/crash-course/2-nn.md b/docs/python_docs/python/tutorials/getting-started/crash-course/2-nn.md index 88bb030ea502..f2ea348d636a 100644 --- a/docs/python_docs/python/tutorials/getting-started/crash-course/2-nn.md +++ b/docs/python_docs/python/tutorials/getting-started/crash-course/2-nn.md @@ -15,18 +15,21 @@ -# Create a neural network +# Step 2: Create a neural network -Now let's look how to create neural networks in Gluon. In addition the NDArray package (`nd`) that we just covered, we now will also import the neural network `nn` package from `gluon`. +In this step, you learn how to use NP on MXNet to create neural networks in Gluon. In addition to the `np` package that you learned about in the previous step [Step 1: Manipulate data with NP on MXNet](1-ndarray.md), you also import the neural network `nn` package from `gluon`. + +Use the following commands to import the packages required for this step. ```{.python .input n=2} -from mxnet import nd +from mxnet import np, npx from mxnet.gluon import nn +npx.set_np() # Change MXNet to the numpy-like mode. 
``` ## Create your neural network's first layer -Let's start with a dense layer with 2 output units. +Use the following code example to start with a dense layer with two output units. ```{.python .input n=31} @@ -34,20 +37,20 @@ layer = nn.Dense(2) layer ``` -Then initialize its weights with the default initialization method, which draws random values uniformly from $[-0.7, 0.7]$. +Initialize its weights with the default initialization method, which draws random values uniformly from $[-0.7, 0.7]$. You can see this in the following example. ```{.python .input n=32} layer.initialize() ``` -Then we do a forward pass with random data. We create a $(3,4)$ shape random input `x` and feed into the layer to compute the output. +Do a forward pass with random data, shown in the following example. We create a $(3,4)$ shape random input `x` and feed into the layer to compute the output. ```{.python .input n=34} -x = nd.random.uniform(-1,1,(3,4)) +x = np.random.uniform(-1,1,(3,4)) layer(x) ``` -As can be seen, the layer's input limit of 2 produced a $(3,2)$ shape output from our $(3,4)$ input. Note that we didn't specify the input size of `layer` before (though we can specify it with the argument `in_units=4` here), the system will automatically infer it during the first time we feed in data, create and initialize the weights. So we can access the weight after the first forward pass: +As can be seen, the layer's input limit of two produced a $(3,2)$ shape output from our $(3,4)$ input. You didn't specify the input size of `layer` before, though you can specify it with the argument `in_units=4` here. The system automatically infers it during the first time you feed in data, create, and initialize the weights. You can access the weight after the first forward pass, as shown in this example. ```{.python .input n=35} layer.weight.data() @@ -55,7 +58,7 @@ layer.weight.data() ## Chain layers into a neural network -Let's first consider a simple case that a neural network is a chain of layers. During the forward pass, we run layers sequentially one-by-one. The following code implements a famous network called [LeNet](http://yann.lecun.com/exdb/lenet/) through `nn.Sequential`. +Consider a simple case where a neural network is a chain of layers. During the forward pass, you run layers sequentially one-by-one. Use the following code to implement a famous network called [LeNet](http://yann.lecun.com/exdb/lenet/) through `nn.Sequential`. ```{.python .input} net = nn.Sequential() @@ -80,18 +83,18 @@ net -The usage of `nn.Sequential` is similar to `nn.Dense`. In fact, both of them are subclasses of `nn.Block`. The following codes show how to initialize the weights and run the forward pass. +Using `nn.Sequential` is similar to `nn.Dense`. In fact, both of them are subclasses of `nn.Block`. Use the following code to initialize the weights and run the forward pass. ```{.python .input} net.initialize() # Input shape is (batch_size, color_channels, height, width) -x = nd.random.uniform(shape=(4,1,28,28)) +x = np.random.uniform(size=(4,1,28,28)) y = net(x) y.shape ``` -We can use `[]` to index a particular layer. For example, the following -accesses the 1st layer's weight and 6th layer's bias. +You can use `[]` to index a particular layer. For example, the following +accesses the first layer's weight and sixth layer's bias. ```{.python .input} (net[0].weight.data().shape, net[5].bias.data().shape) @@ -100,9 +103,9 @@ accesses the 1st layer's weight and 6th layer's bias. 
## Create a neural network flexibly In `nn.Sequential`, MXNet will automatically construct the forward function that sequentially executes added layers. -Now let's introduce another way to construct a network with a flexible forward function. +Here is another way to construct a network with a flexible forward function. -To do it, we create a subclass of `nn.Block` and implement two methods: +Create a subclass of `nn.Block` and implement two methods by using the following code. - `__init__` create the layers - `forward` define the forward function. @@ -117,7 +120,7 @@ class MixMLP(nn.Block): nn.Dense(4, activation='relu')) self.dense = nn.Dense(5) def forward(self, x): - y = nd.relu(self.blk(x)) + y = npx.relu(self.blk(x)) print(y) return self.dense(y) @@ -125,18 +128,23 @@ net = MixMLP() net ``` -In the sequential chaining approach, we can only add instances with `nn.Block` as the base class and then run them in a forward pass. In this example, we used `print` to get the intermediate results and `nd.relu` to apply relu activation. So this approach provides a more flexible way to define the forward function. +In the sequential chaining approach, you can only add instances with `nn.Block` as the base class and then run them in a forward pass. In this example, you used `print` to get the intermediate results and `nd.relu` to apply relu activation. This approach provides a more flexible way to define the forward function. -The usage of `net` is similar as before. +The following code example uses `net` in a similar manner as earlier. ```{.python .input} net.initialize() -x = nd.random.uniform(shape=(2,2)) +x = np.random.uniform(size=(2,2)) net(x) ``` -Finally, let's access a particular layer's weight +Finally, access a particular layer's weight with this code. ```{.python .input n=8} net.blk[1].weight.data() ``` + +## Next steps + +After you create a neural network, learn how to automatically +compute the gradients in [Step 3: Automatic differentiation with autograd](3-autograd.md). diff --git a/docs/python_docs/python/tutorials/getting-started/crash-course/3-autograd.md b/docs/python_docs/python/tutorials/getting-started/crash-course/3-autograd.md index b7cb9f4dee8b..b959b4deb4a6 100644 --- a/docs/python_docs/python/tutorials/getting-started/crash-course/3-autograd.md +++ b/docs/python_docs/python/tutorials/getting-started/crash-course/3-autograd.md @@ -15,49 +15,52 @@ -# Automatic differentiation with `autograd` +# Step 3: Automatic differentiation with autograd -We train models to get better and better as a function of experience. Usually, getting better means minimizing a loss function. To achieve this goal, we often iteratively compute the gradient of the loss with respect to weights and then update the weights accordingly. While the gradient calculations are straightforward through a chain rule, for complex models, working it out by hand can be a pain. +In this step, you learn how to use the MXNet `autograd` package to perform gradient calculations by automatically calculating derivatives. -Before diving deep into the model training, let's go through how MXNet’s `autograd` package expedites this work by automatically calculating derivatives. +This is helpful because it will help you save time and effort. You train models to get better as a function of experience. Usually, getting better means minimizing a loss function. To achieve this goal, you often iteratively compute the gradient of the loss with respect to weights and then update the weights accordingly. 
Gradient calculations are straightforward through a chain rule. However, for complex models, working this out manually is challenging. -## Basic usage +The `autograd` package helps you by automatically calculating derivatives. -Let's first import the `autograd` package. +## Basic use + +To get started, import the `autograd` package as in the following code. ```{.python .input} -from mxnet import nd +from mxnet import np, npx from mxnet import autograd +npx.set_np() ``` -As a toy example, let’s say that we are interested in differentiating a function $f(x) = 2 x^2$ with respect to parameter $x$. We can start by assigning an initial value of $x$. +As an example, you could differentiate a function $f(x) = 2 x^2$ with respect to parameter $x$. You can start by assigning an initial value of $x$, as follows: ```{.python .input n=3} -x = nd.array([[1, 2], [3, 4]]) +x = np.array([[1, 2], [3, 4]]) x ``` -Once we compute the gradient of $f(x)$ with respect to $x$, we’ll need a place to store it. In MXNet, we can tell an NDArray that we plan to store a gradient by invoking its `attach_grad` method. +After you compute the gradient of $f(x)$ with respect to $x$, you need a place to store it. In MXNet, you can tell an ndarray that you plan to store a gradient by invoking its `attach_grad` method, shown in the following example. ```{.python .input n=6} x.attach_grad() ``` -Now we’re going to define the function $y=f(x)$. To let MXNet store $y$, so that we can compute gradients later, we need to put the definition inside a `autograd.record()` scope. +Next, define the function $y=f(x)$. To let MXNet store $y$, so that you can compute gradients later, use the following code to put the definition inside an `autograd.record()` scope. ```{.python .input n=7} with autograd.record(): y = 2 * x * x ``` -Let’s invoke back propagation (backprop) by calling `y.backward()`. When $y$ has more than one entry, `y.backward()` is equivalent to `y.sum().backward()`. +You can invoke back propagation (backprop) by calling `y.backward()`. When $y$ has more than one entry, `y.backward()` is equivalent to `y.sum().backward()`. ```{.python .input n=8} y.backward() ``` -Now, let’s see if this is the expected output. Note that $y=2x^2$ and $\frac{dy}{dx} = 4x$, which should be `[[4, 8],[12, 16]]`. Let's check the automatically computed results: +Next, verify whether this is the expected output. Note that $y=2x^2$ and $\frac{dy}{dx} = 4x$, which should be `[[4, 8],[12, 16]]`. Check the automatically computed results. ```{.python .input n=9} x.grad @@ -65,35 +68,39 @@ x.grad ## Using Python control flows -Sometimes we want to write dynamic programs where the execution depends on some real-time values. MXNet will record the execution trace and compute the gradient as well. +Sometimes you want to write dynamic programs where the execution depends on real-time values. MXNet records the execution trace and computes the gradient as well. -Consider the following function `f`: it doubles the inputs until it's `norm` reaches 1000. Then it selects one element depending on the sum of its elements. +Consider the following function `f` in the following example code. The function doubles the inputs until its `norm` reaches 1000. Then it selects one element depending on the sum of its elements. 
```{.python .input}
def f(a):
    b = a * 2
-    while b.norm().asscalar() < 1000:
+    while np.abs(b).sum() < 1000:
        b = b * 2
-    if b.sum().asscalar() >= 0:
+    if b.sum() >= 0:
        c = b[0]
    else:
        c = b[1]
    return c
```

-We record the trace and feed in a random value:
+In this example, you record the trace and feed in a random value.

```{.python .input}
-a = nd.random.uniform(shape=2)
+a = np.random.uniform(size=2)
a.attach_grad()
with autograd.record():
    c = f(a)
c.backward()
```

-We know that `b` is a linear function of `a`, and `c` is chosen from `b`. Then the gradient with respect to `a` be will be either `[c/a[0], 0]` or `[0, c/a[1]]`, depending on which element from `b` we picked. Let's find the results:
+You can see that `b` is a linear function of `a`, and `c` is chosen from `b`. The gradient with respect to `a` will be either `[c/a[0], 0]` or `[0, c/a[1]]`, depending on which element from `b` is picked. You can check the result with this code:

```{.python .input}
-[a.grad, c/a]
+a.grad == c/a
```
+
+## Next Steps
+
+After you have used `autograd`, learn about training a neural network. See [Step 4: Train the neural network](4-train.md).
diff --git a/docs/python_docs/python/tutorials/getting-started/crash-course/4-train.md b/docs/python_docs/python/tutorials/getting-started/crash-course/4-train.md
index 3a7f0d090bdf..ec3a07e52057 100644
--- a/docs/python_docs/python/tutorials/getting-started/crash-course/4-train.md
+++ b/docs/python_docs/python/tutorials/getting-started/crash-course/4-train.md
@@ -15,37 +15,37 @@
-# Train the neural network
+# Step 4: Train the neural network

-In this section, we will discuss how to train the previously defined network with data. We first import the libraries. The new ones are `mxnet.init` for more weight initialization methods, the `datasets` and `transforms` to load and transform computer vision datasets, `matplotlib` for drawing, and `time` for benchmarking.
+In this step, you learn how to train the previously defined network with data. First, import the libraries. The new ones are `mxnet.init` for more weight initialization methods, `matplotlib` for drawing, and `time` for benchmarking. The `datasets` and `transforms` modules for loading and transforming computer vision datasets are now accessed through `gluon.data.vision`. The following example shows the imports.

```{.python .input n=1}
# Uncomment the following line if matplotlib is not installed.
# !pip install matplotlib
-from mxnet import nd, gluon, init, autograd
+from mxnet import np, npx, gluon, init, autograd
from mxnet.gluon import nn
-from mxnet.gluon.data.vision import datasets, transforms
from IPython import display
import matplotlib.pyplot as plt
import time
+npx.set_np()
```

## Get data

-The handwritten digit MNIST dataset is one of the most commonly used datasets in deep learning. But it is too simple to get a 99% accuracy. Here we use a similar but slightly more complicated dataset called FashionMNIST. The goal is no longer to classify numbers, but clothing types instead.
+The handwritten digit MNIST dataset is one of the most commonly used datasets in deep learning. However, it's so simple that even basic models reach about 99 percent accuracy. For this tutorial, you use a similar but slightly more complicated dataset called FashionMNIST. The goal is to classify clothing types rather than digits.

The dataset can be automatically downloaded through Gluon's `data.vision.datasets` module. The following code downloads the training dataset and shows the first example.
```{.python .input n=2}
-mnist_train = datasets.FashionMNIST(train=True)
+mnist_train = gluon.data.vision.datasets.FashionMNIST(train=True)
X, y = mnist_train[0]
('X shape: ', X.shape, 'X dtype', X.dtype, 'y:', y)
```

-Each example in this dataset is a $28\times 28$ size grey image, which is presented as NDArray with the shape format of `(height, width, channel)`. The label is a `numpy` scalar.
+Each example in this dataset is a $28\times 28$ size grey image, which is presented as an ndarray with the shape format of `(height, width, channel)`. The label is a `numpy` scalar.

-Next, we visualize the first ten examples.
+Next, visualize the first ten examples.

```{.python .input n=3}
text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
@@ -54,7 +54,7 @@ X, y = mnist_train[0:10]
# plot images
display.set_matplotlib_formats('svg')
_, figs = plt.subplots(1, X.shape[0], figsize=(15, 15))
-for f,x,yi in zip(figs, X,y):
+for f, x, yi in zip(figs, X, y):
    # 3D->2D by removing the last channel dim
    f.imshow(x.reshape((28,28)).asnumpy())
    ax = f.axes
@@ -65,16 +65,16 @@ for f,x,yi in zip(figs, X,y):
plt.show()
```

-In order to feed data into a Gluon model, we need to convert the images to the `(channel, height, width)` format with a floating point data type. It can be done by `transforms.ToTensor`. In addition, we normalize all pixel values with `transforms.Normalize` with the real mean 0.13 and standard deviation 0.31. We chain these two transforms together and apply it to the first element of the data pair, namely the images.
+In order to feed data into a Gluon model, convert the images to the `(channel, height, width)` format with a floating point data type. You can do this with `transforms.ToTensor`. In addition, normalize all pixel values with `transforms.Normalize` using the real mean 0.13 and standard deviation 0.31. You can chain these two transforms together and apply them to the first element of the data pair, namely the images.

```{.python .input n=4}
-transformer = transforms.Compose([
-    transforms.ToTensor(),
-    transforms.Normalize(0.13, 0.31)])
+transformer = gluon.data.vision.transforms.Compose([
+    gluon.data.vision.transforms.ToTensor(),
+    gluon.data.vision.transforms.Normalize(0.13, 0.31)])
mnist_train = mnist_train.transform_first(transformer)
```

-`FashionMNIST` is a subclass of `gluon.data.Dataset`, which defines how to get the `i`-th example. In order to use it in training, we need to get a (randomized) batch of examples. It can be easily done by `gluon.data.DataLoader`. Here we use four works to process data in parallel, which is often necessary especially for complex data transforms.
+`FashionMNIST` is a subclass of `gluon.data.Dataset`, which defines how to get the `i`-th example. In order to use it in training, you need to get a (randomized) batch of examples. Do this by using `gluon.data.DataLoader`. The example here uses four workers to process data in parallel, which is often necessary especially for complex data transforms.

```{.python .input n=5}
batch_size = 256
@@ -90,18 +90,18 @@ for data, label in train_data:
    break
```

-Finally, we create a validation dataset and data loader.
+Finally, create a validation dataset and data loader.

```{.python .input n=7}
mnist_valid = gluon.data.vision.FashionMNIST(train=False)
valid_data = gluon.data.DataLoader(
    mnist_valid.transform_first(transformer),
-    batch_size=batch_size, num_workers=4)
+    batch_size=batch_size, num_workers=4)
```

## Define the model

-We reimplement the same LeNet introduced before.
One difference here is that we changed the weight initialization method to `Xavier`, which is a popular choice for deep convolutional neural networks. +Implement the network called [LeNet](http://yann.lecun.com/exdb/lenet/). One difference here is that you change the weight initialization method to `Xavier`, which is a popular choice for deep convolutional neural networks. ```{.python .input n=8} net = nn.Sequential() @@ -109,40 +109,38 @@ net.add(nn.Conv2D(channels=6, kernel_size=5, activation='relu'), nn.MaxPool2D(pool_size=2, strides=2), nn.Conv2D(channels=16, kernel_size=3, activation='relu'), nn.MaxPool2D(pool_size=2, strides=2), - nn.Flatten(), nn.Dense(120, activation="relu"), nn.Dense(84, activation="relu"), nn.Dense(10)) net.initialize(init=init.Xavier()) ``` -Besides the neural network, we need to define the loss function and optimization method for training. We will use standard softmax cross entropy loss for classification problems. It first performs softmax on the output to obtain the predicted probability, and then compares the label with the cross entropy. +In addition to the neural network, define the loss function and optimization method for training. Use standard softmax cross entropy loss for classification problems. It first performs softmax on the output to obtain the predicted probability, and then compares the label with the cross entropy. ```{.python .input n=9} softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss() ``` -The optimization method we pick is the standard stochastic gradient descent with constant learning rate of 0.1. +The optimization method you pick is the standard stochastic gradient descent with constant learning rate of 0.1. ```{.python .input n=10} trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1}) ``` -The `trainer` is created with all parameters (both weights and gradients) in `net`. Later on, we only need to call the `step` method to update its weights. +The `trainer` is created with all parameters (both weights and gradients) in `net`. Later on, you only need to call the `step` method to update its weights. -## Train +## Train the model -We create an auxiliary function to calculate the model accuracy. +Create an auxiliary function to calculate the model accuracy. ```{.python .input n=11} def acc(output, label): # output: (batch, num_output) float32 ndarray # label: (batch, ) int32 ndarray - return (output.argmax(axis=1) == - label.astype('float32')).mean().asscalar() + return (output.argmax(axis=1) == label.astype('float32')).mean() ``` -Now we can implement the complete training loop. +Implement the complete training loop. ```{.python .input n=12} for epoch in range(10): @@ -157,7 +155,7 @@ for epoch in range(10): # update parameters trainer.step(batch_size) # calculate training metrics - train_loss += loss.mean().asscalar() + train_loss += loss.mean() train_acc += acc(output, label) # calculate validation accuracy for data, label in valid_data: @@ -169,8 +167,12 @@ for epoch in range(10): ## Save the model -Finally, we save the trained parameters onto disk, so that we can use them later. +Finally, save the trained parameters onto disk, so that you can use them later. ```{.python .input n=13} net.save_parameters('net.params') ``` + +## Next Steps + +After the model is trained and saved, learn how to use it to predict new examples: [Step 5: Predict with a pretrained model](5-predict.md). 
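Optionally, you can sanity-check the file you just wrote by loading it back into the same network object. This is only a quick sketch; it assumes `net` is still the LeNet defined above and that `net.params` was written by the previous cell. [Step 5: Predict with a pretrained model](5-predict.md) covers loading and using saved parameters in full.

```{.python .input}
# Quick check: reload the weights that were just saved into the same network.
# Assumes `net` and the 'net.params' file from the cells above.
net.load_parameters('net.params')
```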
diff --git a/docs/python_docs/python/tutorials/getting-started/crash-course/5-predict.md b/docs/python_docs/python/tutorials/getting-started/crash-course/5-predict.md
index 5141ae39acb4..a0b948a8d67c 100644
--- a/docs/python_docs/python/tutorials/getting-started/crash-course/5-predict.md
+++ b/docs/python_docs/python/tutorials/getting-started/crash-course/5-predict.md
@@ -15,24 +15,23 @@
-# Predict with a pre-trained model
+# Step 5: Predict with a pretrained model

-A saved model can be used in multiple places, such as to continue training, to fine tune the model, and for prediction. In this tutorial we will discuss how to predict new examples using a pretrained model.
+In this step, you learn how to predict new examples using a pretrained model. A saved model can be used in multiple places, such as to continue training, to fine tune the model, and for prediction.

## Prerequisites

-Please run the [previous tutorial](4-train.html) to train the network and save its parameters to file. You will need this file to run the following steps.
+Before you begin the procedures here, complete [Step 4: Train the neural network](4-train.md) to train the network and save its parameters to file. You use this file to run the following steps.

```{.python .input n=1}
-from mxnet import nd
-from mxnet import gluon
+from mxnet import np, npx, gluon, image
from mxnet.gluon import nn
-from mxnet.gluon.data.vision import datasets, transforms
from IPython import display
import matplotlib.pyplot as plt
+npx.set_np()
```

-To start, we will copy a simple model's definition.
+To start, copy a simple model's definition by using the following code.

```{.python .input n=2}
net = nn.Sequential()
@@ -40,13 +39,12 @@ net.add(nn.Conv2D(channels=6, kernel_size=5, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Conv2D(channels=16, kernel_size=3, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
-        nn.Flatten(),
        nn.Dense(120, activation="relu"),
        nn.Dense(84, activation="relu"),
        nn.Dense(10))
```

-In the last section, we saved all parameters into a file, now let's load it back.
+In the previous step, you saved all parameters to a file. Now load it back.

```{.python .input n=3}
net.load_parameters('net.params')
```

## Predict

-Remember the data transformation we did for training? Now we need the same transformation for predicting.
+Remember the data transformation you did for the training step? The following code provides the same transformation for predicting.

```{.python .input n=4}
-transformer = transforms.Compose([
-    transforms.ToTensor(),
-    transforms.Normalize(0.13, 0.31)])
+transformer = gluon.data.vision.transforms.Compose([
+    gluon.data.vision.transforms.ToTensor(),
+    gluon.data.vision.transforms.Normalize(0.13, 0.31)])
```

-Now let's try to predict the first ten images in the validation dataset and store the predictions into `preds`.
+Use the following code to predict the first ten images in the validation dataset and store the predictions into `preds`.

```{.python .input n=5}
-mnist_valid = datasets.FashionMNIST(train=False)
+mnist_valid = gluon.data.vision.datasets.FashionMNIST(train=False)
X, y = mnist_valid[:10]
preds = []
for x in X:
-    x = transformer(x).expand_dims(axis=0)
+    x = np.expand_dims(transformer(x), axis=0)
    pred = net(x).argmax(axis=1)
-    preds.append(pred.astype('int32').asscalar())
+    preds.append(int(pred))
```

-Finally, we visualize the images and compare the prediction with the ground truth.
+Finally, use the following code to visualize the images and compare the prediction with the ground truth. ```{.python .input n=15} _, figs = plt.subplots(1, 10, figsize=(15, 15)) text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat', 'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot'] display.set_matplotlib_formats('svg') -for f,x,yi,pyi in zip(figs, X, y, preds): +for f, x, yi, pyi in zip(figs, X, y, preds): f.imshow(x.reshape((28,28)).asnumpy()) ax = f.axes - ax.set_title(text_labels[yi]+'\n'+text_labels[pyi]) + ax.set_title(text_labels[int(yi)]+'\n'+text_labels[pyi]) ax.title.set_fontsize(14) ax.get_xaxis().set_visible(False) ax.get_yaxis().set_visible(False) @@ -94,39 +92,35 @@ plt.show() ## Predict with models from Gluon model zoo -The LeNet trained on FashionMNIST is a good example to start with, but too simple to predict real-life pictures. Instead of training large-scale model from scratch, [Gluon model zoo](https://mxnet.apache.org/api/python/gluon/model_zoo.html) provides multiple pre-trained powerful models. For example, we can download and load a pre-trained ResNet-50 V2 model that was trained on the ImageNet dataset. +The LeNet, trained on FashionMNIST, is a good example to start with. However, it's too simple to predict real-life pictures. In order to save the time and effort of training a large-scale model from scratch, the [Gluon model zoo](https://mxnet.incubator.apache.org/api/python/gluon/model_zoo.html) provides multiple pre-trained models. For example, with the following code example, you can download a pre-trained ResNet-50 V2 model that was trained on the ImageNet dataset. ```{.python .input n=7} -from mxnet.gluon.model_zoo import vision as models -from mxnet.gluon.utils import download -from mxnet import image - -net = models.resnet50_v2(pretrained=True) +net = gluon.model_zoo.vision.resnet50_v2(pretrained=True) ``` -We also download and load the text labels for each class. +You'll also need to download the text labels for each class, as in the following example. ```{.python .input n=8} url = 'http://data.mxnet.io/models/imagenet/synset.txt' -fname = download(url) +fname = gluon.utils.download(url) with open(fname, 'r') as f: text_labels = [' '.join(l.split()[1:]) for l in f] ``` -We randomly pick a dog image from Wikipedia as a test image, download and read it. +The following example shows how to select a dog image from Wikipedia as a test, download and read it. ```{.python .input n=9} url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/\ Golden_Retriever_medium-to-light-coat.jpg/\ 365px-Golden_Retriever_medium-to-light-coat.jpg' -fname = download(url) -x = image.imread(fname) +fname = gluon.utils.download(url) +x = image.imread(fname) # TODO, use npx.image instead ``` -Following the conventional way of preprocessing ImageNet data: +Following the conventional way of preprocessing ImageNet data, do the following: -1. Resize the short edge into 256 pixes, -2. And then perform a center crop to obtain a 224-by-224 image. +1. Resize the short edge into 256 pixes. +2. Perform a center crop to obtain a 224-by-224 image. ```{.python .input n=10} x = image.resize_short(x, 256) @@ -135,27 +129,31 @@ plt.imshow(x.asnumpy()) plt.show() ``` -Now you may know it is a golden retriever (You can also infer it from the image URL). +Now you can see it is a golden retriever. You can also infer it from the image URL. 
-The futher data transformation is similar to FashionMNIST except that we subtract the RGB means and divide by the corresponding variances to normalize each color channel. +The next data transformation is similar to FashionMNIST. Here, you subtract the RGB means and divide by the corresponding variances to normalize each color channel. ```{.python .input n=11} def transform(data): - data = data.transpose((2,0,1)).expand_dims(axis=0) - rgb_mean = nd.array([0.485, 0.456, 0.406]).reshape((1,3,1,1)) - rgb_std = nd.array([0.229, 0.224, 0.225]).reshape((1,3,1,1)) + data = np.expand_dims(np.transpose(data, (2,0,1)), axis=0) + rgb_mean = np.array([0.485, 0.456, 0.406]).reshape((1,3,1,1)) + rgb_std = np.array([0.229, 0.224, 0.225]).reshape((1,3,1,1)) return (data.astype('float32') / 255 - rgb_mean) / rgb_std ``` -Now we can recognize the object in the image now. We perform an additional softmax on the output to obtain probability scores. And then print the top-5 recognized objects. +Now you can recognize the object in the image. Perform an additional softmax on the output to obtain probability scores. Print the top-5 recognized objects. ```{.python .input n=12} -prob = net(transform(x)).softmax() -idx = prob.topk(k=5)[0] +prob = npx.softmax(net(transform(x))) +idx = npx.topk(prob, k=5)[0] for i in idx: - i = int(i.asscalar()) print('With prob = %.5f, it contains %s' % ( - prob[0,i].asscalar(), text_labels[i])) + prob[0, int(i)], text_labels[int(i)])) ``` -As can be seen, the model is fairly confident the image contains a golden retriever. +As can be seen, the model is fairly confident that the image contains a golden retriever. + +## Next Steps + +You might find that both training and prediction are a little bit slow. If you have a GPU +available, learn how to accomplish your tasks faster in [Step 6: Use GPUs to increase efficiency](6-use_gpus.md). diff --git a/docs/python_docs/python/tutorials/getting-started/crash-course/6-use_gpus.md b/docs/python_docs/python/tutorials/getting-started/crash-course/6-use_gpus.md index fc457ea7dc33..1e60d5f929b9 100644 --- a/docs/python_docs/python/tutorials/getting-started/crash-course/6-use_gpus.md +++ b/docs/python_docs/python/tutorials/getting-started/crash-course/6-use_gpus.md @@ -15,60 +15,60 @@ -# Use GPUs +# Step 6: Use GPUs to increase efficiency -We often use GPUs to train and deploy neural networks, because it offers significant more computation power compared to CPUs. In this tutorial we will introduce how to use GPUs with MXNet. +In this step, you learn how to use graphics processing units (GPUs) with MXNet. If you use GPUs to train and deploy neural networks, you get significantly more computational power when compared to central processing units (CPUs). -First, make sure you have at least one Nvidia GPU in your machine and CUDA -properly installed. Other GPUs such as AMD and Intel GPUs are not supported -yet. Then be sure you have installed the GPU-enabled version of MXNet. +## Prerequisites -```{.python .input n=15} -# If you pip installed the plain `mxnet` before, uncomment the -# following two lines to install the GPU version. You may need to -# replace `cu92` according to your CUDA version. -# !pip uninstall mxnet -# !pip install mxnet-cu92 +Before you start the other steps here, make sure you have at least one Nvidia GPU in your machine and CUDA properly installed. GPUs from AMD and Intel are not supported. Install the GPU-enabled version of MXNet. 
-from mxnet import nd, gpu, gluon, autograd +Use the following commands to check the number GPUs that are available. + +```{.python .input n=2} +from mxnet import np, npx, gluon, autograd from mxnet.gluon import nn -from mxnet.gluon.data.vision import datasets, transforms import time +npx.set_np() + +npx.num_gpus() ``` ## Allocate data to a GPU -You may notice that MXNet's NDArray is very similar to Numpy. One major difference is NDArray has a `context` attribute that specifies which device this array is on. By default, it is `cpu()`. Now we will change it to the first GPU. You can use `gpu()` or `gpu(0)` to indicate the first GPU. +MXNet's ndarray is very similar to NumPy. One major difference is MXNet's ndarray has a `context` attribute that specifies which device an array is on. By default, it is on `npx.cpu()`. Change it to the first GPU with the following code. Use `npx.gpu()` or `npx.gpu(0)` to indicate the first GPU. ```{.python .input n=10} -x = nd.ones((3,4), ctx=gpu()) +gpu = npx.gpu() if npx.num_gpus() > 0 else npx.cpu() +x = np.ones((3,4), ctx=gpu) x ``` -For a CPU, MXNet will allocate data on main memory, and try to use all CPU cores as possible, even if there is more than one CPU socket. While if there are multiple GPUs, MXNet needs to specify which GPUs the NDArray will be allocated. +If you're using a CPU, MXNet allocates data on main memory and tries to use as many CPU cores as possible. This is true even if there is more than one CPU socket. If there are multiple GPUs, MXNet specifies which GPUs the ndarray is allocated. -Let's assume there is a least one more GPU. We can create another NDArray and assign it there. (If you only have one GPU, then you will see an error). Here we copy `x` to the second GPU, `gpu(1)`: +Assume there is a least one more GPU. Create another ndarray and assign it there. If you only have one GPU, then you get an error. In the example code here, you copy `x` to the second GPU, `npx.gpu(1)`: ```{.python .input n=11} -x.copyto(gpu(1)) +gpu_1 = npx.gpu(1) if npx.num_gpus() > 1 else npx.cpu() +x.copyto(gpu_1) ``` -MXNet needs users to explicitly move data between devices. But several operators such as `print`, `asnumpy` and `asscalar`, will implicitly move data to main memory. +MXNet requries that users explicitly move data between devices. But several operators such as `print`, and `asnumpy`, will implicitly move data to main memory. ## Run an operation on a GPU -To perform an operation on a particular GPU, we only need to guarantee that the inputs of this operation are already on that GPU. The output will be allocated on the same GPU as well. Almost all operators in the `nd` module support running on a GPU. +To perform an operation on a particular GPU, you only need to guarantee that the input of an operation is already on that GPU. The output is allocated on the same GPU as well. Almost all operators in the `np` and `npx` module support running on a GPU. ```{.python .input n=21} -y = nd.random.uniform(shape=(3,4), ctx=gpu()) +y = np.random.uniform(size=(3,4), ctx=gpu) x + y ``` -Remember that if the inputs are not on the same GPU, you will see an error. +Remember that if the inputs are not on the same GPU, you get an error. ## Run a neural network on a GPU -Similarly, to run a neural network on a GPU, we only need to copy/move the input data and parameters to the GPU. Let's reuse the previously defined LeNet. +To run a neural network on a GPU, you only need to copy and move the input data and parameters to the GPU. 
Reuse the previously defined LeNet. The following code example shows this. ```{.python .input n=16} net = nn.Sequential() @@ -76,51 +76,50 @@ net.add(nn.Conv2D(channels=6, kernel_size=5, activation='relu'), nn.MaxPool2D(pool_size=2, strides=2), nn.Conv2D(channels=16, kernel_size=3, activation='relu'), nn.MaxPool2D(pool_size=2, strides=2), - nn.Flatten(), nn.Dense(120, activation="relu"), nn.Dense(84, activation="relu"), nn.Dense(10)) ``` -And then load the saved parameters into GPU 0 directly, or use `net.reset_ctx` to change the device. +Load the saved parameters into GPU 0 directly as shown here, or use `net.collect_params().reset_ctx` to change the device. ```{.python .input n=20} -net.load_parameters('net.params', ctx=gpu(0)) +net.load_parameters('net.params', ctx=gpu) ``` -Now create input data on GPU 0. The forward function will then run on GPU 0. +Use the following command to create input data on GPU 0. The forward function will then run on GPU 0. ```{.python .input n=22} -x = nd.random.uniform(shape=(1,1,28,28), ctx=gpu(0)) -net(x) +# x = np.random.uniform(size=(1,1,28,28), ctx=gpu) +# net(x) FIXME ``` -## [Advanced] Multi-GPU training +## Training with multiple GPUs -Finally, we show how to use multiple GPUs to jointly train a neural network through data parallelism. Let's assume there are *n* GPUs. We split each data batch into *n* parts, and then each GPU will run the forward and backward passes using one part of the data. +Finally, you can see how to use multiple GPUs to jointly train a neural network through data parallelism. Assume there are *n* GPUs. Split each data batch into *n* parts, and then each GPU will run the forward and backward passes using one part of the data. -Let's first copy the data definitions and the transform function from the [previous tutorial](5-predict.html). +First copy the data definitions with the following commands, and the transform function from the [Predict tutorial](5-predict.md). ```{.python .input} batch_size = 256 -transformer = transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize(0.13, 0.31)]) +transformer = gluon.data.vision.transforms.Compose([ + gluon.data.vision.transforms.ToTensor(), + gluon.data.vision.transforms.Normalize(0.13, 0.31)]) train_data = gluon.data.DataLoader( - datasets.FashionMNIST(train=True).transform_first(transformer), - batch_size, shuffle=True, num_workers=4) + gluon.data.vision.datasets.FashionMNIST(train=True).transform_first( + transformer), batch_size, shuffle=True, num_workers=4) valid_data = gluon.data.DataLoader( - datasets.FashionMNIST(train=False).transform_first(transformer), - batch_size, shuffle=False, num_workers=4) + gluon.data.vision.datasets.FashionMNIST(train=False).transform_first( + transformer), batch_size, shuffle=False, num_workers=4) ``` -The training loop is quite similar to what we introduced before. The major differences are highlighted in the following code. +The training loop is quite similar to that shown earlier. The major differences are highlighted in the following code. ```{.python .input} # Diff 1: Use two GPUs for training. 
-devices = [gpu(0), gpu(1)] +devices = [gpu, gpu_1] # Diff 2: reinitialize the parameters and place them on multiple GPUs -net.initialize(force_reinit=True, ctx=devices) +net.collect_params().initialize(force_reinit=True, ctx=devices) # Loss and trainer are the same as before softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss() trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1}) @@ -139,8 +138,14 @@ for epoch in range(10): for l in losses: l.backward() trainer.step(batch_size) - # Diff 5: sum losses over all devices - train_loss += sum([l.sum().asscalar() for l in losses]) + # Diff 5: sum losses over all devices. Here float will copy data + # into CPU. + train_loss += sum([float(l.sum()) for l in losses]) print("Epoch %d: loss %.3f, in %.1f sec" % ( epoch, train_loss/len(train_data)/batch_size, time.time()-tic)) ``` + +## Next steps + +Now you have completed training and predicting with a neural network by using NP on MXNet and +Gluon. You can check the guides to these two front ends: [What is NP on MXNet](../deepnumpy/index.html) and [gluon](../gluon_from_experiment_to_deployment.md). diff --git a/docs/python_docs/python/tutorials/getting-started/crash-course/index.rst b/docs/python_docs/python/tutorials/getting-started/crash-course/index.rst index 124c94ae0814..b9a86e0978c9 100644 --- a/docs/python_docs/python/tutorials/getting-started/crash-course/index.rst +++ b/docs/python_docs/python/tutorials/getting-started/crash-course/index.rst @@ -1,27 +1,8 @@ -.. Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. - -Crash Course +Getting started with NP on MXNet ============ -This crash course will give you a quick overview of the core concept of NDArray -(manipulating multiple dimensional arrays) and Gluon (create and train neural -networks). This is a good place to start if you are already familiar with -machine learning or other deep learning frameworks. +This crash course shows how to get started with NP on MXNet. The topics here provide a quick overview of the core concepts for both NP on MXNet, which helps you manipulate multiple dimensional arrays, and Gluon, which helps you create and train neural +networks. This is a good place to start if you are already familiar with machine learning or other deep learning frameworks. .. toctree:: :maxdepth: 1 @@ -33,20 +14,3 @@ machine learning or other deep learning frameworks. 4-train 5-predict 6-use_gpus - - -.. - # add back the videos until apis are updated. - You can also watch the video tutorials for this crash course. Note that two APIs - described in vidoes have changes: - - - ``with name_scope`` is not necessary any more. - - use ``save_parameters/load_parameters`` instead of ``save_params/load_params`` - - .. 
raw:: html - - - - diff --git a/docs/python_docs/python/tutorials/getting-started/deepnumpy/cheat-sheet.md b/docs/python_docs/python/tutorials/getting-started/deepnumpy/cheat-sheet.md new file mode 100644 index 000000000000..7f4d82605653 --- /dev/null +++ b/docs/python_docs/python/tutorials/getting-started/deepnumpy/cheat-sheet.md @@ -0,0 +1,463 @@ + + + + + + + + + + + + + + + + + +# The NP on MXNet cheat sheet + +To begin, import the `np` and `npx` module and update MXNet to run in +NumPy-like mode. + +```{.python .input n=1} +from mxnet import np, npx +npx.set_np() # Change MXNet to the numpy-like mode. +``` + +NDArray figure (TODO) + +## Creating arrays + +```{.python .input n=2} +np.array([1, 2, 3]) # default datatype is float32 +``` + +```{.python .input n=3} +np.array([(1.5, 2, 3), (4, 5, 6)], dtype='float16') +``` + +```{.python .input n=4} +np.array([[(15,2,3), (4,5,6)], [(3,2,1), (4,5,6)]], dtype='int32') +``` + +### Initial placeholders + +```{.python .input n=5} +np.zeros((3, 4)) # Create an array of zeros +``` + +```{.python .input n=6} +np.ones((2, 3, 4), dtype='int8') # Create an array of ones +``` + +```{.python .input n=7} +np.arange(10, 25, 5) # Create an array of evenly spaced values (step value) +``` + +```{.python .input n=8} +# Create an array of evenly spaced values (number of samples) +# np.linspace(0, 2, 9) +``` + +```{.python .input n=9} +# np.full((2, 2), 7) # Create a constant array +``` + +```{.python .input n=10} +# np.eye(2) # Create a 2X2 identity matrix +``` + +```{.python .input n=11} +# np.random.random((2, 2)) # Create an array with random values +``` + +```{.python .input n=12} +np.empty((3,2)) # Create an empty array +``` + +## I/O + +### Saving and loading on disk + +```{.python .input n=12} +# Save one array +a = np.array([1, 2, 3]) +npx.save('my_array', a) +npx.load('my_array') +``` + +```{.python .input n=20} +# Save a list of arrays +b = np.array([4, 6, 8]) +npx.save('my_arrays', [a, b]) # FIXME, cannot be a tuple +npx.load('my_arrays') +``` + +### Saving and loading text files + +```{.python .input n=20} +# np.loadtxt("myfile.txt") +# np.genfromtxt("my_file.csv", delimiter=',') +# np.savetxt("myarray.txt", a, delimiter=" ") +``` + +## Data types + +```{.python .input n=20} +# np.int64 # Signed 64-bit integer types +# np.float32 # Standard double-precision floating point +# np.complex # Complex numbers represented by 128 floats +# np.bool # Boolean type storing TRUE and FALSE values +# np.object # Python object type +# np.string_ # Fixed-length string type +# np.unicode_ # Fixed-length unicode type +``` + +## Inspecting your array + +```{.python .input n=21} +a.shape # Array dimensions +``` + +```{.python .input n=22} +len(a) # Length of array +``` + +```{.python .input n=23} +b.ndim # Number of array dimensions +``` + +```{.python .input n=24} +b.size # Number of array elements +``` + +```{.python .input n=25} +b.dtype # Data type of array elements +``` + +```{.python .input n=29} +# b.dtype.name # Name of data type +``` + +```{.python .input n=35} +b.astype('int') # Convert an array to a different type +``` + +## Asking For Help + +```{.python .input n=36} +# np.info(np.ndarray.dtype) +``` + +## Array mathematics + +### Arithmetic operations + +```{.python .input n=37} +a - b # Subtraction +``` + +```{.python .input n=38} +np.subtract(a, b) # Subtraction +``` + +```{.python .input n=39} +b + a # Addition +``` + +```{.python .input n=40} +np.add(b, a) # Addition +``` + +```{.python .input n=41} +a / b # Division +``` + +```{.python .input n=42} 
+np.divide(a,b) # Division +``` + +```{.python .input n=43} +a * b # Multiplication +``` + +```{.python .input n=44} +np.multiply(a, b) # Multiplication +``` + +```{.python .input n=45} +np.exp(b) # Exponentiation +``` + +```{.python .input n=46} +np.sqrt(b) # Square root +``` + +```{.python .input n=47} +np.sin(a) # Sines of an array +``` + +```{.python .input n=48} +np.cos(b) # Element-wise cosine +``` + +```{.python .input n=49} +np.log(a) # Element-wise natural logarithm +``` + +```{.python .input n=50} +a.dot(b) # Dot product +``` + +### Comparison + +### Aggregate functions + +```{.python .input n=51} +a.sum() # Array-wise sum +``` + +```{.python .input n=53} +# a.min() # Array-wise minimum value +``` + +```{.python .input n=57} +c = np.array(([[1,2,3], [2,3,4]])) +# c.max(axis=0) # Maximum value of an array row +``` + +```{.python .input n=56} +# c.cumsum(axis=1) # Cumulative sum of the elements +``` + +```{.python .input n=58} +a.mean() # Mean +``` + +```{.python .input n=60} +# b.median() # Median +``` + +```{.python .input n=61} +# a.corrcoef() # Correlation coefficient +``` + +```{.python .input n=63} +# np.std(b) # Standard deviation +``` + +## Copying arrays + +```{.python .input n=63} +# a.view() # Create a view of the array with the same data +``` + +```{.python .input n=63} +np.copy(a) # Create a copy of the array +``` + +```{.python .input n=63} +a.copy() # Create a deep copy of the array +``` + +## Sorting Arrays + +```{.python .input n=63} +# a.sort() # Sort an array +``` + +```{.python .input n=63} +# c.sort(axis=0) # Sort the elements of an array's axis +``` + +## Subsetting, slicing, indexing + +### Subsetting + +```{.python .input n=63} +a[2] # Select the element at the 2nd index 3 +``` + +```{.python .input n=63} +c[0,1] # Select the element at row 1 column 2 +``` + +### Slicing + +```{.python .input n=63} +a[0:2] # Select items at index 0 and 1 +``` + +```{.python .input n=63} +c[0:2,1] # Select items at rows 0 and 1 in column 1 +``` + +```{.python .input n=63} +c[:1] # Select all items at row 0 +``` + +```{.python .input n=63} +# c[1,...] 
# Same as [1,:,:] +``` + +```{.python .input n=63} +a[ : :-1] #Reversed array a array([3, 2, 1]) +``` + +### Boolean Indexing + +```{.python .input n=63} +# a[a<2] # Select elements from a less than 2 +``` + +### Fancy indexing + +```{.python .input n=63} +c[[1,0,1,0], [0,1,2,0]] # Select elements (1,0),(0,1),(1,2) and (0,0) +``` + +```{.python .input n=63} +c[[1,0,1,0]][:,[0,1,2,0]] # Select a subset of the matrix’s rows +``` + +## Array manipulation + +### Transposing array + +```{.python .input n=63} +np.transpose(c) # Permute array dimensions +``` + +```{.python .input n=63} +c.T # Permute array dimensions +``` + +### Changing array shape + +```{.python .input n=63} +# b.ravel() # Flatten the array +``` + +```{.python .input n=63} +# c.reshape(3,-2) # Reshape, but don’t change data +``` + +### Adding and removing elements + +```{.python .input n=63} +# c.resize((6,2)) # Return a new array with shape (6, 2) +``` + +```{.python .input n=63} +# np.append(h,g) # Append items to an array +``` + +```{.python .input n=63} +# np.insert(a, 1, 5) # Insert items in an array +``` + +```{.python .input n=63} +# np.delete(a, [1]) # Delete items from an array +``` + +### Combining arrays + +```{.python .input n=63} +np.concatenate((a,b),axis=0) # Concatenate arrays +``` + +```{.python .input n=63} +# np.vstack((a,b)) # Stack arrays vertically (row-wise) +``` + +```{.python .input n=63} +# np.r_[e,f] # Stack arrays vertically (row-wise) +``` + +```{.python .input n=63} +# np.hstack((e,f)) # Stack arrays horizontally (column-wise) +``` + +```{.python .input n=63} +# np.column_stack((a,d)) # Create stacked column-wise arrays +``` + +```{.python .input n=63} +# np.c_[a,d] # Create stacked column-wise arrays +``` + +### Splitting arrays + +```{.python .input n=63} +# np.hsplit(a,3) # Split the array horizontally at the 3rd index +``` + +```{.python .input n=63} +# np.vsplit(c,2) # Split the array vertically at the 2nd index +``` + +## Use GPUs + +Prerequisites: A GPU exists and GPU-enabled MXNet is installed. + +```{.python .input} +npx.num_gpus() # Query number of GPUs +``` + +```{.python .input} +npx.gpu(0), npx.gpu(1) # Context for the first and second GPUs +``` + +```{.python .input} +gpu_0 = npx.gpu(0) if npx.num_gpus() > 1 else npx.cpu() +g0 = np.zeros((2,3), ctx=gpu_0) # Create array on GPU 0 +g0 +``` + +```{.python .input} +gpu_1 = npx.gpu(1) if npx.num_gpus() > 2 else npx.cpu() +g1 = np.random.uniform(size=(2,3), ctx=gpu_1) # Create array on GPU 1 +g1 +``` + +```{.python .input} +# Copy to another GPU +g1.copyto(gpu_0) +``` + +```{.python .input} +# Return itself if matching the context, otherwise copy +g1.copyto(gpu_0), g1.copyto(gpu_0) +``` + +```{.python .input} +g1.context # Query the device an array is on +``` + +```{.python .input} +## The computation is performed by the devices on which the input arrays are +g0 + g1.copyto(gpu_0) +``` + +## Auto differentiation + +```{.python .input} +a.attach_grad() # Allocate gradient for a variable +a.grad # access the gradient +``` + +Compute the $\nabla_a b=\exp(2a)^T a$ + +```{.python .input} +from mxnet import autograd + +with autograd.record(): + b = np.exp(2*a).dot(a) +b.backward() +a.grad +``` + +**Acknowledgement** + +Adapted from www.datacamp.com. 
diff --git a/docs/python_docs/python/tutorials/getting-started/deepnumpy/deepnumpy-vs-numpy.md b/docs/python_docs/python/tutorials/getting-started/deepnumpy/deepnumpy-vs-numpy.md
new file mode 100644
index 000000000000..60c87af3ee54
--- /dev/null
+++ b/docs/python_docs/python/tutorials/getting-started/deepnumpy/deepnumpy-vs-numpy.md
@@ -0,0 +1,113 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+# Differences between NP on MXNet and NumPy
+
+This topic lists known differences between `mxnet.np` and `numpy`. With this quick reference, NumPy users can more easily adopt the MXNet NumPy-like API.
+
+```{.python .input}
+import numpy as onp  # 'o' stands for original
+from mxnet import np, npx
+npx.set_np()  # Configure MXNet to be NumPy-like
+```
+
+## Missing operators
+
+Many, but not all, operators in NumPy are supported in MXNet. You can find the missing operators listed in [NP on MXNet reference](/api/python/docs/api/ndarray/index.html). They're displayed in gray blocks instead of having links to their documents.
+
+In addition, an operator might not support all of the arguments available in NumPy. For example, MXNet does not support strides. Check the operator documentation for more details.
+
+## Extra functionalities
+
+The `mxnet.np` module aims to mimic NumPy. Most extra functionalities that enhance NumPy for deep learning use are available in other modules, such as `npx` for operators used in deep learning and `autograd` for automatic differentiation. The `np` module API is not yet complete. One notable addition is GPU support: creation routines accept a `ctx` argument.
+
+```{.python .input}
+gpu = npx.gpu() if npx.num_gpus() > 0 else npx.cpu()
+a = np.array(1, ctx=gpu)
+b = np.random.uniform(ctx=gpu)
+(a, b.context)
+```
+
+There are also methods to move data across devices.
+
+```{.python .input}
+a.copyto(npx.cpu()), b.as_in_context(npx.cpu())
+```
+
+## Default data types
+
+NumPy uses 64-bit floating-point numbers or 64-bit integers by default.
+
+```{.python .input}
+onp.array([1,2]).dtype, onp.array([1.2,2.3]).dtype
+```
+
+MXNet uses 32-bit floating-point numbers as the default data type, which is the standard choice for deep learning.
+
+```{.python .input}
+np.array([1,2]).dtype, np.array([1.2,2.3]).dtype
+```
+
+## Scalars
+
+NumPy has classes for scalars, whose base class is `numpy.generic`. Selecting a single element or applying a reduce operator returns a scalar.
+
+```{.python .input}
+a = onp.array([1,2])
+type(a[0]), type(a.sum())
+```
+
+A scalar is almost identical to a 0-rank tensor (TODO, there may be subtle differences), but it has a different class. You can check the data type with `isinstance`.
+
+```{.python .input}
+b = a[0]
+(b.ndim, b.size, isinstance(b, onp.generic), isinstance(b, onp.integer),
+ isinstance(b, onp.int64), isinstance(b, onp.ndarray))
+```
+
+MXNet returns a 0-rank `ndarray` for scalars. (TODO, may consider adding scalar classes later.)
+
+```{.python .input}
+a = np.array([1,2])
+type(a[0]), type(a.sum())
+```
+
+```{.python .input}
+b = a[0]
+b.ndim, b.size, isinstance(b, np.ndarray)
+```
+
+## Save
+
+The `npx.save` method saves data in a binary format that's not compatible with the NumPy format. For example, it contains the device information. (TODO, needs more discussion here.)
+
+```{.python .input}
+a = np.array(1, ctx=gpu)
+npx.save('a', a)
+npx.load('a')
+```
+
+## Matplotlib
+
+Sometimes an MXNet ndarray cannot be used directly by libraries that accept NumPy input, for example matplotlib. The best practice is to convert it to NumPy with `asnumpy()`.
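+
+For instance, a minimal round trip might look like the following sketch (the
+variable names are illustrative and not part of the original topic): convert
+the MXNet ndarray to NumPy with `asnumpy()`, hand it to the NumPy-based
+library, and wrap any NumPy result back with `np.array` if needed.
+
+```{.python .input}
+x = np.ones((2,3))   # MXNet ndarray
+x_np = x.asnumpy()   # plain numpy.ndarray; the data is copied
+y = np.array(x_np)   # convert back to an MXNet ndarray
+type(x_np), type(y)
+```
+
+The plotting example below follows the same pattern, converting just before
+calling matplotlib.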
+ +```{.python .input} +%matplotlib inline +import matplotlib.pyplot as plt + +plt.plot(np.array([1,2]).asnumpy()); +``` diff --git a/docs/python_docs/python/tutorials/getting-started/deepnumpy/index.rst b/docs/python_docs/python/tutorials/getting-started/deepnumpy/index.rst new file mode 100644 index 000000000000..cd56ac46c869 --- /dev/null +++ b/docs/python_docs/python/tutorials/getting-started/deepnumpy/index.rst @@ -0,0 +1,32 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +What is NP on MXNet +===================== + +NP on MXNet provides a NumPy-like interface with extensions +for deep learning. It contains two modules, ``mxnet.np``, which is similar to +NumPy, and ``mxnet.npx``, which contains extended operators that are useful for deep +learning. + +If this is your first time using NP on MXNet, we recommend that you review the following topics in this section: + +.. toctree:: + :maxdepth: 1 + + cheat-sheet + deepnumpy-vs-numpy \ No newline at end of file diff --git a/docs/python_docs/python/tutorials/getting-started/index.rst b/docs/python_docs/python/tutorials/getting-started/index.rst index 9402f14061d6..709ee0314b9d 100644 --- a/docs/python_docs/python/tutorials/getting-started/index.rst +++ b/docs/python_docs/python/tutorials/getting-started/index.rst @@ -28,6 +28,12 @@ The following tutorials teach how to use MXNet. A quick overview of the core concepts of MXNet using the Gluon API. + .. card:: + :title: What is NP on MXNet + :link: deepnumpy/index.html + + What is NP on MXNet + .. card:: :title: Moving from other frameworks :link: to-mxnet/index.html @@ -57,6 +63,7 @@ The following tutorials teach how to use MXNet. :maxdepth: 2 crash-course/index + deepnumpy/index to-mxnet/index gluon_from_experiment_to_deployment logistic_regression_explained.md diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/arrays.indexing.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/arrays.indexing.rst new file mode 100644 index 000000000000..e073d2100454 --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/arrays.indexing.rst @@ -0,0 +1,356 @@ +.. _arrays.indexing: + +Indexing +======== + +.. sectionauthor:: adapted from "Guide to NumPy" by Travis E. Oliphant + +.. currentmodule:: mxnet.np + +.. index:: indexing, slicing + +:class:`ndarrays ` can be indexed using the standard Python +``x[obj]`` syntax, where *x* is the array and *obj* the selection. +There are three kinds of indexing available: basic +slicing, advanced indexing, and boolean mask indexing. Which one occurs depends on *obj*. + +.. note:: + + In Python, ``x[(exp1, exp2, ..., expN)]`` is equivalent to + ``x[exp1, exp2, ..., expN]``; the latter is just syntactic sugar + for the former. 
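+
+.. admonition:: Example
+
+   A small illustrative sketch of the equivalence (the exact output ``repr``
+   may differ slightly across versions); both spellings select the same
+   element:
+
+   >>> x = np.array([[1, 2, 3], [4, 5, 6]])
+   >>> x[(0, 2)]
+   array(3.)
+   >>> x[0, 2]
+   array(3.)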
+ + +Basic Slicing and Indexing +-------------------------- + +Basic slicing extends Python's basic concept of slicing to N +dimensions. Basic slicing occurs when *obj* is a :class:`slice` object +(constructed by ``start:stop:step`` notation inside of brackets), an +integer, or a tuple of slice objects and integers. :const:`Ellipsis` +and :const:`newaxis` objects can be interspersed with these as +well. + +The simplest case of indexing with *N* integers returns an :ref:`array +scalar ` representing the corresponding item. As in +Python, all indices are zero-based: for the *i*-th index :math:`n_i`, +the valid range is :math:`0 \le n_i < d_i` where :math:`d_i` is the +*i*-th element of the shape of the array. Negative indices are +interpreted as counting from the end of the array (*i.e.*, if +:math:`n_i < 0`, it means :math:`n_i + d_i`). + +All arrays generated by basic slicing are always :term:`views ` +of the original array if the fetched elements are contiguous in memory. + +The standard rules of sequence slicing apply to basic slicing on a +per-dimension basis (including using a step index). Some useful +concepts to remember include: + +- The basic slice syntax is ``i:j:k`` where *i* is the starting index, + *j* is the stopping index, and *k* is the step (:math:`k\neq0`). + This selects the *m* elements (in the corresponding dimension) with + index values *i*, *i + k*, ..., *i + (m - 1) k* where + :math:`m = q + (r\neq0)` and *q* and *r* are the quotient and remainder + obtained by dividing *j - i* by *k*: *j - i = q k + r*, so that + *i + (m - 1) k < j*. + + .. admonition:: Example + + >>> x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) + >>> x[1:7:2] + array([1, 3, 5]) + +- Negative *i* and *j* are interpreted as *n + i* and *n + j* where + *n* is the number of elements in the corresponding dimension. + Negative *k* makes stepping go towards smaller indices. + + .. admonition:: Example + + >>> x[-2:10] + array([8, 9]) + >>> x[-3:3:-1] + array([7, 6, 5, 4]) + +- Assume *n* is the number of elements in the dimension being + sliced. Then, if *i* is not given it defaults to 0 for *k > 0* and + *n - 1* for *k < 0* . If *j* is not given it defaults to *n* for *k > 0* + and *-n-1* for *k < 0* . If *k* is not given it defaults to 1. Note that + ``::`` is the same as ``:`` and means select all indices along this + axis. + + .. admonition:: Example + + >>> x[5:] + array([5, 6, 7, 8, 9]) + +- If the number of objects in the selection tuple is less than + *N* , then ``:`` is assumed for any subsequent dimensions. + + .. admonition:: Example + + >>> x = np.array([[[1],[2],[3]], [[4],[5],[6]]]) + >>> x.shape + (2, 3, 1) + >>> x[1:2] + array([[[4], + [5], + [6]]]) + +- :const:`Ellipsis` expands to the number of ``:`` objects needed for the + selection tuple to index all dimensions. In most cases, this means that + length of the expanded selection tuple is ``x.ndim``. There may only be a + single ellipsis present. + + .. admonition:: Example + + >>> x[...,0] + array([[1, 2, 3], + [4, 5, 6]]) + +- Each :const:`newaxis` object in the selection tuple serves to expand + the dimensions of the resulting selection by one unit-length + dimension. The added dimension is the position of the :const:`newaxis` + object in the selection tuple. + + .. admonition:: Example + + >>> x[:,np.newaxis,:,:].shape + (2, 1, 3, 1) + +- An integer, *i*, returns the same values as ``i:i+1`` + **except** the dimensionality of the returned object is reduced by + 1. 
In particular, a selection tuple with the *p*-th + element an integer (and all other entries ``:``) returns the + corresponding sub-array with dimension *N - 1*. If *N = 1* + then the returned object is an scalar `ndarray` whose `ndim=0`. + +- If the selection tuple has all entries ``:`` except the + *p*-th entry which is a slice object ``i:j:k``, + then the returned array has dimension *N* formed by + concatenating the sub-arrays returned by integer indexing of + elements *i*, *i+k*, ..., *i + (m - 1) k < j*, + +- Basic slicing with more than one non-``:`` entry in the slicing + tuple, acts like repeated application of slicing using a single + non-``:`` entry, where the non-``:`` entries are successively taken + (with all other non-``:`` entries replaced by ``:``). Thus, + ``x[ind1,...,ind2,:]`` acts like ``x[ind1][...,ind2,:]`` under basic + slicing. + + .. warning:: The above is **not** true for advanced indexing. + +- You may use slicing to set values in the array, but (unlike lists) you + can never grow the array. The size of the value to be set in + ``x[obj] = value`` must be (broadcastable) to the same shape as + ``x[obj]``. + +.. note:: + + Remember that a slicing tuple can always be constructed as *obj* + and used in the ``x[obj]`` notation. Slice objects can be used in + the construction in place of the ``[start:stop:step]`` + notation. For example, ``x[1:10:5,::-1]`` can also be implemented + as ``obj = (slice(1,10,5), slice(None,None,-1)); x[obj]`` . This + can be useful for constructing generic code that works on arrays + of arbitrary dimension. + +.. data:: newaxis + :noindex: + + The :const:`newaxis` object can be used in all slicing operations to + create an axis of length one. :const:`newaxis` is an alias for + 'None', and 'None' can be used in place of this with the same result. + + +Advanced Indexing +----------------- + +Advanced indexing is triggered when the selection object, *obj*, is a +non-tuple sequence object, an :class:`ndarray` (of data type integer or bool), +or a tuple with at least one sequence object or ndarray (of data type +integer or bool). There are two types of advanced indexing: integer +and Boolean. + +Advanced indexing always returns a *copy* of the data (contrast with +some cases in basic slicing that returns a :term:`view`). + +.. warning:: + + The definition of advanced indexing means that ``x[(1,2,3),]`` is + fundamentally different than ``x[(1,2,3)]``. The latter is + equivalent to ``x[1,2,3]`` which will trigger basic selection while + the former will trigger advanced indexing. Be sure to understand + why this occurs. + + Also recognize that ``x[[1,2,3]]`` will trigger advanced indexing, + whereas due to the deprecated Numeric compatibility mentioned above, + ``x[[1,2,slice(None)]]`` will trigger basic slicing in the official NumPy + which is not currently supported in MXNet `numpy` module. + +Integer array indexing +^^^^^^^^^^^^^^^^^^^^^^ + +Integer array indexing allows selection of arbitrary items in the array +based on their *N*-dimensional index. Each integer array represents a number +of indexes into that dimension. + +Purely integer array indexing +""""""""""""""""""""""""""""" + +When the index consists of as many integer arrays as the array being indexed +has dimensions, the indexing is straight forward, but different from slicing. 
+ +Advanced indexes always are broadcasting and +iterated as *one*:: + + result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M], + ..., ind_N[i_1, ..., i_M]] + +Note that the result shape is identical to the (broadcast) indexing array +shapes ``ind_1, ..., ind_N``. + +.. admonition:: Example + + From each row, a specific element should be selected. The row index is just + ``[0, 1, 2]`` and the column index specifies the element to choose for the + corresponding row, here ``[0, 1, 0]``. Using both together the task + can be solved using advanced indexing: + + >>> x = np.array([[1, 2], [3, 4], [5, 6]]) + >>> x[[0, 1, 2], [0, 1, 0]] + array([1, 4, 5]) + +Combining advanced and basic indexing +""""""""""""""""""""""""""""""""""""" + +When there is at least one slice (``:``), ellipsis (``...``) or :const:`newaxis` +in the index (or the array has more dimensions than there are advanced indexes), +then the behaviour can be more complicated. It is like concatenating the +indexing result for each advanced index element + +In the simplest case, there is only a *single* advanced index. A single +advanced index can for example replace a slice and the result array will be +the same, however, it is a copy and may have a different memory layout. +A slice is preferable when it is possible. + +.. admonition:: Example + + >>> x[1:2, 1:3] + array([[4, 5]]) + >>> x[1:2, [1, 2]] + array([[4, 5]]) + +The easiest way to understand the situation may be to think in +terms of the result shape. There are two parts to the indexing operation, +the subspace defined by the basic indexing (excluding integers) and the +subspace from the advanced indexing part. Two cases of index combination +need to be distinguished: + +* The advanced indexes are separated by a slice, :const:`Ellipsis` or :const:`newaxis`. + For example ``x[arr1, :, arr2]``. +* The advanced indexes are all next to each other. + For example ``x[..., arr1, arr2, :]`` but *not* ``x[arr1, :, 1]`` + since ``1`` is an advanced index in this regard. + +In the first case, the dimensions resulting from the advanced indexing +operation come first in the result array, and the subspace dimensions after +that. +In the second case, the dimensions from the advanced indexing operations +are inserted into the result array at the same spot as they were in the +initial array (the latter logic is what makes simple advanced indexing +behave just like slicing). + +.. admonition:: Example + + Suppose ``x.shape`` is (10,20,30) and ``ind`` is a (2,3,4)-shaped + indexing :class:`intp` array, then ``result = x[...,ind,:]`` has + shape (10,2,3,4,30) because the (20,)-shaped subspace has been + replaced with a (2,3,4)-shaped broadcasted indexing subspace. If + we let *i, j, k* loop over the (2,3,4)-shaped subspace then + ``result[...,i,j,k,:] = x[...,ind[i,j,k],:]``. This example + produces the same result as :meth:`x.take(ind, axis=-2) `. + +.. admonition:: Example + + Let ``x.shape`` be (10,20,30,40,50) and suppose ``ind_1`` + and ``ind_2`` can be broadcast to the shape (2,3,4). Then + ``x[:,ind_1,ind_2]`` has shape (10,2,3,4,40,50) because the + (20,30)-shaped subspace from X has been replaced with the + (2,3,4) subspace from the indices. However, + ``x[:,ind_1,:,ind_2]`` has shape (2,3,4,10,30,50) because there + is no unambiguous place to drop in the indexing subspace, thus + it is tacked-on to the beginning. It is always possible to use + :meth:`.transpose() ` to move the subspace + anywhere desired. 
Note that this example cannot be replicated + using :func:`take`. + + +Boolean array indexing +^^^^^^^^^^^^^^^^^^^^^^ + +This advanced indexing occurs when obj is an array object of Boolean +type, such as may be returned from comparison operators. A single +boolean index array is practically identical to ``x[obj.nonzero()]`` where, +as described above, :meth:`obj.nonzero() ` returns a +tuple (of length :attr:`obj.ndim `) of integer index +arrays showing the :const:`True` elements of *obj*. However, it is +faster when ``obj.shape == x.shape``. + +If ``obj.ndim == x.ndim``, ``x[obj]`` returns a 1-dimensional array +filled with the elements of *x* corresponding to the :const:`True` +values of *obj*. The search order will be :term:`row-major`, +C-style. If *obj* has :const:`True` values at entries that are outside +of the bounds of *x*, then an index error will be raised. If *obj* is +smaller than *x* it is identical to filling it with :const:`False`. + +.. note:: + +Boolean indexing currently only supports a single boolean ndarray as a index. +An composite index including a boolean array is not supported for now. + +If there is only one Boolean array and no integer indexing array present, +this is straight forward. Care must only be taken to make sure that the +boolean index has *exactly* as many dimensions as it is supposed to work +with. + +.. admonition:: Example + + From an array, select all rows which sum up to less or equal two: + + >>> x = np.array([[0, 1], [1, 1], [2, 2]], dtype=np.int32) + >>> rowsum = x.sum(-1) + >>> x[rowsum <= 2] + array([[0, 1], + [1, 1]], dtype=int32) + + But if ``rowsum`` would have two dimensions as well: + + >>> rowsum = x.sum(-1, keepdims=True) + >>> rowsum.shape + (3, 1) + >>> x[rowsum <= 2] # fail + IndexError: boolean index did not match indexed array along dimension 1 + +Detailed notes +-------------- + +These are some detailed notes, which are not of importance for day to day +indexing (in no particular order): + +* For advanced assignments, there is in general no guarantee for the + iteration order. This means that if an element is set more than once, + it is not possible to predict the final result. +* An empty (tuple) index is a full scalar index into a zero dimensional array. + ``x[()]`` returns a *scalar* `ndarray` if ``x`` has zero dimensions. + On the other hand ``x[...]`` always returns a view. +* If a zero dimensional array is present in the index *and* it is *not considered as* a full + integer index as in NumPy. Advanced indexing is not triggered. +* the ``nonzero`` equivalence for Boolean arrays does not hold for zero + dimensional boolean arrays. +* When the result of an advanced indexing operation has no elements but an + individual index is out of bounds, currently no ``IndexError`` is + raised as in NumPy. + +.. index:: + single: indexing + single: ndarray diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/arrays.ndarray.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/arrays.ndarray.rst new file mode 100644 index 000000000000..a0e9a8707010 --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/arrays.ndarray.rst @@ -0,0 +1,631 @@ +.. _arrays.ndarray: + +****************************************** +The N-dimensional array (:class:`ndarray`) +****************************************** + +.. currentmodule:: mxnet.np + +An :class:`ndarray` is a (usually fixed-size) multidimensional +container of items of the same type and size. 
The number of dimensions +and items in an array is defined by its :attr:`shape `, +which is a :class:`tuple` of *N* non-negative integers that specify the +sizes of each dimension. The type of items in the array is specified by +a separate :ref:`data-type object (dtype) `, one of which +is associated with each ndarray. + +As with other container objects in Python, the contents of an +:class:`ndarray` can be accessed and modified by :ref:`indexing or +slicing ` the array (using, for example, *N* integers), +and via the methods and attributes of the :class:`ndarray`. + +.. index:: view, base + +Different :class:`ndarrays ` can share the same data, so that +changes made in one :class:`ndarray` may be visible in another. That +is, an ndarray can be a *"view"* to another ndarray, and the data it +is referring to is taken care of by the *"base"* ndarray. + + +.. admonition:: Example + + A 2-dimensional array of size 2 x 3, composed of 4-byte integer + elements: + + >>> x = np.array([[1, 2, 3], [4, 5, 6]], np.int32) + >>> type(x) + + >>> x.shape + (2, 3) + >>> x.dtype + dtype('int32') + + The array can be indexed using Python container-like syntax: + + >>> # The element of x in the *second* row, *third* column, namely, 6. + >>> x[1, 2] + array(6, dtype=int32) # this is different than the official NumPy which returns a np.int32 object + + For example :ref:`slicing ` can produce views of + the array if the elements to be sliced is continguous in memory: + + >>> y = x[1,:] + >>> y + array([9, 5, 6], dtype=int32) # this also changes the corresponding element in x + >>> x + array([[1, 2, 3], + [9, 5, 6]], dtype=int32) + + +Constructing arrays +=================== + +New arrays can be constructed using the routines detailed in +:ref:`routines.array-creation`, and also by using the low-level +:class:`ndarray` constructor: + +.. autosummary:: + :toctree: generated/ + + ndarray + +:: + + +Indexing arrays +=============== + +Arrays can be indexed using an extended Python slicing syntax, +``array[selection]``. Similar syntax is also used for accessing +fields in a :term:`structured data type`. + +.. seealso:: :ref:`Array Indexing `. + +.. _memory-layout: + +Internal memory layout of an ndarray +==================================== + +An instance of class :class:`ndarray` consists of a contiguous +one-dimensional segment of computer memory (owned by the array, or by +some other object), combined with an indexing scheme that maps *N* +integers into the location of an item in the block. The ranges in +which the indices can vary is specified by the :obj:`shape +` of the array. How many bytes each item takes and how +the bytes are interpreted is defined by the :ref:`data-type object +` associated with the array. + +.. index:: C-order, Fortran-order, row-major, column-major, stride, + offset + +.. note:: + + `mxnet.numpy.ndarray` currently only supports storing elements in + C-order/row-major and contiguous memory space. The following content + on explaining a variety of memory layouts of an ndarray + are copied from the official NumPy documentation as a comprehensive reference. + +A segment of memory is inherently 1-dimensional, and there are many +different schemes for arranging the items of an *N*-dimensional array +in a 1-dimensional block. NumPy is flexible, and :class:`ndarray` +objects can accommodate any *strided indexing scheme*. In a strided +scheme, the N-dimensional index :math:`(n_0, n_1, ..., n_{N-1})` +corresponds to the offset (in bytes): + +.. 
math:: n_{\mathrm{offset}} = \sum_{k=0}^{N-1} s_k n_k + +from the beginning of the memory block associated with the +array. Here, :math:`s_k` are integers which specify the :obj:`strides +` of the array. The :term:`column-major` order (used, +for example, in the Fortran language and in *Matlab*) and +:term:`row-major` order (used in C) schemes are just specific kinds of +strided scheme, and correspond to memory that can be *addressed* by the strides: + +.. math:: + + s_k^{\mathrm{column}} = \mathrm{itemsize} \prod_{j=0}^{k-1} d_j , + \quad s_k^{\mathrm{row}} = \mathrm{itemsize} \prod_{j=k+1}^{N-1} d_j . + +.. index:: single-segment, contiguous, non-contiguous + +where :math:`d_j` `= self.shape[j]`. + +Both the C and Fortran orders are :term:`contiguous`, *i.e.,* +single-segment, memory layouts, in which every part of the +memory block can be accessed by some combination of the indices. + +While a C-style and Fortran-style contiguous array, which has the corresponding +flags set, can be addressed with the above strides, the actual strides may be +different. This can happen in two cases: + + 1. If ``self.shape[k] == 1`` then for any legal index ``index[k] == 0``. + This means that in the formula for the offset :math:`n_k = 0` and thus + :math:`s_k n_k = 0` and the value of :math:`s_k` `= self.strides[k]` is + arbitrary. + 2. If an array has no elements (``self.size == 0``) there is no legal + index and the strides are never used. Any array with no elements may be + considered C-style and Fortran-style contiguous. + +Point 1. means that ``self`` and ``self.squeeze()`` always have the same +contiguity and ``aligned`` flags value. This also means +that even a high dimensional array could be C-style and Fortran-style +contiguous at the same time. + +.. index:: aligned + +An array is considered aligned if the memory offsets for all elements and the +base offset itself is a multiple of `self.itemsize`. Understanding +`memory-alignment` leads to better performance on most hardware. + +.. note:: + + Points (1) and (2) are not yet applied by default. Beginning with + NumPy 1.8.0, they are applied consistently only if the environment + variable ``NPY_RELAXED_STRIDES_CHECKING=1`` was defined when NumPy + was built. Eventually this will become the default. + + You can check whether this option was enabled when your NumPy was + built by looking at the value of ``np.ones((10,1), + order='C').flags.f_contiguous``. If this is ``True``, then your + NumPy has relaxed strides checking enabled. + +.. warning:: + + It does *not* generally hold that ``self.strides[-1] == self.itemsize`` + for C-style contiguous arrays or ``self.strides[0] == self.itemsize`` for + Fortran-style contiguous arrays is true. + +Data in new :class:`ndarrays ` is in the :term:`row-major` +(C) order, unless otherwise specified, but, for example, :ref:`basic +array slicing ` often produces :term:`views ` +in a different scheme. + +.. seealso: :ref:`Indexing `_ + +.. note:: + + Several algorithms in NumPy work on arbitrarily strided arrays. + However, some algorithms require single-segment arrays. When an + irregularly strided array is passed in to such algorithms, a copy + is automatically made. + +.. _arrays.ndarray.attributes: + +Array attributes +================ + +Array attributes reflect information that is intrinsic to the array +itself. Generally, accessing an array through its attributes allows +you to get and sometimes set intrinsic properties of the array without +creating a new array. 
The exposed attributes are the core parts of an +array and only some of them can be reset meaningfully without creating +a new array. Information on each attribute is given below. + +Memory layout +------------- + +The following attributes contain information about the memory layout +of the array: + +.. autosummary:: + :toctree: generated/ + + ndarray.shape + ndarray.ndim + ndarray.size + +:: + + ndarray.flags + ndarray.strides + ndarray.data + ndarray.itemsize + ndarray.nbytes + ndarray.base + +Data type +--------- + +.. seealso:: :ref:`Data type objects ` + +The data type object associated with the array can be found in the +:attr:`dtype ` attribute: + +.. autosummary:: + :toctree: generated/ + + ndarray.dtype + +Other attributes +---------------- + +.. autosummary:: + :toctree: generated/ + + ndarray.T + +:: + + ndarray.real + ndarray.imag + ndarray.flat + ndarray.ctypes + +.. _array.ndarray.methods: + +Array methods +============= + +An :class:`ndarray` object has many methods which operate on or with +the array in some fashion, typically returning an array result. These +methods are briefly explained below. (Each method's docstring has a +more complete description.) + +For the following methods there are also corresponding functions in +:mod:`numpy`: :func:`all`, :func:`any`, :func:`argmax`, +:func:`argmin`, :func:`argpartition`, :func:`argsort`, :func:`choose`, +:func:`clip`, :func:`compress`, :func:`copy`, :func:`cumprod`, +:func:`cumsum`, :func:`diagonal`, :func:`imag`, :func:`max `, +:func:`mean`, :func:`min `, :func:`nonzero`, :func:`partition`, +:func:`prod`, :func:`ptp`, :func:`put`, :func:`ravel`, :func:`real`, +:func:`repeat`, :func:`reshape`, :func:`round `, +:func:`searchsorted`, :func:`sort`, :func:`squeeze`, :func:`std`, +:func:`sum`, :func:`swapaxes`, :func:`take`, :func:`trace`, +:func:`transpose`, :func:`var`. + +Array conversion +---------------- + +.. autosummary:: + :toctree: generated/ + + ndarray.item + ndarray.copy + ndarray.tolist + ndarray.astype + +:: + + ndarray.itemset + ndarray.tostring + ndarray.tobytes + ndarray.tofile + ndarray.dump + ndarray.dumps + ndarray.byteswap + ndarray.view + ndarray.getfield + ndarray.setflags + ndarray.fill + +Shape manipulation +------------------ + +For reshape, resize, and transpose, the single tuple argument may be +replaced with ``n`` integers which will be interpreted as an n-tuple. + +.. autosummary:: + :toctree: generated/ + + ndarray.reshape + ndarray.transpose + ndarray.swapaxes + ndarray.flatten + ndarray.squeeze + +:: + + ndarray.resize + ndarray.ravel + +Item selection and manipulation +------------------------------- + +For array methods that take an *axis* keyword, it defaults to +:const:`None`. If axis is *None*, then the array is treated as a 1-D +array. Any other value for *axis* represents the dimension along which +the operation should proceed. + +.. autosummary:: + :toctree: generated/ + + ndarray.nonzero + ndarray.take + ndarray.repeat + + +:: + + ndarray.argsort + ndarray.sort + ndarray.put + ndarray.choose + ndarray.partition + ndarray.argpartition + ndarray.searchsorted + ndarray.compress + ndarray.diagonal + +Calculation +----------- + +.. index:: axis + +Many of these methods take an argument named *axis*. In such cases, + +- If *axis* is *None* (the default), the array is treated as a 1-D + array and the operation is performed over the entire array. This + behavior is also the default if self is a 0-dimensional array or + array scalar. 
(An array scalar is an instance of the types/classes + float32, float64, etc., whereas a 0-dimensional array is an ndarray + instance containing precisely one array scalar.) + +- If *axis* is an integer, then the operation is done over the given + axis (for each 1-D subarray that can be created along the given axis). + +.. admonition:: Example of the *axis* argument + + A 3-dimensional array of size 3 x 3 x 3, summed over each of its + three axes + + >>> x + array([[[ 0, 1, 2], + [ 3, 4, 5], + [ 6, 7, 8]], + [[ 9, 10, 11], + [12, 13, 14], + [15, 16, 17]], + [[18, 19, 20], + [21, 22, 23], + [24, 25, 26]]]) + >>> x.sum(axis=0) + array([[27, 30, 33], + [36, 39, 42], + [45, 48, 51]]) + >>> # for sum, axis is the first keyword, so we may omit it, + >>> # specifying only its value + >>> x.sum(0), x.sum(1), x.sum(2) + (array([[27, 30, 33], + [36, 39, 42], + [45, 48, 51]]), + array([[ 9, 12, 15], + [36, 39, 42], + [63, 66, 69]]), + array([[ 3, 12, 21], + [30, 39, 48], + [57, 66, 75]])) + +The parameter *dtype* specifies the data type over which a reduction +operation (like summing) should take place. The default reduce data +type is the same as the data type of *self*. To avoid overflow, it can +be useful to perform the reduction using a larger data type. + +For several methods, an optional *out* argument can also be provided +and the result will be placed into the output array given. The *out* +argument must be an :class:`ndarray` and have the same number of +elements. It can have a different data type in which case casting will +be performed. + +.. autosummary:: + :toctree: generated/ + + ndarray.max + ndarray.argmax + ndarray.min + ndarray.argmin + ndarray.clip + ndarray.sum + ndarray.mean + ndarray.prod + ndarray.cumsum + ndarray.var + ndarray.std + +:: + + ndarray.round + ndarray.ptp + ndarray.conj + ndarray.trace + ndarray.cumprod + ndarray.all + ndarray.any + +Arithmetic, matrix multiplication, and comparison operations +============================================================ + +.. index:: comparison, arithmetic, matrix, operation, operator + +Arithmetic and comparison operations on :class:`ndarrays ` +are defined as element-wise operations, and generally yield +:class:`ndarray` objects as results. + +Each of the arithmetic operations (``+``, ``-``, ``*``, ``/``, ``//``, +``%``, ``divmod()``, ``**`` or ``pow()``, ``<<``, ``>>``, ``&``, +``^``, ``|``, ``~``) and the comparisons (``==``, ``<``, ``>``, +``<=``, ``>=``, ``!=``) is equivalent to the corresponding +universal function (or :term:`ufunc` for short) in NumPy. For +more information, see the section on :ref:`Universal Functions +`. + +Comparison operators: + +.. autosummary:: + :toctree: generated/ + + ndarray.__lt__ + ndarray.__le__ + ndarray.__gt__ + ndarray.__ge__ + ndarray.__eq__ + ndarray.__ne__ + +Truth value of an array (:func:`bool()`): + +.. autosummary:: + :toctree: generated/ + + ndarray.__bool__ + +.. note:: + + Truth-value testing of an array invokes + :meth:`ndarray.__bool__`, which raises an error if the number of + elements in the array is larger than 1, because the truth value + of such arrays is ambiguous. + + +Unary operations: + +.. autosummary:: + :toctree: generated/ + + ndarray.__neg__ + +:: + + ndarray.__pos__ + ndarray.__abs__ + ndarray.__invert__ + +Arithmetic: + +.. 
autosummary:: + :toctree: generated/ + + ndarray.__add__ + ndarray.__sub__ + ndarray.__mul__ + ndarray.__truediv__ + ndarray.__mod__ + ndarray.__pow__ + +:: + + ndarray.__floordiv__ + ndarray.__divmod__ + ndarray.__lshift__ + ndarray.__rshift__ + ndarray.__and__ + ndarray.__or__ + ndarray.__xor__ + +.. note:: + + - Any third argument to :func:`pow()` is silently ignored, + as the underlying :func:`ufunc ` takes only two arguments. + + - The three division operators are all defined; :obj:`div` is active + by default, :obj:`truediv` is active when + :obj:`__future__` division is in effect. + + - Because :class:`ndarray` is a built-in type (written in C), the + ``__r{op}__`` special methods are not directly defined. + + - The functions called to implement many arithmetic special methods + for arrays can be modified using :class:`__array_ufunc__ `. + +Arithmetic, in-place: + +.. autosummary:: + :toctree: generated/ + + ndarray.__iadd__ + ndarray.__isub__ + ndarray.__imul__ + ndarray.__itruediv__ + ndarray.__imod__ + +:: + + ndarray.__ifloordiv__ + ndarray.__ipow__ + ndarray.__ilshift__ + ndarray.__irshift__ + ndarray.__iand__ + ndarray.__ior__ + ndarray.__ixor__ + +.. warning:: + + In place operations will perform the calculation using the + precision decided by the data type of the two operands, but will + silently downcast the result (if necessary) so it can fit back into + the array. Therefore, for mixed precision calculations, ``A {op}= + B`` can be different than ``A = A {op} B``. For example, suppose + ``a = ones((3,3))``. Then, ``a += 3j`` is different than ``a = a + + 3j``: while they both perform the same computation, ``a += 3`` + casts the result to fit back in ``a``, whereas ``a = a + 3j`` + re-binds the name ``a`` to the result. + +Matrix Multiplication: + +.. autosummary:: + :toctree: generated/ + +:: + + ndarray.__matmul__ + + +Special methods +=============== + +For standard library functions: + +.. autosummary:: + :toctree: generated/ + + ndarray.__reduce__ + ndarray.__setstate__ + +:: + + ndarray.__copy__ + ndarray.__deepcopy__ + +Basic customization: + +.. autosummary:: + :toctree: generated/ + +:: + + ndarray.__array__ + ndarray.__new__ + ndarray.__array_wrap__ + +Container customization: (see :ref:`Indexing `) + +.. autosummary:: + :toctree: generated/ + + ndarray.__len__ + ndarray.__getitem__ + ndarray.__setitem__ + +:: + + ndarray.__contains__ + +Conversion; the operations :func:`int()` and :func:`float()`. +They work only on arrays that have one element in them +and return the appropriate scalar. + +.. autosummary:: + :toctree: generated/ + + ndarray.__int__ + ndarray.__float__ + +:: + + ndarray.__complex__ + +String representations: + +.. autosummary:: + :toctree: generated/ + + ndarray.__str__ + ndarray.__repr__ diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/arrays.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/arrays.rst new file mode 100644 index 000000000000..3f2a52620a51 --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/arrays.rst @@ -0,0 +1,42 @@ +.. _arrays: + +************* +Array objects +************* + +.. currentmodule:: mxnet.np + +``np`` provides an N-dimensional array type, the :ref:`ndarray +`, which describes a collection of "items" of the same +type. The items can be :ref:`indexed ` using for +example N integers. + +All ndarrays are :term:`homogenous`: every item takes up the same size +block of memory, and all blocks are interpreted in exactly the same +way. 
How each item in the array is to be interpreted is specified by a +separate :ref:`data-type object `, one of which is associated +with every array. In addition to basic types (integers, floats, +*etc.*), the data type objects can also represent data structures. + +An item extracted from an array, *e.g.*, by indexing, is represented +by a Python object whose type is one of the :ref:`array scalar types +` built in NumPy. The array scalars allow easy manipulation +of also more complicated arrangements of data. + +.. note:: + + A major difference to ``numpy.ndarray`` is that ``mxnet.np.ndarray``'s scalar + is a 0-dim ndarray instead of a scalar object (``numpy.generic``). + +.. toctree:: + :maxdepth: 2 + + arrays.ndarray + arrays.scalars + arrays.dtypes + arrays.indexing + arrays.nditer + arrays.classes + maskedarray + arrays.interface + arrays.datetime diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/index.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/index.rst new file mode 100644 index 000000000000..7830a4469b61 --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/index.rst @@ -0,0 +1,30 @@ +.. _reference: + +NP on MXNet reference +============================ + + +.. module:: mxnet.np + +This section contains the `mxnet.np` API reference documentation. The topics here explain the functions, modules, and objects +included in `mxnet.np`. Use the links here to learn more. + + +.. toctree:: + :maxdepth: 2 + + arrays + constants + ufuncs + routines + distutils + distutils_guide + c-api + internals + swig + npx + + +**Acknowledgements** + +Large parts of this manual originate from NumPy documents. diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/npx.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/npx.rst new file mode 100644 index 000000000000..dc0a4a08a6b1 --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/npx.rst @@ -0,0 +1,85 @@ +NP Extensions +==================== + +.. currentmodule:: mxnet.npx + +Compatibility +------------- + +.. autosummary:: + :toctree: generated/ + + set_np + reset_np + +.. code:: + + is_np_array + use_np_array + is_np_shape + use_np_shape + np_array + np_shape + + +Devices +--------- + + +.. autosummary:: + :toctree: generated/ + + cpu + cpu_pinned + gpu + gpu_memory_info + current_context + num_gpus + +Nerual networks +----------------------- + +.. autosummary:: + :toctree: generated/ + + activation + batch_norm + convolution + dropout + embedding + fully_connected + layer_norm + pooling + rnn + leaky_relu + multibox_detection + multibox_prior + multibox_target + roi_pooling + + +More operators +------------------ + +.. autosummary:: + :toctree: generated/ + + sigmoid + smooth_l1 + softmax + threading + topk + waitall + load + save + one_hot + pick + reshape_like + batch_flatten + batch_dot + gamma + sequence_mask + +.. code:: + + seed \ No newline at end of file diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/random/index.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/random/index.rst new file mode 100644 index 000000000000..0142b19c2775 --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/random/index.rst @@ -0,0 +1,90 @@ +.. _numpyrandom: + +.. currentmodule:: mxnet.np.random + +np.random +============ + +.. 
+ remove a large part about generator here, this page contains a part of generator.rst + + +Accessing the BitGenerator +-------------------------- +.. autosummary:: + :toctree: generated/ + +:: + + bit_generator + +Simple random data +------------------ +.. autosummary:: + :toctree: generated/ + + choice + +:: + + random + integers + bytes + +Permutations +------------ +.. autosummary:: + :toctree: generated/ + + shuffle + +:: + + permutation + +Distributions +------------- +.. autosummary:: + :toctree: generated/ + + + normal + uniform + rand + randint + +:: + + beta + binomial + chisquare + dirichlet + exponential + f + gamma + geometric + gumbel + hypergeometric + laplace + logistic + lognormal + logseries + multinomial + multivariate_normal + negative_binomial + noncentral_chisquare + noncentral_f + pareto + poisson + power + rayleigh + standard_cauchy + standard_exponential + standard_gamma + standard_normal + standard_t + triangular + vonmises + wald + weibull + zipf diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.array-creation.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.array-creation.rst new file mode 100644 index 000000000000..2033923e5e75 --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.array-creation.rst @@ -0,0 +1,124 @@ +.. _routines.array-creation: + +Array creation routines +======================= + +.. seealso:: :ref:`Array creation ` + +.. currentmodule:: mxnet.np + +Ones and zeros +-------------- +.. autosummary:: + :toctree: generated/ + + eye + empty + full + identity + ones + ones_like + zeros + zeros_like + +.. code:: + + full_like + empty_like + +From existing data +------------------ +.. autosummary:: + :toctree: generated/ + + array + copy + +.. code:: + + asarray + asanyarray + ascontiguousarray + asmatrix + frombuffer + fromfile + fromfunction + fromiter + fromstring + loadtxt + +.. _routines.array-creation.rec: + +Creating record arrays (:mod:`np.rec`) +----------------------------------------- + +.. note:: :mod:`np.rec` is the preferred alias for + :mod:`np.core.records`. + +.. autosummary:: + :toctree: generated/ + +.. code:: + + core.records.array + core.records.fromarrays + core.records.fromrecords + core.records.fromstring + core.records.fromfile + +.. _routines.array-creation.char: + +Creating character arrays (:mod:`np.char`) +--------------------------------------------- + +.. note:: :mod:`np.char` is the preferred alias for + :mod:`np.core.defchararray`. + +.. autosummary:: + :toctree: generated/ + +.. code:: + + core.defchararray.array + core.defchararray.asarray + +Numerical ranges +---------------- +.. autosummary:: + :toctree: generated/ + + arange + linspace + logspace + meshgrid + +.. code:: + + geomspace + mgrid + ogrid + +Building matrices +----------------- +.. autosummary:: + :toctree: generated/ + + tril + +.. code:: + + diag + diagflat + tri + triu + vander + +The Matrix class +---------------- +.. autosummary:: + :toctree: generated/ + +:: + + mat + bmat diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.array-manipulation.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.array-manipulation.rst new file mode 100644 index 000000000000..b43d8f117d02 --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.array-manipulation.rst @@ -0,0 +1,143 @@ +Array manipulation routines +*************************** + +.. 
currentmodule:: mxnet.np + +Basic operations +================ +.. autosummary:: + :toctree: generated/ + +:: + + copyto + +Changing array shape +==================== +.. autosummary:: + :toctree: generated/ + + + reshape + ravel + ndarray.flatten + +:: + + ndarray.flat + +Transpose-like operations +========================= +.. autosummary:: + :toctree: generated/ + + swapaxes + ndarray.T + transpose + moveaxis + +:: + + rollaxis + +Changing number of dimensions +============================= +.. autosummary:: + :toctree: generated/ + + expand_dims + squeeze + broadcast_to + broadcast_arrays + +:: + + atleast_1d + atleast_2d + atleast_3d + broadcast + +Changing kind of array +====================== +.. autosummary:: + :toctree: generated/ + +:: + + asarray + asanyarray + asmatrix + asfarray + asfortranarray + ascontiguousarray + asarray_chkfinite + asscalar + require + +Joining arrays +============== +.. autosummary:: + :toctree: generated/ + + concatenate + stack + dstack + vstack + +:: + + column_stack + hstack + block + +Splitting arrays +================ +.. autosummary:: + :toctree: generated/ + + split + hsplit + vsplit + +:: + + array_split + dsplit + +Tiling arrays +============= +.. autosummary:: + :toctree: generated/ + + tile + repeat + +Adding and removing elements +============================ +.. autosummary:: + :toctree: generated/ + + unique + +:: + + delete + insert + append + resize + trim_zeros + +Rearranging elements +==================== +.. autosummary:: + :toctree: generated/ + + reshape + flip + roll + rot90 + +:: + + fliplr + flipud diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.io.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.io.rst new file mode 100644 index 000000000000..07e487c6eed6 --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.io.rst @@ -0,0 +1,114 @@ +Input and output +**************** + +.. currentmodule:: mxnet.np + +NumPy binary files (NPY, NPZ) +----------------------------- +.. autosummary:: + :toctree: generated/ + +:: + load + save + savez + savez_compressed + +The format of these binary file types is documented in +:py:mod:`numpy.lib.format` + +Text files +---------- +.. autosummary:: + :toctree: generated/ + + genfromtxt + +:: + + loadtxt + savetxt + fromregex + fromstring + ndarray.tofile + ndarray.tolist + +Raw binary files +---------------- + +.. autosummary:: + + +:: + + fromfile + ndarray.tofile + +String formatting +----------------- +.. autosummary:: + :toctree: generated/ + + +:: + + array2string + array_repr + array_str + format_float_positional + format_float_scientific + +Memory mapping files +-------------------- +.. autosummary:: + :toctree: generated/ + + +:: + + memmap + +Text formatting options +----------------------- +.. autosummary:: + :toctree: generated/ + + +:: + + set_printoptions + get_printoptions + set_string_function + printoptions + +Base-n representations +---------------------- +.. autosummary:: + :toctree: generated/ + + +:: + + binary_repr + base_repr + +Data sources +------------ +.. autosummary:: + :toctree: generated/ + + +:: + + DataSource + +Binary Format Description +------------------------- +.. 
autosummary:: + :template: autosummary/minimal_module.rst + :toctree: generated/ + + +:: + + lib.format diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.linalg.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.linalg.rst new file mode 100644 index 000000000000..073dd0b05be5 --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.linalg.rst @@ -0,0 +1,106 @@ +.. _routines.linalg: + +.. module:: mxnet.np.linalg + +Linear algebra (:mod:`numpy.linalg`) +************************************ + +The NumPy linear algebra functions rely on BLAS and LAPACK to provide efficient +low level implementations of standard linear algebra algorithms. Those +libraries may be provided by NumPy itself using C versions of a subset of their +reference implementations but, when possible, highly optimized libraries that +take advantage of specialized processor functionality are preferred. Examples +of such libraries are OpenBLAS_, MKL (TM), and ATLAS. Because those libraries +are multithreaded and processor dependent, environmental variables and external +packages such as threadpoolctl_ may be needed to control the number of threads +or specify the processor architecture. + +.. _OpenBLAS: https://www.openblas.net/ +.. _threadpoolctl: https://github.com/joblib/threadpoolctl + +.. currentmodule:: mxnet.np + +Matrix and vector products +-------------------------- +.. autosummary:: + :toctree: generated/ + + dot + vdot + inner + outer + tensordot + einsum + +:: + + linalg.multi_dot + matmul + einsum_path + linalg.matrix_power + kron + +Decompositions +-------------- +.. autosummary:: + :toctree: generated/ + + linalg.svd + +:: + + linalg.cholesky + linalg.qr + +Matrix eigenvalues +------------------ +.. autosummary:: + :toctree: generated/ + + +:: + + linalg.eig + linalg.eigh + linalg.eigvals + linalg.eigvalsh + +Norms and other numbers +----------------------- +.. autosummary:: + :toctree: generated/ + + linalg.norm + trace + +:: + + linalg.cond + linalg.det + linalg.matrix_rank + linalg.slogdet + +Solving equations and inverting matrices +---------------------------------------- +.. autosummary:: + :toctree: generated/ + + +:: + + linalg.solve + linalg.tensorsolve + linalg.lstsq + linalg.inv + linalg.pinv + linalg.tensorinv + +Exceptions +---------- +.. autosummary:: + :toctree: generated/ + + +:: + + linalg.LinAlgError diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.math.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.math.rst new file mode 100644 index 000000000000..6dd85cdbbcab --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.math.rst @@ -0,0 +1,213 @@ +Mathematical functions +********************** + +.. currentmodule:: mxnet.np + +.. note:: + + Currently, most of the math functions only support inputs and outputs of the same dtype. + This limitation usually results in imprecise outputs for ndarrays with integral dtype + while floating-point values are expected in the output. + Appropriate handling of ndarrays integral dtypes is in active development. + + +Trigonometric functions +----------------------- +.. autosummary:: + :toctree: generated/ + + sin + cos + tan + arcsin + arccos + arctan + degrees + radians + hypot + arctan2 + deg2rad + rad2deg + +:: + + unwrap + +Hyperbolic functions +-------------------- +.. 
autosummary:: + :toctree: generated/ + + sinh + cosh + tanh + arcsinh + arccosh + arctanh + +Rounding +-------- +.. autosummary:: + :toctree: generated/ + + rint + fix + floor + ceil + trunc + around + +:: + + round_ + + +Sums, products, differences +--------------------------- +.. autosummary:: + :toctree: generated/ + + sum + prod + cumsum + +:: + + nanprod + nansum + cumprod + nancumprod + nancumsum + diff + ediff1d + gradient + cross + trapz + +Exponents and logarithms +------------------------ +.. autosummary:: + :toctree: generated/ + + exp + expm1 + log + log10 + log2 + log1p + +:: + + exp2 + logaddexp + logaddexp2 + +Other special functions +----------------------- +.. autosummary:: + :toctree: generated/ + + +:: + + i0 + sinc + +Floating point routines +----------------------- +.. autosummary:: + :toctree: generated/ + + ldexp + +:: + + signbit + copysign + frexp + nextafter + spacing + +Rational routines +----------------- +.. autosummary:: + :toctree: generated/ + + lcm + +:: + + gcd + +Arithmetic operations +--------------------- +.. autosummary:: + :toctree: generated/ + + add + reciprocal + negative + divide + power + subtract + mod + multiply + true_divide + remainder + +:: + + positive + floor_divide + float_power + + fmod + modf + divmod + +Handling complex numbers +------------------------ +.. autosummary:: + :toctree: generated/ + + +:: + + angle + real + imag + conj + conjugate + + +Miscellaneous +------------- +.. autosummary:: + :toctree: generated/ + + clip + + sqrt + cbrt + square + + absolute + sign + maximum + minimum + +:: + + convolve + + fabs + + heaviside + + fmax + fmin + + nan_to_num + real_if_close + + interp diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.rst new file mode 100644 index 000000000000..e35d6214d171 --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.rst @@ -0,0 +1,47 @@ +Routines +============ + +In this chapter routine docstrings are presented, grouped by functionality. +Many docstrings contain example code, which demonstrates basic usage +of the routine. The examples assume that the `np` module is imported with:: + + >>> from mxnet import np, npx + >>> npx.set_np() + +A convenient way to execute examples is the ``%doctest_mode`` mode of +IPython, which allows for pasting of multi-line examples and preserves +indentation. + +.. toctree:: + :maxdepth: 2 + + routines.array-creation + routines.array-manipulation + routines.bitwise + routines.char + routines.ctypeslib + routines.datetime + routines.dtype + routines.dual + routines.emath + routines.err + routines.fft + routines.financial + routines.functional + routines.help + routines.indexing + routines.io + routines.linalg + routines.logic + routines.ma + routines.math + routines.matlib + routines.other + routines.padding + routines.polynomials + random/index + routines.set + routines.sort + routines.statistics + routines.testing + routines.window diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.sort.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.sort.rst new file mode 100644 index 000000000000..0ae1e92a42b4 --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.sort.rst @@ -0,0 +1,49 @@ +Sorting, searching, and counting +================================ + +.. currentmodule:: mxnet.np + +Sorting +------- +.. 
autosummary:: + :toctree: generated/ + +:: + + ndarray.sort + sort + lexsort + argsort + msort + sort_complex + partition + argpartition + +Searching +--------- +.. autosummary:: + :toctree: generated/ + + argmax + argmin + +:: + + nanargmax + nanargmin + argwhere + nonzero + flatnonzero + where + searchsorted + extract + +Counting +-------- +.. autosummary:: + :toctree: generated/ + + +:: + + count_nonzero diff --git a/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.statistics.rst b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.statistics.rst new file mode 100644 index 000000000000..e9caf40d2ec1 --- /dev/null +++ b/docs/python_docs/python/tutorials/packages/ndarray/deepnumpy/routines.statistics.rst @@ -0,0 +1,74 @@ +Statistics +========== + +.. currentmodule:: mxnet.np + + +Order statistics +---------------- + +.. autosummary:: + :toctree: generated/ + + min + max + +:: + + amin + amax + nanmin + nanmax + ptp + percentile + nanpercentile + quantile + nanquantile + +Averages and variances +---------------------- + +.. autosummary:: + :toctree: generated/ + + mean + std + var + +:: + + median + average + nanmedian + nanmean + nanstd + nanvar + +Correlating +----------- + +.. autosummary:: + :toctree: generated/ + + +:: + + corrcoef + correlate + cov + +Histograms +---------- + +.. autosummary:: + :toctree: generated/ + + histogram + +:: + + histogram2d + histogramdd + bincount + histogram_bin_edges + digitize diff --git a/docs/python_docs/python/tutorials/packages/ndarray/index.rst b/docs/python_docs/python/tutorials/packages/ndarray/index.rst index 41b82091fdc2..74562357dc58 100644 --- a/docs/python_docs/python/tutorials/packages/ndarray/index.rst +++ b/docs/python_docs/python/tutorials/packages/ndarray/index.rst @@ -45,10 +45,16 @@ NDArray For Sparse NDArray tutorials + .. card:: + :title: NP on MXNet reference + :link: deepnumpy/index.html + + This section contains the mxnet.np API reference documentation .. toctree:: :hidden: :glob: * - sparse/index \ No newline at end of file + sparse/index + deepnumpy/index \ No newline at end of file