diff --git a/.github/ISSUE_TEMPLATE/bug-report.md b/.github/ISSUE_TEMPLATE/bug-report.md
index cd8d5f0e..e13b6adc 100644
--- a/.github/ISSUE_TEMPLATE/bug-report.md
+++ b/.github/ISSUE_TEMPLATE/bug-report.md
@@ -28,7 +28,6 @@ Steps to reproduce the behavior:
  - OS (e.g., Linux):
  - Python version:
  - PyTorch version:
- - PyG version (if installed):
  - CUDA/cuDNN version (if applicable):
  - Any other relevant information:
 
diff --git a/.travis.yml b/.travis.yml
index 8ad9e97b..5e615a6a 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -7,9 +7,6 @@ install:
   - pip install https://download.pytorch.org/whl/cpu/torch-1.7.1%2Bcpu-cp37-cp37m-linux_x86_64.whl
   - pip install https://pytorch-geometric.com/whl/torch-1.7.0+cpu/torch_scatter-2.0.7-cp37-cp37m-linux_x86_64.whl
   - pip install https://pytorch-geometric.com/whl/torch-1.7.0+cpu/torch_sparse-0.6.9-cp37-cp37m-linux_x86_64.whl
-  - pip install https://pytorch-geometric.com/whl/torch-1.7.0+cpu/torch_cluster-1.5.9-cp37-cp37m-linux_x86_64.whl
-  - pip install https://pytorch-geometric.com/whl/torch-1.7.0+cpu/torch_spline_conv-1.2.1-cp37-cp37m-linux_x86_64.whl
-  - pip install torch-geometric
   - pip install packaging==20.9
   - bash ./scripts/installation/metis.sh
   - source ./scripts/installation/gcc.sh
diff --git a/Dockerfile b/Dockerfile
deleted file mode 100644
index c6750225..00000000
--- a/Dockerfile
+++ /dev/null
@@ -1,19 +0,0 @@
-FROM ubuntu:latest
-
-ARG CUDA=cpu
-ARG TORCH=1.7.0
-
-RUN echo BUILDING WITH CUDA===${CUDA} AND TORCH===${TORCH}
-
-RUN apt update
-RUN apt upgrade -y
-RUN apt install python3 python3-pip git -y
-RUN python3 -m pip install torch==${TORCH}+${CUDA} -f https://download.pytorch.org/whl/torch_stable.html
-RUN python3 -m pip install torch-scatter==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
-RUN python3 -m pip install torch-sparse==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
-RUN python3 -m pip install torch-cluster==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
-RUN python3 -m pip install torch-spline-conv==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
-RUN python3 -m pip install torch-geometric
-
-
-SHELL ["/bin/bash", "-c"]
diff --git a/README.md b/README.md
index 12c8b4f6..558208e0 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@
 
 **[Homepage](https://cogdl.ai)** | **[Paper](https://arxiv.org/abs/2103.00959)** | **[100 GNN papers](./gnn_papers.md)** | **[Leaderboards](./results.md)** | **[Documentation](https://cogdl.readthedocs.io)** | **[Datasets](./cogdl/datasets/README.md)** | **[Join our Slack](https://join.slack.com/t/cogdl/shared_invite/zt-b9b4a49j-2aMB035qZKxvjV4vqf0hEg)** | **[中文](./README_CN.md)**
 
-CogDL is a graph representation learning toolkit that allows researchers and developers to easily train and compare baseline or customized models for node classification, graph classification, and other important tasks in the graph domain. 
+CogDL is a graph deep learning toolkit that allows researchers and developers to easily train and compare baseline or customized models for node classification, graph classification, and other important tasks in the graph domain. 
 
 We summarize the contributions of CogDL as follows:
 
@@ -173,7 +173,7 @@ How to run parallel experiments with GPUs on several models?
 If you want to run parallel experiments on your server with multiple GPUs on multiple models, GCN and GAT, on the Cora dataset:
 
 ```bash
-$ python scripts/parallel_train.py --task node_classification --dataset cora --model gcn gat --device-id 0 1 --seed 0 1 2 3 4
+$ python scripts/parallel_train.py --dataset cora --model gcn gat --devices 0 1 --seed 0 1 2 3 4
 ```
 
 Expected output:
@@ -184,33 +184,6 @@ Expected output:
 | ('cora', 'gat') | 0.8262±0.0032 |
 </details>
 
-<details>
-<summary>
-How to use docker container?
-</summary>
-<br/>
-You might also opt to use a Docker container. There is an image available in this repo that you can build with the Torch and CUDA versions available in your system. To build the docker image just run:
-
-```
-docker build --build-arg CUDA=YOUR_CUDA_VERSION --build-arg TORCH=YOUR_TORCH_VERSION --tag cogdl .
-```
-
-Where `YOUR_CUDA_VERSION` should be cuxxx representing your cuda version (or just cpu) and `YOUR_TORCH_VERSION` should be the version of PyTorch you want to use. For example, to run with CUDA 10.1 and PyTorch 1.7.1 you can run:
-```
-docker build --build-arg CUDA=cu101 --build-arg TORCH=1.7.1 --tag cogdl .
-```
-
-Then you can start the container by running:
-```
-docker run -it -v cogdl:/cogdl cogdl /bin/bash
-```
-
-And then clone your fork or this repository into the cogdl folder:
-```
-git clone https://github.com/THUDM/cogdl /cogdl
-```
-</details>
-
 <details>
 <summary>
 How to use models from other libraries?
@@ -218,7 +191,7 @@ How to use models from other libraries?
 <br/>
 If you are familiar with other popular graph libraries, you can implement your own model in CogDL using modules from PyTorch Geometric (PyG).
 For the installation of PyG, you can follow the instructions from PyG (https://github.com/rusty1s/pytorch_geometric/#installation).
-For the quick-start usage of how to use layers of PyG, you can find some examples in the [examples/pytorch_geometric](https://github.com/THUDM/cogdl/tree/master/examples/pytorch_geometric/).
+For a quick start on using PyG layers, see the examples in [examples/pyg](https://github.com/THUDM/cogdl/tree/master/examples/pyg/).
 </details>
 
 <details>
diff --git a/README_CN.md b/README_CN.md
index 4d29912e..8cbf839c 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -10,7 +10,7 @@
 
 **[主页](https://cogdl.ai/zh)** | **[论文](https://arxiv.org/abs/2103.00959)** | **[100篇GNN论文](./gnn_papers.md)** | **[排行榜](./results.md)** | **[文档](https://cogdl.readthedocs.io)** | **[数据集](./cogdl/datasets/README.md)** | **[加入我们的Slack](https://join.slack.com/t/cogdl/shared_invite/zt-b9b4a49j-2aMB035qZKxvjV4vqf0hEg)** | **[English](./README.md)**
 
-CogDL是由清华大学计算机系知识工程实验室（KEG）开发的基于图的深度学习的研究工具，基于Python语言和[PyTorch](https://github.com/pytorch/pytorch)库。CogDL允许研究人员和开发人员可以轻松地训练和比较基线算法或自定义模型，以进行结点分类，链接预测，图分类，社区发现等基于图结构的任务。 它提供了许多流行模型的实现，包括：非图神经网络算法例如Deepwalk、LINE、Node2vec、NetMF、ProNE、methpath2vec、PTE、graph2vec、DGK等；图神经网络算法例如GCN、GAT、GraphSAGE、FastGCN、GTN、HAN、GIN、DiffPool等。它也提供了一些下游任务，包括结点分类（分为是否具有节点属性），链接预测（分为同构和异构），图分类（分有监督和⽆监督）以及为这些任务构建各种算法效果的排行榜。
+CogDL是一款图深度学习工具包，基于[PyTorch](https://github.com/pytorch/pytorch)框架。CogDL允许研究人员和开发人员轻松地训练和比较基线算法或自定义模型，以进行结点分类，链接预测，图分类，社区发现等基于图结构的任务。它提供了许多流行模型的实现，包括：非图神经网络算法例如Deepwalk、LINE、Node2vec、NetMF、ProNE、metapath2vec、PTE、graph2vec、DGK等；图神经网络算法例如GCN、GAT、GraphSAGE、FastGCN、GTN、HAN、GIN、DiffPool等。它也提供了一些下游任务，包括结点分类（分为是否具有节点属性），链接预测（分为同构和异构），图分类（分有监督和无监督）以及为这些任务构建各种算法效果的排行榜。
 
 CogDL的特性包括：
 
@@ -166,7 +166,7 @@ CogDL提供了一种快速的稀疏矩阵乘的操作（[GE-SpMM](https://arxiv.
 如果你想使用多个 GPU 同时在 Cora 数据集上运行 GCN 和 GAT 模型，可以使用如下指令:
 
 ```bash
-$ python scripts/parallel_train.py --task node_classification --dataset cora --model gcn gat --device-id 0 1 --seed 0 1 2 3 4
+$ python scripts/parallel_train.py --dataset cora --model gcn gat --devices 0 1 --seed 0 1 2 3 4
 ```
 
 预计得到的结果如下:
@@ -177,38 +177,6 @@ $ python scripts/parallel_train.py --task node_classification --dataset cora --m
 | ('cora', 'gat') | 0.8262±0.0032 |
 </details>
 
-<details>
-<summary>
-如何使用docker容器来配置cogdl的环境？
-</summary>
-<br/>
-您也可以选择使用Docker来配置cogdl所需的环境。要构建Docker镜像，只需运行以下命令。
-
-```bash
-docker build --build-arg CUDA=YOUR_CUDA_VERSION --build-arg TORCH=YOUR_TORCH_VERSION --tag cogdl .
-```
-请根据您的CUDA版本（或CPU）更换 `YOUR_CUDA_VERSION` 以及 更换 `YOUR_TORCH_VERSION` 为您使用的PyTorch版本。
-
-
-例如，使用 CUDA 10.1 和 PyTorch 1.7.1 一起运行
-
-```bash
-docker build --build-arg CUDA=cu101 --build-arg TORCH=1.7.1 --tag cogdl .
-```
-
-启动容器
-
-```bash
-docker run -it -v cogdl:/cogdl cogdl /bin/bash
-```
-
-将cogdl克隆到cogdl目录下：
-
-```bash
-git clone https://github.com/THUDM/cogdl /cogdl
-```
-</details>
-
 <details>
 <summary>
 如何使用其他图深度学习库中的模型？
@@ -216,7 +184,7 @@ git clone https://github.com/THUDM/cogdl /cogdl
 <br/>
 如何你对其他图深度学习库（比如PyTorch Geometric）比较熟悉，你可以基于这些库的模块来在CogDL里实现相关模型。
 你可以通过下述的指南来安装相应的库，例如PyTorch Geometric (https://github.com/rusty1s/pytorch_geometric/#installation)。
-对于如何使用PyG的模块来实现模型，你可以在示例中找到一些参考：[examples/pytorch_geometric](https://github.com/THUDM/cogdl/tree/master/examples/pytorch_geometric/)。
+对于如何使用PyG的模块来实现模型，你可以在示例中找到一些参考：[examples/pyg](https://github.com/THUDM/cogdl/tree/master/examples/pyg/)。
 </details>
 
 ## CogDL团队
diff --git a/cogdl/datasets/planetoid_data.py b/cogdl/datasets/planetoid_data.py
index 17b6ae43..81ba0e3b 100644
--- a/cogdl/datasets/planetoid_data.py
+++ b/cogdl/datasets/planetoid_data.py
@@ -163,7 +163,7 @@ def get(self, idx):
         return self.data
 
     def __repr__(self):
-        return "{}()".format(self.name)
+        return "{}".format(self.name)
 
     def __len__(self):
         return 1
diff --git a/cogdl/layers/mlp_layer.py b/cogdl/layers/mlp_layer.py
index 05b2c289..d0f91ee2 100644
--- a/cogdl/layers/mlp_layer.py
+++ b/cogdl/layers/mlp_layer.py
@@ -19,7 +19,7 @@ class MLP(nn.Module):
     hidden_dim : int
         Size of hidden layer dimension.
     use_bn : bool, optional
-        Apply batch normalization if True, default: `True).
+        Apply batch normalization if True, default: `True`.
     """
 
     def __init__(
diff --git a/cogdl/layers/rgcn_layer.py b/cogdl/layers/rgcn_layer.py
index 0ca074e7..ebda70b6 100644
--- a/cogdl/layers/rgcn_layer.py
+++ b/cogdl/layers/rgcn_layer.py
@@ -7,12 +7,12 @@
 
 class RGCNLayer(nn.Module):
     """
-    Implementation of Relational-GCN in paper `"Modeling Relational Data with Graph Convolutional Networks"`
-     <https://arxiv.org/abs/1703.06103>
+    Implementation of Relational-GCN in paper `"Modeling Relational Data with Graph Convolutional Networks"
+    <https://arxiv.org/abs/1703.06103>`_
 
-     Parameters
-     ----------
-     in_feats : int
+    Parameters
+    ----------
+    in_feats : int
         Size of each input embedding.
     out_feats : int
         Size of each output embedding.
diff --git a/cogdl/models/__init__.py b/cogdl/models/__init__.py
index 454e71c9..c5204923 100644
--- a/cogdl/models/__init__.py
+++ b/cogdl/models/__init__.py
@@ -73,18 +73,16 @@ def build_model(args):
     "dgi": "cogdl.models.nn.dgi.DGIModel",
     "mvgrl": "cogdl.models.nn.mvgrl.MVGRL",
     "patchy_san": "cogdl.models.nn.patchy_san.PatchySAN",
-    "chebyshev": "cogdl.models.nn.pyg_cheb.Chebyshev",
     "gcn": "cogdl.models.nn.gcn.GCN",
     "gdc_gcn": "cogdl.models.nn.gdc_gcn.GDC_GCN",
     "graphsage": "cogdl.models.nn.graphsage.Graphsage",
     "compgcn": "cogdl.models.nn.compgcn.LinkPredictCompGCN",
     "drgcn": "cogdl.models.nn.drgcn.DrGCN",
-    "unet": "cogdl.models.nn.pyg_graph_unet.GraphUnet",
+    "unet": "cogdl.models.nn.graph_unet.GraphUnet",
     "gcnmix": "cogdl.models.nn.gcnmix.GCNMix",
     "diffpool": "cogdl.models.nn.diffpool.DiffPool",
     "gcnii": "cogdl.models.nn.gcnii.GCNII",
     "sign": "cogdl.models.nn.sign.SIGN",
-    "pyg_gcn": "cogdl.models.nn.pyg_gcn.GCN",
     "mixhop": "cogdl.models.nn.mixhop.MixHop",
     "gat": "cogdl.models.nn.gat.GAT",
     "han": "cogdl.models.nn.han.HAN",
@@ -92,9 +90,8 @@ def build_model(args):
     "grace": "cogdl.models.nn.grace.GRACE",
     "pprgo": "cogdl.models.nn.pprgo.PPRGo",
     "gin": "cogdl.models.nn.gin.GIN",
-    "dgcnn": "cogdl.models.nn.pyg_dgcnn.DGCNN",
     "grand": "cogdl.models.nn.grand.Grand",
-    "gtn": "cogdl.models.nn.pyg_gtn.GTN",
+    "gtn": "cogdl.models.nn.gtn.GTN",
     "rgcn": "cogdl.models.nn.rgcn.LinkPredictRGCN",
     "deepergcn": "cogdl.models.nn.deepergcn.DeeperGCN",
     "drgat": "cogdl.models.nn.drgat.DrGAT",
@@ -104,7 +101,7 @@ def build_model(args):
     "mlp": "cogdl.models.nn.mlp.MLP",
     "sgc": "cogdl.models.nn.sgc.sgc",
     "sortpool": "cogdl.models.nn.sortpool.SortPool",
-    "srgcn": "cogdl.models.nn.pyg_srgcn.SRGCN",
+    "srgcn": "cogdl.models.nn.srgcn.SRGCN",
     "gcc": "cogdl.models.nn.gcc_model.GCCModel",
     "unsup_graphsage": "cogdl.models.nn.unsup_graphsage.SAGE",
     "graphsaint": "cogdl.models.nn.graphsaint.GraphSAINT",
diff --git a/cogdl/models/base_model.py b/cogdl/models/base_model.py
index 73dfca1f..2194fb08 100644
--- a/cogdl/models/base_model.py
+++ b/cogdl/models/base_model.py
@@ -25,6 +25,9 @@ def _forward_unimplemented(self, *input: Any) -> None:  # abc warning
     def forward(self, *args):
         raise NotImplementedError
 
+    def predict(self, data):
+        return self.forward(data)
+
     @property
     def device(self):
         return next(self.parameters()).device
diff --git a/cogdl/models/nn/gdc_gcn.py b/cogdl/models/nn/gdc_gcn.py
index 8d867e75..2da688cc 100644
--- a/cogdl/models/nn/gdc_gcn.py
+++ b/cogdl/models/nn/gdc_gcn.py
@@ -124,7 +124,6 @@ def get_diffusion(x, edges):
                 else:
                     raise ValueError
 
-            # create PyG Data object
             edges_i = []
             edges_j = []
             edge_attr = []
diff --git a/cogdl/models/nn/pyg_graph_unet.py b/cogdl/models/nn/graph_unet.py
similarity index 100%
rename from cogdl/models/nn/pyg_graph_unet.py
rename to cogdl/models/nn/graph_unet.py
diff --git a/cogdl/models/nn/pyg_gtn.py b/cogdl/models/nn/gtn.py
similarity index 100%
rename from cogdl/models/nn/pyg_gtn.py
rename to cogdl/models/nn/gtn.py
diff --git a/cogdl/models/nn/pyg_srgcn.py b/cogdl/models/nn/srgcn.py
similarity index 93%
rename from cogdl/models/nn/pyg_srgcn.py
rename to cogdl/models/nn/srgcn.py
index ce7287e8..df59c98f 100644
--- a/cogdl/models/nn/pyg_srgcn.py
+++ b/cogdl/models/nn/srgcn.py
@@ -2,17 +2,15 @@
 import torch.nn as nn
 import torch.nn.functional as F
 from cogdl.utils.srgcn_utils import act_attention, act_map, act_normalization
-from cogdl.utils import add_remaining_self_loops
+from cogdl.utils import spmm, add_remaining_self_loops
 from torch_sparse import spspmm
 
 from .. import BaseModel
-from cogdl.utils import spmm
 
 
 class NodeAdaptiveEncoder(nn.Module):
     def __init__(self, num_features, dropout=0.5):
         super(NodeAdaptiveEncoder, self).__init__()
-        # self.fc = nn.Linear(num_features, 1, bias=True)
         self.fc = nn.Parameter(torch.zeros(size=(num_features, 1)))
         nn.init.xavier_normal_(self.fc.data, gain=1.414)
         self.bf = nn.Parameter(torch.zeros(size=(1,)))
@@ -73,7 +71,6 @@ def forward(self, graph, x):
         edge_index = graph.edge_index
         N, dim = x.shape
 
-        # nl_adj_mat_ind, nl_adj_mat_val = add_self_loops(edge_index, num_nodes=N)[0], edge_attr.squeeze()
         nl_adj_mat_ind = add_remaining_self_loops(edge_index, num_nodes=N)[0]
         nl_adj_mat_ind = torch.stack(nl_adj_mat_ind)
         nl_adj_mat_val = torch.ones(nl_adj_mat_ind.shape[1]).to(x.device)
@@ -101,9 +98,7 @@ def forward(self, graph, x):
                 graph.edge_weight = adj_mat_val
                 for _ in range(i + 1):
                     val_h = spmm(graph, val_h)
-                    # val_h = spmm(adj_mat_ind, F.dropout(adj_mat_val, p=self.node_dropout, training=self.training), N, N, val_h)
 
-                # val_h = val_h / norm
                 val_h[val_h != val_h] = 0
                 val_h = val_h + self.bias[i]
                 val_h = self.adaptive_enc[i](val_h)
@@ -148,7 +143,6 @@ def forward(self, graph, x):
         # x = self.dropout(x)
 
         edge_index = graph.edge_index
-        # adj_mat_ind, adj_mat_val = add_self_loops(edge_index, num_nodes=N)[0], edge_attr.squeeze()
         adj_mat_ind = add_remaining_self_loops(edge_index, num_nodes=N)[0]
         adj_mat_ind = torch.stack(adj_mat_ind)
         adj_mat_val = torch.ones(adj_mat_ind.shape[1]).to(x.device)
@@ -168,7 +162,6 @@ def forward(self, graph, x):
         # N, dim = val_h.shape
 
         # MATRIX_MUL
-        # val_h = spmm(adj_mat_ind, F.dropout(adj_mat_val, p=self.node_dropout, training=self.training), N, N, val_h)
         with graph.local_graph():
             graph.edge_index = adj_mat_ind
             graph.edge_weight = adj_mat_val
diff --git a/cogdl/utils/srgcn_utils.py b/cogdl/utils/srgcn_utils.py
index b0c8363b..7770baf3 100644
--- a/cogdl/utils/srgcn_utils.py
+++ b/cogdl/utils/srgcn_utils.py
@@ -5,7 +5,8 @@
 import torch.nn as nn
 import torch.nn.functional as F
 from torch_sparse import spspmm, spmm
-from torch_geometric.utils import degree
+
+from cogdl.utils import get_degrees
 
 
 # ==========
@@ -27,7 +28,7 @@ def forward(self, x, edge_index, edge_attr):
         self.dropout(diag_val)
 
         row, col = edge_index
-        deg = degree(col, x.size(0), dtype=x.dtype)
+        deg = get_degrees(row, col, x.shape[0])
         deg_inv = deg.pow(-1)
         edge_attr_t = deg_inv[row] * edge_attr
 
@@ -48,7 +49,7 @@ def forward(self, x, edge_index, edge_attr):
         N, dim = x.shape
 
         row, col = edge_index
-        deg = degree(col, x.size(0), dtype=x.dtype)
+        deg = get_degrees(row, col, N)
         deg_inv_sqrt = deg.pow(-0.5)
         edge_attr_t = deg_inv_sqrt[row] * edge_attr * deg_inv_sqrt[col]
 
@@ -71,42 +72,42 @@ def forward(self, x, edge_index, edge_attr):
         return edge_index, edge_attr
 
 
-# class Gaussian(nn.Module):
-#     def __init__(self, in_feat):
-#         super(Gaussian, self).__init__()
-#         self.mu = 0.2
-#         self.theta = 1.
-#         self.steps = 4
+class Gaussian(nn.Module):
+    def __init__(self, in_feat):
+        super(Gaussian, self).__init__()
+        self.mu = 0.2
+        self.theta = 1.0
+        self.steps = 4
+
+    def forward(self, x, edge_index, edge_attr):
+        N = x.shape[0]
+        row, col = edge_index
+        deg = get_degrees(row, col, N)
+        deg_inv = deg.pow(-1)
+        adj = torch.sparse_coo_tensor(edge_index, deg_inv[row] * edge_attr, size=(N, N))
+        identity = torch.sparse_coo_tensor([range(N)] * 2, torch.ones(N), size=(N, N)).to(x.device)
+        laplacian = identity - adj
 
-#     def forward(self, x, edge_index, edge_attr):
-#         N = x.shape[0]
-#         row, col = edge_index
-#         deg = degree(row, x.size(0), dtype=x.dtype)
-#         deg_inv = deg.pow(-1)
-#         adj = torch.sparse_coo_tensor(edge_index, deg_inv[row] * edge_attr , size=(N, N))
-#         identity = torch.sparse_coo_tensor([range(N)] * 2, torch.ones(N), size=(N, N)).to(x.device)
-#         laplacian = identity - adj
+        t0 = identity
+        t1 = laplacian - self.mu * identity
+        t1 = t1.mm(t1.to_dense()).to_sparse()
+        l_x = -0.5 * (t1 - identity)
 
-#         t0 = identity
-#         t1 = laplacian - self.mu * identity
-#         t1 = t1.mm(t1.to_dense()).to_sparse()
-#         l_x = -0.5 * (t1 - identity)
-#         # l_x = -0.5 * ((laplacian - self.mu * identity).pow(2) - identity)
+        ivs = [iv(i, self.theta) for i in range(self.steps)]
+        ivs[1:] = [(-1) ** i * 2 * x for i, x in enumerate(ivs[1:])]
+        ivs = torch.tensor(ivs).to(x.device)
+        result = [t0, l_x]
+        for i in range(2, self.steps):
+            result.append(2 * l_x.mm(result[i - 1].to_dense()).to_sparse() - result[i - 2])
 
-#         ivs = [iv(i, self.theta) for i in range(self.steps)]
-#         ivs[1:] = [(-1) ** i * 2 * x for i, x in enumerate(ivs[1:])]
-#         ivs = torch.tensor(ivs).to(x.device)
-#         result = [t0, l_x]
-#         for i in range(2, self.steps):
-#             result.append(2*l_x.mm(result[i-1].to_dense()).to_sparse().sub(result[i-2]))
+        result = [result[i] * ivs[i] for i in range(self.steps)]
 
-#         result = [result[i] * ivs[i] for i in range(self.steps)]
+        def fn(x, y):
+            return x.add(y)
 
-#         def fn(x, y):
-#             return x.add(y)
-#         res = reduce(fn, result)
+        res = reduce(fn, result)
 
-#         return res._indices(), res._values()
+        return res._indices(), res._values()
 
 
 class PPR(nn.Module):
@@ -117,7 +118,7 @@ def __init__(self, in_feat):
 
     def forward(self, x, edge_index, edge_attr):
         row, col = edge_index
-        deg = degree(col, x.size(0), dtype=x.dtype)
+        deg = get_degrees(row, col, x.shape[0])
         deg_inv_sqrt = deg.pow(-0.5)
         edge_attr_t = deg_inv_sqrt[row] * edge_attr * deg_inv_sqrt[col]
 
@@ -155,7 +156,7 @@ def __init__(self, in_feat):
 
     def forward(self, x, edge_index, edge_attr):
         row, col = edge_index
-        deg = degree(col, x.size(0), dtype=x.dtype)
+        deg = get_degrees(row, col, x.shape[0])
         deg_inv = deg.pow(-1)
         edge_attr_t = self.t * edge_attr * deg_inv[col] - self.t
         return edge_index, edge_attr_t.exp()
@@ -172,6 +173,8 @@ def act_attention(attn_type):
         return PPR
     elif attn_type == "heat":
         return HeatKernel
+    elif attn_type == "gaussian":
+        return Gaussian
     else:
         raise ValueError("no such attention type")
 
diff --git a/cogdl/wrappers/default_match.py b/cogdl/wrappers/default_match.py
index 799358ca..18f3c049 100644
--- a/cogdl/wrappers/default_match.py
+++ b/cogdl/wrappers/default_match.py
@@ -22,7 +22,6 @@ def set_default_wrapper_config():
         "appnp",
         "pprgo",
         "chebyshev",
-        "pyg_gcn",
         "unet",
         "srgcn",
         "revgcn",
diff --git a/docs/requirements.txt b/docs/requirements.txt
index 2391715f..fd4ec001 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -19,9 +19,6 @@ coveralls
 https://download.pytorch.org/whl/cpu/torch-1.7.1%2Bcpu-cp37-cp37m-linux_x86_64.whl
 https://pytorch-geometric.com/whl/torch-1.7.0+cpu/torch_scatter-2.0.7-cp37-cp37m-linux_x86_64.whl
 https://pytorch-geometric.com/whl/torch-1.7.0+cpu/torch_sparse-0.6.9-cp37-cp37m-linux_x86_64.whl
-https://pytorch-geometric.com/whl/torch-1.7.0+cpu/torch_cluster-1.5.9-cp37-cp37m-linux_x86_64.whl
-https://pytorch-geometric.com/whl/torch-1.7.0+cpu/torch_spline_conv-1.2.1-cp37-cp37m-linux_x86_64.whl
-torch-geometric
 numba
 transformers
 sentencepiece
\ No newline at end of file
diff --git a/docs/source/api/datasets.rst b/docs/source/api/datasets.rst
index 05d804c7..86a69a49 100644
--- a/docs/source/api/datasets.rst
+++ b/docs/source/api/datasets.rst
@@ -49,7 +49,7 @@ Matlab matrix dataset
     :undoc-members:
     :show-inheritance:
 
-PyG OGB dataset
+OGB dataset
 -------------------------------
 
 .. automodule:: cogdl.datasets.ogb
diff --git a/docs/source/api/models.rst b/docs/source/api/models.rst
index 7a99b524..d70d0073 100644
--- a/docs/source/api/models.rst
+++ b/docs/source/api/models.rst
@@ -122,11 +122,6 @@ GNN Model
     :undoc-members:
     :show-inheritance:
 
-.. autoclass:: cogdl.models.nn.pyg_cheb.Chebyshev
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
 .. autoclass:: cogdl.models.nn.gcn.GCN
     :members:
     :undoc-members:
@@ -152,7 +147,7 @@ GNN Model
     :undoc-members:
     :show-inheritance:
 
-.. autoclass:: cogdl.models.nn.pyg_graph_unet.GraphUnet
+.. autoclass:: cogdl.models.nn.graph_unet.GraphUnet
     :members:
     :undoc-members:
     :show-inheritance:
@@ -177,11 +172,6 @@ GNN Model
     :undoc-members:
     :show-inheritance:
 
-.. autoclass:: cogdl.models.nn.pyg_gcn.GCN
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
 .. autoclass:: cogdl.models.nn.mixhop.MixHop
     :members:
     :undoc-members:
@@ -217,17 +207,12 @@ GNN Model
     :undoc-members:
     :show-inheritance:
 
-.. autoclass:: cogdl.models.nn.pyg_dgcnn.DGCNN
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
 .. autoclass:: cogdl.models.nn.grand.Grand
     :members:
     :undoc-members:
     :show-inheritance:
 
-.. autoclass:: cogdl.models.nn.pyg_gtn.GTN
+.. autoclass:: cogdl.models.nn.gtn.GTN
     :members:
     :undoc-members:
     :show-inheritance:
@@ -277,7 +262,7 @@ GNN Model
     :undoc-members:
     :show-inheritance:
 
-.. autoclass:: cogdl.models.nn.pyg_srgcn.SRGCN
+.. autoclass:: cogdl.models.nn.srgcn.SRGCN
     :members:
     :undoc-members:
     :show-inheritance:
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 7fe620d8..3ff74637 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -81,6 +81,7 @@ def find_version(filename):
     "sphinx.ext.ifconfig",
     "sphinx.ext.viewcode",
     "sphinx.ext.githubpages",
+    "sphinx.ext.napoleon",
     "recommonmark",
     "sphinx_markdown_tables",
 ]
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 8595c123..5848584b 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -17,7 +17,7 @@ We summarize the contributions of CogDL as follows:
 ❗ News
 ------------
 
-- The new **v0.5.0b1 pre-release** designs and implements a unified training loop for GNN. It introduces `DataWrapper` to help prepare the training/validation/test data and `ModelWrapper` to define the training/validation/test steps. 
+- The new **v0.5.0-alpha0 pre-release** designs and implements a unified training loop for GNNs. It introduces ``DataWrapper`` to help prepare the training/validation/test data and ``ModelWrapper`` to define the training/validation/test steps.
 - The new **v0.4.1 release** adds the implementation of Deep GNNs and the recommendation task. It also supports new pipelines for generating embeddings and recommendation. Welcome to join our tutorial on KDD 2021 at 10:30 am - 12:00 am, Aug. 14th (Singapore Time). More details can be found in https://kdd2021graph.github.io/. 🎉
 - The new **v0.4.0 release** refactors the data storage (from ``Data`` to ``Graph``) and provides more fast operators to speed up GNN training. It also includes many self-supervised learning methods on graphs. BTW, we are glad to announce that we will give a tutorial on KDD 2021 in August. Please see this `link <https://kdd2021graph.github.io/>`_ for more details. 🎉
 - The new **v0.3.0 release** provides a fast spmm operator to speed up GNN training. We also release the first version of `CogDL paper <https://arxiv.org/abs/2103.00959>`_ in arXiv. You can join `our slack <https://join.slack.com/t/cogdl/shared_invite/zt-b9b4a49j-2aMB035qZKxvjV4vqf0hEg>`_ for discussion. 🎉🎉🎉
diff --git a/examples/pyg/README.md b/examples/pyg/README.md
new file mode 100644
index 00000000..f2806940
--- /dev/null
+++ b/examples/pyg/README.md
@@ -0,0 +1,13 @@
+# Running experiments with PyG modules
+
+If you are familiar with other popular graph libraries, you can implement your own model in CogDL using modules from PyTorch Geometric (PyG).
+
+## Installation
+To install PyG, follow the official installation instructions (https://github.com/rusty1s/pytorch_geometric/#installation).
+
+## Usage
+For a quick start on using PyG layers, see the examples in this folder.
+For example, run:
+```bash
+python gcn.py
+```
diff --git a/cogdl/models/nn/pyg_cheb.py b/examples/pyg/chebnet.py
similarity index 50%
rename from cogdl/models/nn/pyg_cheb.py
rename to examples/pyg/chebnet.py
index 06c794d1..13ce6c0d 100644
--- a/cogdl/models/nn/pyg_cheb.py
+++ b/examples/pyg/chebnet.py
@@ -3,35 +3,14 @@
 import torch.nn.functional as F
 from torch_geometric.nn.conv import ChebConv
 
-from .. import BaseModel
+from cogdl import experiment
+from cogdl.models import BaseModel
+from cogdl.datasets.planetoid_data import CoraDataset
 
 
-class Chebyshev(BaseModel):
-    @staticmethod
-    def add_args(parser):
-        """Add model-specific arguments to the parser."""
-        # fmt: off
-        parser.add_argument("--num-features", type=int)
-        parser.add_argument("--num-classes", type=int)
-        parser.add_argument("--hidden-size", type=int, default=64)
-        parser.add_argument("--num-layers", type=int, default=2)
-        parser.add_argument("--dropout", type=float, default=0.5)
-        parser.add_argument("--filter-size", type=int, default=5)
-        # fmt: on
-
-    @classmethod
-    def build_model_from_args(cls, args):
-        return cls(
-            args.num_features,
-            args.hidden_size,
-            args.num_classes,
-            args.num_layers,
-            args.dropout,
-            args.filter_size,
-        )
-
+class ChebyNet(BaseModel):
     def __init__(self, in_feats, hidden_size, out_feats, num_layers, dropout, filter_size):
-        super(Chebyshev, self).__init__()
+        super(ChebyNet, self).__init__()
 
         self.num_features = in_feats
         self.num_classes = out_feats
@@ -53,5 +32,15 @@ def forward(self, graph):
         x = self.convs[-1](x, edge_index)
         return x
 
-    def predict(self, data):
-        return self.forward(data)
+
+if __name__ == "__main__":
+    cora = CoraDataset()
+    model = ChebyNet(
+        in_feats=cora.num_features,
+        hidden_size=64,
+        out_feats=cora.num_classes,
+        num_layers=2,
+        dropout=0.5,
+        filter_size=5,
+    )
+    ret = experiment(dataset=cora, model=model)
diff --git a/cogdl/models/nn/pyg_dgcnn.py b/examples/pyg/dgcnn.py
similarity index 50%
rename from cogdl/models/nn/pyg_dgcnn.py
rename to examples/pyg/dgcnn.py
index a7a9a3c0..a8d40723 100644
--- a/cogdl/models/nn/pyg_dgcnn.py
+++ b/examples/pyg/dgcnn.py
@@ -2,63 +2,37 @@
 import torch.nn as nn
 from torch_geometric.nn import DynamicEdgeConv, global_max_pool
 
+from cogdl import experiment
+from cogdl.models import BaseModel
+from cogdl.models.nn.mlp import MLP
 from cogdl.utils import split_dataset_general
-
-from .. import BaseModel
-from .mlp import MLP
+from cogdl.datasets.tu_data import MUTAGDataset
 
 
 class DGCNN(BaseModel):
     r"""EdgeConv and DynamicGraph in paper `"Dynamic Graph CNN for Learning on
     Point Clouds" <https://arxiv.org/pdf/1801.07829.pdf>__ .`
-
-    Parameters
-    ----------
-    in_feats : int
-        Size of each input sample.
-    out_feats : int
-        Size of each output sample.
-    hidden_dim : int
-        Dimension of hidden layer embedding.
-    k : int
-        Number of neareast neighbors.
     """
 
-    @staticmethod
-    def add_args(parser):
-        parser.add_argument("--hidden-size", type=int, default=64)
-        parser.add_argument("--batch-size", type=int, default=20)
-        parser.add_argument("--train-ratio", type=float, default=0.7)
-        parser.add_argument("--test-ratio", type=float, default=0.1)
-        parser.add_argument("--lr", type=float, default=0.001)
-
-    @classmethod
-    def build_model_from_args(cls, args):
-        return cls(
-            args.num_features,
-            args.hidden_size,
-            args.num_classes,
-        )
-
     @classmethod
     def split_dataset(cls, dataset, args):
         return split_dataset_general(dataset, args)
 
-    def __init__(self, in_feats, hidden_dim, out_feats, k=20, dropout=0.5):
+    def __init__(self, in_feats, hidden_size, out_feats, k=20, dropout=0.5):
         super(DGCNN, self).__init__()
         mlp1 = nn.Sequential(
-            MLP(2 * in_feats, hidden_dim, hidden_dim, num_layers=3, norm="batchnorm"),
+            MLP(2 * in_feats, hidden_size, hidden_size, num_layers=3, norm="batchnorm"),
             nn.ReLU(),
-            nn.BatchNorm1d(hidden_dim),
+            nn.BatchNorm1d(hidden_size),
         )
         mlp2 = nn.Sequential(
-            MLP(2 * hidden_dim, 2 * hidden_dim, 2 * hidden_dim, num_layers=1, norm="batchnorm"),
+            MLP(2 * hidden_size, 2 * hidden_size, 2 * hidden_size, num_layers=1, norm="batchnorm"),
             nn.ReLU(),
-            nn.BatchNorm1d(2 * hidden_dim),
+            nn.BatchNorm1d(2 * hidden_size),
         )
         self.conv1 = DynamicEdgeConv(mlp1, k, "max")
         self.conv2 = DynamicEdgeConv(mlp2, k, "max")
-        self.linear = nn.Linear(hidden_dim + 2 * hidden_dim, 1024)
+        self.linear = nn.Linear(hidden_size + 2 * hidden_size, 1024)
         self.final_mlp = nn.Sequential(
             nn.Linear(1024, 512),
             nn.BatchNorm1d(512),
@@ -77,3 +51,15 @@ def forward(self, batch):
         h = global_max_pool(h, batch.batch)
         out = self.final_mlp(h)
         return out
+
+
+if __name__ == "__main__":
+    mutag = MUTAGDataset()
+    model = DGCNN(
+        in_feats=mutag.num_features,
+        hidden_size=64,
+        out_feats=mutag.num_classes,
+        k=20,
+        dropout=0.5,
+    )
+    ret = experiment(dataset=mutag, model=model, dw="graph_classification_dw", mw="graph_classification_mw")
diff --git a/examples/pytorch_geometric/gat.py b/examples/pyg/gat.py
similarity index 100%
rename from examples/pytorch_geometric/gat.py
rename to examples/pyg/gat.py
diff --git a/cogdl/models/nn/pyg_gcn.py b/examples/pyg/gcn.py
similarity index 53%
rename from cogdl/models/nn/pyg_gcn.py
rename to examples/pyg/gcn.py
index 1d5b7c3b..847528f6 100644
--- a/cogdl/models/nn/pyg_gcn.py
+++ b/examples/pyg/gcn.py
@@ -3,31 +3,12 @@
 import torch.nn.functional as F
 from torch_geometric.nn.conv import GCNConv
 
-from .. import BaseModel
+from cogdl import experiment
+from cogdl.models import BaseModel
+from cogdl.datasets.planetoid_data import CoraDataset
 
 
 class GCN(BaseModel):
-    @staticmethod
-    def add_args(parser):
-        """Add model-specific arguments to the parser."""
-        # fmt: off
-        parser.add_argument("--num-features", type=int)
-        parser.add_argument("--num-classes", type=int)
-        parser.add_argument("--hidden-size", type=int, default=64)
-        parser.add_argument("--num-layers", type=int, default=2)
-        parser.add_argument("--dropout", type=float, default=0.5)
-        # fmt: on
-
-    @classmethod
-    def build_model_from_args(cls, args):
-        return cls(
-            args.num_features,
-            args.num_classes,
-            args.hidden_size,
-            args.num_layers,
-            args.dropout,
-        )
-
     def __init__(self, num_features, num_classes, hidden_size, num_layers, dropout):
         super(GCN, self).__init__()
 
@@ -50,8 +31,14 @@ def forward(self, graph):
         x = self.convs[-1](x, edge_index, edge_weight)
         return F.log_softmax(x, dim=1)
 
-    def get_embeddings(self, x, edge_index, weight=None):
-        for conv in self.convs[:-1]:
-            x = F.relu(conv(x, edge_index, weight))
-            x = F.dropout(x, p=self.dropout, training=self.training)
-        return x
+
+if __name__ == "__main__":
+    cora = CoraDataset()
+    model = GCN(
+        num_features=cora.num_features,
+        hidden_size=64,
+        num_classes=cora.num_classes,
+        num_layers=2,
+        dropout=0.5,
+    )
+    ret = experiment(dataset=cora, model=model)
diff --git a/examples/pytorch_geometric/unet.py b/examples/pyg/unet.py
similarity index 92%
rename from examples/pytorch_geometric/unet.py
rename to examples/pyg/unet.py
index 2a295264..51e93773 100644
--- a/examples/pytorch_geometric/unet.py
+++ b/examples/pyg/unet.py
@@ -44,4 +44,4 @@ def forward(self, graph):
         dropout=0.1,
         num_nodes=cora.num_nodes,
     )
-    ret = experiment(dataset=cora, model=model, dw="node_classification_dw", mw="node_classification_mw")
+    ret = experiment(dataset=cora, model=model)
diff --git a/scripts/parallel_train.py b/scripts/parallel_train.py
index 2103a300..e2d25db2 100644
--- a/scripts/parallel_train.py
+++ b/scripts/parallel_train.py
@@ -53,7 +53,7 @@ def getpid(_):
         num_workers = 1
     else:
         num_workers = len(device_ids)
-    print("Using {num_workers} workers!")
+    print(f"Using {num_workers} workers!")
 
     results_dict = defaultdict(list)
     with mp.Pool(processes=num_workers) as pool:
diff --git a/tests/tasks/test_graph_classification.py b/tests/tasks/test_graph_classification.py
index 9a3e70f5..94b126db 100644
--- a/tests/tasks/test_graph_classification.py
+++ b/tests/tasks/test_graph_classification.py
@@ -57,12 +57,6 @@ def add_gin_args(args):
     return args
 
 
-def add_dgcnn_args(args):
-    args.hidden_size = 64
-    args.batch_size = 20
-    return args
-
-
 def add_sortpool_args(args):
     args.hidden_size = 64
     args.batch_size = 20
@@ -121,13 +115,6 @@ def test_diffpool_mutag():
     assert ret["test_acc"] > 0
 
 
-def test_dgcnn_proteins():
-    args = get_default_args_graph_clf(dataset="proteins", model="dgcnn")
-    args = add_dgcnn_args(args)
-    ret = train(args)
-    assert ret["test_acc"] > 0
-
-
 def test_sortpool_mutag():
     args = get_default_args_graph_clf(dataset="mutag", model="sortpool")
     args = add_sortpool_args(args)
@@ -151,9 +138,5 @@ def test_patchy_san_mutag():
     test_gin_proteins()
 
     test_sortpool_mutag()
-
     test_diffpool_mutag()
-
-    test_dgcnn_proteins()
-
     test_patchy_san_mutag()
diff --git a/tests/tasks/test_node_classification.py b/tests/tasks/test_node_classification.py
index 3502a255..7972e8df 100644
--- a/tests/tasks/test_node_classification.py
+++ b/tests/tasks/test_node_classification.py
@@ -101,22 +101,6 @@ def test_graphsage_cora():
     assert 0 <= ret["test_acc"] <= 1
 
 
-def test_pyg_cheb_cora():
-    args = get_default_args_for_nc("cora", "chebyshev")
-    args.num_layers = 2
-    args.filter_size = 5
-    ret = train(args)
-    assert 0 <= ret["test_acc"] <= 1
-
-
-def test_pyg_gcn_cora():
-    args = get_default_args_for_nc("cora", "pyg_gcn")
-    args.auxiliary_task = "none"
-    args.num_layers = 2
-    ret = train(args)
-    assert 0 <= ret["test_acc"] <= 1
-
-
 def test_clustergcn_pubmed():
     args = get_default_args_for_nc("pubmed", "gcn", dw="cluster_dw")
     args.cpu = True
@@ -212,7 +196,7 @@ def test_srgcn_cora():
 
     norm_list = ["identity", "row_uniform", "row_softmax", "col_uniform", "symmetry"]
     activation_list = ["relu", "relu6", "sigmoid", "tanh", "leaky_relu", "softplus", "elu", "linear"]
-    attn_list = ["node", "edge", "identity", "heat", "ppr"]  # gaussian
+    attn_list = ["node", "edge", "identity", "heat", "ppr", "gaussian"]
 
     for norm in norm_list:
         args.normalization = norm
@@ -536,8 +520,6 @@ def test_gcc_cora():
     test_mlp_pubmed()
     test_mixhop_citeseer()
     test_graphsage_cora()
-    test_pyg_cheb_cora()
-    test_pyg_gcn_cora()
     test_disengcn_cora()
     test_graph_mix()
     test_srgcn_cora()