diff --git a/docs/en/advanced_guides/customize_dataset.md b/docs/en/advanced_guides/customize_dataset.md
index 31a6e16b2b..690ce3859c 100644
--- a/docs/en/advanced_guides/customize_dataset.md
+++ b/docs/en/advanced_guides/customize_dataset.md
@@ -1,34 +1,34 @@
-# Customize Datasets
+# Customize Dataset
In this tutorial, we will introduce some methods about how to customize your own dataset by online conversion.
-- [Customize Datasets](#customize-datasets)
+- [Customize Dataset](#customize-dataset)
- [General understanding of the Dataset in MMAction2](#general-understanding-of-the-dataset-in-mmaction2)
- [Customize new datasets](#customize-new-datasets)
- [Customize keypoint format for PoseDataset](#customize-keypoint-format-for-posedataset)
## General understanding of the Dataset in MMAction2
-MMAction2 provides specific Dataset class according to the task, e.g. `VideoDataset`/`RawframeDataset` for action recognition, `AVADataset` for spatio-temporal action detection, `PoseDataset` for skeleton-based action recognition. All these specific datasets only need to implement `get_data_info(self, idx)` to build a data list from the annotation file, while other functions are handled by the superclass. The following table shows the inherent relationship and the main function of the modules.
+MMAction2 provides task-specific `Dataset` classes, e.g. `VideoDataset`/`RawframeDataset` for action recognition, `AVADataset` for spatio-temporal action detection, and `PoseDataset` for skeleton-based action recognition. These task-specific datasets only require the implementation of `load_data_list(self)` to generate a data list from the annotation file; the remaining functions are automatically handled by the superclasses (i.e., `BaseActionDataset` and `BaseDataset`). The following table shows the inheritance relationship and the main method of each module.
-| Class Name | Functions |
-| ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| MMAction2::VideoDataset      | `load_data_list(self)` <br> Build data list from the annotation file. |
-| MMAction2::BaseActionDataset | `get_data_info(self, idx)` <br> Given the `idx`, return the corresponding data sample from data list |
-| MMEngine::BaseDataset        | `__getitem__(self, idx)` <br> Given the `idx`, call `get_data_info` to get data sample, then call the `pipeline` to perform transforms and augmentation in `train_pipeline` or `val_pipeline` |
+| Class Name | Class Method |
+| ------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `MMAction2::VideoDataset`      | `load_data_list(self)` <br> Build data list from the annotation file. |
+| `MMAction2::BaseActionDataset` | `get_data_info(self, idx)` <br> Given the `idx`, return the corresponding data sample from the data list. |
+| `MMEngine::BaseDataset`        | `__getitem__(self, idx)` <br> Given the `idx`, call `get_data_info` to get the data sample, then call the `pipeline` to perform transforms and augmentation in `train_pipeline` or `val_pipeline`. |
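To make the division of labor concrete, below is a minimal usage sketch (the annotation file and data paths are placeholders): indexing the dataset triggers `__getitem__`, which fetches a data sample via `get_data_info` and then runs it through the `pipeline`.

```python
from mmaction.datasets import VideoDataset

# Hypothetical paths, for illustration only.
dataset = VideoDataset(
    ann_file='data/kinetics400/kinetics400_train_list_videos.txt',
    data_prefix=dict(video='data/kinetics400/videos_train'),
    pipeline=[])  # the transforms of train_pipeline/val_pipeline go here

data_sample = dataset[0]  # __getitem__ -> get_data_info -> pipeline
```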
## Customize new datasets
-For most scenarios, we don't need to customize a new dataset class, offline conversion is recommended way to use your data. But customizing a new dataset class is also easy in MMAction2. As above mentioned, a dataset for a specific task usually only needs to implement `load_data_list(self)` to generate the data list from the annotation file. It is worth noting that elements in the `data_list` are `dict` with fields required in the following pipeline.
+Although offline conversion is the preferred method for utilizing your own data in most cases, MMAction2 offers a convenient process for creating a customized `Dataset` class. As mentioned previously, task-specific datasets only require the implementation of `load_data_list(self)` for generating a data list from the annotation file. It is noteworthy that the elements in the `data_list` are `dict` with fields that are essential for the subsequent processes in the `pipeline`.
-Take `VideoDataset` as an example, `train_pipeline`/`val_pipeline` requires `'filename'` in `DecordInit` and `'label'` in `PackActionInput`, so data samples in the data list have 2 fields: `'filename'` and `'label'`.
-you can refer to [customize pipeline](customize_pipeline.md) for more details about the pipeline.
+Taking `VideoDataset` as an example, `train_pipeline`/`val_pipeline` require `'filename'` in `DecordInit` and `'label'` in `PackActionInputs`. Consequently, the data samples in the `data_list` must contain 2 fields: `'filename'` and `'label'`.
+Please refer to [customize pipeline](customize_pipeline.md) for more details about the `pipeline`.
```
data_list.append(dict(filename=filename, label=label))
```
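For illustration, a minimal customized dataset built on `BaseActionDataset` could look like the sketch below. The class name `MyVideoDataset` and the annotation format (one `<video path> <label>` pair per line) are assumptions for this example, not part of MMAction2:

```python
from mmaction.datasets import BaseActionDataset
from mmaction.registry import DATASETS


@DATASETS.register_module()
class MyVideoDataset(BaseActionDataset):
    """Assumed annotation format: each line is '<video path> <label>'."""

    def load_data_list(self):
        data_list = []
        with open(self.ann_file) as f:
            for line in f:
                filename, label = line.strip().rsplit(' ', 1)
                data_list.append(dict(filename=filename, label=int(label)))
        return data_list
```

Once registered, the class can be referenced in a config via `dataset=dict(type='MyVideoDataset', ...)`.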
-While `AVADataset` is more complex, elements in the data list consist of several fields about video data, and it further overwrites `get_data_info(self, idx)` to convert keys, which are required in spatio-temporal action detection pipeline.
+However, `AVADataset` is more complex: its data samples consist of several fields about the video data. Moreover, it overrides `get_data_info(self, idx)` to convert keys that are indispensable in the spatio-temporal action detection pipeline.
```python
@@ -60,21 +60,21 @@ class AVADataset(BaseActionDataset):
## Customize keypoint format for PoseDataset
-MMAction2 currently supports three kinds of keypoint formats: `coco`, `nturgb+d` and `openpose`. If your use one of them, just specify the corresponding format in the following modules:
+MMAction2 currently supports three keypoint formats: `coco`, `nturgb+d` and `openpose`. If you use one of these formats, you may simply specify the corresponding format in the following modules:
-For Graph Convolutional Networks, such as AAGCN, STGCN...
+For Graph Convolutional Networks, such as AAGCN and STGCN:
-- transform: argument `dataset` in `JointToBone`.
-- backbone: argument `graph_cfg` in Graph Convolutional Networks.
+- `pipeline`: argument `dataset` in `JointToBone`.
+- `backbone`: argument `graph_cfg` in Graph Convolutional Networks.
-And for PoseC3D:
+For PoseC3D:
-- transform: In `Flip`, specify `left_kp` and `right_kp` according to the keypoint symmetrical relationship, or remove the transform for asymmetric keypoints structure.
-- transform: In `GeneratePoseTarget`, specify `skeletons`, `left_limb`, `right_limb` if `with_limb` is `true`, and `left_kp`, `right_kp` if `with_kp` is `true`.
+- `pipeline`: In `Flip`, specify `left_kp` and `right_kp` based on the symmetrical relationship between keypoints.
+- `pipeline`: In `GeneratePoseTarget`, specify `skeletons`, `left_limb`, `right_limb` if `with_limb` is `True`, and `left_kp`, `right_kp` if `with_kp` is `True`, as shown in the sketch after this list.
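A sketch for the `coco` format is given below; the index lists follow the standard 17-keypoint COCO ordering and should be adapted to your own keypoint definition, and the `flip_ratio` value is only illustrative:

```python
left_kp = [1, 3, 5, 7, 9, 11, 13, 15]    # left eye/ear/shoulder/elbow/wrist/hip/knee/ankle
right_kp = [2, 4, 6, 8, 10, 12, 14, 16]  # right-side counterparts

train_pipeline = [
    ...
    dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp),
    dict(
        type='GeneratePoseTarget',
        with_kp=True,
        with_limb=False,
        left_kp=left_kp,
        right_kp=right_kp),
    ...
]
```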
-For a custom format, you need to add a new graph layout into models and transforms, which defines the keypoints and their connection relationship.
+If using a custom keypoint format, it is necessary to include a new graph layout in both the `backbone` and `pipeline`. This layout will define the keypoints and their connection relationship.
-Take the coco dataset as an example, we define a layout named `coco` in `Graph`, and set its `inward` as followed, which includes all connections between nodes, each connection is a pair of nodes from far to near. The order of connections does not matter. Other settings about coco are to set the number of nodes to 17, and set node 0 as the center node.
+Taking the `coco` dataset as an example, we define a layout named `coco` in `Graph`. The `inward` connections of this layout comprise all node connections, with each **centripetal** connection consisting of a tuple of nodes ordered from the farther node to the nearer one. Additional settings for `coco` include specifying the number of nodes as `17` and setting node `0` as the central node.
```python
@@ -85,7 +85,7 @@ self.inward = [(15, 13), (13, 11), (16, 14), (14, 12), (11, 5),
self.center = 0
```
-Similarly, we define the `pairs` in `JointToBone`, adding a bone of `(0, 0)` to align the number of bones to the nodes. The `pairs` of coco dataset is as followed, same as above mentioned, the order of pairs does not matter.
+Similarly, we define the `pairs` in `JointToBone`, adding a bone of `(0, 0)` to align the number of bones with the number of nodes. The `pairs` of the `coco` dataset are shown below; as mentioned above, the order of `pairs` in `JointToBone` is irrelevant.
```python
@@ -94,10 +94,9 @@ self.pairs = ((0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (5, 0),
(12, 0), (13, 11), (14, 12), (15, 13), (16, 14))
```
-For your custom format, just define the above setting as your graph structure, and specify in your config file as followed, we take `STGCN` as an example, assuming you already define a `custom_dataset` in `Graph` and `JointToBone`, and num_classes is n.
+To use your custom keypoint format, simply define the aforementioned settings as your graph structure and specify them in your config file, as shown below. In this example we use `STGCN`, where `n` denotes the number of classes and `custom_dataset` is assumed to have already been defined in `Graph` and `JointToBone`.
```python
-
model = dict(
type='RecognizerGCN',
backbone=dict(
diff --git a/docs/en/advanced_guides/customize_logging.md b/docs/en/advanced_guides/customize_logging.md
index aabaad949f..ccbbeeafed 100644
--- a/docs/en/advanced_guides/customize_logging.md
+++ b/docs/en/advanced_guides/customize_logging.md
@@ -1,6 +1,6 @@
# Customize Logging
-MMAction2 produces a lot of logs during the running process, such as loss, iteration time, learning rate, etc. In this section, we will introduce you how to output custom log. More details about the logging system, please refer to [MMEngine](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/logging.html).
+MMAction2 produces a lot of logs during the running process, such as loss, iteration time, learning rate, etc. In this section, we will introduce how to output custom logs. For more details about the logging system, please refer to the [MMEngine Tutorial](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/logging.html).
- [Customize Logging](#customize-logging)
- [Flexible Logging System](#flexible-logging-system)
@@ -9,13 +9,13 @@ MMAction2 produces a lot of logs during the running process, such as loss, itera
## Flexible Logging System
-MMAction2 configures the logging system by LogProcessor in [default_runtime](/configs/_base_/default_runtime.py) in default, which is equivalent to:
+The MMAction2 logging system is configured by the `LogProcessor` in [default_runtime](/configs/_base_/default_runtime.py) by default, which is equivalent to:
```python
log_processor = dict(type='LogProcessor', window_size=20, by_epoch=True)
```
-Defaultly, LogProcessor catches all filed start with `loss` return by `model.forward`. For example in the following model, `loss1` and `loss2` will be logged automatically without additional configuration.
+By default, the `LogProcessor` captures all fields that begin with `loss` returned by `model.forward`. For instance, in the following model, `loss1` and `loss2` will be logged automatically without any additional configuration.
```python
from mmengine.model import BaseModel
@@ -32,14 +32,14 @@ class ToyModel(BaseModel):
return dict(loss1=loss1, loss2=loss2)
```
-The format of the output log is as followed:
+The output log is formatted as follows:
```
08/21 02:58:41 - mmengine - INFO - Epoch(train) [1][10/25] lr: 1.0000e-02 eta: 0:00:00 time: 0.0019 data_time: 0.0004 loss1: 0.8381 loss2: 0.9007 loss: 1.7388
08/21 02:58:41 - mmengine - INFO - Epoch(train) [1][20/25] lr: 1.0000e-02 eta: 0:00:00 time: 0.0029 data_time: 0.0010 loss1: 0.1978 loss2: 0.4312 loss: 0.6290
```
-LogProcessor will output the log in the following format:
+`LogProcessor` will output the log in the following format:
- The prefix of the log:
- epoch mode(`by_epoch=True`): `Epoch(train) [{current_epoch}/{current_iteration}]/{dataloader_length}`
@@ -55,11 +55,11 @@ LogProcessor will output the log in the following format:
log_processor outputs the epoch based log by default(`by_epoch=True`). To get an expected log matched with the `train_cfg`, we should set the same value for `by_epoch` in `train_cfg` and `log_processor`.
```
-Based on the rules above, the code snippet will count the average value of the loss1 and the loss2 every 20 iterations. More types of statistical methods, please refer to [MMEngine.LogProcessor](mmengine.runner.LogProcessor).
+Based on the rules above, the code snippet will count the average value of `loss1` and `loss2` every 20 iterations. For more types of statistical methods, please refer to [mmengine.runner.LogProcessor](mmengine.runner.LogProcessor).
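As a sketch of such customization (the log name and statistic chosen here are assumptions), `custom_cfg` can additionally report, for example, the global minimum of `loss1` under a separate name while keeping the default windowed mean:

```python
log_processor = dict(
    type='LogProcessor',
    window_size=20,
    by_epoch=True,
    custom_cfg=[
        dict(
            data_src='loss1',
            log_name='loss1_global_min',
            method_name='min',
            window_size='global')
    ])
```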
## Customize log
-The logging system could not only log the loss, lr, .etc but also collect and output the custom log. For example, if we want to statistic the intermediate loss:
+The logging system can not only log the `loss`, `lr`, etc., but also collect and output custom logs. For example, if we want to compute statistics of an intermediate loss:
The `ToyModel` calculate `loss_tmp` in forward, but don't save it into the return dict.
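One way to expose it (a sketch using MMEngine's `MessageHub`; the key name `train/loss_tmp` is chosen for this example) is to update the scalar where it is computed:

```python
import torch
from mmengine.logging import MessageHub

# Placeholder tensor standing in for the intermediate loss computed in forward.
loss_tmp = torch.rand(1)

message_hub = MessageHub.get_current_instance()
message_hub.update_scalar('train/loss_tmp', float(loss_tmp))
```

To have the value displayed, the corresponding `data_src` (here `loss_tmp`) typically also needs an entry in the `custom_cfg` of `log_processor`.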
@@ -108,7 +108,7 @@ The `loss_tmp` will be added to the output log:
## Export the debug log
-To export the debug log to the `work_dir`, you can set log_level in config file as followed:
+To export the debug log to the `work_dir`, you can set `log_level` in the config file as follows:
```
log_level='DEBUG'
diff --git a/docs/en/advanced_guides/customize_pipeline.md b/docs/en/advanced_guides/customize_pipeline.md
index 632216ba10..ba7a6a8e5b 100644
--- a/docs/en/advanced_guides/customize_pipeline.md
+++ b/docs/en/advanced_guides/customize_pipeline.md
@@ -1,21 +1,20 @@
# Customize Data Pipeline
-In this tutorial, we will introduce some methods about how to build the data pipeline (i.e., data transformations)for your tasks.
+In this tutorial, we will introduce some methods about how to build the data pipeline (i.e., data transformations) for your tasks.
- [Customize Data Pipeline](#customize-data-pipeline)
- - [Design of Data pipelines](#design-of-data-pipelines)
- - [Modify the training/test pipeline](#modify-the-trainingtest-pipeline)
+ - [Design of Data Pipeline](#design-of-data-pipeline)
+  - [Modify the Training/Testing Pipeline](#modify-the-trainingtesting-pipeline)
- [Loading](#loading)
- - [Sampling frames and other processing](#sampling-frames-and-other-processing)
+ - [Sampling Frames and Other Processing](#sampling-frames-and-other-processing)
- [Formatting](#formatting)
- - [Add new data transforms](#add-new-data-transforms)
+ - [Add New Data Transforms](#add-new-data-transforms)
-## Design of Data pipelines
+## Design of Data Pipeline
-The data pipeline means how to process the sample dict when indexing a sample from the dataset. And it
-consists of a sequence of data transforms. Each data transform takes a dict as input, processes it, and outputs a dict for the next data transform.
+The data pipeline refers to the procedure of handling the data sample dict when indexing a sample from the dataset, and comprises a series of data transforms. Each data transform accepts a `dict` as input, processes it, and produces a `dict` as output for the subsequent data transform in the sequence.
-Here is a data pipeline example for SlowFast training on Kinetics for `VideoDataset`. It first use [`decord`](https://github.com/dmlc/decord) to read the raw videos and randomly sample one video clip (the clip has 32 frames, and the interval between frames is 2). Next it applies the random resized crop and random horizontal flip to all frames. Finally the data shape is formatted as `NCTHW`.
+Below is an example data pipeline for training SlowFast on Kinetics using `VideoDataset`. The pipeline initially employs [`decord`](https://github.com/dmlc/decord) to read the raw videos and randomly sample one video clip, which comprises `32` frames with a frame interval of `2`. Subsequently, it applies random resized crop and random horizontal flip to all frames before formatting the data shape as `NCTHW`, which is `(1, 3, 32, 224, 224)` in this example.
```python
train_pipeline = [
@@ -31,18 +30,17 @@ train_pipeline = [
]
```
-All available data transforms in MMAction2 can be found in the [data transforms docs](mmaction.datasets.transforms).
+A comprehensive list of all available data transforms in MMAction2 can be found in [mmaction.datasets.transforms](mmaction.datasets.transforms).
-## Modify the training/test pipeline
+## Modify the Training/Testing Pipeline
-The data pipeline in MMAction2 is pretty flexible. You can control almost every step of the data
-preprocessing from the config file, but on the other hand, you may be confused facing so many options.
+The data pipeline in MMAction2 is highly adaptable, as nearly every step of the data preprocessing can be configured from the config file. However, the wide array of options may be overwhelming for some users.
-Here is a common practice and guidance for action recognition tasks.
+Below are some general practices and guidance for building a data pipeline for action recognition tasks.
### Loading
-At the beginning of a data pipeline, we usually need to load videos. But if you already extract the frames, you should use `RawFrameDecode` and change the dataset type to `RawframeDataset`:
+At the beginning of a data pipeline, it is customary to load videos. However, if the frames have already been extracted, you should utilize `RawFrameDecode` and modify the dataset type to `RawframeDataset`.
```python
train_pipeline = [
@@ -57,14 +55,13 @@ train_pipeline = [
]
```
-If you want to load data from files with special formats or special locations, you can [implement a new loading
-transform](#add-new-data-transforms) and add it at the beginning of the data pipeline.
+If you need to load data from files with distinct formats (e.g., `pkl`, `bin`, etc.) or from specific locations, you may create a new loading transform and include it at the beginning of the data pipeline. Please refer to [Add New Data Transforms](#add-new-data-transforms) for more details.
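As an illustration, a loading transform for pre-extracted features stored in `pkl` files might look like the sketch below; the class name, the `feature_path` key, and the `feature` field are all assumptions for this example:

```python
import pickle

from mmcv.transforms import BaseTransform

from mmaction.registry import TRANSFORMS


@TRANSFORMS.register_module()
class LoadFeatureFromPkl(BaseTransform):
    """Load a feature array from the pickle file at `results['feature_path']`."""

    def transform(self, results: dict) -> dict:
        with open(results['feature_path'], 'rb') as f:
            results['feature'] = pickle.load(f)
        return results
```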
-### Sampling frames and other processing
+### Sampling Frames and Other Processing
During training and testing, we may have different strategies to sample frames from the video.
-For example, during testing of SlowFast, we sample multiple clips uniformly:
+For instance, when testing SlowFast, we uniformly sample multiple clips as follows:
```python
test_pipeline = [
@@ -79,9 +76,9 @@ test_pipeline = [
]
```
-In the above example, 10 clips of 32-frame video clips will be sampled for each video. We use `test_mode=True` to uniformly sample these clips (as opposed to randomly sample during training).
+In the above example, 10 video clips, each comprising 32 frames, will be uniformly sampled from each video. `test_mode=True` is employed to accomplish this, as opposed to random sampling during training.
-Another example is that TSN/TSM models sample multiple segments from the video:
+Another example involves `TSN/TSM` models, which sample multiple segments from the video:
```python
train_pipeline = [
@@ -91,20 +88,15 @@ train_pipeline = [
]
```
-```{note}
-Usually, the data augmentation part in the data pipeline handles only video-wise transforms, but not transforms
-like video normalization or mixup/cutmix. It's because we can do image normalization and mixup/cutmix on batch data
-to accelerate with GPUs. To configure video normalization and mixup/cutmix, please use the [data preprocessor]
-(mmaction.models.utils.data_preprocessor).
-```
+Typically, the data pipeline handles only video-level transforms, such as resizing or cropping, but not transforms like video normalization or mixup/cutmix. This is because video normalization and mixup/cutmix can be performed on batched video data
+to accelerate processing using GPUs. To configure video normalization and mixup/cutmix, please refer to the [mmaction.models.utils.data_preprocessor](mmaction.models.utils.data_preprocessor).
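For reference, the sketch below shows how this could look in a config; the normalization values and `num_classes=400` are illustrative and should match your own data:

```python
model = dict(
    type='Recognizer3D',  # other fields (backbone, cls_head, ...) omitted
    data_preprocessor=dict(
        type='ActionDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        format_shape='NCTHW',
        blending=dict(  # optional mixup/cutmix on batched data
            type='RandomBatchAugment',
            augments=[
                dict(type='MixupBlending', alpha=0.8, num_classes=400),
                dict(type='CutmixBlending', alpha=1.0, num_classes=400)
            ])))
```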
### Formatting
-The formatting is to collect training data from the data information dict and convert these data to
-model-friendly format.
+Formatting involves collecting training data from the data information dict and converting it into a format that is compatible with the model.
-In most cases, you can simply use [`PackActionInputs`](mmaction.datasets.transforms.PackActionInputs), and it will
-convert the image in NumPy array format to PyTorch tensor, and pack the ground truth categories information and
+In most cases, you can simply employ [`PackActionInputs`](mmaction.datasets.transforms.PackActionInputs), and it will
+convert the images in `NumPy` array format to PyTorch tensors, and pack the ground truth category information and
other meta information as a dict-like object [`ActionDataSample`](mmaction.structures.ActionDataSample).
```python
@@ -114,12 +106,10 @@ train_pipeline = [
]
```
-## Add new data transforms
+## Add New Data Transforms
-1. Write a new data transform in any file, e.g., `my_transform.py`, and place it in
- the folder `mmaction/datasets/transforms/`. The data transform class needs to inherit
- the [`mmcv.transforms.BaseTransform`](mmcv.transforms.BaseTransform) class and override
- the `transform` method which takes a dict as input and returns a dict.
+1. To create a new data transform, write a new transform class in a Python file named, for example, `my_transforms.py`. The data transform class must inherit
+   the [`mmcv.transforms.BaseTransform`](mmcv.transforms.BaseTransform) class and override the `transform` method, which takes a `dict` as input and returns a `dict`. Finally, place `my_transforms.py` in the folder `mmaction/datasets/transforms/`.
```python
from mmcv.transforms import BaseTransform
@@ -127,9 +117,12 @@ train_pipeline = [
@TRANSFORMS.register_module()
class MyTransform(BaseTransform):
+ def __init__(self, msg):
+ self.msg = msg
def transform(self, results):
# Modify the data information dict `results`.
+        print(self.msg, 'MMAction2.')
return results
```
@@ -149,7 +142,7 @@ train_pipeline = [
```python
train_pipeline = [
...
- dict(type='MyTransform'),
+ dict(type='MyTransform', msg='Hello!'),
...
]
```
diff --git a/docs/en/get_started/overview.md b/docs/en/get_started/overview.md
index 8bdccd3451..498b093830 100644
--- a/docs/en/get_started/overview.md
+++ b/docs/en/get_started/overview.md
@@ -2,18 +2,18 @@
## What is MMAction2
-MMAction2 is an open source toolkit based on PyTorch, supporting numerous video understanding models, including action recognition, skeleton-based action recognition, spatio-temporal action detection and temporal action localization. In addition, it supports widely-used academic datasets and provides many useful tools, assisting users in exploring various aspects of models and datasets and implementing high-quality algorithms. Generally, it has the following features.
+MMAction2 is an open source toolkit based on PyTorch, supporting numerous video understanding models, including **action recognition, skeleton-based action recognition, spatio-temporal action detection and temporal action localization**. Moreover, it supports widely-used academic datasets and offers many useful tools, assisting users in exploring various aspects of models and datasets, as well as implementing high-quality algorithms. Generally, the toolkit boasts the following features:
-One-stop, Multi-model: MMAction2 supports various video understanding tasks and implements the latest models for action recognition, localization, detection.
+**One-stop, Multi-model**: MMAction2 supports various video understanding tasks and implements state-of-the-art models for action recognition, localization, and detection.
-Modular Design: MMAction2’s modular design allows users to define and reuse modules in the model on demand.
+**Modular Design**: The modular design of MMAction2 enables users to define and reuse modules in the model as required.
-Various Useful Tools: MMAction2 provides many analysis tools, including visualizers, validation scripts, evaluators, etc., to help users troubleshoot, finetune or compare models.
+**Various Useful Tools**: MMAction2 provides an array of analysis tools, such as visualizers, validation scripts, evaluators, etc., to aid users in troubleshooting, fine-tuning, or comparing models.
-Powered by OpenMMLab: Like other algorithm libraries in OpenMMLab family, MMAction2 follows OpenMMLab’s rigorous development guidelines and interface conventions, significantly reducing the learning cost of users familiar with other projects in OpenMMLab family. In addition, benefiting from the unified interfaces among OpenMMLab, you can easily call the models implemented in other OpenMMLab projects (e.g. MMClassification) in MMAction2, facilitating cross-domain research and real-world applications.
+**Powered by OpenMMLab**: Similar to other algorithm libraries in the OpenMMLab family, MMAction2 adheres to OpenMMLab's rigorous development guidelines and interface conventions, considerably reducing the learning cost for users familiar with other OpenMMLab projects. Furthermore, due to the unified interfaces among OpenMMLab projects, it is easy to call models implemented in other OpenMMLab projects (such as MMClassification) in MMAction2, which greatly facilitates cross-domain research and real-world applications.