diff --git a/demo/README.md b/demo/README.md new file mode 100644 index 0000000000..65b4ff259c --- /dev/null +++ b/demo/README.md @@ -0,0 +1,146 @@ +# Demo + +### Demo link + + * [Video demo](#video-demo): A demo script to predict the recognition result using a single video + * [Webcam demo](#webcam-demo): A demo script to implement real-time action recognition from a web camera + +### Video demo + +We provide a demo script to predict the recognition result using a single video. + +```shell +python demo/demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} ${LABEL_FILE} [--use-frames] \ + [--device ${DEVICE_TYPE}] [--fps ${FPS}] [--font-size ${FONT_SIZE}] [--font-color ${FONT_COLOR}] \ + [--target-resolution ${TARGET_RESOLUTION}] [--resize-algorithm ${RESIZE_ALGORITHM}] [--out-filename ${OUT_FILE}] +``` + +Optional arguments: +- `--use-frames`: If specified, the demo will take rawframes as input. Otherwise, it will take a video as input. +- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are CUDA devices like `cuda:0`, or `cpu`. If not specified, it will be set to `cuda:0`. +- `FPS`: FPS value of the output video when using rawframes as input. If not specified, it will be set to 30. +- `FONT_SIZE`: Font size of the label added in the video. If not specified, it will be set to 20. +- `FONT_COLOR`: Font color of the label added in the video. If not specified, it will be `white`. +- `TARGET_RESOLUTION`: Resolution `(desired_width, desired_height)` for resizing the frames before output when using a video as input. If not specified, it will be `None` and the frames are resized by keeping the existing aspect ratio. +- `RESIZE_ALGORITHM`: Resize algorithm used for resizing. If not specified, it will be set to `bicubic`. +- `OUT_FILE`: Path to the output file, which can be in video or gif format. If not specified, it will be set to `None` and no output file will be generated.
+ +Examples: + +Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`. + +1. Recognize a video file as input using a TSN model, on CUDA by default. + + ```shell + # The demo.mp4 and label_map.txt are both from Kinetics-400 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 demo/label_map.txt + ``` + +2. Recognize a list of rawframes as input using a TSN model on CPU. + + ```shell + python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + PATH_TO_FRAMES/ LABEL_FILE --use-frames --device cpu + ``` + +3. Recognize a video file as input using a TSN model and then generate an mp4 file. + + ```shell + # The demo.mp4 and label_map.txt are both from Kinetics-400 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 demo/label_map.txt --out-filename demo/demo_out.mp4 + ``` + +4. Recognize a list of rawframes as input using a TSN model and then generate a gif file. + + ```shell + python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + PATH_TO_FRAMES/ LABEL_FILE --use-frames --out-filename demo/demo_out.gif + ``` + +5. Recognize a video file as input using a TSN model, then generate an mp4 file with a given resolution and resize algorithm.
+ + ```shell + # The demo.mp4 and label_map.txt are both from Kinetics-400 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 demo/label_map.txt --target-resolution 340 256 --resize-algorithm bilinear \ + --out-filename demo/demo_out.mp4 + ``` + + ```shell + # The demo.mp4 and label_map.txt are both from Kinetics-400 + # If either dimension is set to -1, the frames are resized by keeping the existing aspect ratio + # For --target-resolution 170 -1, original resolution (340, 256) -> target resolution (170, 128) + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 demo/label_map.txt --target-resolution 170 -1 --resize-algorithm bilinear \ + --out-filename demo/demo_out.mp4 + ``` + +6. Recognize a video file as input using a TSN model, then generate an mp4 file with the label in red and a 10px font size. + + ```shell + # The demo.mp4 and label_map.txt are both from Kinetics-400 + python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + demo/demo.mp4 demo/label_map.txt --font-size 10 --font-color red \ + --out-filename demo/demo_out.mp4 + ``` + +7. Recognize a list of rawframes as input using a TSN model and then generate a gif file with 24 fps. + + ```shell + python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ + PATH_TO_FRAMES/ LABEL_FILE --use-frames --fps 24 --out-filename demo/demo_out.gif + ``` + +### Webcam demo + +We provide a demo script to implement real-time action recognition from a web camera.
+ +```shell +python demo/webcam_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${LABEL_FILE} \ + [--device ${DEVICE_TYPE}] [--camera-id ${CAMERA_ID}] [--threshold ${THRESHOLD}] \ + [--average-size ${AVERAGE_SIZE}] +``` + +Optional arguments: +- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are CUDA devices like `cuda:0`, or `cpu`. If not specified, it will be set to `cuda:0`. +- `CAMERA_ID`: ID of the camera device. If not specified, it will be set to 0. +- `THRESHOLD`: Threshold of the prediction score for action recognition. Only labels with scores higher than the threshold will be shown. If not specified, it will be set to 0. +- `AVERAGE_SIZE`: Number of latest clips to be averaged for prediction. If not specified, it will be set to 1. + +Examples: + +Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`. + +1. Recognize actions from a web camera using a TSN model on CPU, averaging the scores of the latest 5 clips + and showing result labels with scores higher than 0.2. + + ```shell + python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ + checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth demo/label_map.txt --average-size 5 \ + --threshold 0.2 --device cpu + ``` + +2. Recognize actions from a web camera using an I3D model, on GPU by default, averaging the scores of the latest 5 clips + and showing result labels with scores higher than 0.2. + + ```shell + python demo/webcam_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \ + checkpoints/i3d_r50_32x2x1_100e_kinetics400_rgb_20200614-c25ef9a4.pth demo/label_map.txt \ + --average-size 5 --threshold 0.2 + ``` + +**Note:** Considering the efficiency differences among users' hardware, some modifications might be needed to suit each case. +Users can: +1) change the
`SampleFrames` settings (especially `clip_len` and `num_clips`) of `test_pipeline` in the config file; +2) switch to a suitable crop method, such as `TenCrop`, `ThreeCrop` or `CenterCrop`, in the `test_pipeline` of the config file; +3) reduce the value of `--average-size`: the smaller it is, the faster the demo runs. diff --git a/docs/changelog.md b/docs/changelog.md index fe367eda2b..f47bb17ee1 100644 --- a/docs/changelog.md +++ b/docs/changelog.md @@ -9,6 +9,7 @@ **Improvements** - Add random seed for building filelists ([#323](https://github.com/open-mmlab/mmaction2/pull/323)) +- Move docs about demo to `demo/README.md` ([#329](https://github.com/open-mmlab/mmaction2/pull/329)) **Bug Fixes** - Fix a bug in BaseDataset when `data_prefix` is None ([#314](https://github.com/open-mmlab/mmaction2/pull/314)) diff --git a/docs/getting_started.md b/docs/getting_started.md index 37ea4dd969..f73484700c 100644 --- a/docs/getting_started.md +++ b/docs/getting_started.md @@ -93,145 +93,6 @@ Assume that you have already downloaded the checkpoints to the directory `checkp --launcher slurm --eval top_k_accuracy ``` -### Video demo - -We provide a demo script to predict the recognition result using a single video. - -```shell -python demo/demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} {LABEL_FILE} [--use-frames] \ - [--device ${DEVICE_TYPE}] [--fps {FPS}] [--font-size {FONT_SIZE}] [--font-color {FONT_COLOR}] \ - [--target-resolution ${TARGET_RESOLUTION}] [--resize-algorithm {RESIZE_ALGORITHM}] [--out-filename {OUT_FILE}] -``` - -Optional arguments: -- `--use-frames`: If specified, the demo will take rawframes as input. Otherwise, it will take a video as input. -- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are cuda device like `cuda:0` or `cpu`. If not specified, it will be set to `cuda:0`. -- `FPS`: FPS value of the output video when using rawframes as input. If not specified, it wll be set to 30. -- `FONT_SIZE`: Font size of the label added in the video.
If not specified, it wll be set to 20. -- `FONT_COLOR`: Font color of the label added in the video. If not specified, it will be `white`. -- `TARGET_RESOLUTION`: Resolution(desired_width, desired_height) for resizing the frames before output when using a video as input. If not specified, it will be None and the frames are resized by keeping the existing aspect ratio. -- `RESIZE_ALGORITHM`: Resize algorithm used for resizing. If not specified, it will be set to `bicubic`. -- `OUT_FILE`: Path to the output file which can be a video format or gif format. If not specified, it will be set to `None` and does not generate the output file. - -Examples: - -Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/` - -1. Recognize a video file as input by using a TSN model on cuda by default. - - ```shell - # The demo.mp4 and label_map.txt are both from Kinetics-400 - python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ - checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ - demo/demo.mp4 demo/label_map.txt - ``` - -2. Recognize a list of rawframes as input by using a TSN model on cpu. - - ```shell - python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \ - checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ - PATH_TO_FRAMES/ LABEL_FILE --use-frames --device cpu - ``` - -3. Recognize a video file as input by using a TSN model and then generate an mp4 file. - - ```shell - # The demo.mp4 and label_map.txt are both from Kinetics-400 - python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ - checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ - demo/demo.mp4 demo/label_map.txt --out-filename demo/demo_out.mp4 - ``` - -4. Recognize a list of rawframes as input by using a TSN model and then generate a gif file. 
- - ```shell - python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \ - checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ - PATH_TO_FRAMES/ LABEL_FILE --use-frames --out-filename demo/demo_out.gif - ``` - -5. Recognize a video file as input by using a TSN model, then generate an mp4 file with a given resolution and resize algorithm. - - ```shell - # The demo.mp4 and label_map.txt are both from Kinetics-400 - python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ - checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ - demo/demo.mp4 demo/label_map.txt --target-resolution 340 256 --resize-algorithm bilinear \ - --out-filename demo/demo_out.mp4 - ``` - - ```shell - # The demo.mp4 and label_map.txt are both from Kinetics-400 - # If either dimension is set to -1, the frames are resized by keeping the existing aspect ratio - # For --target-resolution 170 -1, original resolution (340, 256) -> target resolution (170, 128) - python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ - checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ - demo/demo.mp4 demo/label_map.txt --target-resolution 170 -1 --resize-algorithm bilinear \ - --out-filename demo/demo_out.mp4 - ``` - -6. Recognize a video file as input by using a TSN model, then generate an mp4 file with a label in a red color and 10px fontsize. - - ```shell - # The demo.mp4 and label_map.txt are both from Kinetics-400 - python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ - checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ - demo/demo.mp4 demo/label_map.txt --font-size 10 --font-color red \ - --out-filename demo/demo_out.mp4 - ``` - -7. Recognize a list of rawframes as input by using a TSN model and then generate an mp4 file with 24 fps. 
- - ```shell - python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \ - checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \ - PATH_TO_FRAMES/ LABEL_FILE --use-frames --fps 24 --out-filename demo/demo_out.gif - ``` - -### Webcam demo - -We provide a demo script to implement real-time action recognition from web camera. - -```shell -python demo/webcam_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${LABEL_FILE} \ - [--device ${DEVICE_TYPE}] [--camera-id ${CAMERA_ID}] [--threshold ${THRESHOLD}] \ - [--average-size ${AVERAGE_SIZE}] -``` - -Optional arguments: -- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are cuda device like `cuda:0` or `cpu`. If not specified, it will be set to `cuda:0`. -- `CAMERA_ID`: ID of camera device If not specified, it will be set to 0. -- `THRESHOLD`: Threshold of prediction score for action recognition. Only label with score higher than the threshold will be shown. If not specified, it will be set to 0. -- `AVERAGE_SIZE`: Number of latest clips to be averaged for prediction. If not specified, it will be set to 1. - -Examples: - -Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/` - -1. Recognize the action from web camera as input by using a TSN model on cpu, averaging the score per 5 times - and outputting result labels with score higher than 0.2. - -```shell -python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \ - checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth demo/label_map.txt --average-size 5 \ - --threshold 0.2 --device cpu -``` - -2. Recognize the action from web camera as input by using a I3D model on gpu by default, averaging the score per 5 times - and outputting result labels with score higher than 0.2. 
- -```shell -python demo/webcam_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \ - checkpoints/i3d_r50_32x2x1_100e_kinetics400_rgb_20200614-c25ef9a4.pth demo/label_map.txt \ - --average-size 5 --threshold 0.2 -``` - -**Note:** Considering the efficiency difference for users' hardware, Some modifications might be done to suit the case. -Users can change: -1). `SampleFrames` step (especially the number of `clip_len` and `num_clips`) of `test_pipeline` in the config file. -2). Change to the suitable Crop methods like `TenCrop`, `ThreeCrop`, `CenterCrop`, etc. in `test_pipeline` of the config file. -3). Change the number of `--average-size`. The smaller, the faster. ### High-level APIs for testing a video and rawframes. diff --git a/docs/index.rst b/docs/index.rst index 122c8cc151..c3cc0a19aa 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -6,6 +6,7 @@ Welcome to MMAction2's documentation! install.md getting_started.md + demo.md benchmark.md config.md diff --git a/docs/merge_docs.sh b/docs/merge_docs.sh index cf1c02b016..d927524325 100755 --- a/docs/merge_docs.sh +++ b/docs/merge_docs.sh @@ -1,5 +1,7 @@ #!/usr/bin/env bash +sed -i '$a\\n' ../demo/README.md + sed -i 's/(\/tools\/data\/activitynet\/preparing_activitynet.md/(#activitynet/g' supported_datasets.md sed -i 's/(\/tools\/data\/kinetics\/preparing_kinetics.md/(#kinetics/g' supported_datasets.md sed -i 's/(\/tools\/data\/mit\/preparing_mit.md/(#moments-in-time/g' supported_datasets.md @@ -18,11 +20,13 @@ sed -i 's/(\/tools\/data\/ava\/preparing_ava.md/(#ava/g' supported_datasets.md cat ../configs/localization/*/*.md > localization_models.md cat ../configs/recognition/*/*.md > recognition_models.md cat ../tools/data/*/*.md > prepare_data.md +cat ../demo/README.md > demo.md sed -i 's/#/##&/' localization_models.md sed -i 's/#/##&/' recognition_models.md sed -i 's/md###t/html#t/g' localization_models.md sed -i 's/md###t/html#t/g' recognition_models.md +sed -i 
's/md###t/html#t/g' demo.md sed -i 's/# Preparing/# /g' prepare_data.md sed -i 's/#/##&/' prepare_data.md @@ -45,3 +49,4 @@ sed -i 's/](\/docs\//](/g' ./tutorials/*.md sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' ./tutorials/*.md sed -i 's/](\/docs\//](/g' supported_datasets.md sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' supported_datasets.md +sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' demo.md
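The note at the end of the new `demo/README.md` suggests speeding up the demo by shrinking the `SampleFrames` settings and switching crop methods in `test_pipeline`. A minimal sketch of such a trimmed pipeline is shown below; the step names follow common mmaction2 TSN config conventions, but the concrete values (`num_clips=8`, `crop_size=224`, the normalization constants) are illustrative assumptions, not part of this PR.

```python
# Sketch of a faster test_pipeline (assumed values, for illustration only).
# Fewer clips per video means fewer forward passes, so the demo runs faster.
test_pipeline = [
    # num_clips reduced (e.g. from 25 to 8) per the README note
    dict(type='SampleFrames', clip_len=1, num_clips=8, test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    # CenterCrop evaluates one view instead of TenCrop's ten views
    dict(type='CenterCrop', crop_size=224),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_bgr=False),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
```

Pairing a smaller `num_clips` here with a smaller `--average-size` on the command line trades some prediction stability for lower latency.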