Data Preparation

Kinetics

For more information about the Kinetics dataset, please refer to the official website. To prepare the dataset, take the following steps:

  1. Download the videos via the official scripts.

  2. Preprocess the downloaded videos by resizing them so that the short edge is 256 pixels.

  3. Prepare the csv files for the training, validation, and test sets as train.csv, val.csv, and test.csv. Each line of a csv file holds a video path and its label, separated by a space:

path_to_video_1 label_1
path_to_video_2 label_2
path_to_video_3 label_3
...
path_to_video_N label_N
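
Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the script used for the Model Zoo models. The function names `resize_short_edge` and `make_split_csv` are hypothetical, the `<root>/<class_dir>/<video file>` layout is an assumption, and so is the choice to assign labels as zero-based indices over the sorted class directory names. The ffmpeg `scale` expression is one common way to fix the short edge. Paths are assumed to contain no spaces, since the csv format is space-separated.

```shell
# Resize one video so that its short edge becomes `size` pixels (default 256).
# The -2 lets ffmpeg pick the other dimension, rounded to an even number.
resize_short_edge() {
  local in_video="$1" out_video="$2" size="${3:-256}"
  ffmpeg -i "${in_video}" \
    -vf "scale='if(gt(iw,ih),-2,${size})':'if(gt(iw,ih),${size},-2)'" \
    "${out_video}"
}

# Write "path label" lines for every file found under <root>/<class_dir>/.
# Assumption: one sub-directory per class; the label is the zero-based index
# of the class directory in glob (sorted) order.
make_split_csv() {
  local root="$1" out_csv="$2"
  local label=0 class_dir video
  : > "${out_csv}"
  for class_dir in "${root}"/*/; do
    [ -d "${class_dir}" ] || continue
    for video in "${class_dir}"*; do
      [ -f "${video}" ] || continue
      printf '%s %d\n' "${video}" "${label}" >> "${out_csv}"
    done
    label=$((label + 1))
  done
}
```

For example, `make_split_csv /path/to/kinetics/train train.csv` would write one line per video found under the (placeholder) training directory. If your label mapping comes from the official annotation files, use that mapping instead of the directory order.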

All the Kinetics models in the Model Zoo are trained and tested with the same data as Non-local Network and PySlowFast. For dataset-specific issues, please reach out to the dataset provider.

Charades

We follow PySlowFast to prepare the Charades dataset as follows:

  1. Download the Charades RGB frames from the official website.

  2. Download the frame list from the following links: (train, val).

Something-Something V2

We follow PySlowFast to prepare the Something-Something V2 dataset as follows:

  1. Download the dataset and annotations from the official website.

  2. Download the frame list from the following links: (train, val).

  3. Extract the frames from the downloaded videos at 30 FPS. We used ffmpeg-4.1.3 with the following command:

    ffmpeg -i "${video}" -r 30 -q:v 1 "${out_name}"
    
  4. Organize the extracted frames so that their paths match those in the frame lists.
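
One way to check step 4 is to count the frame-list entries that have no matching image on disk. The sketch below is illustrative only. The function name `count_missing_frames` is hypothetical, and it assumes a space-separated frame list with one header row and the relative frame path in column 4. Pass a different column index as the third argument if your frame lists differ.

```shell
# Print the number of frame-list entries whose image file is missing.
# Assumptions: the list is space-separated, its first row is a header, and
# the frame path (relative to frames_root) is in column `path_col`.
count_missing_frames() {
  local frame_list="$1" frames_root="$2" path_col="${3:-4}"
  awk -v c="${path_col}" 'NR > 1 && $c != "" { print $c }' "${frame_list}" |
  {
    missing=0
    while IFS= read -r rel_path; do
      [ -f "${frames_root}/${rel_path}" ] || missing=$((missing + 1))
    done
    echo "${missing}"
  }
}
```

A result of 0 from `count_missing_frames train.csv /path/to/frames` would suggest that the extracted frames are laid out consistently with the list. The paths in this call are placeholders.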

AVA (Actions V2.2)

The AVA dataset can be downloaded from the official site.

We followed the same downloading and preprocessing procedure as Long-Term Feature Banks for Detailed Video Understanding.

You can follow these steps to download and preprocess the data:

  1. Download videos
DATA_DIR="../../data/ava/videos"

if [[ ! -d "${DATA_DIR}" ]]; then
  echo "${DATA_DIR} doesn't exist. Creating it.";
  mkdir -p ${DATA_DIR}
fi

wget https://s3.amazonaws.com/ava-dataset/annotations/ava_file_names_trainval_v2.1.txt

for line in $(cat ava_file_names_trainval_v2.1.txt)
do
  wget https://s3.amazonaws.com/ava-dataset/trainval/$line -P ${DATA_DIR}
done
  2. Cut each video from its 15th to 30th minute. AVA has valid annotations only in this range.
IN_DATA_DIR="../../data/ava/videos"
OUT_DATA_DIR="../../data/ava/videos_15min"

if [[ ! -d "${OUT_DATA_DIR}" ]]; then
  echo "${OUT_DATA_DIR} doesn't exist. Creating it.";
  mkdir -p ${OUT_DATA_DIR}
fi

for video in "${IN_DATA_DIR}"/*
do
  out_name="${OUT_DATA_DIR}/${video##*/}"
  if [ ! -f "${out_name}" ]; then
    # Seek to 900 s (minute 15) and keep 901 s, i.e. through minute 30.
    ffmpeg -ss 900 -t 901 -i "${video}" "${out_name}"
  fi
done
  3. Extract frames
IN_DATA_DIR="../../data/ava/videos_15min"
OUT_DATA_DIR="../../data/ava/frames"

if [[ ! -d "${OUT_DATA_DIR}" ]]; then
  echo "${OUT_DATA_DIR} doesn't exist. Creating it.";
  mkdir -p ${OUT_DATA_DIR}
fi

for video in "${IN_DATA_DIR}"/*
do
  video_name=${video##*/}

  if [[ $video_name = *".webm" ]]; then
    video_name=${video_name::-5}
  else
    video_name=${video_name::-4}
  fi

  out_video_dir=${OUT_DATA_DIR}/${video_name}/
  mkdir -p "${out_video_dir}"

  out_name="${out_video_dir}/${video_name}_%06d.jpg"

  ffmpeg -i "${video}" -r 30 -q:v 1 "${out_name}"
done
  4. Download annotations
DATA_DIR="../../data/ava/annotations"

if [[ ! -d "${DATA_DIR}" ]]; then
  echo "${DATA_DIR} doesn't exist. Creating it.";
  mkdir -p ${DATA_DIR}
fi

wget https://research.google.com/ava/download/ava_v2.2.zip -P ${DATA_DIR}
unzip -q ${DATA_DIR}/ava_v2.2.zip -d ${DATA_DIR}
  5. Download "frame lists" (train, val) and put them in the frame_lists folder (see the structure below).

  6. Download the person boxes generated by a person detector trained on AVA (train, val, test) and copy them to the annotations directory created in step 4 (see the structure below). If you prefer to use your own person detector, generate detection prediction files in the same format as the files downloaded in this step.
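
As a sanity check on step 3: a 901-second clip extracted at 30 FPS should yield roughly 901 × 30 = 27,030 frames. The sketch below lists frame directories that contain fewer `.jpg` files than a threshold. The function name `list_short_frame_dirs` and the default threshold of 27000 are assumptions made for illustration, so adjust the threshold to your own tolerance.

```shell
# Print "<video dir> <frame count>" for every directory under frames_root
# that holds fewer than `min_frames` .jpg files.
# The default threshold (27000) is a hypothetical value slightly below
# 901 s x 30 FPS = 27030 frames.
list_short_frame_dirs() {
  local frames_root="$1" min_frames="${2:-27000}" video_dir count
  for video_dir in "${frames_root}"/*/; do
    [ -d "${video_dir}" ] || continue
    # Arithmetic expansion strips the padding that some `wc` versions emit.
    count=$(($(find "${video_dir}" -maxdepth 1 -type f -name '*.jpg' | wc -l)))
    if [ "${count}" -lt "${min_frames}" ]; then
      echo "${video_dir%/} ${count}"
    fi
  done
}
```

Running `list_short_frame_dirs ../../data/ava/frames` and getting no output would suggest that every video was cut and extracted in full.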

Organize the AVA dataset with the following structure:

ava
|_ frames
|  |_ [video name 0]
|  |  |_ [video name 0]_000001.jpg
|  |  |_ [video name 0]_000002.jpg
|  |  |_ ...
|  |_ [video name 1]
|     |_ [video name 1]_000001.jpg
|     |_ [video name 1]_000002.jpg
|     |_ ...
|_ frame_lists
|  |_ train.csv
|  |_ val.csv
|_ annotations
   |_ [official AVA annotation files]
   |_ ava_train_predicted_boxes.csv
   |_ ava_val_predicted_boxes.csv
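
The layout above can be verified with a short check before training. This is a minimal sketch. The function name `check_ava_layout` is hypothetical, and it only tests the entries named explicitly in the tree, not the official AVA annotation files or the individual frames.

```shell
# Return 0 if the entries named in the directory tree exist under `root`;
# otherwise report each missing entry on stderr and return 1.
check_ava_layout() {
  local root="$1" status=0 entry
  for entry in \
      frames \
      frame_lists/train.csv \
      frame_lists/val.csv \
      annotations/ava_train_predicted_boxes.csv \
      annotations/ava_val_predicted_boxes.csv; do
    if [ ! -e "${root}/${entry}" ]; then
      echo "missing: ${root}/${entry}" >&2
      status=1
    fi
  done
  return "${status}"
}
```

For example, `check_ava_layout ../../data/ava && echo "layout looks complete"` prints the message only when nothing is missing.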