This repository contains the link to the test data as well as the code to replicate the experiments in "Camera clustering for scalable stream-based active distillation". Our methodology and full project scope are detailed in our paper. Additional information can be found in our Supplementary Materials document: https://drive.google.com/file/d/1Jutgzw-nT0-8b_B2lO-XXDmF8jjsljMM/view?usp=sharing. A brief overview is also provided below.
The WALT (Watch And Learn 2D Amodal Representation using Time-lapse Imagery) dataset is available through Carnegie Mellon University's official website at https://www.cs.cmu.edu/~walt/. Please note that accessing the full dataset requires acceptance of their license terms. Our group offers additional annotations solely for subsets of images from the test set used in our methods; we have secured a written agreement from Carnegie Mellon for this purpose. All uses of the dataset, including the additional annotations, must adhere to the licensing terms specified by Carnegie Mellon.
The WALT dataset, captured by nine distinct cameras, offers a wide range of environmental settings. The figure below showcases each camera's viewpoint, highlighting the variety in visual conditions and challenges pertinent to object detection tasks.
*(Figure: one sample viewpoint per camera, Cam 1 through Cam 9.)*
A subset of 1,850 images was meticulously labeled to identify "vehicle" instances, resulting in 12,577 annotated instances. The table below summarizes the labeled dataset, providing insights into the annotation process and offering links to Roboflow for easy access to the data.
To access the dataset, append the specified path to this preamble: https://universe.roboflow.com/sbad-dvvax.
In the experiment of Section VI.B, "Impact of Teacher size", we manually annotated 96 images sampled by a YOLOv8n^COCO Student using the Least-Confidence or Top-Confidence approach, and subsequently compared the performance of these Students against scenarios in which the same set of images was pseudo-annotated by a Teacher model or annotated by a human. In Table [tab:roboflowLinks], we present the images sampled by each strategy, which we annotated, thereby creating the equivalent human annotations.
URLs are relative to the preamble: https://universe.roboflow.com/sbad-dvvax/.
| Camera_id | Week | Strategy | Roboflow link |
|---|---|---|---|
| 1 | 1 | Top-Confidence | cam1_confbias_topconf96/dataset/3 |
| 1 | 1 | Least-Confidence | cam1_confbias_leastconf96/dataset/3 |
| 2 | 2 | Top-Confidence | cam2_confbias_topconf96/dataset/3 |
| 2 | 1 | Least-Confidence | cam2_confbias_leastconf96/dataset/5 |
| 3 | 5 | Top-Confidence | cam3_confbias_topconf96/dataset/2 |
| 3 | 5 | Least-Confidence | cam3_confbias_leastconf96/dataset/3 |
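If you prefer programmatic access, the subsets in the table can also be fetched with the official `roboflow` Python package (`pip install roboflow`). The sketch below is only an illustration: the workspace and project identifiers come from the table above, while the API key placeholder and the `yolov8` export format are assumptions about your setup.

```python
# Hypothetical download of one annotated subset via the roboflow package.
# Workspace/project/version follow the table above; the API key is yours.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_ROBOFLOW_API_KEY")
project = rf.workspace("sbad-dvvax").project("cam1_confbias_topconf96")

# Version 3 corresponds to the "cam1_confbias_topconf96/dataset/3" link.
dataset = project.version(3).download("yolov8")
print("Downloaded to:", dataset.location)
```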
Before beginning the installation process, ensure you have:
- An Ubuntu 20.04 system.
- An active wandb account for experiment tracking.
- Conda or virtualenv prepared on your machine.
- A video dataset split according to the Dataset section, with at least 5000 samples per camera.
The code was developed under Ubuntu 20.04.
We recommend working in a virtualenv or conda environment:

```bash
conda create -y --name CSBAD python pip
conda activate CSBAD
```
To reproduce the results, you need to install the requirements of the YOLOv8 framework and, in addition, the requirements of this repository:

```bash
cd ..
pip install -r requirements.txt
```
We use wandb to log all experiments. You can either use your own account or create a team. Either way, you will need to log in and set up an entity to push logs and model versions to.

- Create a wandb entity.
- Set up wandb to send logs:

```bash
wandb login
```
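As an optional sanity check (not part of the pipeline), you can verify that your entity and API key work with a throwaway run; `your-entity` and `CSBAD-sanity` below are placeholders.

```python
# Throwaway run to confirm that logging to your wandb entity works.
# Replace the placeholders with your own entity/project names.
import wandb

run = wandb.init(entity="your-entity", project="CSBAD-sanity", job_type="smoke-test")
run.log({"sanity": 1})
run.finish()
```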
Required dataset structure:
Ensure your dataset adheres to the following structure. Modifications to this structure are permissible but require appropriate configuration adjustments.
```
Dataset
├── cam{1}
│   ├── week{1}
│   │   ├── bank
│   │   │   ├── images
│   │   │   ├── labels_${STUDENT-MODEL}w_conf
│   │   │   └── labels${TEACHER-MODEL}
│   │   └── test
│   │       ├── images
│   │       └── labels
│   ├── ...
│   └── week{i}
│       └── ...
├── ...
└── cam{j}
    └── ...
```
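To double-check that your data follows this layout, a small script such as the sketch below can be used; it is illustrative only, and the exact label folder names depend on the Student/Teacher models you generate, so only the common folders are verified.

```python
# Illustrative layout check: reports cam*/week* folders that miss the
# expected sub-folders. Adapt DATASET_ROOT to your dataset location.
from pathlib import Path

DATASET_ROOT = Path("YOURPATH/WALT")

for cam in sorted(DATASET_ROOT.glob("cam*")):
    for week in sorted(cam.glob("week*")):
        for expected in ("bank/images", "test/images", "test/labels"):
            path = week / expected
            if not path.is_dir():
                print(f"Missing: {path}")
```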
## 3. Getting Started <a name="getting-started"></a>
### Generation of the pseudo labels (populate bank)
To generate the pseudo labels, execute the following command:
```bash
python generate_pseudo_labels.py --folder "YOURPATH/WALT"
```

Options:
- `--extension`: "jpg" or "png"
- `--model-name`: `yolov8n`, `yolov8s`, `yolov8m`, `yolov8l`, `yolov8x6`, or any custom model
- `--output-conf`: produce labels with confidence scores, useful for confidence-based strategies

Note: the `bank` folder must contain an `images` folder with all the images. If you are on Windows, only use double quotes (`"`) and not single quotes (`'`).
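Labels produced with `--output-conf` are what the confidence-based strategies rely on. Purely as an illustration (the exact file format is an assumption, so check the files generated on your machine), a YOLO-style label line with an appended confidence could be exploited as follows:

```python
# Illustrative ranking of bank images by the Student's mean confidence,
# assuming YOLO-style label lines of the form:
#   class x_center y_center width height confidence
from pathlib import Path

def mean_confidence(label_file: Path) -> float:
    """Average detection confidence stored in one pseudo-label file."""
    confs = [float(line.split()[-1])
             for line in label_file.read_text().splitlines()
             if len(line.split()) >= 6]
    return sum(confs) / len(confs) if confs else 0.0

labels_dir = Path("YOURPATH/WALT/cam1/week1/bank/labels_yolov8n_w_conf")  # placeholder path
ranking = sorted(labels_dir.glob("*.txt"), key=mean_confidence, reverse=True)
print([f.stem for f in ranking[:5]])  # the five most confident images
```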
Before conducting an experiment, ensure your wandb entity and the project are correctly set up in the `experiments/model/yolov8.yaml` Hydra config file.
To conduct an experiment, follow these steps (a minimal sketch of the sampling idea follows the list):

- Populate a `val` folder based on the data contained in the `bank` folder.
- Populate a `train` folder based on a `strategy` applied to the data contained in the `bank` folder. The `strategy` ensures that no images are duplicated between the train and the val sets.
- Launch a training of a `yolov8n` model based on the previously generated sets. The scripts automatically push everything to wandb.
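As referenced above, here is a minimal sketch of the sampling idea, not the repository's implementation: keep the Student's most confident images for `train` and draw `val` from the remaining, disjoint images.

```python
# Illustrative top-confidence sampling into disjoint train/val sets
# (a sketch of the idea only, not the code used by train.py).
import random

def split_bank(image_scores: dict, n_train: int, n_val: int):
    """image_scores maps image name -> Student confidence on that image."""
    ranked = sorted(image_scores, key=image_scores.get, reverse=True)
    train = ranked[:n_train]      # most confident images go to train
    remaining = ranked[n_train:]  # guarantees train and val are disjoint
    val = random.sample(remaining, min(n_val, len(remaining)))
    return train, val

scores = {"img_0001.jpg": 0.91, "img_0002.jpg": 0.42, "img_0003.jpg": 0.77}
train_set, val_set = split_bank(scores, n_train=2, n_val=1)
print(train_set, val_set)
```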
You can launch an experiment by executing the main script:

```bash
python train.py
```
When debugging, you can set `HYDRA_FULL_ERROR=1` as an environment variable to see the traceback more clearly:

```bash
HYDRA_FULL_ERROR=1 python train.py
```
Hydra is a configuration management tool that retrieves the information included in the `*.yaml` files to facilitate the deployment of the application. To modify an experiment, edit the configuration file `experiments/experiment.yaml`. On first use, you will have to modify the paths to the dataset and your wandb username.
You need to modify:
- `experiments/model/yolov8.yaml` (wandb entity)
- `experiments/experiment.yaml` (insert your data folder)
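To confirm that these edits are picked up, you can print the composed configuration; the sketch below assumes that `experiments/experiment.yaml` is the Hydra entry point, as described above.

```python
# Print the composed Hydra configuration to verify the dataset path and
# wandb entity before launching a training run.
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(config_path="experiments"):
    cfg = compose(config_name="experiment")
    print(OmegaConf.to_yaml(cfg))
```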
The logs and outputs of the runs are stored in the `output` folder.
Remark: if you are using Windows, do not forget to adapt your paths by using `/` instead of `\` or `\\`.
IMPORTANT!

We use a specific run naming format to track the experiments in wandb and to run testing. We do that using the `name` attribute in the dataset config file; see `experiments/dataset/WALT.yaml` for an example.

Naming convention: `{Source_domain}_{Target_domain}_{Student}_{Teacher}_{Strategy}_{Active-Learning-Setting}_{Total_Samples}_{Iteration_Level}_{Epochs}_{Validation_Set}`

Finer granularity can be expected, as {Source_domain} = {dataset}->{domain}-{period}.
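For illustration only, a run name following this convention could be assembled as below; every field value and the underscore separators are hypothetical, so refer to `experiments/dataset/WALT.yaml` for the authoritative format.

```python
# Hypothetical run-name construction; field values and separators are
# illustrative only (see experiments/dataset/WALT.yaml for the real format).
source_domain = "WALT->cam1-week1"   # {Source_domain} = {dataset}->{domain}-{period}
fields = [source_domain, "cam1", "yolov8n", "yolov8x6",
          "thresh-top-confidence", "AL", "256", "1", "100", "val-cam1"]
run_name = "_".join(fields)
print(run_name)
```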
For example, the following multirun command trains several Students at once:

```bash
python train.py --multirun n_samples=300 cam=1o2o3 week=1o1o5 strategy=thresh-top-confidence student=yolov8n teacher=yolov8m
```

This trains by sampling images from cameras 1, 2, and 3 (weeks 1, 1, and 5, respectively) using the thresh-top-confidence strategy based on the yolov8n Student's confidence; the pseudo labels used for training are generated by the yolov8m Teacher.
You can use the download tool to get all the models of a specific project from wandb. Then use the inference tool to test the models on the dataset. Finally, use the inference_coco tool to generate the same testing metrics for the Student and Teacher models. All the testing results are concatenated in a single file. We have created a `test.py` file that executes all of these steps in a single command:
```bash
python test.py --run-prefix WALT --entity YourEntity --project WALT --template testing/templates/WALT.yaml --dataset_path "YOURPATH/WALT/"
```
Flags:
- `--entity`: wandb project team
- `--project`: wandb project name
- `--run-prefix`: project name used as a prefix in the run names (sometimes different from the wandb project name, as in the case of "study")
- `--template`: template file for `data.yaml`, used to specify test "sub" datasets
- `--dataset_path`: parent path containing all dataset folders (`camX` in the case of WALT)
- `--query_filter`: download and test only specific models by filtering on characters or words in the run names (see the sketch after this list)
- `--wandb-download`: set this to false if you would like to run the whole testing pipeline without the download script
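For reference, the kind of filtering hinted at by `--query_filter` can be reproduced with the public wandb API; the sketch below only illustrates the idea and is not `testing/download.py` itself (entity/project names are placeholders).

```python
# Illustrative run filtering with the public wandb API, similar in spirit
# to --query_filter. "YourEntity"/"WALT" are placeholders.
import wandb

api = wandb.Api()
runs = api.runs("YourEntity/WALT")
selected = [run for run in runs if "thresh-top-confidence" in run.name]
for run in selected:
    print(run.name, run.state)
    # Weight files attached to the run could then be downloaded, e.g.:
    # for f in run.files():
    #     if f.name.endswith(".pt"):
    #         f.download(root=f"./testdir/WALT/wandb/{run.name}", replace=True)
```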
Results will be in `testdir/project`, where you'll find:
- a `wandb` folder containing the downloaded weights
- an `inference_results.csv` file containing the inference results
- a `plots` folder containing the generated plots for each metric in the inference results (to modify the plots, look in `testing/plot.py`)
You can also run each script/tool individually:

- Download models:

```bash
python testing/download.py -e YourEntity -p WALT -f ./testdir/WALT/wandb -lf -d
```

- Test downloaded models on the test set:

```bash
python3 ./testing/inference.py -w ./testdir/WALT/wandb -d "YOURPATH/WALT/" -p WALT -y testing/templates/WALT.yaml -f test -c ./testdir/WALT/inference_results.csv
```

- Test pretrained Student and Teacher models on the dataset:

```bash
python3 testing/inference_coco.py --model yolov8n --csv_path ./testdir/WALT/inference_results.csv --dataset "s05c016->YOURPATH/WALT/s05c016/" --data-template testing/templates/Ai-city.yaml --folder test
```

- Plot graphs:

```bash
python testing/plot.py --csv_path ./testdir/WALT/inference_results.csv --save_path ./testdir/WALT/plots
```

Use the `--help` flag for more information on the usage of each script.
To obtain the clustering, you have to train one model per camera and then cross-evaluate each model on every test set:

```bash
python train.py n_samples=256 cam=X week=Y strategy=thresh-top-confidence
```

where (X, Y) takes the following pairs: (1,1); (2,1); (3,5); (4,2); (5,3); (6,4); (7,4); (8,3); (9,1). This uses the default settings: student=yolov8n, teacher=yolov8x6, and epochs=100.
You can then use the testing module and rearrange the resulting csv. For convenience, we already provide `cross_performance.csv`, and you can find the clusters by running:

```bash
python cluster_cameras.py
```
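For intuition only, camera clusters can be derived from a camera-by-camera cross-performance matrix with standard hierarchical clustering. The sketch below assumes `cross_performance.csv` stores one row per camera-specific model and one column per test camera (e.g. mAP values); the actual `cluster_cameras.py` may proceed differently.

```python
# Illustrative clustering of cameras from a cross-performance matrix.
# Assumption: rows = camera-specific models, columns = test cameras.
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

matrix = pd.read_csv("cross_performance.csv", index_col=0)

# Treat each camera's performance profile across all test sets as a feature
# vector and group cameras whose profiles are similar.
Z = linkage(matrix.values, method="average", metric="correlation")
clusters = fcluster(Z, t=3, criterion="maxclust")  # e.g. ask for 3 clusters
print(dict(zip(matrix.index, clusters)))
```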