Towards Realistic Evaluation of Industrial Continual Learning Scenarios with an Emphasis on Energy Consumption and Computational Footprint
[Paper] [Poster] [Summary Video]
Abstract: Incremental Learning (IL) aims to develop Machine Learning (ML) models that can learn from continuous streams of data and mitigate catastrophic forgetting. We analyze the current state-of-the-art Class-IL implementations and demonstrate why the current body of research tends to be one-dimensional, with an excessive focus on accuracy metrics. A realistic evaluation of Continual Learning methods should also emphasize energy consumption and overall computational load for a comprehensive understanding. This paper addresses research gaps between current IL research and industrial project environments, including varying incremental tasks and the introduction of Joint Training in tandem with IL. We introduce InVar-100 (Industrial Objects in Varied Contexts), a novel dataset meant to simulate the visual environments in industrial setups and perform various experiments for IL. Additionally, we incorporate explainability (using class activations) to interpret the model predictions. Our approach, RECIL (Real-world Scenarios and Energy Efficiency considerations for Class Incremental Learning), provides meaningful insights about the applicability of IL approaches in practical use cases. The overarching aim is to tie the Incremental Learning and Green AI fields together and encourage the application of CIL methods in real-world scenarios. Code and dataset are available.
The Industrial Objects in Varied Contexts (InVar) Dataset was internally produced by our team and contains 100 objects in a total of 20,800 images (208 images per class). The objects consist of common automotive, machine, and robotics lab parts. Each class contains 4 sub-categories (52 images each) with different attributes and visual complexities.
White background (Dwh): The object is against a clean white background, and the object is clear, centred, and in focus.
Stationary Setup (Dst): These images are also taken against a clean background using a stationary camera setup, with uncentered objects at a constant distance. The images have lower DPI resolution with occasional cropping.
Handheld (Dha): These images are taken with the user holding the objects, with occasional occlusion.
Cluttered background (Dcl): These images are taken with the object placed along with other objects from the lab in the background with no occlusion.
The dataset was produced by our staff at different workstations and labs in Berlin. Human subjects, when present in the images (e.g. holding the object), remain anonymised. More details regarding the objects used for digitisation are available in the metadata file.
https://huggingface.co/datasets/vivek9chavan/InVar-100
The InVar-100 dataset can also be accessed here: http://dx.doi.org/10.24406/fordatis/266.3
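The documented structure of the dataset can be sketched in a few lines of Python. The counts (100 classes, 4 sub-categories of 52 images each) come from this page; the commented loading call assumes the Hugging Face `datasets` library, and any field names beyond the repository ID are assumptions, not part of the official release.

```python
# Sketch of the documented InVar-100 structure (counts taken from this page).
NUM_CLASSES = 100
SUBCATEGORIES = ["Dwh", "Dst", "Dha", "Dcl"]  # white bg, stationary, handheld, cluttered
IMAGES_PER_SUBCATEGORY = 52

images_per_class = len(SUBCATEGORIES) * IMAGES_PER_SUBCATEGORY  # 208 images per class
total_images = NUM_CLASSES * images_per_class                   # 20,800 images overall

# Hypothetical loading via the Hugging Face hub (requires network access):
# from datasets import load_dataset
# ds = load_dataset("vivek9chavan/InVar-100")
```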
Our code borrows heavily from the following repositories:
https://github.com/G-U-N/PyCIL
https://github.com/facebookresearch/dino
https://github.com/facebookresearch/VICRegL
If you find our work or any of our materials useful, please cite our paper:
@InProceedings{Chavan_2023_ICCV,
author = {Chavan, Vivek and Koch, Paul and Schl\"uter, Marian and Briese, Clemens},
title = {Towards Realistic Evaluation of Industrial Continual Learning Scenarios with an Emphasis on Energy Consumption and Computational Footprint},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {11506-11518}
}