Zhiwen Chen1
Jinjian Wu1
Junhui Hou2
Leida Li1
Weisheng Dong1
Guangming Shi1
1Xidian University
2City University of Hong Kong
Neuromorphic event cameras can efficiently sense the latent geometric structures and motion cues of a scene by generating asynchronous and sparse event signals. Because event signals have an irregular layout, leveraging their rich spatio-temporal information for recognition tasks remains a significant challenge. Existing methods tend to treat events as dense image-like or point-series representations. However, they either severely destroy the sparsity of event data or fail to encode robust spatial cues. To fully exploit the inherent sparsity of events while reconciling their spatio-temporal information, we introduce a compact event representation, namely the 2D-1T event cloud sequence (2D-1T ECS). We couple this representation with a novel lightweight spatio-temporal learning framework (ECSNet) that accommodates both object classification and action recognition tasks. The core of our framework is a hierarchical spatial relation module. Equipped with a specially designed surface-event-based sampling unit and a local event normalization unit to enhance inter-event relation encoding, this module learns robust geometric features from the 2D event clouds. We also propose a motion attention module for efficiently capturing the long-term temporal context evolving with the 1T cloud sequence. Empirically, our framework achieves performance on par with or better than the state of the art. Importantly, our approach works well with the sparsity of event data without any sophisticated operations, leading to low computational cost and fast inference.
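To make the idea of a 2D-1T event cloud sequence more concrete, here is a minimal illustrative sketch. It is not the code from this repository. It assumes events arrive as rows of [x, y, t, p], that the stream is cut into equal-duration temporal slices, and that each slice keeps [x, y, p] as its 2D event cloud. The function name `events_to_cloud_sequence` and the parameter `num_slices` are hypothetical, and the actual slicing and per-event features used by ECSNet may differ.

```python
import numpy as np


def events_to_cloud_sequence(events, num_slices):
    """Split an (N, 4) array of [x, y, t, p] events into a list of 2D clouds.

    Assumption for illustration only: slices have equal duration, and each
    2D event cloud keeps the columns [x, y, p] (time is dropped per slice).
    """
    events = np.asarray(events, dtype=np.float64)
    if events.ndim != 2 or events.shape[1] != 4:
        raise ValueError("expected an (N, 4) array of [x, y, t, p] rows")
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    if len(events) == 0:
        return [np.empty((0, 3)) for _ in range(num_slices)]

    t = events[:, 2]
    span = t.max() - t.min()
    if span == 0:
        # All events share one timestamp: put them in the first slice.
        idx = np.zeros(len(events), dtype=int)
    else:
        # Map each timestamp to a slice index in [0, num_slices - 1].
        idx = np.floor((t - t.min()) / span * num_slices).astype(int)
        idx = np.minimum(idx, num_slices - 1)

    # One 2D event cloud (x, y, p) per temporal slice; sparsity is preserved
    # because no dense frame is ever allocated.
    return [events[idx == k][:, [0, 1, 3]] for k in range(num_slices)]


if __name__ == "__main__":
    demo = np.array([
        [1, 2, 0.0, 1],
        [3, 4, 1.0, -1],
        [5, 6, 2.0, 1],
        [7, 8, 3.0, 1],
    ])
    clouds = events_to_cloud_sequence(demo, num_slices=2)
    print([c.shape for c in clouds])
```

Each element of the returned list is a variable-length point set, so a downstream model would typically sample or pad each cloud to a fixed number of points before batching.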
Clone the repository locally:
git clone https://github.com/happychenpipi/ECSNet.git
cd ECSNet
Create and activate a conda environment and install the required packages:
conda create -n ecsnet python=3.7
conda activate ecsnet
bash install_ecsnet.sh
In this work, we evaluate our method on a wide range of event-based classification datasets, including N-MNIST, N-Caltech101, N-Cars, and CIFAR10-DVS. Please download the data from the link below and place it in ./data.
python ./train.py
python ./test.py
Thanks to the N-MNIST, N-Caltech101, N-Cars, and CIFAR10-DVS datasets, and to the PointMLP and NVS2Graph projects.
Feedback and comments are welcome! Feel free to contact us via zhiwen.chen@stu.xidian.edu.cn.
If you use ECSNet in your research, please use the following BibTeX entry.
@article{chen2022ecsnet,
title={{ECSNet}: Spatio-temporal feature learning for event camera},
author={Chen, Zhiwen and Wu, Jinjian and Hou, Junhui and Li, Leida and Dong, Weisheng and Shi, Guangming},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={33},
number={2},
pages={701--712},
year={2022},
publisher={IEEE}
}