Event-based Asynchronous Sparse Convolutional Networks

Aug 23, 2020

Nico Messikommer*

Daniel Gehrig*

Antonio Loquercio

Davide Scaramuzza


Abstract

Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse “events”. Recently, pattern recognition algorithms, such as learning-based methods, have made significant progress with event cameras by converting events into synchronous dense, image-like representations and applying traditional machine learning methods developed for standard cameras. However, these approaches discard the spatial and temporal sparsity inherent in event data at the cost of higher computational complexity and latency. In this work, we present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output, thus directly leveraging the intrinsic asynchronous and sparse nature of the event data. We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks without sacrificing accuracy. In addition, our framework has several desirable characteristics: (i) it exploits spatio-temporal sparsity of events explicitly, (ii) it is agnostic to the event representation, network architecture, and task, and (iii) it does not require any train-time change, since it is compatible with the standard neural networks’ training process. We thoroughly validate the proposed framework on two computer vision tasks: object detection and object recognition. In these tasks, we reduce the computational complexity by up to 20 times with respect to high-latency neural networks. At the same time, we outperform state-of-the-art asynchronous approaches by up to 24% in prediction accuracy.
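To make the idea of an asynchronous model with "identical output" concrete, here is a minimal sketch in pure Python. It is not the authors' implementation and covers only an assumed toy case: a single linear 3x3 convolution layer with zero padding, where one event changes one input pixel. The function names (`conv2d_same`, `async_update`) and the event format `(row, column, new_value)` are hypothetical, chosen for illustration. Because the layer is linear, a single pixel change only affects the output sites whose kernel window covers that pixel, so those sites can be corrected in place instead of recomputing the whole feature map.

```python
def conv2d_same(image, kernel):
    """Dense 'same' cross-correlation with zero padding (the synchronous baseline)."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di + r][dj + r] * image[y][x]
            out[i][j] = acc
    return out


def async_update(image, output, kernel, y, x, new_value):
    """Apply one pixel change in place and return how many output sites were touched."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    delta = new_value - image[y][x]
    image[y][x] = new_value
    touched = 0
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            # Output site (i, j) reads input pixel (y, x) at kernel offset (di, dj).
            i, j = y - di, x - dj
            if 0 <= i < h and 0 <= j < w:
                output[i][j] += kernel[di + r][dj + r] * delta
                touched += 1
    return touched


# Hypothetical toy data: a 5x6 empty frame, a Sobel-like kernel, three events.
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
image = [[0] * 6 for _ in range(5)]
output = conv2d_same(image, kernel)

events = [(2, 3, 1), (0, 0, -1), (4, 5, 1)]  # (row, column, new_value)
touched = [async_update(image, output, kernel, y, x, v) for (y, x, v) in events]

# The incrementally updated map equals a full dense recomputation.
matches_dense = output == conv2d_same(image, kernel)
```

In this toy setting an interior event touches 9 of the 30 output sites and a corner event touches 4, while a dense pass visits all 30 every time. The sketch deliberately uses integer values so the comparison with the dense result is exact; nonlinearities, multiple layers, and multi-channel inputs are outside its scope.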

Type

Conference paper

Publication

European Conf. on Computer Vision (ECCV)

This work received a Best Presentation Award at the ONSVP workshop at ICRA 2021 in Xi’an.

Last updated on Aug 23, 2020

Computer Vision
