Our research activities have reached another great milestone!
Our collaboration with Institut Pascal (Clermont Auvergne) is strengthened once more by a paper in the upcoming issue of “Microprocessors and Microsystems”, published by Elsevier.
The paper “On building a CNN-based multi-view smart camera for real-time object detection”, to which we contributed, will be published in Volume 77, September 2020.
Congrats to Jonathan and to our Kamel!
Here is the abstract:
Multi-view image sensing is currently gaining momentum, fostered by new applications such as autonomous vehicles and self-propelled robots. In this paper, we prototype and evaluate a multi-view smart vision system for object recognition. The system exploits an optimized Multi-View Convolutional Neural Network (MVCNN) in which the processing is distributed among several sensors (heads) and a camera body. The main challenge for designing such a system comes from the computationally expensive workload of real-time MVCNNs which is difficult to support with embedded processing and high frame rates. This paper focuses on the decisions to be taken for distributing an MVCNN on the camera heads, each camera head embedding a Field-Programmable Gate Array (FPGA) for processing images on the stream. In particular, we show that the first layer of the AlexNet network can be processed at the nearest of the sensors, by performing a Direct Hardware Mapping (DHM) using a dataflow model of computation. The feature maps produced by the first layers are merged and processed by a camera central processing node that executes the remaining layers.
The proposed system exploits state-of-the-art deep learning optimization methods, such as parameter removing and data quantization. We demonstrate that accuracy drops caused by these optimizations can be compensated by the multi-view nature of the captured information. Experimental results conducted with the AlexNet CNN show that the proposed partitioning and resulting optimizations can fit the first layer of the multi-view network in low-end FPGAs. Among all the tested configurations, we propose 2 setups with an equivalent accuracy compared to the original network on the ModelNet40 dataset. The first one is composed of 4 cameras based on a Cyclone III E120 FPGA to embed the least expensive version in terms of logic resources while the second version requires 2 cameras based on a Cyclone 10 GX220 FPGA. This distributed computing with workload reduction is demonstrated to be a practical solution when building a real-time multi-view smart camera processing several frames per second.
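To give a feel for the ideas in the abstract, here is a minimal Python sketch. It is not the authors' implementation, and every function name in it is hypothetical. Each camera head runs a "first layer" on its own view, the feature maps are merged, and the merged result would then go to the central node for the remaining layers. The sketch makes several simplifying assumptions. A 1-D convolution plus ReLU stands in for AlexNet's first 2-D layer. The merge is assumed to be an element-wise max across views, as in the original MVCNN view pooling, because the abstract does not say how the feature maps are merged. "Parameter removing" is modeled as magnitude pruning, and "data quantization" as 8-bit signed fixed-point rounding.

```python
# Hypothetical sketch only: not the paper's implementation.
from typing import List


def prune(weights: List[float], threshold: float) -> List[float]:
    """Assumed magnitude pruning: zero out weights with |w| < threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(values: List[float], frac_bits: int, total_bits: int = 8) -> List[int]:
    """Round to signed fixed-point integers with `frac_bits` fractional bits,
    saturating to the range representable with `total_bits` bits."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(q: List[int], frac_bits: int) -> List[float]:
    """Map fixed-point integers back to real values."""
    return [x / (1 << frac_bits) for x in q]


def head_first_layer(signal: List[float], kernel: List[float]) -> List[float]:
    """Toy stand-in for the first conv layer run on a camera head:
    valid 1-D cross-correlation (as in CNN conv layers) followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
            for i in range(len(signal) - k + 1)]


def merge_views(feature_maps: List[List[float]]) -> List[float]:
    """Assumed merge on the central node: element-wise max across views."""
    return [max(vals) for vals in zip(*feature_maps)]


if __name__ == "__main__":
    # Prune, then quantize, the (made-up) first-layer weights.
    q_kernel = quantize(prune([0.51, -0.02, 0.25], threshold=0.05), frac_bits=6)
    kernel = dequantize(q_kernel, frac_bits=6)

    # Two camera heads, each seeing a different (made-up) view.
    views = [[0.2, 0.8, 0.5, 0.1], [0.6, 0.4, 0.9, 0.3]]
    head_outputs = [head_first_layer(v, kernel) for v in views]

    # Central node: merge the per-head feature maps; the remaining
    # layers would run on the merged result (omitted here).
    print(merge_views(head_outputs))
```

Merging after the first layer means each head only has to send its feature maps, not full frames, to the central node. Under the max-merge assumption, a feature that one view misses can still be picked up by another view, which is one plausible reading of how the multi-view information could offset accuracy lost to pruning and quantization.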