A large percentage of the cameras used in today’s vehicles support VGA resolution, but vehicles now being designed are quickly moving to 1 megapixel (MP) and 2 MP cameras. Higher resolution matters wherever small portions of the visual field have to be examined. A car traveling at 70 mph covers more than 300 feet in three seconds. At 300 feet, a VGA camera renders a pedestrian with so few pixels that the pedestrian is not easily distinguished from the background. At the much higher resolution of a 2 MP camera, the pedestrian can be recognized, and the vehicle can warn the driver or take evasive action, if needed, while there is still enough time to respond effectively.
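The arithmetic behind the 300-foot example can be sketched in a few lines of Python. The speed and time come from the text; everything else is an illustrative assumption: a 50-degree horizontal field of view, a pedestrian about 1.5 feet wide, sensor widths of 640 pixels (VGA) and 1920 pixels (a 1920x1080 sensor, roughly 2 MP), and a simple pinhole-camera approximation.

```python
import math

MPH_TO_FPS = 5280 / 3600  # feet per second for each mile per hour


def distance_covered_ft(speed_mph, seconds):
    """Distance traveled at a constant speed, in feet."""
    return speed_mph * MPH_TO_FPS * seconds


def pixels_across_target(h_pixels, hfov_deg, target_width_ft, range_ft):
    """Approximate horizontal pixels covering a target at a given range.

    Pinhole approximation: the sensor's horizontal pixels are spread
    evenly over the width of the scene visible at that range.
    """
    scene_width_ft = 2 * range_ft * math.tan(math.radians(hfov_deg) / 2)
    return h_pixels * target_width_ft / scene_width_ft


# Values from the text: 70 mph for three seconds.
distance = distance_covered_ft(70, 3)

# Assumed values (not from the text): 50-degree lens, 1.5 ft wide pedestrian.
HFOV_DEG = 50
PEDESTRIAN_WIDTH_FT = 1.5
vga_px = pixels_across_target(640, HFOV_DEG, PEDESTRIAN_WIDTH_FT, 300)
two_mp_px = pixels_across_target(1920, HFOV_DEG, PEDESTRIAN_WIDTH_FT, 300)

print(f"Distance in 3 s at 70 mph: {distance:.0f} ft")
print(f"Pedestrian width at 300 ft: VGA {vga_px:.1f} px, 2 MP {two_mp_px:.1f} px")
```

Under these assumptions the pedestrian spans roughly 3 pixels horizontally on the VGA sensor and roughly 10 on the 2 MP sensor. The exact figures depend heavily on the lens, but the threefold gain in horizontal detail follows from the sensor widths alone.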
Higher-resolution cameras come with added cost and higher power consumption, because they require more memory and bus bandwidth as well as more processing power to evaluate the camera output in real time. While it is not difficult to design a vision processor that can handle the input from a 2 MP camera, the real challenge is controlling the increase in cost and power consumption. This requires specialized, power-efficient vision processors that minimize both memory bandwidth and the power needed to process the video stream.
In addition to managing the input from 2 MP cameras, vision processors must also evaluate input from other sensors (radar, LIDAR, infrared, etc.) and combine it with the vision input to make decisions. Interpreting data from multiple inputs significantly increases the capabilities and accuracy of automotive systems, but it also places additional load on the vision processor. While this processing could be offloaded to other processors in the car, most car designers are keeping the processing and analysis of the sensor input close to the source. This design decision reduces the potential for problems, the need for memory buffering, and the power consumed in moving large amounts of data around the vehicle. However, it also puts greater demands on the vision processor, which must analyze the sensor input, refine it, and send the results on to the vehicle’s systems. All of this has to be done with little to no increase in the power consumption of the camera sensor module, which includes the vision processor.
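To give a feel for the bandwidth increase discussed above, the following Python sketch estimates the raw data rate coming off the sensor. The frame rate (30 frames per second), the pixel depth (12 bits per raw pixel), and the use of 1920x1080 as the 2 MP format are assumptions chosen for illustration; actual sensors and interfaces vary.

```python
def raw_data_rate_mbps(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed sensor output in megabits per second."""
    bits_per_second = width * height * bits_per_pixel * frames_per_second
    return bits_per_second / 1_000_000


# Assumed operating point (not from the text).
BITS_PER_PIXEL = 12
FRAMES_PER_SECOND = 30

vga_rate = raw_data_rate_mbps(640, 480, BITS_PER_PIXEL, FRAMES_PER_SECOND)
two_mp_rate = raw_data_rate_mbps(1920, 1080, BITS_PER_PIXEL, FRAMES_PER_SECOND)

print(f"VGA:  {vga_rate:.1f} Mbit/s")
print(f"2 MP: {two_mp_rate:.1f} Mbit/s")
print(f"Ratio: {two_mp_rate / vga_rate:.2f}x")
```

With these numbers the raw stream grows from about 111 Mbit/s to about 746 Mbit/s, a factor of 6.75. Every stage that stores or moves whole frames scales by the same factor, which is why minimizing memory traffic matters so much for cost and power.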
In recent years, automotive vision applications have started using convolutional neural network (CNN) technology, which is loosely modeled on the way our brains identify objects and conditions in visual images. CNN graphs are trained to recognize and classify one or more kinds of objects, and the trained graphs are then programmed into a vision processor. CNN-based vision is more accurate than other vision algorithms and is, in fact, approaching the accuracy and recognition capabilities of humans. This is very desirable in vehicles, where accurate recognition is critical for deciding which objects to avoid and which to ignore.
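The basic operation that a CNN layer repeats many times is a small filter sliding across the image. The pure-Python sketch below is only an illustration of that one step, not a description of any particular vision processor or trained graph: the 3x3 kernel is hand-picked to respond to a dark-to-bright vertical edge, whereas in a real CNN the kernel values are learned during training. The function names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning frameworks, the kernel is not flipped,
    so this is technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-picked vertical-edge kernel (a trained CNN would learn its own).
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = relu(conv2d_valid(image, kernel))
for row in feature_map:
    print(row)
```

The resulting feature map is zero where the image is uniformly dark and positive where the window straddles the edge. A full CNN stacks many such filters in successive layers, so that later layers respond to progressively more complex shapes, and the number of multiply-accumulate operations grows with image resolution.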