While the scalar and vector units are programmed in C and OpenCL C (for vectorization), the CNN engine does not have to be manually programmed. The final graph and weights (coefficients) from the Training Phase are fed into a CNN mapping tool, which configures the embedded vision processor's CNN engine so it is ready to execute facial analysis.
Images or video frames captured by a camera lens and image sensor are fed into the embedded vision processor. Because it is difficult for a CNN to handle significant variations in lighting conditions or facial poses, the images are pre-processed to make the faces more uniform. The heterogeneous architecture of a sophisticated embedded vision processor allows the CNN engine to classify one image while the vector unit pre-processes the next (light normalization, image scaling, plane rotation, etc.) and the scalar unit handles the decision making (i.e., what to do with the CNN detection results).
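The pre-processing steps the vector unit performs can be sketched in a few lines. This is a minimal illustration in plain Python, assuming 8-bit grayscale frames stored as nested lists; the function names are illustrative, not a vendor API, and a real pipeline would use optimized vector kernels.

```python
# Illustrative sketch of CNN input pre-processing (light normalization
# and image scaling). Frames are nested lists of 8-bit pixel values.

def normalize_lighting(frame):
    """Stretch pixel intensities to the full 0-255 range (contrast
    normalization), reducing the effect of scene lighting."""
    flat = [p for row in frame for p in row]
    lo, hi = min(flat), max(flat)
    span = max(hi - lo, 1)  # avoid division by zero on flat frames
    return [[(p - lo) * 255 // span for p in row] for row in frame]

def scale_nearest(frame, out_h, out_w):
    """Nearest-neighbor resize to the fixed input size the CNN expects."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]

# A 4x4 frame with uneven lighting, normalized and scaled down to 2x2.
frame = [[10, 20, 30, 40],
         [20, 30, 40, 50],
         [30, 40, 50, 60],
         [40, 50, 60, 70]]
prepped = scale_nearest(normalize_lighting(frame), 2, 2)
```

Running normalization before scaling means every frame reaches the CNN with a consistent intensity range and resolution, which is what makes the faces "more uniform" for classification.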
Image resolution, frame rate, the number of graph layers and the desired accuracy all factor into the number of parallel multiply-accumulates (MACs) needed and thus the performance requirements. Synopsys' EV6x Embedded Vision Processors with CNN run at up to 800 MHz on 28-nm process technologies and can execute up to 880 MACs in parallel.
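A quick back-of-envelope calculation shows how these factors combine. The sketch below uses the 800 MHz and 880-MAC figures from the text (interpreting 880 MACs as per-cycle parallelism); the per-frame workload and frame rate are illustrative assumptions, not measured values for any specific graph.

```python
# Back-of-envelope CNN performance budget.
CLOCK_HZ = 800e6          # from the text: up to 800 MHz on 28 nm
MACS_PER_CYCLE = 880      # from the text: up to 880 parallel MACs

# Peak throughput if every MAC unit is busy every cycle.
peak_macs_per_sec = CLOCK_HZ * MACS_PER_CYCLE  # 704 GMAC/s

# Illustrative workload assumptions (not from the text):
FPS = 30                  # assumed camera frame rate
MACS_PER_FRAME = 1.0e9    # assumed graph cost, ~1 GMAC per frame

required_macs_per_sec = MACS_PER_FRAME * FPS
utilization = required_macs_per_sec / peak_macs_per_sec
```

Under these assumed numbers the graph would use only a few percent of the peak budget; a higher resolution, deeper graph, or faster frame rate scales the requirement linearly, which is why all four factors matter when sizing the engine.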
Once the CNN is configured and trained to detect emotions, it can be easily reconfigured to handle other facial analysis tasks, such as determining an age range, identifying gender or ethnicity, and detecting the presence of facial hair or glasses.
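The reason reconfiguration is cheap is that the learned feature extraction can be shared across tasks, with only the final classification stage retrained. The toy sketch below illustrates this pattern in plain Python; the feature extractor and head weights are stand-ins, not a real trained network or the EV6x toolchain.

```python
# Illustration of one shared feature extractor serving several facial
# analysis tasks via swappable classifier heads (toy values throughout).

def features(image):
    """Stand-in for the frozen CNN backbone: returns a feature vector
    (here, mean intensity and intensity range of a 1-D 'image')."""
    return [sum(image) / len(image), max(image) - min(image)]

def classify(feats, head_weights, labels):
    """Tiny linear head: score each class and pick the best label."""
    scores = [sum(w * f for w, f in zip(ws, feats)) for ws in head_weights]
    return labels[scores.index(max(scores))]

feats = features([10, 50, 90])
# Each task reuses the same features but plugs in its own head.
emotion = classify(feats, [[1.0, 0.0], [0.0, 1.0]],
                   ["neutral", "expressive"])
glasses = classify(feats, [[0.0, 1.0], [1.0, -1.0]],
                   ["glasses", "no_glasses"])
```

Swapping only the small head (or, for the CNN engine, re-running the mapping tool with a retrained graph) is far less work than designing and training a network from scratch for each new attribute.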