While SLAM provides the ability to determine a camera’s location in the environment and a 3D model of the environment, perceiving and recognizing items in that environment require deep learning algorithms like CNNs. CNNs, the current state-of-the-art for implementing deep neural networks for vision, complement SLAM algorithms in AR systems by enhancing the user’s AR experience or adding new capabilities to the AR system.
CNNs can be very accurate when performing object recognition tasks – which include localization (identifying the location of an object in an image) and classification (identifying the image class – i.e., dog vs cat, Labrador vs German Shepard) based on pre-training of the neural network’s coefficients. While SLAM can help a camera move through an environment without running into objects, CNN can identify that the object is a couch, refrigerator, or desk and highlight where it is in the field of view. Popular CNN graphs for real-time object detection – which include classification and localization – are YOLO v2, Faster R-CNN, and Single shot multibox detector (SSD).
CNN object detection graphs can be specialized to detect faces or hands. With CNN-based facial detection and recognition, AR systems can add a name and social media information above a person’s face in the AR environment. Using CNN to detect the user’s hands allow game developers to place a device or instrument needed in the game player’s virtual hand. Detecting a hand’s existence is easier than determining the hand positioning. Some CNN-based solutions require a depth camera output as well as R-G-B sensor output to train and execute a CNN graph.
CNNs can also be applied successfully to semantic segmentation. Unlike object detection, which only cares about the pixels in an image that could be an object of interest, semantic segmentation is concerned about every pixel. For example, in an automotive scene, a semantic segmentation CNN would label all the pixels of the sky, road, buildings, individual cars as a group, which is critical for self-driving car navigation. Applied to AR, semantic segmentation can find ceilings, walls, and the floor as well as furniture or other objects in the space. Semantic knowledge of a scene enables realistic interactions between the real and virtual objects.