Training Real-World Self-Driving Cars with Video Games

Robert Laganière, Professor, University of Ottawa and Founder and Chief Science Officer at Sensor Cortek and Tempo Analytics

Whether or not you agree that “data is the new oil,” system architects developing artificial intelligence (AI) chips for autonomous vehicles need huge amounts of data to train the neural networks to make the correct decisions. 

A challenge of building and using deep learning systems is training the network on data that is as true to the real situation as possible. When a deep network is built, it is trained to perform recognition on an “actual set.” The network will perform very well on the data used to train it, but when it is deployed in a new environment, its performance can vary dramatically. How well performance holds up on data the network has not seen is generally referred to as the “generalization power” of a deep neural net. This is still an active research topic, along with “domain adaptation” techniques, which attempt to leverage labeled data from a source domain to accurately model a new environment.

To overcome the challenges of generalization power and domain adaptation to build a safe AI product that behaves as expected, training systems require high volumes of data. In addition, they need a large diversity of data to ensure that the network will behave well in all situations. 

This is especially true in the case of autonomous driving. To build a fully autonomous vehicle, the training data must cover all of the weather situations, road geographies, types of driver actions, and vulnerable road users that the car might encounter. Developing a strongly trained network requires not just millions of kilometers of driving experience, but also exabytes of data (1 EB = 1,000 PB = 1,000,000 TB). Most companies do not have the financial means to deploy cars over millions of kilometers. In addition, it’s impractical to stage every possible scenario in the real world, like pedestrians jaywalking in dense traffic, animals on the road, or a cyclist running a red light, just to build a dataset.

Using Synthetic Images to Create Synthetic Datasets

One way to train networks more quickly and cost-effectively is to use synthetic data. Synthetic data is increasingly being used for machine learning applications, where a model is trained on a synthetically generated dataset with the intention of transferring its learning to the real world. Figure 1 shows an example from a publicly available synthetic dataset, Playing for Benchmarks. Playing for Benchmarks is built on the video game Grand Theft Auto, which offers a fairly photo-realistic view of driving in a city. Using synthetic images to train a network provides a wide range of driving scenarios without having to go outside with a real car. Synthetic datasets can also help get around the security, safety, or privacy concerns that can come into play with real-world datasets.

 

Figure 1: Grand Theft Auto used as training data

Because synthetic datasets are generated programmatically, their features and length can be varied, and they can also include levels of randomness. For safety-critical applications like autonomous driving, synthetic datasets can include a variety of scenes, objects, and lighting conditions. Random noise can be injected to simulate dirty cameras, fog, and other visual obstructions.
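As a rough illustration of injecting such degradations, the sketch below applies three simple corruptions to a tiny grayscale frame stored as a list of rows. The function names, the blending formula for fog, and all parameter values are assumptions made for this example; they are not taken from any of the datasets discussed here, and a real pipeline would operate on full-resolution color images.

```python
import random

def add_fog(image, density, fog_level=255):
    """Blend every pixel toward a uniform fog brightness.

    image: list of rows of grayscale values in [0, 255].
    density: 0.0 leaves the image unchanged, 1.0 replaces it with fog.
    """
    return [[round((1 - density) * p + density * fog_level) for p in row]
            for row in image]

def add_sensor_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to each pixel, clamped to [0, 255]."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in image]

def add_dirt_spots(image, n_spots, rng):
    """Black out randomly chosen pixels to mimic specks on the lens."""
    out = [row[:] for row in image]
    height, width = len(out), len(out[0])
    for _ in range(n_spots):
        out[rng.randrange(height)][rng.randrange(width)] = 0
    return out

rng = random.Random(0)                 # fixed seed so the corruption is reproducible
clean = [[100] * 8 for _ in range(6)]  # hypothetical stand-in for a rendered frame
foggy = add_fog(clean, density=0.5)
noisy = add_sensor_noise(foggy, sigma=5.0, rng=rng)
degraded = add_dirt_spots(noisy, n_spots=3, rng=rng)
```

Because each corruption is a separate function driven by a seeded generator, the same clean frame can be rendered once and then degraded in many reproducible ways.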

More importantly, since you control everything in the dataset and know where the cars, trees, pedestrians, and signs appear, you get the data annotation for free. Perfectly accurate labels for real-world datasets can be expensive or even impossible to create, so free annotation is a great benefit. A team of researchers from Intel Labs and Darmstadt University in Germany developed a method of extracting data annotation for training from Grand Theft Auto. The researchers created a software layer between the game and a computer’s hardware that classifies objects in the road scenes shown in the game. While it would be nearly impossible to have humans manually label all of the scenes in similar detail, the classification software provides the labels for the machine-learning algorithm automatically. The algorithm can then learn to recognize trucks, cars, signs, pedestrians, and other objects.
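The idea that labels come for free can be shown with a toy generator. This is only a sketch under simplifying assumptions: it places random rectangles instead of rendering a game scene, it is not the software-layer method described above, and the class list, function name, and scene size are all hypothetical.

```python
import random

CLASSES = ["car", "truck", "pedestrian", "sign"]  # illustrative label set

def generate_scene(n_objects, width, height, rng):
    """Place random axis-aligned boxes and return a label map plus annotations.

    Because the generator decides where each object goes, the bounding
    boxes and per-pixel class labels are known exactly, with no human labeling.
    """
    label_map = [[0] * width for _ in range(height)]  # 0 = background
    annotations = []
    for _ in range(n_objects):
        cls = rng.randrange(len(CLASSES))
        w = rng.randint(2, width // 4)
        h = rng.randint(2, height // 4)
        x = rng.randint(0, width - w)
        y = rng.randint(0, height - h)
        for row in range(y, y + h):
            for col in range(x, x + w):
                label_map[row][col] = cls + 1  # later objects occlude earlier ones
        annotations.append({"class": CLASSES[cls], "bbox": (x, y, w, h)})
    return label_map, annotations

label_map, annotations = generate_scene(
    n_objects=5, width=64, height=48, rng=random.Random(42)
)
```

Every call yields both a per-pixel label map and a list of bounding boxes, which is the kind of annotation that would otherwise have to be drawn by hand on real footage.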

A synthetic dataset can be modified to improve the model and training by changing the weather, the lighting conditions, the number and density of cars, pedestrians, and bicycles, and other “natural” variants.

Additional datasets can be produced using tools such as CARLA, an open-source driving simulator that can generate a wide range of driving scenarios, although it is less photo-realistic than Grand Theft Auto (Figure 2). Synscapes 7D is the most photo-realistic publicly available dataset but is more restricted in terms of diversity.

Figure 2: Example scenes from CARLA and Synscapes 7D datasets

Can Synthetic Datasets Be Used for Real-World Cars?

While the concept of using synthetic datasets is an interesting academic exercise, the real question is, “Is it a good idea to use synthetic images to train a network that will be deployed in the real world?”

To test the concept, we experimented with two strategies for training a network on a dataset that contains both synthetic and real-world images. The first option is to simply put everything together and train the network on a mixture of synthetic and real-world data (Figure 3).

Figure 3: Option 1 – Mixing synthetic and real-world data

The second option is to use “fine tuning” (Figure 4). First, we trained the network using the synthetic dataset, then refined the network using the real-world data.

Figure 4: Option 2 – Training the network with synthetic data, then refining the network using real-world data
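The two-stage schedule can be sketched with a deliberately tiny stand-in model. Everything below is an assumption made for illustration: a one-variable linear model trained by gradient descent replaces the deep detection network, the "synthetic" and "real" data are noisy lines with slightly different parameters, and the learning rates and dataset sizes are arbitrary.

```python
import random

def make_data(slope, intercept, n, rng):
    """Noisy samples of y = slope * x + intercept for x in [0, 1]."""
    xs = [rng.random() for _ in range(n)]
    return [(x, slope * x + intercept + rng.gauss(0, 0.05)) for x in xs]

def mse(params, data):
    """Mean squared error of the linear model (w, b) on the data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(params, data, lr, epochs):
    """Full-batch gradient descent on mean squared error, starting from params."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

rng = random.Random(0)
synthetic = make_data(2.0, 0.0, 900, rng)   # plentiful source-domain data
real_train = make_data(2.3, 0.4, 100, rng)  # scarce target-domain data
real_test = make_data(2.3, 0.4, 200, rng)   # held-out target-domain data

# Stage 1: train from scratch on synthetic data only.
pretrained = train((0.0, 0.0), synthetic, lr=0.5, epochs=200)
# Stage 2: continue from the pretrained weights on real data, with a lower rate.
fine_tuned = train(pretrained, real_train, lr=0.1, epochs=200)

print(f"real-world test MSE after stage 1: {mse(pretrained, real_test):.4f}")
print(f"real-world test MSE after stage 2: {mse(fine_tuned, real_test):.4f}")
```

The point of the sketch is the structure: the second call to `train` starts from the weights produced by the first, rather than from scratch, so the small real-world set only has to correct the gap between the two domains.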

An additional consideration is establishing the correct proportion of synthetic to real-world data to get the best performance. To this end, we ran a series of training procedures, always keeping the total number of images at 15,000. We used the KITTI/Cityscapes (KC) real-world dataset to compare the performance of the two strategies. The baselines (Figure 5) show the results obtained when only real-world KC images are used to train the neural network. The other three lines show mixtures of synthetic and real-world data, ranging from 2.5% real-world data with 97.5% synthetic data to 10% real-world data with 90% synthetic data. (Results that show higher average precision and higher average recall, up and to the right, are superior.) The three points on each non-KC line correspond to the synthetic datasets. For the mixed datasets (Figure 5, left), precision is approximately the same as the baseline, but recall is slightly lower. Overall, however, the results compare well to the baseline.
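To make the fixed-budget arithmetic concrete, here is a small sketch of how a mixed training set could be assembled at a given real-world share. The function names and the sampling scheme (uniform sampling without replacement, then shuffling) are assumptions for illustration; the procedure actually used in the study is not described here.

```python
import random

TOTAL_IMAGES = 15000  # fixed training budget stated in the text

def split_budget(real_fraction, total=TOTAL_IMAGES):
    """Return (n_real, n_synthetic) for a given share of real-world images."""
    n_real = round(total * real_fraction)
    return n_real, total - n_real

def build_mixed_set(real_pool, synthetic_pool, real_fraction, rng,
                    total=TOTAL_IMAGES):
    """Sample without replacement from each pool, then shuffle them together."""
    n_real, n_synthetic = split_budget(real_fraction, total)
    mixed = rng.sample(real_pool, n_real) + rng.sample(synthetic_pool, n_synthetic)
    rng.shuffle(mixed)
    return mixed

# The two endpoint mixtures mentioned in the text.
for fraction in (0.025, 0.10):
    n_real, n_synthetic = split_budget(fraction)
    print(f"{fraction:.1%} real -> {n_real} real + {n_synthetic} synthetic images")
```

At the 15,000-image budget, 2.5% real-world data corresponds to 375 real images alongside 14,625 synthetic ones, and 10% corresponds to 1,500 real images alongside 13,500 synthetic ones.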

The second approach, fine-tuning (Figure 5, right), shows greater recall than the mixed datasets, likely because the greater diversity of the synthetic data leads to better detection of pedestrians and cars. In particular, the two circled datasets (10% real-world data with 90% synthetic data) showed stronger recall and precision than the baseline, while CARLA performed slightly worse, probably because it was not diverse enough.

Figure 5: Mixed datasets showed results similar to the baseline, while the fine-tuning technique provided better results than the baseline.

Conclusion

To train autonomous vehicles, more data is better, and synthetic data offers significant benefits over real-world-only data, such as more diverse scenarios and automatic data labels, while minimizing data collection costs. The more data you have, the better your chance of producing a strong system. Training a network on synthetic data and then fine-tuning it on real-world data can yield results that are as good as, or better than, training on real-world data alone. Even with 90% synthetic data and only 10% real-world data, performance is comparable to using real-world data alone. If you use synthetic data, be sure to include a large diversity of environments, objects, and lighting conditions, which will affect your results more than high photo-realism.

 

For More Information:

Nowruzi F.E., Kapoor P., Kolhatkar D., Al Hassanat F., Laganière R., Rebut J., How much real data do we actually need: Analyzing object detection performance using synthetic and real data, ICML Workshop on Autonomous Driving, Long Beach, CA, June 2019.