Real-world data collection is one of the most resource-intensive parts of developing Advanced Driver Assistance Systems (ADAS). A fully equipped data rig must be deployed on streets for extended periods, which is expensive in hardware maintenance, data post-processing, and manual annotation, and the annotation itself is subject to human error. Diversity is also constrained by geography and time: there is only so much that can be captured within 24 hours.

We propose an automated pipeline that leverages the scripting and physics capabilities of the Unity game engine to generate synthetic datasets cost-effectively. A wide range of data formats can be produced, from RGB-D images to Velodyne-like point clouds. Because generation parameters give total control over the environment, lighting, weather, and scene context can be freely manipulated. Preliminary results suggest that our synthetic data has the potential to boost 2D object detection performance and model robustness.

Beyond data generation, our Unity platform can serve as a safe testbed for collaborative algorithms. A complex road network using V2X techniques can be simulated while accounting for bandwidth constraints.

Synthetic data collage
Research Seminar

Powerful Unity3D Foundation

Built on top of Unity3D, our platform operates within a virtual world rich with high-fidelity models and textures. Unity's C# scripting API enables full automation of the data collection process, so vast amounts of data can be generated with guaranteed variability. On a single RTX 2080 Ti, images are generated at a rate of 5 Hz.
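To put the 5 Hz figure in perspective, the short calculation below converts it into dataset sizes and generation times. It is a back-of-the-envelope sketch that assumes uninterrupted generation at a constant rate; the function names are illustrative and not part of the platform.

```python
# Throughput estimate at the stated generation rate.
GENERATION_RATE_HZ = 5  # images per second on a single RTX 2080 Ti (from the text)


def images_generated(hours: float, rate_hz: float = GENERATION_RATE_HZ) -> int:
    """Images produced in `hours` of uninterrupted generation (assumed constant rate)."""
    return int(hours * 3600 * rate_hz)


def hours_needed(num_images: int, rate_hz: float = GENERATION_RATE_HZ) -> float:
    """Hours of generation needed to reach `num_images`."""
    return num_images / (rate_hz * 3600)


per_hour = images_generated(1)   # 18,000 images
per_day = images_generated(24)   # 432,000 images
```

Under these assumptions, a 100,000-image dataset takes roughly five and a half hours on one GPU.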

Windridge scene

Accurate Ground Truth

Because the data comes from a simulation, we have access to every variable that defines the virtual world. Every generated dataset therefore comes with complementary ground truth that is accurate down to the pixel and costs virtually nothing extra to produce. Real-time ground truth, such as pseudo car odometry and camera ego-motion, can also be simulated.
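One way to see why such ground truth comes almost for free: when an object's 3D pose and the camera parameters are known exactly, a 2D bounding box follows from simple geometry. The sketch below projects the corners of an axis-aligned 3D box through a pinhole camera. It is an illustration only, not the platform's implementation; the coordinate convention (x right, y down, z forward), the intrinsics, and all function names are assumptions.

```python
from itertools import product


def project_point(point, focal_px, cx, cy):
    """Pinhole projection of a camera-frame point (assumed: x right, y down, z forward)."""
    x, y, z = point
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def bbox_2d_from_3d(center, size, focal_px, cx, cy, width, height):
    """Return (xmin, ymin, xmax, ymax) enclosing an axis-aligned 3D box, or None.

    Simplification: boxes with any corner behind the camera are skipped
    rather than clipped against the near plane.
    """
    # Enumerate the 8 corners as center +/- half the extent along each axis.
    corners = [
        tuple(c + sign * s / 2 for c, sign, s in zip(center, signs, size))
        for signs in product((-1, 1), repeat=3)
    ]
    if any(z <= 0 for _, _, z in corners):
        return None
    pixels = [project_point(p, focal_px, cx, cy) for p in corners]
    xs = [u for u, _ in pixels]
    ys = [v for _, v in pixels]
    # Clamp to the image and drop boxes that fall entirely outside it.
    xmin, xmax = max(0.0, min(xs)), min(float(width), max(xs))
    ymin, ymax = max(0.0, min(ys)), min(float(height), max(ys))
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)


# Hypothetical example: a 2 m cube, 10 m straight ahead of a 1280x720 camera.
box = bbox_2d_from_3d(
    center=(0.0, 0.0, 10.0), size=(2.0, 2.0, 2.0),
    focal_px=1000.0, cx=640.0, cy=360.0, width=1280, height=720,
)
```

A box derived this way encloses the projected 3D extent, so it can be looser than a box fitted to the rendered object pixels; pixel-accurate labels would typically be derived from a rendered instance mask instead.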

Deterministic Control of Conditions

A wide range of weather conditions can be simulated using techniques from the game industry. Because every condition, from sunlight direction to harsh snow, is set explicitly, adverse conditions can be reproduced exactly, which provides a deterministic way to benchmark neural network performance.
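A minimal sketch of how deterministic condition control might look from the scripting side: scene conditions are drawn from a seeded random generator, so the same seed always reproduces the same scene setup. The parameter names, value ranges, and weather categories below are hypothetical and chosen purely for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical weather categories; the real platform may expose different ones.
WEATHER_TYPES = ("clear", "rain", "fog", "snow")


@dataclass(frozen=True)
class SceneConditions:
    sun_azimuth_deg: float
    sun_elevation_deg: float
    weather: str
    intensity: float  # 0.0 for clear weather, otherwise in [0.1, 1.0]


def sample_conditions(seed: int) -> SceneConditions:
    """Draw one set of scene conditions; the same seed yields the same result."""
    rng = random.Random(seed)  # local generator, independent of global random state
    weather = rng.choice(WEATHER_TYPES)
    intensity = 0.0 if weather == "clear" else rng.uniform(0.1, 1.0)
    return SceneConditions(
        sun_azimuth_deg=rng.uniform(0.0, 360.0),
        sun_elevation_deg=rng.uniform(5.0, 85.0),
        weather=weather,
        intensity=intensity,
    )


# A benchmark suite is then just a fixed list of seeds.
benchmark_suite = [sample_conditions(seed) for seed in range(10)]
```

Storing the seed alongside each generated frame would make it possible to regenerate any sample under identical conditions.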

Diverse Datasets

Manipulating conditions allows us to generate highly variable datasets and so reduce data bias. Ground truth such as 2D bounding boxes also varies, through changing object poses and degrees of occlusion. Preliminary experiments indicate that our synthetic data boosts object detection performance when used for training alongside real data, and that it has the same generalization capability as real datasets.
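As an illustration of how an occlusion amount could be quantified, the sketch below compares an object's full silhouette with the part that remains visible in the final render. It assumes both are available as binary masks of equal size, which is plausible in a simulator but is not confirmed by the text; the function name is hypothetical.

```python
def occlusion_fraction(full_mask, visible_mask):
    """Fraction of an object's pixels hidden behind other objects.

    Both masks are equally sized 2D lists of 0/1 values. `full_mask` marks the
    object as if nothing were in front of it; `visible_mask` marks the pixels
    that actually show the object. Returns None if the object covers no pixels.
    """
    full = sum(sum(row) for row in full_mask)
    if full == 0:
        return None
    # Count visible pixels only where the object's silhouette is present.
    visible = sum(
        v
        for full_row, visible_row in zip(full_mask, visible_mask)
        for f, v in zip(full_row, visible_row)
        if f
    )
    return 1.0 - visible / full


# Hypothetical 2x2 object whose right half is covered by another object.
ratio = occlusion_fraction([[1, 1], [1, 1]], [[1, 0], [1, 0]])  # 0.5
```

A value like this could be stored with each bounding box, for example to filter out heavily occluded instances or to balance a dataset across occlusion levels.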

Diverse dataset samples
Bounding box ground truth

Collaborative Testbed

Creating a real collaborative dataset is difficult because of practical technicalities such as inaccurate pose data, unstable hardware synchronization, and laborious point cloud annotation. Our platform eliminates these issues. We can generate collaborative datasets in 2D and 3D formats tailored for V2X edge computing.
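Bandwidth is one of the constraints such a testbed has to respect. As a rough sketch, the calculation below estimates the link rate needed to share raw point clouds between agents and checks it against a budget. All numbers (points per scan, bytes per point, scan rate, link budget) are hypothetical placeholders, not measurements from the platform.

```python
def required_bandwidth_mbps(points_per_scan, bytes_per_point, scans_per_second):
    """Megabits per second needed to stream uncompressed point clouds."""
    bits_per_second = points_per_scan * bytes_per_point * scans_per_second * 8
    return bits_per_second / 1e6


def fits_budget(points_per_scan, bytes_per_point, scans_per_second, budget_mbps):
    """True if the raw stream fits within the given link budget."""
    needed = required_bandwidth_mbps(points_per_scan, bytes_per_point, scans_per_second)
    return needed <= budget_mbps


# Hypothetical sensor: 100,000 points per scan, 16 bytes per point
# (x, y, z, intensity as 32-bit floats), 10 scans per second.
needed = required_bandwidth_mbps(100_000, 16, 10)    # 128.0 Mbps
ok = fits_budget(100_000, 16, 10, budget_mbps=27.0)  # False for an assumed 27 Mbps link
```

When the raw stream exceeds the budget, as it does in this example, agents would need to share something more compact, such as downsampled clouds, features, or detections.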
