Addressing Multi-Petabyte Growth in ADAS Development and Simulation
Advanced Driver Assistance Systems, or ADAS, are the fastest-growing segment in automotive electronics, designed to automate driving tasks and improve safety. Today, we use several ADAS features such as Adaptive Light Control, Adaptive Cruise Control, lane departure warnings, traffic sign recognition and many others. Almost all car manufacturers and leading suppliers such as Bosch, Autoliv, Continental and Mobileye are working on ADAS with the final goal of building a car that can drive completely autonomously, without any driver involvement.
The Society of Automotive Engineers (SAE) has defined six levels to describe the degree of automation, ranging from level 0 (no automation) to level 5 (full automation).
As the automation levels increase, so do the validation efforts required to develop these assistance systems. The majority of the ADAS built into mass-produced cars today are between levels 2 and 4. For these systems, millions of miles need to be captured and simulated before the final control units are ready for production.
For ADAS development beyond SAE level 2, data needs to be captured from millions and millions of miles and stored centrally for data enrichment (for example, video frames need to be tagged to provide environmental context, object identification and more). Car manufacturers and suppliers have fleets of cars, equipped with all kinds of sensors, which drive around the world to capture this data.
This is the REAL big data
The granularity and resolution of video, radar and lidar data are constantly increasing, which means the amount of data generated per second is growing. Imagine a state-of-the-art front-looking radar (FLR) operating at 2800 Mbit/s. A typical sub-project for ADAS development and simulation may require 200,000 miles of captured data. Let’s assume the data is recorded at an average speed of 60 MPH.
That means: for 200,000 miles at 60 MPH we capture 200,000 / 60 = 3,333.3 hours of data.
The data rate is 2800 Mbit/s = 350 MB/s = 21 GB/min = 1,260 GB/h.
Over the 3,333.3 hours of capture, we end up with 1,260 GB/h * 3,333.3 h = 4.2 PB (using decimal units, 1 PB = 1,000,000 GB).
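The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the same back-of-the-envelope calculation: the function name and parameters are made up for illustration, and it assumes decimal units (1 GB = 1,000 MB, 1 PB = 1,000,000 GB) and a sensor that streams at a constant rate for the whole drive.

```python
def capture_volume_pb(miles, avg_speed_mph, sensor_rate_mbit_s):
    """Estimate the raw data volume (in PB) captured by one sensor.

    Assumes a constant sensor data rate and decimal units.
    """
    hours = miles / avg_speed_mph                   # driving time in hours
    mbytes_per_s = sensor_rate_mbit_s / 8           # Mbit/s -> MB/s
    gbytes_per_hour = mbytes_per_s * 3600 / 1000    # MB/s -> GB/h
    total_gb = gbytes_per_hour * hours
    return total_gb / 1_000_000                     # GB -> PB


# The radar example from the text: 200,000 miles at 60 MPH, 2800 Mbit/s
volume = capture_volume_pb(200_000, 60, 2800)
print(f"{volume:.1f} PB")  # prints "4.2 PB"
```

Changing the mileage or the sensor rate in the call shows how quickly the total grows once a project needs more miles or records more than one sensor.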
And this is only for a single radar sensor! Now consider that for SAE Level 3 ADAS and above, a million miles or more are usually required, and that a high-end car easily contains more than 10 different sensors. That is HUGE!
I recently learned that a typical data set from a single car is about 30 TB per day; a fleet of 10 cars would then produce 300 TB per day. All this data needs to be stored, labeled, cleaned, managed, backed up and so on.
In addition to the sheer data volume of an active development project, there are several regulatory and/or contractual requirements to consider as well. For example, data must be kept for 15 years (or longer), depending on geographic regulations or contractual obligations. And for every project, at any point during these 15 years, vendors may need to run simulations within a reasonably short timeframe (for example, in response to a safety glitch and subsequent recall), which means the data must be kept online. At this scale, it’s simply not possible to archive data on tape and restore it in a reasonable time (which could be measured in days).
Storage system requirements
The majority of automotive OEMs and suppliers working on ADAS use scale-out NAS storage systems to satisfy the above-mentioned capacity requirements. Let’s have a look at the most important requirements to consider in a storage system:
- Capacity and Simplicity
As explained above, an ADAS environment needs a storage environment that scales up to multiple petabytes in a single filesystem. This means it needs a node-based architecture that allows capacity to be added when new requirements arise, with no downtime or impact on running simulations. At the same time, the architecture must avoid traditional RAID designs that force the administrator to deal with hundreds, or even thousands, of RAID arrays, aggregates, volumes and filesystems.
- Efficient data retention
As stated earlier, there are often regulatory and/or contractual requirements to keep raw data online and available for simulation, sometimes for up to 15 years. When certain scenarios, or the whole mileage, need to be re-simulated for an updated software release, the data cannot be retrieved from tape in the time available. That means a different method must be used to keep the data online on a more cost-efficient media type.
Isilon supports data tiering within the filesystem to different media types – including the cloud. With Isilon, this is policy-based and is easy to maintain. Simple policies, for example, could keep new data on the fastest node in the cluster, while data that has not been used could be moved to a lower, less costly tier.
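To make the idea of a policy-based tiering rule concrete, here is a minimal sketch in Python. It does not use Isilon's actual policy syntax or any Isilon API: the tier names, the idle-time thresholds and the `select_tier` function are all hypothetical and only illustrate the kind of decision such a policy encodes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    path: str
    last_accessed: datetime


# Hypothetical policy: (maximum idle time, tier name), checked in order.
# The tier names and thresholds are illustrative assumptions only.
POLICIES = [
    (timedelta(days=30), "performance"),   # recently used data stays on fast nodes
    (timedelta(days=365), "capacity"),     # colder data moves to denser nodes
]
DEFAULT_TIER = "archive"                   # everything older, e.g. a low-cost or cloud tier


def select_tier(file: FileInfo, now: datetime) -> str:
    """Return the tier a file belongs on, based on how long it has been idle."""
    idle = now - file.last_accessed
    for max_idle, tier in POLICIES:
        if idle <= max_idle:
            return tier
    return DEFAULT_TIER


now = datetime(2024, 1, 1)
recent = FileInfo("/adas/drive_0001.raw", now - timedelta(days=10))
old = FileInfo("/adas/drive_0002.raw", now - timedelta(days=400))
print(select_tier(recent, now), select_tier(old, now))
```

In a real cluster the storage system evaluates such rules itself and moves the data within the same filesystem, so file paths stay the same for simulation jobs regardless of which tier holds the data.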