New Pedestrian Detector from Google Could Make Self-Driving Cars Cheaper
Google’s self-driving cars roam the sunny streets of Mountain View, Calif., in public, but much of the technology that powers them has never seen the light of day. Yesterday, attendees at the IEEE International Conference on Robotics and Automation (ICRA) in Seattle got a rare glimpse into a new safety feature the tech giant is working on.
Anelia Angelova, a research scientist at Google working on computer vision and machine learning, presented a new pedestrian detection system that works on video images alone. Recognizing, tracking, and avoiding human beings is a critical capability in any driverless car, and Google’s vehicles are duly festooned with lidar, radar, and cameras to ensure that they identify people within hundreds of meters.
But that battery of sensors is expensive; in particular, the spinning lidar unit on the roof can cost nearly $10,000 (or more if a car carries multiple units). If autonomous vehicles could reliably locate humans using cheap cameras alone, it would lower their cost and, hopefully, usher in an era of robotic crash-free motoring all the sooner. But video cameras have their issues. “Visual information gives you a wider view [than radars] but is slower to process,” Angelova told IEEE Spectrum.
At least it used to be. The best video analysis systems use deep neural networks—machine learning algorithms that can be trained to classify images (and other kinds of data) extremely accurately. Deep neural networks rely on multiple processing layers between the input and output layers. For image recognition, the input layer learns features of the pixels of an image. The next layer learns combinations of those features, and so on through the intermediate layers, with more sophisticated correlations gradually emerging. The output layer makes a guess about what the system is looking at.
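To make the layered structure concrete, here is a minimal sketch in Python of a forward pass through such a network. It is an illustration only, not Google's system: the function names, the ReLU and softmax choices, and the tiny hand-picked weights are all assumptions made for the example, and a real network would learn its weights from training data rather than have them written in by hand.

```python
import math


def dense(values, weights, biases):
    """One fully connected layer: a weighted sum of the inputs plus a bias per unit."""
    return [
        sum(w * x for w, x in zip(row, values)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Nonlinearity assumed for the intermediate layers: negative values become zero."""
    return [max(0.0, x) for x in values]


def softmax(values):
    """Turn the final scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(x - peak) for x in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(pixels, layers):
    """Pass pixel values through each layer in turn and return class probabilities.

    `layers` is a list of (weights, biases) pairs; the last pair is the output layer.
    """
    activation = pixels
    for weights, biases in layers[:-1]:
        activation = relu(dense(activation, weights, biases))
    out_weights, out_biases = layers[-1]
    return softmax(dense(activation, out_weights, out_biases))


# Hypothetical two-pixel "image" and hand-picked weights, purely for illustration.
example_layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # intermediate layer
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),    # output layer: [background, pedestrian]
]
probabilities = forward([0.2, 0.8], example_layers)
```

The entry with the highest probability in `probabilities` is the network's guess about what it is looking at.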
Modern deep networks can outperform humans in tasks such as recognizing faces, with accuracy rates of over 99.5 percent. But traditional deep networks applied to pedestrian detection are very slow, dividing each street image into 100,000 or more tiny patches, explains Angelova, and then analyzing each in turn. This can take seconds or even minutes per frame, making them useless for navigating city streets. Long before a car using such a network has identified a pedestrian, it might have run the person over.
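A rough back-of-the-envelope sketch shows how quickly the patch count grows. The Python below counts the window positions a conventional sliding-window detector would examine; the image size, window size, stride, scales, and per-patch time are all hypothetical values chosen for illustration, not figures from Google's work.

```python
def count_windows(width, height, win_w, win_h, stride):
    """Number of window positions that fit entirely inside an image."""
    if width < win_w or height < win_h:
        return 0
    cols = (width - win_w) // stride + 1
    rows = (height - win_h) // stride + 1
    return cols * rows


def total_windows(width, height, win_w, win_h, stride, scales):
    """Sum the window positions over several rescaled copies of the image."""
    return sum(
        count_windows(int(width * s), int(height * s), win_w, win_h, stride)
        for s in scales
    )


def seconds_per_frame(n_windows, seconds_per_window):
    """Time to classify every window one after another."""
    return n_windows * seconds_per_window


# Assumed setup: a 640x480 frame, a 64x128 pedestrian-shaped window,
# a 4-pixel stride, and two image scales.
n = total_windows(640, 480, 64, 128, 4, scales=[1.0, 0.5])

# Assumed cost of 0.1 milliseconds per patch for 100,000 patches.
frame_time = seconds_per_frame(100_000, 0.0001)
```

Even under these modest assumptions a single frame yields well over ten thousand windows, and at 100,000 patches a classifier that needs a tenth of a millisecond per patch would spend about 10 seconds on one frame.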