By Ben Rathaus
In autonomous driving, seeing is not necessarily the same as understanding. Within the broader task of perception, the goal of the tracking algorithm is to find every target in the vehicle’s field of view and to build a continuous, accurate description of the driving environment. Done correctly, the result is a coherent understanding of the dynamics of every object in the scene, whether a vulnerable road user, a speeding car, or a bicycle, which improves path planning and hazard avoidance and sharpens the perception algorithm itself.
Dynamic object tracking is the critical tool that connects the dots between all of the data points collected from the driving environment. It is a complex challenge, and high-resolution imaging radar is the key to overcoming it.
The decision-making process for driving is a complex task. Three of its most important inputs are ego-motion information (linear velocity and turn rate), a free-space map of the driving environment (a two-dimensional occupancy grid), and, last but not least, an object list containing all the dynamic objects in the car’s vicinity. Free-space mapping and object tracking both assess objects in a vehicle’s environment. But while free-space mapping can give an accurate picture of the vehicle’s surroundings, it is not sufficient for understanding the dynamics of the objects detected: where they came from, where they are headed, and at what speed. Moreover, free-space mapping incorporates both stationary and dynamic objects into its calculations, whereas tracking concerns only the dynamic ones.
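To make the relationship between these three inputs concrete, here is a minimal sketch in Python of how they might be represented. All names and fields here are illustrative assumptions, not Arbe’s actual interfaces:

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class EgoMotion:
    """Ego-vehicle motion state (illustrative fields)."""
    linear_velocity_mps: float   # forward speed, m/s
    turn_rate_radps: float       # yaw rate, rad/s

@dataclass
class FreeSpaceMap:
    """2-D occupancy grid; cells hold occupancy probability in [0, 1]."""
    resolution_m: float                     # metres per grid cell
    grid: np.ndarray = field(default=None)  # shape (rows, cols)

@dataclass
class TrackedObject:
    """One entry in the dynamic-object list produced by tracking."""
    object_id: int
    position_m: np.ndarray    # (x, y, z) relative to ego, metres
    velocity_mps: np.ndarray  # (vx, vy) relative velocity, m/s
    size_m: np.ndarray        # bounding box (length, width, height), metres
    orientation_rad: float    # heading relative to the ego frame

@dataclass
class PerceptionFrame:
    """The three inputs the planner consumes each cycle."""
    ego: EgoMotion
    free_space: FreeSpaceMap
    objects: List[TrackedObject]
```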
Through tracking, then, an autonomous vehicle is able to determine an object’s whereabouts in relation to itself and to understand the object’s dynamic characteristics, such as its speed, size, shape, altitude, and orientation. This builds a more complete picture of what is happening on and near the road, enabling safer navigation.
Today, tracking is accomplished primarily with cameras. While cameras excel at object classification, recognizing road markings, and drawing bounding boxes, they also have drawbacks: they are ineffective in poor lighting and challenging weather, they lack good depth perception, they cannot directly measure relative velocity, and their reaction time is slow because they need several frames to detect a change in speed.
As with most perception tasks, though, tracking is best accomplished with more than one sensor. 4D Imaging Radar is the ideal complement to cameras for this goal, because it succeeds where cameras struggle (longer range, direct measurement of distance and relative velocity, and making connections between frames) and where traditional radars fail outright: measuring the elevation of objects correctly, and attributing tight bounding boxes with valid orientations, thanks to spatial resolution an order of magnitude better. Finally, it enables the new and advanced autonomous driving applications that cannot be built without sophisticated tracking.
Consider an example captured in a real-time experiment with our radar on a highway just outside Tel Aviv. While driving in heavy Israeli traffic at approximately 70 km/h, our tracking algorithm picked up what at first seemed to be spurious high-elevation objects. As we drove on and line-of-sight occlusions cleared, we found that behind a few road signs a cargo train was rolling across a bridge some 70 meters ahead, roughly perpendicular to the highway. A traditional radar that cannot resolve elevation would have detected the train, like every other object, on the same plane as the ego-car. An ego-car running in autopilot mode would then face two bad choices: emergency-brake in the middle of the highway, or ignore the radar altogether. Only with an imaging radar can one exploit the innate advantages of radar while keeping false emergency braking at an acceptable rate.
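The overpass scenario comes down to a geometric test that is only possible when the radar measures elevation. Below is a minimal sketch of such a gate; the function name and clearance threshold are illustrative assumptions:

```python
# Hypothetical overhead-object gate; the clearance threshold is an assumed value.
MIN_CLEARANCE_M = 4.5  # assumed minimum drivable clearance under an object

def is_overhead(object_bottom_z_m: float) -> bool:
    """Return True if the object's lowest point clears the roadway.

    object_bottom_z_m: elevation of the object's lowest extent above the road.
    A radar without elevation resolution cannot supply this value at all;
    it effectively places every object at z = 0, on the ego vehicle's plane.
    """
    return object_bottom_z_m >= MIN_CLEARANCE_M

# A bridge-borne train measured ~6 m above the road is gated out of the
# braking decision; a stopped truck at z = 0 is not.
print(is_overhead(6.0))  # True  -> overhead crossing, no emergency brake
print(is_overhead(0.0))  # False -> genuine in-path obstacle
```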
Object tracking is all about achieving a coherent understanding of the driving scenario. For that, it is not enough to see all the objects, as clusters of radar detections, in the current radar frame; it is necessary to track the evolution of the scene and make the proper connections between back-to-back frames. The challenge, then, is to move from many single-frame descriptions of the instantaneous driving environment to a multi-frame description that enables scene understanding and a motion-vector estimate for each object.
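As an illustration of what a multi-frame description buys, the sketch below estimates an object’s motion vector from its positions over the last few frames with a least-squares line fit. The frame rate and history length are assumptions; no single frame could yield this estimate:

```python
import numpy as np

FRAME_DT_S = 0.05  # assumed radar frame period (20 Hz)

def motion_vector(track_positions: np.ndarray) -> np.ndarray:
    """Least-squares velocity estimate from a track's recent positions.

    track_positions: (N, 2) array of (x, y) in metres, one row per frame,
    oldest first.
    """
    n = track_positions.shape[0]
    t = np.arange(n) * FRAME_DT_S  # timestamps of the N frames
    # Fit x(t) and y(t) as straight lines; the slope is the velocity component.
    vx = np.polyfit(t, track_positions[:, 0], 1)[0]
    vy = np.polyfit(t, track_positions[:, 1], 1)[0]
    return np.array([vx, vy])

# An object observed over 5 frames, moving forward and slightly left:
history = np.array([[20.0, 0.0], [20.9, 0.1], [21.8, 0.2],
                    [22.7, 0.3], [23.6, 0.4]])
print(motion_vector(history))  # ~[18.0, 2.0] m/s
```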
It is important to note that, because of the limitations of traditional radars, such tracking cannot run reliably on them. Elevation, covered above, is a significant limitation, but certainly not the only one. Poor azimuth resolution is another limiting factor: it prevents distinguishing nearby objects from one another and attributing the correct size (or bounding box) to each object.
The algorithm’s operation can be divided into what happens at the level of the current frame and what happens between frames. At the current-frame level, everything in the field of view is registered to create an object list. Stationary objects in the environment, such as guardrails, street lamps, and traffic lights, are discarded for the time being. The dynamic detections are then clustered, and within each cluster we extract the dynamic characteristics of the object, as sketched below.
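Here is a minimal sketch of that per-frame stage, assuming ego-motion-compensated Doppler is available per detection and using DBSCAN from scikit-learn for the spatial clustering. The threshold values are assumptions, not the tracker’s actual parameters:

```python
import numpy as np
from sklearn.cluster import DBSCAN

STATIC_DOPPLER_MPS = 0.5  # assumed: |compensated Doppler| below this => stationary
CLUSTER_EPS_M = 1.5       # assumed neighbourhood radius for clustering
MIN_DETECTIONS = 3        # assumed minimum detections per object

def cluster_dynamic_detections(xyz: np.ndarray, doppler_comp: np.ndarray):
    """Discard stationary detections, then group the rest into objects.

    xyz:          (N, 3) detection positions in the ego frame, metres.
    doppler_comp: (N,) radial velocities with ego motion compensated out,
                  so guardrails, street lamps, and signs sit near zero.
    Returns a list of (M_i, 3) arrays, one per dynamic object.
    """
    dynamic = np.abs(doppler_comp) > STATIC_DOPPLER_MPS
    pts = xyz[dynamic]
    if len(pts) == 0:
        return []
    labels = DBSCAN(eps=CLUSTER_EPS_M, min_samples=MIN_DETECTIONS).fit_predict(pts)
    # Label -1 marks noise points that belong to no cluster.
    return [pts[labels == k] for k in set(labels) if k != -1]
```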
Next, we associate each object with what was seen in the previous frame. In other words, in addition to understanding where each object is in relation to our own vehicle, the algorithm calculates where each object is in relation to where it just was. It can thereby also determine whether some objects no longer need to be mapped (objects we can stop paying attention to) and whether new objects have entered the field of view. Prior hypotheses are re-evaluated as well; for example, what was thought to be a single large object, such as a truck, might actually be two separate vehicles traveling close to one another. In this sense, tracking is a process of constantly validating or invalidating the object list and updating it according to our findings.
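One common way to implement this association step is gated nearest-neighbour matching solved as an assignment problem. The sketch below uses scipy’s Hungarian-algorithm solver and an assumed gating distance; it illustrates the idea, not necessarily the method in the production tracker:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

GATE_M = 3.0  # assumed: matches farther apart than this are rejected

def associate(predicted: np.ndarray, detected: np.ndarray):
    """Match predicted track positions to current-frame object centroids.

    predicted: (T, 2) positions where existing tracks are expected.
    detected:  (D, 2) centroids of this frame's dynamic clusters.
    Returns (matches, unmatched_tracks, unmatched_detections); unmatched
    tracks are candidates for deletion, unmatched detections for new tracks.
    """
    if len(predicted) == 0 or len(detected) == 0:
        return [], list(range(len(predicted))), list(range(len(detected)))
    # Pairwise Euclidean distances form the assignment cost matrix.
    cost = np.linalg.norm(predicted[:, None, :] - detected[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)  # globally optimal pairing
    matches = [(t, d) for t, d in zip(rows, cols) if cost[t, d] <= GATE_M]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    return (matches,
            [t for t in range(len(predicted)) if t not in matched_t],
            [d for d in range(len(detected)) if d not in matched_d])
```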
By comparing frame to frame, tracking also generates a limited prediction of the whereabouts of the currently detected objects. This facilitates the next frame’s tracking and, in some cases, even sustains tracking of objects that are briefly occluded, minimizing the time until they are re-acquired and actively tracked again. Tracking thus not only eases the overall computation; it enables more fluid, natural autonomous decision making. By aggregating all of this data, an autonomous vehicle can use object tracking to understand what it sees.
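A constant-velocity predict step is the simplest version of that short-horizon prediction. The sketch below, a simple alpha-beta-style track under assumed frame-rate and coasting parameters, also carries a track through a brief occlusion for a bounded number of missed frames before dropping it:

```python
import numpy as np

FRAME_DT_S = 0.05  # assumed radar frame period (20 Hz)
MAX_MISSES = 5     # assumed frames a track may coast while occluded

class Track:
    """Minimal constant-velocity track with occlusion coasting."""

    def __init__(self, position, velocity):
        self.position = np.asarray(position, float)  # (x, y), metres
        self.velocity = np.asarray(velocity, float)  # (vx, vy), m/s
        self.misses = 0

    def predict(self):
        """Roll the state forward one frame; this seeds the association step."""
        self.position = self.position + self.velocity * FRAME_DT_S
        return self.position

    def update(self, measured_position, alpha=0.5, beta=0.2):
        """Blend the prediction with the associated measurement.

        alpha and beta are assumed smoothing gains for position and velocity.
        """
        innovation = np.asarray(measured_position, float) - self.position
        self.position = self.position + alpha * innovation
        self.velocity = self.velocity + (beta / FRAME_DT_S) * innovation
        self.misses = 0

    def coast(self):
        """No measurement this frame (e.g. occlusion): keep predicting."""
        self.misses += 1
        return self.misses <= MAX_MISSES  # False => drop the track
```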
Human drivers constantly and continuously track their vehicle’s surroundings. Is the object ahead stationary or in motion? Is it likely to jump into my path? Is it speeding up or slowing down? These questions are essential to safe driving, yet human drivers are notoriously bad at answering them. With autonomous technology, in which equipment and algorithms perform this tracking automatically, the collective goal is to reach the point where the technology makes fewer errors than the human driver. Integrating 4D Imaging Radar technology will help make that goal a reality.