Those scenarios from Google were obviously selected for the presentation, but they very much represent the state of the art in _real time_ processing for autonomous driving systems as of when it was given.
I’m not sure what is implied by saying it’s a recording - both the Google and Tesla presentations are “recordings” with equal opportunity to pick best-case examples, but I would bet strongly that everything shown was raw, i.e. processed in real time on their respective compute platforms.
The top-down viewpoint helps show off the quality (still by no means perfect) of the world representation. If you projected Tesla’s model into 3D, you would see far more jitter than in the video overlay, for a variety of reasons.
That said, I think comparing them directly on specific technical components is a bit of a sidebar. They are taking two very different paths toward a problem that is still ill-defined. Both are leading their respective approaches, but each rests on fundamentally different and unproven assumptions.
It’s also worth looking not just at how accurately objects are detected, but at what the visualizations show about the intent of other road users. The Google video shows predicted trajectories for important objects in a number of scenes. We don’t get to see any of that clearly from Tesla, and it is by no means a small part of the problem. I’m not sure if it’s there and simply not shown; I’m just highlighting that there is a lot more downstream work even once you are reliably finding objects in the sensor data.
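To make the “downstream” point concrete, here is a minimal, purely illustrative sketch of the kind of prediction step that sits on top of reliable detections: a constant-velocity rollout for a hypothetical tracked object. Everything here (the `TrackedObject` type, `predict_trajectory`, the numbers) is a made-up placeholder, not how either company actually does it; production stacks use learned, multi-hypothesis predictors conditioned on map context.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical tracked object: position and velocity in a top-down map frame (meters, m/s).
@dataclass
class TrackedObject:
    x: float
    y: float
    vx: float
    vy: float

def predict_trajectory(obj: TrackedObject,
                       horizon_s: float = 3.0,
                       dt: float = 0.1) -> List[Tuple[float, float]]:
    """Roll a constant-velocity model forward to get a predicted path.

    This is the simplest possible intent model; real systems replace it with
    learned predictors that account for road geometry and interactions.
    """
    steps = int(horizon_s / dt)
    return [(obj.x + obj.vx * dt * k, obj.y + obj.vy * dt * k)
            for k in range(1, steps + 1)]

if __name__ == "__main__":
    # A cyclist tracked at (10 m, 2 m) moving diagonally through an intersection.
    cyclist = TrackedObject(x=10.0, y=2.0, vx=1.5, vy=3.0)
    path = predict_trajectory(cyclist)
    print(path[:3])  # first 0.3 s of the predicted path
```

Even this toy version hints at why detection accuracy alone doesn’t tell the whole story: the planner consumes these predicted paths, and their quality matters at least as much as the detections themselves.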