Five Paths to Solving Robotics
In his AI Speaker Series presentation at Sutter Hill Ventures, Google DeepMind's Ted Xiao outlined five worldviews on how to achieve useful, ubiquitous robotics and dug into his team's work integrating frontier models like Gemini directly into robotic systems. Here are my notes from his talk:
We're at a unique moment in robotics where there's no consensus on the path forward. Unlike other AI breakthroughs where approaches quickly consolidated, robotics remains wide open with multiple reasonable paths showing early signs of success. Ted presented five worldviews, each with smart researchers and builders pursuing them with conviction:
Industry Incumbent
These researchers believe general-purpose robotics is the wrong goal. Purpose-built solutions actually work today - from industrial automation to appliances we don't even call robots anymore. When robotics succeeds, we just call them tools. The path forward: directly optimize for specific use cases using decades of control theory and hardware expertise.
Humanoid Company
These researchers see hardware as the primary bottleneck. Once platforms stabilize, researchers excel at extracting performance - drones went from fragile research prototypes to consumer products, quadrupeds became robust commercial platforms. Humanoid form factors matter because the world is built for humans, and human-like robots can better leverage internet-scale human data.
Robot Foundation Model Startup
These researchers focus on robot data and algorithms as the key. Generality is non-negotiable - transformative technologies are general by nature. The core challenge: building an "internet of robotics data" either vertically (solve one domain completely, then expand) or horizontally (achieve robotics' GPT-2 moment first, then improve).
A fourth group argues that frontier models are the only existence proof of a technology that can model internet-scale data with human-level performance. You can't solve robotics without incorporating these "magical artifacts" into the exploration process. In their view, frontier model trends and compute lead robotics by about two years.
The fifth group takes the most radical position: just solve AGI and ask it to solve robotics. The Platonic Representation Hypothesis suggests that as AI models improve across domains, their internal representations converge. Perfect language understanding might inherently include physical understanding.
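One way to make "representations converge" concrete is a mutual k-nearest-neighbor score, the kind of alignment measure used in the work that proposed the hypothesis: embed the same inputs with two different models and check how often each input's nearest neighbors agree across the two spaces. The sketch below is a minimal, stdlib-only illustration under that assumption; the function names and toy vectors are hypothetical and made up for illustration, not real model embeddings.

```python
import math

def knn_indices(embeddings, i, k):
    """Indices of the k nearest neighbors of item i (Euclidean), excluding i itself."""
    dists = [
        (math.dist(embeddings[i], embeddings[j]), j)
        for j in range(len(embeddings))
        if j != i
    ]
    dists.sort()  # ties are broken by index, so the result is deterministic
    return {j for _, j in dists[:k]}

def mutual_knn_alignment(emb_a, emb_b, k):
    """Average overlap (0..1) between each item's k-NN sets in two embedding spaces.

    emb_a and emb_b embed the same items in the same order; their
    dimensionality may differ (e.g. a vision model vs. a language model).
    """
    n = len(emb_a)
    total = 0.0
    for i in range(n):
        overlap = knn_indices(emb_a, i, k) & knn_indices(emb_b, i, k)
        total += len(overlap) / k
    return total / n

# Hypothetical toy embeddings of the same four inputs from two "models".
vision = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
text = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 0.0, 1.0), (0.0, 0.1, 0.9)]

print(mutual_knn_alignment(vision, text, k=1))  # 1.0: both spaces group the items identically
```

A score near 1 means the two models organize the inputs the same way; the hypothesis predicts such scores rise as models get stronger. Real measurements would use learned embeddings of many paired inputs rather than hand-picked points.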
Gemini Robotics
Ted's team at Google DeepMind pursued the Bitter Lesson approach, building robotics capabilities directly into Gemini rather than treating frontier models as black boxes.
Their Gemini Robotics system first enhanced embodied reasoning - teaching the model to understand the physical world better through 2D bounding boxes in cluttered scenes, 3D understanding with depth and orientation, pointing for granular precision, and grasp angles for manipulation. The system then learned low-level control from diverse robot actions, operating at a 50Hz control frequency with a quarter-second end-to-end latency. This unlocked three key advances:
Interactivity: The robot responds to dynamic scenes, following objects as they move and adapting to human interference
Dexterity: Beyond rigid objects, it can fold clothes, wrap headphone wires, and manipulate shoelaces
Generalization: Handles visual distribution shifts (new lighting, distractors), semantic variations (typos, different languages), and spatial changes (different sized objects requiring different strategies)
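The two timing figures from the talk fit together with simple arithmetic: at 50Hz, each control step lasts 20 ms, so a quarter-second end-to-end latency spans about 12 to 13 control steps. The sketch below only does that arithmetic; the function names are hypothetical, and it makes no claim about how Gemini Robotics actually schedules its actions.

```python
def control_period_ms(frequency_hz: float) -> float:
    """Duration of one control step in milliseconds."""
    return 1000.0 / frequency_hz

def steps_within_latency(frequency_hz: float, latency_s: float) -> float:
    """Number of control steps that elapse during one end-to-end delay."""
    return latency_s * frequency_hz

period = control_period_ms(50)          # 20 ms per control step
steps = steps_within_latency(50, 0.25)  # quarter-second latency in control steps
print(f"{period:.1f} ms per step, {steps:.1f} steps per end-to-end delay")
```

If each new model output takes a quarter second to arrive, something has to supply commands for the steps in between; predicting several future actions per inference is one common technique for this, though the talk notes don't say which approach the team uses.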
When deployed at a conference with completely novel conditions - crowds, different lighting, new table - the system maintained reasonable behavior for arbitrary user requests, showing sparks of that GPT-2 moment where it attempts something sensible regardless of input.
Dark Horses and Emerging Paradigms
Several emerging paradigms could completely upend current approaches.
Video World Models: learning physics without robots through action-conditioned video generation
Robot-Free Data: from simulation or humans with head-mounted cameras
Thinking Models: applying frontier models' reasoning capabilities to robotics
Locomotion-Manipulation Unity: bridging RL-based locomotion with foundation model manipulation
There's no consensus on which path will win. Each approach has reasonable arguments and early signs of success. The lack of agreement isn't a weakness - it's what makes this the most exciting time in robotics history.
Luke Wroblewski's Blog

