Google Robotics – Early Days

Another use case for AI at the edge.

  • Google’s latest release demonstrates a major use case for running AI on devices, but its shortcomings are still numerous enough that this gigantic opportunity may not materialise for many years.
  • Google has launched Gemini Robotics On-Device, a Vision-Language-Action (VLA) model which has been optimised to run locally as opposed to in the cloud.
  • This is a demonstration of just how important running AI at the edge will become when deploying robots.
  • This is for two reasons:
  • First, latency: robots are extremely latency sensitive because they need to react in real time.
  • For example, a robot picking up a glass needs to apply enough pressure to stop the glass slipping, but not so much that it breaks.
  • This task has virtually zero latency tolerance, meaning that decisions of this sort need to be taken by an AI system resident inside the robot rather than in the cloud.
  • Second, reliability: any robot needs to be able to keep operating through a network outage.
  • For example, people who have paid thousands of dollars for a household robot will not be willing to come home from work to burned food because their router went down.
  • Furthermore, household robots need to be safe, and a situation where the robot freezes on the stairs, topples over and crushes an elderly relative will also not be tolerated.
  • The simple answer is to run the critical systems on the robot itself, and as smaller models become more capable, this is increasingly a practical reality (a rough sketch of this on-device-first pattern follows this list).
  • Google then goes on to discuss how well the On-Device VLA model can generalise, and it is here that my scepticism starts to rise.
  • For example, this model has been trained only for a two-armed robot, whereas true generalisation would mean being able to handle any permutation of arms and legs.
  • Furthermore, when the model was deployed on other two-armed robots, it needed retraining to be able to carry out certain tasks.
  • Consequently, what Google really means by a generalised model is a common starting point: a lot of further training is still required to adapt it to different configurations and tasks.
  • This is not because Google is doing anything wrong; it is the nature of AI models that use statistical pattern recognition to take decisions and execute tasks.
  • This is RFM Research’s old causality chestnut (these models learn statistical correlations rather than cause and effect), and until this issue is fixed or largely mitigated, a truly general robot model will not be achievable.
  • This will mean that mass manufacture of robots for generalised use cases will not be economically viable, which in turn will delay the take-off of the mass market.
  • This does not mean that there is not a market for robots now (factories, gimmicky domestic hoovers and so on), but it does mean that the really big opportunity remains very far away.
  • The advent of LLMs has brought robotics onto the horizon of viability, as voice and natural language have finally become a good user interface between humans and machines. This is why interest in the sector is increasing, but one needs to be realistic about the timeline.
  • The robots are coming, but they are walking very slowly.
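
For the technically minded, here is a minimal sketch of the on-device-first pattern the latency and reliability bullets describe. Everything in it is illustrative: LocalVLA, CloudVLA, the observation fields and the 10 ms tick budget are assumptions of mine, not Google’s API or Gemini Robotics On-Device itself. The point is simply that the safety-critical action is computed locally on every tick, the cloud is only ever an optional refinement, and a missed deadline triggers a safe stop.

```python
import time
from typing import Optional

# Assumption: a 10 ms per-tick budget for grip adjustment. The real figure
# depends on the robot's control loop and is not given in the post.
TICK_BUDGET_S = 0.010

class LocalVLA:
    """Hypothetical stand-in for an on-device VLA model."""
    def act(self, obs: dict) -> dict:
        # On-device inference: latency is bounded by local compute only.
        return {"grip_force_delta": 0.01 if obs.get("slip_detected") else 0.0}

class CloudVLA:
    """Hypothetical stand-in for a cloud-hosted model."""
    def act(self, obs: dict) -> dict:
        # A network round trip sits on this path and can fail outright,
        # e.g. when the home router is down.
        raise TimeoutError("network unreachable")

def control_tick(obs: dict, local: LocalVLA,
                 cloud: Optional[CloudVLA] = None) -> dict:
    """One control tick: the action always comes from the local model first,
    so a network outage degrades quality rather than safety."""
    start = time.monotonic()
    action = local.act(obs)  # safety-critical path never leaves the device
    if cloud is not None:
        try:
            action = cloud.act(obs)  # optional refinement, nothing more
        except (TimeoutError, OSError):
            pass  # outage: keep the on-device action and carry on
    if time.monotonic() - start > TICK_BUDGET_S:
        # Deadline miss: command a safe stop rather than act on stale data.
        action = {"grip_force_delta": 0.0, "halt": True}
    return action

print(control_tick({"slip_detected": True}, LocalVLA(), CloudVLA()))
# -> {'grip_force_delta': 0.01}  (cloud failed; the local action survives)
```

In this arrangement a failed router costs the owner some polish, not a dropped glass or a robot frozen on the stairs.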

RICHARD WINDSOR

Richard is the founder and owner of the research company Radio Free Mobile. He has 16 years of experience in sell-side equity research. During his 11-year tenure at Nomura Securities, he focused on equity coverage of the Global Technology sector.