Think of all the classic sci-fi depictions of robots, from the droids of Star Wars to the unnervingly human Ava in Ex Machina, and a clear pattern emerges: robotics goes hand in hand with artificial intelligence (AI).
But in the real world, robotics and AI have evolved along separate technological paths. While robotics is a field of mechanical and computer engineering mostly concerned with advanced hardware automation, AI is a distinct field concerned with computer-based decision-making, problem-solving and comprehension at the software level.
There are obvious crossovers between the two. You can add AI to a robot to make it ‘intelligent’ like any other computerised device. But so far, that has been the exception rather than the rule. Most robots in existence are ‘dumb’ in the sense that they perform tasks based on a relatively narrow set of pre-programmed rules. They automate, but they don’t make decisions autonomously. And they certainly don’t learn.
This is true of the service robot sector that Oxhoo operates in. The Keenon-brand food service, cleaning and transport robots we supply are remarkable machines in their own right. They can navigate spaces independently and safely using sensors, pick up items through a combination of sensors and voice guidance, and even use elevators to travel between floors in multi-storey buildings.
What we have yet to see in service robotics is the full power of AI unleashed. That could change quickly, however. AI technology is not only evolving at breakneck speed but also becoming more accessible, making it ever easier for manufacturers to add AI capabilities to their hardware.
Talking robots
The first way we will see service robots ‘get smart’ is through natural language interfaces. These are already familiar from smart speakers and voice assistants such as Alexa and Siri. In essence, they let you operate a device by speaking to it. But the real game-changer for voice-controlled robots will be the emerging field of Large Language and Speech Models (LLaSMs).
Large Language Models (LLMs) are the text-based AI engines behind generative AI chatbots, famous for their ability to generate text and answer complex questions in a human-like way. LLaSMs aim to extend this to spoken language, so that a machine can understand and use speech more accurately, with more nuance and with a better grasp of context.
At the moment, a voice interface on a dining robot can follow simple instructions. You can tell it to take a meal to table 2 and it will follow a pre-set route. With more advanced speech comprehension, you could give a robot far more detailed and complex instructions. Even more excitingly, it would be able to communicate with diners, understanding and acting on their requests too.
In the near future, then, restaurant workers will be able to ask a robot to take meals to table 2 and then head to table 5 to take the order there, without breaking their flow to press buttons. Diners will be able to place their orders verbally, ask the robot questions about the dishes and make specific requests. The more a robot can do, the more valuable it becomes as part of the service team. It will also make for a more natural experience for diners, hotel guests and other customers.
Smart navigation
Moving large robotic machines around safely has been one of the great challenges of robotic engineering, especially when they are deployed around people. Sensor technology is already sophisticated enough to allow ‘markerless’, or free-form, navigation without the need to install guidance infrastructure. But most autonomous mobile robots still rely on pre-set routes.
AI is changing that by giving robots (and on-board vehicle systems) the power to decide which routes to take based on real-time information they gather about their surroundings. With Machine Learning, mobile robots can memorise the patterns of every route they take and use that data to decide which routes to take in the future.
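For the technically curious, here is a deliberately simple Python sketch of the idea of ‘remembering’ routes. It is not how Keenon robots work internally. It assumes a hypothetical floor map made of named waypoints joined by corridor segments, and it swaps in a plain exponential moving average of observed travel times as the ‘memory’, with Dijkstra’s shortest-path algorithm choosing the route with the lowest expected time. The class and method names are illustrative only.

```python
import heapq
from collections import defaultdict


class RouteMemory:
    """Hypothetical sketch: remembers how long each floor segment has taken.

    Segments are two-way links between named waypoints. The 'memory' is an
    exponential moving average (EMA) of observed travel times per segment.
    """

    def __init__(self, alpha=0.3):
        self.alpha = alpha                    # weight given to the newest observation
        self.neighbours = defaultdict(set)    # waypoint -> adjacent waypoints
        self.expected = {}                    # (a, b) -> expected travel time in seconds

    def add_segment(self, a, b, seconds):
        """Register a two-way segment with an initial travel-time estimate."""
        self.neighbours[a].add(b)
        self.neighbours[b].add(a)
        self.expected[(a, b)] = self.expected[(b, a)] = float(seconds)

    def record_trip(self, a, b, seconds):
        """Blend a newly observed travel time into the stored estimate (KeyError if unknown)."""
        old = self.expected[(a, b)]
        new = (1 - self.alpha) * old + self.alpha * seconds
        self.expected[(a, b)] = self.expected[(b, a)] = new

    def best_route(self, start, goal):
        """Dijkstra's algorithm over the learned segment times; (inf, []) if unreachable."""
        queue = [(0.0, start, [start])]
        settled = set()
        while queue:
            cost, node, path = heapq.heappop(queue)
            if node == goal:
                return cost, path
            if node in settled:
                continue
            settled.add(node)
            for nxt in self.neighbours[node]:
                if nxt not in settled:
                    step = self.expected[(node, nxt)]
                    heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
        return float("inf"), []


memory = RouteMemory()
memory.add_segment("kitchen", "aisle_a", 10)
memory.add_segment("aisle_a", "table_5", 10)
memory.add_segment("kitchen", "aisle_b", 12)
memory.add_segment("aisle_b", "table_5", 12)
print(memory.best_route("kitchen", "table_5"))  # → (20.0, ['kitchen', 'aisle_a', 'table_5'])

# Aisle A keeps getting congested, so recent trips take much longer...
for _ in range(5):
    memory.record_trip("aisle_a", "table_5", 40)
print(memory.best_route("kitchen", "table_5"))  # → (24.0, ['kitchen', 'aisle_b', 'table_5'])
```

The shorter route through aisle A wins at first, but once the robot has repeatedly experienced slow trips there, its learned estimates steer it towards aisle B. A real system would draw on far richer inputs (live sensor data, time of day, obstacles), but the principle of past journeys informing future choices is the same.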
In a service context, that means you could load up a robot and ask it to deliver specific dishes to tables 2, 5, 6 and 12, and it would work out the most efficient order itself, avoiding collisions and making decisions along the way.
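Working out ‘the most efficient order’ is a small version of the classic travelling salesman problem. The Python sketch below is illustrative only: it assumes a hypothetical floor plan with made-up table coordinates, straight-line distances, and a robot that returns to the kitchen afterwards. For four tables it can simply try every possible order.

```python
from itertools import permutations
from math import dist

# Hypothetical floor plan: (x, y) positions in metres, kitchen pass at the origin.
KITCHEN = (0.0, 0.0)
TABLES = {2: (2.0, 8.0), 5: (6.0, 2.0), 6: (7.0, 5.0), 12: (3.0, 3.0)}


def tour_length(order, start=KITCHEN):
    """Straight-line distance from the kitchen, through each table in turn, and back."""
    stops = [start] + [TABLES[t] for t in order] + [start]
    return sum(dist(a, b) for a, b in zip(stops, stops[1:]))


def best_delivery_order(tables):
    """Try every possible order and keep the shortest round trip."""
    return min(permutations(tables), key=tour_length)


order = best_delivery_order([2, 5, 6, 12])
print(order, round(tour_length(order), 1))
```

Trying every order is fine for a handful of tables (four tables give only 24 orders), but the count grows factorially, so larger jobs would call for heuristics. In practice, the distances would also come from the robot’s own navigation map rather than straight lines.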
The next phase beyond this is what computer scientists call embodied AI. It rests on the idea that human intelligence is not built solely on the cognitive capabilities of our brains, but is ‘embodied’ in our multisensory interactions with the world around us. When we navigate, for example, we combine visual and tactile sensory ‘data’ with a ‘memory’ of our own size and shape and of how our bodies move through 3D space.
This is highly relevant to customer service in, say, a restaurant, where getting the right meals to the right tables in good time is not simply a matter of ‘order, cook, deliver’. It depends on a vast and complex web of interactions: a member of the waiting staff talking with diners at the table to take their order, then kitchen and waiting staff agreeing which orders to prepare, when, and how to group them so they can be carried across the restaurant to the right place (without being dropped).
From a robotics perspective, doing all of that efficiently, safely and successfully means joining the dots between the different data sources available to the robot: its language processing capabilities, its sensors and computer vision system, and its Machine Learning ‘memory’ of routes across the floor. To match the contextual nuance of human decision-making, robots will need many more inputs, because that is what embodied intelligence means in a human sense: navigating a vast array of sensory, learned and remembered inputs from moment to moment.
If you’re interested in learning more about what service robots can do for your business now and in the future, get in touch with the Oxhoo team.