New Foundations: Controlling robots with natural language
The integration of Large Language Models (LLMs) into robotics is a rapidly evolving field, with numerous projects pushing the boundaries of what’s possible. These projects are not just isolated experiments, but pieces of a larger puzzle that together point to a future where robots are more intelligent, adaptable and interactive.
SayCan and Code as Policies are two early papers that show how an LLM can interpret a task given in natural language and turn it into actions. “Code as Policies” builds on the ability of LLMs to output code and demonstrates how the language model can produce the actual code that performs a robotic action.
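The idea of an LLM writing the code for a robotic action can be sketched roughly as follows. This is a minimal illustration, not the implementation from either paper: `FakeRobot`, its `pick` and `place` primitives, and `fake_llm` (a canned stand-in for a real model call) are all hypothetical names made up for this example.

```python
from typing import Callable, List, Tuple


class FakeRobot:
    """Hypothetical robot; a real one would call a motion or perception stack."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []

    def pick(self, obj: str) -> None:
        self.log.append(("pick", obj))

    def place(self, target: str) -> None:
        self.log.append(("place", target))


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM call (assumption): returns a canned policy program."""
    return "pick('red block')\nplace('blue bowl')\n"


def run_instruction(
    instruction: str,
    robot: FakeRobot,
    llm: Callable[[str], str] = fake_llm,
) -> List[Tuple[str, str]]:
    # The prompt tells the model which primitives it may call.
    prompt = (
        "# Available functions: pick(obj), place(target)\n"
        f"# Task: {instruction}\n"
    )
    program = llm(prompt)
    # Expose only the whitelisted primitives and no builtins to the generated
    # code. This limits accidental misuse; it is not a security sandbox.
    allowed = {"__builtins__": {}, "pick": robot.pick, "place": robot.place}
    exec(program, allowed)
    return robot.log


robot = FakeRobot()
actions = run_instruction("put the red block in the blue bowl", robot)
```

In this sketch the generated program is plain text until `exec` runs it, so the set of names passed in `allowed` is what decides which robot actions the model can trigger.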
Instruct2Act connects the LLM’s sense-making ability with vision capabilities. This lets the robotic application (in this case a simulation) identify, localize and segment known or unknown objects according to the task, where segmenting means outlining an object to find the best grasp position. Similarly, NL-MAP connects the “SayCan” project with a mapping step, in which the robot scans a room for objects before it plans tasks. The TidyBot research project focuses on a real-world application of LLMs in robotics. A team at Princeton University developed a robot that can tidy up a room. It adapts to personal preferences (“socks in 3rd drawer on the right”) and benefits from general language understanding. For example, it knows that trash should go into the trash bin because the underlying LLM was trained on internet-scale language data.
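The interplay between personal preferences and general knowledge can be pictured with a small sketch. It is an illustration only and not how TidyBot is implemented: the `COMMON_SENSE` table stands in for what an LLM would supply from its training data, and `choose_receptacle` and all of its values are hypothetical.

```python
from typing import Dict

# Stand-in for general knowledge an LLM would provide (hypothetical values).
COMMON_SENSE: Dict[str, str] = {
    "candy wrapper": "trash bin",
    "empty can": "recycling bin",
}


def choose_receptacle(
    obj: str,
    preferences: Dict[str, str],
    default: str = "ask user",
) -> str:
    """Pick where an object goes: personal preference first, then general knowledge."""
    if obj in preferences:
        return preferences[obj]
    # Fall back to general knowledge, then to a safe default.
    return COMMON_SENSE.get(obj, default)


user_preferences = {"socks": "3rd drawer on the right"}
socks_target = choose_receptacle("socks", user_preferences)
wrapper_target = choose_receptacle("candy wrapper", user_preferences)
```

Checking the user's preferences before the general table means a personal rule always overrides the default behavior, which mirrors the adaptation described above.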
Interactive Language turns spoken commands into robotic actions by training a neural network on demonstrated movements paired with language and vision data.
While much of the work related to this technology is still in its early stages and limited to lab research, some applications, such as PickGPT from logistics company Sereact, are starting to show its vast commercial potential.