Embodied artificial intelligence refers to robotic systems with a physical presence in the world, equipped with sensors and actuators such as cameras and mechanical arms that let them perceive and interact with real environments. While these machines are currently useful in structured settings like factories, they often struggle in unpredictable environments such as homes. For example, a robot programmed to make coffee might fail completely if a cup is moved to a cupboard, because it strictly follows a pre-written script and cannot adjust to the change. This limitation inspired a new project at Singapore Management University.
The goal of this research project is to create robots that can reason and adapt rather than just follow orders. This technology is especially important for Singapore, which has an aging population. If successful, these adaptive robots could assist the elderly with complex chores like meal preparation or act as rehabilitation companions that adjust exercises based on a patient’s recovery progress.
Teaching robots to think and adapt
The researchers are addressing three main problems to make this vision a reality. First, many current robots rely too heavily on Large Language Models (LLMs). While these models are capable reasoners, they often produce plans that are not feasible in the physical world. Second, robots often fail to notice small changes in their surroundings. Third, they lack the flexibility to change their behavior when things go wrong.
To solve this, the group is training their system on approximately 60,000 video sequences of kitchen tasks. This extensive training helps the software understand how objects move and interact in real time. Consequently, if a robot cannot find a mug, it can use its training to infer that the mug might be in a cupboard and update its plan to look there. The system is also designed to be modular: its software components can be swapped out or connected to future technologies, much like building blocks. Finally, to ensure safety, the researchers emphasize strict rules and human supervision so that these machines remain safe companions for human users.
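The replanning behavior described above can be sketched in a few lines of code. This is a minimal illustration only, not the SMU team's actual system: the function names, the scene representation (a simple dictionary), and the table of likely fallback locations are all hypothetical stand-ins for what would, in practice, be learned from training data such as kitchen videos.

```python
# Hypothetical sketch: a robot plans to fetch an object, and if the object
# is not where expected, it falls back to searching likely locations
# instead of failing outright.

# Assumed prior knowledge: where objects tend to be found. In a real
# system this would be learned, not hand-written.
FALLBACK_LOCATIONS = {
    "mug": ["counter", "cupboard", "dishwasher"],
}

def locate(obj, scene):
    """Return the object's location if it is visible in the scene, else None."""
    return scene.get(obj)

def plan_fetch(obj, scene):
    """Build a step-by-step plan to fetch an object.

    If the object is visible, go straight to it. Otherwise, search the
    likely locations in order, mirroring the 'mug might be in a cupboard'
    inference described in the article.
    """
    seen = locate(obj, scene)
    if seen is not None:
        return [("go_to", seen), ("pick_up", obj)]
    steps = []
    for place in FALLBACK_LOCATIONS.get(obj, []):
        steps.append(("go_to", place))
        steps.append(("search_for", obj))
    steps.append(("pick_up", obj))
    return steps

# The mug is not visible, so the plan searches likely locations first.
plan = plan_fetch("mug", scene={"kettle": "counter"})
```

Because the planner is an ordinary function operating on a plain scene description, it could be treated as one swappable module in the kind of building-block architecture the article describes.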