Researchers have found a way to turn a short video of a room into a 3D digital twin, a detailed virtual copy of a real space. Users can open the twin's drawers and cabinets and move objects on a countertop. Digital twins like this can be used for things like making realistic video games or teaching robots to work in specific rooms. The process starts with a simple video, even one taken with a phone, and turns it into an interactive space.
The technology uses artificial intelligence (AI). The researchers combined different AI models to create this digital twin. One model makes the images look nice, while another ensures the room’s measurements are correct. They also added a perception module, a part that figures out which objects can move and how, like how a drawer should slide. Another model fills in the unseen parts, like the inside of a drawer, to make the twin complete.
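To make the idea of "how a drawer should slide" concrete, here is a minimal sketch of one common way to describe a sliding part: a direction of travel plus limits on how far it can move. This is an illustration only, not the researchers' method. The class name `PrismaticJoint`, its fields, and the 0.4 m travel limit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrismaticJoint:
    """Hypothetical description of a sliding part such as a drawer."""

    axis: tuple  # unit direction the part slides along (x, y, z)
    lower: float = 0.0  # fully closed, in metres (assumed value)
    upper: float = 0.4  # fully open, in metres (assumed value)

    def position(self, origin, travel):
        """Return the part's position after sliding `travel` metres.

        The travel is clamped to [lower, upper] so the drawer cannot
        be pushed through the cabinet or pulled out past its limit.
        """
        t = max(self.lower, min(self.upper, travel))
        return tuple(o + t * a for o, a in zip(origin, self.axis))


# A drawer that slides along the x axis, starting at a made-up location.
drawer = PrismaticJoint(axis=(1.0, 0.0, 0.0))
print(drawer.position((0.0, 0.5, 0.8), 0.25))  # partly open
print(drawer.position((0.0, 0.5, 0.8), 1.0))   # request exceeds the limit, so it is clamped
```

A cabinet door would need a different description (rotation about a hinge rather than sliding along a line), which is why a system has to work out not only which parts move but also how.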
Uses in games and robot training
This digital twin works with game engines, the software used to build video games. Researchers showed this by making a game where players use balls to knock over kitchen items like a kettle. The twin can also train robots through a process called real-to-sim-to-real transfer, where a robot learns a task in the virtual space and then applies what it has learned in the real world. For example, a robotic arm was trained on a kitchen twin and successfully put objects in a drawer.
The researchers see this as a way to train robots cheaply and safely: a home video could be uploaded so that a robot is prepared for that home before it is delivered. Right now, the method works with hard objects like a kettle, but the researchers plan to add soft or breakable items like cloth or windows. The goal is to expand beyond single rooms to entire buildings or outdoor areas, which could help with designing cities or improving farming.
Support for this work came from companies like Intel and Meta, as well as government groups. The researchers aim to eventually create digital twins of many types of spaces.
This research has been presented at a conference on computer vision and pattern recognition.