For robots to move into homes, they’ll need to learn to listen, suggests MIT Technology Review.
Researchers at the Robotics and Embodied AI Lab at Stanford University have built a system for collecting audio data, consisting of a GoPro camera and a gripper with a microphone.
“Thus far, robots have been training on videos that are muted,” says Zeyi Liu, a PhD student at Stanford and lead author of the study. “But there is so much helpful data in audio.”
The results were published in a paper on arXiv: "When using vision alone in the dice test, the robot could tell 27% of the time if there were dice in the cup, but that rose to 94% when sound was included."
Citation: Zeyi Liu et al. ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data. arXiv https://arxiv.org/pdf/2406.19464 (open access)