Researchers from Texas A&M University and the Korea Advanced Institute of Science and Technology have created a new artificial intelligence (AI) system called OmniPredict that helps self-driving cars predict what pedestrians will do next. It is built on a multimodal large language model, meaning it can process different kinds of data, such as images and text, together. Unlike older systems that only react to what the camera sees, OmniPredict combines pictures of the scene with contextual details, such as vehicle speed and pedestrian position, to predict actions in real time.
The system works by analyzing several inputs at once: the overall street view, a close-up image of the pedestrian, the bounding box that outlines the person, and the vehicle's speed. From these, it sorts behavior into categories such as whether the person is crossing the road, whether they are partly blocked from view (occluded), what general action they are performing, and where they are looking. Early tests show it achieves high accuracy without any additional training on the specific driving data it is evaluated on.
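The paper's exact prompting pipeline is not reproduced here, but the description above maps naturally onto a single multimodal chat request. The sketch below is a minimal illustration, assuming an OpenAI-style vision API; the model name, the label wording, and the helper predict_pedestrian_behavior are illustrative placeholders rather than the authors' actual code.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def encode_image(path: str) -> str:
    """Read an image file and return a base64 data URL for the chat API."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()


def predict_pedestrian_behavior(scene_path: str, crop_path: str,
                                bbox: tuple, ego_speed_kmh: float) -> str:
    """Ask a multimodal model to classify a pedestrian's likely behavior.

    The inputs mirror the cues described in the article: the full street view,
    a close-up crop of the pedestrian, the bounding box that locates them,
    and the vehicle's current speed.
    """
    prompt = (
        "You are assisting an autonomous vehicle.\n"
        f"The pedestrian is outlined by bounding box {bbox} (x, y, w, h in pixels).\n"
        f"The ego vehicle is travelling at {ego_speed_kmh:.1f} km/h.\n"
        "Classify the pedestrian's behavior with one label per category:\n"
        "  crossing: yes / no\n"
        "  occlusion: none / partial / full\n"
        "  action: walking / standing / other\n"
        "  gaze: looking at vehicle / not looking\n"
        "Answer with the four labels only."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the paper's exact backbone may differ
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": encode_image(scene_path)}},
                {"type": "image_url", "image_url": {"url": encode_image(crop_path)}},
            ],
        }],
    )
    return response.choices[0].message.content


# Example call (file names and values are illustrative):
# print(predict_pedestrian_behavior("frame_0042.jpg", "ped_0042.jpg",
#                                   (412, 220, 60, 150), ego_speed_kmh=28.0))
```

Framing the output as a fixed set of labels per category keeps the model's free-form answer easy to parse and compare against dataset annotations, which is what makes a language model usable as a behavior classifier at all.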
A shift to proactive safety
This approach could make roads safer by allowing cars to anticipate dangers instead of merely responding to them, for example at busy crosswalks or in bad weather. The researchers tested OmniPredict on challenging benchmark datasets such as JAAD and WiDEVIEW, collections of videos showing pedestrian behavior in a wide range of conditions. It reached 67 percent accuracy, outperforming other models by 10 percent, handled tricky situations such as partly hidden pedestrians and unusual behaviors, and responded faster than the models it was compared against.
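For context on what that accuracy figure measures, evaluation on datasets like JAAD comes down to comparing the model's predicted labels against the videos' ground-truth annotations. The snippet below is a generic sketch of that bookkeeping, not the paper's exact protocol; the label names are assumed.

```python
def accuracy(predicted: list[str], annotated: list[str]) -> float:
    """Fraction of pedestrian samples whose predicted label matches the annotation."""
    if len(predicted) != len(annotated):
        raise ValueError("prediction and annotation lists must be the same length")
    correct = sum(p == a for p, a in zip(predicted, annotated))
    return correct / len(annotated)


# Toy illustration: 67 correct predictions out of 100 samples gives 0.67.
preds = ["crossing"] * 67 + ["not_crossing"] * 33
truth = ["crossing"] * 100
print(f"accuracy = {accuracy(preds, truth):.2f}")  # accuracy = 0.67
```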
Beyond cars, the system could help in other areas, such as spotting threats in military or emergency settings by reading body language or signs of stress. While not yet ready for real roads, OmniPredict points toward a future in which machines reason more like humans, using that reasoning to understand intent. This could lead to fewer crashes, smoother traffic, and better decisions in complex environments. The findings, published in Computers & Electrical Engineering, highlight how AI can combine perception with prediction to improve safety on the street.