If you use consumer artificial intelligence (AI) tools such as chatbots, you may have noticed a kind of "brain fog," where the AI forgets parts of the conversation. The same problem affects AI agents, which are programs that combine large language models with traditional computer code to handle complex, multistep tasks. As these agents grow more complicated and involve more decisions, the mistakes made by the large language models tend to build up over time.
Researchers from Asari AI, Caltech, and MIT created a new tool called EnCompass to fix this. It lets programmers spot and correct errors quickly without changing the main code. Instead, they can try different search strategies, which explore and evaluate different possible sequences of steps in an AI agent's execution, to find the best results. EnCompass was presented at the Conference on Neural Information Processing Systems (NeurIPS) in San Diego last month.
How EnCompass works
With EnCompass, programmers add simple labels to their code: "branchpoints" for key decision spots and "scores" for places to check how well a path is working. This separates the agent's main rules from how it searches for solutions, making experiments easier. For example, in translating code from Java to Python using a large language model, errors might happen in different functions. Traditional methods require rewriting code with complex loops to backtrack and fix issues, which gets hard as agents become bigger.
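To make the idea of separating agent logic from search concrete, here is a minimal Python sketch. It is not the EnCompass API: the `Run` class, its `branchpoint` and `record_score` methods, the two toy strategies, and the hard-coded candidate translations (standing in for large language model outputs) are all hypothetical names invented for illustration. The agent function only marks where a decision happens and where quality is checked; the strategy decides which choices to try.

```python
import itertools

# Hypothetical names throughout; this is NOT the EnCompass API.
# Hard-coded candidates stand in for what an LLM might propose per function.
CANDIDATES = {
    "add": ["def add(a, b) return a + b",        # missing colon
            "def add(a, b):\n    return a + b"],
    "neg": ["def neg(x):\nreturn -x",            # bad indentation
            "def neg(x):\n    return -x"],
}

def compiles(src: str) -> bool:
    """Cheap quality check: does the candidate parse as Python?"""
    try:
        compile(src, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

class Run:
    """One execution of the agent, driven by a plan of choice indices."""
    def __init__(self, plan=()):
        self.plan = list(plan)   # choices the strategy wants to try
        self.taken = []          # choices actually made
        self.score = 0.0

    def branchpoint(self, options):
        i = len(self.taken)
        idx = self.plan[i] if i < len(self.plan) else 0  # default: first option
        self.taken.append(idx)
        return options[idx]

    def record_score(self, value):
        self.score += value

def translate_agent(run):
    """Agent logic only: no retry or backtracking loops in here."""
    translated = {}
    for name, options in CANDIDATES.items():
        src = run.branchpoint(options)                    # decision spot
        run.record_score(1.0 if compiles(src) else 0.0)   # progress check
        translated[name] = src
    return translated

def no_search(agent):
    """Strategy 1: run once and accept whatever comes out."""
    run = Run()
    agent(run)
    return run

def exhaustive_search(agent, n_branchpoints, n_options):
    """Strategy 2: replay the agent under every plan, keep the best run."""
    best = None
    for plan in itertools.product(range(n_options), repeat=n_branchpoints):
        run = Run(plan)
        agent(run)
        if best is None or run.score > best.score:
            best = run
    return best

baseline = no_search(translate_agent)
best = exhaustive_search(translate_agent, n_branchpoints=2, n_options=2)
print(baseline.score, best.score, best.taken)
```

Swapping `no_search` for `exhaustive_search` changes the outcome without touching `translate_agent`, which is the kind of separation the article describes. A real agent would call a model at each decision spot and use a richer check than whether the code parses.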
EnCompass avoids this by letting programmers switch search strategies quickly. Simple ones include "global best-of-N," which runs the whole agent many times and picks the top outcome, and "local best-of-N," which tries several options at each step and keeps the best one. A more advanced strategy, beam search, explores multiple branches like a tree, branching out from promising points and pruning less useful ones. Tests showed EnCompass cuts the code needed to implement search by 80 percent and boosts accuracy from 15 percent to 40 percent on code translation tasks. This could help build reliable AI for tough problems in health, government, and engineering. The work is detailed in a research paper presented at NeurIPS.
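The three strategies named above can be sketched on a toy problem. The Python below is an illustration under stated assumptions, not EnCompass code: `sample_step` is a hypothetical stand-in for one large language model call that returns a random quality value, a path's score is assumed to be the sum of its step qualities, and all function and parameter names are invented here.

```python
import random
from typing import List

Path = List[float]  # one quality value per step of the agent

def sample_step(rng: random.Random) -> float:
    """Toy stand-in for one LLM call: quality of the candidate, in [0, 1)."""
    return rng.random()

def path_score(path: Path) -> float:
    """Assumed scoring rule for this sketch: sum of step qualities."""
    return sum(path)

def global_best_of_n(n: int, steps: int, rng: random.Random) -> Path:
    """Run the whole agent n times and keep the best complete run."""
    runs = [[sample_step(rng) for _ in range(steps)] for _ in range(n)]
    return max(runs, key=path_score)

def local_best_of_n(n: int, steps: int, rng: random.Random) -> Path:
    """At each step, sample n candidates and commit to the best one."""
    path: Path = []
    for _ in range(steps):
        candidates = [sample_step(rng) for _ in range(n)]
        path.append(max(candidates))
    return path

def beam_search(beam_width: int, branching: int, steps: int,
                rng: random.Random) -> Path:
    """Keep the beam_width best partial paths, expanding each one
    into `branching` continuations at every step."""
    beams: List[Path] = [[]]
    for _ in range(steps):
        expanded = [b + [sample_step(rng)]
                    for b in beams for _ in range(branching)]
        expanded.sort(key=path_score, reverse=True)
        beams = expanded[:beam_width]   # prune less promising branches
    return beams[0]

STEPS = 4
results = {
    "global best-of-N": global_best_of_n(8, STEPS, random.Random(0)),
    "local best-of-N": local_best_of_n(8, STEPS, random.Random(0)),
    "beam search": beam_search(3, 4, STEPS, random.Random(0)),
}
for name, path in results.items():
    print(f"{name}: score={path_score(path):.3f}")
```

Because the toy steps are independent of each other, this sketch understates the differences between strategies. In a real agent, an early choice constrains later ones, and that is where keeping several partial paths alive, as beam search does, can pay off.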