Open-Source AI Agents: Replicating DeepResearch in 24 Hours

OpenAI’s recent release of DeepResearch, a web-browsing AI agent capable of summarizing content and answering complex queries, has sparked significant interest in the AI community. The system demonstrated strong results on the GAIA benchmark, achieving 67% accuracy in one-shot evaluation and 47.6% on the most challenging multi-step reasoning tasks. However, the core agentic framework behind DeepResearch remains undisclosed, prompting an effort to develop an open-source alternative.

A group of researchers took on the challenge of replicating DeepResearch within 24 hours, open-sourcing their work along the way. Their approach combined a large language model (LLM) with an agentic framework that guides the model in using external tools such as web search. Powerful LLMs are already freely available, but wrapping one in a structured agentic system dramatically extends what it can do: on GAIA, an agentic setup can lift accuracy by as much as 60 percentage points over a standalone model.
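
As a minimal sketch of that LLM-plus-framework pairing, here is roughly what it looks like with the smolagents library mentioned at the end of this article. The class names (`CodeAgent`, `HfApiModel`, `DuckDuckGoSearchTool`) reflect early releases of the library and may have changed since, and a configured Hugging Face API token is assumed.

```python
# Pair a hosted LLM with an agentic framework and a web-search tool.
# Class names reflect early smolagents releases and may differ in newer versions.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

model = HfApiModel()  # defaults to a hosted instruction-tuned model; requires an HF token
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

# The agent decomposes the question, issues searches, and iterates on what it finds.
answer = agent.run("How many seconds would it take a leopard at full speed to run through Pont des Arts?")
print(answer)
```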

Agentic Frameworks and the GAIA Benchmark

Agentic frameworks serve as a crucial layer that enables LLMs to perform complex tasks in structured steps. Instead of merely generating text, the model can interact with external tools, execute sequences of actions, and iteratively refine its output. GAIA, the benchmark used for evaluation, is designed to test precisely these capabilities. It presents questions that require multimodal reasoning, data retrieval, and stepwise problem-solving.
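
To make this concrete, here is a framework-agnostic sketch of the loop such a framework runs around the model. Everything in it is a hypothetical placeholder rather than any particular library’s API: `call_llm` stands in for a model call and `web_search` for a real tool backend.

```python
# Framework-agnostic sketch of an agent loop: the LLM proposes an action, the
# framework executes it with a tool, and the observation is fed back until the
# model declares a final answer. All names here are hypothetical placeholders.
from typing import Callable, Dict

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to any instruction-tuned LLM."""
    raise NotImplementedError("plug in a model call here")

def web_search(query: str) -> str:
    """Hypothetical tool: return a text summary of search results."""
    raise NotImplementedError("plug in a search backend here")

TOOLS: Dict[str, Callable[[str], str]] = {"web_search": web_search}

def run_agent(question: str, max_steps: int = 10) -> str:
    """Think-act-observe loop: the framework structures the steps, the LLM fills them in."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # Ask the model for the next action: either a tool call or a final answer.
        reply = call_llm(transcript + "\nRespond with 'tool_name: argument' or 'FINAL: answer'.")
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        tool_name, _, argument = reply.partition(":")
        tool = TOOLS.get(tool_name.strip())
        observation = tool(argument.strip()) if tool else f"Unknown tool: {tool_name}"
        # Feed the observation back so the next step can build on it.
        transcript += f"{reply}\nObservation: {observation}\n"
    return "No final answer within the step budget."
```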

One example from GAIA illustrates the depth of these challenges: identifying fruits from a 2008 painting, cross-referencing them with an ocean liner’s 1949 breakfast menu, and ordering them in a specified format. Solving such problems requires the agent to chain multiple search and reasoning steps correctly—something beyond the capabilities of a standard LLM without external tool usage.

Building an Open DeepResearch Alternative

A key innovation in the open-source replication effort was the use of a CodeAgent rather than traditional JSON-based action representations. Research by Wang et al. (2024) highlights the advantages of expressing agent actions in code:

  • Efficiency: Code-based actions reduce token generation, cutting the number of steps required by 30% compared to JSON.
  • Cost-effectiveness: Fewer steps mean fewer LLM calls, lowering computation costs.
  • Better state management: Variables can store data for reuse, enabling more complex workflows.
  • Familiarity for LLMs: Since models are trained on large volumes of code, they perform better when expressing actions in a coding format.
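
The difference is easiest to see side by side. Below is an illustrative sketch of the kind of action a CodeAgent emits: a short Python snippet that the framework executes in a sandbox where the tools are already defined. The tool names (`web_search`, `visit_page`, `final_answer`) are illustrative placeholders, not the project’s exact API, and the commented JSON shows the round-trip-per-call equivalent.

```python
# Illustrative code action: the model writes a short script, the framework runs it.
# Tool names (web_search, visit_page, final_answer) are hypothetical placeholders
# injected by the framework's sandbox, not a specific library's API.
urls = web_search("ocean liner October 1949 breakfast menu")   # search once...
menu_text = visit_page(urls[0])                                # ...open the top hit...
fruits = [line for line in menu_text.splitlines() if "fruit" in line.lower()]
final_answer(", ".join(sorted(fruits)))                        # ...and answer, all in one step

# A JSON-based agent would instead emit one call per step, each needing a separate
# LLM round trip, with no variables to carry intermediate results forward:
#   {"tool": "web_search", "arguments": {"query": "ocean liner October 1949 breakfast menu"}}
#   {"tool": "visit_page", "arguments": {"url": "..."}}
#   {"tool": "final_answer", "arguments": {"answer": "..."}}
```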

By leveraging CodeAgents, the open-source implementation improved validation performance on GAIA from 46% (previous best for open systems) to 54%. When switching back to JSON, performance dropped to 33%, reinforcing the value of code-based task execution.

Key Tools and Next Steps

To achieve full functionality, the project integrated a simple text-based web browser and a document reader, borrowing tools from Microsoft Research’s Magentic-One agent. However, for true parity with DeepResearch, further improvements are needed:

  • Expanding file format support
  • Enhancing precision in document handling
  • Transitioning to a vision-based web browser for richer interaction
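
As a rough illustration of what a simple text-based browser tool involves, here is a minimal, standard-library-only sketch that fetches a page and returns its visible text. It is an assumption-laden stand-in, not the actual Magentic-One or open-DeepResearch tool, which also handles pagination, in-page search, and more file formats.

```python
# Minimal text-based "browser" tool: fetch a page and return its visible text.
# Illustrative stand-in only; the real tool is considerably more capable.
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class _TextExtractor(HTMLParser):
    """Collects text nodes while skipping <script> and <style> content."""
    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def visit_page_text(url: str, max_chars: int = 4000) -> str:
    """Return the visible text of a web page, truncated to fit an LLM context."""
    req = Request(url, headers={"User-Agent": "simple-text-browser/0.1"})
    with urlopen(req, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    parser = _TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)[:max_chars]
```

An agent framework would register a function like this as a tool so the model can read the pages it finds through search, which is also where the limitations listed above (file formats, precision, no visual layout) become apparent.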

One of OpenAI’s likely advantages in DeepResearch is its proprietary Operator browser, which allows deeper engagement with web content. The next goal in the open-source effort is to build GUI agents capable of screen interaction using a mouse and keyboard—an approach that could further close the performance gap.

Several community-driven implementations have also emerged, exploring alternative methods for indexing data, web browsing, and querying LLMs. Future work will involve benchmarking open-source LLMs like DeepSeek R1, testing vision LMs, and comparing traditional tool-calling mechanisms against code-native agents.

The open-source replication effort is ongoing, and contributions are welcome. Developers interested in participating can explore the smolagents repository, try the live demos, or read the full write-up of the project.