Researchers from Tsinghua University, Ohio State University, and UC Berkeley have developed AgentBench, a benchmark for evaluating large language models (LLMs) as real-world agents. It tests models' ability to complete complex tasks across a range of environments, such as operating an SQL database and shopping online. The study found that top-tier models like GPT-4 significantly outperformed open-source models, suggesting such models could form the basis of a capable, continuously learning agent.
Read more at Cointelegraph…