Beyond ChatGPT: NExT-GPT Is an Open-Source Model That Lets You Master AI With Audio, Video and Text

NExT-GPT, a multimodal large language model developed by the National University of Singapore and Tsinghua University, can process and generate combinations of text, images, audio, and video. Pitched as an “any-to-any” system, this open-source model allows for more natural interactions than text-only models. It uses a technique called “modality-switching instruction tuning” to improve its cross-modal reasoning, and it relies on unique tokens to handle the different input types. NExT-GPT offers an open-source alternative to the multimodal AI products of tech giants like Google and OpenAI.
