NExT-GPT, a multimodal large language model developed by the National University of Singapore and Tsinghua University, can process and generate any combination of text, images, audio, and video. Pitched as an "any-to-any" system, the open-source model allows for more natural interactions than text-only models. It uses a training technique called "modality-switching instruction tuning" (MosIT) to improve cross-modal reasoning, along with special signal tokens that mark where one modality ends and another begins. NExT-GPT represents an open-source alternative to multimodal AI products from tech giants such as Google and OpenAI.
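The idea of signal tokens can be sketched in a few lines: the language model emits ordinary text tokens plus special markers, and a dispatcher splits the stream into per-modality segments for downstream decoders. This is an illustrative sketch only, not NExT-GPT's actual code; the token names (`<IMG>`, `<AUD>`, `<VID>`) and the `route_output` helper are hypothetical placeholders.

```python
# Hypothetical sketch of modality signal tokens (not NExT-GPT's real vocabulary).
# Spans wrapped in open/close markers would be handed to an image, audio,
# or video decoder in a full system; everything else stays plain text.

SIGNAL_TOKENS = {"<IMG>": "image", "<AUD>": "audio", "<VID>": "video"}
CLOSE_TOKENS = {"</IMG>", "</AUD>", "</VID>"}

def route_output(tokens):
    """Split a generated token stream into (modality, tokens) segments."""
    segments = []
    current_mod, buf = "text", []
    for tok in tokens:
        if tok in SIGNAL_TOKENS:          # open a non-text span
            if buf:
                segments.append((current_mod, buf))
            current_mod, buf = SIGNAL_TOKENS[tok], []
        elif tok in CLOSE_TOKENS:         # close it, return to text
            segments.append((current_mod, buf))
            current_mod, buf = "text", []
        else:
            buf.append(tok)
    if buf:
        segments.append((current_mod, buf))
    return segments

out = route_output(
    ["Here", "is", "a", "cat", "<IMG>", "emb1", "emb2", "</IMG>", "done"]
)
# out → [('text', [...]), ('image', ['emb1', 'emb2']), ('text', ['done'])]
```

In the real model the "tokens" inside an image span would be learned embeddings passed to a diffusion decoder rather than literal strings, but the routing logic follows the same pattern.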