NExT-GPT, a multimodal large language model developed by the National University of Singapore and Tsinghua University, can process and generate any combination of text, images, audio, and video. Pitched as an "any-to-any" system, the open-source model allows for more natural interactions than text-only models. It uses a training technique called "modality-switching instruction tuning" (MosIT) to improve cross-modal reasoning, along with special signal tokens that mark where one modality ends and another begins. NExT-GPT represents an open-source alternative to multimodal AI products from tech giants such as Google and OpenAI.
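The idea of signal tokens can be sketched in a few lines: the language model emits ordinary text tokens plus special markers, and a dispatcher splits the stream into per-modality segments for downstream decoders. This is an illustrative sketch only, not NExT-GPT's actual code; the token names (`<IMG>`, `<AUD>`, `<VID>`) and the `route_output` helper are hypothetical placeholders.

```python
# Hypothetical sketch of modality signal tokens (not NExT-GPT's real vocabulary).
# Spans wrapped in open/close markers would be handed to an image, audio,
# or video decoder in a full system; everything else stays plain text.

SIGNAL_TOKENS = {"<IMG>": "image", "<AUD>": "audio", "<VID>": "video"}
CLOSE_TOKENS = {"</IMG>", "</AUD>", "</VID>"}

def route_output(tokens):
    """Split a generated token stream into (modality, tokens) segments."""
    segments = []
    current_mod, buf = "text", []
    for tok in tokens:
        if tok in SIGNAL_TOKENS:          # open a non-text span
            if buf:
                segments.append((current_mod, buf))
            current_mod, buf = SIGNAL_TOKENS[tok], []
        elif tok in CLOSE_TOKENS:         # close it, return to text
            segments.append((current_mod, buf))
            current_mod, buf = "text", []
        else:
            buf.append(tok)
    if buf:
        segments.append((current_mod, buf))
    return segments

out = route_output(
    ["Here", "is", "a", "cat", "<IMG>", "emb1", "emb2", "</IMG>", "done"]
)
# out → [('text', [...]), ('image', ['emb1', 'emb2']), ('text', ['done'])]
```

In the real model the "tokens" inside an image span would be learned embeddings passed to a diffusion decoder rather than literal strings, but the routing logic follows the same pattern.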