DeepSpeed-Chat is a new system from Microsoft researchers that aims to make training large conversational AI models like ChatGPT fast, affordable, and accessible. It combines DeepSpeed's training and inference optimizations in a unified engine, the DeepSpeed Hybrid Engine (DeepSpeed-HE), to make reinforcement learning from human feedback (RLHF) training highly efficient.
Key highlights:
- Trains 13B- and 30B-parameter conversational models on Azure Cloud in 9 and 18 hours end to end, for under $300 and $600 respectively, on a single node of 8 A100 GPUs. This is over 15x faster than existing RLHF systems.
- Scales to models with hundreds of billions of parameters: a 175B model can be trained in under a day on a 64-GPU cluster.
- Makes RLHF accessible by supporting training of 13B-parameter models on a single GPU.
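The highlights quote costs and times together, so a quick back-of-envelope check helps in reading them. The sketch below derives the implied hourly upper bound for each run and, assuming the common mixed-precision Adam accounting of 16 bytes per trainable parameter and an 80 GB accelerator (both are assumptions, not figures from the announcement), shows why fitting 13B-parameter training on one GPU is notable.

```python
# Back-of-envelope checks on the highlight figures (a sketch, not published pricing).

# model size -> (cost upper bound in USD, end-to-end hours), as quoted in the highlights
runs = {"13B": (300, 9), "30B": (600, 18)}
hourly = {name: usd / hours for name, (usd, hours) in runs.items()}
for name, rate in hourly.items():
    print(f"{name}: under ${rate:.2f} per hour of end-to-end training")

# Assumption: mixed-precision Adam needs about 16 bytes per trainable parameter
# (2 fp16 weights + 2 fp16 grads + 4 fp32 master weights + 4 momentum + 4 variance).
# Activations and the extra models used during RLHF are ignored here.
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4
GPU_GB = 80  # assumption: a single 80 GB accelerator

params = 13 * 10**9
naive_gb = params * BYTES_PER_PARAM // 10**9
print(f"13B naive training state: {naive_gb} GB vs {GPU_GB} GB on one GPU")
```

Both runs imply the same hourly ceiling of roughly $33, consistent with (though not proof of) the 30B run using the same hardware for twice as long. The naive 208 GB estimate is well over one 80 GB device, so single-GPU training at this size depends on memory optimizations beyond plain data-parallel training.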
The system provides an easy-to-use interface that lets users train ChatGPT-like models with a single script. It replicates the full InstructGPT training pipeline in three stages: supervised fine-tuning (SFT), reward model fine-tuning, and RLHF.
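To make the three stages concrete, here is a deliberately tiny sketch in plain Python, assuming a single prompt with three candidate responses and a tabular policy (one logit per response). Everything in it is hypothetical and illustrative, not DeepSpeed-Chat's code: the demonstrations and comparison pairs are made up, the reward model is a lookup table fit with a pairwise Bradley-Terry ranking loss, and stage 3 swaps PPO (which InstructGPT-style RLHF uses) for exact gradient ascent on expected reward minus a KL penalty, with a hypothetical coefficient `beta`, that keeps the policy close to the SFT model.

```python
import math

# Hypothetical toy: one prompt, K candidate responses, a tabular policy
# (one logit per response). Not DeepSpeed-Chat's implementation.
K = 3


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def stage1_sft(demos, steps=200, lr=0.5):
    """Stage 1, supervised fine-tuning: maximize mean log-likelihood of demonstrations."""
    theta = [0.0] * K
    for _ in range(steps):
        pi = softmax(theta)
        # d/d theta_j of log softmax(theta)[y] is 1[j == y] - pi_j
        grad = [sum((1.0 if j == y else 0.0) - pi[j] for y in demos) / len(demos)
                for j in range(K)]
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


def stage2_reward_model(pairs, steps=500, lr=0.5):
    """Stage 2, reward model: fit one scalar reward per response with the
    pairwise Bradley-Terry loss -log sigmoid(r[chosen] - r[rejected])."""
    r = [0.0] * K
    for _ in range(steps):
        grad = [0.0] * K
        for chosen, rejected in pairs:
            g = 1.0 - sigmoid(r[chosen] - r[rejected])  # d/dd of log sigmoid(d)
            grad[chosen] += g / len(pairs)
            grad[rejected] -= g / len(pairs)
        r = [ri + lr * gi for ri, gi in zip(r, grad)]
    return r


def stage3_rlhf(theta_sft, rewards, beta=0.1, steps=500, lr=0.5):
    """Stage 3, RL: maximize E_pi[r] - beta * KL(pi || pi_sft) by exact gradient
    ascent. Stands in for PPO, which estimates this from sampled responses."""
    ref = softmax(theta_sft)
    theta = list(theta_sft)
    for _ in range(steps):
        pi = softmax(theta)
        # KL-shaped reward per response: f_j = r_j - beta * log(pi_j / ref_j)
        f = [rewards[j] - beta * math.log(pi[j] / ref[j]) for j in range(K)]
        baseline = sum(p * fj for p, fj in zip(pi, f))
        # exact gradient of the objective w.r.t. theta_k: pi_k * (f_k - baseline)
        grad = [pi[k] * (f[k] - baseline) for k in range(K)]
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


demos = [0, 0, 1, 2]              # hypothetical demonstrations (response indices)
pairs = [(1, 0), (1, 2), (0, 2)]  # hypothetical (chosen, rejected) comparisons

theta_sft = stage1_sft(demos)
rewards = stage2_reward_model(pairs)
theta_rl = stage3_rlhf(theta_sft, rewards)

print("SFT policy: ", [round(p, 3) for p in softmax(theta_sft)])
print("rewards:    ", [round(x, 2) for x in rewards])
print("RLHF policy:", [round(p, 3) for p in softmax(theta_rl)])
```

SFT makes the policy imitate the demonstrations, the reward model ranks response 1 highest, and the RL stage shifts probability toward that response while the KL term anchors it to the SFT policy. Because the toy enumerates every response it can use the exact gradient; PPO instead estimates it from sampled responses, which is why each RLHF iteration needs a generation phase.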
DeepSpeed-Chat reports over 10x higher RLHF training throughput than existing PyTorch-based systems. The gains come from the Hybrid Engine's ability to switch seamlessly between an inference-optimized mode, used to generate responses, and a training mode, used to update the model.
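One way to picture this mode switch is a single set of weights kept in two layouts. The toy below is purely hypothetical and is not DeepSpeed's API: it assumes that training keeps the weights split into per-rank shards (in the spirit of ZeRO-style partitioning) and that switching to inference gathers them into one full copy, which is dropped again before the next update. It models only that layout change, not the optimized kernels or parallelism a real engine uses.

```python
from typing import List, Optional


class ToyHybridEngine:
    """Hypothetical illustration (not DeepSpeed's API): one set of weights, two layouts."""

    def __init__(self, weights: List[float], world_size: int) -> None:
        # Training layout: contiguous shards, one per simulated rank.
        size = -(-len(weights) // world_size)  # ceiling division
        self.shards = [list(weights[i:i + size]) for i in range(0, len(weights), size)]
        self.full: Optional[List[float]] = None  # inference layout, built on demand
        self.training = True

    def to_inference(self) -> None:
        # Gather all shards once so every generation step can use the full weights.
        self.full = [w for shard in self.shards for w in shard]
        self.training = False

    def to_training(self) -> None:
        # Drop the gathered copy to free memory for gradients and optimizer state.
        self.full = None
        self.training = True

    def forward(self, x: List[float]) -> float:
        # Stand-in for one generation step: a dot product with the gathered weights.
        if self.training or self.full is None:
            raise RuntimeError("call to_inference() before generating")
        return sum(w * xi for w, xi in zip(self.full, x))

    def train_step(self, grads: List[float], lr: float) -> None:
        # Each simulated rank applies plain SGD to its own shard only.
        if not self.training:
            raise RuntimeError("call to_training() before updating")
        offset = 0
        for shard in self.shards:
            for i in range(len(shard)):
                shard[i] -= lr * grads[offset + i]
            offset += len(shard)


# One RLHF-style loop: generation phase, then training phase, on the same weights.
engine = ToyHybridEngine([1.0, 2.0, 3.0, 4.0, 5.0], world_size=2)
x = [1.0] * 5
experiences = []
for _ in range(3):
    engine.to_inference()
    experiences.append(engine.forward(x))   # generation (inference mode)
    engine.to_training()
    engine.train_step([1.0] * 5, lr=0.1)    # update (training mode)
print(experiences)
```

The key invariant is that each switch to inference re-gathers the shards, so every generation phase sees the weights produced by the latest update; the gathered copy exists only while it is needed.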
This development is a significant step toward democratizing access to large conversational AI models. Researchers and startups with limited resources can now train high-quality models without expensive infrastructure. Releasing the system as open source also enables broader innovation in this rapidly evolving field.
Possible implications:
- Wider adoption of conversational AI across consumer apps, enterprise services, and other products, thanks to easier access to high-quality models
- Fueling research into techniques such as instruction tuning and prompt programming to improve the robustness and capabilities of large language models (LLMs)
- Enabling small companies and startups to compete with tech giants in the conversational AI space
In summary, DeepSpeed-Chat’s ability to train ChatGPT-scale models efficiently opens up exciting possibilities for both research and practical applications of conversational AI. Broader access to this capability could greatly accelerate progress in this transformative technology.