SambaNova has made a significant leap in AI inference efficiency by running DeepSeek-R1 671B, the largest open-source large language model, on a single rack. Deploying a model of this size has traditionally required 40 racks housing 320 of the latest GPUs. SambaNova's dataflow architecture and its SN40L RDU chips shrink that footprint to just 16 RDUs in a single rack, delivering 3X the speed and 5X the efficiency of GPU-based deployments.
DeepSeek-R1, a 671-billion-parameter Mixture of Experts model, was trained at a drastically lower cost than comparable frontier models. Inference is another matter: like other reasoning models, it generates long chains of intermediate tokens before arriving at an answer, which makes it computationally demanding to serve and has limited its widespread adoption. SambaNova has addressed this challenge by optimizing inference performance, reaching 198 tokens per second per user, with plans to scale further. This breakthrough makes real-time, cost-effective AI deployment practical for developers and enterprises.
Dr. Andrew Ng, founder of DeepLearning.AI, highlights the importance of inference speed for reasoning models like DeepSeek-R1: because they emit many intermediate tokens per response, higher token throughput directly improves both response quality and latency. Benchmarking firm Artificial Analysis has independently validated SambaNova's deployment, measuring more than 195 tokens per second and making it the fastest implementation of DeepSeek-R1 to date.
SambaNova is scaling its infrastructure aggressively, promising 100X its current global capacity for DeepSeek-R1 by year-end. Its three-tier memory architecture is designed to deliver a total rack throughput of 20,000 tokens per second, sidestepping the memory-capacity limits and data-communication bottlenecks that constrain GPU deployments.
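To put those figures in perspective, here is a rough back-of-envelope calculation (ours, not SambaNova's): dividing the stated 20,000 tokens-per-second rack throughput by the 198 tokens-per-second per-user speed suggests roughly 100 users could be served concurrently at full speed, assuming throughput divides evenly across streams.

```python
# Back-of-envelope only: assumes aggregate throughput divides evenly
# across user streams, ignoring batching and scheduling overhead.
rack_throughput_tps = 20_000  # stated total rack throughput (tokens/sec)
per_user_tps = 198            # stated per-user decode speed (tokens/sec)

concurrent_users = rack_throughput_tps // per_user_tps
print(f"~{concurrent_users} concurrent users at full per-user speed")
# -> ~101 concurrent users at full per-user speed
```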
Blackbox AI, a leading autonomous coding platform, is among the first to leverage SambaNova's deployment. CEO Robert Rizk notes that running the full DeepSeek-R1 model, rather than a distilled version, significantly enhances accuracy for its millions of users.
DeepSeek-R1 is now accessible via SambaNova Cloud, with select users gaining API access. For more details, visit SambaNova.
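For readers with API access, the sketch below shows how such an endpoint is typically called. It is a minimal illustration, not official documentation: the base URL, the model identifier, and the assumption of an OpenAI-compatible interface are ours, so consult SambaNova's documentation for the actual values.

```python
# Minimal sketch of a chat completion against SambaNova Cloud.
# ASSUMPTIONS (verify against SambaNova's docs): an OpenAI-compatible
# endpoint at https://api.sambanova.ai/v1 and the model id "DeepSeek-R1".
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",  # assumed endpoint
    api_key="YOUR_SAMBANOVA_API_KEY",        # placeholder key
)

stream = client.chat.completions.create(
    model="DeepSeek-R1",  # assumed model identifier
    messages=[
        {"role": "user", "content": "Summarize mixture-of-experts in two sentences."}
    ],
    stream=True,  # stream tokens to benefit from the high decode speed
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Streaming matters here: a reasoning model emits its chain of thought token by token, so high per-user throughput translates directly into shorter waits for the final answer.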