In a significant development for AI hardware, engineers at Etched (etched.com) have unveiled Sohu, a specialized chip architecture designed specifically for transformer neural networks. The approach moves beyond general-purpose GPU processing by etching the transformer architecture directly into silicon.
The Architecture
Sohu’s single-core design supports multicast speculative decoding and, according to Etched’s own benchmarks, sustains a throughput exceeding 500,000 tokens per second. The architecture supports the major transformer variants, including Mixture of Experts (MoE) models, and accommodates advanced decoding methods such as beam search and Monte Carlo Tree Search (MCTS).
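To give a feel for what speculative decoding buys you, here is a minimal greedy sketch of the general technique in Python. It is illustrative only: the `target_next` and `draft_next` callables are hypothetical stand-ins for a large and a small model, and this is not Etched’s “multicast” variant, whose internals are not described here.

```python
def speculative_decode(target_next, draft_next, prompt, k=4, max_new=64):
    """Greedy speculative decoding sketch.
    target_next / draft_next: hypothetical callables mapping a token list
    to the next token chosen by the large / small model."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k candidate tokens.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2. The large target model checks every draft position; on real
        #    hardware this is one batched forward pass, not a Python loop.
        verified = [target_next(tokens + draft[:i]) for i in range(k + 1)]
        # 3. Accept the longest prefix where the two models agree, plus the
        #    target model's own token at the first disagreement (a free token).
        n_ok = 0
        while n_ok < k and draft[n_ok] == verified[n_ok]:
            n_ok += 1
        tokens += draft[:n_ok] + [verified[n_ok]]
    return tokens[:len(prompt) + max_new]
```

When the draft model usually agrees with the target, each verification pass yields several tokens instead of one, which is where the per-request throughput gain comes from.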
Technical Specifications
Each Sohu chip comes equipped with 144 GB of HBM3E memory, and Etched says the architecture scales to models with up to 100 trillion parameters. The company’s early benchmarks show an 8xSohu server outperforming both an 8xH100 and an 8xB200 NVIDIA configuration when running LLaMA 70B, while operating at lower cost and power consumption.
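As a rough sanity check on the memory figure (my own arithmetic, not a vendor number), the weights of a 70-billion-parameter model fit on a single chip at 8-bit precision and only just at 16-bit:

```python
# Back-of-the-envelope weight-memory estimate for a 70B-parameter model
# (illustrative arithmetic, not Etched benchmark data).
PARAMS = 70e9              # LLaMA 70B parameter count
HBM_PER_CHIP_GB = 144      # HBM3E per Sohu chip, per the spec above

for fmt, bytes_per_param in {"fp16": 2, "fp8": 1}.items():
    weights_gb = PARAMS * bytes_per_param / 1e9
    print(f"{fmt}: ~{weights_gb:.0f} GB of weights vs {HBM_PER_CHIP_GB} GB HBM3E")
# fp16: ~140 GB (a tight fit); fp8: ~70 GB (about half the chip's memory).
# KV cache and activations need additional space on top of the weights.
```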
Real-World Applications
The architecture’s capabilities extend beyond raw performance metrics. Sohu enables:
- Near-instantaneous voice processing, handling thousands of words in milliseconds
- Enhanced code completion leveraging tree search capabilities
- Parallel processing of hundreds of model responses (a minimal client-side sketch follows this list)
- Scalable real-time content generation
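To make the “hundreds of responses in parallel” point concrete, here is a small best-of-N sampling sketch. The `generate()` and `score()` functions are hypothetical stand-ins, not part of any published Sohu or Etched API; the point is that high-throughput hardware makes this kind of fan-out-and-rerank pattern cheap.

```python
# Hypothetical client-side sketch of best-of-N sampling: fan out many
# candidate generations in parallel and keep the highest-scoring one.
import random
from concurrent.futures import ThreadPoolExecutor

def generate(prompt: str, temperature: float = 0.8) -> str:
    # Stand-in for a call to an inference endpoint.
    return f"{prompt} ... candidate #{random.randint(0, 10_000)}"

def score(response: str) -> float:
    # Stand-in for a reward model, unit tests, or another reranking heuristic.
    return random.random()

def best_of_n(prompt: str, n: int = 256) -> str:
    # Sample n completions concurrently, then return the best-scoring one.
    with ThreadPoolExecutor(max_workers=32) as pool:
        candidates = list(pool.map(lambda _: generate(prompt), range(n)))
    return max(candidates, key=score)

if __name__ == "__main__":
    print(best_of_n("Write a sorting function", n=8))
```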
Infrastructure Integration
Perhaps most notably, Sohu comes with a fully open-source software stack, potentially lowering the barrier to entry for organizations looking to deploy advanced AI systems. Think of Sohu as a dedicated expressway for AI traffic, compared to the general-purpose roads that GPUs provide.
Performance and Efficiency
Etched’s initial testing suggests Sohu runs transformer models roughly ten times faster and more cost-effectively than current GPU solutions. This efficiency gain comes from the purpose-built nature of the architecture: by designing specifically for transformer networks, Sohu sheds the overhead associated with general-purpose computing hardware.
Future Implications
While these advancements are promising, they should be viewed within the broader context of AI hardware evolution. Sohu represents a specialized tool in the AI computing toolkit, potentially complementing rather than replacing existing GPU infrastructure for certain applications.
The development of Sohu highlights an important trend in AI hardware: the move toward application-specific integrated circuits (ASICs) designed explicitly for AI workloads. As organizations continue to scale their AI operations, such specialized hardware solutions may become increasingly important for maintaining efficiency and controlling costs.