Sohu: Purpose-Built Silicon for Next-Generation AI Processing

In a significant development for AI hardware, engineers at Etched have unveiled Sohu, a specialized chip architecture designed specifically for transformer neural networks. This hardware approach moves beyond traditional GPU-based processing by etching the transformer architecture directly into silicon.

The Architecture

Sohu’s single-core design implements multicast speculative decoding, allowing it to reach a claimed throughput of more than 500,000 tokens per second. The architecture supports common transformer variants, including Mixture of Experts (MoE) models, and accommodates advanced decoding strategies such as beam search and Monte Carlo Tree Search (MCTS).
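
Speculative decoding is a general algorithm rather than something unique to Sohu, but it helps explain why hardware tuned for it can emit several tokens per pass of the large model. The Python sketch below is purely illustrative: the toy draft_probs and target_probs functions, the vocabulary, and the draft length K are all assumptions for this example, not part of Etched's software.

```python
import random

VOCAB = list(range(32))   # toy vocabulary of integer token ids
K = 4                     # draft tokens proposed per verification step

def draft_probs(context):
    # Stand-in for a small, fast draft model: deterministic pseudo-random distribution.
    rng = random.Random(hash(tuple(context)))
    weights = [rng.random() for _ in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]

def target_probs(context):
    # Stand-in for the large target model (one call can verify many draft tokens).
    rng = random.Random(hash(tuple(context)) + 1)
    weights = [rng.random() for _ in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]

def speculative_step(context):
    """Propose K tokens with the draft model, then accept or reject each one
    against the target model's distribution (standard rejection-sampling rule)."""
    # 1. Draft phase: sample K tokens cheaply.
    proposed, ctx = [], list(context)
    for _ in range(K):
        p = draft_probs(ctx)
        tok = random.choices(VOCAB, weights=p)[0]
        proposed.append((tok, p))
        ctx.append(tok)

    # 2. Verification phase: the target model checks the whole draft at once.
    accepted, ctx = [], list(context)
    for tok, p_draft in proposed:
        p_target = target_probs(ctx)
        if random.random() < min(1.0, p_target[tok] / p_draft[tok]):
            accepted.append(tok)       # draft token agrees well enough; keep it
            ctx.append(tok)
        else:
            # Rejected: resample from the residual distribution and stop this draft.
            residual = [max(q - d, 0.0) for q, d in zip(p_target, p_draft)]
            if sum(residual) > 0:
                tok = random.choices(VOCAB, weights=residual)[0]
            accepted.append(tok)
            break
    return accepted

print(speculative_step([1, 2, 3]))   # emits up to K tokens per target-model pass
```

Because acceptance usually succeeds for easy tokens, each expensive pass of the large model yields several tokens instead of one, which is where the throughput gain comes from.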

Technical Specifications

Each Sohu chip comes equipped with 144 GB of HBM3E memory, and the architecture is designed to support models with up to 100 trillion parameters. Early benchmarks published by Etched indicate that a single Sohu chip can outperform both NVIDIA’s 8xH100 and 8xB200 configurations when running LLaMA 70B, while operating at lower cost and power consumption.
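
As a rough sanity check on the per-chip memory figure, the back-of-the-envelope calculation below relates LLaMA 70B’s weights to 144 GB of HBM3E. The 8-bit weight assumption and the decimal-gigabyte convention are choices made for this sketch, not Etched specifications.

```python
# Back-of-the-envelope check: how much of one chip's 144 GB HBM3E would
# LLaMA 70B's weights occupy? Assumptions (not Etched specs): 8-bit weights,
# 1 GB = 1e9 bytes, inference only (no optimizer state).
PARAMS = 70e9             # parameter count of LLaMA 70B
BYTES_PER_PARAM = 1       # assumed 8-bit quantized weights
HBM_PER_CHIP_GB = 144     # stated HBM3E capacity per Sohu chip

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
headroom_gb = HBM_PER_CHIP_GB - weights_gb
print(f"weights ~ {weights_gb:.0f} GB, headroom ~ {headroom_gb:.0f} GB for KV cache and activations")
# -> weights ~ 70 GB, headroom ~ 74 GB
```

Under these assumptions the full 70B model fits on one chip with room left for KV cache; models in the trillion-parameter range would clearly need to span multiple chips.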

Real-World Applications

The architecture’s capabilities extend beyond raw performance metrics. Sohu enables:

  • Near-instantaneous voice processing, handling thousands of words in milliseconds
  • Enhanced code completion that leverages tree search over candidate continuations (see the sketch after this list)
  • Parallel processing of hundreds of model responses
  • Scalable real-time content generation
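
To make the tree-search idea concrete, here is a small, hypothetical beam-search sketch for ranking code completions. The toy score_next function and vocabulary are placeholders standing in for a real model’s next-token log-probabilities; none of this reflects Etched’s actual tooling.

```python
# Minimal beam-search sketch for completion ranking, using a toy scorer.
# score_next() is a hypothetical stand-in for a model's next-token log-probs.
VOCAB = ["def", "return", "(", ")", ":", "x", "+", "1"]

def score_next(prefix):
    # Toy scorer: arbitrary deterministic scores, illustrative only.
    return {tok: -abs(len(tok) - (len(prefix) % 3)) for tok in VOCAB}

def beam_search(prefix, beam_width=3, steps=4):
    """Keep the beam_width highest-scoring partial completions at every step."""
    beams = [(0.0, list(prefix))]
    for _ in range(steps):
        candidates = []
        for logp, seq in beams:
            for tok, tok_logp in score_next(seq).items():
                candidates.append((logp + tok_logp, seq + [tok]))
        # Prune: retain only the best beam_width candidates.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams

for logp, seq in beam_search(["def", "x"]):
    print(f"{logp:6.1f}  {' '.join(seq)}")
```

In practice the expensive part is scoring the candidate branches, which is exactly the step that benefits from high-throughput inference hardware.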

Infrastructure Integration

Perhaps most notably, Sohu comes with a fully open-source software stack, potentially lowering the barrier to entry for organizations looking to deploy advanced AI systems. Think of Sohu as a dedicated expressway for AI traffic, compared to the general-purpose roads that GPUs provide.

Performance and Efficiency

Initial testing reported by Etched suggests Sohu runs transformer inference roughly ten times faster and more cost-effectively than current GPU solutions. This efficiency gain comes from the purpose-built nature of the architecture: by designing specifically for transformer networks, Sohu avoids much of the overhead associated with general-purpose computing hardware.

Future Implications

While these advancements are promising, they should be viewed within the broader context of AI hardware evolution. Sohu represents a specialized tool in the AI computing toolkit, potentially complementing rather than replacing existing GPU infrastructure for certain applications.

The development of Sohu highlights an important trend in AI hardware: the move toward application-specific integrated circuits (ASICs) designed explicitly for AI workloads. As organizations continue to scale their AI operations, such specialized hardware solutions may become increasingly important for maintaining efficiency and controlling costs.
