TorchScale: Transformers at any scale


AI summary: Microsoft has released TorchScale, a PyTorch library that lets researchers and developers scale up Transformers efficiently. The library targets three goals: modeling generality, training stability, and training efficiency. It supports Encoder, Decoder, and EncoderDecoder architectures and bundles key features such as DeepNorm, SubLN, X-MoE, and the Multiway architecture. The repository also ships examples for several tasks, with more planned.
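
To give a sense of the API, the README builds models from a config object, with the key features toggled by config flags. The sketch below mirrors that pattern; the exact flag names (`deepnorm`, `subln`, `multiway`) are assumptions worth double-checking against the repo:

```python
# Minimal sketch of the config-driven pattern from the TorchScale README.
# The flag names below (deepnorm, subln, multiway) are assumptions to
# verify against the repository before use.
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

# All architectural choices live in the config; the model is built from it.
config = EncoderConfig(vocab_size=64000)
model = Encoder(config)
print(model)

# Key features are enabled via config flags, e.g.:
stable_cfg = EncoderConfig(vocab_size=64000, deepnorm=True)     # DeepNorm: training stability
general_cfg = EncoderConfig(vocab_size=64000, subln=True)       # SubLN: modeling generality
multiway_cfg = EncoderConfig(vocab_size=64000, multiway=True)   # Multiway: modality-specific parameters
```

Decoder-only and encoder-decoder models follow the same shape, presumably via the corresponding `DecoderConfig`/`Decoder` and `EncoderDecoderConfig`/`EncoderDecoder` classes.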
Read more at GitHub…