TorchScale: Transformers at any scale


AI summary: Microsoft has released TorchScale, a PyTorch library that lets researchers and developers scale up Transformers efficiently. The library targets three goals: modeling generality, training stability, and training efficiency. It supports Encoder, Decoder, and EncoderDecoder architectures and bundles key features such as DeepNorm, SubLN, X-MoE, and the Multiway architecture. The repository also ships examples for several tasks, with more planned.
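
To give a sense of the API, the README builds models from a config object, with the key features toggled by config flags. The sketch below mirrors that pattern; the exact flag names (`deepnorm`, `subln`, `multiway`) are assumptions worth double-checking against the repo:

```python
# Minimal sketch of the config-driven pattern from the TorchScale README.
# The flag names below (deepnorm, subln, multiway) are assumptions to
# verify against the repository before use.
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

# All architectural choices live in the config; the model is built from it.
config = EncoderConfig(vocab_size=64000)
model = Encoder(config)
print(model)

# Key features are enabled via config flags, e.g.:
stable_cfg = EncoderConfig(vocab_size=64000, deepnorm=True)     # DeepNorm: training stability
general_cfg = EncoderConfig(vocab_size=64000, subln=True)       # SubLN: modeling generality
multiway_cfg = EncoderConfig(vocab_size=64000, multiway=True)   # Multiway: modality-specific parameters
```

Decoder-only and encoder-decoder models follow the same shape, presumably via the corresponding `DecoderConfig`/`Decoder` and `EncoderDecoderConfig`/`EncoderDecoder` classes.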
Read more at GitHub…