Run 100B+ language models at home, BitTorrent‑style

  • Run large language models like BLOOM-176B collaboratively.
  • Go beyond classic language model APIs: use any fine-tuning or sampling method by running custom paths through the model or reading its hidden states.
  • Inference runs at ≈ 1 sec per step (token), 10x faster than is possible with offloading and fast enough for chatbots and other interactive apps.
  • Parallel inference reaches hundreds of tokens/sec.
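
A rough back-of-the-envelope sketch of how the two speed figures above can fit together. It assumes each step emits one token per sequence and that step latency stays near 1 sec as more sequences are processed in parallel, which is an idealization. The function names and the example batch size of 200 are illustrative and are not part of Petals.

```python
def time_to_generate(num_tokens: int, step_latency_s: float) -> float:
    """Seconds needed to generate num_tokens for one sequence, one token per step."""
    if num_tokens < 0 or step_latency_s <= 0:
        raise ValueError("num_tokens must be >= 0 and step_latency_s must be > 0")
    return num_tokens * step_latency_s


def aggregate_throughput(step_latency_s: float, parallel_sequences: int) -> float:
    """Tokens per second summed over all sequences processed in parallel.

    Assumption (idealized): step latency does not grow with the number
    of parallel sequences.
    """
    if step_latency_s <= 0 or parallel_sequences < 1:
        raise ValueError("step_latency_s must be > 0 and parallel_sequences >= 1")
    return parallel_sequences / step_latency_s


if __name__ == "__main__":
    # One interactive chat reply of 50 tokens at about 1 sec per step.
    print(time_to_generate(50, 1.0), "seconds for a 50-token reply")
    # 200 sequences in parallel (hypothetical batch size) at the same latency.
    print(aggregate_throughput(1.0, 200), "tokens/sec in aggregate")
```

Under these assumptions, a single user still sees about one token per second, while the total across many parallel sequences lands in the hundreds of tokens per second.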

Check out: https://petals.ml/