Meta Unveils MobileLLM: A Leap Forward in On-Device AI Language Models


Meta released weights for MobileLLM, a family of groundbreaking language models optimized for on-device applications. Tailored for environments with limited resources, MobileLLM uses an auto-regressive transformer architecture, incorporating techniques such as SwiGLU activation, deep-and-thin structures, embedding sharing, and grouped-query attention. The models show significant performance improvements: the MobileLLM-125M and MobileLLM-350M versions achieve accuracy gains of 2.7% and 4.3% respectively over existing models of comparable size on zero-shot commonsense reasoning tasks. The scalability of MobileLLM’s design philosophy is further evidenced by state-of-the-art results across larger model variants, up to 1.5 billion parameters.
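Of the techniques listed above, SwiGLU replaces the conventional feed-forward activation in each transformer block. A minimal NumPy sketch (weight names and dimensions are illustrative, not MobileLLM's actual configuration):

```python
import numpy as np

def silu(x):
    # Swish/SiLU activation: x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))

def swiglu_ffn(x, W_gate, W_up, W_down):
    # SwiGLU feed-forward block: a gated product of two linear
    # projections, followed by a down-projection.
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

# Toy dimensions for illustration only.
rng = np.random.default_rng(0)
d_model, d_hidden = 8, 16
x = rng.standard_normal((2, d_model))
W_gate = rng.standard_normal((d_model, d_hidden))
W_up = rng.standard_normal((d_model, d_hidden))
W_down = rng.standard_normal((d_hidden, d_model))

out = swiglu_ffn(x, W_gate, W_up, W_down)
print(out.shape)  # (2, 8)
```

The gating structure lets the network modulate each hidden unit multiplicatively, which in practice improves quality at a given parameter budget.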

MobileLLM models, ranging from 125M to 1.5B parameters, are trained on publicly available data, supporting text input and output modalities with a context length of 2k tokens. The models are accessible for further fine-tuning or evaluation through HuggingFace and the MobileLLM codebase on GitHub, offering a straightforward path for developers to leverage this technology.

Training MobileLLM models requires substantial computational resources, with the largest 1.5B parameter model taking approximately 18 days on 32 NVIDIA A100 80G GPUs. Evaluation on zero-shot common sense reasoning tasks demonstrates MobileLLM’s superior performance across various model sizes, significantly outperforming other models in the field.

The implications for mobile computing are significant. Current flagship phones such as the iPhone 15 and Google Pixel 8 Pro ship with 6-12GB of RAM, and, as the paper notes, a single mobile app should not consume more than roughly 10% of a device's DRAM. That budget makes running multi-billion-parameter models on-device impractical.
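A back-of-envelope check makes the constraint concrete. Assuming 8-bit quantized weights (1 byte per parameter) and ignoring activation and runtime overhead:

```python
# Can a model's weights fit in a mobile app's DRAM budget?
# Assumes 8-bit quantization: 1 byte per parameter; overhead ignored.
def model_size_gb(params, bytes_per_param=1):
    return params * bytes_per_param / 1e9

dram_gb = 8                      # a typical flagship phone
app_budget_gb = 0.10 * dram_gb   # the paper's 10% rule of thumb -> 0.8 GB

for name, params in [("MobileLLM-125M", 125e6),
                     ("MobileLLM-350M", 350e6),
                     ("LLaMA-7B", 7e9)]:
    size = model_size_gb(params)
    verdict = "fits" if size <= app_budget_gb else "too big"
    print(f"{name}: {size:.2f} GB -> {verdict}")
```

Both MobileLLM variants fit comfortably under the 0.8 GB budget, while a 7B model overshoots it by nearly an order of magnitude even at 8-bit precision.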

Regarding energy efficiency: a 7B-parameter LLM consumes about 0.7 J/token, while a 350M 8-bit model uses only about 0.035 J/token. At that rate, the paper estimates, an iPhone could sustain conversational use for an entire day on a single charge. Decoding speed reaches 50 tokens/s for the 125M model, compared to 3-6 tokens/s for the LLaMA 7B model in current iPhone apps such as MLC Chat.
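Those J/token figures translate directly into tokens per battery charge. A rough estimate, where the ~12 Wh battery capacity is an assumption (roughly an iPhone-class battery) rather than a figure from the paper:

```python
# Tokens per full charge from the quoted energy-per-token figures.
# Battery capacity is an assumption: ~12 Wh, an iPhone-class battery.
battery_j = 12 * 3600  # 12 Wh expressed in joules = 43,200 J

for name, j_per_token in [("7B model", 0.7),
                          ("350M 8-bit model", 0.035)]:
    tokens = battery_j / j_per_token
    print(f"{name}: ~{tokens:,.0f} tokens per full charge")
```

Under these assumptions the 350M model decodes roughly 20x more tokens per charge than the 7B model, which is what makes all-day conversational use plausible on a phone.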

MobileLLM is licensed under CC-BY-NC 4.0, promoting research and development in optimizing language models for on-device use. This initiative by Meta not only advances the field of natural language processing but also opens new avenues for deploying powerful AI models in resource-constrained environments.