Meta Unveils MobileLLM: A Leap Forward in On-Device AI Language Models


Meta released weights for MobileLLM, a family of groundbreaking language models optimized for on-device applications. Tailored for environments with limited resources, MobileLLM uses an auto-regressive transformer architecture, incorporating techniques such as SwiGLU activation, deep-and-thin structures, embedding sharing, and grouped-query attention. The models show significant performance improvements: the MobileLLM-125M and MobileLLM-350M versions achieve accuracy gains of 2.7% and 4.3% respectively over existing models of comparable size on zero-shot commonsense reasoning tasks. The scalability of MobileLLM’s design philosophy is further evidenced by state-of-the-art results across larger model variants, up to 1.5 billion parameters.
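Of the techniques listed above, SwiGLU replaces the conventional feed-forward activation in each transformer block. A minimal NumPy sketch (weight names and dimensions are illustrative, not MobileLLM's actual configuration):

```python
import numpy as np

def silu(x):
    # Swish/SiLU activation: x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))

def swiglu_ffn(x, W_gate, W_up, W_down):
    # SwiGLU feed-forward block: a gated product of two linear
    # projections, followed by a down-projection.
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

# Toy dimensions for illustration only.
rng = np.random.default_rng(0)
d_model, d_hidden = 8, 16
x = rng.standard_normal((2, d_model))
W_gate = rng.standard_normal((d_model, d_hidden))
W_up = rng.standard_normal((d_model, d_hidden))
W_down = rng.standard_normal((d_hidden, d_model))

out = swiglu_ffn(x, W_gate, W_up, W_down)
print(out.shape)  # (2, 8)
```

The gating structure lets the network modulate each hidden unit multiplicatively, which in practice improves quality at a given parameter budget.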

MobileLLM models, ranging from 125M to 1.5B parameters, are trained on publicly available data, supporting text input and output modalities with a context length of 2k tokens. The models are accessible for further fine-tuning or evaluation through HuggingFace and the MobileLLM codebase on GitHub, offering a straightforward path for developers to leverage this technology.

Training MobileLLM models requires substantial computational resources, with the largest 1.5B parameter model taking approximately 18 days on 32 NVIDIA A100 80G GPUs. Evaluation on zero-shot common sense reasoning tasks demonstrates MobileLLM’s superior performance across various model sizes, significantly outperforming other models in the field.

The implications for mobile computing are significant. Current flagship phones such as the iPhone 15 and Google Pixel 8 Pro ship with 6-12GB of RAM, and, as the paper notes, a single mobile app should not consume more than roughly 10% of a device's DRAM. That budget makes running multi-billion-parameter models on-device impractical.
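A back-of-envelope check makes the constraint concrete. Assuming 8-bit quantized weights (1 byte per parameter) and ignoring activation and runtime overhead:

```python
# Can a model's weights fit in a mobile app's DRAM budget?
# Assumes 8-bit quantization: 1 byte per parameter; overhead ignored.
def model_size_gb(params, bytes_per_param=1):
    return params * bytes_per_param / 1e9

dram_gb = 8                      # a typical flagship phone
app_budget_gb = 0.10 * dram_gb   # the paper's 10% rule of thumb -> 0.8 GB

for name, params in [("MobileLLM-125M", 125e6),
                     ("MobileLLM-350M", 350e6),
                     ("LLaMA-7B", 7e9)]:
    size = model_size_gb(params)
    verdict = "fits" if size <= app_budget_gb else "too big"
    print(f"{name}: {size:.2f} GB -> {verdict}")
```

Both MobileLLM variants fit comfortably under the 0.8 GB budget, while a 7B model overshoots it by nearly an order of magnitude even at 8-bit precision.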

Regarding energy efficiency: a 7B-parameter LLM consumes about 0.7 J/token, while a 350M 8-bit model uses only about 0.035 J/token. At that rate, the paper estimates, an iPhone could sustain conversational use for an entire day on a single charge. Decoding speed reaches 50 tokens/s for the 125M model, compared to 3-6 tokens/s for the LLaMA 7B model in current iPhone apps such as MLC Chat.
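Those J/token figures translate directly into tokens per battery charge. A rough estimate, where the ~12 Wh battery capacity is an assumption (roughly an iPhone-class battery) rather than a figure from the paper:

```python
# Tokens per full charge from the quoted energy-per-token figures.
# Battery capacity is an assumption: ~12 Wh, an iPhone-class battery.
battery_j = 12 * 3600  # 12 Wh expressed in joules = 43,200 J

for name, j_per_token in [("7B model", 0.7),
                          ("350M 8-bit model", 0.035)]:
    tokens = battery_j / j_per_token
    print(f"{name}: ~{tokens:,.0f} tokens per full charge")
```

Under these assumptions the 350M model decodes roughly 20x more tokens per charge than the 7B model, which is what makes all-day conversational use plausible on a phone.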

MobileLLM is licensed under CC-BY-NC 4.0, promoting research and development in optimizing language models for on-device use. This initiative by Meta not only advances the field of natural language processing but also opens new avenues for deploying powerful AI models in resource-constrained environments.