AI summary: Researchers from UW-Madison have been exploring how large language models such as GPT-3/4, PaLM, and LaMDA learn fundamental arithmetic operations. They found that training-data format and size, model size, pretraining, and prompting style all play a role. The study also revealed that these models struggle to generalize beyond the digit lengths seen during training, suggesting they learn arithmetic as a mapping from inputs to outputs rather than as a flexible procedure. The findings offer insight into how arithmetic capabilities emerge so rapidly in these models.
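The length-generalization claim is straightforward to probe: evaluate exact-match accuracy on arithmetic problems whose operands are longer than anything seen in training. Below is a minimal sketch of such a probe, assuming a pluggable `generate(prompt) -> str` interface; the `toy_model` stub, the prompt format, and the metric are illustrative stand-ins, not the authors' actual evaluation harness. The stub deliberately behaves like a memorized mapping that covers only operands up to 3 digits, mimicking the failure mode the summary describes.

```python
import random


def make_addition_prompts(n_digits: int, n_samples: int, seed: int = 0):
    """Sample addition problems whose operands have exactly n_digits digits."""
    rng = random.Random(seed + n_digits)
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return [
        (f"{a}+{b}=", str(a + b))
        for a, b in ((rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(n_samples))
    ]


def exact_match_accuracy(generate, problems):
    """Fraction of problems where the model's completion equals the answer."""
    correct = sum(generate(prompt).strip() == answer for prompt, answer in problems)
    return correct / len(problems)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a trained model: a memorized input->output
    mapping covering only operands of up to 3 digits (the 'training range')."""
    a, b = prompt.rstrip("=").split("+")
    if max(len(a), len(b)) <= 3:
        return str(int(a) + int(b))
    return "0"  # no learned carrying procedure to fall back on


if __name__ == "__main__":
    for n in range(1, 6):
        probs = make_addition_prompts(n, n_samples=200)
        print(f"{n}-digit addition: accuracy = {exact_match_accuracy(toy_model, probs):.2f}")
```

Run as-is, accuracy stays at 1.00 through 3-digit operands and collapses at 4 and 5 digits, which is the signature of a lookup-like mapping rather than a procedure that carries digits of arbitrary length.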