Unsloth Fixes Gemma Bugs


Unsloth developers Daniel and Michael Han have spent the past week addressing a series of bugs in Google’s Gemma, an AI model that showed promise but was plagued by technical issues. The pair fixed bugs ranging from simple typos affecting token generation to more complex problems, such as incorrect casting in the Keras implementation and precision errors in RoPE (Rotary Positional Embeddings) calculations. Notably, they improved the handling of layer normalization and the GELU activation function, ensuring both are computed at sufficient precision to avoid losing information. These fixes are implemented in their Colab notebooks and are also reflected in the latest Hugging Face transformers release, version 4.38.2. The brothers continue to work on further improvements, including a pull request for the approximate GELU function, and they encourage the community to support their efforts through donations and engagement on their Discord server and social media platforms.
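To make the precision points concrete, here is a minimal PyTorch sketch of the general pattern involved; it is not Unsloth’s actual patch, and the function names, the RMS-style norm, and the specific tensor shapes are illustrative assumptions. It shows RoPE angles computed in float32 rather than bfloat16, activations upcast to float32 for normalization, and the exact (erf) versus tanh-approximate forms of GELU.

```python
import torch
import torch.nn.functional as F

def rope_angles(head_dim: int, seq_len: int, base: float = 10000.0) -> torch.Tensor:
    # Compute rotary angles in float32: forming position * inv_freq in
    # bfloat16 rounds away low-order bits and corrupts long-context attention.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    return torch.outer(positions, inv_freq)  # stays float32 throughout

def layer_norm_fp32(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Upcast activations to float32 before computing the (RMS-style) norm
    # statistics, then cast the result back to the model's working dtype.
    x32 = x.float()
    normed = x32 * torch.rsqrt(x32.pow(2).mean(-1, keepdim=True) + eps)
    return (normed * weight.float()).to(x.dtype)

# Exact (erf-based) vs. tanh-approximate GELU; the approximate form is the
# one discussed in the pull request mentioned above.
x = torch.randn(4)
exact = F.gelu(x)                       # erf form
approx = F.gelu(x, approximate="tanh")  # tanh approximation

h = torch.randn(2, 8, dtype=torch.bfloat16)
w = torch.ones(8)
print(layer_norm_fp32(h, w).dtype)  # bfloat16 out, but statistics were float32
```

The upcast-then-downcast pattern costs little, since the float32 tensors are transient while the model’s weights and activations remain in half precision.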
Read more at Unsloth – Unslow finetuning for AI & LLMs…