Goldfish Loss: A New Approach to Training Privacy-Conscious Language Models


Large language models (LLMs) often risk violating privacy and copyright because they can memorize and regurgitate training data verbatim. A new training approach, dubbed the "goldfish loss," addresses this issue by modifying the standard next-token prediction objective. During training, a pseudo-random subset of tokens is excluded from the loss computation; since the model never receives a learning signal on those positions, it cannot later reproduce the affected training sequences exactly.
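
To make the mechanism concrete, here is a minimal PyTorch sketch of the idea: compute per-token cross-entropy as usual, then zero out the contribution of positions chosen by a pseudorandom rule. The function names (`goldfish_mask`, `goldfish_loss`), the drop rate `k=4`, and the hash-of-preceding-context selection rule are illustrative assumptions for this sketch, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def goldfish_mask(input_ids: torch.Tensor, k: int = 4, context: int = 13) -> torch.Tensor:
    """Mark roughly 1/k of token positions to be dropped from the loss.

    Positions are selected by hashing a short window of preceding token ids,
    so the same positions are dropped every time the same passage appears.
    (The window size `context` and Python's built-in hash are illustrative;
    a production version would use a stable, seeded hash.)
    """
    batch, seq_len = input_ids.shape
    mask = torch.zeros(batch, seq_len, dtype=torch.bool, device=input_ids.device)
    for b in range(batch):
        for t in range(seq_len):
            window = tuple(input_ids[b, max(0, t - context): t + 1].tolist())
            if hash(window) % k == 0:  # drop this position from the loss
                mask[b, t] = True
    return mask

def goldfish_loss(logits: torch.Tensor, labels: torch.Tensor, drop_mask: torch.Tensor) -> torch.Tensor:
    """Next-token cross-entropy in which dropped positions contribute nothing."""
    # logits: (batch, seq_len, vocab); labels: (batch, seq_len)
    per_token = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")
    keep = ~drop_mask
    return (per_token * keep).sum() / keep.sum().clamp(min=1)
```

In a training loop, `loss = goldfish_loss(logits, labels, goldfish_mask(labels, k=4))` would replace the usual mean cross-entropy, leaving everything else about optimization unchanged.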

Extensive testing with billion-parameter Llama-2 models shows that the technique substantially reduces memorization with little to no impact on performance on standard benchmarks. The goldfish loss is a simple yet effective strategy for training LLMs in a way that respects privacy and copyright, and it applies both when continuing to train an existing pre-trained model and when training from scratch, making it a practical option for commercial applications.

For a detailed exploration of this training modification, visit the full study here.