New versions of StarCoder: 1B, 3B and 7B models announced
1T tokens, 80+ programming languages, an 8K context window, Multi-Query Attention (MQA) & Fill-in-the-Middle (FIM)!
StarCoderBase-1B is a 1B-parameter model trained on 1T tokens of code spanning 80+ programming languages. It uses Multi-Query Attention and a Fill-in-the-Middle training objective, and it can generate code snippets or act as a technical assistant, though its output may contain inefficiencies or bugs. The training data consists of permissively licensed GitHub code, and a search index is provided so generated code can be attributed to its sources. Training took 11 days on 128 A100 GPUs. The model is released under the BigCode OpenRAIL-M v1 license agreement.
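For context, here is a minimal sketch of how a checkpoint like this is typically used with the Hugging Face transformers library. It assumes the model is published on the Hub as `bigcode/starcoderbase-1b` and follows the StarCoder family's `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` sentinel format for Fill-in-the-Middle; the prompt strings are illustrative only.

```python
# A sketch, not an official snippet: assumes the checkpoint is available on the
# Hugging Face Hub as "bigcode/starcoderbase-1b" and `transformers` is installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoderbase-1b"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Ordinary left-to-right code completion.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_new_tokens=48)
print(tokenizer.decode(outputs[0]))

# Fill-in-the-Middle: wrap the known prefix and suffix in the FIM sentinel
# tokens; the model then generates the missing middle span.
fim_prompt = (
    "<fim_prefix>def print_hello():\n"
    "    <fim_suffix>\n"
    "    return None<fim_middle>"
)
inputs = tokenizer(fim_prompt, return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```

The 8K context window and MQA matter mostly for longer prompts like the FIM case above: MQA shares key/value projections across attention heads, shrinking the KV cache and speeding up generation.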