GPT-4: Speculative Sampling (SSp) lets a large language model generate tokens up to 3 times faster by using a smaller draft model to propose candidates, without compromising completion quality. The technique is particularly useful for live token generation and pays off most when many tokens are easy for the draft model to guess. SSp keeps an almost identical memory footprint and is relatively simple to implement in code.
Read more at GitHub…
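To make the idea concrete, here is a minimal sketch of the accept/reject loop behind speculative sampling. It is not the llama-ssp code itself: the toy `draft_model` and `target_model` functions, the tiny vocabulary, and the draft length `k` are all illustrative stand-ins for the Llama models used in the repo. The standard rule is to accept each drafted token with probability min(1, p(token)/q(token)), where q is the draft distribution and p the target distribution, and to resample from the renormalized residual max(0, p − q) on the first rejection; this is what guarantees the output distribution matches the large model exactly.

```python
# Minimal sketch of speculative sampling, assuming toy "models" that map a
# token prefix to a next-token distribution. Names and shapes are illustrative,
# not the llama-ssp API.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # toy vocabulary size


def draft_model(prefix):
    """Small model: cheap to run, slightly off distribution."""
    logits = np.sin(np.arange(VOCAB) + len(prefix))
    return np.exp(logits) / np.exp(logits).sum()


def target_model(prefix):
    """Large model: the distribution the output must match exactly."""
    logits = np.sin(np.arange(VOCAB) + len(prefix)) + 0.3 * np.cos(np.arange(VOCAB))
    return np.exp(logits) / np.exp(logits).sum()


def speculative_step(prefix, k=4):
    """Draft k tokens with the small model, then verify them with the large one."""
    # 1) Draft k tokens autoregressively with the cheap model.
    drafted, q_probs = [], []
    ctx = list(prefix)
    for _ in range(k):
        q = draft_model(tuple(ctx))
        tok = rng.choice(VOCAB, p=q)
        drafted.append(tok)
        q_probs.append(q)
        ctx.append(tok)

    # 2) Score the drafted positions with the large model (one batched pass
    #    in a real implementation; a plain loop here for clarity).
    accepted = []
    ctx = list(prefix)
    for tok, q in zip(drafted, q_probs):
        p = target_model(tuple(ctx))
        # 3) Accept the drafted token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)
            ctx.append(tok)
        else:
            # Rejected: resample from the renormalized residual max(0, p - q)
            # and stop this step. This correction keeps the overall output
            # distribution identical to sampling from the target model alone.
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            accepted.append(rng.choice(VOCAB, p=residual))
            return accepted

    # 4) All drafts accepted: sample one bonus token from the target model.
    accepted.append(rng.choice(VOCAB, p=target_model(tuple(ctx))))
    return accepted


tokens = [0]
while len(tokens) < 20:
    tokens += speculative_step(tuple(tokens), k=4)
print(tokens)
```

The speedup comes from step 2: the expensive model scores several positions per call instead of one, so whenever the draft model's guesses are accepted, multiple tokens are emitted for a single pass of the large model.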