Speculative decoding tradeoffs on GPU
And here I’ve crunched the numbers on GPU. The configuration is a 3-GPU system: 1x RTX 4080 + 2x RTX 3090. The baseline was set with the following llama.cpp generation command (the same generation config and prompts were used in the Speculative…