
Speculative decoding tradeoffs on GPU with large models (Llama 3 70B & 8B)
I’ve rerun the speculative decoding experiment with larger models, pairing a Llama 3 70B primary model with an 8B draft model, to see whether a larger primary model can benefit more from speculative decoding.
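As a refresher on the mechanism being benchmarked, here's a minimal sketch of greedy-verification speculative decoding: a cheap draft model proposes a few tokens, and the primary model verifies them in one pass, accepting the matching prefix and correcting the first mismatch. The toy `draft_model` and `target_model` functions below are hypothetical stand-ins for the 8B and 70B models, assuming greedy decoding (real setups use the probabilistic accept/reject rule over sampled distributions):

```python
# Toy "models": each maps a context (tuple of ints) to the next token.
# Hypothetical stand-ins for the Llama 3 8B draft and 70B target models.
def draft_model(ctx):
    # Cheap draft: predicts last token + 1, wrapping at 10.
    return (ctx[-1] + 1) % 10

def target_model(ctx):
    # "Stronger" target: same rule, except it disagrees after token 5.
    return 7 if ctx[-1] == 5 else (ctx[-1] + 1) % 10

def speculative_decode(ctx, n_tokens, k=4):
    """Generate n_tokens greedily, drafting k tokens per target verification."""
    out = list(ctx)
    while len(out) - len(ctx) < n_tokens:
        # 1. Draft k tokens autoregressively with the cheap model.
        proposal, cur = [], list(out)
        for _ in range(k):
            t = draft_model(tuple(cur))
            proposal.append(t)
            cur.append(t)
        # 2. Verify: the target scores every drafted position
        #    (a single batched forward pass in a real implementation).
        accepted = []
        for i in range(k):
            want = target_model(tuple(out + accepted))
            if proposal[i] == want:
                accepted.append(proposal[i])
            else:
                accepted.append(want)  # correct first mismatch, discard the rest
                break
        out.extend(accepted)
    return out[len(ctx):len(ctx) + n_tokens]

print(speculative_decode((0,), 8))
```

The key property: the output is identical to decoding with the target model alone, and the speedup comes from how many drafted tokens get accepted per verification pass, which is exactly what the 70B/8B pairing stresses.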