GPU GGUF inference comparison
The following is the leaderboard of all GPUs I’ve tested with llama-bench. For the model, I’m using the Q8_0 and Q4_K_M quantized Llama-3.1-8B-Instruct from the following repo: DevQuasar. The reasoning behind the model selection is to use a generic, widely…
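For reference, a typical run looks roughly like the sketch below. The GGUF filenames are placeholders for whichever DevQuasar files you downloaded, and the prompt length, generation length, and layer-offload values are assumptions you would tune per GPU rather than the exact settings used for the table.

```sh
# Benchmark the Q8_0 quantization (offload all layers to the GPU with -ngl 99).
# Model paths are hypothetical; point them at your downloaded GGUF files.
./llama-bench -m models/Llama-3.1-8B-Instruct-Q8_0.gguf -p 512 -n 128 -ngl 99

# Repeat for the Q4_K_M quantization to compare the two on the same GPU.
./llama-bench -m models/Llama-3.1-8B-Instruct-Q4_K_M.gguf -p 512 -n 128 -ngl 99
```

llama-bench reports prompt-processing and token-generation throughput (t/s) for each run, which is what the leaderboard below compares across GPUs.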