
GPU GGUF inference comparison

The following is the leaderboard of all GPUs I’ve tested with llama-bench. For the model, I’m using the Q8_0 and Q4_K_M quantizations of Llama-3.1-8B-Instruct from the following repo: DevQuasar. The reasoning behind the model selection is to use a generic widely…
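
As a rough illustration of the kind of run behind such a leaderboard, here is a minimal Python sketch that invokes llama-bench once per quantization and collects the average tokens per second. The model file names are hypothetical placeholders, the `-ngl 99` offload value is an assumption, and the parser assumes that `-o json` prints a JSON array whose rows carry `n_prompt`, `n_gen` and `avg_ts` fields; check this against your llama.cpp build before relying on it.

```python
import json
import subprocess

# Hypothetical local file names; the actual files in the repo may differ.
MODELS = {
    "Q8_0": "Llama-3.1-8B-Instruct.Q8_0.gguf",
    "Q4_K_M": "Llama-3.1-8B-Instruct.Q4_K_M.gguf",
}


def build_bench_cmd(model_path, n_gpu_layers=99):
    """Build a llama-bench command that offloads layers to the GPU and prints JSON."""
    return ["llama-bench", "-m", model_path, "-ngl", str(n_gpu_layers), "-o", "json"]


def summarize(raw_json):
    """Map each test (e.g. pp512, tg128) to its average tokens per second.

    Assumes every result row has n_prompt, n_gen and avg_ts fields.
    """
    summary = {}
    for row in json.loads(raw_json):
        # Prompt-processing rows have n_prompt > 0; generation rows have n_gen > 0.
        if row["n_prompt"] > 0:
            label = f"pp{row['n_prompt']}"
        else:
            label = f"tg{row['n_gen']}"
        summary[label] = row["avg_ts"]
    return summary


def run_all():
    """Benchmark every quantization; requires llama-bench on PATH and the model files."""
    results = {}
    for quant, path in MODELS.items():
        proc = subprocess.run(
            build_bench_cmd(path), capture_output=True, text=True, check=True
        )
        results[quant] = summarize(proc.stdout)
    return results
```

Calling `run_all()` on a machine with llama-bench installed would return one dictionary per quantization, which can then be sorted into a per-GPU ranking.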

All about AMD and ROCm

Hardware Setup: If you’re like me and use server-grade, passively cooled AMD Instinct AI accelerators, make sure you can provide adequate cooling. For this, I rely on a 120mm PWM blower fan. I’m particularly using the following I’ve also…