GPU GGUF inference comparison

The following is the leaderboard of all GPUs I’ve tested with llama-bench. For the models, I’m using the Q8_0 and Q4_K_M quantizations of Llama-3.1-8B-Instruct from the following repo: DevQuasar. The reasoning behind the model selection is to use a generic, widely…

All about AMD and ROCm

Hardware Setup: If you’re like me and use server-grade, passively cooled AMD Instinct AI accelerators, you should ensure you can provide adequate cooling. For this, I rely on a 120 mm PWM blower fan. I’m particularly using the following. I’ve also…

Reasoning System prompt

NousResearch has recently released its reasoning model, DeepHermes-3-Llama-3-8B-Preview, with an interesting twist: you can toggle between standard LLM behavior and enhanced reasoning mode simply by using a specific system prompt. I was curious to see what would happen if I…
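The toggle described above can be sketched in code: reasoning mode is activated purely by prepending a particular system prompt to the chat, with no model or API change. A minimal sketch follows; the `REASONING_PROMPT` text here is a paraphrased placeholder, not the exact prompt the model was trained on — check the official NousResearch model card for the precise wording.

```python
# Placeholder paraphrase of the reasoning system prompt; the real prompt
# is published on the DeepHermes-3 model card and should be used verbatim.
REASONING_PROMPT = (
    "You are a deep thinking AI. Deliberate systematically inside "
    "<think></think> tags before giving your final answer."
)

def build_messages(user_msg: str, reasoning: bool = False) -> list[dict]:
    """Build a chat message list, prepending the reasoning system prompt
    only when reasoning mode is requested."""
    messages = []
    if reasoning:
        messages.append({"role": "system", "content": REASONING_PROMPT})
    messages.append({"role": "user", "content": user_msg})
    return messages

# Standard mode: no system prompt, the model answers like a normal chat LLM.
standard = build_messages("What is 17 * 24?")
# Reasoning mode: the system prompt triggers <think>...</think> deliberation.
deliberate = build_messages("What is 17 * 24?", reasoning=True)
```

The message list would then be passed through the model's chat template as usual; the only difference between the two modes is that single system message.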