GPU GGUF inference comparison

This is the leaderboard of all GPUs I’ve tested with llama-bench. For the model, I’m using the Q8_0 and Q4_K_M quantizations of Llama-3.1-8B-Instruct from the following repo: DevQuasar. The reasoning behind the model selection is to use a generic widely…

All about AMD and ROCm

Hardware Setup: If you’re like me and using server-grade, passively cooled AMD Instinct AI accelerators, make sure you can provide adequate cooling. For this, I rely on a 120mm PWM blower fan. In particular, I’m using the following. I’ve also…

Reasoning System prompt

NousResearch has recently released its reasoning model, DeepHermes-3-Llama-3-8B-Preview, with an interesting twist: you can toggle between standard LLM behavior and enhanced reasoning mode simply by using a specific system prompt. I was curious to see what would happen if I…

Google MINI-002X Search Appliance

Story: The Google Search Appliance (GSA) emerged as part of Google’s early exploration into enterprise solutions. Launched in 2002, the GSA allowed organizations to deploy Google’s web search power internally. Originally positioned as a hardware solution to simplify corporate data…

GSTS system prompt

This is just a quick post of a system prompt I created for task solving. GSTS stands for Goal, Strategy, Tactic, Steps. Key characteristics: Analytical and Articulate: Emphasizes the assistant’s ability to thoughtfully express and reframe ideas, which would enhance…