I’ve experimented with the Orca2 model in the last few weeks, and the generation quality for the parameter size has truly impressed me. I believe that the Orca2 7B and 13B quantized models offer the optimal combination of quality and performance when run locally.
On an M2 Pro with 32 GB of memory, the 13B 5-bit quantized model generates ~15 tokens/sec with Apple Metal acceleration enabled (in LM Studio).
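As a rough sketch of how one could reproduce a throughput number like this outside LM Studio, here is a small benchmark using the llama-cpp-python bindings. The model filename, prompt, and sampling settings are assumptions for illustration, not details from my setup:

```python
import time


def tokens_per_second(n_tokens: int, elapsed: float) -> float:
    """Throughput in tokens generated per second of wall-clock time."""
    return n_tokens / elapsed


def benchmark(model_path: str, prompt: str, max_tokens: int = 256) -> float:
    """Generate up to max_tokens with a local GGUF model and report tokens/sec.

    Requires `pip install llama-cpp-python` built with Metal support
    (CMAKE_ARGS="-DGGML_METAL=on") to use the GPU on Apple silicon.
    """
    from llama_cpp import Llama  # imported lazily so the helper above works without it

    llm = Llama(model_path=model_path, n_gpu_layers=-1)  # -1 offloads all layers to Metal
    start = time.perf_counter()
    out = llm(prompt, max_tokens=max_tokens)
    elapsed = time.perf_counter() - start
    return tokens_per_second(out["usage"]["completion_tokens"], elapsed)


if __name__ == "__main__":
    # Hypothetical local path to an Orca2 13B 5-bit GGUF quantization.
    print(benchmark("./orca-2-13b.Q5_K_M.gguf", "Explain quantization briefly."))
```

Note that tokens/sec depends heavily on prompt length, context size, and quantization level, so numbers from a script like this are only comparable across runs with identical settings.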