AMD Radeon Instinct MI50 – cheap inference

Just tried a less popular AI accelerator option, the AMD Radeon Instinct MI50, and it has performed surprisingly well in inference. But before looking at the results, what is this card?

The card was launched by AMD in November 2018. It is built around the Vega 20 chip, manufactured on TSMC's 7nm process. The chip has 3840 stream processors (the equivalent of Nvidia's CUDA cores). The card also has 16GB of HBM2 memory with 1TB/s of bandwidth on a 4096-bit bus. Altogether, this provides respectable performance for its price.
> Peak Half Precision (FP16) Performance: 26.5 TFLOPs
> Peak Single Precision (FP32) Performance: 13.3 TFLOPs
> Peak Double Precision (FP64) Performance: 6.6 TFLOPs
> Peak INT8 Performance: 53 TOPs

The form factor is a passively cooled, dual-slot PCIe card (PCIe 4.0 x16) that is linkable via 2x Infinity Fabric Links, offering a card-to-card communication interface of up to 92GB/s.

Okay, but why did I get one of these? Could it be the price: $109 from eBay? :D

The first task was to design a cooling shroud so I could use my default 120mm blower fan. I think the design came out pretty well.

And this is the final print:

The 3D model is available for download from my repository:
https://github.com/csabakecskemeti/3d_models_public/blob/main/MI50_shroud_blower_v2_a.stl

This time, I used a small and very underpowered Celeron-based SBC.

As for the OS, I first tried Debian 12, but the ROCm install instructions failed. I decided not to fight that issue and installed Ubuntu 24.x instead, since that is what the instructions mention. The AMD ROCm install instructions worked flawlessly; you can find them here:

https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/post-install.html
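
For reference, on Ubuntu the whole thing boils down to a handful of commands. This is only a sketch of the amdgpu-install route; the exact installer package version and repo URL come from the quick-start page linked above, and VERSION below is a placeholder:

sudo apt update
sudo apt install ./amdgpu-install_VERSION_all.deb   # installer .deb downloaded from repo.radeon.com
sudo amdgpu-install --usecase=rocm                   # pulls in the ROCm stack
sudo usermod -a -G render,video $USER                # post-install: group membership for GPU access
sudo reboot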

The rocm-smi command listed the GPU after a restart.

For inference I used llama.cpp and simply followed the build documentation.
First I identified the GPU target with the rocminfo | grep gfx | head -1 | awk '{print $2}' command, then just used the parameterized build command.

HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx906 -DCMAKE_BUILD_TYPE=Release && cmake --build build --config Release -- -j 16

The ease of installation made me happy, but I got even happier when I performance tested text generation with llama-bench using the Meta-Llama 8B Q8 quantized model.

56 tok/s from a GPU that cost $109… I think that's pretty good!
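
For reference, the benchmark invocation was along these lines; the model filename is just a placeholder for whatever Q8_0 GGUF is on disk, not my exact file:

./build/bin/llama-bench -m models/Meta-Llama-3-8B-Q8_0.gguf -ngl 99   # -ngl 99 offloads all layers to the MI50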

Conclusion

I think the MI50 is a very capable card for playing around with local inference. With 8-12B parameter quantized models, it is perfect for someone who wants to test and host models at home, or develop AI applications, since this cheap hardware lets them do it all locally.
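
As a rough sketch of what hosting locally can look like with the same llama.cpp build (again, the model path is a placeholder):

./build/bin/llama-server -m models/Meta-Llama-3-8B-Q8_0.gguf -ngl 99 --host 0.0.0.0 --port 8080   # OpenAI-compatible HTTP endpoint on port 8080

That is already enough to point a local chat UI or an application's OpenAI client at the card.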

I wouldn't invest in a whole cluster of them, though, as AMD states that MI50 ROCm support is discontinued, and future ROCm releases might remove it from the list of supported devices.
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html