Valve Developer Contributes Major Improvement To RADV Vulkan For Llama.cpp AI

Rhys Perry of Valve's Linux graphics team, who specializes in the RADV Radeon Vulkan driver and ACO compiler, has contributed a significant improvement to further enhance the Vulkan-backend performance of Llama.cpp AI inferencing on AMD Radeon hardware.
Opened last week was the merge request "radv: use CU mode when LDS is used". While that title alone isn't enough to excite end-user interest, the merge request message was simply:
"This improves performance of llama.cpp."
Okay... But there was no additional context on the performance improvement for Llama.cpp.
Fortunately, thanks to Adriano Martins, there is some additional insight, and it ends up making this merge extremely interesting. Adriano commented:
"this makes radv now fly past amdvlk and rocm with llama.cpp, albeit for prompt processing only"
Not only is RADV now faster at Llama.cpp prompt processing than the official (former) AMDVLK Vulkan driver, but it is also faster than ROCm.
Pretty great results: going from around 3586 tokens/s with Llama 7B Q4 to around 4046 tokens/s with these patches for pp512 (prompt processing with a 512-token prompt). That is around a 13% improvement from these three patches, at least for Llama 7B.
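For anyone wanting to check the arithmetic behind that figure, here is a minimal sketch that computes the relative gain from the two throughput numbers quoted above. The function name `percent_improvement` is just an illustrative helper, not something from the merge request or from Llama.cpp.

```python
def percent_improvement(before: float, after: float) -> float:
    """Return the relative gain of `after` over `before`, in percent."""
    if before <= 0:
        raise ValueError("baseline throughput must be positive")
    return (after - before) / before * 100.0


# Approximate pp512 throughput figures quoted in the article (tokens/s).
baseline_tps = 3586.0  # RADV before the patches
patched_tps = 4046.0   # RADV with the patches

gain = percent_improvement(baseline_tps, patched_tps)
print(f"Prompt processing gain: {gain:.1f}%")  # about 12.8%, i.e. roughly 13%
```

Since both inputs are themselves rounded ("around"), the result is best read as "roughly 13%" rather than a precise measurement.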
Llama.cpp with Vulkan was already performing well on Radeon GPUs and should now be even better.
This merge happened just in time for making it into next month's Mesa 25.3 stable release.
Valve's open-source developers/contractors do amazing work for the Linux software ecosystem beyond just gaming, as we've shown many times over the past few years.
New Llama.cpp benchmarks on Phoronix soon.