Latest Open-Source AMD Improvements Allowing For Better Llama.cpp AI Performance Against Windows 11

Written by Michael Larabel in Software on 17 September 2025 at 10:48 AM EDT. Page 4 of 5.

Next up is a look at the CPU-based AI inference performance between Windows 11 25H2 and Ubuntu Linux on the AMD Ryzen 9 9950X3D.
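
For readers wanting a rough idea of what these CPU-backend runs look like outside the Phoronix Test Suite harness used for the article, below is a minimal sketch using the llama-cpp-python bindings (an assumption on my part; the published numbers come from the standard llama.cpp benchmark via PTS) that times 128 generated tokens with GPU offload disabled, roughly mirroring the "Text Generation 128" test. The model path, prompt, and thread count are illustrative only.

import time

from llama_cpp import Llama

MODEL_PATH = "Qwen3-8B-Q8_0.gguf"  # hypothetical local path to the Q8_0 GGUF model
N_PREDICT = 128                     # matches the Text Generation 128 test length

# n_gpu_layers=0 keeps inference entirely on the CPU backend; the thread count
# is a placeholder that would normally be tuned to the 16-core / 32-thread
# Ryzen 9 9950X3D under test.
llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=0,
    n_threads=32,
    n_ctx=2048,
    verbose=False,
)

start = time.perf_counter()
result = llm("Write a short story about an open-source GPU driver.",
             max_tokens=N_PREDICT)
elapsed = time.perf_counter() - start

generated = result["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.2f} tokens/s")

The prompt processing tests (512 and 1024 tokens) stress batched prompt evaluation rather than token-by-token generation, which is why they scale differently across operating systems than the text generation results.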

Llama.cpp benchmark with settings of Backend: CPU, Model: Qwen3-8B-Q8_0, Test: Text Generation 128. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: Qwen3-8B-Q8_0, Test: Prompt Processing 512. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: Qwen3-8B-Q8_0, Test: Prompt Processing 1024. Linux 6.17 + Mesa 25.3-dev was the fastest.

The CPU benchmarks of Llama.cpp between Windows and Linux for the AMD Ryzen 9 9950X3D didn't end up being nearly as interesting as the Radeon RDNA4 GPU benchmarks.

Llama.cpp benchmark with settings of Backend: CPU, Model: gpt-oss-20b-Q8_0, Test: Text Generation 128. Linux 6.17 + Mesa 25.3-dev was the fastest.

For the most part, the CPU benchmarks with Llama.cpp were in line with what we're used to seeing in Windows vs. Linux comparisons: the greater processor performance was found on Linux. In some test cases the results were rather close, and in a few cases it took moving to the Linux 6.17 kernel before Linux edged out Windows 11.

Llama.cpp benchmark with settings of Backend: CPU, Model: gpt-oss-20b-Q8_0, Test: Prompt Processing 512. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: gpt-oss-20b-Q8_0, Test: Prompt Processing 1024. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: Llama-3.1-Tulu-3-8B-Q8_0, Test: Text Generation 128. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: Llama-3.1-Tulu-3-8B-Q8_0, Test: Prompt Processing 512. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: Llama-3.1-Tulu-3-8B-Q8_0, Test: Prompt Processing 1024. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: Mistral-7B-Instruct-v0.3-Q8_0, Test: Text Generation 128. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: Mistral-7B-Instruct-v0.3-Q8_0, Test: Prompt Processing 512. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: Mistral-7B-Instruct-v0.3-Q8_0, Test: Prompt Processing 1024. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: DeepSeek-R1-Distill-Llama-8B-Q8_0, Test: Text Generation 128. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: granite-3.0-3b-a800m-instruct-Q8_0, Test: Text Generation 128. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: DeepSeek-R1-Distill-Llama-8B-Q8_0, Test: Prompt Processing 512. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: DeepSeek-R1-Distill-Llama-8B-Q8_0, Test: Prompt Processing 1024. Linux 6.17 + Mesa 25.3-dev was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: granite-3.0-3b-a800m-instruct-Q8_0, Test: Prompt Processing 512. Windows 11 25H2 was the fastest.
Llama.cpp benchmark with settings of Backend: CPU, Model: granite-3.0-3b-a800m-instruct-Q8_0, Test: Prompt Processing 1024. Windows 11 25H2 was the fastest.

Long story short, the best Llama.cpp CPU-based performance was found with Linux, and moving to Linux 6.17 brought gains in some areas compared to running the Ryzen 9 9950X3D with Linux 6.14.
