As an ML engineer I want to develop and automate the process of building Python packages (e.g., Flash Attention, vLLM) optimized for AMD ROCm and PyTorch. At the moment there are no published wheels for the packages we want to use in our deployments on AMD GPUs. Examples of these packages include:
- quantization: AWQ, GPTQ, bitsandbytes (the only one of the three with an alpha release supporting multiple backends)
- Flash Attention 2
- inference optimization frameworks: vLLM
These packages should be production-ready and distributed as wheels through a public repository for streamlined deployment and usage.
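For reference, a minimal sketch of what a manual build looks like today, assuming the upstream rocm/pytorch image and a flash-attention source checkout (the image tag and paths are illustrative, not pinned choices):

```shell
# Build a wheel inside the upstream ROCm + PyTorch image.
# "rocm/pytorch:latest" is a placeholder tag; we would pin the one
# matching our target ROCm and PyTorch versions.
docker run --rm -it \
  -v "$PWD/flash-attention:/src" \
  rocm/pytorch:latest \
  bash -c '
    cd /src &&
    # Build a binary wheel for this package only (no dependency wheels);
    # the output lands in /src/dist/.
    pip wheel --no-deps --wheel-dir dist .
  '
```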
While getting started on this, the team has been using ml-labs to build these packages and test things out. ml-labs is a great environment for experimenting and figuring out what we need, but it is by no means a production environment.
We have the following requirements for this work:
- an environment where we can use the upstream Docker images that are based on rocm/pytorch
- establish CI/CD processes that build and publish these packages
Both could run on GitLab CI, which also has a package registry where we can publish our wheels.
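A minimal sketch of such a pipeline, assuming the upstream image as the build environment and the project-level PyPI package registry (the image tags, package path, and stage layout are assumptions, not decisions):

```yaml
# .gitlab-ci.yml (sketch)
stages:
  - build
  - publish

build-wheel:
  stage: build
  image: rocm/pytorch:latest   # placeholder tag; pin to our ROCm/PyTorch versions
  script:
    # Package path is a placeholder; one job per package in practice.
    - pip wheel --no-deps --wheel-dir dist ./flash-attention
  artifacts:
    paths:
      - dist/

publish-wheel:
  stage: publish
  image: python:3.11
  script:
    - pip install twine
    # Publish to this project's PyPI package registry, authenticated
    # with the job token GitLab injects into every CI job.
    - python -m twine upload
        --repository-url "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/pypi"
        --username gitlab-ci-token --password "${CI_JOB_TOKEN}" dist/*
```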
One thing to consider: if we use the upstream Docker images to do this work, we need to make sure that the packages we build will work in our environment. The upstream images are based on Ubuntu while ours are based on Debian. While we don't expect any issues, this is something to keep in mind.
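A cheap way to catch such a mismatch early would be a smoke-test step that installs the freshly built wheel into our Debian-based image and imports it (the image name and module name below are placeholders):

```shell
# Install the wheel into our Debian-based runtime image and verify it imports.
# "our-registry/debian-pytorch-rocm:latest" and "flash_attn" are hypothetical names.
docker run --rm -v "$PWD/dist:/dist" our-registry/debian-pytorch-rocm:latest \
  bash -c 'pip install /dist/*.whl &&
           python -c "import flash_attn; print(flash_attn.__version__)"'
```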