
Build and Publish ROCm-Compatible Python Packages
Open, Medium, Public · 8 Estimated Story Points

Description

As an ML engineer, I want to develop and automate the process of building Python packages (e.g., Flash Attention, vLLM) optimized for AMD ROCm and PyTorch. At the moment there are no published wheels for the packages we want to use in our deployments when we run on an AMD GPU.

These packages should be production-ready and distributed as wheels through a public repository for streamlined deployment and usage.

While starting work on this, the team has been using ml-lab to build these packages and test things. ml-lab is a great environment to experiment in and figure out what we need, but it is by no means a production environment.
We have the following requirements for this work:

  • an environment where we can use the upstream docker images that are based on rocm/pytorch
  • establish CI/CD processes that build and publish these packages

These could happen on GitLab CI, which also has a package registry where we can publish our wheels (see the sketch below).
One thing to consider: if we use the upstream docker images to do this work, we need to make sure that the packages we build will work in our environment. The upstream images use Ubuntu while we use Debian. While we don't expect any issues, this is something to keep in mind.
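As a sketch of what the publish step of such a CI job could look like, assuming the wheels have already been built into dist/ and using GitLab's PyPI-compatible package registry endpoint (the CI_* variables are predefined by GitLab CI; everything else here is illustrative):

# Upload built wheels to this project's GitLab package registry (PyPI type).
# gitlab-ci-token / CI_JOB_TOKEN is the standard auth pair inside CI jobs.
$ pip install twine
$ TWINE_USERNAME=gitlab-ci-token \
  TWINE_PASSWORD="${CI_JOB_TOKEN}" \
  python -m twine upload \
    --repository-url "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/pypi" \
    dist/*.whl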

Details

Related Changes in GitLab:
Title: Draft: Add Debian/ROCm/Pytorch dockerfiles | Reference: repos/machine-learning/rocm-wheelhouse!1 | Author: isaranto | Source Branch: add-dockerfiles | Dest Branch: main

Event Timeline

isarantopoulos set the point value for this task to 8.
isarantopoulos moved this task from Ready To Go to Blocked on the Machine-Learning-Team board.

I have created the following repository for this work on GitLab: https://gitlab.wikimedia.org/repos/machine-learning/rocm-wheelhouse
I have identified the following issues, which we should either resolve or decide on before proceeding with this work:

  1. We need to use an image that has ROCm + PyTorch installed. The official images available on Docker Hub use Ubuntu, and none of them uses Python 3.11, which is what we use in Debian stable (bookworm). We should either build a Debian image with the ROCm drivers installed for this purpose, or reassess our production images and use Ubuntu images for LLM workloads, which would make things easier. There are no official ROCm packages for bookworm, and trixie is likely going to support pytorch-rocm natively.
  2. Access to third-party images from GitLab runners: our GitLab runners don't have access to all third-party images available on Docker Hub. More information can be found in the permission matrix on the Wikitech page for GitLab runners. We could use the shared runners, where we'd need to add the image we want to the allowed images (profile::gitlab::runner::allowed_images in operations/puppet/hieradata/cloud.yaml).

Since these Python packages aren't going to be rebuilt frequently, we should first make sure we have reproducible builds from a local environment and then reconsider whether we need a CI/CD process at all.
The local env would consist of a docker image based on Debian stable with the ROCm drivers installed plus PyTorch for ROCm; see the sketch below.
Then inside this image we can build flash attention and vllm (and any other Python package we need that doesn't have a published wheel).
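As a rough sketch (not the actual MR), such an image could look like the following. The base image, the ROCm apt suite (AMD only publishes Ubuntu suites, which is exactly the mixing concern raised above), and all version pins are assumptions for illustration:

# Hypothetical Dockerfile for a Debian-based ROCm + PyTorch build image,
# written via a heredoc so it can be tweaked and built in one go.
$ cat > Dockerfile <<'EOF'
FROM docker-registry.wikimedia.org/bookworm:latest

RUN apt-get update && apt-get install -y \
    build-essential cmake git gnupg ninja-build python3.11 python3.11-venv wget

# ROCm dev toolchain from AMD's upstream apt repo (illustrative: jammy suite).
RUN wget -qO- https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor \
      > /etc/apt/trusted.gpg.d/rocm.gpg \
 && echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/6.1 jammy main' \
      > /etc/apt/sources.list.d/rocm.list \
 && apt-get update && apt-get install -y rocm-dev

# PyTorch ROCm wheels from the upstream index, in a venv we can build against.
RUN python3.11 -m venv /opt/venv \
 && /opt/venv/bin/pip install torch --index-url https://download.pytorch.org/whl/rocm6.1
EOF
$ docker build -t rocm-wheelhouse-build .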

There is now an official rocm/vllm image available for the MI300 (blog | dockerhub images).
My understanding is that this image is built from the Dockerfile in the vllm repo, although this is not explicitly stated anywhere. However, the two tags available for MI300 are ~22GB each. When I build Dockerfile.rocm for the MI210, the resulting image is ~36GB:

DOCKER_BUILDKIT=1 docker build --build-arg FX_GFX_ARCHS="gfx90a" -f Dockerfile.rocm -t vllm-rocm .

In order to run benchmarks (T382343#10437184) for awq, gptq, and fa2 in the ml-lab environment, I built wheels compatible with Debian 12 (bookworm), AMD ROCm 6.1, and torch 2.4.1+rocm6.1. Below is the process I used to build these wheels.
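(Before each build it is worth double-checking that the host actually matches those targets; a quick sanity check along these lines, where /srv/pytorch-rocm is the shared torch venv used throughout the transcripts below:)

# Sanity-check the build host against the wheel targets.
$ cat /etc/debian_version    # expect a 12.x (bookworm) release
$ ls -d /opt/rocm-*          # expect /opt/rocm-6.1.0 (or similar)
$ PYTHONPATH=/srv/pytorch-rocm/venv/lib/python3.11/site-packages \
    python3 -c 'import torch; print(torch.__version__, torch.version.hip)'
# expect torch 2.4.1+rocm6.1 with a 6.1.x HIP version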

1. AutoGPTQ:

# Create conda env or activate it if it already exists
$ conda create -n autogptq-env python=3.11 -y
$ conda activate autogptq-env
$ export PYTHONPATH="$(conda info --base)/envs/autogptq-env/lib/python3.11/site-packages:/srv/pytorch-rocm/venv/lib/python3.11/site-packages"

# Install build requirements
$ pip install numpy gekko pandas packaging ninja wheel setuptools

# Download autogptq repo
$ git clone https://github.com/PanQiWei/AutoGPTQ.git
$ cd AutoGPTQ/

# Build autogptq wheel
$ GPU_ARCHS=gfx90a PYTORCH_ROCM_ARCH=gfx90a ROCM_VERSION=6.1 python3 setup.py bdist_wheel

# On build completion, .whl is in dist/ dir
$ ls -al dist/
-rw-r--r-- 1 kevinbazira wikidev 711313 Dec 30 06:00 auto_gptq-0.8.0.dev0+rocm6.1-cp311-cp311-linux_x86_64.whl

# Deactivate the conda env
$ conda deactivate

# Test autogptq installation in another env - (in this case it's not a conda env)
$ python3 -m venv autogptq_from_bdist_wheel_venv
$ source autogptq_from_bdist_wheel_venv/bin/activate
$ export "PYTHONPATH=/srv/pytorch-rocm/venv/lib/python3.11/site-packages/:$PYTHONPATH"
$ pip install wheel setuptools
$ pip install auto_gptq-0.8.0.dev0+rocm6.1-cp311-cp311-linux_x86_64.whl

$ pip list
Package             Version
------------------- ------------------
accelerate          1.2.1
aiohappyeyeballs    2.4.4
aiohttp             3.11.11
aiosignal           1.3.2
attrs               24.3.0
auto_gptq           0.8.0.dev0+rocm6.1
certifi             2024.12.14
charset-normalizer  3.4.1
datasets            3.2.0
dill                0.3.8
filelock            3.13.1
frozenlist          1.5.0
fsspec              2024.2.0
gekko               1.2.1
huggingface-hub     0.27.0
idna                3.10
Jinja2              3.1.3
MarkupSafe          2.1.5
mpmath              1.3.0
multidict           6.1.0
multiprocess        0.70.16
networkx            3.2.1
numpy               1.26.3
packaging           24.2
pandas              2.2.3
peft                0.14.0
pillow              10.2.0
pip                 23.0.1
propcache           0.2.1
psutil              6.1.1
pyarrow             18.1.0
python-dateutil     2.9.0.post0
pytorch-triton-rocm 3.0.0
pytz                2024.2
PyYAML              6.0.2
regex               2024.11.6
requests            2.32.3
rouge               1.0.1
safetensors         0.4.5
sentencepiece       0.2.0
setuptools          66.1.1
six                 1.17.0
sympy               1.12
threadpoolctl       3.5.0
tokenizers          0.21.0
torch               2.4.1+rocm6.1
torchaudio          2.4.1+rocm6.1
torchvision         0.19.1+rocm6.1
tqdm                4.67.1
transformers        4.47.1
typing_extensions   4.9.0
tzdata              2024.2
urllib3             2.3.0
wheel               0.44.0
xxhash              3.5.0
yarl                1.18.3

$ python3
Python 3.11.2 (main, Sep 14 2024, 03:00:30) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import auto_gptq
/srv/home/kevinbazira/test_autogptq_wheel/autogptq_from_bdist_wheel_venvx/lib/python3.11/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:410: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd
/srv/home/kevinbazira/test_autogptq_wheel/autogptq_from_bdist_wheel_venvx/lib/python3.11/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:418: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
/srv/home/kevinbazira/test_autogptq_wheel/autogptq_from_bdist_wheel_venvx/lib/python3.11/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:461: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd(cast_inputs=torch.float16)
>>>
>>>
>>> auto_gptq
<module 'auto_gptq' from '/srv/home/kevinbazira/test_autogptq_wheel/autogptq_from_bdist_wheel_venvx/lib/python3.11/site-packages/auto_gptq/__init__.py'>
>>>

2. AutoAWQ Kernels and AutoAWQ:

# Create conda env or activate it if it already exists
$ conda create -n autoawq-env python=3.11 -y
$ conda activate autoawq-env
$ export PYTHONPATH="$(conda info --base)/envs/autoawq-env/lib/python3.11/site-packages:/srv/pytorch-rocm/venv/lib/python3.11/site-packages"

# Install build requirements
$ pip install numpy packaging ninja wheel setuptools

#####################################################################
#                 BUILD AutoAWQ KERNELS WHEEL FIRST                 #
#####################################################################

# Download autoawq kernels repo
$ git clone https://github.com/casper-hansen/AutoAWQ_kernels.git
$ cd AutoAWQ_kernels/

# Build autoawq kernels wheel
$ GPU_ARCHS=gfx90a PYTORCH_ROCM_ARCH=gfx90a ROCM_VERSION=6.1 python3 setup.py bdist_wheel

# On build completion, the autoawq kernels .whl is in dist/ dir
$ ls -al dist/
-rw-r--r-- 1 kevinbazira wikidev 359228 Dec 31 06:06 autoawq_kernels-0.0.9+rocm61-cp311-cp311-linux_x86_64.whl

#####################################################################
#  BUILD AutoAWQ WHEEL AFTER BUILDING KERNELS WHEEL AS SHOWN ABOVE  #
#####################################################################

# Install autoawq kernels using the wheel
$ pip install autoawq_kernels-0.0.9+rocm61-cp311-cp311-linux_x86_64.whl

# Download autoawq repo
$ git clone https://github.com/casper-hansen/AutoAWQ.git
$ cd AutoAWQ/

# Build autoawq wheel
$ GPU_ARCHS=gfx90a PYTORCH_ROCM_ARCH=gfx90a ROCM_VERSION=6.1 python3 setup.py bdist_wheel

# On build completion, the autoawq .whl is in dist/ dir
$ ls -al dist/
-rw-r--r-- 1 kevinbazira wikidev 107464 Dec 31 06:24 autoawq-0.2.7.post3-py3-none-any.whl

#####################################################################
#            TEST BOTH AutoAWQ KERNELS & AutoAWQ WHEELS             #
#####################################################################

# Deactivate the conda env
$ conda deactivate

# Test autoawq installation in another env - (in this case it's not a conda env)
$ python3 -m venv autoawq_from_bdist_wheel_venv
$ source autoawq_from_bdist_wheel_venv/bin/activate
$ export "PYTHONPATH=/srv/pytorch-rocm/venv/lib/python3.11/site-packages/:$PYTHONPATH"
$ pip install wheel setuptools
$ pip install autoawq_kernels-0.0.9+rocm61-cp311-cp311-linux_x86_64.whl
$ pip install autoawq-0.2.7.post3-py3-none-any.whl

$ pip list
Package             Version
------------------- --------------
accelerate          1.2.1
aiohappyeyeballs    2.4.4
aiohttp             3.11.11
aiosignal           1.3.2
attrs               24.3.0
autoawq             0.2.7.post3
autoawq_kernels     0.0.9+rocm61
certifi             2024.12.14
charset-normalizer  3.4.1
datasets            3.2.0
dill                0.3.8
filelock            3.13.1
frozenlist          1.5.0
fsspec              2024.2.0
huggingface-hub     0.27.0
idna                3.10
Jinja2              3.1.3
MarkupSafe          2.1.5
mpmath              1.3.0
multidict           6.1.0
multiprocess        0.70.16
networkx            3.2.1
numpy               1.26.3
packaging           24.2
pandas              2.2.3
pillow              10.2.0
pip                 23.0.1
propcache           0.2.1
psutil              6.1.1
pyarrow             18.1.0
python-dateutil     2.9.0.post0
pytorch-triton-rocm 3.0.0
pytz                2024.2
PyYAML              6.0.2
regex               2024.11.6
requests            2.32.3
safetensors         0.4.5
setuptools          66.1.1
six                 1.17.0
sympy               1.12
tokenizers          0.21.0
torch               2.4.1+rocm6.1
torchaudio          2.4.1+rocm6.1
torchvision         0.19.1+rocm6.1
tqdm                4.67.1
transformers        4.47.1
triton              3.1.0
typing_extensions   4.9.0
tzdata              2024.2
urllib3             2.3.0
wheel               0.45.1
xxhash              3.5.0
yarl                1.18.3
zstandard           0.23.0

$ python3
Python 3.11.2 (main, Sep 14 2024, 03:00:30) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import awq
>>>
>>>
>>> awq
<module 'awq' from '/srv/home/kevinbazira/autoawq_from_bdist_wheel_venv/lib/python3.11/site-packages/awq/__init__.py'>
>>>

3. Flash Attention 2:

# Create conda env or activate it if it already exists
$ conda create -n flash-env python=3.11 -y
$ conda activate flash-env
$ pip install -U ninja packaging wheel
$ export PYTHONPATH="$(conda info --base)/envs/flash-env/lib/python3.11/site-packages:/srv/pytorch-rocm/venv/lib/python3.11/site-packages"

# Install build requirements
$ pip install wheel setuptools

# Download fa2 repo
$ git clone https://github.com/ROCm/flash-attention.git
$ cd flash-attention/

# Build fa2 wheel
$ GPU_ARCHS=gfx90a PYTORCH_ROCM_ARCH=gfx90a python3 setup.py bdist_wheel

# On build completion, .whl is in dist/ dir
$ ls -al dist/
-rw-r--r-- 1 kevinbazira wikidev 42340756 Dec 10 12:23 flash_attn-2.7.0.post2-cp311-cp311-linux_x86_64.whl

# Deactivate the conda env
$ conda deactivate

# Test fa2 installation in another env - (in this case it's not a conda env)
$ python3 -m venv fa2_from_bdist_wheel_venv
$ source fa2_from_bdist_wheel_venv/bin/activate
$ export "PYTHONPATH=/srv/pytorch-rocm/venv/lib/python3.11/site-packages/:$PYTHONPATH"
$ pip install wheel setuptools
$ pip install flash_attn-2.7.0.post2-cp311-cp311-linux_x86_64.whl

$ pip list
Package             Version
------------------- --------------
einops              0.8.0
filelock            3.13.1
flash_attn          2.7.0.post2
fsspec              2024.2.0
Jinja2              3.1.3
MarkupSafe          2.1.5
mpmath              1.3.0
networkx            3.2.1
numpy               1.26.3
pillow              10.2.0
pip                 23.0.1
pytorch-triton-rocm 3.0.0
setuptools          66.1.1
sympy               1.12
torch               2.4.1+rocm6.1
torchaudio          2.4.1+rocm6.1
torchvision         0.19.1+rocm6.1
typing_extensions   4.9.0
wheel               0.45.1

$ python3
Python 3.11.2 (main, Sep 14 2024, 03:00:30) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import flash_attn
>>>

This process failed for both bitsandbytes (P71788) and vllm (P71890) wheels. The successfully built wheels can be found here: https://gitlab.wikimedia.org/repos/machine-learning/huggingface-optimum-benchmark-automation/-/tree/main/wheels

> This process failed for bitsandbytes (P71788) wheels.

The problem here is that hip header files get picked up from /usr/include/hip instead of /opt/rocm/include:

...
In file included from /usr/include/hip/hip_fp16.h:29:                           
/usr/include/hip/amd_detail/amd_hip_fp16.h:1520:21: error: use of undeclared identifier '__llvm_amdgcn_rcp_f16'
 1520 |                     __llvm_amdgcn_rcp_f16(static_cast<__half_raw>(x).data)};
      | 
...

These files are compatible with ROCm 5.2.x rather than 6.1.x (and its bundled LLVM). The way I was able to build it was by running:

export CPLUS_INCLUDE_PATH=/opt/rocm-6.1.0/include

before running cmake ... and subsequent commands, so that this path gets searched before /usr/include for header files.
I haven't tried building vllm but looking at your logs, it looks like this is also at least part of the problem there:

/usr/include/hip/amd_detail/amd_hip_fp16.h:1520:21: error: use of undeclared identifier '__llvm_amdgcn_rcp_f16'
 1520 |                     __llvm_amdgcn_rcp_f16(static_cast<__half_raw>(x).data)};
      |                     ^

and so the above workaround should in theory also help to unblock us there.
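For anyone hitting this later, a quick way to confirm what is going on and that the override takes effect (a sketch; dpkg -S simply maps a file back to the Debian package that ships it, and g++ -E -v prints the include search order):

# Which Debian package ships the stale hip headers?
$ dpkg -S /usr/include/hip/amd_detail/amd_hip_fp16.h

# CPLUS_INCLUDE_PATH entries are searched before the default /usr/include,
# so after the export the ROCm 6.1 headers should appear first in the list.
$ export CPLUS_INCLUDE_PATH=/opt/rocm-6.1.0/include
$ echo | g++ -x c++ -E -v - 2>&1 | sed -n '/search starts here/,/End of search list/p'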

> This process failed for bitsandbytes (P71788) wheels.
>
> The problem here is that hip header files get picked up from /usr/include/hip instead of /opt/rocm/include

Thank you so much for this tip, Muniza. After setting the C++ headers location, the bitsandbytes wheel built successfully:

# Create conda env or activate it if it already exists
$ conda create -n bitsandbytes-env python=3.11 -y
$ conda activate bitsandbytes-env
$ export PYTHONPATH="$(conda info --base)/envs/bitsandbytes-env/lib/python3.11/site-packages:/srv/pytorch-rocm/venv/lib/python3.11/site-packages"

# Clone the github repo
$ git clone --recurse https://github.com/ROCm/bitsandbytes.git
$ cd bitsandbytes
$ git checkout rocm_enabled_multi_backend

# Install dependencies
$ pip install -r requirements-dev.txt

# Set C++ headers location. Thanks to Muniza in T381859#10448020
$ export CPLUS_INCLUDE_PATH=/opt/rocm-6.1.0/include

# Use -DBNB_ROCM_ARCH to specify target GPU arch
$ cmake -DBNB_ROCM_ARCH="gfx90a" -DCOMPUTE_BACKEND=hip -S .

# Compile the project
$ make

# Build bitsandbytes wheel
$ GPU_ARCHS=gfx90a PYTORCH_ROCM_ARCH=gfx90a ROCM_VERSION=6.1 python3 setup.py bdist_wheel

# On build completion, .whl is in dist/ dir
$ ls -al dist/
-rw-r--r-- 1 kevinbazira wikidev 453559 Jan 13 07:23 bitsandbytes-0.43.3.dev0-cp311-cp311-linux_x86_64.whl

# Deactivate the conda env
$ conda deactivate

# Test bitsandbytes installation in another env - (in this case it's not a conda env)
$ python3 -m venv bitsandbytes_from_bdist_wheel_venv
$ source bitsandbytes_from_bdist_wheel_venv/bin/activate
$ export "PYTHONPATH=/srv/pytorch-rocm/venv/lib/python3.11/site-packages/:$PYTHONPATH"
$ pip install wheel setuptools
$ pip install bitsandbytes-0.43.3.dev0-cp311-cp311-linux_x86_64.whl

$ pip list
Package             Version
------------------- --------------
bitsandbytes        0.43.3.dev0
contourpy           1.3.1
cycler              0.12.1
einops              0.8.0
filelock            3.13.1
fonttools           4.55.3
fsspec              2024.2.0
iniconfig           2.0.0
Jinja2              3.1.3
kiwisolver          1.4.8
lion-pytorch        0.2.3
MarkupSafe          2.1.5
matplotlib          3.9.4
mpmath              1.3.0
networkx            3.2.1
numpy               1.26.3
packaging           24.2
pandas              2.2.3
pillow              10.2.0
pip                 23.0.1
pluggy              1.5.0
pyparsing           3.2.1
pytest              8.3.4
python-dateutil     2.9.0.post0
pytorch-triton-rocm 3.0.0
pytz                2024.2
scipy               1.14.1
setuptools          66.1.1
six                 1.17.0
sympy               1.12
torch               2.4.1+rocm6.1
torchaudio          2.4.1+rocm6.1
torchvision         0.19.1+rocm6.1
typing_extensions   4.9.0
tzdata              2024.2
wheel               0.43.0

$ python3
Python 3.11.2 (main, Sep 14 2024, 03:00:30) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bitsandbytes
g++ (Debian 12.2.0-14) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

>>>

It can be found here: https://gitlab.wikimedia.org/repos/machine-learning/huggingface-optimum-benchmark-automation/-/blob/main/wheels/bitsandbytes-0.43.3.dev0-cp311-cp311-linux_x86_64.whl

> This process failed for vllm (P71890) wheels.
>
> The problem here is that hip header files get picked up from /usr/include/hip instead of /opt/rocm/include
> I haven't tried building vllm but looking at your logs, it looks like this is also at least part of the problem there

Indeed, the build process picking up C++ headers from the wrong location was part of the problem. After setting the C++ headers location to point to ROCm 6.1.x, the vllm wheel built successfully:

# Create conda env or activate it if it already exists
$ conda create -n vllm-env python=3.11 -y
$ conda activate vllm-env
$ export PYTHONPATH="$(conda info --base)/envs/vllm-env/lib/python3.11/site-packages:/srv/pytorch-rocm/venv/lib/python3.11/site-packages"

# Install build requirements
$ pip install packaging ninja cmake wheel setuptools setuptools_scm

# Download vllm repo
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm/

# use old commit that supported rocm6.1:
# https://github.com/vllm-project/vllm/blob/ee5f34b1c2c71b2d56054a5ca23fe1c50c1458bb/Dockerfile.rocm#L2
$ git checkout ee5f34b1c2c71b2d56054a5ca23fe1c50c1458bb

# loosen requirements to support typing_extensions v4.9.0 installed on ml-lab
$ echo "psutil
sentencepiece # Required for LLaMA tokenizer.
numpy < 2.0.0
requests
tqdm
py-cpuinfo
transformers >= 4.43.2 # Required for Chameleon and Llama 3.1 hotfix.
tokenizers >= 0.19.1 # Required for Llama 3.
protobuf # Required by LlamaTokenizer.
# fastapi < 0.113.0; python_version < '3.9'
# fastapi >= 0.114.1; python_version >= '3.9'
fastapi
aiohttp
# openai >= 1.40.0 # Ensure modern openai package (ensure types module present)
openai
uvicorn[standard]
# pydantic >= 2.9 # Required for fastapi >= 0.113.0
# pydantic
pillow # Required for image processing
prometheus_client >= 0.18.0
prometheus-fastapi-instrumentator >= 7.0.0
tiktoken >= 0.6.0 # Required for DBRX tokenizer
lm-format-enforcer == 0.10.6
outlines >= 0.0.43, < 0.1
# typing_extensions >= 4.10
typing_extensions <= 4.9.0
filelock >= 3.10.4 # filelock starts to support 'mode' argument from 3.10.4
partial-json-parser # used for parsing partial JSON outputs
pyzmq
msgspec
gguf == 0.10.0
importlib_metadata
# mistral_common >= 1.4.3
pyyaml
six>=1.16.0; python_version > '3.11' # transitive dependency of pandas that needs to be the latest version for python 3.12
setuptools>=74.1.1; python_version > '3.11' # Setuptools is used by triton, we need to ensure a modern version is installed for 3.12+ so that it does not try to import distutils, which was removed in 3.12
einops # Required for Qwen2-VL." > requirements-common.txt
$ pip install -r requirements-rocm.txt

# Build vllm wheel after setting C++ headers location; thanks to Muniza in T381859#10448020.
$ CPLUS_INCLUDE_PATH=/opt/rocm-6.1.0/include GPU_ARCHS=gfx90a PYTORCH_ROCM_ARCH=gfx90a ROCM_VERSION=6.1 python3 setup.py bdist_wheel

# On build completion, .whl is in dist/ dir
$ ls -al dist/
-rw-r--r-- 1 kevinbazira wikidev 19314245 Jan 14 06:51 vllm-0.6.1.post3.dev96+gee5f34b1.d20250114.rocm614-cp311-cp311-linux_x86_64.whl

# Deactivate the conda env
$ conda deactivate

# Test vllm installation in another env - (in this case it's not a conda env)
$ python3 -m venv vllm_from_bdist_wheel_venv
$ source vllm_from_bdist_wheel_venv/bin/activate
$ export "PYTHONPATH=/srv/pytorch-rocm/venv/lib/python3.11/site-packages/:$PYTHONPATH"
$ pip install wheel setuptools
$ pip install vllm-0.6.1.post3.dev96+gee5f34b1.d20250114.rocm614-cp311-cp311-linux_x86_64.whl

$ pip list
Package                           Version
--------------------------------- ---------------------------------------------
accelerate                        1.2.1
aiohappyeyeballs                  2.4.4
aiohttp                           3.11.11
aiosignal                         1.3.2
annotated-types                   0.7.0
anyio                             4.8.0
async-timeout                     5.0.1
attrs                             24.3.0
awscli                            1.36.39
boto3                             1.35.98
botocore                          1.35.98
certifi                           2024.12.14
charset-normalizer                3.4.1
click                             8.1.8
cloudpickle                       3.1.0
cmake                             3.31.4
colorama                          0.4.6
datasets                          3.2.0
dill                              0.3.8
diskcache                         5.6.3
distro                            1.9.0
docutils                          0.16
einops                            0.8.0
fastapi                           0.115.6
filelock                          3.13.1
frozenlist                        1.5.0
fsspec                            2024.2.0
gguf                              0.10.0
h11                               0.14.0
hiredis                           3.1.0
httpcore                          1.0.7
httptools                         0.6.4
httpx                             0.28.1
huggingface-hub                   0.27.1
idna                              3.10
importlib_metadata                8.5.0
iniconfig                         2.0.0
interegular                       0.3.3
Jinja2                            3.1.3
jmespath                          1.0.1
jsonschema                        4.23.0
jsonschema-specifications         2024.10.1
lark                              1.2.2
libnacl                           2.1.0
llvmlite                          0.43.0
lm-format-enforcer                0.10.6
MarkupSafe                        2.1.5
mpmath                            1.3.0
msgpack                           1.1.0
msgspec                           0.19.0
multidict                         6.1.0
multiprocess                      0.70.16
nest-asyncio                      1.6.0
networkx                          3.2.1
ninja                             1.11.1.3
numba                             0.60.0
numpy                             1.26.3
openai                            1.39.0
outlines                          0.0.46
packaging                         24.2
pandas                            2.2.3
partial-json-parser               0.2.1.1.post5
peft                              0.14.0
pillow                            10.2.0
pip                               23.0.1
pluggy                            1.5.0
prometheus_client                 0.21.1
prometheus-fastapi-instrumentator 7.0.0
propcache                         0.2.1
protobuf                          5.29.3
psutil                            6.1.1
py-cpuinfo                        9.0.0
pyairports                        2.1.1
pyarrow                           18.1.0
pyasn1                            0.6.1
pycountry                         24.6.1
pydantic                          2.9.2
pydantic_core                     2.23.4
pytest                            8.3.4
pytest-asyncio                    0.25.2
python-dateutil                   2.9.0.post0
python-dotenv                     1.0.1
pytorch-triton-rocm               3.0.0
pytz                              2024.2
PyYAML                            6.0.2
pyzmq                             26.2.0
ray                               2.40.0
redis                             5.2.1
referencing                       0.35.1
regex                             2024.11.6
requests                          2.32.3
rpds-py                           0.22.3
rsa                               4.7.2
s3transfer                        0.10.4
safetensors                       0.5.2
sentencepiece                     0.2.0
setuptools                        66.1.1
setuptools-scm                    8.1.0
six                               1.17.0
sniffio                           1.3.1
starlette                         0.41.3
sympy                             1.12
tensorizer                        2.9.1
tiktoken                          0.8.0
tokenizers                        0.21.0
torch                             2.4.1+rocm6.1
torchaudio                        2.4.1+rocm6.1
torchvision                       0.19.1+rocm6.1
tqdm                              4.67.1
transformers                      4.48.0
typing_extensions                 4.9.0
tzdata                            2024.2
urllib3                           2.3.0
uvicorn                           0.34.0
uvloop                            0.21.0
vllm                              0.6.1.post3.dev96+gee5f34b1.d20250114.rocm614
watchfiles                        1.0.4
websockets                        14.1
wheel                             0.44.0
xxhash                            3.5.0
yarl                              1.18.3
zipp                              3.21.0

$ python3
Python 3.11.2 (main, Sep 14 2024, 03:00:30) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import vllm
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/srv/home/kevinbazira/build_vllm_wheel/test/vllm_from_bdist_wheel_venv/lib/python3.11/site-packages/vllm/__init__.py", line 3, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
  File "/srv/home/kevinbazira/build_vllm_wheel/test/vllm_from_bdist_wheel_venv/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 11, in <module>
    from vllm.config import (CacheConfig, ConfigFormat, DecodingConfig,
  File "/srv/home/kevinbazira/build_vllm_wheel/test/vllm_from_bdist_wheel_venv/lib/python3.11/site-packages/vllm/config.py", line 12, in <module>
    from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS
  File "/srv/home/kevinbazira/build_vllm_wheel/test/vllm_from_bdist_wheel_venv/lib/python3.11/site-packages/vllm/model_executor/__init__.py", line 1, in <module>
    from vllm.model_executor.parameter import (BasevLLMParameter,
  File "/srv/home/kevinbazira/build_vllm_wheel/test/vllm_from_bdist_wheel_venv/lib/python3.11/site-packages/vllm/model_executor/parameter.py", line 7, in <module>
    from vllm.distributed import get_tensor_model_parallel_rank
  File "/srv/home/kevinbazira/build_vllm_wheel/test/vllm_from_bdist_wheel_venv/lib/python3.11/site-packages/vllm/distributed/__init__.py", line 1, in <module>
    from .communication_op import *
  File "/srv/home/kevinbazira/build_vllm_wheel/test/vllm_from_bdist_wheel_venv/lib/python3.11/site-packages/vllm/distributed/communication_op.py", line 6, in <module>
    from .parallel_state import get_tp_group
  File "/srv/home/kevinbazira/build_vllm_wheel/test/vllm_from_bdist_wheel_venv/lib/python3.11/site-packages/vllm/distributed/parallel_state.py", line 39, in <module>
    from vllm.utils import supports_custom_op
  File "/srv/home/kevinbazira/build_vllm_wheel/test/vllm_from_bdist_wheel_venv/lib/python3.11/site-packages/vllm/utils.py", line 33, in <module>
    from typing_extensions import ParamSpec, TypeIs, assert_never
ImportError: cannot import name 'TypeIs' from 'typing_extensions' (/srv/pytorch-rocm/venv/lib/python3.11/site-packages/typing_extensions.py)
>>>
>>>
>>>

This wheel is not yet working as expected because of the typing_extensions dependency (P72019#288792), but once this is fixed all should be good. 🤞
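For context, TypeIs only landed in typing_extensions 4.10, and since PYTHONPATH entries are searched before a venv's own site-packages, the copy under /srv/pytorch-rocm shadows whatever the test venv installs. A sketch of checking and, assuming we can write to that shared venv, fixing it:

# Confirm which typing_extensions is actually imported and whether it has TypeIs.
$ python3 -c 'import typing_extensions as te; print(te.__file__)'
$ python3 -c 'from typing_extensions import TypeIs' || echo "typing_extensions too old"

# Because PYTHONPATH wins over the venv's site-packages, the upgrade has to
# happen in the shared venv that shadows it.
$ /srv/pytorch-rocm/venv/bin/pip install --upgrade 'typing_extensions>=4.10'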

In an effort to build the packages locally but with Docker, I've made the following attempt:

✅ use the official ROCm image based on Ubuntu 22.04 (rocm/dev-ubuntu-22.04:6.1-complete)
✅ install Python 3.11
✅ create a virtual environment and install any required dependencies + torch-rocm
❌ build flash attention

I am getting the following error:

python -m build --no-isolation --wheel .
* Getting build dependencies for wheel...

ERROR Backend subprocess exited when trying to invoke get_requires_for_build_wheel

and if I pass --skip-dependency-check it again fails at the build_wheel step.
There is evidently something wrong with the Docker image, which can be found in the open MR.
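One guess worth checking: with --no-isolation, python -m build expects the build requirements to already be importable in the current environment, and flash-attention's setup.py imports torch at module level, so get_requires_for_build_wheel will fail if the image's venv isn't active or is missing build deps. A sketch of narrowing it down (the venv path is illustrative):

# With --no-isolation, build deps must already be importable.
$ source /opt/venv/bin/activate      # illustrative venv path inside the image
$ pip install packaging ninja wheel setuptools

# flash-attention's setup.py does `import torch` at module level, so this
# must succeed in the same environment before the build can:
$ python -c 'import torch; print(torch.__version__, torch.version.hip)'

# Then retry and read the backend subprocess output above the ERROR line.
$ python -m build --no-isolation --wheel .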

isarantopoulos lowered the priority of this task from High to Medium. Jan 28 2025, 2:38 PM