
Commit 4c23a0d

Optional backend kwargs (#307)
Parent: a2700a8


47 files changed: +350, -410 lines
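The pattern the commit introduces is easiest to see in the example configs below: values that used to be fixed or implicit are now passed through optional kwarg blocks (`reshape_kwargs` on the OpenVINO backend, `call_kwargs` on the scenario). A minimal sketch of the resulting shape, assembled from the hunks below; the two blocks come from different example files (BERT and Stable Diffusion) and are combined here only for illustration:

backend:
  device: cpu
  reshape: true
  reshape_kwargs:    # optional backend kwargs, new in this commit
    batch_size: 1
    sequence_length: 128

scenario:
  call_kwargs:       # optional kwargs forwarded to the model/pipeline call
    num_inference_steps: 4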

.github/workflows/test_api_rocm.yaml
Lines changed: 1 addition & 0 deletions

@@ -33,6 +33,7 @@ jobs:
     with:
       machine_type: single-gpu
       install_extras: testing,timm,diffusers,codecarbon
+      test_file: test_api.py
       pytest_keywords: api and cuda
     secrets:
       HF_TOKEN: ${{ secrets.HF_TOKEN }}
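The new `test_file` input lets the shared workflow run a single test file instead of the whole suite. A hedged sketch of how the reusable workflow on the receiving side might declare and consume it; everything here except the input name is an assumption, not part of this commit:

# illustrative reusable workflow, not in this commit's diff
on:
  workflow_call:
    inputs:
      test_file:
        type: string
        required: false
      pytest_keywords:
        type: string
        required: true

jobs:
  run_tests:
    runs-on: [self-hosted, single-gpu]
    steps:
      - run: pytest ${{ inputs.test_file }} -x -s -k "${{ inputs.pytest_keywords }}"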

.github/workflows/test_cli_cuda_tensorrt_llm.yaml
Lines changed: 3 additions & 3 deletions

@@ -44,7 +44,7 @@ jobs:

       - name: Install dependencies
         run: |
-          pip install -e .[testing]
+          pip install -e .[testing,tensorrt-llm]

       - name: Run tests
         run: |
@@ -57,7 +57,7 @@ jobs:
           }}
         name: Run examples
         run: |
-          huggingface-cli delete-cache
+          rm -rf /root/.cache/huggingface
           pytest tests/test_examples.py -x -s -k "cli and cuda and trt"

   cli_cuda_tensorrt_llm_multi_gpu_tests:
@@ -84,7 +84,7 @@ jobs:

       - name: Install dependencies
         run: |
-          pip install -e .[testing]
+          pip install -e .[testing,tensorrt-llm]

       - name: Run tests (sequential)
         run: |

.github/workflows/test_cli_rocm_pytorch.yaml
Lines changed: 2 additions & 0 deletions

@@ -35,6 +35,7 @@ jobs:
     with:
       machine_type: single-gpu
       install_extras: testing,diffusers,timm,peft,autoawq,auto-gptq
+      test_file: test_cli.py
       pytest_keywords: cli and cuda and pytorch and not (dp or ddp or device_map or deepspeed) and not bnb

   run_cli_rocm_pytorch_multi_gpu_tests:
@@ -52,4 +53,5 @@ jobs:
     with:
       machine_type: multi-gpu
       install_extras: testing,diffusers,timm,peft
+      test_file: test_cli.py
       pytest_keywords: cli and cuda and pytorch and (dp or ddp or device_map)

examples/cpu_ipex_bert.yaml
Lines changed: 2 additions & 2 deletions

@@ -17,8 +17,8 @@ launcher:
 backend:
   device: cpu
   export: true
-  no_weights: false # because on multi-node machines, initializing weights could harm performance
-  torch_dtype: float32 # but use bfloat16 on compatible Intel CPUs
+  no_weights: false # on multi-node machines, initializing weights in the benchmark could harm performance
+  torch_dtype: float32 # use bfloat16 on compatible Intel CPUs
   model: google-bert/bert-base-uncased

 scenario:

examples/cpu_ipex_llama.yaml
Lines changed: 2 additions & 2 deletions

@@ -17,8 +17,8 @@ launcher:
 backend:
   device: cpu
   export: true
-  no_weights: false # because on multi-node machines, initializing weights could harm performance
-  torch_dtype: float32 # but use bfloat16 on compatible Intel CPUs
+  no_weights: false # on multi-node machines, initializing weights in the benchmark could harm performance
+  torch_dtype: float32 # use bfloat16 on compatible Intel CPUs
   model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

 scenario:
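Both IPEX examples pin `torch_dtype: float32` so they run on any CPU; the inline comment points to bfloat16 where there is native support. A minimal variant of the backend block under that assumption (only the dtype differs from the example above):

backend:
  device: cpu
  export: true
  no_weights: false
  torch_dtype: bfloat16 # assumes native bf16 support, e.g. AMX on 4th-gen Xeon
  model: TinyLlama/TinyLlama-1.1B-Chat-v1.0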

examples/cpu_onnxruntime_timm.yaml
Lines changed: 0 additions & 20 deletions (file deleted)

examples/cpu_openvino_8bit_bert.yaml
Lines changed: 4 additions & 1 deletion

@@ -12,8 +12,11 @@ backend:
   device: cpu
   reshape: true
   no_weights: true
-  load_in_8bit: false # enable 8bit on compatible Intel CPU machines
+  load_in_8bit: true
   model: google-bert/bert-base-uncased
+  reshape_kwargs:
+    batch_size: 1
+    sequence_length: 128

 scenario:
   memory: true
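With `reshape: true` the OpenVINO model is reshaped to static input shapes, and `reshape_kwargs` makes those shapes an explicit, optional backend setting; for the run to be valid they should agree with what the scenario feeds the model. A sketch of a matching scenario block, following the `input_shapes` convention the other examples use (the pairing rule is my reading, not stated in the commit):

scenario:
  memory: true
  input_shapes:
    batch_size: 1        # should match reshape_kwargs.batch_size
    sequence_length: 128 # should match reshape_kwargs.sequence_length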

examples/cpu_openvino_diffusion.yaml
Lines changed: 5 additions & 0 deletions

@@ -11,9 +11,14 @@ name: openvino_diffusion
 backend:
   device: cpu
   export: true
+  task: text-to-image
   model: stabilityai/stable-diffusion-2-1
   half: false # enable half-precision on compatible Intel CPU machines

 scenario:
   input_shapes:
     batch_size: 1
+    sequence_length: 16
+
+  call_kwargs:
+    num_inference_steps: 4
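`call_kwargs` entries are forwarded to the pipeline call itself, so `num_inference_steps: 4` shortens the denoising loop and keeps this CPU benchmark quick. Other call-time kwargs should slot in the same way; a hedged sketch (the commented kwarg is hypothetical, not from the commit):

scenario:
  input_shapes:
    batch_size: 1
    sequence_length: 16

  call_kwargs:
    num_inference_steps: 4
    # hypothetical: any other kwarg the pipeline call accepts, e.g. for
    # Stable Diffusion, guidance_scale: 7.5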

examples/cuda_tgi_llama.yaml
Lines changed: 1 addition & 0 deletions

@@ -16,6 +16,7 @@ backend:
   device: cuda
   device_ids: 0
   cuda_graphs: 0 # remove for better perf but bigger memory footprint
+  no_weights: false # investigate later
   model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

 scenario:

examples/cuda_trt_llama.yaml
Lines changed: 1 addition & 0 deletions

@@ -15,6 +15,7 @@ launcher:
 backend:
   device: cuda
   device_ids: 0
+  no_weights: true
   max_batch_size: 4
   max_new_tokens: 32
   max_prompt_length: 64
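The last two examples both touch `no_weights`. Set to `true`, the backend is expected to build the model from its config with randomly initialized weights, skipping the checkpoint download while keeping shapes and compute identical, which is usually fine for latency and memory runs; the TGI example keeps `false` for now ("investigate later"). As a config gloss (comments are my reading, not from the commit):

backend:
  # no_weights: true  -> instantiate from config with random weights; no download,
  #                      faster setup, same shapes and compute
  # no_weights: false -> download and load the pretrained checkpoint as usual
  no_weights: true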
