[Feature] support ep in mixed mode #3001

Merged
merged 7 commits into from
Jul 30, 2025
Conversation

ltd0924
Collaborator

@ltd0924 ltd0924 commented Jul 24, 2025

Support expert parallelism in mixed mode.

example:
```
python -m fastdeploy.entrypoints.openai.api_server \
  --model ERNIE-4.5-300B-A47B-BF16 \
  --port 8180 --metrics-port 8181 \
  --engine-worker-queue-port 8182 \
  --cache-queue-port 8183 \
  --quantization wint4 \
  --data-parallel-size 8 --tensor-parallel-size 1 \
  --enable-expert-parallel \
  --scheduler-name "splitwise" \
  --scheduler-host "127.0.0.1" \
  --scheduler-port 6379 \
  --scheduler-ttl 9000
```

Note:
When deploying, you need to install and configure Redis, which serves as the scheduler backend: `--scheduler-host` is the address of the Redis instance, and `--scheduler-port` is its port number.

You can refer to the Redis documentation for installation instructions.
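Since the server expects a reachable Redis instance at `--scheduler-host`/`--scheduler-port`, it can help to check connectivity before launching. Below is a minimal sketch (not part of FastDeploy) that probes the scheduler endpoint with a plain TCP connection; the host and port match the example command above.

```python
import socket


def redis_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the Redis scheduler endpoint succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Values from the example command above.
    if not redis_reachable("127.0.0.1", 6379):
        print("Redis scheduler not reachable; start Redis before launching the server.")
```

A TCP probe only confirms something is listening; a full check would issue a Redis `PING` (e.g. `redis-cli -h 127.0.0.1 -p 6379 ping`).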

paddle-bot bot commented Jul 24, 2025

Thanks for your contribution!

@ltd0924 ltd0924 changed the title [test] support ep in mixed mode [Feature] support ep in mixed mode Jul 29, 2025
@ltd0924 ltd0924 merged commit d17886d into PaddlePaddle:develop Jul 30, 2025
12 of 18 checks passed