Currently there exist two server implementations:
- llama_cpp/server/__main__.py, the module that consumers of the library can run with python3 -m llama_cpp.server
- examples/high_level_api/fastapi_server.py, which is probably a copy-pasted example by folks hacking around
IMO this is confusing. As a new user of the library, I see that both have been updated relatively recently, but comparing them side by side shows they have diverged.
The one in the module seems better:
- supports logits_all
- supports use_mmap
- has experimental cache support (guarded by some mutex logic)
- its streaming support has been reworked more recently than fastapi_server.py's
So IMO the example server should go away (perhaps just import the module's server and run it after #1 is done)
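For reference, here is a minimal sketch of what the example file could be reduced to. It assumes nothing about llama_cpp.server beyond it staying runnable via python3 -m llama_cpp.server; runpy is simply the standard-library way to invoke that from Python code:

```python
# Hypothetical replacement for examples/high_level_api/fastapi_server.py:
# delegate to the maintained server module instead of keeping a diverging copy.
# Assumes only that `llama_cpp.server` remains runnable as `python3 -m llama_cpp.server`.
import runpy

if __name__ == "__main__":
    # Runs llama_cpp/server/__main__.py as if invoked with `python3 -m llama_cpp.server`;
    # any CLI arguments in sys.argv are passed through untouched.
    runpy.run_module("llama_cpp.server", run_name="__main__")
```

That way there is only one server implementation to maintain, and the example can't drift out of date again.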