Currently there exist two server implementations:
- llama_cpp/server/__main__.py, the module that consumers of the library can run with python3 -m llama_cpp.server
- examples/high_level_api/fastapi_server.py, which is probably a copy-pasted example by folks hacking around
IMO this is confusing. As a new user of the library, I see that both have been updated relatively recently, but comparing them side by side shows they have diverged.
The one in the module seems better:
- supports logits_all
- supports use_mmap
- has experimental cache support (guarded by some mutex logic)
- its streaming support has been reworked more recently than fastapi_server.py's
So IMO the example server should go away (perhaps just import the module's server and run it after #1 is done)
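For reference, here is a minimal sketch of what the example file could be reduced to. It assumes nothing about llama_cpp.server beyond it staying runnable via python3 -m llama_cpp.server; runpy is simply the standard-library way to invoke that from Python code:

```python
# Hypothetical replacement for examples/high_level_api/fastapi_server.py:
# delegate to the maintained server module instead of keeping a diverging copy.
# Assumes only that `llama_cpp.server` remains runnable as `python3 -m llama_cpp.server`.
import runpy

if __name__ == "__main__":
    # Runs llama_cpp/server/__main__.py as if invoked with `python3 -m llama_cpp.server`;
    # any CLI arguments in sys.argv are passed through untouched.
    runpy.run_module("llama_cpp.server", run_name="__main__")
```

That way there is only one server implementation to maintain, and the example can't drift out of date again.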