merge API server example / module #2

@Stonelinks

Description

Currently there exist two server implementations:

  • llama_cpp/server/__main__.py, the module that consumers of the library can run with python3 -m llama_cpp.server
  • examples/high_level_api/fastapi_server.py, which looks like a copy-pasted example from folks hacking around

IMO this is confusing. As a new user of the library, I can see that both were updated relatively recently, but comparing them side by side shows they have diverged.

The one in the module seems better:

  • supports logits_all
  • supports use_mmap
  • has experimental cache support (with a mutex involved)
  • some of its streaming support code was moved around more recently than the corresponding code in fastapi_server.py

So IMO the example server should go away (or perhaps it could just import the module's server and run it once #1 is done).
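
As a rough sketch of how thin the example could become: since the module is already runnable with python3 -m llama_cpp.server, the standard library's runpy.run_module can launch it the same way from a script. The function name run_module_server is hypothetical, and this deliberately does not assume any importable API that #1 might end up exposing.

```python
import runpy


def run_module_server(module_name: str = "llama_cpp.server") -> dict:
    """Run a package's __main__ the same way `python3 -m <module_name>` would.

    Hypothetical helper for illustration only. It assumes the package named
    by `module_name` is installed and ships a __main__ entry point.
    """
    # run_name="__main__" makes any `if __name__ == "__main__":` guard fire;
    # alter_sys=True updates sys.argv[0] and sys.modules["__main__"] to match.
    return runpy.run_module(module_name, run_name="__main__", alter_sys=True)
```

Calling run_module_server() at the bottom of examples/high_level_api/fastapi_server.py would leave a single server implementation to maintain, with the example acting only as a pointer to it.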
