
ghostwheel examples

Ghostwheel is an internal inference server for limited use (if you were linked to this repository, it's probably for you—otherwise, it's probably not). This repository's main purpose is to host a Jupyter notebook with various examples for using ghostwheel, which can hopefully serve as a quickstart for your project. Note that you'll need to run the notebook on the internal network (Imperial-WPA) for ghostwheel to be discoverable.

About ghostwheel

Ghostwheel is implemented as an API layer on top of Ollama, exposing multiple local LLMs running on dedicated GPUs. The ghostwheel API is intended as a replacement for Ollama within projects or libraries that can use it as a backend (e.g. Langchain), though some implementations may require modification to send our authorization header with outgoing requests. See the API docs linked below for how to authenticate with this header.

You can use ghostwheel in a few ways; several of these methods are demonstrated in the demo notebook.

Note that ghostwheel does not work as a drop-in backend host for the Ollama CLI (i.e. with ollama run and the OLLAMA_HOST environment variable). If you want to use ghostwheel from the command line, you can interact with the API directly via cURL or similar.

Usage

The ghostwheel API

Ghostwheel can be accessed from the internal network (Imperial-WPA) at https://ese-timewarp.ese.ic.ac.uk. The root path serves the API docs (also hosted at /docs), which are available to anyone on the internal network; the endpoints under /api require authentication with your API key.
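
For a sense of what an authenticated request looks like, here is a minimal sketch in Python using requests. The bearer-style Authorization header and the GHOSTWHEEL_API_KEY environment variable are assumptions for illustration; the API docs describe the actual header ghostwheel expects.

```python
import os
import requests

BASE_URL = "https://ese-timewarp.ese.ic.ac.uk"

# Assumption: a bearer-style Authorization header and an environment variable
# holding your key. Check the API docs at the root path (or /docs) for the
# exact header name and format ghostwheel expects.
headers = {"Authorization": f"Bearer {os.environ['GHOSTWHEEL_API_KEY']}"}

# api/tags mirrors Ollama's model-listing endpoint and is a cheap way to
# confirm that your key and network access (Imperial-WPA) are working.
resp = requests.get(f"{BASE_URL}/api/tags", headers=headers, timeout=30)
resp.raise_for_status()
print(resp.json())
```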

As an Ollama replacement

For the most part, ghostwheel can work as a replacement for nearly any Ollama backend. Again, some implementations may require modification to include our authorization header in outgoing requests. Also, any code that expects to administer LLMs on the Ollama server (via api/pull, api/delete, etc.) will likely throw an error when attempting to use those endpoints, as they are not exposed through ghostwheel. An example of this is my fork of WebUI linked above; its admin panel receives a 405 response when attempting to delete or modify LLMs on the server.
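
As a sketch of what such a modification can look like, the snippet below points the official ollama Python client at ghostwheel and passes an extra header. Whether your client version forwards custom headers, and the exact header and model name, are assumptions to verify against the API docs.

```python
from ollama import Client

# Sketch only: use ghostwheel in place of a local Ollama instance. Recent
# ollama-python releases forward extra keyword arguments (including headers)
# to the underlying HTTP client; the bearer scheme and the model identifier
# below are assumptions -- check the ghostwheel API docs for the real values.
client = Client(
    host="https://ese-timewarp.ese.ic.ac.uk",
    headers={"Authorization": "Bearer <your-api-key>"},
)

response = client.chat(
    model="llama3",  # substitute an identifier listed in the API docs
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response["message"]["content"])
```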

More specifically, ghostwheel only implements Ollama's primary completion endpoints (api/generate and api/chat) as well as api/tags. This is because the remaining endpoints are only used for administration, management of loaded models, etc., which the end user (that's you!) won't need access to. Additionally, there is a minor difference in the API spec: the keep_alive parameter is omitted from both completion endpoints, for the same reasons listed above. You don't need it.
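
For reference, a direct call to the generate endpoint looks just like it would against a stock Ollama server, minus keep_alive and plus the authorization header (the header scheme, environment variable, and model name below are assumptions):

```python
import os
import requests

BASE_URL = "https://ese-timewarp.ese.ic.ac.uk"
headers = {"Authorization": f"Bearer {os.environ['GHOSTWHEEL_API_KEY']}"}  # assumed scheme

# Same payload as Ollama's api/generate, just without keep_alive.
payload = {
    "model": "llama3",  # substitute a model listed in the API docs
    "prompt": "Explain what an inference server is in one sentence.",
    "stream": False,
}

resp = requests.post(f"{BASE_URL}/api/generate", headers=headers, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])
```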

Available LLMs

The ghostwheel API docs contain an up-to-date list of valid identifiers for LLMs you can call through ghostwheel. (Again, note that this domain is only accessible from the internal network.) The docs are regenerated with any change to the backend application, so the list stays current as new models are deployed. In addition to the Ollama endpoints mentioned above, we also provide an api/list_models endpoint, should you want to programmatically determine which LLMs are available to you.
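
A sketch of that call, again assuming a bearer-style header; since the response shape isn't documented here, the snippet just prints whatever JSON comes back:

```python
import os
import requests

headers = {"Authorization": f"Bearer {os.environ['GHOSTWHEEL_API_KEY']}"}  # assumed scheme

# api/list_models is ghostwheel-specific (not part of the Ollama API) and
# lists the identifiers you can pass to api/chat and api/generate.
resp = requests.get(
    "https://ese-timewarp.ese.ic.ac.uk/api/list_models",
    headers=headers,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # exact response shape: see the API docs
```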

For more detail on the parameters and response formats of the completion endpoints, check out the Ollama API docs on GitHub.
