Python docs review (2025-03-04)
ncoghlan authored Mar 4, 2025
2 parents 6defd51 + 8191073 commit 06eee57
Showing 13 changed files with 61 additions and 75 deletions.
1_python/1_getting-started/project-setup.md (2 additions, 2 deletions)

@@ -5,13 +5,13 @@ description: "Set up your `lmstudio-python` app or script."
index: 2
---

`lmstudio` is a library published on Python that allows you to use `lmstudio-python` in your own projects.
`lmstudio` is a library published on PyPI that allows you to use `lmstudio-python` in your own projects.
It is open source and developed on GitHub.
You can find the source code [here](https://github.com/lmstudio-ai/lmstudio-python).

## Installing `lmstudio-python`

As it is published to Python, `lmstudio-python` may be installed using `pip`
As it is published to PyPI, `lmstudio-python` may be installed using `pip`
or your preferred project dependency manager (`pdm` is shown, but other
Python project management tools offer similar dependency addition commands).

1_python/1_getting-started/repl.md (2 additions, 2 deletions)

@@ -6,8 +6,8 @@ index: 2
---

To enable interactive use, `lmstudio-python` offers a convenience API which manages
its resources via `atexit` hooks, allowing the a default synchronous client session
to be used across multiple interactive comments.
its resources via `atexit` hooks, allowing a default synchronous client session
to be used across multiple interactive commands.

This convenience API is shown in the examples throughout the documentation as the
`Python (convenience API)` tab (alongside the `Python (scoped resource API)` examples,
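As a quick illustration of the convenience API this hunk describes, here is a minimal interactive-use sketch. The `import lmstudio as lms` and `lms.llm()` spellings are assumptions not shown in this hunk; only the atexit-managed default client behaviour comes from the text above.

```python
# Hypothetical REPL session relying on the convenience API. The default
# synchronous client session is created on first use and cleaned up by
# atexit hooks, so no explicit teardown is needed between commands.
import lmstudio as lms

model = lms.llm()  # assumed entry point for the default loaded model
print(model.respond("Hello, world!"))
```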
1_python/1_llm-prediction/chat-completion.md (8 additions, 8 deletions)

@@ -132,23 +132,23 @@ You can ask the LLM to predict the next response in the chat context using the `

```lms_code_snippet
variants:
Streaming:
"Non-streaming":
language: python
code: |
# The `chat` object is created in the previous step.
prediction_stream = model.respond_stream(chat)
result = model.respond(chat)
for fragment in prediction_stream:
print(fragment.content, end="", flush=True)
print() # Advance to a new line at the end of the response
print(result)
"Non-streaming":
Streaming:
language: python
code: |
# The `chat` object is created in the previous step.
result = model.respond(chat)
prediction_stream = model.respond_stream(chat)
print(result)
for fragment in prediction_stream:
print(fragment.content, end="", flush=True)
print() # Advance to a new line at the end of the response
```
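For reviewers who want to run the reordered variants above end to end, a sketch of a standalone script follows. `lms.llm()`, `lms.Chat(...)`, and `add_user_message(...)` are assumed spellings; the `respond_stream` loop itself is taken from the hunk.

```python
import lmstudio as lms

model = lms.llm()                                # assumed convenience entry point
chat = lms.Chat("You are a concise assistant.")  # assumed Chat constructor
chat.add_user_message("What is an LLM?")         # assumed helper method

# Streaming variant, as shown in the snippet above
prediction_stream = model.respond_stream(chat)
for fragment in prediction_stream:
    print(fragment.content, end="", flush=True)
print()  # Advance to a new line at the end of the response
```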

## Customize Inferencing Parameters
1_python/1_llm-prediction/completion.md (16 additions, 15 deletions)

@@ -39,23 +39,23 @@ Once you have a loaded model, you can generate completions by passing a string t

```lms_code_snippet
variants:
Streaming:
"Non-streaming":
language: python
code: |
# The `chat` object is created in the previous step.
prediction_stream = model.complete_stream("My name is", config={"maxTokens": 100})
result = model.complete("My name is", config={"maxTokens": 100})
for fragment in prediction_stream:
print(fragment.content, end="", flush=True)
print() # Advance to a new line at the end of the response
print(result)
"Non-streaming":
Streaming:
language: python
code: |
# The `chat` object is created in the previous step.
result = model.complete("My name is", config={"maxTokens": 100})
prediction_stream = model.complete_stream("My name is", config={"maxTokens": 100})
print(result)
for fragment in prediction_stream:
print(fragment.content, end="", flush=True)
print() # Advance to a new line at the end of the response
```

## 3. Print Prediction Stats
@@ -64,21 +64,22 @@ You can also print prediction metadata, such as the model used for generation, n

```lms_code_snippet
variants:
Streaming:
"Non-streaming":
language: python
code: |
# After iterating through the prediction fragments,
# the overall prediction result may be obtained from the stream
result = prediction_stream.result()
# `result` is the response from the model.
print("Model used:", result.model_info.display_name)
print("Predicted tokens:", result.stats.predicted_tokens_count)
print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
print("Stop reason:", result.stats.stop_reason)
"Non-streaming":
Streaming:
language: python
code: |
# `result` is the response from the model.
# After iterating through the prediction fragments,
# the overall prediction result may be obtained from the stream
result = prediction_stream.result()
print("Model used:", result.model_info.display_name)
print("Predicted tokens:", result.stats.predicted_tokens_count)
print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
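Since the streaming stats variant is cut off by the hunk boundary, here is a sketch that ties the streaming completion to the stats lookup; only the `lms.llm()` bootstrap is an assumption, the rest mirrors the lines shown above.

```python
import lmstudio as lms

model = lms.llm()  # assumed convenience entry point

prediction_stream = model.complete_stream("My name is", config={"maxTokens": 100})
for fragment in prediction_stream:
    print(fragment.content, end="", flush=True)
print()  # Advance to a new line at the end of the response

# After iterating through the prediction fragments,
# the overall prediction result may be obtained from the stream
result = prediction_stream.result()
print("Model used:", result.model_info.display_name)
print("Predicted tokens:", result.stats.predicted_tokens_count)
print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
print("Stop reason:", result.stats.stop_reason)
```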
1_python/1_llm-prediction/parameters.md (4 additions, 1 deletion)

@@ -33,7 +33,10 @@ Set inference-time parameters such as `temperature`, `maxTokens`, `topP` and mor

<!-- See [`LLMPredictionConfigInput`](./../api-reference/llm-prediction-config-input) for all configurable fields. -->

Another useful inference-time configuration parameter is [`structured`](<(./structured-responses)>), which allows you to rigorously enforce the structure of the output using a JSON or Pydantic schema.
Note that while `structured` can be set to a JSON schema definition as an inference-time configuration parameter,
the preferred approach is to instead set the [dedicated `response_format` parameter](<(./structured-responses)>),
which allows you to more rigorously enforce the structure of the output using a JSON or class based schema
definition.
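
To make the distinction in the new wording concrete, here is a sketch of a call that combines inference-time parameters with the dedicated `response_format` argument; `model` and `schema` are assumed to be defined as in the neighbouring pages.

```python
# Sketch only: inference-time parameters travel in `config`, while structured
# output is requested through the separate, preferred `response_format` parameter.
result = model.respond(
    "Name a classic fantasy novel.",
    config={"temperature": 0.7, "maxTokens": 100},
    response_format=schema,  # preferred over a "structured" entry inside `config`
)
print(result.parsed)
```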

# Load Parameters

1_python/1_llm-prediction/structured-response.md (9 additions, 7 deletions)

@@ -130,32 +130,34 @@ schema = {
book = result.parsed
print(book)
# ^
# ^
# Note that `book` is correctly typed as { title: string, author: string, year: number }
Streaming:
language: python
code: |
prediction_stream = model.respond_stream("Tell me about The Hobbit", response_format=schema)
# Optionally stream the response
# for fragment in prediction:
# print(fragment.content, end="", flush=True)
# print()
# Stream the response
for fragment in prediction:
print(fragment.content, end="", flush=True)
print()
# Note that even for structured responses, the *fragment* contents are still only text
# Get the final structured result
result = prediction_stream.result()
book = result.parsed
print(book)
# ^
# ^
# Note that `book` is correctly typed as { title: string, author: string, year: number }
```
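
Because this hunk starts partway through the `schema = {` definition, a sketch of one possible schema is included here for orientation; the field set matches the `book` comment above, but the exact shape the SDK accepts is not visible in this diff and should be treated as an assumption.

```python
# Illustrative JSON schema for the structured "book" response used above.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["title", "author", "year"],
}

result = model.respond("Tell me about The Hobbit", response_format=schema)
book = result.parsed
print(book)
```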

<!--
TODO: Info about structured generation caveats
<!-- ## Overview
## Overview
Once you have [downloaded and loaded](/docs/basics/index) a large language model,
you can use it to respond to input through the API. This article covers getting JSON structured output, but you can also
1_python/1_llm-prediction/working-with-chats.md (3 additions, 1 deletion)

@@ -24,7 +24,9 @@ variants:

For more complex tasks, it is recommended to use the `Chat` helper class.
It provides various commonly used methods to manage the chat.
Here is an example with the `Chat` class.
Here is an example with the `Chat` class, where the initial system prompt
is supplied when initializing the chat instance, and then the initial user
message is added via the corresponding method call.

```lms_code_snippet
variants:
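The snippet this hunk refers to is cut off at the hunk boundary, so here is a sketch of the pattern the reworded sentence describes; `lms.Chat(...)` and `add_user_message(...)` are assumed spellings.

```python
import lmstudio as lms

# System prompt supplied when initializing the chat instance (assumed constructor)
chat = lms.Chat("You are a resident expert on task planning.")
# Initial user message added via the corresponding method call (assumed name)
chat.add_user_message("How should I break down a large refactoring job?")

model = lms.llm()  # assumed convenience entry point
print(model.respond(chat))
```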
1_python/2_agent/tools.md (5 additions, 6 deletions)

@@ -71,7 +71,11 @@ is typically going to be the most convenient):

This means that your wording will affect the quality of the generation. Make sure to always provide a clear description of the tool so the model knows how to use it.

When a tool call fails, the language model may be able to respond appropriately to the failure.
The SDK does not yet automatically convert raised exceptions to text and report them
to the language model, but it can be beneficial for tool implementations to do so.
In many cases, when notified of an error, a language model is able to adjust its
request to avoid the failure.


## Tools with External Effects (like Computer Use or API Calls)

@@ -103,11 +107,6 @@ can essentially turn your LLMs into autonomous agents that can perform tasks on
```

The SDK does not yet automatically convert raised exceptions to text and report them
to the language model, but it can be beneficial for tool implementations to do so.
In many cases, when notified of an error, a language model is able to adjust its
request to avoid the failure.

### Example code using the `create_file` tool:

```lms_code_snippet
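The relocated paragraph above recommends that tool implementations convert raised exceptions to text themselves; a sketch of that pattern for the `create_file` tool named in this file follows (the function body is illustrative and not part of the diff).

```python
def create_file(name: str, content: str) -> str:
    """Create a file with the given name and content."""
    try:
        # "x" mode raises if the file already exists, which is exactly the
        # kind of failure the model can recover from when reported as text.
        with open(name, "x") as f:
            f.write(content)
    except Exception as exc:
        # Convert the raised exception to text so the model can adjust its request
        return f"Error creating file: {exc!r}"
    return f"File {name} created successfully."
```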
1_python/4_tokenization/index.md (3 additions, 29 deletions)

@@ -8,7 +8,9 @@ Models use a tokenizer to internally convert text into "tokens" they can deal wi

## Tokenize

You can tokenize a string with a loaded LLM or embedding model using the SDK. In the below examples, `llm` can be replaced with an embedding model `emb`.
You can tokenize a string with a loaded LLM or embedding model using the SDK.
In the below examples, the LLM reference can be replaced with an
embedding model reference without requiring any other changes.

```lms_code_snippet
variants:
@@ -74,31 +76,3 @@ You can determine if a given conversation fits into a model's context by doing t
print("Fits in context:", does_chat_fit_in_context(model, chat))
```

<!-- ### Context length comparisons
The below examples check whether a conversation is over a LLM's context length
(replace `llm` with `emb` to check for an embedding model).
```lms_code_snippet
variants:
"Python (convenience API)":
language: python
code: |
import { LMStudioClient, Chat } from "@lmstudio/sdk";
const client = new LMStudioClient()
const llm = client.llm.model()
# To check for a string, simply tokenize
var tokens = llm.tokenize("Hello, world!")
# To check for a Chat, apply the prompt template first
const chat = Chat.createEmpty().withAppended("user", "Hello, world!")
const templatedChat = llm.applyPromptTemplate(chat)
tokens = llm.tokenize(templatedChat)
# If the prompt's length in tokens is less than the context length, you're good!
const contextLength = llm.getContextLength()
const isOkay = (tokens.length < contextLength)
``` -->
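
The removed comparison example above mixed JavaScript and Python, so for reference here is a Python-only sketch of the same context-length check; `apply_prompt_template`, `tokenize`, and `get_context_length` are assumed Python counterparts of the names in the deleted block, and the helper defined earlier in the file may differ from this reimplementation.

```python
import lmstudio as lms

def does_chat_fit_in_context(model, chat) -> bool:
    # Apply the prompt template first so the token count reflects what
    # the model will actually receive (assumed method name).
    templated = model.apply_prompt_template(chat)
    tokens = model.tokenize(templated)
    # If the prompt's length in tokens is less than the context length, it fits.
    return len(tokens) < model.get_context_length()

model = lms.llm()  # assumed convenience entry point
chat = lms.Chat()  # assumed empty-chat constructor
chat.add_user_message("Hello, world!")
print("Fits in context:", does_chat_fit_in_context(model, chat))
```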
1_python/5_manage-models/loading.md (2 additions, 1 deletion)

@@ -23,7 +23,8 @@ AI models are huge. It can take a while to load them into memory. LM Studio's SD

## Get the Current Model with `.model()`

If you already have a model loaded in LM Studio (either via the GUI or `lms load`), you can use it by calling `.model()` without any arguments.
If you already have a model loaded in LM Studio (either via the GUI or `lms load`),
you can use it by calling `.model()` without any arguments.

```lms_code_snippet
variants:
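The snippet in this hunk is cut off, so here is a sketch of the `.model()` call described above, in both API styles used throughout these docs; the import and client spellings are assumptions, only the no-argument `.model()` call comes from the text.

```python
import lmstudio as lms

# Convenience API: reuse whichever model is already loaded (via the GUI or `lms load`)
model = lms.llm()

# Scoped resource API equivalent (assumed spelling)
with lms.Client() as client:
    model = client.llm.model()
    print(model.respond("Say hello!"))
```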
1_python/6_model-info/_get-load-config.md (3 additions, 1 deletion)

@@ -8,7 +8,9 @@ TODO: Python SDK has this interface hidden until we can translate server config
LM Studio allows you to configure certain parameters when loading a model
[through the server UI](/docs/advanced/per-model) or [through the API](/docs/api/sdk/load-model).

You can retrieve the config with which a given model was loaded using the SDK. In the below examples, `llm` can be replaced with an embedding model `emb`.
You can retrieve the config with which a given model was loaded using the SDK.
In the below examples, the LLM reference can be replaced with an
embedding model reference without requiring any other changes.

```lms_protip
Context length is a special case that [has its own method](/docs/api/sdk/get-context-length).
1_python/6_model-info/get-model-info.md (3 additions, 1 deletion)

@@ -7,7 +7,9 @@ You can access general information and metadata about a model itself from a load
instance of that model.

Currently, the SDK exposes the model's default `identifier`
and the `path` used to [load it](/docs/api/sdk/load-model). In the below examples, `llm` can be replaced with an embedding model `emb`.
and the `path` used to [load it](/docs/api/sdk/load-model).
In the below examples, the LLM reference can be replaced with an
embedding model reference without requiring any other changes.

```lms_code_snippet
variants:
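The snippet in this hunk is also cut off; a sketch of reading the two fields this page documents follows. `get_info()` is an assumed accessor name, only the `identifier` and `path` fields are named in the hunk.

```python
# Assumed accessor; only `identifier` and `path` are documented in this hunk.
info = model.get_info()
print("Identifier:", info.identifier)
print("Path:", info.path)
```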
1_python/index.md (1 addition, 1 deletion)

@@ -8,7 +8,7 @@ description: "Getting started with LM Studio's Python SDK"

## Installing the SDK

`lmstudio-python` is available as a pypi package. You can install it using pip.
`lmstudio-python` is available as a PyPI package. You can install it using pip.

```lms_code_snippet
variants: