feat: token prediction (speculative decoding) #405

Merged on Jan 7, 2025 (74 commits).
Changes shown are from 72 of the 74 commits.

Commits
6c4243f
feat(minor): dynamically load `llama.cpp` backends
giladgd Dec 11, 2024
32b7f9e
docs: remove Intel AMX trick, since it's being automatically used in …
giladgd Dec 11, 2024
6504b23
docs: update custom cmake options
giladgd Dec 14, 2024
561f9eb
docs: parse custom cmake options nested under ifs
giladgd Dec 14, 2024
1489749
docs: sitemap fixes
giladgd Dec 14, 2024
dafe3b9
docs: user input safety
giladgd Dec 14, 2024
78410ef
feat(minor): more token values support in `SpecialToken`
giladgd Dec 14, 2024
1477f34
docs: improve type docs and types
giladgd Dec 14, 2024
acac07b
feat(minor): improve memory usage estimation
giladgd Dec 14, 2024
404c711
docs: context shift strategy
giladgd Dec 15, 2024
554554b
docs: CUDA in Docker troubleshooting
giladgd Dec 15, 2024
5cc01f3
docs: typo
giladgd Dec 15, 2024
dff0f5e
feat: token prediction, `controlledEvaluate`
giladgd Dec 25, 2024
2c1af7a
fix: typos
giladgd Dec 25, 2024
46c2251
fix: improve types
giladgd Dec 25, 2024
31f16bd
fix: bug
giladgd Dec 25, 2024
ad95c77
fix: eslint config
giladgd Dec 25, 2024
74daf27
fix: detect running under Rosetta on Apple Silicone and show an error…
giladgd Dec 25, 2024
e703a47
feat: build on arm64 using LLVM, use Visual Studio's CMake when avail…
giladgd Dec 29, 2024
602cb3c
fix: embedding context deadlock
giladgd Dec 29, 2024
275c005
feat: try compiling with LLVM on Windows x64 when available
giladgd Dec 29, 2024
729de00
fix: switch from `"nextTick"` to `"nextCycle"` for the default batch …
giladgd Dec 29, 2024
93346f9
feat(minor): improve memory usage estimation
giladgd Dec 30, 2024
a5365d2
feat(minor): improve memory usage estimation
giladgd Dec 31, 2024
6c4e11d
docs: improve building from source
giladgd Dec 31, 2024
07bbc4e
fix: check for Rosetta usage on macOS x64 when using the `inspect gpu…
giladgd Dec 31, 2024
6a13bbf
feat: `experimentalChunkDocument`
giladgd Jan 1, 2025
d4adeb4
Merge remote-tracking branch 'refs/remotes/origin/master' into gilad/…
giladgd Jan 1, 2025
8c2a54b
fix: switch back to the latest `llama.cpp` release in the CI
giladgd Jan 1, 2025
084dfd4
fix: missing includes
giladgd Jan 1, 2025
a5b8ad4
fix: Windows cmake build
giladgd Jan 1, 2025
8ba5938
fix: Windows cmake build
giladgd Jan 1, 2025
9d859eb
fix: Windows cmake build
giladgd Jan 1, 2025
89d6442
fix: Windows cmake build
giladgd Jan 1, 2025
3e16195
fix: Windows cmake build
giladgd Jan 1, 2025
e050cdf
fix: Windows cmake build
giladgd Jan 1, 2025
c24c6ff
fix: Windows build
giladgd Jan 1, 2025
1714685
fix: Windows build
giladgd Jan 1, 2025
6357bdc
fix: Windows build
giladgd Jan 1, 2025
5337ede
fix: Windows build
giladgd Jan 1, 2025
fa71485
fix: Windows build
giladgd Jan 1, 2025
904b4e2
fix: perform a separate MSVC build on Windows
giladgd Jan 1, 2025
6a0d2cb
fix: Windows build
giladgd Jan 1, 2025
19eb89b
fix: Windows build
giladgd Jan 1, 2025
bd0e954
fix: Windows build
giladgd Jan 1, 2025
beaefbf
fix: Windows build
giladgd Jan 1, 2025
c1afcfd
test: fix tests
giladgd Jan 1, 2025
87171db
fix: Windows build
giladgd Jan 1, 2025
08fab6b
fix: Windows build
giladgd Jan 1, 2025
7cc01a6
fix: Windows build
giladgd Jan 1, 2025
76519b6
fix: Windows build
giladgd Jan 1, 2025
7d76b61
fix: Windows build
giladgd Jan 1, 2025
cff255c
fix: Windows build
giladgd Jan 1, 2025
80d9a2f
fix: Windows build
giladgd Jan 2, 2025
144c9c4
test: fix tests
giladgd Jan 2, 2025
b3ec8c2
fix: Windows build
giladgd Jan 2, 2025
a08aaf1
fix: prevent loading Vulkan if the device is unsupported
giladgd Jan 2, 2025
96987c0
fix: prevent loading Vulkan if the device is unsupported
giladgd Jan 2, 2025
26fcb2a
fix: bug
giladgd Jan 2, 2025
1c83476
fix: bug
giladgd Jan 2, 2025
9b01397
fix: remove deprecated CLS token
giladgd Jan 4, 2025
f050fa4
fix: bug
giladgd Jan 4, 2025
62ee6e3
feat: `evaluateWithMetadata`, token confidence
giladgd Jan 4, 2025
72645e7
docs: document `useMmap`
giladgd Jan 4, 2025
854d902
fix: add missing include
giladgd Jan 4, 2025
e65b839
docs: improve examples
giladgd Jan 5, 2025
d53a07e
test: fix tests
giladgd Jan 5, 2025
2527413
fix: pipe error logs in `inspect gpu` command
giladgd Jan 5, 2025
bb33a5d
test: fix tests
giladgd Jan 6, 2025
4e1c676
feat: reranking (`LlamaRankingContext`)
giladgd Jan 6, 2025
38f56f4
docs: explain about reranking
giladgd Jan 6, 2025
bd7f6bf
style: lint
giladgd Jan 6, 2025
55f5b26
test: fix tests
giladgd Jan 6, 2025
12bc47a
fix: adapt to breaking `llama.cpp` changes
giladgd Jan 6, 2025
2 changes: 1 addition & 1 deletion .config/typedoc.json
@@ -27,6 +27,6 @@
"interfacePropertiesFormat": "list",
"sort": ["source-order"],
"docsRoot": "../docs",
"intentionallyNotExported": ["MergeOptionalUnionTypes", "GbnfJsonSchemaToTSType", "_LlamaText"],
"intentionallyNotExported": ["MergeOptionalUnionTypes", "PickOptions", "GbnfJsonSchemaToTSType", "_LlamaText"],
"useHTMLEncodedBrackets": true
}
3 changes: 1 addition & 2 deletions .github/workflows/build.yml
@@ -23,8 +23,7 @@ jobs:
- name: Download latest llama.cpp release
env:
CI: true
# pinned to `b4291` temporarily until the Windows on Arm64 build is fixed
run: node ./dist/cli/cli.js source download --release b4291 --skipBuild --noBundle --noUsageExample --updateBinariesReleaseMetadataAndSaveGitBundle
run: node ./dist/cli/cli.js source download --release latest --skipBuild --noBundle --noUsageExample --updateBinariesReleaseMetadataAndSaveGitBundle
- name: Upload build artifact
uses: actions/upload-artifact@v4
with:
11 changes: 10 additions & 1 deletion .vitepress/config.ts
@@ -132,13 +132,16 @@ export default defineConfig({
item.lastmod = new Date(buildDate);
item.changefreq = "daily";
item.priority = 0.9;
} else if (item.url === "guide/") {
item.changefreq = "daily";
item.priority = 0.7;
} else if (item.url.startsWith("api/") || item.url.startsWith("cli/")) {
item = {
...item,
lastmod: new Date(buildDate),
changefreq: "weekly",
priority: item.url.startsWith("cli/")
? 0.7
? 0.6
: 0.5
};
} else if (item.lastmod == null && item.url.startsWith("blog/")) {
@@ -358,6 +361,9 @@
}
},
markdown: {
languageAlias: {
"js-highlight": "javascript"
},
codeTransformers: [
transformerTwoslash({
explicitTrigger: false,
@@ -482,7 +488,10 @@ export default defineConfig({
{text: "External Chat State", link: "/external-chat-state"},
{text: "Token Bias", link: "/token-bias"},
{text: "Objects Lifecycle", link: "/objects-lifecycle"},
{text: "Chat Context Shift", link: "/chat-context-shift"},
{text: "Batching", link: "/batching"},
{text: "Token Prediction", link: "/token-prediction"},
{text: "Low Level API", link: "/low-level-api"},
{text: "Awesome List", link: "/awesome"},
{text: "Troubleshooting", link: "/troubleshooting"},
{text: "Tips and Tricks", link: "/tips-and-tricks"}
3 changes: 2 additions & 1 deletion .vitepress/config/apiReferenceSidebar.ts
@@ -1,6 +1,6 @@
import {DefaultTheme} from "vitepress";
/* eslint import/no-unresolved: "off" */
import typedocSidebar from "../../docs/api/typedoc-sidebar.json"; // if this import fails, run `npm run docs:generateTypedoc`
import typedocSidebar from "../../docs/api/typedoc-sidebar.json";

const categoryOrder = [
"Functions",
@@ -28,6 +28,7 @@ const classesOrder = [
"LlamaCompletion",
"LlamaEmbeddingContext",
"LlamaEmbedding",
"LlamaRankingContext",
"LlamaGrammar",
"LlamaJsonSchemaGrammar",
"LlamaText",
3 changes: 2 additions & 1 deletion .vitepress/theme/style.css
@@ -354,7 +354,8 @@ div.search-keyboard-shortcuts[class] kbd:last-of-type {
}

.language-ts > .lang,
.language-shell > .lang {
.language-shell > .lang,
.language-js-highlight > .lang {
display: none;
}

6 changes: 3 additions & 3 deletions .vitepress/utils/parseCmakeListsTxtOptions.ts
@@ -1,5 +1,7 @@
const maxLinesSpan = 10;

const cmakeOptionRegex =
/^\s*option\([\s\t\n\r]*(?<key>\S+)[\s\t\n\r]+"(?<description>(?:\\"|[^"])*)"[\s\t\n\r]+(?<defaultValue>\S+)[\s\t\n\r]*\)/;
export function parseCmakeListsTxtOptions(cmakeListsTxtString: string) {
const lines = cmakeListsTxtString.split("\n");

@@ -8,9 +10,7 @@ export function parseCmakeListsTxtOptions(cmakeListsTxtString: string) {
const match = lines
.slice(index, index + maxLinesSpan)
.join("\n")
.match(
/^option\([\s\t\n\r]*(?<key>\S+)[\s\t\n\r]+"(?<description>(?:\\"|[^"])*)"[\s\t\n\r]+(?<defaultValue>\S+)[\s\t\n\r]*\)/
);
.match(cmakeOptionRegex);
if (match == null || match.groups == null || match?.index !== 0)
return null;

3 changes: 2 additions & 1 deletion README.md
@@ -26,8 +26,9 @@
* [Use the CLI to chat with a model without writing any code](#try-it-without-installing)
* Up-to-date with the latest `llama.cpp`. Download and compile the latest release with a [single CLI command](https://node-llama-cpp.withcat.ai//guide/building-from-source#downloading-a-release)
* Enforce a model to generate output in a parseable format, [like JSON](https://node-llama-cpp.withcat.ai/guide/chat-session#json-response), or even force it to [follow a specific JSON schema](https://node-llama-cpp.withcat.ai/guide/chat-session#response-json-schema)
* [Provide a model with functions it can call on demand](https://node-llama-cpp.withcat.ai/guide/chat-session#function-calling) to retrieve information of perform actions
* [Provide a model with functions it can call on demand](https://node-llama-cpp.withcat.ai/guide/chat-session#function-calling) to retrieve information or perform actions
* [Embedding support](https://node-llama-cpp.withcat.ai/guide/embedding)
* [Safe against special token injection attacks](https://node-llama-cpp.withcat.ai/guide/llama-text#input-safety-in-node-llama-cpp)
* Great developer experience with full TypeScript support, and [complete documentation](https://node-llama-cpp.withcat.ai/guide/)
* Much more

53 changes: 51 additions & 2 deletions docs/guide/building-from-source.md
@@ -25,13 +25,62 @@ This is useful for building from source on machines that aren't connected to the
:::

::: info

If `cmake` is not installed on your machine, `node-llama-cpp` will automatically download `cmake` to an internal directory and try to use it to build `llama.cpp` from source.

If the build fails, make sure you have the required dependencies of `cmake` installed on your machine. More info is available [here](https://github.com/cmake-js/cmake-js#:~:text=projectRoot/build%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5Bstring%5D-,Requirements%3A,-CMake) (you don't have to install `cmake` or `cmake-js`, just the dependencies).
:::

::: details Dependencies for macOS
If the build fails on macOS with the error `"/usr/bin/cc" is not able to compile a simple test program`,
try running this command to install the Xcode command line tools:
```shell
xcode-select --install
```
:::

::: details Dependencies for Windows x64
If the build fails on your machine, ensure you have all the necessary build tools installed.

You can install all the dependencies via [WinGet](https://learn.microsoft.com/en-us/windows/package-manager/winget/) using this command:
```shell
winget install --id Microsoft.VisualStudio.2022.BuildTools --force --override "--add Microsoft.VisualStudio.Component.VC.CMake.Project Microsoft.VisualStudio.Component.VC.CoreBuildTools Microsoft.VisualStudio.Component.VC.Tools.x86.x64 Microsoft.VisualStudio.Component.VC.ATL Microsoft.VisualStudio.Component.VC.ATLMFC Microsoft.VisualStudio.Component.VC.Llvm.ClangToolset Microsoft.VisualStudio.Component.VC.Llvm.Clang Microsoft.VisualStudio.Component.VC.Redist.14.Latest Microsoft.Component.VC.Runtime.UCRTSDK Microsoft.VisualStudio.Component.Windows10SDK Microsoft.VisualStudio.Component.Windows10SDK.20348"
```
> WinGet is built-in on Windows 11 and modern Windows 10 versions

---

You can also install all the dependencies manually using the [Visual C++ Build Tools installer](https://visualstudio.microsoft.com/visual-cpp-build-tools/):
* **`Workloads` tab:** select `Desktop development with C++`
* **`Individual components` tab**: select the following:
* C++ ATL for latest v143 build tools (x86 & x64)
* C++ MFC for latest v143 build tools (x86 & x64)
* C++ CMake tools for Windows
* C++ Clang Compiler for Windows
* MSBuild support for LLVM (clang-cl) toolset
* Windows Universal CRT SDK
:::

::: details Dependencies for Windows on Arm
On Windows on Arm you need to install additional build tools to build `llama.cpp` from source.

You can install all the dependencies via [WinGet](https://learn.microsoft.com/en-us/windows/package-manager/winget/) using this command:
```shell
winget install --id Microsoft.VisualStudio.2022.BuildTools --force --override "--add Microsoft.VisualStudio.Component.VC.CMake.Project Microsoft.VisualStudio.Component.VC.CoreBuildTools Microsoft.VisualStudio.Component.VC.Tools.x86.x64 Microsoft.VisualStudio.Component.VC.Tools.ARM64 Microsoft.VisualStudio.Component.VC.ATL Microsoft.VisualStudio.Component.VC.ATL.ARM64 Microsoft.VisualStudio.Component.VC.ATLMFC Microsoft.VisualStudio.Component.VC.MFC.ARM64 Microsoft.VisualStudio.Component.VC.Llvm.ClangToolset Microsoft.VisualStudio.Component.VC.Llvm.Clang Microsoft.VisualStudio.Component.VC.Redist.14.Latest Microsoft.Component.VC.Runtime.UCRTSDK Microsoft.VisualStudio.Component.Windows10SDK Microsoft.VisualStudio.Component.Windows10SDK.20348"
```
> WinGet is built-in on Windows 11 and modern Windows 10 versions

---

You can also install all the dependencies manually using the [Visual C++ Build Tools installer](https://visualstudio.microsoft.com/visual-cpp-build-tools/):
* **`Workloads` tab:** select `Desktop development with C++`
* **`Individual components` tab**: select the following:
* MSVC v143 - VS 2022 C++ ARM64 build tools (latest)
* C++ ATL for latest v143 build tools (ARM64/ARM64EC)
* C++ MFC for latest v143 build tools (ARM64/ARM64EC)
* C++ CMake tools for Windows
* C++ Clang Compiler for Windows
* MSBuild support for LLVM (clang-cl) toolset
* Windows Universal CRT SDK
:::

## `source download` and `source build` Commands
111 changes: 111 additions & 0 deletions docs/guide/chat-context-shift.md
@@ -0,0 +1,111 @@
# Chat Context Shift Strategy {#background}
When the chat history gets longer than the sequence's context size, we have to remove the oldest tokens from the context state to make room for new tokens to be generated.
This is called a context shift.

`node-llama-cpp` has a smart mechanism to handle context shifts on the chat level, so the oldest messages are truncated (from their beginning) or removed from the context state, while keeping the system prompt in place to ensure the model follows the guidelines you set for it.

You can override `node-llama-cpp`'s default context shift strategy
when using [`LlamaChatSession`](../api/classes/LlamaChatSession.md) or [`LlamaChat`](../api/classes/LlamaChat.md)
by providing a custom context shift strategy.

## The Default Context Shift Strategy {#default-strategy}
The [default context shift strategy](../api/type-aliases/LLamaChatContextShiftOptions.md#strategy) is `eraseFirstResponseAndKeepFirstSystem`.

This strategy attempts to truncate the oldest model responses (from their beginning) or remove them completely from the chat history while keeping the first system prompt in place.
If a response is completely removed, the prompt that came before it will be removed as well.
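
The default strategy is used automatically, but you can also select it explicitly.
Below is a minimal sketch of doing so; it assumes the strategy name is accepted directly by the `contextShift` option
(as described in [`LLamaChatContextShiftOptions`](../api/type-aliases/LLamaChatContextShiftOptions.md)),
and the model path is just a placeholder:
```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();

const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    contextShift: {
        // explicitly select the default strategy
        strategy: "eraseFirstResponseAndKeepFirstSystem"
    }
});
```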

## Implementing a Custom Context Shift Strategy {#custom-strategy}
A [custom context shift strategy](../api/type-aliases/LLamaChatContextShiftOptions.md#strategy) is a function that receives the full chat history as input and
returns a new chat history that, when tokenized, results in an array of tokens shorter than the desired max size.

The context shift strategy will be called only when the context state needs to be shifted.

If the context shift strategy returns an invalid chat history (e.g., a chat history that is too long),
the prompting function will abort the evaluation and throw an error.

A custom context shift strategy can use simple logic that prioritizes which data to remove,
or it can even use a language model to summarize information to shorten the chat history.

It's important to keep the last user prompt and model response as-is to prevent infinite generation loops.

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();

// ---cut---
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    contextShift: {
        strategy({
            chatHistory, chatWrapper, maxTokensCount, tokenizer,
            lastShiftMetadata
        }) {
            // clone the chat history to not mutate the original
            const newChatHistory = chatHistory.map(
                (item) => structuredClone(item)
            );

            function getTokensLeftToRemove() {
                const {
                    contextText
                } = chatWrapper.generateContextState({
                    // measure the modified history, not the original one
                    chatHistory: newChatHistory
                });
                const tokenUsage = contextText.tokenize(tokenizer).length;

                return Math.max(0, tokenUsage - maxTokensCount);
            }

            while (getTokensLeftToRemove() > 0 && newChatHistory.length > 2) {
                for (let i = 0; i < newChatHistory.length - 2; i++) {
                    const chatItem = newChatHistory[i]!;

                    if (i === 0 && chatItem.type === "system")
                        // don't remove the first system message
                        continue;
                    else if (chatItem.type === "model") {
                        // remove the model response
                        newChatHistory.splice(i, 1);
                        i--;

                        // remove the user messages that
                        // came before the model response
                        while (
                            i > 0 &&
                            newChatHistory[i - 1]?.type === "user"
                        ) {
                            newChatHistory.splice(i - 1, 1);
                            i--;
                        }
                    } else if (chatItem.type === "system") {
                        // don't remove system messages on their own
                        continue;
                    } else if (chatItem.type === "user") {
                        // don't remove user messages on their own
                        continue;
                    } else {
                        // ensure we handle all message types.
                        // otherwise, this will error
                        void (chatItem satisfies never);
                    }
                }
            }

            return {
                chatHistory: newChatHistory,

                // this metadata will be passed to the next context shift
                // strategy call as the `lastShiftMetadata` argument
                metadata: {}
            };
        }
    }
});
```
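
Once the session is created with the custom strategy, prompting works as usual;
the strategy function is only invoked when the context state actually needs to be shifted.
A short usage sketch (the prompt text is just an illustration):
```typescript
const response = await session.prompt("Summarize what we've discussed so far");
console.log("AI: " + response);
```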
14 changes: 14 additions & 0 deletions docs/guide/choosing-a-model.md
@@ -124,6 +124,20 @@ Here are a few concepts to be aware of when choosing a model:

Many embedding models include terms like `embed` in their name.

* **Reranking models** - models that are trained to rerank (sort) a list of documents
based on their relevance to a given query.

Reranking models are often significantly smaller (sometimes as small as 500MB), faster,
and consume less memory than general-purpose models,
making them more efficient and practical for reranking tasks (see the sketch below).

While general-purpose models can also be used for reranking,
doing this requires prompting the model, which is more cumbersome and inefficient than
using a specialized model with a [ranking context](./embedding.md#reranking) for this task.

Many reranking models include terms like `rerank` or `reranker` in their name.
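
For illustration, here is a minimal reranking sketch.
It assumes a `createRankingContext()` method on the model and a `rankAndSort(query, documents)` method on the resulting [ranking context](./embedding.md#reranking)
(see that guide for the full details); the model file name is just a placeholder:
```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "bge-reranker-v2-m3-Q8_0.gguf")
});
const rankingContext = await model.createRankingContext();

const query = "Tell me a geographical fact";
const documents = [
    "The capital of France is Paris",
    "I love eating pizza with extra cheese",
    "Mount Everest is the tallest mountain in the world"
];

// sort the documents by how relevant they are to the query
const rankedDocuments = await rankingContext.rankAndSort(query, documents);
console.log(rankedDocuments);
```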

### How much data do you plan to feed the model at once?
If you plan to feed the model with a lot of data at once, you'll need a model that supports a large context size.
The larger the context size is, the more data the model can process at once.
6 changes: 5 additions & 1 deletion docs/guide/cmakeOptions.data.ts
@@ -68,12 +68,16 @@ function parseCmakeOptions(cmakeListsTxt: string, optionFilter: ((key: string) =
for (let i = 0; i < cmakeOptions.length; i++) {
const option = cmakeOptions[i]!;

if (!optionFilter(option.key) || option.key === "GGML_LLAMAFILE" || option.key === "GGML_CURL" || option.key === "GGML_RPC") {
if (!optionFilter(option.key) || option.key === "GGML_LLAMAFILE" || option.key === "GGML_CURL" || option.key === "GGML_RPC" ||
option.key === "GGML_WASM_SINGLE_FILE" || option.key === "BUILD_SHARED_LIBS" || option.key === "GGML_BACKEND_DL"
) {
cmakeOptions.splice(i, 1);
i--;
continue;
} else if (option.key === "GGML_METAL" && option.defaultValue === "${GGML_METAL_DEFAULT}")
option.defaultValue = htmlEscapeWithCodeMarkdown("`ON` on macOS on Apple Silicon, `OFF` otherwise");
else if (option.key === "GGML_BLAS" && option.defaultValue === "${GGML_BLAS_DEFAULT}")
option.defaultValue = htmlEscapeWithCodeMarkdown("`ON` on macOS, `OFF` otherwise");
else if (option.key === "GGML_METAL_EMBED_LIBRARY" && option.defaultValue === "${GGML_METAL}")
option.defaultValue = htmlEscapeWithCodeMarkdown("`ON` on macOS, `OFF` otherwise");
else if (option.defaultValue === "${GGML_STANDALONE}") {
8 changes: 7 additions & 1 deletion docs/guide/docker.md
@@ -34,7 +34,7 @@ FROM node:22

# Replace `x86_64` with `sbsa` for ARM64
ENV NVARCH=x86_64
ENV INSTALL_CUDA_VERSION=12.6
ENV INSTALL_CUDA_VERSION=12.5

SHELL ["/bin/bash", "-c"]
RUN apt-get update && \
@@ -172,3 +172,9 @@ docker run --rm -it --runtime=nvidia --gpus=all my-image:tag
podman run --rm -it --device nvidia.com/gpu=all --security-opt=label=disable --gpus=all my-image:tag
```
:::

### Getting a `system has unsupported display driver / cuda driver combination` Error
Ensure that the `INSTALL_CUDA_VERSION` in the Dockerfile matches
or is older than the CUDA version installed on the host machine.

> You can check the installed CUDA version using `nvidia-smi --version`.