feat: token prediction (speculative decoding) #405

Merged Jan 7, 2025 (74 commits)

@giladgd giladgd commented Jan 1, 2025

Description of change

  • feat: token prediction (speculative decoding)
  • feat: DraftSequenceTokenPredictor
  • feat: InputLookupTokenPredictor
  • feat: controlledEvaluate
  • feat: reranking (LlamaRankingContext)
  • feat: experimentalChunkDocument
  • feat: evaluateWithMetadata
  • feat: token confidence
  • feat: build on arm64 using LLVM, use Visual Studio's CMake when available
  • feat: try compiling with LLVM on Windows x64 when available
  • feat(minor): dynamically load llama.cpp backends
  • feat(minor): more token values support in SpecialToken
  • feat(minor): improve memory usage estimation
  • fix: check for Rosetta usage on macOS x64 when using the inspect gpu command
  • fix: detect running under Rosetta on Apple Silicon and show an error message instead of crashing
  • fix: switch from "nextTick" to "nextCycle" for the default batch dispatcher
  • fix: remove deprecated CLS token
  • fix: pipe error logs in inspect gpu command
  • docs: improve building from source
  • docs: CUDA in Docker troubleshooting
  • docs: context shift strategy
  • docs: improve type docs and types
  • docs: user input safety
  • docs: sitemap fixes
  • docs: remove Intel AMX trick, since the prebuilt binaries now use it automatically
  • docs: parse custom cmake options nested under ifs
  • docs: update custom cmake options
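
For readers unfamiliar with the headline feature, here is a minimal sketch of the idea behind token prediction (speculative decoding): a cheap predictor guesses a few tokens ahead, and the main model validates them, keeping guesses only while they match what it would have produced itself. All names below (`Token`, `NextToken`, `speculativeStep`, `toyTarget`, `toyDraft`) are hypothetical and written for illustration; this is not the API added in this PR, and a real implementation would validate all guesses in one batched evaluation rather than one call per token.

```typescript
// Hypothetical types for illustration only; not node-llama-cpp's API.
type Token = number;
type NextToken = (context: readonly Token[]) => Token;

// One greedy speculative step: returns the tokens to append to `context`.
// The result is always what `target` alone would have generated.
function speculativeStep(
    target: NextToken,
    draft: NextToken,
    context: readonly Token[],
    draftLength: number
): Token[] {
    // 1. Let the cheap draft predictor guess a few tokens ahead.
    const proposed: Token[] = [];
    for (let i = 0; i < draftLength; i++)
        proposed.push(draft([...context, ...proposed]));

    // 2. Validate each guess against the target model, in order.
    const accepted: Token[] = [];
    for (const guess of proposed) {
        const expected = target([...context, ...accepted]);
        accepted.push(expected);

        // On the first mismatch, keep the target's token and discard the rest.
        if (expected !== guess)
            break;
    }

    return accepted;
}

// Toy models: the target counts upward modulo 5; the draft agrees
// except at context length 3, where it guesses wrong.
const toyTarget: NextToken = (context) => context.length % 5;
const toyDraft: NextToken = (context) => (context.length === 3 ? 9 : context.length % 5);

const step = speculativeStep(toyTarget, toyDraft, [0], 4);
console.log(step);
```

In this toy run the first two guesses are accepted and the third is replaced by the target's own token, so a single step yields three tokens. The better the predictor matches the main model, the more tokens each step yields.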

Pull-Request Checklist

  • Code is up-to-date with the master branch
  • npm run format to apply eslint formatting
  • npm run test passes with this change
  • This pull request links relevant issues as Fixes #0000
  • There are new or updated unit tests validating the change
  • Documentation has been updated to reflect this change
  • The new commits and pull request title follow conventions explained in pull request guidelines (PRs that do not follow this convention will not be merged)

@giladgd giladgd requested a review from ido-pluto January 1, 2025 03:22

@ido-pluto ido-pluto left a comment

LGTM

@giladgd giladgd merged commit 632a7bf into master Jan 7, 2025
18 checks passed
@giladgd giladgd deleted the gilad/dynamicBackends branch January 7, 2025 00:03

github-actions bot commented Jan 8, 2025

🎉 This PR is included in version 3.4.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
