This repository has been archived by the owner on Nov 13, 2024. It is now read-only.

Add Anyscale Endpoint support and Llama Tokenizer #173

Merged: 18 commits merged into pinecone-io:main on Nov 27, 2023

Conversation

kylehh
Contributor

@kylehh kylehh commented Nov 14, 2023

Problem

  1. No support for open source models
  2. No support for open source tokenizers

Solution

  1. Add an Anyscale Endpoint LLM client, which hosts Llama 2, Mistral, and Zephyr models.
  2. Add a Llama Tokenizer based on Hugging Face's tokenizers library.

Type of Change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update
  • [ ] Infrastructure change (CI configs, etc)
  • [ ] Non-code change (docs, etc)
  • [ ] None of the above: (explain here)

Test Plan

Describe specific steps for validating this change.

  1. Get your HF and Anyscale tokens and add them to `config/anyscale.yaml`.
  2. Run `canopy start --config anyscale.yaml`.

Contributor

@igiloh-pinecone igiloh-pinecone left a comment


@kylehh Thank you very much for your contribution!! This is highly appreciated!

Please see a few comments. The three main issues:

  1. We plan to merge Upgrade openai sdk to v.1.2.3 #171 today, which adds a `base_url` init argument to the OpenAILLM class. With that change, implementing AnyscaleLLM could be as easy as inheriting from OpenAILLM and passing a hard-coded `base_url` to `super().__init__()`.
  2. The new classes, AnyscaleLLM and LlamaTokenizer, require their own unit tests before this PR can be merged.
  3. Let's figure out together whether it's really necessary for every Canopy user to generate a HF identification token just for the Tokenizer, or whether we can find a more user-friendly solution.
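The inheritance approach suggested in point 1 might look roughly like the sketch below. This is illustrative only: `OpenAILLM` is stubbed in place of canopy's real class, and the Anyscale base URL, default model name, and `ANYSCALE_API_KEY` environment variable are assumptions, not confirmed by this PR.

```python
import os
from typing import Optional


class OpenAILLM:
    """Stub standing in for canopy's OpenAILLM, which (after PR #171)
    accepts a base_url init argument forwarded to the OpenAI SDK client."""

    def __init__(self, model_name: str, *,
                 api_key: Optional[str] = None,
                 base_url: Optional[str] = None):
        self.model_name = model_name
        self.api_key = api_key
        self.base_url = base_url


# Assumed Anyscale Endpoints URL -- verify against Anyscale's docs.
ANYSCALE_BASE_URL = "https://api.endpoints.anyscale.com/v1"


class AnyscaleLLM(OpenAILLM):
    """OpenAI-compatible client pointed at Anyscale Endpoints by
    hard-coding base_url, as the reviewer suggests."""

    def __init__(self, model_name: str = "meta-llama/Llama-2-7b-chat-hf", *,
                 api_key: Optional[str] = None):
        api_key = api_key or os.environ.get("ANYSCALE_API_KEY")
        super().__init__(model_name, api_key=api_key,
                         base_url=ANYSCALE_BASE_URL)
```

The design point is that the subclass adds no request logic of its own; everything is inherited from the OpenAI-compatible client, and only the endpoint differs.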

Review comments were left on the following files:

  • config/anyscale.yaml
  • src/canopy/llm/anyscale.py
  • src/canopy/tokenizer/llama.py
  • src/canopy/utils/openai_exceptions.py
kylehh and others added 6 commits November 16, 2023 08:48

  • The underlying HuggingFace tokenizer has a proper `.tokenize()` method, as well as a `.convert_tokens_to_string()` method. Together with a few configurations like `add_bos_token=False`, the tokenizer now passes all tests.
  • Added a few more asserts, and more elaborate prints on failure.
  • The new LlamaTokenizer class should be almost identical to the OpenAI tokenizer, other than the expected tokens being slightly different.
Contributor

@igiloh-pinecone igiloh-pinecone left a comment


Hi @kylehh,

As we discussed over Slack, some changes are needed for the Tokenizer; please see the suggested changes in kylehh#1.

In addition, there are a few more issues to solve, namely the unit tests for AnyscaleLLM and the linter checks.

Review comments were left on the following files:

  • src/canopy/llm/anyscale.py
  • config/anyscale.yaml
  • pyproject.toml
  • src/canopy/tokenizer/llama.py
  • tests/system/llm/test_anyscale.py
Contributor

@igiloh-pinecone igiloh-pinecone left a comment


LGTM!

Thank you @kylehh for all of your efforts. Merging.

@igiloh-pinecone igiloh-pinecone added this pull request to the merge queue Nov 26, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 26, 2023
@igiloh-pinecone igiloh-pinecone added this pull request to the merge queue Nov 26, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 26, 2023
@igiloh-pinecone igiloh-pinecone added this pull request to the merge queue Nov 26, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 26, 2023
@igiloh-pinecone igiloh-pinecone added this pull request to the merge queue Nov 27, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 27, 2023
@igiloh-pinecone igiloh-pinecone added this pull request to the merge queue Nov 27, 2023
Merged via the queue into pinecone-io:main with commit 63c2da4 Nov 27, 2023