build: Upgrade transformers to the latest version 4.34.1 #5994

Merged
merged 25 commits into from
Oct 24, 2023
Commits
e056325
Upgrade transformers to the latest version 4.34.0 so that Haystack ca…
grantmwilliams Oct 6, 2023
6d98b28
update release notes
grantmwilliams Oct 6, 2023
685c218
Merge branch 'main' into bump-transformers-to-4-34
grantmwilliams Oct 6, 2023
c3746ba
updated missing lazy import
grantmwilliams Oct 6, 2023
da4249f
Update .github workflows imports
grantmwilliams Oct 6, 2023
df2abfc
bump more versions in .github workflows
grantmwilliams Oct 6, 2023
eb28cc1
revert import sorting
grantmwilliams Oct 6, 2023
82f16ca
Update to catch runtime errors to match haystack_hub changes
grantmwilliams Oct 6, 2023
1d54ca2
Merge branch 'main' into bump-transformers-to-4-34
grantmwilliams Oct 6, 2023
e88ccfd
add language parameter value to whisper test
julian-risch Oct 8, 2023
e592478
Merge branch 'main' into bump-transformers-to-4-34
julian-risch Oct 11, 2023
e8f32dc
bump transformers version in linting preview workflow
julian-risch Oct 11, 2023
2f28516
bump transformers version in linting preview workflow
julian-risch Oct 11, 2023
a6e5485
Merge branch 'main' into bump-transformers-to-4-34
julian-risch Oct 16, 2023
5229fdd
bump version to v4.34.1
julian-risch Oct 19, 2023
3f4b3ce
Merge branch 'main' into bump-transformers-to-4-34
julian-risch Oct 19, 2023
db68156
resolve mypy issue with reused variables
julian-risch Oct 19, 2023
515a315
install openai-whisper without dependencies
julian-risch Oct 24, 2023
e778820
Merge branch 'main' into bump-transformers-to-4-34
julian-risch Oct 24, 2023
55619fe
remove audio extra, update whisper install instructions
julian-risch Oct 24, 2023
fb1f662
remove audio extra, update whisper install instructions
julian-risch Oct 24, 2023
97f8352
keep audio extra but add version
julian-risch Oct 24, 2023
3284178
keep audio extra with no constraints
julian-risch Oct 24, 2023
a15137a
remove audio extra
julian-risch Oct 24, 2023
2bab959
Merge branch 'main' into bump-transformers-to-4-34
julian-risch Oct 24, 2023
4 changes: 3 additions & 1 deletion .github/workflows/e2e_preview.yml
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,9 @@ jobs:
sudo apt install ffmpeg # for local Whisper tests

- name: Install Haystack
run: pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.32.1 'sentence-transformers>=2.2.0' pypdf openai-whisper tika 'azure-ai-formrecognizer>=3.2.0b2'
run: |
pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.34.1 'sentence-transformers>=2.2.0' pypdf tika 'azure-ai-formrecognizer>=3.2.0b2'
pip install --no-deps llvmlite numba 'openai-whisper>=20230918' # prevent outdated version of tiktoken pinned by openai-whisper

- name: Run tests
run: pytest e2e/preview
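Because the second install command above uses `--no-deps`, pip does not resolve openai-whisper's own requirements, so nothing checks afterwards which versions actually ended up in the environment. A minimal sketch of how one might report that, using only the standard library's `importlib.metadata`; the function name `installed_versions` is hypothetical and is not part of Haystack or of this workflow.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the install step; the check itself is an illustration.
    report = installed_versions(
        ["transformers", "openai-whisper", "tiktoken", "numba", "llvmlite"]
    )
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running this as an extra step after the install would surface a missing or unexpected package before the test run rather than as an import error halfway through it.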
5 changes: 4 additions & 1 deletion .github/workflows/linting.yml
Expand Up @@ -39,7 +39,9 @@ jobs:
python-version: ${{ env.PYTHON_VERSION }}

- name: Install Haystack
run: pip install ".[all,dev]"
run: |
pip install ".[all,dev]"
pip install --no-deps llvmlite numba "openai-whisper>=20230918"

- name: Mypy
if: steps.files.outputs.any_changed == 'true'
Expand Down Expand Up @@ -74,6 +76,7 @@ jobs:
- name: Install Haystack
run: |
pip install ".[all,dev]"
pip install --no-deps llvmlite numba "openai-whisper>=20230918"
pip install ./haystack-linter

- name: Pylint
Expand Down
7 changes: 5 additions & 2 deletions .github/workflows/linting_preview.yml
Expand Up @@ -38,7 +38,9 @@ jobs:
python-version: ${{ env.PYTHON_VERSION }}

- name: Install Haystack
run: pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.32.1 'sentence-transformers>=2.2.0' pypdf openai-whisper tika 'azure-ai-formrecognizer>=3.2.0b2'
run: |
pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.34.1 'sentence-transformers>=2.2.0' pypdf tika 'azure-ai-formrecognizer>=3.2.0b2'
pip install --no-deps llvmlite numba 'openai-whisper>=20230918' # prevent outdated version of tiktoken pinned by openai-whisper

- name: Mypy
if: steps.files.outputs.any_changed == 'true'
Expand Down Expand Up @@ -72,7 +74,8 @@ jobs:

- name: Install Haystack
run: |
pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.32.1 'sentence-transformers>=2.2.0' pypdf openai-whisper tika 'azure-ai-formrecognizer>=3.2.0b2'
pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.34.1 'sentence-transformers>=2.2.0' pypdf tika 'azure-ai-formrecognizer>=3.2.0b2'
pip install --no-deps llvmlite numba 'openai-whisper>=20230918' # prevent outdated version of tiktoken pinned by openai-whisper
pip install ./haystack-linter

- name: Pylint
Expand Down
8 changes: 6 additions & 2 deletions .github/workflows/tests.yml
Expand Up @@ -202,7 +202,9 @@ jobs:
python-version: ${{ env.PYTHON_VERSION }}

- name: Install Haystack
run: pip install .[preview,dev] langdetect transformers[torch,sentencepiece]==4.32.1 sentence-transformers>=2.2.0 pypdf openai-whisper tika 'azure-ai-formrecognizer>=3.2.0b2'
run: |
pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.34.1 'sentence-transformers>=2.2.0' pypdf tika 'azure-ai-formrecognizer>=3.2.0b2'
pip install --no-deps llvmlite numba 'openai-whisper>=20230918' # prevent outdated version of tiktoken pinned by openai-whisper

- name: Run
run: pytest --cov-report xml:coverage.xml --cov="haystack" -m "unit" test/preview
Expand Down Expand Up @@ -946,7 +948,9 @@ jobs:
sudo apt install ffmpeg # for local Whisper tests

- name: Install Haystack
run: pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.32.1 'sentence-transformers>=2.2.0' pypdf openai-whisper tika 'azure-ai-formrecognizer>=3.2.0b2'
run: |
pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.34.1 'sentence-transformers>=2.2.0' pypdf tika 'azure-ai-formrecognizer>=3.2.0b2'
pip install --no-deps llvmlite numba 'openai-whisper>=20230918' # prevent outdated version of tiktoken pinned by openai-whisper

- name: Run tests
run: |
Expand Down
16 changes: 12 additions & 4 deletions .github/workflows/tests_preview.yml
Expand Up @@ -116,7 +116,9 @@ jobs:
python-version: ${{ env.PYTHON_VERSION }}

- name: Install Haystack
run: pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.32.1 'sentence-transformers>=2.2.0' pypdf openai-whisper tika 'azure-ai-formrecognizer>=3.2.0b2'
run: |
pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.34.1 'sentence-transformers>=2.2.0' pypdf tika 'azure-ai-formrecognizer>=3.2.0b2'
pip install --no-deps llvmlite numba 'openai-whisper>=20230918' # prevent outdated version of tiktoken pinned by openai-whisper

- name: Run
run: pytest -m "unit" test/preview
Expand Down Expand Up @@ -175,7 +177,9 @@ jobs:
sudo apt install ffmpeg # for local Whisper tests

- name: Install Haystack
run: pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.32.1 'sentence-transformers>=2.2.0' pypdf openai-whisper tika 'azure-ai-formrecognizer>=3.2.0b2'
run: |
pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.34.1 'sentence-transformers>=2.2.0' pypdf tika 'azure-ai-formrecognizer>=3.2.0b2'
pip install --no-deps llvmlite numba 'openai-whisper>=20230918' # prevent outdated version of tiktoken pinned by openai-whisper

- name: Run
run: pytest --maxfail=5 -m "integration" test/preview
Expand Down Expand Up @@ -230,7 +234,9 @@ jobs:
colima start

- name: Install Haystack
run: pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.32.1 'sentence-transformers>=2.2.0' pypdf openai-whisper tika 'azure-ai-formrecognizer>=3.2.0b2'
run: |
pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.34.1 'sentence-transformers>=2.2.0' pypdf tika 'azure-ai-formrecognizer>=3.2.0b2'
pip install --no-deps llvmlite numba 'openai-whisper>=20230918' # prevent outdated version of tiktoken pinned by openai-whisper

- name: Run Tika
run: docker run -d -p 9998:9998 apache/tika:2.9.0.0
Expand Down Expand Up @@ -282,7 +288,9 @@ jobs:
python-version: ${{ env.PYTHON_VERSION }}

- name: Install Haystack
run: pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.32.1 'sentence-transformers>=2.2.0' pypdf openai-whisper tika 'azure-ai-formrecognizer>=3.2.0b2'
run: |
pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.34.1 'sentence-transformers>=2.2.0' pypdf tika 'azure-ai-formrecognizer>=3.2.0b2'
pip install --no-deps llvmlite numba 'openai-whisper>=20230918' # prevent outdated version of tiktoken pinned by openai-whisper

- name: Run
run: pytest --maxfail=5 -m "integration" test/preview -k 'not tika'
Expand Down
2 changes: 2 additions & 0 deletions haystack/nodes/audio/whisper_transcriber.py
Expand Up @@ -28,6 +28,8 @@ class WhisperTranscriber(BaseComponent):

To use Whisper locally, install it following the instructions on
the Whisper [GitHub repo](https://github.com/openai/whisper) and omit the `api_key` parameter.
+ You can work around a dependency conflict caused by openai-whisper pinning an older tiktoken version than required
+ by Haystack if you install via `pip install --no-deps numba llvmlite 'openai-whisper>=20230918'`.

To use the API implementation, provide an api_key. You can get one by signing up
for an [OpenAI account](https://beta.openai.com/).
Expand Down
5 changes: 4 additions & 1 deletion haystack/preview/components/audio/whisper_local.py
Expand Up @@ -6,7 +6,10 @@
from haystack.preview import component, Document, default_to_dict, ComponentError
from haystack.preview.lazy_imports import LazyImport

- with LazyImport("Run 'pip install openai-whisper'") as whisper_import:
+ with LazyImport(
+ "Run 'pip install transformers[torch]==4.34.1' to install torch and "
+ "'pip install --no-deps numba llvmlite 'openai-whisper>=20230918'' to install whisper."
+ ) as whisper_import:
import torch
import whisper

Expand Down
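The `with LazyImport(...) as whisper_import:` block above defers the failure of an optional import so that the install hint can be shown later. A minimal sketch of that general pattern follows. It is not Haystack's actual implementation: the class name `DeferredImport`, the `check` method, and the module name being imported are all assumptions made for illustration.

```python
from typing import Optional


class DeferredImport:
    """Hypothetical stand-in: swallow an ImportError now, re-raise it with a hint later."""

    def __init__(self, message: str) -> None:
        self.message = message
        self.error: Optional[ImportError] = None

    def __enter__(self) -> "DeferredImport":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        if isinstance(exc_value, ImportError):
            # Remember the failure and suppress it so the module can still be imported.
            self.error = exc_value
            return True
        # Any other exception (or none at all) propagates normally.
        return False

    def check(self) -> None:
        """Raise the stored ImportError, prefixed with the install hint."""
        if self.error is not None:
            raise ImportError(f"{self.message} (original error: {self.error})") from self.error


with DeferredImport("Run 'pip install some-optional-package'") as optional_import:
    # Deliberately missing module, used only to trigger the deferred path.
    import a_module_that_is_not_installed_anywhere
```

Code that needs the optional dependency would call `optional_import.check()` first, so users see the install instructions only when they actually use the feature.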
2 changes: 1 addition & 1 deletion haystack/preview/components/rankers/similarity.py
Expand Up @@ -8,7 +8,7 @@
logger = logging.getLogger(__name__)


- with LazyImport(message="Run 'pip install transformers[torch,sentencepiece]==4.32.1'") as torch_and_transformers_import:
+ with LazyImport(message="Run 'pip install transformers[torch,sentencepiece]==4.34.1'") as torch_and_transformers_import:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

Expand Down
8 changes: 4 additions & 4 deletions haystack/preview/components/readers/extractive.py
Expand Up @@ -7,7 +7,7 @@
from haystack.preview.lazy_imports import LazyImport

with LazyImport(
- "Run 'pip install transformers[torch,sentencepiece]==4.32.1 sentence-transformers>=2.2.0'"
+ "Run 'pip install transformers[torch,sentencepiece]==4.34.1 sentence-transformers>=2.2.0'"
) as torch_and_transformers_import:
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
from tokenizers import Encoding
Expand Down Expand Up @@ -192,17 +192,17 @@ def _postprocess(
start_candidates = start_candidates.cpu()
end_candidates = end_candidates.cpu()

- start_candidates = [
+ start_candidates_char_indices = [
[encoding.token_to_chars(start)[0] for start in candidates]
for candidates, encoding in zip(start_candidates, encodings)
]
- end_candidates = [
+ end_candidates_char_indices = [
[encoding.token_to_chars(end)[1] for end in candidates]
for candidates, encoding in zip(end_candidates, encodings)
]
probabilities = candidates.values.cpu()

- return start_candidates, end_candidates, probabilities
+ return start_candidates_char_indices, end_candidates_char_indices, probabilities

def _nest_answers(
self,
Expand Down
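The rename in `_postprocess` corresponds to the commit "resolve mypy issue with reused variables": a name that first holds token indices is no longer rebound to a list of character offsets. A minimal sketch of the same idea without torch or tokenizers; the function `token_to_char_starts` and the plain-tuple span representation are hypothetical stand-ins, not the reader's real types.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a tokenizer encoding: token index -> (char_start, char_end).
TokenSpans = Sequence[Tuple[int, int]]


def token_to_char_starts(
    start_candidates: Tuple[Tuple[int, ...], ...],
    spans_per_query: Sequence[TokenSpans],
) -> List[List[int]]:
    """Convert candidate token indices to character start offsets."""
    # Writing `start_candidates = [...]` here would assign a list to a name annotated
    # as a tuple, which a type checker such as mypy reports as an incompatible
    # assignment. Binding the result to a new name avoids that.
    start_candidates_char_indices = [
        [spans[token][0] for token in candidates]
        for candidates, spans in zip(start_candidates, spans_per_query)
    ]
    return start_candidates_char_indices


# One query with three tokens covering characters 0-5, 6-9 and 10-14.
example = token_to_char_starts(((0, 2),), [[(0, 5), (6, 9), (10, 14)]])
```

A separate name also makes it clearer at the return statement that the function hands back character offsets rather than token indices.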
8 changes: 2 additions & 6 deletions pyproject.toml
Expand Up @@ -49,7 +49,7 @@ dependencies = [
"requests",
"httpx",
"pydantic<2",
- "transformers==4.32.1",
+ "transformers==4.34.1",
"pandas",
"rank_bm25",
"scikit-learn>=1.3.0", # TF-IDF and metrics
Expand All @@ -62,7 +62,6 @@ dependencies = [
"networkx", # graphs library
"quantulum3", # quantities extraction from text
"posthog", # telemetry
- # audio's espnet-model-zoo requires huggingface-hub version <0.8 while we need >=0.5 to be able to use create_repo in FARMReader
"tenacity", # retry decorator
"sseclient-py", # server side events for OpenAI streaming
"more_itertools", # utilities
Expand Down Expand Up @@ -102,7 +101,7 @@ preview = [
"more-itertools", # TextDocumentSplitter
]
inference = [
- "transformers[torch,sentencepiece]==4.32.1",
+ "transformers[torch,sentencepiece]==4.34.1",
"sentence-transformers>=2.2.0", # See haystack/nodes/retriever/_embedding_encoder.py, _SentenceTransformersEmbeddingEncoder
"huggingface-hub>=0.5.0",
]
Expand Down Expand Up @@ -152,9 +151,6 @@ docstores = [
docstores-gpu = [
"farm-haystack[elasticsearch,faiss-gpu,weaviate,pinecone,opensearch]",
]
- audio = [
- "openai-whisper"
- ]
aws = [
"boto3",
# Constrain botocore to avoid taking too much time to resolve the dependency tree.
Expand Down
@@ -0,0 +1,5 @@
---
enhancements:
- |
Upgrade Transformers to the latest version 4.34.1.
This version adds support for the new Mistral, Persimmon, BROS, ViTMatte, and Nougat models.
2 changes: 1 addition & 1 deletion test/modeling/test_model_loading.py
Expand Up @@ -27,7 +27,7 @@ def test_basic_loading(pretrained_model_name_or_path, lm_class, monkeypatch):

@pytest.mark.unit
def test_basic_loading_unknown_model():
- with pytest.raises(OSError):
+ with pytest.raises(RuntimeError):
get_language_model("model_that_doesnt_exist")


Expand Down
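The expected exception has to change with the raised one because `RuntimeError` is not a subclass of `OSError`, so an assertion written for one does not accept the other. A minimal sketch of that behaviour in plain Python: `load_model` is a hypothetical loader, not Haystack's `get_language_model`, and `raises` is a small helper written for this illustration rather than `pytest.raises`.

```python
from typing import Any, Callable, Type


def load_model(name: str) -> object:
    """Hypothetical loader that signals an unknown model with RuntimeError."""
    known_models = {"example-model": object()}
    if name not in known_models:
        raise RuntimeError(f"Could not load model '{name}'")
    return known_models[name]


def raises(expected: Type[BaseException], func: Callable[..., Any], *args: Any) -> bool:
    """Return True only if calling func(*args) raises an instance of `expected`."""
    try:
        func(*args)
    except expected:
        return True
    except Exception:
        # A different exception type was raised, so the expectation is not met.
        return False
    return False


matches_runtime_error = raises(RuntimeError, load_model, "model_that_doesnt_exist")
matches_os_error = raises(OSError, load_model, "model_that_doesnt_exist")
```

Here `matches_runtime_error` is true and `matches_os_error` is false, which mirrors why the old `OSError` expectation would fail once the loader raises `RuntimeError`.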
2 changes: 1 addition & 1 deletion test/preview/components/audio/test_whisper_local.py
Expand Up @@ -145,7 +145,7 @@ def test_transcribe_stream(self):
@pytest.mark.integration
@pytest.mark.skipif(sys.platform in ["win32", "cygwin"], reason="ffmpeg not installed on Windows CI")
def test_whisper_local_transcriber(self, preview_samples_path):
- comp = LocalWhisperTranscriber(model_name_or_path="medium")
+ comp = LocalWhisperTranscriber(model_name_or_path="medium", whisper_params={"language": "english"})
comp.warm_up()
output = comp.run(
audio_files=[
Expand Down