Commit

feat: 🎸 add support for pdf2image
✅ Closes: #688
severo committed Jan 20, 2023
1 parent d7d1dd7 commit 47d3297
Showing 5 changed files with 46 additions and 3 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/_quality-python.yml
@@ -36,7 +36,7 @@ jobs:
${{ inputs.working-directory }}/poetry.lock
- name: Install packages for workers that use datasets
if: ${{ inputs.is-datasets-worker }}
- run: sudo apt update; sudo apt install -y libicu-dev ffmpeg libavcodec-extra libsndfile1 llvm pkg-config
+ run: sudo apt update; sudo apt install -y libicu-dev ffmpeg libavcodec-extra libsndfile1 llvm pkg-config poppler-utils
- name: Install dependencies
# "poetry env use" is required: https://github.com/actions/setup-python/issues/374#issuecomment-1088938718
run: |
2 changes: 1 addition & 1 deletion .github/workflows/_unit-tests-python.yml
@@ -37,7 +37,7 @@ jobs:
${{ inputs.working-directory }}/poetry.lock
- name: Install packages for workers that use datasets
if: ${{ inputs.is-datasets-worker }}
- run: sudo apt update; sudo apt install -y libicu-dev ffmpeg libavcodec-extra libsndfile1 llvm pkg-config
+ run: sudo apt update; sudo apt install -y libicu-dev ffmpeg libavcodec-extra libsndfile1 llvm pkg-config poppler-utils
- name: Install dependencies
# "poetry env use" is required: https://github.com/actions/setup-python/issues/374#issuecomment-1088938718
run: |
1 change: 1 addition & 0 deletions workers/datasets_based/Dockerfile
@@ -17,6 +17,7 @@ ENV PYTHONFAULTHANDLER=1 \
RUN apt-get update \
&& apt-get install -y build-essential unzip wget python3-dev make \
libicu-dev ffmpeg libavcodec-extra libsndfile1 llvm pkg-config \
+    poppler-utils \
&& rm -rf /var/lib/apt/lists/*

RUN pip install -U --no-cache-dir pip
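The Dockerfile and both CI workflows install `poppler-utils` because pdf2image does not bundle a PDF renderer: it shells out to Poppler's command-line tools (`pdfinfo` to read the page count, `pdftoppm` to rasterize, or `pdftocairo` when `use_pdftocairo=True`). A minimal sketch of a startup check that fails early when the system package is missing could look like this; `missing_poppler_binaries` is a hypothetical helper for illustration and is not part of this commit.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Poppler executables that pdf2image invokes by default:
# pdfinfo for page counts, pdftoppm for rasterizing pages.
POPPLER_BINARIES = ("pdfinfo", "pdftoppm")


def missing_poppler_binaries(
    binaries: Iterable[str] = POPPLER_BINARIES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of required Poppler binaries not found on PATH.

    Hypothetical helper: `which` is injectable so the check can be
    exercised without Poppler actually being installed.
    """
    return [name for name in binaries if which(name) is None]
```

A worker could call `missing_poppler_binaries()` once at startup and refuse to start if the returned list is non-empty, which surfaces a missing `apt-get install` step immediately instead of on the first PDF.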
42 changes: 41 additions & 1 deletion workers/datasets_based/poetry.lock

Some generated files are not rendered by default.

2 changes: 2 additions & 0 deletions workers/datasets_based/pyproject.toml
@@ -23,8 +23,10 @@ lxml = "^4.9.1"
nlp = "^0.4.0"
nltk = "^3.6.5"
openpyxl = "^3.0.9"
pdf2image = "^1.16.2"
py7zr = "^0.20.1"
pydub = "^0.25.1"
pypdf2 = "^3.0.1"
python = "3.9.15"
rarfile = "^4.0"
scikit-learn = "^1.0"
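The commit itself only adds the `pdf2image = "^1.16.2"` dependency. As a minimal sketch of how a worker might use it, assuming it already holds a PDF's raw bytes, the function below renders a bounded page range to PIL images with `pdf2image.convert_from_bytes`. The name `render_pdf_pages` and its defaults are hypothetical. Arguments are validated before pdf2image is imported, so cheap input errors do not depend on the native Poppler install.

```python
from typing import Any, List, Optional


def render_pdf_pages(
    pdf_bytes: bytes,
    first_page: int = 1,
    last_page: Optional[int] = None,
    dpi: int = 100,
) -> List[Any]:
    """Render pages first_page..last_page (1-based, inclusive) to PIL images.

    Hypothetical helper; each page becomes a full bitmap in memory, so
    callers should keep the page range and dpi small.
    """
    if first_page < 1:
        raise ValueError("first_page must be >= 1")
    if last_page is not None and last_page < first_page:
        raise ValueError("last_page must be >= first_page")
    if dpi <= 0:
        raise ValueError("dpi must be positive")

    # Imported lazily: pdf2image needs the poppler-utils binaries at call time.
    from pdf2image import convert_from_bytes

    # Returns a list of PIL.Image.Image objects, one per rendered page.
    return convert_from_bytes(
        pdf_bytes, dpi=dpi, first_page=first_page, last_page=last_page
    )
```

For example, `render_pdf_pages(data, last_page=1)` would return a one-element list holding a thumbnail-sized image of the first page.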
