Commit 35ec21e

fix: decide table extraction (#3090)
This PR adds backward compatibility for the deprecated `pdf_infer_table_structure` parameter. This was a missing piece of #3035, which turned table extraction for PDFs and images off by default after #2588 had turned it on.

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
1 parent 31a53c8 commit 35ec21e
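
To illustrate what backward compatibility for a deprecated flag can look like, here is a minimal sketch. It is not the code from this commit: the helper name `resolve_infer_table_structure`, its signature, and the precedence rule (an explicit current flag wins over the deprecated one) are all assumptions made for illustration.

```python
import warnings
from typing import Optional


def resolve_infer_table_structure(
    infer_table_structure: Optional[bool] = None,
    pdf_infer_table_structure: Optional[bool] = None,
) -> bool:
    """Hypothetical helper: fold the deprecated flag into the current one."""
    if pdf_infer_table_structure is not None:
        warnings.warn(
            "pdf_infer_table_structure is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumed precedence: honor the deprecated flag only when the
        # current flag was not passed explicitly.
        if infer_table_structure is None:
            return bool(pdf_infer_table_structure)
    # With neither flag set, table extraction stays off (the default after #3035).
    return bool(infer_table_structure)
```

Under these assumptions, an old call site that still passes `pdf_infer_table_structure=True` keeps getting table extraction, while callers that pass nothing get the new off-by-default behavior.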

5 files changed: +391 -6 lines changed

CHANGELOG.md (+4 -3)

@@ -1,4 +1,4 @@
-## 0.14.3-dev2
+## 0.14.3-dev3
 
 ### Enhancements
 
@@ -9,9 +9,10 @@
 
 ### Fixes
 
-**Turn off XML resolve entities** Sets `resolve_entities=False` for XML parsing with `lxml`
+* **Add backward compatibility for the deprecated pdf_infer_table_structure parameter**.
+* **Add the missing `form_extraction_skip_tables` argument to the `partition_pdf_or_image` call**.
+* **Turn off XML resolve entities** Sets `resolve_entities=False` for XML parsing with `lxml`
 to avoid text being dynamically injected into the XML document.
-* Add the missing `form_extraction_skip_tables` argument to the `partition_pdf_or_image` call.
 
 * **Chromadb change from Add to Upsert using element_id to make idempotent**
 

test_unstructured/partition/test_auto.py (+1 -1)

@@ -350,7 +350,7 @@ def test_auto_partition_pdf_uses_table_extraction():
         "unstructured.partition.pdf_image.ocr.process_file_with_ocr",
     ) as mock_process_file_with_model:
         partition(filename, pdf_infer_table_structure=True, strategy=PartitionStrategy.HI_RES)
-        assert mock_process_file_with_model.call_args[1]["infer_table_structure"] is False
+        assert mock_process_file_with_model.call_args[1]["infer_table_structure"]
 
 
 def test_auto_partition_pdf_with_fast_strategy(monkeypatch):
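
The updated assertion reads the keyword arguments of the mocked call through `call_args[1]`. The sketch below shows that pattern in isolation with `unittest.mock`. `partition_stub` and its `process_file` parameter are hypothetical stand-ins, not part of the library, and the stub simply forwards the flag rather than reproducing the real partitioning logic.

```python
from unittest import mock


def partition_stub(filename, process_file, pdf_infer_table_structure=False):
    """Hypothetical caller that forwards the deprecated flag downstream."""
    return process_file(filename, infer_table_structure=pdf_infer_table_structure)


# A plain Mock stands in for the patched OCR step used in the real test.
mock_process_file = mock.Mock(return_value=[])
partition_stub("example.pdf", mock_process_file, pdf_infer_table_structure=True)

# call_args is an (args, kwargs) pair for the most recent call,
# so call_args[1] is the dict of keyword arguments.
forwarded_kwargs = mock_process_file.call_args[1]
assert forwarded_kwargs["infer_table_structure"]
```

Asserting truthiness, as the new test line does, passes for any truthy forwarded value; the previous `is False` check pinned the opposite behavior, which is what this commit reverses for callers who pass `pdf_infer_table_structure=True`.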
