[Bug] can NOT ingest pptx files with embedded image to dataprep microservice #1325

Open
2 of 7 tasks
lianhao opened this issue Feb 25, 2025 · 2 comments · May be fixed by #1334
Labels
bug Something isn't working

Comments

@lianhao
Collaborator

lianhao commented Feb 25, 2025

Priority

P2-High

OS type

Ubuntu

Hardware type

Xeon-GNR

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source
  • Other

Deploy method

  • Docker
  • Docker Compose
  • Kubernetes Helm Charts
  • Other

Running nodes

Single Node

What's the version?

commit id: 589587a

Description

Ingesting a pptx file that contains embedded images fails with the error shown in the raw log below.

Reproduce steps

Launch the dataprep microservice, then ingest a pptx file with embedded images. The request fails.

Raw log

[2025-02-24 08:50:22,475] [   ERROR] - opea_dataprep_microservice - Error during dataprep ingest invocation: [Errno 13] Permission denied: './image.jpg'
INFO:     127.0.0.1:34676 - "POST /v1/dataprep/ingest HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
... ...
  File "/home/user/comps/dataprep/src/opea_dataprep_microservice.py", line 68, in ingest_files
    response = await loader.ingest_files(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/comps/dataprep/src/opea_dataprep_loader.py", line 23, in ingest_files
    return await self.component.ingest_files(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/comps/dataprep/src/integrations/redis.py", line 396, in ingest_files
    ingest_data_to_redis(
  File "/home/user/comps/dataprep/src/integrations/redis.py", line 271, in ingest_data_to_redis
    content = document_loader(path)
              ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/comps/dataprep/src/utils.py", line 362, in document_loader
    return load_pptx(doc_path)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/user/comps/dataprep/src/utils.py", line 260, in load_pptx
    with open(img_path, "wb") as f:
         ^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: './image.jpg'

Attachments

No response

@xiguiw
Collaborator

xiguiw commented Feb 25, 2025

Yes, the uploaded files are saved to disk and then loaded by the file parser:

await save_content_to_local_disk(save_path, file)

We could keep the files in memory for this step instead.
But listing files and other features would need to be changed and evaluated.
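
To make the in-memory idea concrete, here is a minimal sketch. It is not the dataprep code. It assumes the uploaded file is already available as `bytes`, and it relies only on the fact that a pptx file is a zip archive whose embedded media usually live under `ppt/media/`. The helper name `read_embedded_images` is hypothetical.

```python
import io
import zipfile


def read_embedded_images(pptx_bytes: bytes) -> dict[str, bytes]:
    """Return {archive member name: raw bytes} for files under ppt/media/.

    Hypothetical helper: nothing is written to disk, the archive is
    opened from an in-memory buffer.
    """
    images = {}
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as archive:
        for name in archive.namelist():
            # Skip directory entries, keep only actual media files.
            if name.startswith("ppt/media/") and not name.endswith("/"):
                images[name] = archive.read(name)
    return images
```

Whether this fits depends on the parsers downstream: anything that expects a file path rather than bytes would still need a writable location.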

@lianhao
Collaborator Author

lianhao commented Feb 25, 2025

I'll propose a PR later to fix this, following the doc/docx file convention, after I fix #1324.
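
For illustration only, one possible direction is sketched below. This is an assumption, not necessarily what the doc/docx convention or the linked PR does. Instead of opening `./image.jpg` relative to the working directory, which raised the `PermissionError` in the log above, the image bytes are written into a private temporary directory that is removed afterwards. `process_image_blob` and `handler` are hypothetical names.

```python
import os
import tempfile


def process_image_blob(image_blob: bytes, handler, ext: str = "jpg"):
    """Write image bytes to a temp dir, run handler(path), then clean up.

    Hypothetical helper: `handler` stands in for whatever consumes the
    image file (for example an OCR step) and receives the file path.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        img_path = os.path.join(tmp_dir, f"image.{ext}")
        with open(img_path, "wb") as f:
            f.write(image_blob)
        # The file only exists while the handler runs.
        return handler(img_path)
```

This still requires the system temp directory to be writable. In a container with a read-only root filesystem, that usually means mounting a writable volume there or pointing `TMPDIR` at one.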

@lianhao lianhao changed the title [Bug] [security hardening] can NOT ingest pptx files to dataprep microservice in security hardened env [Bug] can NOT ingest pptx files to dataprep microservice Feb 25, 2025
@lianhao lianhao changed the title [Bug] can NOT ingest pptx files to dataprep microservice [Bug] can NOT ingest pptx files with embedded image to dataprep microservice Feb 25, 2025
@lianhao lianhao linked a pull request Feb 26, 2025 that will close this issue