Validators cause unwanted fetch of file URL #526

Open
1 of 8 tasks
fmigneault opened this issue Mar 20, 2020 · 0 comments
@fmigneault
Collaborator

fmigneault commented Mar 20, 2020

Description

When a WPS process is executed with an input given as a reference URL (for example a remote JSON file, as in the references below), the format validator actually downloads the file. This happens because the validator reads the file property, which resolves to UrlHandler.file, which in turn performs the request and writes the chunks to a local file.

def validatejson(data_input, mode):

    name = data_input.file

In UrlHandler.file:

pywps/pywps/inout/basic.py

Lines 461 to 467 in d05483d

with open(self._file, 'wb') as f:
    data_size = 0
    for chunk in reference_file.iter_content(chunk_size=1024):
        data_size += len(chunk)
        if int(data_size) > int(max_byte_size):
            raise FileSizeExceeded(error_message)
        f.write(chunk)

This is acceptable for usual data processing, because most processes need the file locally at some point. It should not happen during validation when MODE < STRICT, however, because the download is not needed to check the extension in the name (which is what MODE.SIMPLE attempts to do).
The behavior is valid when MODE >= STRICT, because the full contents are validated, but it is unnecessary for simple mime-type checks.
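A minimal sketch of what a name-only check for MODE.SIMPLE could look like, assuming the reference URL is available to the validator. The functions guess_mime_from_reference and validate_json_simple are hypothetical names for illustration and are not part of PyWPS; only the standard library is used, and no request is made.

```python
import mimetypes
import posixpath
from urllib.parse import urlparse


def guess_mime_from_reference(reference):
    """Guess a MIME type from the name in a URL or path, without network access.

    Hypothetical helper, not a PyWPS function.
    """
    # Keep only the path component so query strings and fragments
    # (e.g. "?token=abc") do not hide the file extension.
    path = urlparse(reference).path or reference
    mime_type, _encoding = mimetypes.guess_type(posixpath.basename(path), strict=False)
    return mime_type


def validate_json_simple(reference):
    """Name-based check only: True when the extension maps to JSON."""
    return guess_mime_from_reference(reference) == "application/json"
```

With this approach the validator only needs the reference string, so a large remote file is never touched until something actually asks for its contents.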

My general use case is that I need to reference https://<somewhere> files and pass them down to further remote processes. Therefore, I want to validate that the file type is correct, but not fetch the files right away (the child process will do so).
As a file could be quite big, fetching it twice (once in the parent process and once in the child process) just for basic validation is wasteful. Since the file is not required locally until I explicitly access the file property, the validator shouldn't fetch it if it doesn't need it.
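The expected behavior can be illustrated with a small offline sketch. LazyReference, has_extension and fake_fetch are hypothetical stand-ins written for this example (they are not the PyWPS UrlHandler or its API); the download is replaced by an injected callable so that the number of fetches can be counted.

```python
import posixpath
from urllib.parse import urlparse


class LazyReference:
    """Hypothetical URL-backed input that downloads only on first access to .file."""

    def __init__(self, url, fetch):
        self.url = url
        self._fetch = fetch      # callable(url) -> local path; injected to stay offline
        self._file = None
        self.fetch_count = 0

    @property
    def file(self):
        # The (simulated) download happens here, and only once.
        if self._file is None:
            self._file = self._fetch(self.url)
            self.fetch_count += 1
        return self._file


def has_extension(ref, extension):
    """Simple validation that reads ref.url only and never touches ref.file."""
    return posixpath.splitext(urlparse(ref.url).path)[1].lower() == extension


def fake_fetch(url):
    # Stand-in for the real download; returns a made-up local path.
    return "/tmp/downloaded.json"


ref = LazyReference("https://example.com/data/big.json", fake_fetch)
ok = has_extension(ref, ".json")       # parent process: validate, no fetch
count_after_validation = ref.fetch_count
local_path = ref.file                  # explicit access: fetch happens now
```

In this sketch the parent can validate the reference and forward ref.url to the child process without ever triggering the download itself.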

Environment

  • operating system:
  • Python version:
  • PyWPS version: 4.2.4
  • source/distribution
      • git clone
      • Debian
      • PyPI
      • zip/tar.gz
      • other (please specify):
  • web server
      • Apache/mod_wsgi
      • CGI
      • other (please specify):

Steps to Reproduce

  • Set up a WPS with a ComplexInput whose format is validated with MODE.SIMPLE.
  • Execute the WPS process with a reference URL as input and observe that the file is fetched before execution even reaches the process handler.

Additional Information

This is part of the requirements for developing an OGC EMS, which dispatches execution to remote ADES.
https://github.com/crim-ca/weaver
