Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check max_content_length more consistently #1513

Closed
tuukkamustonen opened this issue Apr 17, 2019 · 8 comments · Fixed by #2620
Closed

check max_content_length more consistently #1513

tuukkamustonen opened this issue Apr 17, 2019 · 8 comments · Fixed by #2620
Assignees
Milestone

Comments

@tuukkamustonen
Copy link

tuukkamustonen commented Apr 17, 2019

pallets/flask#1200 was about MAX_CONTENT_LENGTH not being checked always, except when calling request.form. request.get_json() and others (request.get_data()?) did not trigger the check. This was fixed in https://github.com/pallets/werkzeug/pull/1126/files (Jun 2017).

However, in another ticket #690 David wrote (Jan 2019) that:

Request.max_content_length isn't particularly useful now, it's better handled at another layer.

This makes me wonder if the feature is supported, or useful, or not? If the check works, why shouldn't we use it?

Even in pallets/flask#2690 (Apr 2018):

All production WSGI and HTTP servers have more robust configuration for request limits that you should use.

nginx, uWSGI et al. might have it, but gunicorn doesn't, ref benoitc/gunicorn#1733 (just a fyi for anyone reading).

In pallets/flask#2690 (Apr 2018):

You must actually attempt to access the data to trigger form parsing and see the error. The content length only applies to form data, not arbitrary post bodies.

Questions:

  1. Why not make the check always, asap, when reading the request? I mean, headers come first right, so if the value given in Content-Length header exceeds MAX_CONTENT_LENGTH, why not raise error right away, before even attempting to read the request body?

    If I set MAX_CONTENT_LENGTH to 1 MB, and client sends 1 MB + 1 byte, will werkzeug read until 1 MB and only then raise error when it sees that there would be 1 byte more? Or will it actually just check the header?

  2. Maybe clarify the docs a bit. Now it says:

    The first one can be used to limit the total content length. For example by setting it to 1024 * 1024 * 16 the request won’t accept more than 16MB of transmitted data.

    The docs don't mention how the check is forced (the questions above). Docs don't even mention what happens when the constraint is violated - will it raise error or gap the stream?

@nioncode
Copy link

nioncode commented Apr 3, 2020

I also just stumbled across this when developing my flask application. I had assumed that simply setting MAX_CONTENT_LENGTH will drop the request as soon as the value in the header Content-Length exceeds this value.

Instead, a 413 was never raised, because request.json (which is all I use from the request) does not even trigger the check, which is quite confusing and completely unexpected. Instead, I now check the Content-Length header manually and raising the 413.

@davidism
Copy link
Member

davidism commented Jan 21, 2021

Reviewing #690, I think it's probably time to deprecate this attribute. As I pointed out there, limiting the body size is better handled by the HTTP (or WSGI) server.

We could check the Content-Length header, but when should we do that? At the beginning of the stream property? In the LimitedStream wrapper? What if we're streaming content with wsgi.input_terminated (which all major WSGI servers support now), we're not using a wrapper in that case, but this property implies we should still be raising an error if more than that is read.

Actually, unless I'm misreading the code, Werkzeug's form parser right now only seems to do that basic check. If content_length isn't set because this is a streaming request, it will happily parse out until the stream is terminated. In practice this isn't an issue because either the WSGI server implements wsgi.input_terminated or Werkzeug falls back to an empty stream.

Then there's the problem that the length of a JSON body can be very deceptive. It's possible to craft relatively small bodies that parse to huge memory sizes, and that's not something that we (or any part of the server pipeline, probably) would be able to detect. If that's a concern, you should subclass Request and override get_json to use a streaming parser, but that's outside our scope.

Overall, I think having this property in Werkzeug is more misleading than helpful. While we could move the rudimentary check to when request.stream is accessed, it's only going to apply in certain cases compared to what the server could do.

@pgjones
Copy link
Member

pgjones commented Jan 21, 2021

As I pointed out there, limiting the body size is better handled by the HTTP (or WSGI) server.

I don't think it should be done at the server level, as a user may want to have a higher limit on some routes compared with others. For example a route that purely consumes any data (without building up in memory) may want no limit, or an authorized upload route may want a higher limit than unauthorized. I think this is quite feasible with streamed bodies, especially async consumption.

@davidism
Copy link
Member

davidism commented Jan 21, 2021

So the changes would be something like:

  • if content_length is available, do a quick check in request.stream against max_content_length to raise 413
  • remove length check from form parser
  • modify get_input_stream to take max_length
  • pass min(content_length, max_length) to LimitedStream
  • users could make request.max_content_length a property that returns different values based on request.endpoint (in Flask)

The one problem here is that LimitedStream currently returns empty strings when trying to read past content_length. Would we want to modify it further so it does that for content_length but raises 413 for max_content_length? Also, it would need to be used when wsgi.input_terminated is supported too, in order to still enforce the max length even if the stream itself is unlimited.

@danieltuzes
Copy link

danieltuzes commented Jan 24, 2021

@greyli As you requested, I proved further details in connection with the ticket here.With this setup, it is working properly (OS: Windows 10 Pro 64-bit):

(base) PS C:\Users\daniel\source\collection> conda list
# packages in environment at C:\Users\daniel\miniconda3:
#
# Name                    Version                   Build  Channel
astroid                   2.4.2                    py38_0         
autopep8                  1.5.4                      py_0         
backcall                  0.2.0                      py_0         
ca-certificates           2021.1.19            haa95532_0         
certifi                   2020.12.5        py38haa95532_0         
cffi                      1.14.0           py38h7a1dbc1_0         
chardet                   3.0.4                 py38_1003         
click                     7.1.2              pyhd3eb1b0_0         
colorama                  0.4.4                      py_0         
conda                     4.9.2            py38haa95532_0         
conda-package-handling    1.6.1            py38h62dcd97_0         
console_shortcut          0.1.1                         4         
cryptography              2.9.2            py38h7a1dbc1_0         
decorator                 4.4.2                      py_0         
flask                     1.1.2                      py_0         
gevent                    21.1.1           py38h2bbff1b_1         
greenlet                  0.4.17           py38he774522_0         
idna                      2.9                        py_1         
ipykernel                 5.3.4            py38h5ca1d4c_0         
ipython                   7.19.0           py38hd4e2768_0         
ipython_genutils          0.2.0                    py38_0         
isort                     5.6.4                      py_0         
itsdangerous              1.1.0                      py_0         
jedi                      0.17.2                   py38_0
jinja2                    2.11.2             pyhd3eb1b0_0
jupyter_client            6.1.7                      py_0
jupyter_core              4.6.3                    py38_0
lazy-object-proxy         1.4.3            py38he774522_0
libsodium                 1.0.18               h62dcd97_0
markupsafe                1.1.1            py38he774522_0
mccabe                    0.6.1                    py38_1
menuinst                  1.4.16           py38he774522_0
openssl                   1.1.1i               h2bbff1b_0
parso                     0.7.0                      py_0
pickleshare               0.7.5                 py38_1000
pip                       20.0.2                   py38_3
powershell_shortcut       0.0.1                         3
prompt-toolkit            3.0.8                      py_0
pycodestyle               2.6.0                      py_0
pycosat                   0.6.3            py38he774522_0
pycparser                 2.20                       py_0
pygments                  2.7.2              pyhd3eb1b0_0
pylint                    2.6.0                    py38_0
pyopenssl                 19.1.0                   py38_0
pysocks                   1.7.1                    py38_0
python                    3.8.3                he1778fa_0
python-dateutil           2.8.1                      py_0
pywin32                   227              py38he774522_1
pyzmq                     19.0.2           py38ha925a31_1
requests                  2.23.0                   py38_0
ruamel_yaml               0.15.87          py38he774522_0
setuptools                46.4.0                   py38_0
six                       1.14.0                   py38_0
sqlite                    3.31.1               h2a8f88b_1
toml                      0.10.1                     py_0
tornado                   6.0.4            py38he774522_1
tqdm                      4.46.0                     py_0
traitlets                 5.0.5                      py_0
urllib3                   1.25.8                   py38_0
vc                        14.1                 h0510ff6_4
vs2015_runtime            14.16.27012          hf0eaf9b_1
wcwidth                   0.2.5                      py_0
werkzeug                  1.0.1                      py_0
wheel                     0.34.2                   py38_0
win_inet_pton             1.1.0                    py38_0
wincertstore              0.2                      py38_0
wrapt                     1.11.2           py38he774522_0
yaml                      0.1.7                hc54c509_2
zeromq                    4.3.2                ha925a31_3
zlib                      1.2.11               h62dcd97_4
zope                      1.0                      py38_1
zope.event                4.5.0                    py38_0
zope.interface            5.2.0            py38h2bbff1b_0

@davidism davidism changed the title Clarify how MAX_CONTENT_LENGTH works check max_content_length more consistently Feb 11, 2021
@davidism davidism self-assigned this Mar 10, 2023
@davidism davidism added this to the 2.3.0 milestone Mar 10, 2023
@davidism
Copy link
Member

Following up on my previous comment: If a max content length is set, it should wrap the stream regardless of if wsgi.input_terminated is set, but that wrapper should raise a RequestEntityTooLarge error if the max is reached, rather than returning empty as the normal content length limit does.

@davidism
Copy link
Member

davidism commented Mar 10, 2023

On second thought, it's probably a bad idea to raise RequestEntityTooLarge in the middle of reading the stream, since that's not what code consuming the stream would expect. It might leave things in an intermediate state. It's probably better to have the stream read empty bytes when max_content_length is met, the same way additional reads past Content-Length would, that's how io.read is supposed to work.

@davidism
Copy link
Member

We already raise ClientDisconnected if the stream returns empty bytes before reaching the length limit. Raising RequestEntityTooLarge if we reach a max length matches that, but returning empty matches the behavior of the normal length limit.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants