Skip to content

Commit

Permalink
fix(media): let media handlers deserialize empty media (#1758)
Browse files Browse the repository at this point in the history
* fear(media): let media handlers deserialize an empty media.

* chore(pep8): fix style errors

* chore(coverage): cover missing case

* chore(typo): remove unused object

* chore: update implementation based on feedback

this is just a wip version to gather feedback

* feat: update implementation of handler to raise consistent errors

* test: complete request media tests

* docs: updated docs and review comments

* chore: MultipartParseError is now a subclass of MediaMalformedError

* chore: improvement as per review

* chore: allow overriding empty error

* test: don't import msgpack globally

* test: restore coverage to 100%

* docs: improve documentation as per review comments

* refactor: do not cache the default when empty value in get_media

* chore: make coverage happy again

* docs: improve docs after feedback

Co-authored-by: Vytautas Liuolia <vytautas.liuolia@gmail.com>
Co-authored-by: Kurt Griffiths <mail@kgriffs.com>
  • Loading branch information
3 people authored Dec 18, 2020
1 parent 26aa528 commit c7ab8da
Show file tree
Hide file tree
Showing 17 changed files with 501 additions and 136 deletions.
18 changes: 18 additions & 0 deletions docs/_newsfragments/1589.breakingchange.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
General refactoring of internal media handler:

* Deserializing an empty body with a handler that does not support it will
raise :class:`falcon.MediaNotFoundError`, and will be rendered as a
``400 Bad Request`` response. This error may be suppressed by passing
a default value to ``get_media`` to be used in case of empty body.
See also :meth:`Request.get_media` for details.
Previously ``None`` was returned in all cases without calling the handler.
* Exceptions raised by the handlers are wrapped as
:class:`falcon.MediaMalformedError`, and will be rendered as a
``400 Bad Request`` response.
* Subsequent calls to :meth:`Request.get_media` or :attr:`Request.media` will
re-raise the same exception, if the first call ended in an error, unless the
exception was a :class:`falcon.MediaNotFoundError` and a default value is
passed to the ``default_when_empty`` attribute of the current invocation.
Previously ``None`` was returned.

External handlers should update their logic to align to the internal Falcon handlers.
6 changes: 6 additions & 0 deletions docs/api/errors.rst
Original file line number Diff line number Diff line change
Expand Up @@ -211,3 +211,9 @@ Predefined Errors

.. autoclass:: falcon.HTTPNetworkAuthenticationRequired
:members:

.. autoclass:: falcon.MediaNotFoundError
:members:

.. autoclass:: falcon.MediaMalformedError
:members:
24 changes: 24 additions & 0 deletions docs/api/media.rst
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,30 @@ middleware. Here is an example of how this can be done:
resp.content_type = req.accept
Exception Handling
------------------

Version 3 of Falcon updated how the handling of exceptions raised by handlers behaves:

* Falcon lets the media handler try to deserialized an empty body. For the media types
that don't allow empty bodies as a valid value, such as ``JSON``, an instance of
:class:`falcon.MediaNotFoundError` should be raised. By default, this error
will be rendered as a ``400 Bad Request`` response to the client.
This exception may be suppressed by passing a value to the ``default_when_empty``
argument when calling :meth:`Request.get_media`. In this case, this value will
be returned by the call.
* If a handler encounters an error while parsing a non-empty body, an instance of
:class:`falcon.MediaMalformedError` should be raised. The original exception, if any,
is stored in the ``__cause__`` attribute of the raised instance. By default, this
error will be rendered as a ``400 Bad Request`` response to the client.

If any exception was raised by the handler while parsing the body, all subsequent invocations
of :meth:`Request.get_media` or :attr:`Request.media` will result in a re-raise of the same
exception, unless the exception was a :class:`falcon.MediaNotFoundError` and a default value
is passed to the ``default_when_empty`` attribute of the current invocation.

External handlers should update their logic to align to the internal Falcon handlers.

.. _custom_media_handlers:

Replacing the Default Handlers
Expand Down
27 changes: 13 additions & 14 deletions falcon/asgi/multipart.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,9 +60,10 @@ async def get_text(self):
charset = options.get('charset', self._parse_options.default_charset)
try:
return (await self.get_data()).decode(charset)
except (ValueError, LookupError):
except (ValueError, LookupError) as err:
raise MultipartParseError(
description='invalid text or charset: {}'.format(charset))
description='invalid text or charset: {}'.format(charset)
) from err

data = property(get_data)
media = property(get_media)
Expand Down Expand Up @@ -112,20 +113,17 @@ async def _iterate_parts(self):
# end of a multipart form.
break
elif separator:
raise MultipartParseError(
description='unexpected form structure')
raise MultipartParseError(description='unexpected form structure')

except DelimiterError:
raise MultipartParseError(
description='unexpected form structure')
except DelimiterError as err:
raise MultipartParseError(description='unexpected form structure') from err

headers = {}
try:
headers_block = await stream.read_until(
_CRLF_CRLF, max_headers_size, consume_delimiter=True)
except DelimiterError:
raise MultipartParseError(
description='incomplete body part headers')
except DelimiterError as err:
raise MultipartParseError(description='incomplete body part headers') from err

for line in headers_block.split(_CRLF):
name, sep, value = line.partition(b': ')
Expand All @@ -142,9 +140,9 @@ async def _iterate_parts(self):
# bodies have been discovered.
if name == b'content-transfer-encoding' and value != b'binary':
raise MultipartParseError(
description=(
'the deprecated Content-Transfer-Encoding '
'header field is unsupported'))
description=('the deprecated Content-Transfer-Encoding '
'header field is unsupported')
)
# NOTE(vytas): RFC 7578, section 4.8.
# Other header fields MUST NOT be included and MUST be
# ignored.
Expand All @@ -154,7 +152,8 @@ async def _iterate_parts(self):
remaining_parts -= 1
if remaining_parts < 0 < self._parse_options.max_body_part_count:
raise MultipartParseError(
description='maximum number of form body parts exceeded')
description='maximum number of form body parts exceeded'
)

yield BodyPart(stream.delimit(delimiter), headers,
self._parse_options)
40 changes: 37 additions & 3 deletions falcon/asgi/request.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@

__all__ = ['Request']

_UNSET = falcon.request._UNSET


class Request(falcon.request.Request):
"""Represents a client's HTTP request.
Expand Down Expand Up @@ -390,7 +392,8 @@ def __init__(self, scope, receive, first_event=None, options=None):
self.method = 'GET' if self.is_websocket else scope['method']

self.uri_template = None
self._media = None
self._media = _UNSET
self._media_error = None

# TODO(kgriffs): ASGI does not specify whether 'path' may be empty,
# as was allowed for WSGI.
Expand Down Expand Up @@ -696,7 +699,7 @@ def netloc(self):

return netloc_value

async def get_media(self):
async def get_media(self, default_when_empty=_UNSET):
"""Return a deserialized form of the request stream.
The first time this method is called, the request stream will be
Expand All @@ -712,17 +715,40 @@ async def get_media(self):
See also :ref:`media` for more information regarding media handling.
Note:
When ``get_media`` is called on a request with an empty body,
Falcon will let the media handler try to deserialize the body
and will return the value returned by the handler or propagate
the exception raised by it. To instead return a different value
in case of an exception by the handler, specify the argument
``default_when_empty``.
Warning:
This operation will consume the request stream the first time
it's called and cache the results. Follow-up calls will just
retrieve a cached version of the object.
Args:
default_when_empty: Fallback value to return when there is no body
in the request and the media handler raises an error
(like in the case of the default JSON media handler).
By default, Falcon uses the value returned by the media handler
or propagates the raised exception, if any.
This value is not cached, and will be used only for the current
call.
Returns:
media (object): The deserialized media representation.
"""

if self._media is not None or self.stream.eof:
if self._media is not _UNSET:
return self._media
if self._media_error is not None:
if default_when_empty is not _UNSET and isinstance(
self._media_error, errors.MediaNotFoundError
):
return default_when_empty
raise self._media_error

handler = self.options.media_handlers.find_by_media_type(
self.content_type,
Expand All @@ -735,6 +761,14 @@ async def get_media(self):
self.content_type,
self.content_length
)
except errors.MediaNotFoundError as err:
self._media_error = err
if default_when_empty is not _UNSET:
return default_when_empty
raise
except Exception as err:
self._media_error = err
raise
finally:
if handler.exhaust_stream:
await self.stream.exhaust()
Expand Down
102 changes: 102 additions & 0 deletions falcon/errors.py
Original file line number Diff line number Diff line change
Expand Up @@ -2288,6 +2288,108 @@ def __init__(self, param_name, headers=None, **kwargs):
**kwargs
)


class MediaNotFoundError(HTTPBadRequest):
"""400 Bad Request.
Exception raised by a media handler when trying to parse an empty body.
Note:
Some media handlers, like the one for URL-encoded forms, allow an
empty body. In these cases this exception will not be raised.
Args:
media_type (str): The media type that was expected.
Keyword Args:
headers (dict or list): A ``dict`` of header names and values
to set, or a ``list`` of (*name*, *value*) tuples. Both *name* and
*value* must be of type ``str`` or ``StringType``, and only
character values 0x00 through 0xFF may be used on platforms that
use wide characters.
Note:
The Content-Type header, if present, will be overridden. If
you wish to return custom error messages, you can create
your own HTTP error class, and install an error handler
to convert it into an appropriate HTTP response for the
client
Note:
Falcon can process a list of ``tuple`` slightly faster
than a ``dict``.
href (str): A URL someone can visit to find out more information
(default ``None``). Unicode characters are percent-encoded.
href_text (str): If href is given, use this as the friendly
title/description for the link (default 'API documentation
for this error').
code (int): An internal code that customers can reference in their
support request or to help them when searching for knowledge
base articles related to this error (default ``None``).
"""

def __init__(self, media_type, **kwargs):
super().__init__(
title='Invalid {0}'.format(media_type),
description='Could not parse an empty {0} body'.format(media_type),
**kwargs
)


class MediaMalformedError(HTTPBadRequest):
"""400 Bad Request.
Exception raised by a media handler when trying to parse a malformed body.
The cause of this exception, if any, is stored in the ``__cause__`` attribute
using the "raise ... from" form when raising.
Args:
media_type (str): The media type that was expected.
Keyword Args:
headers (dict or list): A ``dict`` of header names and values
to set, or a ``list`` of (*name*, *value*) tuples. Both *name* and
*value* must be of type ``str`` or ``StringType``, and only
character values 0x00 through 0xFF may be used on platforms that
use wide characters.
Note:
The Content-Type header, if present, will be overridden. If
you wish to return custom error messages, you can create
your own HTTP error class, and install an error handler
to convert it into an appropriate HTTP response for the
client
Note:
Falcon can process a list of ``tuple`` slightly faster
than a ``dict``.
href (str): A URL someone can visit to find out more information
(default ``None``). Unicode characters are percent-encoded.
href_text (str): If href is given, use this as the friendly
title/description for the link (default 'API documentation
for this error').
code (int): An internal code that customers can reference in their
support request or to help them when searching for knowledge
base articles related to this error (default ``None``).
"""

def __init__(self, media_type, **kwargs):
super().__init__(
title='Invalid {0}'.format(media_type), description=None, **kwargs
)
self._media_type = media_type

@property
def description(self):
msg = 'Could not parse {} body'.format(self._media_type)
if self.__cause__ is not None:
msg += ' - {}'.format(self.__cause__)
return msg

@description.setter
def description(self, value):
pass

# -----------------------------------------------------------------------------
# Helpers
# -----------------------------------------------------------------------------
Expand Down
2 changes: 1 addition & 1 deletion falcon/media/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,7 @@ async def deserialize_async(self, stream, content_type, content_length):
return self.deserialize(io.BytesIO(data), content_type, content_length)

exhaust_stream = False
"""Whether to exhaust the WSGI input stream upon finishing deserialization.
"""Whether to exhaust the input stream upon finishing deserialization.
Exhausting the stream may be useful for handlers that do not necessarily
consume the whole stream, but the deserialized media object is complete and
Expand Down
28 changes: 13 additions & 15 deletions falcon/media/json.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,10 @@ class JSONHandler(BaseHandler):
alternative library. Good options in this respect include `orjson`,
`python-rapidjson`, and `mujson`.
This handler will raise a :class:`falcon.MediaNotFoundError` when attempting
to parse an empty body, or a :class:`falcon.MediaMalformedError`
if an error happens while parsing the body.
Note:
If you are deploying to PyPy, we recommend sticking with the standard
library's JSON implementation, since it will be faster in most cases
Expand Down Expand Up @@ -80,25 +84,19 @@ def __init__(self, dumps=None, loads=None):
self.serialize = self._serialize_b
self.serialize_async = self._serialize_async_b

def deserialize(self, stream, content_type, content_length):
def _deserialize(self, data):
if not data:
raise errors.MediaNotFoundError('JSON')
try:
return self.loads(stream.read().decode())
return self.loads(data.decode())
except ValueError as err:
raise errors.HTTPBadRequest(
title='Invalid JSON',
description='Could not parse JSON - {0}'.format(err)
)
raise errors.MediaMalformedError('JSON') from err

async def deserialize_async(self, stream, content_type, content_length):
data = await stream.read()
def deserialize(self, stream, content_type, content_length):
return self._deserialize(stream.read())

try:
return self.loads(data.decode())
except ValueError as err:
raise errors.HTTPBadRequest(
title='Invalid JSON',
description='Could not parse JSON - {0}'.format(err)
)
async def deserialize_async(self, stream, content_type, content_length):
return self._deserialize(await stream.read())

def _serialize_s(self, media, content_type):
return self.dumps(media).encode()
Expand Down
Loading

0 comments on commit c7ab8da

Please sign in to comment.