Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: redo validation by content type #760

Closed
wants to merge 11 commits into from

Conversation

dtkav
Copy link
Collaborator

@dtkav dtkav commented Nov 8, 2018

This is a WIP, I'd like to make this a bit more pluggable, so that it is straightforward to add content-type validation handlers.

Related to #655 #755

Changes proposed in this pull request:

  • remove all_json requirement that caused any non-json content types to preclude validating json

@dtkav dtkav force-pushed the content_type_validation branch from e7938e5 to 2a50660 Compare November 8, 2018 02:25
@spec-first spec-first deleted a comment Nov 8, 2018
@spec-first spec-first deleted a comment Nov 8, 2018
if error:
return error

elif self.strict_validation:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a global setting it means that if any endpoints accept non handled content types (e.g. binary uploads) then strict validation would have to be disabled for all endpoints.

Copy link
Collaborator Author

@dtkav dtkav Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I'm still thinking through the implications. The strict flag seems pretty nebulous. AFAICT it mostly checks for extra form fields, and errors if those exist.

So far this diff makes connexion much more strict, which I personally think is a good thing, but others might disagree (and it may cause breakage for people that have been relying on connexion's past flexibility).

I think personally as a user I'd prefer to explicitly have a content handler for every content type that I accept. Part of my goal is to make it straightforward to register new content handlers.

We could also provide more 'batteries included' beyond just json and formdata. For example, I could add a default octet stream content handler.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right I see. If it is straight forward to add a custom content handler that would be ideal for my case in #592 - especially if the custom handler/validator can process as a stream.

@dtkav dtkav changed the title WIP: redo validation by content type WIP DEMO: redo validation by content type Nov 9, 2018
@dtkav dtkav force-pushed the content_type_validation branch 2 times, most recently from bd6d8a6 to de1a3df Compare November 9, 2018 06:21
@spec-first spec-first deleted a comment Nov 9, 2018
@spec-first spec-first deleted a comment Nov 9, 2018
@spec-first spec-first deleted a comment Nov 10, 2018
@dtkav dtkav force-pushed the content_type_validation branch from c7dfabf to d8429d6 Compare November 10, 2018 01:16
@dtkav dtkav changed the title WIP DEMO: redo validation by content type DEMO: redo validation by content type Nov 10, 2018
@spec-first spec-first deleted a comment Nov 10, 2018
@spec-first spec-first deleted a comment Nov 13, 2018
@dtkav dtkav force-pushed the content_type_validation branch from 29342ff to f87b84f Compare November 15, 2018 02:50
@dtkav
Copy link
Collaborator Author

dtkav commented Nov 15, 2018

hey @jmcs - I'd love your thoughts on this diff/demo.
This is a demo of a pluggable interface for handling different content-types.
While it is a fully working PR, I'm looking for concept and architectural level feedback right now.

Copy link
Contributor

@kornicameister kornicameister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments along the way.

@@ -21,6 +20,7 @@ def __init__(self,
self.json_getter = json_getter
self.files = files
self.context = context if context is not None else {}
self.content_type = self.headers.get("Content-Type", "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's a derived thingy, perhaps a property and returning None if a content-type not set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

from ..types import coerce_type
from ..utils import is_null

logger = logging.getLogger('connexion.serialization.deserializers')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't __name__ be sufficient here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why, but all of the other getLogger calls in connexion are hard-coded

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind, I might go through everything and clean it up for ya. Not sure when, but perhaps I can find some time today's evening or tomorrow.

)


KNOWN_CONTENT_TYPES = [
Copy link
Contributor

@kornicameister kornicameister Feb 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit, you can define all constant lista as tuples. That way you limit the ammount of used memory + load time is a bit faster. Tuples are cached by python runtime and once defined they are kept as they were without a change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might skip creating KNOWN_CONTENT_TYPES and use ContentHandler.__subclasses__() instead. (see my comment on connexion/decorators/validation.py)

@harph
Copy link

harph commented Sep 2, 2019

Any updates on the status of this?

Copy link
Contributor

@cognifloyd cognifloyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice direction. Lets finish it!


class JSONContentHandler(ContentHandler):
name = "application/json"
regex = re.compile(r'^application\/json.*|^.*\+json$')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_json_mimetype checks for application/json or application/.*\+json but this checks for application/json or .*+json. They should probably both check for the same thing. So, maybe is_json_mimetype should do the equivalent of this regex. It makes sense to be even more lenient and treat anything that ends in +json as json.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several places that said validation can occur for text/plain as well as application/json - does this need to check for that too?

)


KNOWN_CONTENT_TYPES = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might skip creating KNOWN_CONTENT_TYPES and use ContentHandler.__subclasses__() instead. (see my comment on connexion/decorators/validation.py)

de(self.validator,
self.schema,
self.strict_validation,
self.is_null_value_valid) for de in KNOWN_CONTENT_TYPES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.is_null_value_valid) for de in KNOWN_CONTENT_TYPES
self.is_null_value_valid) for de in ContentHandler.__subclasses__()

that way all you have to do to register a new one is subclass ContentHandler and make sure to import your ContentHandler subclass before initializing Connexion (or more accurately before whatever point it is that creates the RequestBodyValidator).

Actually, I think I would turn this into a classmethod on ContentHandler so that it can look for grandchildren not just children classes as described here: https://stackoverflow.com/a/3862957

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. I see register_content_handler below for registering additional content types. I would still like to see a way to replace default content handlers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL!

Comment on lines 69 to 76
if len(matches) > 1:
logger.warning("Content could be handled by multiple validators")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want them to be able to vote? ie more than just a regex? That way someone could register a handler that should handle (for example) json with a higher priority than the internal json validator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that the regex can't handle is None - what if a content handler wants to handle content_type = None (more likely to be used for serialization than deserialization/validation, assuming the same ContentHandlers are used for both processes).

- query params
- form params
- body params
- path params ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the ? here?


logger = logging.getLogger('connexion.serialization.deserializers')
logger = logging.getLogger('connexion.content_types')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we just do logging.getLogger(__name__) here (and everywhere else)? That's what I do in my other projects and it prevents pointless work when refactoring.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think I was planning to stick to repo convention for now, and then doing a single PR to fix them all. Happy to do it incrementally though.

#time_start = 1010
#resp = app_client.get(
# '/v1.0/nullable-parameters?time_start={}'.format(time_start))
#assert json.loads(resp.data.decode('utf-8', 'replace')) == time_start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you comment this out? I see you did that with the empty_object_body test, but it looks like this was testing something different with parameters. Are you planning to adjust parameter validation to use ContentHandlers as well (that doesn't make complete sense to me)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but I think it was a mistake.

Comment on lines +346 to +365
request comes in -> connexionrequest (framework generic)
API metrics recorded
API security applied

arrays are deserialized (spec dependent)
- query params
- form params
- path params

body is deserialized (spec dependent, content dependent)

validation is applied (spec dependent, content dependent)
- query params
- form params
- body params
- path params ?

parameter_to_arg (handler, what to call it)

json serializer is applied
response is validated if json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the plan, not what it currently does, right?

Mapping this out (current -> planned)

  • decorators.metrics -> API metrics
  • decorators.security -> API security (this is also spec dependent for oauth scopes and security schemes)
  • decorators.uri_parsing -> query & path params deserialiation
  • part of validation? -> form params
  • part of validation -> body deserialization
  • decorators.validation -> request validation: query, form, body, path
  • decorators.parameter -> parameter_to_arg (I would call this an argument mapper)
  • part of api -> response serializer (on content type handlers? Need a way to register custom like request validators)
  • part of validation -> response validation (on content type handlers? Need a way to register custom like request validators)

Is that what you're thinking?

def deserialize(self, request):
return request.body

def validate(self, request):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def validate(self, request):
def validate_request(self, request):

Make the ContentHandlers useful in more contexts.

ie, we could add a validate_response_body() as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea

This was referenced Dec 7, 2019
@dtkav dtkav force-pushed the content_type_validation branch from 650abaa to 171cbb3 Compare December 12, 2019 04:47
@dtkav dtkav changed the title DEMO: redo validation by content type WIP: redo validation by content type Dec 13, 2019
@bilalshaikh42
Copy link

Hi @dtkav,
I was wondering what the status of this was and if there was a way to contribute?

@AjaySharma2209
Copy link

Very basic question. So, in case if application/xml content type is used, is there any way of parsing xml at all. I am getting below error.
cannot convert dictionary update sequence element #0 to a sequence

@Song2017
Copy link

Hey,
is it possible that request content only to be pass-through if content-type is not application/json, body content could be validated by developer self

@RobbeSneyders
Copy link
Member

Closing this one as #1588 and #1591 tackle this for request and response validation respectively.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants