Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG][PYTHON] BOUNTY Unwanted string-to-date coercion in allOf #10812

Closed
6 tasks done
mariodsantana opened this issue Nov 8, 2021 · 14 comments
Closed
6 tasks done

[BUG][PYTHON] BOUNTY Unwanted string-to-date coercion in allOf #10812

mariodsantana opened this issue Nov 8, 2021 · 14 comments

Comments

@mariodsantana
Copy link

mariodsantana commented Nov 8, 2021

Bug Report Checklist

  • Have you provided a full/minimal spec to reproduce the issue?
  • Have you validated the input using an OpenAPI validator (example)?
  • Have you tested with the latest master to confirm the issue still exists?
  • Have you searched for related issues/PRs?
  • What's the actual output vs expected output?
  • [Optional] Sponsorship to speed up the bug fix or feature request: $250 $1,000 (see below!)
Description

Using -g python on a spec that has a schema item with a string value, it's usually an integer but it doesn't have to be. When it's an integer that can be parsed as a date, it gets turned into a date. Sometimes. The way I've been able to consistently replicate it is with a composite object using allOf.

In the spec below, there are two schemas called thingone and thingtwo, each an object with two string properties, and another schema called twothings which is an allOf composition of thingone and thingtwo. When processing a twothings response, where all the string properties are of the form YYYYMMDD, the string properties that belong to thingone are coerced into dates.

openapi-generator version

5.3.1-SNAPSHOT using Docker

OpenAPI declaration file content or url

https://gist.github.com/mariodsantana/9944ee3e4d669771bccaeb059ce863a0

Generation Details
docker run -u $UID --rm -v $PWD:/local openapitools/openapi-generator-cli \
    generate -i /local/openapi.yaml -g python -o /local \
    --additional-properties packageName=fake \
    --additional-properties pythonAttrNoneIfUnset=true \
    --additional-properties generateSourceCodeOnly=true
Steps to reproduce
  1. Copy the openapi spec below
  2. Generate client code
  3. Run mocking server
  4. Call the get_twothings API
  5. Notice that some of the results are dates

I've written a little bash script to do all of the above, see https://gist.github.com/mariodsantana/28477b908e0d50e17cbcddb52161d790

Related issues/PRs

Obviously there's a lot of work going on with composed schemas and allOf &co, but in my ignorance I couldn't find anything that seemed actually related to this.

Suggest a fix

No real idea. I can fix the immediate issue with a small patch to the generated model_utils.py but then that keeps legitimate dates from being converted from strings.

I'm willing to sponsor this work to the tune of $250. Maybe more if it turns out to be a complicated issue. ETA: It's turned out to be a complicated issue. I've upped the bounty to $1,000. :)

@mariodsantana
Copy link
Author

mariodsantana commented Nov 9, 2021

Digging in a bit, it seems like the steps causing this are:

  1. you get some data for twothings, containing some properties that "belong" to thingone, and other properties that "belong" to thingtwo
  2. the composite model validates the received data in the context of thingone. At this time, the properties that "belong" to thingtwo are treated as additionalProperties - unknown filthy vagrants subject to whatever cleanup and coercion is on offer. (Because the addtionalProperties type is true, which allows any kind of data in an additionalProperty, so the validator tries to be helpful and upconvert whenever possible.)
  3. the result is that the string properties that "belong" to thingone are kept as strings, while the string properties that "belong" to thingtwo are upconverted if they match a date format or etc.
  4. the composite model then validates the received data in the context of thingtwo. This time, it's the properties that "belong" to thingone that get upconverted.
  5. the whole process returns the data as validated by the last subschema to process the data. So you can fiddle with which half of the string properties get upconverted by changing the names of the subschemas. (Presumaby this changes the order in which they are fetched from whatever python dict they are stored, or something.)

I can keep the upconversion from happening by specifying additionalPropertiesType in each subschema, like this:

components:
  schemas:
    thingone:
      additionalProperties:
        type: string
      type: object
      properties:
        propone:
          type: string

    thingtwo:
      additionalProperties:
        type: string
      type: object
      properties:
        proptwo:
          type: string

    boththings:
      allOf:
        - $ref: "#/components/schemas/thingone"
        - $ref: "#/components/schemas/thingtwo"

That works for this toy problem ... but in my real spec, I want to mix thingone and thingtwo with other schemas that contain dates and booleans and all kinds of stuff. So I don't want to limit what types those other things can be.

I think my options, if this turns out to be a non-bug, are:

  1. Isolate sub-schemas. I can wrap each subschema in its own key within the parent, like this:
    boththings:
      type: object
        properties:
          thingone:
            $ref: "#/components/schemas/thingone"
          thingtwo:
            $ref: "#/components/schemas/thingtwo"

I already have a lot of code that assumes boththings.propone rather than boththings.thingone.propone, so I'd like to avoid this if possible.

  1. Manual mix-ins. I can manually put propone and proptwo inside of boththings. This is will make my api spec ugly, but it would be a lot less work than option 1
  2. Teach the python code generator to let the sub-schema that "owns" the property to specify its type. In my ignorance, this sounds like the best answer...? Though I may have to go with option 2 just for expediency for now.

@mariodsantana
Copy link
Author

Wow, it's more work than I thought, to implement option 2 from my previous comment - the API implementation leverages the fact that thingone and thingtwo are different objects in order to identify which data "belongs" to each and process it differently.

I'm increasing my sponshorship to $1,000 for implementing option 3 in the previous comment.

@gavinmh
Copy link

gavinmh commented Nov 11, 2021

I am also affected by this defect. I am working around it by applying this patch to model_utils.py after generation:

@@ -1112,7 +1112,14 @@ def remove_uncoercible(required_types_classes, current_item, spec_property_namin
         if must_convert and class_pair in COERCIBLE_TYPE_PAIRS[spec_property_naming]:
             results_classes.append(required_type_class)
         elif class_pair in UPCONVERSION_TYPE_PAIRS:
-            results_classes.append(required_type_class)
+            if class_pair in {(str, date), (str, datetime)}:
+                try:
+                    parse(current_item)
+                    results_classes.append(required_type_class)
+                except Exception as e:
+                    pass
+            else:
+                results_classes.append(required_type_class)
     return results_classes

I don't think this is a desirable solution and my use-case may be narrower than others.

@mariodsantana
Copy link
Author

mariodsantana commented Nov 11, 2021

Interesting. I've implemented my option 2 from a couple of comments back, so my product works for now and I'm too busy with launch preparations to really dig into this right now. But FWIW, my original fix was also to patch model_utils.py after generation, though differently than you. My patch just removed the (str, date) and (str, datetime) tuples from the COERCIBLE_TYPE_PAIRS straight off. [ETA: or was it from UPCONVERSION_TYPE_PAIRS? I don't recall.]
Unfortunately, this meant that no actual dates would ever get unmarshalled from their string representation in transit, so it was a no-go for me...

I'm still hungry for a real fix here, and I do like to give back when I can, so the bounty will stay up for as long as I can keep my funders committed. :)

@mariodsantana
Copy link
Author

BUMP!

My funders are balking at keeping this open, compared to the small cost of keeping my openapi spec ugly. I like my code DRY and would really like to use the allOf functionality. I can keep the bounty at $1,000 for a couple of weeks, but after that it will drop to $500, and maybe go away altogether.

Here's hoping someone bites on this. If you have any other suggestions to incentivize a fix here - or if I'm thinking about this all wrong in the first place - I'm all ears!

@mariodsantana mariodsantana changed the title [BUG][PYTHON] Unwanted string-to-date coercion in allOf [BUG][PYTHON] BOUNTY Unwanted string-to-date coercion in allOf Dec 24, 2021
@spacether
Copy link
Contributor

spacether commented Dec 28, 2021

I am working on a new version of the python generator which will not have this issue.
This bug arises because values are coerced from one type into another.
That's not what openapi/jsonschema intends. Jsonschema/openapi is a domain specific language specifying type and validation constraints.
In my new python-experimental generator data is validated but it is not coerced from one type to another.
Specifically for date and datetime data, the data is stored as a string and there are date and datetime accessor methods.
The only exceptions on coercion in python-expeirmental will be:

  • serializing schema data into string-transmitted endpoint parameters like header, path, cookie, query parameters
  • deserializing headers into schema data (data starts out as string data and would need to be coerced into non-string types like int/float etc)

I intend on releasing it soon.

@mariodsantana
Copy link
Author

Nice! I'm looking forward to checking it out.

@spacether
Copy link
Contributor

@mariodsantana it is merged. When you have some time can you take a look at it?
If it resolves your issue could you submit the bounty on my github sponsor page?

@mariodsantana
Copy link
Author

@spacether - just a quick note to let you know I haven't forgotten this. I'm catching up after a week out sick, but I'll soon get a chance to integrate your changes, DRY up my openapi spec, and test it all out so my funders will release the bounty. Thanks for your hard work!

@mariodsantana
Copy link
Author

Still haven't forgotten! I'm finalizing a release and hope to start upgrading dependencies this week!

@spacether
Copy link
Contributor

Any update here?

@spacether
Copy link
Contributor

@mariodsantana what's the status?

@spacether
Copy link
Contributor

spacether commented Apr 5, 2022

@mariodsantana any update here?

@mariodsantana
Copy link
Author

@spacether - I'm not sure when I'll get the time to refactor my spec to make use of (and test) your changes. I've made you wait too long as it is. DM inbound to discuss bounty payment details. Thanks for your work, and apologies for the delay!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants
@gavinmh @spacether @mariodsantana and others