Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model name replaced by drive letter in JSON schema on Windows #7860

Closed
1 task done
joakimnordling opened this issue Oct 18, 2023 · 7 comments · Fixed by #7881
Closed
1 task done

Model name replaced by drive letter in JSON schema on Windows #7860

joakimnordling opened this issue Oct 18, 2023 · 7 comments · Fixed by #7881
Assignees
Labels
bug V2 Bug related to Pydantic V2

Comments

@joakimnordling
Copy link
Contributor

joakimnordling commented Oct 18, 2023

Initial Checks

  • I confirm that I'm using Pydantic V2

Description

In the JSON schema generated by pydantic the name of the model is not used, rather the drive letter is used on Windows, like D from D: instead of MyModel if the model is imported from an absolute filesystem path using importlib. The JSON schema thus looks something like this:

{
  "$defs": {
    "D": {
      "properties": {
        "my_int": {
          "title": "My Int",
          "type": "integer"
        }
      },
      "required": ["my_int"],
      "title": "MyModel",
      "type": "object"
    }
  },
  "title": "My Schema"
}

If the model is imported using a relative path or on a system that does not use drive letters the JOSN schema has the expected name of the model (MyModel) in it:

{
  "$defs": {
    "MyModel": {
      "properties": {
        "my_int": {
          "title": "My Int",
          "type": "integer"
        }
      },
      "required": ["my_int"],
      "title": "MyModel",
      "type": "object"
    }
  },
  "title": "My Schema"
}

I believe the cause of this is the get_defs_ref function in pydantic/json_schema.py which tries to remove an ID by splitting at a colon : character and thus ends up removing everything after the drive letter, so C:\Users\... becomes just C.

A more detailed description and example that you can run can be found at https://github.com/joakimnordling/fastapi-pydantic-bug-poc Below is the same example for completeness:

Example Code

Create two files.

main.py:

import importlib.util
import json
from pathlib import Path

from pydantic.json_schema import models_json_schema


def import_my_model():
    """
    Import a Pydantic model using importlib from an absolute filesystem path.
    """
    poc_dir = Path(__file__).parent.relative_to(Path.cwd())
    p = (poc_dir / "models.py").absolute()
    spec = importlib.util.spec_from_file_location(name=str(p), location=str(p))

    if not spec.loader:
        raise RuntimeError(f"Failed to import {p} module")

    module = spec.loader.load_module(str(p))

    return getattr(module, "MyModel")


def main():
    MyModel = import_my_model()

    _, top_level_schema = models_json_schema(
        [(MyModel, "validation")], title="My Schema"
    )
    print(json.dumps(top_level_schema, indent=2))


if __name__ == "__main__":
    main()

models.py:

from pydantic import BaseModel, Field


class MyModel(BaseModel):
    my_int: int = Field(
        ...,
    )

Python, Pydantic & OS Version

pydantic version: 2.4.2
        pydantic-core version: 2.10.1
          pydantic-core build: profile=release pgo=false
                 install path: C:\Users\runneradmin\AppData\Local\pypoetry\Cache\virtualenvs\fastapi-pydantic-bug-poc-y0CjNB0N-py3.10\Lib\site-packages\pydantic
               python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
                     platform: Windows-10-10.0.20348-SP0
             related packages: typing_extensions-4.8.0
@joakimnordling joakimnordling added bug V2 Bug related to Pydantic V2 pending Is unconfirmed labels Oct 18, 2023
@sydney-runkle
Copy link
Member

@davidhewitt, could you take a look at this tomorrow when you're back online? Thanks 😄

@dmontagu
Copy link
Contributor

It seems bizarre to me that a C:\ would be showing up in the defs ref at all, I'm not sure how that's happening.

That said, @joakimnordling could you try changing the line you identified:

        components = [x.split(':')[0] for x in components]

to

        components = [x.rsplit(':', 1)[0] for x in components]

? This should cause it to only remove the last item (which was the intent) as opposed to keeping the first item. (If you are right about this, I guess my assumption that the core_ref wouldn't have a colon was incorrect. Again, not sure where that's coming from, but this seems more correct anyway.)

If that fixes it for you I suspect we could quickly merge that change.

joakimnordling added a commit to joakimnordling/pydantic that referenced this issue Oct 19, 2023
@joakimnordling
Copy link
Contributor Author

Hi @dmontagu!

I tested your suggested fix (made a fork of pydantic, applied the fix to it and updated my PoC to use my own fork and then re-run my GitHub action on Windows) and can confirm that it indeed solves the issue for our case.

BTW in a typical use case you likely don't end up with the full path to the file there, it's due to the funky way we use importlib to import the module and model. It's used in our ioxio-data-product-definition-tooling that way. In short that library is used to find python files in a folder hierarchy and spit out corresponding OpenAPI spec files based on the models used in the files. Due to this it uses importlib to load the files instead of normal imports. One of our devs encountered this issue when upgrading from pydantic 1.x to 2.x while working on a Windows machine. Once we identified the issue we were able to temporarily bypass it by using relative paths to the files instead of absolute when importing them. But we thought that it's still best to try to get it fixed in pydantic to avoid others hitting the same issue. And of course nice if we can avoid the extra step that converts the paths to relative ones.

@sydney-runkle
Copy link
Member

Hi @joakimnordling,

Glad to hear that worked! Thanks very much for testing that out and for reporting this issue, even though you were able to find a workaround. I'm sure this fix / issue will be helpful to others in the future.

Would you be able to create PR from your fork of Pydantic with the fix? Thanks so much for your help 🌟

@joakimnordling
Copy link
Contributor Author

Hi @sydney-runkle,

I opened a PR with that fix.

I'm not sure if I should have added some more unit tests for it as well or if you consider it enough that it does not break any of the existing ones. If desired I can try to add a unit test for it. I guess it shouldn't be too impossible considering the tests are run on Windows as well and there is a create_module helper that could likely be used. Of course a bit of effort required to test it out since I'm myself working on a non-Windows environment, but I guess GitHub actions would allow me to test it out sufficiently if desired.

@sydney-runkle
Copy link
Member

@joakimnordling,

Great, thanks so much for opening that PR. If you could add a unit test to verify the new behavior, that'd be awesome. Glad to see all of the other tests are still passing 👍. Otherwise, looks good and I'd be happy to review your PR in more detail once it has a relevant unit test 🌟

@sydney-runkle sydney-runkle removed the pending Is unconfirmed label Oct 20, 2023
@joakimnordling
Copy link
Contributor Author

Hi,

Sorry for the delay with the unittest and the PR in general. I ended up checking a bit deeper into the issue still while adding the unittest and realized a mistake in the way importlib was used and then got interrupted with some more urgent work for couple of days.

This was the original proof of concept code (based on our project that was using it the same way):

    poc_dir = Path(__file__).parent.relative_to(Path.cwd())
    p = (poc_dir / "models.py").absolute()
    spec = importlib.util.spec_from_file_location(name=str(p), location=str(p))

    if not spec.loader:
        raise RuntimeError(f"Failed to import {p} module")

    module = spec.loader.load_module(str(p))

    return getattr(module, "MyModel")

An import of a module file should really be made like this instead: https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly

But if one uses the way shown above it will cause the name of the module to also be something like C:\path\to\module.py (on Windows). I was originally trying to change the name parameter to something without a colon, but each try to do that resulted in an error. That way of doing the import will only work if the name and location match (based on empirical tests at least), whereas when using the correct way you can set the name to something that looks like a python module name.

Somehow I didn't figure out that this was the wrong way to do the import and what the right way was.

I think the fix itself still makes sense; it makes the code more robust, but on the other hand I'm not sure if I really think the unit test for it makes sense; one should not really name a python module with a colon in the name. Showing that being done in the unittest and cluttering one of the fixtures with an extra parameter is not something I really feel happy about. On the other hand I can't think of any better way to add the test without unnecessary duplication either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug V2 Bug related to Pydantic V2
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants