Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Various improvements #13

Draft
wants to merge 208 commits into
base: main
Choose a base branch
from
Draft

ENH: Various improvements #13

wants to merge 208 commits into from

Conversation

candleindark
Copy link

@candleindark candleindark commented May 28, 2024

This PR improves the existing conversion from Pydantic models to LinkML the following areas:

  1. Improve enum conversion so that different slots within the EnumDefinition metamodel are populated more accurately
  2. Add more detailed slots population
  3. Add slot usage in class population
  4. Provide a more accurate interpretation of slots/attributes in a class, by eliminating misinterpretation of a class variable in a Pydantic model as a slot for example.
  5. Add checks for unmet assumptions about the given schema implemented with Pydantic
  6. Specify limitations of this conversion tool in README.md
  7. Provide more tests to improve coverage
  8. Reorganized code so that additional support can be added more easily
  9. Provide a solution in handling a Pydantic model that has multiple parents that are also Pydantic models.
  10. Organize this repo into a Python project using Hatch
  11. Enable CLI to be installed as part of the project installation.

TODOs to achieve the above implementation.

  • Lay out a LinkmlGenerator class that takes in Python Enum objects and Pydantic models for conversion to LinkML schema
  • Resolve Pydantic core schema for each model and each field
  • Identify locally defined fields for each model, e.g. filter out those in __annotations__ that are ClassVar
  • Separate locally defined fields as newly defined ones and overriding ones. The newly defined ones are to be used for global slot definitions, and the overriding ones are to be used for slot usage definitions.
  • Layout a SlotGenerator class for generating slot definition from the Pydantic core schema for a field in a Pydantic model
  • Implement conversion of elemental Pydantic core schema to LinkML slot attribute values in SlotGenerator
    • Some special care possibly needed to be put in place for conversion of numeric types because "exclusiveMaximum" and "exclusiveMinimum" are currently not available in LinkML
  • Implement the conversion of Pydantic core model schemas for Pydantic models to LinkML class definitions.

Requests filed in pursue of completing this PR.

  1. LinkML

    1. Questions needed to be answered:
    1. Issues
      1. Unable to generate valid YAML for PermissibleValue with text containing a colon linkml/linkml#2138
        (Needed for dandischema which has enum values containing ":".)
      2. The exact_cardinality metamodel slot doesn't seem to have any effect when used in a SlotDefinition linkml/linkml#2104
        (Not needed. There is a workaround.)
      3. Inconsistent nullability in generated Pydantic model linkml/linkml#2155
        (Solution not needed for our purpose)
      4. equals_string_in meta slot has no affect in validation linkml/linkml#2206
        (Not needed. Found a workaround)
      5. No validation for range uri beyond validation for range string linkml/linkml#2215
        (Needed for the current translation of URL types in Pydantic)
      6. A slot requirement cannot be removed on a subclass linkml/linkml#1508 (comment)
        (Needed for all translations)
  2. Pydantic

    1. Is there a guide/tutorial to the Pydantic core schema? pydantic/pydantic#9485

@candleindark
Copy link
Author

@satra @djarecka @yarikoptic The assumption that there is exactly one Pydantic model corresponds to a class in a target LinkML schema is incorrect in aind_data_schema.

This conversion tool to LinkML schema is able to pickup name collisions in classes and enums. Here is the error.

>           raise NameCollisionError(err_msg)
E           pydantic2linkml.exceptions.NameCollisionError: Name collision @ Isi: [<class 'aind_data_schema.models.platforms.Isi'>, <class 'aind_data_schema.models.modalities.Isi'>]; Name collision @ Confocal: [<class 'aind_data_schema.models.platforms.Confocal'>, <class 'aind_data_schema.models.modalities.Confocal'>]; Name collision @ Merfish: [<class 'aind_data_schema.models.modalities.Merfish'>, <class 'aind_data_schema.models.platforms.Merfish'>]; Name collision @ Behavior: [<class 'aind_data_schema.models.modalities.Behavior'>, <class 'aind_data_schema.models.platforms.Behavior'>, <class 'aind_data_schema.models.harp_types.Behavior'>]; Name collision @ Olfactometer: [<class 'aind_data_schema.models.harp_types.Olfactometer'>, <class 'aind_data_schema.models.devices.Olfactometer'>]; Name collision @ Ecephys: [<class 'aind_data_schema.models.platforms.Ecephys'>, <class 'aind_data_schema.models.modalities.Ecephys'>]; Name collision @ Mri: [<class 'aind_data_schema.models.platforms.Mri'>, <class 'aind_data_schema.models.modalities.Mri'>]

As you can see, class names such as Isi and Confocal are used multiples times across different modules. I checked the Isi and Confocal classes, classes with the same name are very very similar. Any idea on how to handle these name conflicts?

P.S. We don't have this problem on dandischema.

@satra
Copy link
Collaborator

satra commented Jun 3, 2024

@saskiad - any thoughts on the above comment would be much appreciated. this is the pydantic to linkml conversion effort and issac is writing a converter that hopefully generalizes to multiple models.

@candleindark - as a first pass we could augment class name with module name to create a unique class. this would require storing that info. the second thing is to check what's different between the classes.

The use of `from __future__ import annotations` solicit hard to
predict behaviors from Pydantic and Typer in older Python versions
The use of `from __future__ import annotations` solicit hard to
predict behaviors from Pydantic and Typer in older Python versions
`pydantic.BaseModel` doesn't really model anything. It is merely
a template to specify a model.
Pydantic 2.9 adds support of validation of complex numbers.
To be compatible with Pydantic 2.9, We need this interface
defined.
Pydantic 2.8 doesn't have "core_schema.ComplexSchema"
needed in the current code. Let's support Pydantic 2.8
only when needed
Due to the transfer of the project to the DANDI
organization on github
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants