Optimize implementation of SchemaView get_classes_by_slot() method #281

Draft
wants to merge 3 commits into
base: main

sujaypatil96
Member

The get_classes_by_slot() method in SchemaView takes an extremely long time to run on the MIxS schema when generating the Applicable Classes table on slot documentation pages, so we are exploring ways to reduce its runtime.

codecov bot commented Oct 31, 2023

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (b86c1fa) 62.11% compared to head (954c207) 62.10%.
Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #281      +/-   ##
==========================================
- Coverage   62.11%   62.10%   -0.01%     
==========================================
  Files          63       63              
  Lines        8459     8463       +4     
  Branches     2169     2170       +1     
==========================================
+ Hits         5254     5256       +2     
  Misses       2599     2599              
- Partials      606      608       +2     
Files Coverage Δ
linkml_runtime/utils/schema_as_dict.py 91.30% <100.00%> (ø)
linkml_runtime/utils/schemaview.py 87.81% <100.00%> (+0.02%) ⬆️
linkml_runtime/utils/namespaces.py 72.51% <43.75%> (-0.86%) ⬇️


@sierra-moxon
Member

We did some profiling (coincidentally :D) of docgen and found that the deepcopy calls for large schemas in schemaview were the most time-consuming bit.

induced_slot = deepcopy(slot)

setattr(induced_slot, metaslot_name, deepcopy(getattr(anc_slot, metaslot_name)))
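To illustrate why the two deepcopy calls above get expensive, here is a minimal sketch on a toy slot class. `ToySlot` and both helper functions are hypothetical stand-ins, not linkml_runtime code, and whether a shallow copy is safe in SchemaView depends on whether callers mutate nested values afterwards, which this sketch does not settle.

```python
from copy import copy, deepcopy
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToySlot:
    """Hypothetical stand-in for a slot definition (not the linkml_runtime class)."""
    name: str
    range: Optional[str] = None
    required: Optional[bool] = None
    annotations: Dict[str, str] = field(default_factory=dict)


def induce_with_deepcopy(slot: ToySlot, overrides: Dict[str, object]) -> ToySlot:
    """Recursively copy the slot and every override value (the costly pattern)."""
    induced = deepcopy(slot)
    for key, value in overrides.items():
        setattr(induced, key, deepcopy(value))
    return induced


def induce_with_shallow_copy(slot: ToySlot, overrides: Dict[str, object]) -> ToySlot:
    """Copy only the top-level object; nested values stay shared with `slot`."""
    induced = copy(slot)
    for key, value in overrides.items():
        setattr(induced, key, value)
    return induced


base = ToySlot(name="id", range="string", annotations={"source": "toy"})
induced = induce_with_shallow_copy(base, {"required": True})
```

Reassigning an attribute on the shallow copy leaves `base` untouched, but `induced.annotations` is the same dict object as `base.annotations`, so an in-place edit to it would leak back. That sharing is the trade-off for skipping the recursive copy.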

@sujaypatil96
Member Author

Oh, that's very good to know, thank you for digging into this @sierra-moxon 😁

@cmungall
Member

cmungall commented Nov 6, 2023

good sleuthing! we should definitely avoid use of deepcopy when calculating induced slots

but note that induced slots may not be necessary for docgen purposes.

  • In some cases, we only want to show asserted usages. Otherwise we may end up flooding the user (e.g. in biolink, every named thing has id as an induced slot)
  • however, there is also an argument that it's useful to see the slot propagated down to all subclasses. But even here only some simple logic is required, no need to do the full inference
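The two options above could be sketched roughly as follows, using a toy schema held in a plain dict. `TOY_CLASSES`, `asserted_classes`, and `propagated_classes` are hypothetical names made up for illustration and are not part of SchemaView; the point is only that walking the is_a chain needs no slot copying or induced-slot inference.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy schema: class name -> (is_a parent or None, directly asserted slots)
ToySchema = Dict[str, Tuple[Optional[str], List[str]]]

TOY_CLASSES: ToySchema = {
    "NamedThing": (None, ["id", "name"]),
    "Gene": ("NamedThing", ["symbol"]),
    "Protein": ("NamedThing", []),
    "Isoform": ("Protein", ["sequence"]),
}


def asserted_classes(slot_name: str, classes: ToySchema) -> List[str]:
    """Classes that list the slot directly (asserted usages only)."""
    return [name for name, (_, slots) in classes.items() if slot_name in slots]


def propagated_classes(slot_name: str, classes: ToySchema) -> List[str]:
    """Classes that assert the slot themselves or inherit it from an is_a ancestor."""
    result = []
    for name in classes:
        current: Optional[str] = name
        while current is not None:
            parent, slots = classes[current]
            if slot_name in slots:
                result.append(name)
                break
            current = parent  # climb one level up the is_a chain
    return result
```

On this toy schema, `asserted_classes("id", TOY_CLASSES)` gives only `NamedThing`, while `propagated_classes("id", TOY_CLASSES)` lists all four classes, which mirrors the flooding concern raised for biolink. Mixins and slot_usage overrides are deliberately left out of the sketch.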

@sierra-moxon
Member

changing https://github.com/linkml/linkml/blob/598376ce7f8c11bd3cf31f0ca7e3d5c34770021a/linkml/generators/docgen/slot.md.jinja2#L26 from True to False in my custom template (and other instances in this file) took docgen on biolink down from >2 hours to just under a minute.

@sierra-moxon
Member

sierra-moxon commented Nov 9, 2023

see also linkml issues #1214 and #1604

@cmungall
Member

@sujaypatil96 is this replaced by #300?

3 participants