Simply the "Imports" section in the Makefile template #1194

gouttegd · 2025-03-04T19:14:55Z

This PR attempts to make the "Imports" section in Makefile.jinja2 easier to read and maintain.

First, it moves some of the logic into the odk.py script. Notably, the script now takes care of automatically assigning the default module type to all individual modules that do not have an explicit module_type parameter, so that we do not have to go fetch the default value in the template. That is, instead of having to do things like that:

{% if ('slme' == ont.module_type) or (ont.module_type is none and 'slme' == project.import_group.module_type) %}

(testing if the module is a SLME module or if it does not have an explicit type and the default module type is SLME), we can do simply:

{% if 'slme' == ont.module_type %}

That change alone should reduce the risks of blood clots forming in the brain of ODK developers trying to read the template by at least a few percent.

Second, it removes rules that were generated for nothing: when use_base_merging is used, there is no need to generate rules for the individual imports (see #1189).

Third, it removes the “fast track” for “large“ SLME modules. This was a special rule invoked for (1) SLME modules marked as “large” or (2) module using the (undocumented) type fast_slme. The only difference compared to the standard SLME rule is that the “preprocessing step” (removing original ontology annotations and inserting a dc:source annotation instead) is skipped in the fast track. This was because the preprocessing step used to be performed with SPARQL queries, which was memory-intensive for large modules. That step is now performed by the ODK plugin using pure Java code, and has no excessive memory requirement. Therefore, it is no longer needed to treat large modules differently.

Fourth, and though it is not really a “simplification” (on the contrary, it’s a new feature), it adds the possibility of disabling the extraction of a seed from the -edit file (option scan_signature = False) (#638).

Fifth, it re-formats the template according to a few rules:

break all lines so that no lines is longer than 100 characters (max) or 72 characters (preferred);
consistently use long option names (e.g., --input instead of -i, --term-file instead of -T, etc);
indent Jinja2 conditionals;
add comments at the end of conditional blocks to help identify which block is closed (e.g., {% endif %}{# !use_base_merging -#}).

In order to make the Makefile template more readable, we need to move some of the logic it contains into the odk.py script. The general principle is that the configuration object that is given to the templates should contain "pre-computed" fields that can be used directly by the templates. This pre-computation will be done by the new `_derive_fields()` method in the `ProductGroup` class and its derived classes. In this commit, we migrate some of the logic needed to deal with imports. For example, instead of letting the Makefile template having to deal with the possibility that the `module_type` of a given module might not be set (in which case the template needs to get the value of the `module_type` at the level of the entire ImportGroup), we ensure in the `_derive_fields()` method of ImportGroup that all individual modules have an explicit `module_type` (set from the group-level value if no module type has been explicitly specified in the configuration file), so that the Makefile template can simply use something like: {% if 'filter' == ont.module_type %} instead of {% if ('filter' == ont.module_type) or (ont.module_type is none and 'filter' == project.import_group.module_type) %} Likewise for the `slme_individuals` and `module_type_slme` parameters. Also, we add a `special_products` derived field to the ImportGroup, containing the list of all the modules that either (1) are marked as "large", or (2) use a different type than the default, group-level module type. Overall, this makes the section of the template responsible for the production of the rules for the individual modules easier to read.

The rules that produce the individual import modules are only generated when use_base_merging is False, so there is no need, for every single rule, to test for use_base_merging and to make the module dependent on `$(MIRRORDIR)/merged.owl` if use_base_merging is True -- that can never happen, if we are generating a rule for an individual import module then we are necessarily under use_base_merging=False.

Indent Jinja2 conditionals (within the {% ... %} blocks to make the template easier to read. That is, instead of: {% if foo %} {% for b in bar %} {% if b.is_baz %} ... {% endif %} {% endfor %} {% endif %} we do instead: {% if foo %} {% for b in bar %} {% if b.is_baz %} ... {% endif %} {% endfor %} {% endif %} This does not solve *all* the readability issues of the Makefile template, but at least it provides a relatively easy way to find, for example, the `if` that corresponds to a given `endif`.

Reformat some import rules to make them more readable. Apply fine-grained control of Jinja2 whitespaces to avoid generating spurious blank lines in the Makefile.

If we are generating a `$(IMPORTDIR)/%_import.owl` rule, then it necessarily means that `use_base_merging` is False, so there is no need to test that field when generating the dependencies of the rule. (This should have been done at the same time as the same change concerning the rules for the "special" import modules.)

Move the rule that produce the OBO version of an import module towards the end of the imports section, so that it is not mixed with the rules that produce the actual import modules. Apply some minor Jinja2 reformatting, and also add a comment explaining why we need non-generic rules for the "special" imports. Include the type of the module in the generated Makefile comment, as advanced users might find that useful.

When generating a ROBOT command that takes repeated arguments from a list, write each item of the list on a separate line. That is: robot extract --base-iri IRI1 \ --base-iri IRI2 \ --base-iri IRI3 \ ... instead of: robot extract --base-iri IRI1 --base-iri IRI2 --base-iri IRI3 ... as the second form could produce arbitrary long lines depending on how many items we need to expand. This makes the generated Makefile easier to read, at the expense of making the *template* slightly harder to read. But since the Makefile is what the users are exposed to, its readability is more important than the readability of its template (which only ODK developers need worry about).

Though never officially documented, 'fast_slme' was a module type that was almost identical to 'slme', the only difference being that a "fast" SLME module was produced without removing the original ontology annotations and without adding a 'dc:source' annotation. Since those tasks used to be performed by SPARQL queries, they were very memory-intensive and could be a bottlenecks for large modules. In fact, SLME modules that were marked as "large" were automatically treated as "fast SLME" modules because of their high memory requirement. Now that adding the 'dc:source' annotation is added by the ODK plugin in a way that does not involve SPARQL, that step is as fast as it could be and, more importantly, does not depend on the size of the import module anymore. Therefore, there is no longer any real need for a "fast track" for large SLME modules. So we remove that fast track entirely. All SLME modules are now equal, regardless of their size. For backwards compatibility, if a module is explicitly configured as a 'fast_slme' module, we silently recognise it as a standard 'slme' module. This also means that for now, the only effect of the 'is_large' parameter is to allow to selectively *not* refresh large import modules, while refreshing the other, non-large modules.

As suggested by a comment in the template, there is no real need for the Makefile to catenate, sort, and "uniquify" the lists of terms we use as a seed for imports (the automatically generated $(IMPORTSEED) and the manually maintained $(IMPORT)_terms.txt). ROBOT is perfectly able to accept: * more than one --term-file; * containing comments; * containing unsorted terms; * containing duplicated terms. Anything we might do before passing the lists to ROBOT would in fact be redundant with what ROBOT will do anyway, so we might as well let ROBOT do everything. This means that we no longer need any *_terms_combined.txt file.

Add a 'scan_signature' option in the ImportGroup. If true (which is the default), a "pre-seed" is extracted from the -edit file (and its components, if any). That pre-seed is added to the main import seed (tmp/seed.txt) along with any terms extracted from the DOSDP patterns (if any). This is the behaviour of the current version of the ODK. If that option is set to false, then the pre-seed is not extracted, and terms from DOSDP patterns are not added to the main seed as well. This means that the main seed can now only contain the custom seed optionally set by the 'use_custom_import_module'. This, in turn, means that the only terms that will effectively be imported will be those actively selected by the user (either in the custom import module, or in the individual *_terms.txt files). If we do not scan the signature and there is no custom import module, then the tmp/seed.txt file would effectively be empty, so in that case we do not even generate the corresponding rule in the Makefile. closes #638

When refactoring the `$(IMPORTDIR)/merged_import.owl` rule, an error was introduced: the `remove` command that removes annotation properties only took as input (for the list of annotation properties to preserve) the properties listed in ANNOTATION_PROPERTIES. It should also take as input all the seeds, so that any annotation property listed here is preserved as well (this is the original behaviour).

When creating the merged import module, there is no need to preserve the original ontology annotations. They are actually misleading since they are only the annotations of the _first_ mirror to have been merged. (Prior to the refactoring, those annotations were removed by the "postprocess" SPARQL script.) Conversely, when creating a generic SLME import module, during the extraction step we do want to preserve annotations -- otherwise the dc:source annotation we have explicitly added at the beginning of the command pipeline would be lost.

matentzn

I went through the code line by line and LGTM; I made some mostly irrelevant comments, and currently running some tests. This may take a bit, but just wanted to ensure you I am working in it :)

template/src/ontology/Makefile.jinja2

gouttegd · 2025-03-13T10:04:48Z

currently running some tests

FYI I tested it on all FlyBase ontologies (FBdv, FBbt, DPO, FBcv) without issues. But of course they are all using a merged import module, so this did not test all the code paths.

gouttegd · 2025-03-22T15:08:58Z

@matentzn Are you still testing this?

matentzn · 2025-03-23T09:59:53Z

Sorry, I was buried in a backlog, now I am running some tests.

matentzn

I tested with three ontologies I care about (HPO, RO and CHR) and it the behaviour looks fine to me. Awesome! THANK you for this!

gouttegd added 10 commits March 3, 2025 20:59

Re-formatting.

667c4c0

Reformat some import rules to make them more readable. Apply fine-grained control of Jinja2 whitespaces to avoid generating spurious blank lines in the Makefile.

gouttegd self-assigned this Mar 4, 2025

gouttegd added 2 commits March 4, 2025 19:42

Fix typo in command.

9c20718

gouttegd requested a review from matentzn March 4, 2025 23:23

gouttegd added 2 commits March 5, 2025 12:54

Merge branch 'master' into rework-imports

0ec3bb6

matentzn reviewed Mar 13, 2025

View reviewed changes

template/src/ontology/Makefile.jinja2 Show resolved Hide resolved

template/src/ontology/Makefile.jinja2 Show resolved Hide resolved

template/src/ontology/Makefile.jinja2 Show resolved Hide resolved

Merge branch 'master' into rework-imports

67219f0

matentzn approved these changes Mar 25, 2025

View reviewed changes

gouttegd merged commit ab99581 into master Mar 25, 2025
1 check passed

gouttegd deleted the rework-imports branch March 25, 2025 12:43

gouttegd mentioned this pull request Mar 26, 2025

Should $(IMPORTDIR)/%_terms_combined.txt be .PRECIOUS? #1056

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simply the "Imports" section in the Makefile template #1194

Simply the "Imports" section in the Makefile template #1194

gouttegd commented Mar 4, 2025

matentzn left a comment

gouttegd commented Mar 13, 2025

gouttegd commented Mar 22, 2025

matentzn commented Mar 23, 2025

matentzn left a comment

Simply the "Imports" section in the Makefile template #1194

Simply the "Imports" section in the Makefile template #1194

Conversation

gouttegd commented Mar 4, 2025

matentzn left a comment

Choose a reason for hiding this comment

gouttegd commented Mar 13, 2025

gouttegd commented Mar 22, 2025

matentzn commented Mar 23, 2025

matentzn left a comment

Choose a reason for hiding this comment