Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[INFRA] Using label string as in contemporary schema #1162

Merged
merged 3 commits into from
Jul 26, 2022

Conversation

TheChymera
Copy link
Collaborator

Historical left-over, pointed out by @yarikoptic

@codecov
Copy link

codecov bot commented Jul 25, 2022

Codecov Report

Merging #1162 (baa2a89) into master (6e2a0b0) will increase coverage by 0.39%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1162      +/-   ##
==========================================
+ Coverage   91.20%   91.60%   +0.39%     
==========================================
  Files          10       10              
  Lines        1046     1060      +14     
==========================================
+ Hits          954      971      +17     
+ Misses         92       89       -3     
Impacted Files Coverage Δ
...schemacode/bidsschematools/tests/test_validator.py 100.00% <100.00%> (ø)
tools/schemacode/bidsschematools/validator.py 92.60% <100.00%> (+0.96%) ⬆️
tools/schemacode/bidsschematools/tests/conftest.py 91.66% <0.00%> (+0.49%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a3e5df2...baa2a89. Read the comment docs.

@yarikoptic
Copy link
Collaborator

oh cool. May be while at it, since I hate duplication, you could just place that regex into a variable to avoid need to adjust so many lines whenever regex needs tobe fixed... something along the lines of the

the following diff which I have started to prepare before I saw this PR
commit d53571864417f47e8ba3a4d5ca0328a0662b49eb
Author: Yaroslav Halchenko <debian@onerussian.com>
Date:   Mon Jul 25 12:58:19 2022 -0400

    RF: centralize definition for the <label> regex within test_validator
    
    Note: I have aimed to avoid any possible functionality change. Those
    would be done in follow up PRs.
    
    Overal rationale: avoid heavy duplication requiring changing ALL those lines
    happen such regex needs to be adjusted.
    
    Remaining questions: why it is groupped (i.e. within ()) and why * instead of +
    as it should be

diff --git a/tools/schemacode/bidsschematools/tests/test_validator.py b/tools/schemacode/bidsschematools/tests/test_validator.py
index 179a027f..41f0e95e 100644
--- a/tools/schemacode/bidsschematools/tests/test_validator.py
+++ b/tools/schemacode/bidsschematools/tests/test_validator.py
@@ -4,6 +4,11 @@ import shutil
 import pytest
 
 
+LABEL_REGEX_CHARS = "[a-zA-Z0-9]"
+# S - "star", and G - group
+LABEL_REGEX_SG = f"({LABEL_REGEX_CHARS}*?)"
+
+
 def test__add_entity():
     from bidsschematools.validator import _add_entity
 
@@ -11,7 +16,7 @@ def test__add_entity():
     regex_entities = ""
     entity = "subject"
     entity_shorthand = "sub"
-    variable_field = "([a-zA-Z0-9]*?)"
+    variable_field = LABEL_REGEX_SG
     requirement_level = "required"
 
     _regex_entities = _add_entity(
@@ -27,13 +32,12 @@ def test__add_entity():
     # Test append input and optional entity
     regex_entities = (
         "sub-(?P=subject)(|_ses-(?P=session))"
-        "(|_task-(?P<task>([a-zA-Z0-9]*?)))(|_trc-(?P<tracer>([a-zA-Z0-9]*?)))"
-        "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
-        "(|_run-(?P<run>([a-zA-Z0-9]*?)))"
+        f"(|_task-(?P<task>{LABEL_REGEX_SG}))(|_trc-(?P<tracer>{LABEL_REGEX_SG}))"
+        f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+        f"(|_run-(?P<run>{LABEL_REGEX_SG}))"
     )
     entity = "recording"
     entity_shorthand = "recording"
-    variable_field = "([a-zA-Z0-9]*?)"
     requirement_level = "optional"
 
     _regex_entities = _add_entity(
@@ -46,10 +50,10 @@ def test__add_entity():
 
     assert (
         _regex_entities == "sub-(?P=subject)(|_ses-(?P=session))"
-        "(|_task-(?P<task>([a-zA-Z0-9]*?)))(|_trc-(?P<tracer>([a-zA-Z0-9]*?)))"
-        "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
-        "(|_run-(?P<run>([a-zA-Z0-9]*?)))"
-        "(|_recording-(?P<recording>([a-zA-Z0-9]*?)))"
+        f"(|_task-(?P<task>{LABEL_REGEX_SG}))(|_trc-(?P<tracer>{LABEL_REGEX_SG}))"
+        f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+        f"(|_run-(?P<run>{LABEL_REGEX_SG}))"
+        f"(|_recording-(?P<recording>{LABEL_REGEX_SG}))"
     )
 
 
@@ -59,8 +63,8 @@ def test__add_extensions():
     # Test single extension
     regex_string = (
         "sub-(?P=subject)(|_ses-(?P=session))"
-        "_sample-(?P<sample>([a-zA-Z0-9]*?))"
-        "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))_photo"
+        f"_sample-(?P<sample>{LABEL_REGEX_SG})"
+        f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))_photo"
     )
     variant = {
         "suffixes": ["photo"],
@@ -76,15 +80,15 @@ def test__add_extensions():
 
     assert (
         _regex_string == "sub-(?P=subject)(|_ses-(?P=session))"
-        "_sample-(?P<sample>([a-zA-Z0-9]*?))"
-        "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))_photo\\.jpg"
+        f"_sample-(?P<sample>{LABEL_REGEX_SG})"
+        f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))_photo\\.jpg"
     )
 
     # Test multiple extensions
     regex_string = (
         "sub-(?P=subject)(|_ses-(?P=session))"
-        "_sample-(?P<sample>([a-zA-Z0-9]*?))"
-        "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))_photo"
+        f"_sample-(?P<sample>{LABEL_REGEX_SG})"
+        f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))_photo"
     )
     variant = {
         "suffixes": ["photo"],
@@ -100,8 +104,8 @@ def test__add_extensions():
 
     assert (
         _regex_string == "sub-(?P=subject)(|_ses-(?P=session))"
-        "_sample-(?P<sample>([a-zA-Z0-9]*?))"
-        "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
+        f"_sample-(?P<sample>{LABEL_REGEX_SG})"
+        f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
         "_photo(\\.jpg|\\.png|\\.tif)"
     )
 
@@ -186,10 +190,10 @@ def test__add_suffixes():
     # Test multiple expansions
     regex_entities = (
         "sub-(?P=subject)(|_ses-(?P=session))"
-        "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
-        "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
-        "(|_dir-(?P<direction>([a-zA-Z0-9]*?)))(|_run-(?P<run>([a-zA-Z0-9]*?)))"
-        "(|_recording-(?P<recording>([a-zA-Z0-9]*?)))"
+        f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
+        f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+        f"(|_dir-(?P<direction>{LABEL_REGEX_SG}))(|_run-(?P<run>{LABEL_REGEX_SG}))"
+        f"(|_recording-(?P<recording>{LABEL_REGEX_SG}))"
     )
     variant = {
         "suffixes": [
@@ -212,10 +216,10 @@ def test__add_suffixes():
     }
     regex_string = (
         "sub-(?P=subject)(|_ses-(?P=session))"
-        "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
-        "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
-        "(|_dir-(?P<direction>([a-zA-Z0-9]*?)))(|_run-(?P<run>([a-zA-Z0-9]*?)))"
-        "(|_recording-(?P<recording>([a-zA-Z0-9]*?)))"
+        f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
+        f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+        f"(|_dir-(?P<direction>{LABEL_REGEX_SG}))(|_run-(?P<run>{LABEL_REGEX_SG}))"
+        f"(|_recording-(?P<recording>{LABEL_REGEX_SG}))"
         "_(physio|stim)"
     )
 
@@ -247,12 +251,12 @@ def test_write_report(tmp_path):
 
     validation_result["schema_tracking"] = [
         {
-            "regex": ".*?/sub-(?P<subject>([a-zA-Z0-9]*?))/"
-            "(|ses-(?P<session>([a-zA-Z0-9]*?))/)anat/sub-(?P=subject)"
-            "(|_ses-(?P=session))(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
-            "(|_ce-(?P<ceagent>([a-zA-Z0-9]*?)))"
-            "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
-            "(|_run-(?P<run>([a-zA-Z0-9]*?)))"
+            "regex": f".*?/sub-(?P<subject>{LABEL_REGEX_SG})/"
+            f"(|ses-(?P<session>{LABEL_REGEX_SG})/)anat/sub-(?P=subject)"
+            f"(|_ses-(?P=session))(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
+            f"(|_ce-(?P<ceagent>{LABEL_REGEX_SG}))"
+            f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+            f"(|_run-(?P<run>{LABEL_REGEX_SG}))"
             "(|_part-(?P<part>(mag|phase|real|imag)))"
             "_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)"
             "\\.(nii.gz|nii|json)$",
@@ -261,12 +265,12 @@ def test_write_report(tmp_path):
     ]
     validation_result["schema_listing"] = [
         {
-            "regex": ".*?/sub-(?P<subject>([a-zA-Z0-9]*?))/"
-            "(|ses-(?P<session>([a-zA-Z0-9]*?))/)anat/sub-(?P=subject)"
-            "(|_ses-(?P=session))(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
-            "(|_ce-(?P<ceagent>([a-zA-Z0-9]*?)))"
-            "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
-            "(|_run-(?P<run>([a-zA-Z0-9]*?)))"
+            "regex": f".*?/sub-(?P<subject>{LABEL_REGEX_SG})/"
+            f"(|ses-(?P<session>{LABEL_REGEX_SG})/)anat/sub-(?P=subject)"
+            f"(|_ses-(?P=session))(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
+            f"(|_ce-(?P<ceagent>{LABEL_REGEX_SG}))"
+            f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+            f"(|_run-(?P<run>{LABEL_REGEX_SG}))"
             "(|_part-(?P<part>(mag|phase|real|imag)))"
             "_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)"
             "\\.(nii.gz|nii|json)$",

note that I am still not sure why/if that () group around regex is needed, is it really needed?

@TheChymera
Copy link
Collaborator Author

Not sure this would help much, just obfuscate the actual strings. There is also an example report file (see the commit) which has everything verbatim. Updating it is easy, it's just substitution, I just want to avoid the situation where we end up setting variables for everything and the actual strings become illegible. In as far as it's hard-coded it's good to have all the strings explicit bc they're easier to inspect.

@TheChymera TheChymera marked this pull request as ready for review July 25, 2022 17:51
@yarikoptic
Copy link
Collaborator

@TheChymera : well, we can discuss that aspect of variable-vs-explicit more later ... for the purpose of this PR I still wonder:

why/if that () group around regex is needed, is it really needed?

@tsalo
Copy link
Member

tsalo commented Jul 26, 2022

Can't the regexes be loaded from the schema?

index:
display_name: Index
description: |
Non-negative, non-zero integers, optionally prefixed with leading zeros for sortability.
An index may not be all zeros.
pattern: '[0-9]*[1-9]+[0-9]*'
label:
display_name: Label
description: |
Freeform labels without special characters.
pattern: '[0-9a-zA-Z]+'

@TheChymera
Copy link
Collaborator Author

TheChymera commented Jul 26, 2022

@tsalo they can, but that would clump too many tests together (for validation and schemacode). These should really just be unit tests, where the strings could be anything. What's actually tested here is how they are being put together (the order and the separators). The strings could just as well be "A", "B", and "C".

@yarikoptic I think you're right, they might not be needed.

@TheChymera TheChymera requested a review from yarikoptic July 26, 2022 17:46
@yarikoptic
Copy link
Collaborator

@tsalo @TheChymera I think we should have a unittest (someone could call it "integration" test) which ensures that regex in the tests for the functionality matches the one in the schema indeed. Probably in a separate PR

Copy link
Collaborator

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great to see more consistency

Copy link
Member

@tsalo tsalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@TheChymera TheChymera merged commit f20986b into bids-standard:master Jul 26, 2022
@TheChymera TheChymera deleted the test_fixes branch July 26, 2022 19:29
@effigies effigies added the exclude-from-changelog This item will not feature in the automatically generated changelog label Jul 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude-from-changelog This item will not feature in the automatically generated changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants