[INFRA] Using label string as in contemporary schema #1162

TheChymera · 2022-07-25T17:12:27Z

Historical left-over, pointed out by @yarikoptic

codecov · 2022-07-25T17:17:40Z

Codecov Report

Merging #1162 (baa2a89) into master (6e2a0b0) will increase coverage by 0.39%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1162      +/-   ##
==========================================
+ Coverage   91.20%   91.60%   +0.39%     
==========================================
  Files          10       10              
  Lines        1046     1060      +14     
==========================================
+ Hits          954      971      +17     
+ Misses         92       89       -3

Impacted Files	Coverage Δ
...schemacode/bidsschematools/tests/test_validator.py	`100.00% <100.00%> (ø)`
tools/schemacode/bidsschematools/validator.py	`92.60% <100.00%> (+0.96%)`	⬆️
tools/schemacode/bidsschematools/tests/conftest.py	`91.66% <0.00%> (+0.49%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a3e5df2...baa2a89. Read the comment docs.

yarikoptic · 2022-07-25T17:20:48Z

oh cool. May be while at it, since I hate duplication, you could just place that regex into a variable to avoid need to adjust so many lines whenever regex needs tobe fixed... something along the lines of the

the following diff which I have started to prepare before I saw this PR

commit d53571864417f47e8ba3a4d5ca0328a0662b49eb
Author: Yaroslav Halchenko <debian@onerussian.com>
Date:   Mon Jul 25 12:58:19 2022 -0400

    RF: centralize definition for the <label> regex within test_validator
    
    Note: I have aimed to avoid any possible functionality change. Those
    would be done in follow up PRs.
    
    Overal rationale: avoid heavy duplication requiring changing ALL those lines
    happen such regex needs to be adjusted.
    
    Remaining questions: why it is groupped (i.e. within ()) and why * instead of +
    as it should be

diff --git a/tools/schemacode/bidsschematools/tests/test_validator.py b/tools/schemacode/bidsschematools/tests/test_validator.py
index 179a027f..41f0e95e 100644
--- a/tools/schemacode/bidsschematools/tests/test_validator.py
+++ b/tools/schemacode/bidsschematools/tests/test_validator.py
@@ -4,6 +4,11 @@ import shutil
 import pytest
 
 
+LABEL_REGEX_CHARS = "[a-zA-Z0-9]"
+# S - "star", and G - group
+LABEL_REGEX_SG = f"({LABEL_REGEX_CHARS}*?)"
+
+
 def test__add_entity():
     from bidsschematools.validator import _add_entity
 
@@ -11,7 +16,7 @@ def test__add_entity():
     regex_entities = ""
     entity = "subject"
     entity_shorthand = "sub"
-    variable_field = "([a-zA-Z0-9]*?)"
+    variable_field = LABEL_REGEX_SG
     requirement_level = "required"
 
     _regex_entities = _add_entity(
@@ -27,13 +32,12 @@ def test__add_entity():
     # Test append input and optional entity
     regex_entities = (
         "sub-(?P=subject)(|_ses-(?P=session))"
-        "(|_task-(?P<task>([a-zA-Z0-9]*?)))(|_trc-(?P<tracer>([a-zA-Z0-9]*?)))"
-        "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
-        "(|_run-(?P<run>([a-zA-Z0-9]*?)))"
+        f"(|_task-(?P<task>{LABEL_REGEX_SG}))(|_trc-(?P<tracer>{LABEL_REGEX_SG}))"
+        f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+        f"(|_run-(?P<run>{LABEL_REGEX_SG}))"
     )
     entity = "recording"
     entity_shorthand = "recording"
-    variable_field = "([a-zA-Z0-9]*?)"
     requirement_level = "optional"
 
     _regex_entities = _add_entity(
@@ -46,10 +50,10 @@ def test__add_entity():
 
     assert (
         _regex_entities == "sub-(?P=subject)(|_ses-(?P=session))"
-        "(|_task-(?P<task>([a-zA-Z0-9]*?)))(|_trc-(?P<tracer>([a-zA-Z0-9]*?)))"
-        "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
-        "(|_run-(?P<run>([a-zA-Z0-9]*?)))"
-        "(|_recording-(?P<recording>([a-zA-Z0-9]*?)))"
+        f"(|_task-(?P<task>{LABEL_REGEX_SG}))(|_trc-(?P<tracer>{LABEL_REGEX_SG}))"
+        f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+        f"(|_run-(?P<run>{LABEL_REGEX_SG}))"
+        f"(|_recording-(?P<recording>{LABEL_REGEX_SG}))"
     )
 
 
@@ -59,8 +63,8 @@ def test__add_extensions():
     # Test single extension
     regex_string = (
         "sub-(?P=subject)(|_ses-(?P=session))"
-        "_sample-(?P<sample>([a-zA-Z0-9]*?))"
-        "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))_photo"
+        f"_sample-(?P<sample>{LABEL_REGEX_SG})"
+        f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))_photo"
     )
     variant = {
         "suffixes": ["photo"],
@@ -76,15 +80,15 @@ def test__add_extensions():
 
     assert (
         _regex_string == "sub-(?P=subject)(|_ses-(?P=session))"
-        "_sample-(?P<sample>([a-zA-Z0-9]*?))"
-        "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))_photo\\.jpg"
+        f"_sample-(?P<sample>{LABEL_REGEX_SG})"
+        f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))_photo\\.jpg"
     )
 
     # Test multiple extensions
     regex_string = (
         "sub-(?P=subject)(|_ses-(?P=session))"
-        "_sample-(?P<sample>([a-zA-Z0-9]*?))"
-        "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))_photo"
+        f"_sample-(?P<sample>{LABEL_REGEX_SG})"
+        f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))_photo"
     )
     variant = {
         "suffixes": ["photo"],
@@ -100,8 +104,8 @@ def test__add_extensions():
 
     assert (
         _regex_string == "sub-(?P=subject)(|_ses-(?P=session))"
-        "_sample-(?P<sample>([a-zA-Z0-9]*?))"
-        "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
+        f"_sample-(?P<sample>{LABEL_REGEX_SG})"
+        f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
         "_photo(\\.jpg|\\.png|\\.tif)"
     )
 
@@ -186,10 +190,10 @@ def test__add_suffixes():
     # Test multiple expansions
     regex_entities = (
         "sub-(?P=subject)(|_ses-(?P=session))"
-        "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
-        "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
-        "(|_dir-(?P<direction>([a-zA-Z0-9]*?)))(|_run-(?P<run>([a-zA-Z0-9]*?)))"
-        "(|_recording-(?P<recording>([a-zA-Z0-9]*?)))"
+        f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
+        f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+        f"(|_dir-(?P<direction>{LABEL_REGEX_SG}))(|_run-(?P<run>{LABEL_REGEX_SG}))"
+        f"(|_recording-(?P<recording>{LABEL_REGEX_SG}))"
     )
     variant = {
         "suffixes": [
@@ -212,10 +216,10 @@ def test__add_suffixes():
     }
     regex_string = (
         "sub-(?P=subject)(|_ses-(?P=session))"
-        "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
-        "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
-        "(|_dir-(?P<direction>([a-zA-Z0-9]*?)))(|_run-(?P<run>([a-zA-Z0-9]*?)))"
-        "(|_recording-(?P<recording>([a-zA-Z0-9]*?)))"
+        f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
+        f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+        f"(|_dir-(?P<direction>{LABEL_REGEX_SG}))(|_run-(?P<run>{LABEL_REGEX_SG}))"
+        f"(|_recording-(?P<recording>{LABEL_REGEX_SG}))"
         "_(physio|stim)"
     )
 
@@ -247,12 +251,12 @@ def test_write_report(tmp_path):
 
     validation_result["schema_tracking"] = [
         {
-            "regex": ".*?/sub-(?P<subject>([a-zA-Z0-9]*?))/"
-            "(|ses-(?P<session>([a-zA-Z0-9]*?))/)anat/sub-(?P=subject)"
-            "(|_ses-(?P=session))(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
-            "(|_ce-(?P<ceagent>([a-zA-Z0-9]*?)))"
-            "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
-            "(|_run-(?P<run>([a-zA-Z0-9]*?)))"
+            "regex": f".*?/sub-(?P<subject>{LABEL_REGEX_SG})/"
+            f"(|ses-(?P<session>{LABEL_REGEX_SG})/)anat/sub-(?P=subject)"
+            f"(|_ses-(?P=session))(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
+            f"(|_ce-(?P<ceagent>{LABEL_REGEX_SG}))"
+            f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+            f"(|_run-(?P<run>{LABEL_REGEX_SG}))"
             "(|_part-(?P<part>(mag|phase|real|imag)))"
             "_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)"
             "\\.(nii.gz|nii|json)$",
@@ -261,12 +265,12 @@ def test_write_report(tmp_path):
     ]
     validation_result["schema_listing"] = [
         {
-            "regex": ".*?/sub-(?P<subject>([a-zA-Z0-9]*?))/"
-            "(|ses-(?P<session>([a-zA-Z0-9]*?))/)anat/sub-(?P=subject)"
-            "(|_ses-(?P=session))(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
-            "(|_ce-(?P<ceagent>([a-zA-Z0-9]*?)))"
-            "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
-            "(|_run-(?P<run>([a-zA-Z0-9]*?)))"
+            "regex": f".*?/sub-(?P<subject>{LABEL_REGEX_SG})/"
+            f"(|ses-(?P<session>{LABEL_REGEX_SG})/)anat/sub-(?P=subject)"
+            f"(|_ses-(?P=session))(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
+            f"(|_ce-(?P<ceagent>{LABEL_REGEX_SG}))"
+            f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+            f"(|_run-(?P<run>{LABEL_REGEX_SG}))"
             "(|_part-(?P<part>(mag|phase|real|imag)))"
             "_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)"
             "\\.(nii.gz|nii|json)$",

note that I am still not sure why/if that () group around regex is needed, is it really needed?

TheChymera · 2022-07-25T17:50:53Z

Not sure this would help much, just obfuscate the actual strings. There is also an example report file (see the commit) which has everything verbatim. Updating it is easy, it's just substitution, I just want to avoid the situation where we end up setting variables for everything and the actual strings become illegible. In as far as it's hard-coded it's good to have all the strings explicit bc they're easier to inspect.

yarikoptic · 2022-07-26T14:49:13Z

@TheChymera : well, we can discuss that aspect of variable-vs-explicit more later ... for the purpose of this PR I still wonder:

why/if that () group around regex is needed, is it really needed?

tsalo · 2022-07-26T16:09:34Z

Can't the regexes be loaded from the schema?

bids-specification/src/schema/objects/formats.yaml

Lines 4 to 14 in 2c12e58

    
           index: 
        
             display_name: Index 
        
             description: | 
        
               Non-negative, non-zero integers, optionally prefixed with leading zeros for sortability. 
        
               An index may not be all zeros. 
        
             pattern: '[0-9]*[1-9]+[0-9]*' 
        
           label: 
        
             display_name: Label 
        
             description: | 
        
               Freeform labels without special characters. 
        
             pattern: '[0-9a-zA-Z]+'

TheChymera · 2022-07-26T17:14:13Z

@tsalo they can, but that would clump too many tests together (for validation and schemacode). These should really just be unit tests, where the strings could be anything. What's actually tested here is how they are being put together (the order and the separators). The strings could just as well be "A", "B", and "C".

@yarikoptic I think you're right, they might not be needed.

yarikoptic · 2022-07-26T17:55:34Z

@tsalo @TheChymera I think we should have a unittest (someone could call it "integration" test) which ensures that regex in the tests for the functionality matches the one in the schema indeed. Probably in a separate PR

yarikoptic

great to see more consistency

tsalo

LGTM

Using label string as in contemporary schema

b0dca7f

TheChymera marked this pull request as ready for review July 25, 2022 17:51

TheChymera requested review from tsalo and erdalkaraca as code owners July 25, 2022 17:51

yarikoptic mentioned this pull request Jul 26, 2022

[FIX] clarify alphanumeric #1164

Merged

Dropped unneeded capture groups

f57d234

syntax fixes

baa2a89

TheChymera requested a review from yarikoptic July 26, 2022 17:46

yarikoptic approved these changes Jul 26, 2022

View reviewed changes

tsalo approved these changes Jul 26, 2022

View reviewed changes

TheChymera merged commit f20986b into bids-standard:master Jul 26, 2022

TheChymera deleted the test_fixes branch July 26, 2022 19:29

bids-maintenance added a commit that referenced this pull request Jul 26, 2022

[DOC] Auto-generate changelog entry for PR #1162

c80582e

effigies added the exclude-from-changelog This item will not feature in the automatically generated changelog label Jul 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[INFRA] Using label string as in contemporary schema #1162

[INFRA] Using label string as in contemporary schema #1162

TheChymera commented Jul 25, 2022

codecov bot commented Jul 25, 2022 •

edited

Loading

yarikoptic commented Jul 25, 2022

TheChymera commented Jul 25, 2022

yarikoptic commented Jul 26, 2022

tsalo commented Jul 26, 2022

TheChymera commented Jul 26, 2022 •

edited

Loading

yarikoptic commented Jul 26, 2022

yarikoptic left a comment

tsalo left a comment

[INFRA] Using label string as in contemporary schema #1162

[INFRA] Using label string as in contemporary schema #1162

Conversation

TheChymera commented Jul 25, 2022

codecov bot commented Jul 25, 2022 • edited Loading

Codecov Report

yarikoptic commented Jul 25, 2022

TheChymera commented Jul 25, 2022

yarikoptic commented Jul 26, 2022

tsalo commented Jul 26, 2022

TheChymera commented Jul 26, 2022 • edited Loading

yarikoptic commented Jul 26, 2022

yarikoptic left a comment

Choose a reason for hiding this comment

tsalo left a comment

Choose a reason for hiding this comment

codecov bot commented Jul 25, 2022 •

edited

Loading

TheChymera commented Jul 26, 2022 •

edited

Loading