-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[INFRA] Using label string as in contemporary schema #1162
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1162 +/- ##
==========================================
+ Coverage 91.20% 91.60% +0.39%
==========================================
Files 10 10
Lines 1046 1060 +14
==========================================
+ Hits 954 971 +17
+ Misses 92 89 -3
Continue to review full report at Codecov.
|
oh cool. May be while at it, since I hate duplication, you could just place that regex into a variable to avoid need to adjust so many lines whenever regex needs tobe fixed... something along the lines of the the following diff which I have started to prepare before I saw this PRcommit d53571864417f47e8ba3a4d5ca0328a0662b49eb
Author: Yaroslav Halchenko <debian@onerussian.com>
Date: Mon Jul 25 12:58:19 2022 -0400
RF: centralize definition for the <label> regex within test_validator
Note: I have aimed to avoid any possible functionality change. Those
would be done in follow up PRs.
Overal rationale: avoid heavy duplication requiring changing ALL those lines
happen such regex needs to be adjusted.
Remaining questions: why it is groupped (i.e. within ()) and why * instead of +
as it should be
diff --git a/tools/schemacode/bidsschematools/tests/test_validator.py b/tools/schemacode/bidsschematools/tests/test_validator.py
index 179a027f..41f0e95e 100644
--- a/tools/schemacode/bidsschematools/tests/test_validator.py
+++ b/tools/schemacode/bidsschematools/tests/test_validator.py
@@ -4,6 +4,11 @@ import shutil
import pytest
+LABEL_REGEX_CHARS = "[a-zA-Z0-9]"
+# S - "star", and G - group
+LABEL_REGEX_SG = f"({LABEL_REGEX_CHARS}*?)"
+
+
def test__add_entity():
from bidsschematools.validator import _add_entity
@@ -11,7 +16,7 @@ def test__add_entity():
regex_entities = ""
entity = "subject"
entity_shorthand = "sub"
- variable_field = "([a-zA-Z0-9]*?)"
+ variable_field = LABEL_REGEX_SG
requirement_level = "required"
_regex_entities = _add_entity(
@@ -27,13 +32,12 @@ def test__add_entity():
# Test append input and optional entity
regex_entities = (
"sub-(?P=subject)(|_ses-(?P=session))"
- "(|_task-(?P<task>([a-zA-Z0-9]*?)))(|_trc-(?P<tracer>([a-zA-Z0-9]*?)))"
- "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
- "(|_run-(?P<run>([a-zA-Z0-9]*?)))"
+ f"(|_task-(?P<task>{LABEL_REGEX_SG}))(|_trc-(?P<tracer>{LABEL_REGEX_SG}))"
+ f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+ f"(|_run-(?P<run>{LABEL_REGEX_SG}))"
)
entity = "recording"
entity_shorthand = "recording"
- variable_field = "([a-zA-Z0-9]*?)"
requirement_level = "optional"
_regex_entities = _add_entity(
@@ -46,10 +50,10 @@ def test__add_entity():
assert (
_regex_entities == "sub-(?P=subject)(|_ses-(?P=session))"
- "(|_task-(?P<task>([a-zA-Z0-9]*?)))(|_trc-(?P<tracer>([a-zA-Z0-9]*?)))"
- "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
- "(|_run-(?P<run>([a-zA-Z0-9]*?)))"
- "(|_recording-(?P<recording>([a-zA-Z0-9]*?)))"
+ f"(|_task-(?P<task>{LABEL_REGEX_SG}))(|_trc-(?P<tracer>{LABEL_REGEX_SG}))"
+ f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+ f"(|_run-(?P<run>{LABEL_REGEX_SG}))"
+ f"(|_recording-(?P<recording>{LABEL_REGEX_SG}))"
)
@@ -59,8 +63,8 @@ def test__add_extensions():
# Test single extension
regex_string = (
"sub-(?P=subject)(|_ses-(?P=session))"
- "_sample-(?P<sample>([a-zA-Z0-9]*?))"
- "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))_photo"
+ f"_sample-(?P<sample>{LABEL_REGEX_SG})"
+ f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))_photo"
)
variant = {
"suffixes": ["photo"],
@@ -76,15 +80,15 @@ def test__add_extensions():
assert (
_regex_string == "sub-(?P=subject)(|_ses-(?P=session))"
- "_sample-(?P<sample>([a-zA-Z0-9]*?))"
- "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))_photo\\.jpg"
+ f"_sample-(?P<sample>{LABEL_REGEX_SG})"
+ f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))_photo\\.jpg"
)
# Test multiple extensions
regex_string = (
"sub-(?P=subject)(|_ses-(?P=session))"
- "_sample-(?P<sample>([a-zA-Z0-9]*?))"
- "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))_photo"
+ f"_sample-(?P<sample>{LABEL_REGEX_SG})"
+ f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))_photo"
)
variant = {
"suffixes": ["photo"],
@@ -100,8 +104,8 @@ def test__add_extensions():
assert (
_regex_string == "sub-(?P=subject)(|_ses-(?P=session))"
- "_sample-(?P<sample>([a-zA-Z0-9]*?))"
- "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
+ f"_sample-(?P<sample>{LABEL_REGEX_SG})"
+ f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
"_photo(\\.jpg|\\.png|\\.tif)"
)
@@ -186,10 +190,10 @@ def test__add_suffixes():
# Test multiple expansions
regex_entities = (
"sub-(?P=subject)(|_ses-(?P=session))"
- "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
- "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
- "(|_dir-(?P<direction>([a-zA-Z0-9]*?)))(|_run-(?P<run>([a-zA-Z0-9]*?)))"
- "(|_recording-(?P<recording>([a-zA-Z0-9]*?)))"
+ f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
+ f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+ f"(|_dir-(?P<direction>{LABEL_REGEX_SG}))(|_run-(?P<run>{LABEL_REGEX_SG}))"
+ f"(|_recording-(?P<recording>{LABEL_REGEX_SG}))"
)
variant = {
"suffixes": [
@@ -212,10 +216,10 @@ def test__add_suffixes():
}
regex_string = (
"sub-(?P=subject)(|_ses-(?P=session))"
- "(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
- "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
- "(|_dir-(?P<direction>([a-zA-Z0-9]*?)))(|_run-(?P<run>([a-zA-Z0-9]*?)))"
- "(|_recording-(?P<recording>([a-zA-Z0-9]*?)))"
+ f"(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
+ f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+ f"(|_dir-(?P<direction>{LABEL_REGEX_SG}))(|_run-(?P<run>{LABEL_REGEX_SG}))"
+ f"(|_recording-(?P<recording>{LABEL_REGEX_SG}))"
"_(physio|stim)"
)
@@ -247,12 +251,12 @@ def test_write_report(tmp_path):
validation_result["schema_tracking"] = [
{
- "regex": ".*?/sub-(?P<subject>([a-zA-Z0-9]*?))/"
- "(|ses-(?P<session>([a-zA-Z0-9]*?))/)anat/sub-(?P=subject)"
- "(|_ses-(?P=session))(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
- "(|_ce-(?P<ceagent>([a-zA-Z0-9]*?)))"
- "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
- "(|_run-(?P<run>([a-zA-Z0-9]*?)))"
+ "regex": f".*?/sub-(?P<subject>{LABEL_REGEX_SG})/"
+ f"(|ses-(?P<session>{LABEL_REGEX_SG})/)anat/sub-(?P=subject)"
+ f"(|_ses-(?P=session))(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
+ f"(|_ce-(?P<ceagent>{LABEL_REGEX_SG}))"
+ f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+ f"(|_run-(?P<run>{LABEL_REGEX_SG}))"
"(|_part-(?P<part>(mag|phase|real|imag)))"
"_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)"
"\\.(nii.gz|nii|json)$",
@@ -261,12 +265,12 @@ def test_write_report(tmp_path):
]
validation_result["schema_listing"] = [
{
- "regex": ".*?/sub-(?P<subject>([a-zA-Z0-9]*?))/"
- "(|ses-(?P<session>([a-zA-Z0-9]*?))/)anat/sub-(?P=subject)"
- "(|_ses-(?P=session))(|_acq-(?P<acquisition>([a-zA-Z0-9]*?)))"
- "(|_ce-(?P<ceagent>([a-zA-Z0-9]*?)))"
- "(|_rec-(?P<reconstruction>([a-zA-Z0-9]*?)))"
- "(|_run-(?P<run>([a-zA-Z0-9]*?)))"
+ "regex": f".*?/sub-(?P<subject>{LABEL_REGEX_SG})/"
+ f"(|ses-(?P<session>{LABEL_REGEX_SG})/)anat/sub-(?P=subject)"
+ f"(|_ses-(?P=session))(|_acq-(?P<acquisition>{LABEL_REGEX_SG}))"
+ f"(|_ce-(?P<ceagent>{LABEL_REGEX_SG}))"
+ f"(|_rec-(?P<reconstruction>{LABEL_REGEX_SG}))"
+ f"(|_run-(?P<run>{LABEL_REGEX_SG}))"
"(|_part-(?P<part>(mag|phase|real|imag)))"
"_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)"
"\\.(nii.gz|nii|json)$",
note that I am still not sure why/if that |
Not sure this would help much, just obfuscate the actual strings. There is also an example report file (see the commit) which has everything verbatim. Updating it is easy, it's just substitution, I just want to avoid the situation where we end up setting variables for everything and the actual strings become illegible. In as far as it's hard-coded it's good to have all the strings explicit bc they're easier to inspect. |
@TheChymera : well, we can discuss that aspect of variable-vs-explicit more later ... for the purpose of this PR I still wonder:
|
Can't the regexes be loaded from the schema? bids-specification/src/schema/objects/formats.yaml Lines 4 to 14 in 2c12e58
|
@tsalo they can, but that would clump too many tests together (for validation and schemacode). These should really just be unit tests, where the strings could be anything. What's actually tested here is how they are being put together (the order and the separators). The strings could just as well be @yarikoptic I think you're right, they might not be needed. |
@tsalo @TheChymera I think we should have a unittest (someone could call it "integration" test) which ensures that regex in the tests for the functionality matches the one in the schema indeed. Probably in a separate PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great to see more consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Historical left-over, pointed out by @yarikoptic