-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
concise phrase transform #811
Conversation
@@ -186,6 +186,27 @@ void redactRegexMatches(String source) { | |||
} | |||
|
|||
|
|||
@SneakyThrows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test by copilot?? typo PHASE instead of PHRASE everywhere :)
"Prep Customer Meeting,Prep ", | ||
"Prep: Customer,Prep: ", | ||
"Out of the Office: Vacation,Out of the Office", | ||
"Focus Time,'Focus Time,Focus'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, this is reality of this; bc transform products ALL matches, not just first. In effect, it's more like categorization than redaction.
behavior is needed if you want "Team weekly" to match both "team" and "weekly" cases. that will end up as team,weekly
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a minor typo
return patterns.stream() | ||
.map(p -> p.matcher((String) s)) | ||
.filter(Matcher::matches) | ||
.map(m -> m.group(1)) //group 1, bc we created caputuring group in regex above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.map(m -> m.group(1)) //group 1, bc we created caputuring group in regex above | |
.map(m -> m.group(1)) //group 1, bc we created capturing group in regex above |
"Focus Time Block,'Focus Time,Focus'", | ||
"Focus: Secret Project,Focus", | ||
"No Meeting Wednesday,No Meeting", | ||
" No Meetings,'No Meetings,No Meetings'", // q: why???? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because matches with No Meeting
and No Meetings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 it shouldn't though ... No Meeting
token string shouldn't match No Meetings
, bc I'm transforming it into a pattern with \b
tokens on either side to only match "word boundaries. So it shouldn't match prefixes.
|
||
List<Pattern> patterns = transform.getAllowedPhrases().stream() | ||
.map(p -> "\\Q" + p + "\\E") // quote it | ||
.map(p -> "\\b" + p + "[\\\\s:]*\\b") //boundary match, with optional whitespace or colon at end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jlorper this is where should be limiting to word boundaries.
maybe bug is that "[\\\\s:]*"
is wrong, so it's matching character 's' specifically, instead of \s
for whitespace as I intended ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it was that - fixed now and works as expected.
* wip of concise phrase transform * fix import * cleanup, fix tests * cleaner rules and examples * fix tests * fix tests * drop case-insensitive multi-pattern stuff * fix pattern to allow spaces rather than s char; and tighten capture pattern * fix tests
Features
Change implications