Support for prefix/equals-ignore-case and suffix/equals-ignore-case (#121)

* Support for prefix/equals-ignore-case and suffix/equals-ignore-case

* Adding functionality for prefix/equals-ignore-case and suffix/equals-ignore-case
with the respective unit tests, benchmarks, and README updates.

* Run benchmarks using GitHub Actions

* Explicitly run the Benchmarks class after mvn verify is executed.
By default, the maven-surefire-plugin only runs test classes whose names follow the "Test" naming
convention (for example, *Test.java), so the Benchmarks class was being skipped.

* Run only CL2 benchmarks on GitHub Actions

After adding the "Run benchmarks" step on GitHub Actions, we started seeing
failures on Ubuntu for JDK 8 due to GC overhead limit errors. This commit changes
the command to run only the CL2 benchmarks.

* Updated docs for prefix/equals-ignore-case and suffix/equals-ignore-case

* Updated the "Performance" section of the README with benchmark results
* Updated "The Patterns API" section of the README with the new match types
* Updated the PR template with instructions on where to get the benchmark results

* Update PR template with instructions on benchmark

---------

Co-authored-by: Rogerio Yamaguti <yamagur@amazon.com>
rogeriosy and Rogerio Yamaguti authored Oct 24, 2023
1 parent 13e552b commit 4cddf22
Showing 22 changed files with 1,179 additions and 21 deletions.
4 changes: 3 additions & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -7,7 +7,9 @@
#### Benchmark / Performance (for source code changes):

```
<replace this with output from /src/test/software/amazon/event/ruler/Benchmarks.java here>
<replace this with output from /src/test/software/amazon/event/ruler/Benchmarks.java here.
The benchmark results can be fetched from "Pull request checks -> Java build -> build (ubuntu-X.Y, 8) -> Run benchmarks".>
```

---
2 changes: 2 additions & 0 deletions .github/workflows/CI.yml
@@ -29,3 +29,5 @@ jobs:
cache: 'maven'
- name: Verify with Maven
run: mvn --batch-mode --errors --update-snapshots verify
- name: Run benchmarks
run: mvn test '-Dtest=Benchmarks#CL2*'
22 changes: 22 additions & 0 deletions README.md
@@ -96,6 +96,15 @@ intersection between the event array and rule-array is non-empty.
```
Prefix matches only work on string-valued fields.

### Prefix equals-ignore-case matching

```javascript
{
"source": [ { "prefix": { "equals-ignore-case": "EC2" } } ]
}
```
Prefix equals-ignore-case matches only work on string-valued fields.

### Suffix matching

```javascript
@@ -105,6 +114,15 @@ Prefix matches only work on string-valued fields.
```
Suffix matches only work on string-valued fields.

### Suffix equals-ignore-case matching

```javascript
{
"source": [ { "suffix": { "equals-ignore-case": "EC2" } } ]
}
```
Suffix equals-ignore-case matches only work on string-valued fields.

### Equals-ignore-case matching

```javascript
@@ -579,7 +597,9 @@ static methods are useful.
```java
public static ValuePatterns exactMatch(final String value);
public static ValuePatterns prefixMatch(final String prefix);
public static ValuePatterns prefixEqualsIgnoreCaseMatch(final String prefix);
public static ValuePatterns suffixMatch(final String suffix);
public static ValuePatterns suffixEqualsIgnoreCaseMatch(final String suffix);
public static ValuePatterns equalsIgnoreCaseMatch(final String value);
public static ValuePatterns wildcardMatch(final String value);
public static AnythingBut anythingButMatch(final String anythingBut);
@@ -725,6 +745,8 @@ counts the matches, yields the following on a 2019 MacBook:

Events are processed at over 220K/second except for:
- equals-ignore-case matches, which are processed at over 200K/second.
- prefix/equals-ignore-case matches, which are processed at over 200K/second.
- suffix/equals-ignore-case matches, which are processed at over 200K/second.
- wildcard matches, which are processed at over 170K/second.
- anything-but matches, which are processed at over 150K/second.
- numeric matches, which are processed at over 120K/second.
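The README entries above describe the new match types only by example. As a rough illustration of the intended semantics (not of how Ruler evaluates them, which is done by its byte-level `ByteMachine`), the sketch below checks a case-insensitive prefix or suffix with `String.regionMatches`. The class and method names are hypothetical, and Java's per-character case folding in `regionMatches` is an assumed stand-in: Ruler's own case handling may treat some non-ASCII characters differently.

```java
final class IgnoreCaseAffixDemo {

    // True when `value` starts with `prefix`, ignoring case.
    // regionMatches returns false if `prefix` is longer than `value`.
    static boolean prefixEqualsIgnoreCase(String value, String prefix) {
        return value.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    // True when `value` ends with `suffix`, ignoring case.
    static boolean suffixEqualsIgnoreCase(String value, String suffix) {
        int offset = value.length() - suffix.length();
        return offset >= 0 && value.regionMatches(true, offset, suffix, 0, suffix.length());
    }

    public static void main(String[] args) {
        System.out.println(prefixEqualsIgnoreCase("ec2.amazonaws.com", "EC2")); // true
        System.out.println(suffixEqualsIgnoreCase("my-instance-Ec2", "EC2"));   // true
        System.out.println(prefixEqualsIgnoreCase("aws.ec2", "EC2"));           // false
    }
}
```

As the README notes, these match types apply only to string-valued fields.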
2 changes: 1 addition & 1 deletion pom.xml
@@ -20,7 +20,7 @@
<groupId>software.amazon.event.ruler</groupId>
<artifactId>event-ruler</artifactId>
<name>Event Ruler</name>
<version>1.4.1</version>
<version>1.5.0</version>
<description>Event Ruler is a Java library that allows matching Rules to Events. An event is a list of fields,
which may be given as name/value pairs or as a JSON object. A rule associates event field names with lists of
possible values. There are two reasons to use Ruler: 1/ It's fast; the time it takes to match Events doesn't
71 changes: 68 additions & 3 deletions src/main/software/amazon/event/ruler/ByteMachine.java
@@ -23,14 +23,18 @@
import javax.annotation.concurrent.ThreadSafe;

import static software.amazon.event.ruler.CompoundByteTransition.coalesce;
import static software.amazon.event.ruler.MatchType.EQUALS_IGNORE_CASE;
import static software.amazon.event.ruler.MatchType.EXACT;
import static software.amazon.event.ruler.MatchType.EXISTS;
import static software.amazon.event.ruler.MatchType.SUFFIX;
import static software.amazon.event.ruler.MatchType.ANYTHING_BUT_SUFFIX;
import static software.amazon.event.ruler.MatchType.SUFFIX_EQUALS_IGNORE_CASE;

import static software.amazon.event.ruler.input.MultiByte.MAX_CONTINUATION_BYTE;
import static software.amazon.event.ruler.input.MultiByte.MAX_FIRST_BYTE_FOR_ONE_BYTE_CHAR;
import static software.amazon.event.ruler.input.MultiByte.MAX_FIRST_BYTE_FOR_TWO_BYTE_CHAR;
import static software.amazon.event.ruler.input.MultiByte.MAX_NON_FIRST_BYTE;
import static software.amazon.event.ruler.input.MultiByte.MIN_CONTINUATION_BYTE;
import static software.amazon.event.ruler.input.MultiByte.MIN_FIRST_BYTE_FOR_ONE_BYTE_CHAR;
import static software.amazon.event.ruler.input.MultiByte.MIN_FIRST_BYTE_FOR_TWO_BYTE_CHAR;
import static software.amazon.event.ruler.input.DefaultParser.getParser;
@@ -142,7 +146,9 @@ void deletePattern(final Patterns pattern) {
case EXACT:
case NUMERIC_EQ:
case PREFIX:
case PREFIX_EQUALS_IGNORE_CASE:
case SUFFIX:
case SUFFIX_EQUALS_IGNORE_CASE:
case ANYTHING_BUT_PREFIX:
case ANYTHING_BUT_SUFFIX:
case EQUALS_IGNORE_CASE:
@@ -490,10 +496,12 @@ private void doTransitionOn(final String valString, final Set<NameStateWithPatte
break;

case PREFIX:
case PREFIX_EQUALS_IGNORE_CASE:
transitionTo.add(new NameStateWithPattern(match.getNextNameState(), match.getPattern()));
break;
case ANYTHING_BUT_SUFFIX:
case SUFFIX:
case SUFFIX_EQUALS_IGNORE_CASE:
case EXISTS:
// we already harvested these matches via separate functions due to special matching
// requirements, so just ignore them here.
@@ -604,7 +612,7 @@ private void addSuffixMatch(final byte[] val, final Set<NameStateWithPattern> tr
// given we are traversing in reverse order (from right to left), only suffix matches are eligible
// to be collected.
MatchType patternType = match.getPattern().type();
if (patternType == SUFFIX) {
if (patternType == SUFFIX || patternType == SUFFIX_EQUALS_IGNORE_CASE) {
transitionTo.add(new NameStateWithPattern(match.getNextNameState(), match.getPattern()));
} else if (patternType == ANYTHING_BUT_SUFFIX) {
addToAnythingButsMap(failedAnythingButs, match.getNextNameState(), match.getPattern());
@@ -677,7 +685,9 @@ NameState addPattern(final Patterns pattern, final NameState nameState) {
case EXACT:
case NUMERIC_EQ:
case PREFIX:
case PREFIX_EQUALS_IGNORE_CASE:
case SUFFIX:
case SUFFIX_EQUALS_IGNORE_CASE:
case EQUALS_IGNORE_CASE:
case WILDCARD:
assert pattern instanceof ValuePatterns;
@@ -818,7 +828,8 @@ private boolean doMultipleTransitionsConvergeForInputByte(ByteState byteState, I
return false;
}

if (!isNextCharacterFirstByteOfMultiByte(characters, i)) {
boolean isNextCharacterForSuffixMatch = isNextCharacterFirstContinuationByteForSuffixMatch(characters, i);
if (!isNextCharacterFirstByteOfMultiByte(characters, i) && !isNextCharacterForSuffixMatch) {
// If we are in the midst of a multi-byte sequence, we know that we are dealing with single transitions.
return false;
}
@@ -834,7 +845,8 @@ private boolean doMultipleTransitionsConvergeForInputByte(ByteState byteState, I
// Parse the next Java character into lower and upper case representations. Check if there are multiple
// multibytes (paths) and that there exists a transition that both lead to.
String value = extractNextJavaCharacterFromInputCharacters(characters, i);
InputCharacter[] inputCharacters = getParser().parse(MatchType.EQUALS_IGNORE_CASE, value);
MatchType matchType = isNextCharacterForSuffixMatch ? SUFFIX_EQUALS_IGNORE_CASE : EQUALS_IGNORE_CASE;
InputCharacter[] inputCharacters = getParser().parse(matchType, value);
ByteTransition transition = getTransition(byteState, inputCharacters[0]);
return inputCharacters[0] instanceof InputMultiByteSet && transition != null;
}
@@ -846,7 +858,54 @@ private boolean isNextCharacterFirstByteOfMultiByte(InputCharacter[] characters,
return firstByte > MAX_NON_FIRST_BYTE;
}

private boolean isNextCharacterFirstContinuationByteForSuffixMatch(InputCharacter[] characters, int i) {
if (hasSuffix.get() <= 0) {
return false;
}
// If the previous byte is a continuation byte, this means that we're in the middle of a multi-byte sequence.
return isContinuationByte(characters, i) && !isContinuationByte(characters, i - 1);
}

private boolean isContinuationByte(InputCharacter[] characters, int i) {
if (i < 0) {
return false;
}
byte continuationByte = InputByte.cast(characters[i]).getByte();
// Continuation bytes have bit 7 set, and bit 6 should be unset
return continuationByte >= MIN_CONTINUATION_BYTE && continuationByte <= MAX_CONTINUATION_BYTE;
}

private String extractNextJavaCharacterFromInputCharacters(InputCharacter[] characters, int i) {
if (isNextCharacterFirstByteOfMultiByte(characters, i)) {
return extractNextJavaCharacterFromInputCharactersForForwardArrays(characters, i);
} else {
return extractNextJavaCharacterFromInputCharactersForBackwardsArrays(characters, i);
}
}

private String extractNextJavaCharacterFromInputCharactersForBackwardsArrays(InputCharacter[] characters, int i) {
List<Byte> bytesList = new ArrayList<>();
for (int multiByteIndex = i; multiByteIndex < characters.length; multiByteIndex++) {
if (!isContinuationByte(characters, multiByteIndex)) {
// This is the last byte of the suffix char
bytesList.add(InputByte.cast(characters[multiByteIndex]).getByte());
break;
}
bytesList.add(InputByte.cast(characters[multiByteIndex]).getByte());
}
// Undoing the reverse on the byte array to get the valid char
return new String(reverseBytesList(bytesList), StandardCharsets.UTF_8);
}

private byte[] reverseBytesList(List<Byte> bytesList) {
byte[] byteArray = new byte[bytesList.size()];
for (int i = 0; i < bytesList.size(); i++) {
byteArray[bytesList.size() - i - 1] = bytesList.get(i);
}
return byteArray;
}

private static String extractNextJavaCharacterFromInputCharactersForForwardArrays(InputCharacter[] characters, int i) {
byte firstByte = InputByte.cast(characters[i]).getByte();
if (firstByte >= MIN_FIRST_BYTE_FOR_ONE_BYTE_CHAR && firstByte <= MAX_FIRST_BYTE_FOR_ONE_BYTE_CHAR) {
return new String(new byte[] { firstByte } , StandardCharsets.UTF_8);
@@ -873,7 +932,9 @@ NameState findPattern(final Patterns pattern) {
case EXACT:
case NUMERIC_EQ:
case PREFIX:
case PREFIX_EQUALS_IGNORE_CASE:
case SUFFIX:
case SUFFIX_EQUALS_IGNORE_CASE:
case ANYTHING_BUT_SUFFIX:
case ANYTHING_BUT_PREFIX:
case EQUALS_IGNORE_CASE:
@@ -1583,11 +1644,13 @@ private void addMatchReferences(ByteMatch match) {
switch (pattern.type()) {
case EXACT:
case PREFIX:
case PREFIX_EQUALS_IGNORE_CASE:
case EXISTS:
case EQUALS_IGNORE_CASE:
case WILDCARD:
break;
case SUFFIX:
case SUFFIX_EQUALS_IGNORE_CASE:
hasSuffix.incrementAndGet();
break;
case NUMERIC_EQ:
@@ -1685,11 +1748,13 @@ private void updateMatchReferences(ByteMatch match) {
switch (pattern.type()) {
case EXACT:
case PREFIX:
case PREFIX_EQUALS_IGNORE_CASE:
case EXISTS:
case EQUALS_IGNORE_CASE:
case WILDCARD:
break;
case SUFFIX:
case SUFFIX_EQUALS_IGNORE_CASE:
hasSuffix.decrementAndGet();
break;
case NUMERIC_EQ:
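In `addSuffixMatch` the value is traversed from right to left, so a multi-byte UTF-8 character is seen as its continuation bytes first and its lead byte last. `extractNextJavaCharacterFromInputCharactersForBackwardsArrays` collects bytes until it reaches a non-continuation byte and then reverses them before decoding. The sketch below reproduces that idea on a plain `byte[]` rather than Ruler's `InputCharacter[]`; the class and method names are hypothetical, and it uses the bit test `(b & 0xC0) == 0x80` (bit 7 set, bit 6 unset) in place of the `MIN_CONTINUATION_BYTE`/`MAX_CONTINUATION_BYTE` range check.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ReversedUtf8Demo {

    // A UTF-8 continuation byte has the bit pattern 10xxxxxx (0x80..0xBF).
    static boolean isContinuationByte(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // Reads one character from UTF-8 bytes stored in reverse order, starting at index i.
    // Continuation bytes come first; the lead (or single ASCII) byte ends the character.
    static String readCharFromReversed(byte[] reversed, int i) {
        List<Byte> collected = new ArrayList<>();
        for (int j = i; j < reversed.length; j++) {
            collected.add(reversed[j]);
            if (!isContinuationByte(reversed[j])) {
                break;
            }
        }
        // Undo the reversal so the bytes form a valid UTF-8 sequence again.
        byte[] forward = new byte[collected.size()];
        for (int k = 0; k < collected.size(); k++) {
            forward[collected.size() - 1 - k] = collected.get(k);
        }
        return new String(forward, StandardCharsets.UTF_8);
    }

    static byte[] reverse(byte[] bytes) {
        byte[] out = new byte[bytes.length];
        for (int k = 0; k < bytes.length; k++) {
            out[bytes.length - 1 - k] = bytes[k];
        }
        return out;
    }

    public static void main(String[] args) {
        // "a\u00C9" is 0x61 0xC3 0x89 in UTF-8; reversed it becomes 0x89 0xC3 0x61.
        byte[] reversed = reverse("a\u00C9".getBytes(StandardCharsets.UTF_8));
        System.out.println(readCharFromReversed(reversed, 0).equals("\u00C9")); // true
        System.out.println(readCharFromReversed(reversed, 2));                  // a
    }
}
```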
58 changes: 58 additions & 0 deletions src/main/software/amazon/event/ruler/JsonRuleCompiler.java
@@ -366,6 +366,10 @@ private static Patterns processMatchExpression(final JsonParser parser) throws I
return pattern;
} else if (Constants.PREFIX_MATCH.equals(matchTypeName)) {
final JsonToken prefixToken = parser.nextToken();
if (prefixToken == JsonToken.START_OBJECT) {
return processPrefixEqualsIgnoreCaseExpression(parser);
}

if (prefixToken != JsonToken.VALUE_STRING) {
barf(parser, "prefix match pattern must be a string");
}
@@ -376,6 +380,10 @@ private static Patterns processMatchExpression(final JsonParser parser) throws I
return pattern;
} else if (Constants.SUFFIX_MATCH.equals(matchTypeName)) {
final JsonToken suffixToken = parser.nextToken();
if (suffixToken == JsonToken.START_OBJECT) {
return processSuffixEqualsIgnoreCaseExpression(parser);
}

if (suffixToken != JsonToken.VALUE_STRING) {
barf(parser, "suffix match pattern must be a string");
}
@@ -515,6 +523,56 @@ private static Patterns processMatchExpression(final JsonParser parser) throws I
}
}

private static Patterns processPrefixEqualsIgnoreCaseExpression(final JsonParser parser) throws IOException {
final JsonToken prefixObject = parser.nextToken();
if (prefixObject != JsonToken.FIELD_NAME) {
barf(parser, "Prefix expression name not found");
}

final String prefixObjectOp = parser.getCurrentName();
if (!Constants.EQUALS_IGNORE_CASE.equals(prefixObjectOp)) {
barf(parser, "Unsupported prefix pattern: " + prefixObjectOp);
}

final JsonToken prefixEqualsIgnoreCase = parser.nextToken();
if (prefixEqualsIgnoreCase != JsonToken.VALUE_STRING) {
barf(parser, "equals-ignore-case match pattern must be a string");
}
final Patterns pattern = Patterns.prefixEqualsIgnoreCaseMatch('"' + parser.getText());
if (parser.nextToken() != JsonToken.END_OBJECT) {
barf(parser, "Only one key allowed in match expression");
}
if (parser.nextToken() != JsonToken.END_OBJECT) {
barf(parser, "Only one key allowed in match expression");
}
return pattern;
}

private static Patterns processSuffixEqualsIgnoreCaseExpression(final JsonParser parser) throws IOException {
final JsonToken suffixObject = parser.nextToken();
if (suffixObject != JsonToken.FIELD_NAME) {
barf(parser, "Suffix expression name not found");
}

final String suffixObjectOp = parser.getCurrentName();
if (!Constants.EQUALS_IGNORE_CASE.equals(suffixObjectOp)) {
barf(parser, "Unsupported suffix pattern: " + suffixObjectOp);
}

final JsonToken suffixEqualsIgnoreCase = parser.nextToken();
if (suffixEqualsIgnoreCase != JsonToken.VALUE_STRING) {
barf(parser, "equals-ignore-case match pattern must be a string");
}
final Patterns pattern = Patterns.suffixEqualsIgnoreCaseMatch(parser.getText() + '"');
if (parser.nextToken() != JsonToken.END_OBJECT) {
barf(parser, "Only one key allowed in match expression");
}
if (parser.nextToken() != JsonToken.END_OBJECT) {
barf(parser, "Only one key allowed in match expression");
}
return pattern;
}

private static Patterns processAnythingButListMatchExpression(JsonParser parser) throws JsonParseException {
JsonToken token;
Set<String> values = new HashSet<>();
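The two new compiler methods pass `'"' + parser.getText()` to `prefixEqualsIgnoreCaseMatch` and `parser.getText() + '"'` to `suffixEqualsIgnoreCaseMatch`, and the suffix factory in `Patterns.java` then reverses its argument. The sketch below reproduces only those string transformations so the resulting keys are visible; the class and method names are hypothetical. The explanation for the quote characters is an assumption: Ruler appears to compare string values in their JSON-quoted form, so a leading `"` anchors a prefix at the start of the value and a trailing `"` anchors a suffix at its end.

```java
final class AffixValueDemo {

    // Prefix: keep the opening JSON quote, e.g. EC2 -> "EC2
    static String prefixKey(String text) {
        return '"' + text;
    }

    // Suffix: append the closing quote, then reverse (as suffixEqualsIgnoreCaseMatch does),
    // e.g. EC2 -> EC2" -> "2CE
    static String suffixKey(String text) {
        return new StringBuilder(text + '"').reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(prefixKey("EC2")); // "EC2
        System.out.println(suffixKey("EC2")); // "2CE
    }
}
```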
2 changes: 2 additions & 0 deletions src/main/software/amazon/event/ruler/MatchType.java
@@ -8,7 +8,9 @@ public enum MatchType {
ABSENT, // absent key pattern
EXISTS, // existence pattern
PREFIX, // string prefix
PREFIX_EQUALS_IGNORE_CASE, // case-insensitive string prefix
SUFFIX, // string suffix
SUFFIX_EQUALS_IGNORE_CASE, // case-insensitive string suffix
NUMERIC_EQ, // exact numeric match
NUMERIC_RANGE, // numeric range with high & low bound & </<=/>/>= options
ANYTHING_BUT, // deny list effect
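Each new constant is added next to its case-sensitive counterpart in every `switch` of `ByteMachine`, and `addSuffixMatch` now tests `patternType == SUFFIX || patternType == SUFFIX_EQUALS_IGNORE_CASE`. One way to express that grouping in a single place is an `EnumSet`, sketched below. This is a hypothetical helper, not part of the commit, and the enum is a trimmed stand-in for Ruler's `MatchType`.

```java
import java.util.EnumSet;
import java.util.Set;

final class MatchTypeGroups {

    // Hypothetical, trimmed stand-in for Ruler's MatchType enum.
    enum MatchType {
        EXACT, PREFIX, PREFIX_EQUALS_IGNORE_CASE, SUFFIX, SUFFIX_EQUALS_IGNORE_CASE, EQUALS_IGNORE_CASE
    }

    // Match types whose values are stored reversed and collected during the
    // right-to-left traversal.
    static final Set<MatchType> SUFFIX_TYPES =
            EnumSet.of(MatchType.SUFFIX, MatchType.SUFFIX_EQUALS_IGNORE_CASE);

    static boolean isSuffixType(MatchType type) {
        return SUFFIX_TYPES.contains(type);
    }

    public static void main(String[] args) {
        for (MatchType type : MatchType.values()) {
            System.out.println(type + " -> " + isSuffixType(type));
        }
    }
}
```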
8 changes: 8 additions & 0 deletions src/main/software/amazon/event/ruler/Patterns.java
@@ -45,10 +45,18 @@ public static ValuePatterns prefixMatch(final String prefix) {
return new ValuePatterns(MatchType.PREFIX, prefix);
}

public static ValuePatterns prefixEqualsIgnoreCaseMatch(final String prefix) {
return new ValuePatterns(MatchType.PREFIX_EQUALS_IGNORE_CASE, prefix);
}

public static ValuePatterns suffixMatch(final String suffix) {
return new ValuePatterns(MatchType.SUFFIX, new StringBuilder(suffix).reverse().toString());
}

public static ValuePatterns suffixEqualsIgnoreCaseMatch(final String suffix) {
return new ValuePatterns(MatchType.SUFFIX_EQUALS_IGNORE_CASE, new StringBuilder(suffix).reverse().toString());
}

public static AnythingBut anythingButMatch(final String anythingBut) {
return new AnythingBut(Collections.singleton(anythingBut), false);
}
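`suffixMatch` and the new `suffixEqualsIgnoreCaseMatch` both store the suffix reversed. A suffix test on a value is equivalent to a prefix test on the reversed value against the reversed suffix, which is what allows suffixes to be matched during a right-to-left traversal. The sketch below demonstrates that equivalence with `StringBuilder.reverse` and `String.regionMatches`; it is an illustration with hypothetical names, not Ruler's implementation, and it operates on Java chars rather than UTF-8 bytes.

```java
final class ReversedSuffixDemo {

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // A suffix match on `value` is a prefix match on the reversed value
    // against the reversed suffix.
    static boolean suffixViaReversedPrefix(String value, String suffix, boolean ignoreCase) {
        String reversedValue = reverse(value);
        String reversedSuffix = reverse(suffix);
        return reversedValue.regionMatches(ignoreCase, 0, reversedSuffix, 0, reversedSuffix.length());
    }

    public static void main(String[] args) {
        System.out.println(suffixViaReversedPrefix("i-abc.EC2", "ec2", true));  // true
        System.out.println(suffixViaReversedPrefix("i-abc.EC2", "ec2", false)); // false
        System.out.println(suffixViaReversedPrefix("ec2.host", "ec2", true));   // false
    }
}
```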