Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Support Top/Rare Command In PPL #720

Merged
merged 18 commits into from
Sep 8, 2020

Conversation

rupal-bq
Copy link
Contributor

Issue #685

Description of changes:

  • core: added support for rare/top operation in a logical & physical layer
  • ppl: added AstBuilder logic for rare and top command
  • updated documentation with examples for using rare and top command

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@dai-chen dai-chen added the PPL label Aug 28, 2020
@@ -83,6 +84,16 @@ public static LogicalPlan dedupe(
input, Arrays.asList(fields), allowedDuplication, keepEmpty, consecutive);
}

public static LogicalPlan rareTopN(LogicalPlan input, Boolean rareTopFlag,
List<Expression> groupByList, Expression... fields) {
return rareTopN(input, rareTopFlag, 10, groupByList, fields);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some use a constant like DEFAULT_NO_OF_RESULTS and some have 10 hardcoded. Should probably be consistent with the constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used the variable/function to get default value wherever required. hardcoded values are used for the test purpose.


@Override
public List<LogicalPlan> getChild() {
return Collections.singletonList(child);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a different method we use ImmutableList.of() here. Let's be consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced ImmutableList.of(child) in RareTopN.java with Collections.singletonList(child)

* Get a list of top values.
*/
public List<FieldKey> findTop(HashMap<FieldKey, Integer> map) {
PriorityQueue<FieldKey> topQueue = new PriorityQueue<>(new Comparator<FieldKey>() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findTop() and findRare() have the same implementation, except the comparator is different. We could just have a single find() function that takes in a comparator instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with find() function

/**
* Get a list of result.
*/
public List<FieldKey> getList(HashMap<FieldKey, Integer> map, PriorityQueue<FieldKey> queue,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This algorithm for building this list is too complicated. I think you can model this more easily with streams.
map.entrySet().stream()
.sort(ValueComparator)
.limit(noOfResults)

Pass in different comparator for rare vs. top.
.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Added this in find() function.

/**
* Return the Map of field and field value.
*/
public LinkedHashMap<String, ExprValue> fieldKeyMap() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually you don't want to return the specific type and instead return the interface unless the caller really needs to work with LinkedHashMap in particular.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Changed return type to Map.

@@ -0,0 +1,49 @@
package com.amazon.opendistroforelasticsearch.sql.ast.tree;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing the license header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

*/
public FieldKey(ExprValue value) {
this.fieldByValueList = new ArrayList<>();
for (Expression fieldExpr : fieldExprList) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use fieldExprList.stream().map(...).collect(Collectors.toList())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to stream

*/
public Map<String, ExprValue> fieldKeyMap() {
Map<String, ExprValue> map = new LinkedHashMap<>();
for (int i = 0; i < fieldExprList.size(); i++) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using guava? Can you use Streams.zip().collect(Collectors.toMap(...))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with Streams.zip()...

*/
public GroupKey(ExprValue value) {
this.groupByValueList = new ArrayList<>();
for (Expression groupExpr : groupByExprList) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch to stream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/**
* Return the Map of group field and group field value.
*/
public Map<String, ExprValue> groupKeyMap() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch to stream.

Copy link

@jduo jduo Aug 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same logic and return type as for FieldKey#fieldKeyMap. Can they derive from a common base, or use a generic parameter?

The constructor is also pretty much the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used common class for finding group & field key

* Top command.
*/
@Override
public UnresolvedPlan visitTopCommand(TopCommandContext ctx) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is the same for these two methods, except differing by the flag. Can they be combined into a helper? If the two context classes have the same methods...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added helper methods for getting a list of groups & fields. Can't add other arguments in helper since that requires respective command context object

@rupal-bq rupal-bq marked this pull request as ready for review August 31, 2020 22:06

PPL query::

od> source=accounts | rare age by gender;
Copy link
Contributor

@penghuo penghuo Sep 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how many value for each field will be returned? 10?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 values per group for each field

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add this to the doc? I think the max value = 10, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the max is 10. Added in the Description section.

============
top [N] <field-list> [by-clause]

* N: number of results to return. **Default**: 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the rare command doesn't has thsi paramater?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rare doesn't have N because it's not in the syntax given in this documentation: https://docs.splunk.com/Documentation/Splunk/8.0.2/SearchReference/Rare

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

limit controls the number of results for rare but support for top-options isn't covered in this PR.
Should I add parameter N for rare?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need now, just confirm,

@@ -197,6 +206,56 @@ public UnresolvedPlan visitEvalCommand(EvalCommandContext ctx) {
);
}

private List<UnresolvedExpression> getGroupByList(ByClauseContext ctx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you also change the stats command to use these genric method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used getGroupByList in stats command

.collect(Collectors.toList());
}

private List<Field> getFieldList(FieldListContext ctx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used getFieldList for dedup command

@RequiredArgsConstructor
public class Group {

private final Map<Key, List<HashMap<Key, Integer>>> groupListMap = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the value is List instead of just Map?

Copy link
Contributor Author

@rupal-bq rupal-bq Sep 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing. The list is not required here. I will remove it.
I had list<list<ExprValue>> before I understood every input has values for the entire row and not just for a cell. Looks like I missed this while updating the structure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed list

Rupal Mahajan added 3 commits September 1, 2020 15:35
- remove list
- use generic getGroupByList function for stats
- use generic getFieldList function for dedup
public class RareTopN extends UnresolvedPlan {

private UnresolvedPlan child;
private final Boolean rareTopFlag;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a boolean instead of Boolean? Would an enum with values RARE or TOP be better? I can't tell if setting this to true means rare or top.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add enum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with enum

@EqualsAndHashCode.Exclude
private Iterator<ExprValue> iterator;

private static final Integer DEFAULT_NO_OF_RESULTS = 10;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this and noOfResults just be primitive ints?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can replace it with primitive ints. All other operators had an object of Integer so I just followed the same structure. Should I use primitive instead?


@Override
public boolean hasNext() {
return iterator.hasNext();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a precondition check here and on next() to verify open() has been called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate more? Like what should we return if a condition fails? Maybe Null?
I have simply followed the logic of AggregationOperator here. It might not be required since open() is updating an iterator and if open() isn't called then hasNext() will simply return false since there are no elements. And when hasNext() will return false, next() won't be called.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add Precondition.checkNotNull(iterator) here. But really depends on what coding style is preferred.

Copy link
Contributor

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change.

@@ -80,6 +80,17 @@ public static DedupeOperator dedupe(
input, Arrays.asList(expressions), allowedDuplication, keepEmpty, consecutive);
}

public static RareTopNOperator rareTopN(PhysicalPlan input, Boolean rareTopFlag,
List<Expression> groups, Expression... expressions) {
return rareTopN(input, rareTopFlag, 10, groups, expressions);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on constant for 10.

}

/**
* Get a list of top values.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use tags like @return and such when doing java docs for return values. Similar comment for @param for parameter. Would generally have this nit across the code.

}

/**
* Field Key.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This javadoc isn't very useful. Comments should help the developer understand the code and provide value like "what, why, how".

@penghuo penghuo merged commit 94f2d32 into opendistro-for-elasticsearch:develop Sep 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants