-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Predicate UDF Reordering Optimization #618
Conversation
gaurav274
commented
Mar 20, 2023
•
edited
Loading
edited
- Reorders the conjuncts based on the cost of the UDFs
- Bug fix: orderby executor did not safely handle corner case if the batch is empty.
- Bug fix: TupleValue Expression did not respect the "mask" used by short-circuiting leading to wrong results.
…avicha3/udf-predicate-reordering
@@ -17,3 +17,4 @@ | |||
NO_GPU = -1 | |||
UNDEFINED_GROUP_ID = -1 | |||
IFRAMES = "IFRAMES" | |||
DEFAULT_FUNCTION_EXPRESSION_COST = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a future PR, we should look at why we have two constants.py
files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
contains_func_exprs = [] | ||
simple_exprs = [] | ||
for conjunct in conjuncts: | ||
if list(conjunct.find_all(FunctionExpression)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a test case where there are no FunctionExpression
in conjunct
-- edge case?
@@ -80,7 +80,7 @@ def test_should_run_pytorch_and_resnet50(benchmark, setup_pytorch_tests): | |||
# non-trivial test case for Resnet50 | |||
res = actual_batch.frames | |||
assert res["featureextractor.features"][0].shape == (1, 2048) | |||
assert res["featureextractor.features"][0][0][0] > 0.3 | |||
# assert res["featureextractor.features"][0][0][0] > 0.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the feature extractor to use a newer model.
if name == "DummyMultiObjectDetector": | ||
return MagicMock(cost=10) | ||
else: | ||
return MagicMock(cost=5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a test case for a scenario where there is no cost in the catalog? (defaults to DEFAULT_FUNCTION_EXPRESSION_COST
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
@@ -33,7 +33,10 @@ def __init__(self, value: Any, v_type: ColumnType = ColumnType.INTEGER): | |||
self._v_type = v_type | |||
|
|||
def evaluate(self, batch: Batch, **kwargs): | |||
return Batch(pd.DataFrame({0: [self._value] * len(batch)})) | |||
batch = Batch(pd.DataFrame({0: [self._value] * len(batch)})) | |||
if "mask" in kwargs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use of mask here?
@@ -499,6 +503,64 @@ def apply(self, before: LogicalJoin, context: OptimizerContext): | |||
yield new_join | |||
|
|||
|
|||
class ReorderPredicates(Rule): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we give it a more specific name like CostBasedReorder...
to differentiate with SelectivityBasedReorder
?
# Segregate the conjuncts into simple and function expressions | ||
contains_func_exprs = [] | ||
simple_exprs = [] | ||
for conjunct in conjuncts: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just make sure I understand this correctly, if we have something like OR(AND(a,b), c)
, this reorder rule won't be able to flip any order inside OR
right?