feat(fuzzer): Add ability in expression fuzzer to run multiple batches #11903

bikramSingh91 · 2024-12-18T00:44:47Z

Summary:
This change adds the ability to run 2 input batches in each
expression fuzzer iteration, reusing the same ExprSet across both
batches to simulate its typical usage in real workloads such as the
ProjectFilter Operator. The full execution loop of each iteration is
modified accordingly, including input generation and modification,
result verification, re-running the input using TRY, finding the
minimal breaking expression tree, and serializing the input so that it
can be reproduced with the ExpressionRunner utility.

Side note: this exposed a bug in the Simplified path where the inputs
are not cleared if an exception is thrown while evaluating them. The
fix is also part of this change.
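The side note describes an exception-safety bug: state that is reused across batches must be reset on every exit path, including when evaluation throws. One common way to express that kind of fix in C++ is a scope guard whose destructor performs the cleanup. This is an illustrative sketch only: `ReusableEvalState` and `evalBatch` are hypothetical names, the "evaluation" is a placeholder sum, and it is not the actual Velox change.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for evaluation state that is reused across batches
// (in the PR the reused object is the ExprSet; this is NOT its real layout).
struct ReusableEvalState {
  std::vector<int> inputs;  // Inputs computed for the current batch.
};

// Evaluates one batch. 'computeInputs' may throw partway through, as an
// input expression can during eval. The guard's destructor clears
// 'state.inputs' on every exit path (normal return or stack unwinding), so a
// later batch never starts with leftovers from a failed one.
int evalBatch(
    ReusableEvalState& state,
    const std::function<void(std::vector<int>&)>& computeInputs) {
  struct ClearOnExit {
    std::vector<int>& inputs;
    ~ClearOnExit() {
      inputs.clear();
    }
  } guard{state.inputs};

  computeInputs(state.inputs);
  int sum = 0;  // Placeholder "evaluation": sum the inputs.
  for (int v : state.inputs) {
    sum += v;
  }
  return sum;
}
```

Without the guard, a batch that pushes `1` and then throws would leave `{1}` behind, and a following batch that pushes `2` and `3` would return 6 instead of 5.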

Differential Revision: D67368974

facebook-github-bot · 2024-12-18T00:45:06Z

This pull request was exported from Phabricator. Differential Revision: D67368974

kagamiori

Hi @bikramSingh91, thank you for adding this support to enable fuzzer coverage of dictionary memoization! I left a few comments.

velox/expression/fuzzer/ExpressionFuzzerVerifier.cpp

velox/expression/fuzzer/ExpressionFuzzerVerifier.h

kagamiori · 2024-12-19T02:47:31Z

velox/expression/tests/ExpressionRunner.cpp

+  std::vector<VectorPtr> children = inputVector->children();
+  auto firstEncodedChild = children[indices[0]];
+  VELOX_CHECK_EQ(
+      firstEncodedChild->encoding(), VectorEncoding::Simple::DICTIONARY);
+  auto commonDictionaryIndices = firstEncodedChild->wrapInfo();
+  auto commonNulls = firstEncodedChild->nulls();


So the assumption is that all child vectors at 'indices' have the same top-level (and only top-level) indices and nulls, right? Could you add a comment stating this assumption?

It's mentioned in the method comment above at L140. Let me know if it's unclear.

I see. I think the comment at L140 can be revised a bit like this "Make all children of the input row vector at 'indices' wrapped in the same dictionary Buffers. These children are assumed to have already been wrapped in the same dictionary but through separate Buffers. Making them wrapped in the same Buffers is necessary to trigger peeling."
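The suggested comment hinges on buffer identity: children wrapped with equal but separately allocated dictionary indices are not treated as sharing one wrapper, so, per the review comment, they do not trigger peeling. The following simplified model illustrates the idea. `DictionaryChild`, `sharesWrap`, and `shareDictionaryBuffers` are hypothetical and only loosely mirror the `wrapInfo()`/`nulls()` calls in the diff; Velox's real vectors and peeling logic are considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, heavily simplified model; not Velox's BaseVector API.
using IndicesBuffer = std::shared_ptr<const std::vector<int>>;
using NullsBuffer = std::shared_ptr<const std::vector<bool>>;
using BaseValues = std::shared_ptr<const std::vector<long>>;

struct DictionaryChild {
  IndicesBuffer indices;  // Row i reads (*base)[(*indices)[i]].
  NullsBuffer nulls;      // Top-level nulls of the wrapper; may be null.
  BaseValues base;
};

// In this model, children can be peeled together only if they share the
// *same* indices and nulls buffer objects; equal contents are not enough.
bool sharesWrap(
    const std::vector<DictionaryChild>& children,
    const std::vector<std::size_t>& indices) {
  for (std::size_t idx : indices) {
    if (children[idx].indices != children[indices[0]].indices ||
        children[idx].nulls != children[indices[0]].nulls) {
      return false;
    }
  }
  return true;
}

// Rewraps children[idx] for each idx in 'indices' with the first selected
// child's buffers. Assumes those children already have identical top-level
// indices and nulls, just held in separate buffers.
void shareDictionaryBuffers(
    std::vector<DictionaryChild>& children,
    const std::vector<std::size_t>& indices) {
  assert(!indices.empty());
  const IndicesBuffer commonIndices = children[indices[0]].indices;
  const NullsBuffer commonNulls = children[indices[0]].nulls;
  for (std::size_t idx : indices) {
    // Same dictionary contents are required; only the buffers differ.
    assert(children[idx].indices && *children[idx].indices == *commonIndices);
    children[idx].indices = commonIndices;
    children[idx].nulls = commonNulls;
  }
}
```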

kagamiori · 2024-12-19T03:10:46Z

velox/expression/tests/ExpressionRunner.cpp

+          "Input vector is not a RowVector: {}",
+          inputVector->toString());
+      VELOX_CHECK_GT(inputVector->size(), 0, "Input vector must not be empty.");
+      if (inputSelectivityPaths.size() > i) {


I may have missed something, but why could the number of selectivity paths differ from the number of input paths?

There are 2 cases I am trying to handle: one where there are no input selectivity paths, and one where every input has one. However, since I allow the user to pass a comma-separated list of paths, this check also handles the case where the user adds selectivity vector paths for only some inputs.

The cmd line params don't offer the best user experience here, so I am definitely open to suggestions.
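The `inputSelectivityPaths.size() > i` check pairs selectivity vectors with inputs by position. A minimal sketch of that pairing, assuming both flags are plain comma-separated strings; `splitPaths`, `InputSpec`, and `pairInputs` are hypothetical helpers, not ExpressionRunner code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Splits a comma-separated flag value; an empty string yields no entries.
std::vector<std::string> splitPaths(const std::string& csv) {
  std::vector<std::string> out;
  std::stringstream ss(csv);
  std::string item;
  while (std::getline(ss, item, ',')) {
    out.push_back(item);
  }
  return out;
}

struct InputSpec {
  std::string inputPath;
  std::optional<std::string> selectivityPath;  // Absent: all rows selected.
};

// Pairs the i-th input with the i-th selectivity path when one exists.
std::vector<InputSpec> pairInputs(
    const std::string& inputCsv,
    const std::string& selectivityCsv) {
  const auto inputs = splitPaths(inputCsv);
  const auto selectivity = splitPaths(selectivityCsv);
  // Assumption for this sketch: never more selectivity paths than inputs.
  assert(selectivity.size() <= inputs.size());
  std::vector<InputSpec> specs;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    InputSpec spec{inputs[i], std::nullopt};
    if (selectivity.size() > i) {  // Same shape as the check under review.
      spec.selectivityPath = selectivity[i];
    }
    specs.push_back(spec);
  }
  return specs;
}
```

Under this sketch's assumptions, positional pairing can only attach selectivity vectors to a leading prefix of the inputs (for example, input 0 but not input 1 alone), which is one reason the flags are awkward to use.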

Could you share an example of the printout of the persisted repro paths and the cmd line arguments to ExpressionRunnerTest with multiple input paths and selectivity vector paths? I wonder whether we need to/should save all selectivity vector paths to a subdirectory when persisting the repro info, so that we only need to provide one path to the subdirectory here.

@kagamiori you can either specify the folder where the fuzzer generated all the files, like before:

--fuzzer_repro_path /tmp/fuzzer_repro/velox_expressionVerifier_Gu1Seu/ --mode verify

Or you can specify the input, etc explicitly like:

--input_paths /tmp/fuzzer_repro/velox_expressionVerifier_Gu1Seu/input_vector_0,/tmp/fuzzer_repro/velox_expressionVerifier_Gu1Seu/input_vector_1 \
  --input_selectivity_vector_paths /tmp/fuzzer_repro/velox_expressionVerifier_Gu1Seu/input_selectivity_vector_0,/tmp/fuzzer_repro/velox_expressionVerifier_Gu1Seu/input_selectivity_vector_1 \
  --sql_path /tmp/fuzzer_repro/velox_expressionVerifier_Gu1Seu/sql \
  --mode verify
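The two invocations should be equivalent when the explicit paths follow the fuzzer's naming convention. Assuming that convention is exactly what the example shows (`input_vector_<i>`, `input_selectivity_vector_<i>`, and `sql` inside the repro directory), a helper could rebuild the explicit flag values from `--fuzzer_repro_path`. `ReproFiles`, `expandReproDir`, and `joinCsv` are hypothetical names, not part of ExpressionRunner; the `exists` callback is injected so the logic can be exercised without touching disk.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <string>
#include <vector>

struct ReproFiles {
  std::vector<std::string> inputPaths;
  std::vector<std::string> selectivityPaths;
  std::string sqlPath;
};

// Walks input_vector_0, input_vector_1, ... until one is missing. Selectivity
// paths are kept only as a contiguous prefix, since they pair with inputs by
// position.
ReproFiles expandReproDir(
    const std::string& dir,
    const std::function<bool(const std::string&)>& exists) {
  namespace fs = std::filesystem;
  ReproFiles files;
  bool selectivityContiguous = true;
  for (int i = 0;; ++i) {
    const std::string suffix = std::to_string(i);
    const std::string input =
        (fs::path(dir) / ("input_vector_" + suffix)).string();
    if (!exists(input)) {
      break;
    }
    files.inputPaths.push_back(input);
    const std::string sel =
        (fs::path(dir) / ("input_selectivity_vector_" + suffix)).string();
    selectivityContiguous = selectivityContiguous && exists(sel);
    if (selectivityContiguous) {
      files.selectivityPaths.push_back(sel);
    }
  }
  files.sqlPath = (fs::path(dir) / "sql").string();
  return files;
}

// Joins paths with commas, the format both list flags take.
std::string joinCsv(const std::vector<std::string>& paths) {
  std::string out;
  for (std::size_t i = 0; i < paths.size(); ++i) {
    if (i > 0) {
      out += ',';
    }
    out += paths[i];
  }
  return out;
}
```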

velox/expression/tests/ExpressionRunner.cpp

velox/expression/tests/ExpressionVerifier.cpp

kagamiori · 2024-12-23T20:05:23Z

velox/expression/tests/ExpressionRunnerTest.cpp

+    "table 't'. Note: if selectivity vectors are specified for those inputs "
+    "using --input_selectivity_vector_paths, then those will not be used for "
+    "the query.");


nit: Maybe simply say "Note: --input_selectivity_vector_paths is ignored in this mode".

kagamiori

LGTM. Thank you for expanding the coverage of expression fuzzer! (Don't forget to fix the two comments and it's good to merge.)

feat(fuzzer): Add ability in expression fuzzer to run multiple batches (facebookincubator#11903)

Summary: This change adds the ability to run 2 input batches for each expression fuzzer iteration which will re-use the ExprSet to simulate its typical usage in actual use-cases like in the ProjectFilter Operator. The full execution loop of each iteration is modified to accommodate this change, including input generation and modification, result verification, re-running input using TRY, finding the minimal breaking expression tree, and the facility to serialize the input and repro using the ExpressionRunner utility.

Side note: this exposed a bug in both Simplified and common paths where the inputs are not cleared if during eval of inputs an exception is thrown. The fix is also a part of this change.

Reviewed By: kagamiori

Differential Revision: D67368974

bikramSingh91 · 2024-12-27T21:07:54Z

Failures are unrelated:
the one in Build with GCC / Linux release with adapters is #11857;

the one in Spark Fuzzer is #11462.

facebook-github-bot · 2024-12-27T21:30:17Z

This pull request has been merged in 883b989.

facebook-github-bot added the CLA Signed label Dec 18, 2024

facebook-github-bot added the fb-exported label Dec 18, 2024

bikramSingh91 requested a review from kagamiori December 18, 2024 00:45

bikramSingh91 force-pushed the export-D67368974 branch from c0cee4e to c87a6ab Compare December 18, 2024 20:27

kagamiori reviewed Dec 19, 2024

bikramSingh91 force-pushed the export-D67368974 branch from c87a6ab to d4b17a7 Compare December 19, 2024 17:34

bikramSingh91 force-pushed the export-D67368974 branch from d4b17a7 to 705f1e2 Compare December 20, 2024 00:00

kagamiori reviewed Dec 23, 2024

kagamiori approved these changes Dec 24, 2024

bikramSingh91 force-pushed the export-D67368974 branch from 705f1e2 to f67f6c5 Compare December 27, 2024 18:28

facebook-github-bot closed this in 883b989 Dec 27, 2024

facebook-github-bot added the Merged label Dec 27, 2024