Modified EXPLAIN ANALYZE output #23824

Open: wants to merge 1 commit into base: master

@infvg infvg commented Oct 14, 2024

Description

EXPLAIN ANALYZE currently renders only the first substage of the output stage.
For the TEXT format, this change loops through all substages and prints each substage's ID followed by its plan.
For the JSON format, it returns a list of plans, one entry per substage.
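
The TEXT-format behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Stage`, `renderFirstOnly`, and `renderAll` are hypothetical names, and the plan strings stand in for whatever the plan printer actually produces.

```java
import java.util.Arrays;
import java.util.List;

final class SubstageTextSketch {
    // Hypothetical stand-in for a substage: its ID and its already-rendered plan text.
    static final class Stage {
        final String id;
        final String planText;

        Stage(String id, String planText) {
            this.id = id;
            this.planText = planText;
        }
    }

    // Behavior before the change, as described: only the first substage is rendered.
    static String renderFirstOnly(List<Stage> subStages) {
        return subStages.get(0).planText;
    }

    // Behavior after the change, as described: every substage, each prefixed by its stage ID.
    static String renderAll(List<Stage> subStages) {
        StringBuilder out = new StringBuilder();
        for (Stage stage : subStages) {
            out.append("Stage ID: ").append(stage.id).append('\n');
            out.append(stage.planText).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Stage> subStages = Arrays.asList(
                new Stage("query.1", "Fragment 1 [COORDINATOR_ONLY]"),
                new Stage("query.4", "Fragment 4 [hive]"));
        System.out.print(renderAll(subStages));
    }
}
```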

Motivation and Context

Previously, the output did not support multiple substages.
Resolves: #23798

Impact

Modifies the EXPLAIN ANALYZE output for the TEXT and JSON formats. Samples of the new and old output follow.

New text output
 Stage ID: 20241014_184402_00071_gx85r.1                                                                                                                       
 Fragment 1 [COORDINATOR_ONLY]                                                                                                                                 
     CPU: 16.78ms, Scheduled: 70.36ms, Input: 6 rows (1.45kB); per task: avg.: 6.00 std.dev.: 0.00, Output: 1 row (9B), 1 tasks                                
     Output layout: [rows]                                                                                                                                     
     Output partitioning: SINGLE []                                                                                                                            
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - TableCommit[PlanNodeId 388][Optional[TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=__temp_ctes__, tableName=__presto_tem>
             CPU: 14.00ms (2.52%), Scheduled: 55.00ms (3.66%), Output: 1 row (9B)                                                                              
             Input avg.: 0.00 rows, Input std.dev.: ?%                                                                                                         
         - RemoteSource[2] => [rows_6:bigint, fragments:varbinary, commitcontext:varbinary]                                                                    
                 CPU: 1.00ms (0.18%), Scheduled: 13.00ms (0.86%), Output: 6 rows (1.45kB)                                                                      
                 Input avg.: 6.00 rows, Input std.dev.: 0.00%                                                                                                  
                                                                                                                                                               
 Fragment 2 [ROUND_ROBIN]                                                                                                                                      
     CPU: 543.08ms, Scheduled: 1.48s, Input: 3 rows (15B); per task: avg.: 0.75 std.dev.: 1.30, Output: 6 rows (1.45kB), 4 tasks                               
     Output layout: [rows_6, fragments, commitcontext]                                                                                                         
     Output partitioning: SINGLE []                                                                                                                            
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - TableWriterMerge[PlanNodeId 447] => [rows_6:bigint, fragments:varbinary, commitcontext:varbinary]                                                       
             CPU: 8.00ms (1.44%), Scheduled: 95.00ms (6.32%), Output: 6 rows (1.45kB)                                                                          
             Input avg.: 0.00 rows, Input std.dev.: ?%                                                                                                         
         - LocalExchange[PlanNodeId 446][SINGLE] () => [partialrowcount:bigint, partialfragments:varbinary, partialcontext:varbinary]                          
                 CPU: 1.00ms (0.18%), Scheduled: 31.00ms (2.06%), Output: 9 rows (1.87kB)                                                                      
                 Input avg.: 1.13 rows, Input std.dev.: 29.40%                                                                                                 
             - TableWriter[PlanNodeId 389] => [partialrowcount:bigint, partialfragments:varbinary, partialcontext:varbinary]                                   
                     CPU: 525.00ms (94.59%), Scheduled: 1.27s (84.38%), Output: 9 rows (1.87kB)                                                                
                     Input avg.: 0.00 rows, Input std.dev.: ?%                                                                                                 
                     _c0_field := field (1:42)                                                                                                                 
                     Statistics collected: 0                                                                                                                   
                 - LocalExchange[PlanNodeId 445][ROUND_ROBIN] () => [field:integer]                                                                            
                         CPU: 2.00ms (0.36%), Scheduled: 25.00ms (1.66%), Output: 3 rows (15B)                                                                 
                         Input avg.: 0.19 rows, Input std.dev.: 387.30%                                                                                        
                     - RemoteSource[3] => [field:integer]                                                                                                      
                             CPU: 1.00ms (0.18%), Scheduled: 3.00ms (0.20%), Output: 3 rows (15B)                                                              
                             Input avg.: 0.19 rows, Input std.dev.: 387.30%                                                                                    
                                                                                                                                                               
 Fragment 3 [SINGLE]                                                                                                                                           
     CPU: 3.91ms, Scheduled: 13.35ms, Input: 3 rows (15B); per task: avg.: 3.00 std.dev.: 0.00, Output: 3 rows (15B), 1 tasks                                  
     Output layout: [field]                                                                                                                                    
     Output partitioning: ROUND_ROBIN []                                                                                                                       
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - Values[PlanNodeId 0] => [field:integer]                                                                                                                 
             CPU: 3.00ms (0.54%), Scheduled: 13.00ms (0.86%), Output: 3 rows (15B)                                                                             
             Input avg.: 3.00 rows, Input std.dev.: 0.00%                                                                                                      
             (INTEGER'1')                                                                                                                                      
             (INTEGER'2')                                                                                                                                      
             (INTEGER'3')                                                                                                                                      
                                                                                                                                                               
 Stage ID: 20241014_184402_00071_gx85r.4                                                                                                                       
 Fragment 4 [hive:buckets=128, bucketFunctionType=HIVE_COMPATIBLE, types=[string]]                                                                             
     CPU: 71.89ms, Scheduled: 100.49ms, Input: 3 rows (346B); per task: avg.: 0.75 std.dev.: 1.30, Output: 3 rows (15B), 4 tasks                               
     Output layout: [field_7]                                                                                                                                  
     Output partitioning: SINGLE []                                                                                                                            
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - TableScan[PlanNodeId 390][TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=__temp_ctes__, tableName=__presto_temporary_tabl
             CPU: 71.00ms (100.00%), Scheduled: 100.00ms (100.00%), Output: 3 rows (15B)                                                                       
             Input avg.: 3.00 rows, Input std.dev.: 0.00%                                                                                                      
             LAYOUT: __temp_ctes__.__presto_temporary_table_parquet_20241014_184402_00071_gx85r_9a59cb18_71c7_4c64_b224_9b51b88a83c9{buckets=128}              
             field_7 := _c0_field:int:0:REGULAR (1:42)                                                                                                         
             Input: 3 rows (346B), Filtered: 0.00%                  
Old text output
 Fragment 1 [COORDINATOR_ONLY]                                                                                                                                 
     CPU: 5.39ms, Scheduled: 26.15ms, Input: 6 rows (1.45kB); per task: avg.: 6.00 std.dev.: 0.00, Output: 1 row (9B), 1 tasks                                 
     Output layout: [rows]                                                                                                                                     
     Output partitioning: SINGLE []                                                                                                                            
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - TableCommit[PlanNodeId 388][Optional[TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=__temp_ctes__, tableName=__presto_tem
             CPU: 3.00ms (6.38%), Scheduled: 20.00ms (9.30%), Output: 1 row (9B)                                                                               
             Input avg.: 0.00 rows, Input std.dev.: ?%                                                                                                         
         - RemoteSource[2] => [rows_6:bigint, fragments:varbinary, commitcontext:varbinary]                                                                    
                 CPU: 1.00ms (2.13%), Scheduled: 5.00ms (2.33%), Output: 6 rows (1.45kB)                                                                       
                 Input avg.: 6.00 rows, Input std.dev.: 0.00%                                                                                                  
                                                                                                                                                               
 Fragment 2 [ROUND_ROBIN]                                                                                                                                      
     CPU: 45.53ms, Scheduled: 283.01ms, Input: 3 rows (15B); per task: avg.: 0.75 std.dev.: 1.30, Output: 6 rows (1.45kB), 4 tasks                             
     Output layout: [rows_6, fragments, commitcontext]                                                                                                         
     Output partitioning: SINGLE []                                                                                                                            
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - TableWriterMerge[PlanNodeId 447] => [rows_6:bigint, fragments:varbinary, commitcontext:varbinary]                                                       
             CPU: 7.00ms (14.89%), Scheduled: 40.00ms (18.60%), Output: 6 rows (1.45kB)                                                                        
             Input avg.: 0.00 rows, Input std.dev.: ?%                                                                                                         
         - LocalExchange[PlanNodeId 446][SINGLE] () => [partialrowcount:bigint, partialfragments:varbinary, partialcontext:varbinary]                          
                 CPU: 4.00ms (8.51%), Scheduled: 23.00ms (10.70%), Output: 9 rows (1.87kB)                                                                     
                 Input avg.: 1.13 rows, Input std.dev.: 29.40%                                                                                                 
             - TableWriter[PlanNodeId 389] => [partialrowcount:bigint, partialfragments:varbinary, partialcontext:varbinary]                                   
                     CPU: 26.00ms (55.32%), Scheduled: 98.00ms (45.58%), Output: 9 rows (1.87kB)                                                               
                     Input avg.: 0.00 rows, Input std.dev.: ?%                                                                                                 
                     _c0_field := field (1:42)                                                                                                                 
                     Statistics collected: 0                                                                                                                   
                 - LocalExchange[PlanNodeId 445][ROUND_ROBIN] () => [field:integer]                                                                            
                         CPU: 1.00ms (2.13%), Scheduled: 15.00ms (6.98%), Output: 3 rows (15B)                                                                 
                         Input avg.: 0.19 rows, Input std.dev.: 387.30%                                                                                        
                     - RemoteSource[3] => [field:integer]                                                                                                      
                             CPU: 1.00ms (2.13%), Scheduled: 9.00ms (4.19%), Output: 3 rows (15B)                                                              
                             Input avg.: 0.19 rows, Input std.dev.: 387.30%                                                                                    
                                                                                                                                                               
 Fragment 3 [SINGLE]                                                                                                                                           
     CPU: 5.86ms, Scheduled: 7.59ms, Input: 3 rows (15B); per task: avg.: 3.00 std.dev.: 0.00, Output: 3 rows (15B), 1 tasks                                   
     Output layout: [field]                                                                                                                                    
     Output partitioning: ROUND_ROBIN []                                                                                                                       
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - Values[PlanNodeId 0] => [field:integer]                                                                                                                 
             CPU: 4.00ms (8.51%), Scheduled: 5.00ms (2.33%), Output: 3 rows (15B)                                                                              
             Input avg.: 3.00 rows, Input std.dev.: 0.00%                                                                                                      
             (INTEGER'1')                                                                                                                                      
             (INTEGER'2')                                                                                                                                      
             (INTEGER'3')                                                                                                                                      
New JSON output
[
  {
    "1" : {
      "plan" : { ... }
    },
    "2" : {
      "plan" : { ... }
    },
    "3" : { ... }
  },
  {
    "4" : {
      "plan" : { ... }
    }
  }
]
Old JSON output
 {                                                                                                                                                             
   "1" : {                                                                                                                                                     
     "plan" : { ... }                                                                                                                                                            
   },                                                                                                                                                          
   "2" : { ... },                                                                                                                                                          
   "3" : { ... }                                                                                                                                                        
 }      
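
For consumers written against the old shape, the difference is that the top level is now a list of fragment maps rather than a single map. A rough sketch of adapting to that, using plain Java collections in place of parsed JSON; `flatten` is a hypothetical helper, the plan values are placeholder strings, and it assumes fragment IDs stay unique across stages, as they are in the sample above:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JsonShapeSketch {
    // Merge the new shape (one fragment-ID -> plan map per stage)
    // back into the old shape (a single fragment-ID -> plan map).
    static Map<String, String> flatten(List<Map<String, String>> stages) {
        Map<String, String> all = new LinkedHashMap<>();
        for (Map<String, String> fragments : stages) {
            all.putAll(fragments);
        }
        return all;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("1", "plan-1");
        first.put("2", "plan-2");
        first.put("3", "plan-3");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("4", "plan-4");

        System.out.println(flatten(Arrays.asList(first, second)).keySet());
    }
}
```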

Test Plan

Tested locally using the HiveQueryRunner.

== RELEASE NOTES ==

General Changes
* Modified the EXPLAIN ANALYZE output for TEXT and JSON formats. Text now returns each stage ID followed by its plan. JSON returns a list of plans. :pr:`23824`

@infvg infvg force-pushed the PRESTO_23798 branch 2 times, most recently from 0e64d8c to dbe88ac Compare October 15, 2024 13:37
@infvg infvg marked this pull request as ready for review October 15, 2024 15:58
@infvg infvg requested a review from presto-oss October 15, 2024 15:58
break;
case JSON:
plan = jsonDistributedPlan(queryInfo.getOutputStage().get().getSubStages().get(0), functionAndTypeManager, operatorContext.getSession());
StringJoiner planStringJoiner = new StringJoiner(",", "[", "]");
Contributor

Since the output needs to be valid JSON, we should probably use a JSON codec for lists and render the output through that instead. That will likely require some more refactoring of the PlanPrinter class. Joining strings like this is error-prone and can generate invalid JSON.
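
To illustrate the concern, here is a small sketch of how string joining can go wrong and how serializing a structure in one place avoids it. Everything here is hypothetical: `joinRendered` mimics the `StringJoiner` approach, and `toJson` is a toy serializer (strings, maps, and lists only) standing in for a real JSON codec such as Jackson's `ObjectMapper`. It is not the refactoring the comment asks for, only a demonstration of the failure mode.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class JsonJoinSketch {
    // Fragile: the result is valid JSON only if every element already is.
    static String joinRendered(List<String> renderedPlans) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (String plan : renderedPlans) {
            joiner.add(plan);
        }
        return joiner.toString();
    }

    // Toy serializer standing in for a real codec; supports String, Map, and List only.
    static String toJson(Object value) {
        if (value instanceof String) {
            return quote((String) value);
        }
        if (value instanceof Map) {
            StringJoiner joiner = new StringJoiner(",", "{", "}");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                joiner.add(quote(String.valueOf(entry.getKey())) + ":" + toJson(entry.getValue()));
            }
            return joiner.toString();
        }
        if (value instanceof List) {
            StringJoiner joiner = new StringJoiner(",", "[", "]");
            for (Object element : (List<?>) value) {
                joiner.add(toJson(element));
            }
            return joiner.toString();
        }
        throw new IllegalArgumentException("unsupported type: " + value);
    }

    // Escape quotes, backslashes, and control characters so the string is a valid JSON string.
    static String quote(String text) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : text.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // An empty rendered plan leaves a dangling comma, which is not valid JSON.
        System.out.println(joinRendered(Arrays.asList("{\"1\":{}}", "")));

        Map<String, Object> fragments = new LinkedHashMap<>();
        fragments.put("1", "plan with \"quotes\"");
        System.out.println(toJson(Arrays.asList(fragments)));
    }
}
```

With a real codec the escaping and delimiters are handled by the library, so the calling code only has to build the list of fragment maps.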

fragmentsList = renderer.deserialize((String) computeActual("EXPLAIN ANALYZE (format JSON) SELECT rank() OVER (PARTITION BY orderkey ORDER BY clerk DESC) FROM orders WHERE orderkey < 0").getOnlyValue());
for (Map<PlanFragmentId, JsonRenderer.JsonPlan> fragments : fragmentsList) {
fragments.values().forEach(planFragment -> assertJsonNodesHaveStats(planFragment.getPlan()));
}
Contributor

We should add another test that enables CTE materialization and verifies that all stages appear in the plan output.

Comment on lines 384 to 386
for (Map<PlanFragmentId, JsonRenderer.JsonPlan> fragments : fragmentsList) {
fragments.values().forEach(planFragment -> assertJsonNodesHaveStats(planFragment.getPlan()));
}
Contributor

nit: since you're using the stream API within the loop, you might as well use it for the whole thing:

Suggested change
for (Map<PlanFragmentId, JsonRenderer.JsonPlan> fragments : fragmentsList) {
fragments.values().forEach(planFragment -> assertJsonNodesHaveStats(planFragment.getPlan()));
}
fragmentsList.stream().map(fragments -> {
fragments.values().forEach(planFragment -> assertJsonNodesHaveStats(planFragment.getPlan()));
});

Same for the other blocks in this test.
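
One caveat with the suggestion as written: `Stream.map` is a lazy intermediate operation and expects a function that returns a value, so a block lambda that returns nothing does not compile, and even with a return value nothing runs until a terminal operation is called. A sketch of the difference, assuming a terminal `forEach` is what was intended; the string plans and the counter are hypothetical stand-ins for `JsonRenderer.JsonPlan` and `assertJsonNodesHaveStats`:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class StreamForEachSketch {
    // map() alone: no terminal operation, so the per-plan check never runs.
    static int checkedWithLazyMap(List<Map<String, String>> fragmentsList) {
        AtomicInteger checked = new AtomicInteger();
        fragmentsList.stream().map(fragments -> {
            fragments.values().forEach(plan -> checked.incrementAndGet());
            return fragments;
        });
        return checked.get();
    }

    // flatMap() + forEach(): the terminal forEach drives the check for every plan.
    static int checkedWithForEach(List<Map<String, String>> fragmentsList) {
        AtomicInteger checked = new AtomicInteger();
        fragmentsList.stream()
                .flatMap(fragments -> fragments.values().stream())
                .forEach(plan -> checked.incrementAndGet());
        return checked.get();
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("1", "plan-1");
        first.put("2", "plan-2");
        List<Map<String, String>> fragmentsList =
                Arrays.asList(first, Collections.singletonMap("4", "plan-4"));

        System.out.println("lazy map checked: " + checkedWithLazyMap(fragmentsList));
        System.out.println("forEach checked: " + checkedWithForEach(fragmentsList));
    }
}
```

In a test, the lazy variant would pass silently without asserting anything, which is why the terminal operation matters here.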

Development

Successfully merging this pull request may close these issues.

EXPLAIN ANALYZE fails on queries with CTE materialization