
Add support for processing Photon event logs in Scala #1338

Open · wants to merge 16 commits into dev from spark-rapids-tools-251-support-photon-in-scala
Conversation

@parthosa (Collaborator) commented Sep 10, 2024

Contributes to #251. This PR adds support for qualifying Photon event logs in Scala.

Approach:

  • Mapped Photon operators to Spark operators using com.databricks.photon.PhotonSupport.
  • During Spark plan graph construction, identified Photon nodes and created:
    • PhotonSparkPlanGraphNode: extends SparkPlanGraphNode for non-WholeStageCodegen nodes.
    • PhotonSparkPlanGraphCluster: extends SparkPlanGraphCluster for Photon's WholeStageCodegen nodes.
  • Added PhotonPlanParser for non-WholeStageCodegen nodes:
    • Includes a parseNode method that parses Photon nodes using a Photon-specific parser (e.g., PhotonBroadcastNestedLoopJoinExecParser).
    • Falls back to the Spark CPU parser if no Photon-specific parser is defined (see the sketch after this list).
  • Added PhotonStageExecParser for WholeStageCodegen nodes.
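
A minimal sketch of that parse-with-fallback flow (the surrounding signatures are illustrative assumptions; PhotonPlanParser, PhotonSparkPlanGraphNode, ExecInfo, and PluginTypeChecker are the tool's internal types, not shown here):

object PhotonPlanParser {
  // Photon execs that need more than a simple rename get a dedicated parser.
  private val photonParsers:
      Map[String, (PhotonSparkPlanGraphNode, PluginTypeChecker, Long) => ExecInfo] =
    Map("PhotonBroadcastNestedLoopJoin" ->
      ((node, checker, sqlID) =>
        PhotonBroadcastNestedLoopJoinExecParser(node, checker, sqlID).parse))

  def parseNode(node: PhotonSparkPlanGraphNode, checker: PluginTypeChecker,
      sqlID: Long): ExecInfo = {
    photonParsers.get(node.photonName) match {
      case Some(photonParser) => photonParser(node, checker, sqlID)
      case None =>
        // No Photon-specific parser: node.name already carries the mapped
        // Spark name, so the Spark CPU parser can handle it directly.
        SQLPlanParser.parseSparkNode(node, checker, sqlID)
    }
  }
}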

Limitations:

  • Some mappings are one-to-many; the tool selects the first match.
  • Supports Databricks Runtime up to 13.3. Future versions will require separate mappings.

Output Changes:

  • No changes to the output files schema.
  • Photon event logs are now processed successfully.

Testing:

Unit Tests:

  • QualificationSuite: Added tests for Photon processing.
  • PhotonPlanParserSuite: Added tests for Photon nodes.

E2E Tests:

  • Updated Photon test case from skipped to success.

Manual Test Command:

java -Xmx8g -XX:+UseG1GC -cp "$SPARK_RAPIDS_TOOLS_JAR:$SPARK_HOME/jars/*" com.nvidia.spark.rapids.tool.qualification.QualificationMain <photon-event-log>

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa added feature request New feature or request core_tools Scope the core module (scala) labels Sep 10, 2024
@parthosa parthosa self-assigned this Sep 10, 2024
@parthosa parthosa added the affect-output A change that modifies the output (add/remove/rename files, add/remove/rename columns) label Sep 10, 2024
@parthosa parthosa marked this pull request as ready for review September 10, 2024 23:20
core/src/main/resources/photonOperatorMapping.json (outdated review thread, resolved)
// Implicitly define JSON formats for deserialization using DefaultFormats
implicit val formats: Formats = DefaultFormats
// Extract and deserialize the JValue object into a Map[String, String]
// TODO: Instead of only extracting the first value, we should consider extracting all values
Collaborator:

So is this implying that the operators with multiple entries in the mapping file aren't actually used? Seems like a must-fix.

Collaborator:

Same question here about selecting the first match: is there a plan to fix this later?

Collaborator Author:

Discussed offline with Tom: for operators that have one-to-many mappings, we cannot distinguish between them, so we select the first one for now.
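
For context, a minimal json4s sketch of this first-match behavior (the JSON shape and operator names here are hypothetical, not the actual photonOperatorMapping.json contents):

import org.json4s._
import org.json4s.jackson.JsonMethods.parse

object PhotonMappingSketch {
  implicit val formats: Formats = DefaultFormats

  // Hypothetical one-to-many entry: one Photon operator, several Spark candidates.
  val json = """{ "PhotonShuffleExchange": ["ShuffleExchangeExec", "AQEShuffleReadExec"] }"""

  // Deserialize, then keep only the first Spark operator per Photon operator.
  val firstMatch: Map[String, String] =
    parse(json).extract[Map[String, Seq[String]]].map {
      case (photonOp, sparkOps) => photonOp -> sparkOps.head
    }
}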

nodeName = sparkNode,
simpleString = planInfo.simpleString.replace(planInfo.nodeName, sparkNode),
children = planInfo.children,
metadata = planInfo.metadata,
Collaborator:

Does the metadata all line up, and the metrics? I wouldn't expect the metrics to be anywhere close.

Collaborator Author:

Could you explain what you mean by lining up the metadata and the metrics?

Collaborator:

Let's talk offline. I mean that if Photon execs have certain parameters, we need to make sure the CPU exec we replace them with has its parameters filled in correctly. Beyond parameters, there are also metrics that come out with each exec. In general I think the Photon ones keep the CPU metrics and add their own, but we need to verify that, especially if we use those metrics for other heuristics.

Collaborator Author:

  • Verified from NDS Photon runs that there are no Photon-specific parameters.
  • Verified metrics from Photon runs; the CPU metrics are kept intact.

} else {
  // If exprString has the format: Inner, BuildRight
  // Note: This format is present in Photon event logs
  (nestedLoopParameters(1).trim, nestedLoopParameters(0).trim)
Collaborator:

This perhaps goes back to my last comment: it seems we aren't really doing the full mapping, with parameters and everything? If we are going to convert, it seems better to convert fully over to a common format if possible. If that's not possible, then we need something like a shim layer to deal with it.

Collaborator Author:

  • By default, Photon-specific parsing is not needed in most cases; replacing the name with its equivalent Spark name is sufficient.
  • In case a Photon-specific parser is needed, we define a custom parser (e.g., classes like PhotonBroadcastNestedLoopJoinExecParser), as sketched below.
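
A rough sketch of such a custom parser, tying it to the argument-order snippet quoted above (the base class and the overridden method name are assumptions for illustration, not the PR's exact API):

case class PhotonBroadcastNestedLoopJoinExecParser(
    node: PhotonSparkPlanGraphNode,
    checker: PluginTypeChecker,
    sqlID: Long)
  extends BroadcastNestedLoopJoinExecParserBase(node, checker, sqlID) {

  // Photon event logs emit the parameters as "Inner, BuildRight" (join type
  // first), while Spark CPU logs emit "BuildRight, Inner".
  override protected def extractBuildAndJoinTypes(exprString: String): (String, String) = {
    val nestedLoopParameters = exprString.split(",", 2)
    (nestedLoopParameters(1).trim, nestedLoopParameters(0).trim)
  }
}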

@@ -46,6 +46,7 @@ import org.apache.spark.sql.rapids.tool.profiling.{ApplicationInfo, SparkPlanInf
object GenerateDot {
val GPU_COLOR = "#76b900" // NVIDIA Green
val CPU_COLOR = "#0071c5"
// TODO: Add color for Photon nodes
Collaborator:

Not sure why this is here? Are we printing the Photon plan or the converted plan, or do we have options for both?

Collaborator Author:

Removed this

nodeIdGenerator: AtomicLong,
nodes: mutable.ArrayBuffer[SparkPlanGraphNode],
edges: mutable.ArrayBuffer[SparkPlanGraphEdge],
parent: SparkPlanGraphNode,
subgraph: SparkPlanGraphCluster,
exchanges: mutable.HashMap[SparkPlanInfo, SparkPlanGraphNode]): Unit = {
// Replace Photon node names with Spark node names
// TODO: Skip this if app.isPhoton is false
Collaborator:

I would like to see this sooner rather than later if we can easily determine it up front.

Collaborator Author:

Changed the flow. We do not create a new instance of SparkPlanInfo here; we are only interested in the name.

@parthosa parthosa linked an issue Sep 11, 2024 that may be closed by this pull request
@amahussein (Collaborator) left a comment:

It would be nice to learn from our experience dealing with different platforms/versions and be ahead of the changes/feature requests.
I had some previous plans to use shims/plugins to extend the implementation for different operators/environments, and it never materialized because of priorities.
My 2 cents:

  • It is more accurate to parse an operator based on the environmentType. This is the reason I changed the arguments of SqlPlanParser to pass in the app, so that we can get this information. Ideally, SqlPlanParser should be shimmed since we have DB, DB-Photon, and other custom Spark distributions from other customers.
  • Versioning is another dimension. We have suffered from versioning a lot, especially in testing. I expect that Photon changes very frequently and that optimizations will vary from one release to another. We will be shooting ourselves in the foot if we repeat the same methodology.
  • Our use of trait ExecParser could be improved. There is so much copy-pasted code across the ExecParser classes. Ideally an abstract class should contain all the functions we need to apply, and then we only implement the code for what needs to be handled differently. This would reduce our code base significantly (see the sketch below).

photonOperatorMapping.json is a very good first step. In the future, as an improvement, it would be nice to see a generic way to plug in other frameworks.

Thanks @parthosa !
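
A rough sketch of the abstract-base idea suggested above (ExecParser, ExecInfo, and PluginTypeChecker are the tool's internal types; the checker helpers and the ExecInfo construction shown here are assumptions, not existing signatures):

abstract class BaseExecParser(
    node: SparkPlanGraphNode,
    checker: PluginTypeChecker,
    sqlID: Long) extends ExecParser {

  // Shared defaults; subclasses override only what differs per exec.
  def fullExecName: String = node.name + "Exec"
  def parseExpressions(exprString: String): Array[String] = Array.empty

  override def parse: ExecInfo = {
    val expressions = parseExpressions(node.desc)
    // isExecSupported/getSpeedupFactor are assumed helper names on the checker.
    val isSupported = checker.isExecSupported(fullExecName)
    val speedupFactor =
      if (isSupported) checker.getSpeedupFactor(fullExecName) else 1.0
    // ExecInfo construction shown schematically; the real signature differs.
    ExecInfo(sqlID, node.name, expressions.mkString(";"),
      speedupFactor, duration = None, node.id, isSupported, children = None)
  }
}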

@parthosa (Collaborator Author) commented Sep 17, 2024

Thanks @amahussein for the feedback.

It is more accurate to parse an operator based on the environmentType.

I agree. I was thinking of designing parsers specific to each Photon exec that extend the corresponding Spark exec parser, e.g.:

case class PhotonBroadcastNestedLoopJoinExecParser(
    node: PhotonSparkPlanGraphNode,
    checker: PluginTypeChecker,
    sqlID: Long)
  extends BroadcastNestedLoopJoinExecParserBase(node, checker, sqlID)

Versioning is another dimension. We have suffered from versioning a lot.

Currently we are using a mapping generated from a single Databricks Runtime version, 13.3. In the future, we could extend this with separate mapping files per version, along the lines sketched below.
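
A possible shape for that per-version extension (the file names and lookup logic are hypothetical; only the 13.3 mapping exists today):

object PhotonOperatorMappings {
  // One mapping resource per Databricks Runtime line, oldest first.
  private val byMajorVersion: Seq[(Int, String)] = Seq(
    13 -> "photonOperatorMapping.json"
  )

  // Pick the newest mapping at or below the requested runtime's major
  // version; fall back to the oldest known mapping otherwise.
  def resourceFor(dbrVersion: String): String = {
    val major = dbrVersion.takeWhile(_.isDigit).toInt
    byMajorVersion.takeWhile { case (v, _) => v <= major }
      .lastOption.map(_._2)
      .getOrElse(byMajorVersion.head._2)
  }
}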

@parthosa parthosa marked this pull request as draft September 18, 2024 17:18
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa force-pushed the spark-rapids-tools-251-support-photon-in-scala branch from fea97ec to 9bc9d10 Compare September 19, 2024 20:46
Comment on lines +34 to +35
name: Check for file over 4.0MiB
args: ['--maxkb=4000', '--enforce-all']
Collaborator Author:

Increasing the threshold: even with ultra zstd compression, we were unable to reduce the Photon event log size below 2GB.

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa marked this pull request as ready for review September 19, 2024 22:04
/**
 * Parse the SparkPlanGraphNode and return ExecInfo.
 */
def parseSparkNode(
Collaborator Author:

Moved from parsePlanNode

@@ -0,0 +1,82 @@
/*
Collaborator Author:

Moved methods from SQLPlanParserSuite to BasePlanParserSuite

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
cindyyuanjiang previously approved these changes Sep 26, 2024

@cindyyuanjiang (Collaborator) left a comment:

Thanks @parthosa!

@amahussein (Collaborator) left a comment:

Thanks @parthosa
We can discuss the ToolsPlanGraph changes offline.

@@ -466,6 +467,7 @@ object QualOutputWriter {
val RECOMMENDED_WORKER_NODE_TYPE = "Recommended Worker Node Type"
val DRIVER_NODE_TYPE = "Driver Node Type"
val TOTAL_CORE_SEC = "Total Core Seconds"
val IS_PHOTON = "Photon App"
Collaborator:


Is it part of the requirements to add a binary flag for "Photon App" in the output files?
If not, then this should be part of the app properties Map[String, String]. For example, for a user running on Dataproc, the column will not be useful.

Collaborator Author:


Removed the binary flag, as we can detect this information from spark_properties.csv.

Comment on lines 198 to 201
val photonName: String,
val photonDesc: String,
sparkName: String,
sparkDesc: String,
Collaborator:


We can revisit this because I think it is redundant to keep this information at the node level; the mapping should be operator-scoped rather than node-scoped. This will also impact memory usage.

Collaborator Author:


sparkName and sparkDesc are passed to SparkPlanGraphNode as name and desc respectively.

This ensures that any call to node.name and node.desc returns the corresponding Spark name and desc.
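
Roughly, the node class has this shape (the id and metrics parameters are assumed from Spark's SparkPlanGraphNode constructor, not quoted from the PR):

class PhotonSparkPlanGraphNode(
    id: Long,
    val photonName: String,
    val photonDesc: String,
    sparkName: String,
    sparkDesc: String,
    metrics: Seq[SQLPlanMetric])
  // Passing the mapped Spark name/desc to the parent means node.name and
  // node.desc transparently resolve to the Spark equivalents downstream.
  extends SparkPlanGraphNode(id, sparkName, sparkDesc, metrics)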

@@ -290,7 +356,8 @@ object ToolsPlanGraph {
planInfo.nodeName,
planInfo.simpleString,
mutable.ArrayBuffer[SparkPlanGraphNode](),
metrics)
metrics,
isPhotonNode = isPhotonNode)
Collaborator:


We don't need isPhotonNode as an argument because it is set using an object's method. Then we can have different classes for Photon nodes.

Collaborator Author:


Removed the isPhotonNode argument, as we have different classes for Photon nodes.

@amahussein (Collaborator) left a comment:

Discussed some of the main concepts offline. The action items:

  • Adding a column "Is Photon App" to all SQLs and applications seems invasive.
  • Discussed the possibility of generating that as a property map that can be extended later to include any sort of information usable by QualX.

@parthosa (Collaborator Author) commented Oct 2, 2024

Converting this to a draft, as changes in PR-1362 will affect it.

@parthosa parthosa marked this pull request as draft October 2, 2024 22:47
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
…a-dev

# Conflicts:
#	core/src/main/scala/org/apache/spark/sql/rapids/tool/util/ToolsPlanGraph.scala
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa marked this pull request as ready for review October 9, 2024 00:14
@parthosa (Collaborator Author) commented Oct 9, 2024

@amahussein

Adding a column "Is Photon App" to all SQLs and applications seems invasive.

Removed the Is Photon App column from the output file.

Discussed the possibility of generating that as a property map that can be extended later to include any sort of information usable by QualX.

We decided that it is not needed, since downstream processes (QualX and the Python CLI) can detect a Photon app from spark_properties.csv, e.g. as sketched below.
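
For reference, a minimal sketch of that downstream detection (the property key and value format are how Databricks typically reports the runtime version; treat the exact key as an assumption):

object PhotonAppDetector {
  // Databricks runtimes report their version via this property; Photon
  // runtimes include "photon" in the value, e.g. "13.3.x-photon-scala2.12".
  private val DbrVersionKey = "spark.databricks.clusterUsageTags.sparkVersion"

  def isPhotonApp(sparkProperties: Map[String, String]): Boolean =
    sparkProperties.get(DbrVersionKey).exists(_.toLowerCase.contains("photon"))
}

// Example:
// PhotonAppDetector.isPhotonApp(
//   Map("spark.databricks.clusterUsageTags.sparkVersion" -> "13.3.x-photon-scala2.12"))
// returns true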

@parthosa parthosa removed the affect-output A change that modifies the output (add/remove/rename files, add/remove/rename columns) label Oct 9, 2024
Successfully merging this pull request may close these issues.

[FEA] Add qualification support for Databricks Photon event logs