Add support for processing Photon event logs in Scala #1338
base: dev
Conversation
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
// Implicitly define JSON formats for deserialization using DefaultFormats
implicit val formats: Formats = DefaultFormats
// Extract and deserialize the JValue object into a Map[String, String]
// TODO: Instead of only extracting the first value, we should consider extracting all values
So is this implying that the ones with multiple entries in the mapping file aren't actually used? Seems like a must-fix.
Same question here about selecting the first match; is there a plan to fix this later?
Discussed offline with Tom: for operators that have 1-to-many mappings, we cannot distinguish between them. Selecting the first one for now.
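For context, a minimal sketch (not the PR's actual code; the mapping-file layout and all names here are assumptions) of reading such a 1-to-many operator mapping with json4s and keeping only the first candidate:

    import org.json4s._
    import org.json4s.jackson.JsonMethods.parse

    object PhotonMappingSketch {
      implicit val formats: Formats = DefaultFormats

      // Hypothetical layout: { "PhotonShuffledHashJoinExec": ["ShuffledHashJoinExec", "SortMergeJoinExec"] }
      def firstSparkName(mappingJson: String, photonName: String): Option[String] = {
        val mapping = parse(mappingJson).extract[Map[String, List[String]]]
        // 1-to-many entries cannot be disambiguated, so take the first candidate only.
        mapping.get(photonName).flatMap(_.headOption)
      }
    }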
nodeName = sparkNode,
simpleString = planInfo.simpleString.replace(planInfo.nodeName, sparkNode),
children = planInfo.children,
metadata = planInfo.metadata,
Does the metadata all line up, and the metrics? I wouldn't expect the metrics to be anywhere close.
Could you explain what you mean by lining up, and which metrics?
Let's talk about it offline. I mean that if Photon execs have certain parameters, we need to make sure the CPU one we replace it with has its parameters filled in correctly. In addition to the parameters, there are metrics that come out with each exec. In general I think the Photon ones keep the CPU ones and add their own, but we need to verify that, especially if we use those metrics for other heuristics.
- Verified from NDS Photon runs that there are no Photon-specific parameters.
- Verified metrics from Photon runs; CPU metrics are kept intact.
} else {
  // If exprString has the format: Inner, BuildRight
  // Note: This format is present in Photon event logs
  (nestedLoopParameters(1).trim, nestedLoopParameters(0).trim)
This perhaps goes to my last comment: it seems we aren't really doing the full mapping with parameters and everything? If we are going to convert, it seems better to convert it fully over to a common format if possible. If it's not possible, then we need something like a shim layer to deal with it.
- By default, Photon-specific parsing is not needed in most cases. Replacing the name with its equivalent Spark name is sufficient.
- In case a Photon-specific parser is needed, we define a custom parser (e.g., added classes like PhotonBroadcastNestedLoopJoinExecParser).
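Roughly, such a Photon-specific parser only needs to override how the parameters are extracted; a short sketch of the idea (object and method names are illustrative, not the PR's exact code), assuming Photon writes the join type before the build side:

    // Sketch: handle Photon's "Inner, BuildRight" ordering vs Spark's "BuildRight, Inner".
    object PhotonNestedLoopJoinSketch {
      // Returns (buildSide, joinType) regardless of which ordering the event log used.
      def extractBuildAndJoinType(exprString: String, photonOrdering: Boolean): (String, String) = {
        val params = exprString.split(",", 2)
        if (params.length < 2) {
          ("", params.headOption.getOrElse("").trim)
        } else if (photonOrdering) {
          // Photon event logs: "Inner, BuildRight" -> joinType first, then buildSide.
          (params(1).trim, params(0).trim)
        } else {
          // Spark event logs: "BuildRight, Inner" -> buildSide first, then joinType.
          (params(0).trim, params(1).trim)
        }
      }
    }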
@@ -46,6 +46,7 @@ import org.apache.spark.sql.rapids.tool.profiling.{ApplicationInfo, SparkPlanInf
object GenerateDot {
  val GPU_COLOR = "#76b900" // NVIDIA Green
  val CPU_COLOR = "#0071c5"
  // TODO: Add color for Photon nodes
Not sure why this is here? Are we printing the Photon plan or the converted plan, or do we have options for both?
Removed this
    nodeIdGenerator: AtomicLong,
    nodes: mutable.ArrayBuffer[SparkPlanGraphNode],
    edges: mutable.ArrayBuffer[SparkPlanGraphEdge],
    parent: SparkPlanGraphNode,
    subgraph: SparkPlanGraphCluster,
    exchanges: mutable.HashMap[SparkPlanInfo, SparkPlanGraphNode]): Unit = {
  // Replace Photon node names with Spark node names
  // TODO: Skip this if app.isPhoton is false
I would like to see this sooner rather than later if we can easily determine it up front.
Changed the flow. We do not create a new instance of SparkPlanInfo here; we are only interested in the name.
It would be nice to learn from our experience dealing with different platforms/versions and be ahead of the changes/feature requests.
I had previous plans to use shims/plugins to extend the implementation for different operators/environments, but it never materialized because of priorities.
My 2 cents:
- It is more accurate to parse an operator based on the environmentType. This is the reason I changed the arguments of SqlPlanParser to pass in the app, so that we can get this information. Ideally, SqlPlanParser should be shimmed since we have DB, DB-Photon, and other custom Spark distributions from other customers.
- Versioning is another dimension. We have suffered from versioning a lot, especially for testing. I expect that Photon changes very frequently and that optimizations will vary from one release to the next. We will be shooting ourselves in the foot if we repeat the same methodology.
- Our use of trait ExecParser could be improved. There is so much copy-pasted code in all the ExecParser classes. Ideally, an abstract class should contain all the functions we need to apply, and then we only implement what needs to be handled differently. This would reduce our code base significantly (see the sketch after this list).
- photoOperatorMapping.json is a very good first step. In the future, as an improvement, it would be nice to see a generic way to plug in other frameworks.
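A minimal sketch of the abstract-base-class idea (all names here are hypothetical, not the repository's actual ExecParser API):

    // Shared behavior lives once in the base class; subclasses override only what differs.
    abstract class BaseExecParserSketch {
      def execName: String
      def isSupported: Boolean = true
      def parseExpressions(desc: String): Seq[String] = Seq.empty

      // Template method: the common parsing flow is implemented a single time here.
      final def parse(desc: String): (String, Boolean, Seq[String]) =
        (execName, isSupported, parseExpressions(desc))
    }

    // A Photon operator parser would only have to state which Spark exec it maps to.
    class PhotonProjectParserSketch extends BaseExecParserSketch {
      override def execName: String = "Project"
    }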
Thanks @parthosa !
Thanks @amahussein for the feedback.
I agree. I was thinking of designing parsers specific to each Photon exec that extend the corresponding Spark exec parser, e.g.:

case class PhotonBroadcastNestedLoopJoinExecParser(
    node: PhotonSparkPlanGraphNode,
    checker: PluginTypeChecker,
    sqlID: Long)
  extends BroadcastNestedLoopJoinExecParserBase(node, checker, sqlID)

Currently we are using a mapping generated from a single Databricks runtime version, 13.3. In the future, we could extend this by having separate mapping files for separate versions.
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
fea97ec to 9bc9d10
name: Check for file over 4.0MiB
args: ['--maxkb=4000', '--enforce-all']
Increasing the threshold: even with ultra zstd compression, we are unable to reduce the Photon event log size below 2GB.
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
/**
 * Parse the SparkPlanGraphNode and return ExecInfo.
 */
def parseSparkNode(
Moved from parsePlanNode
@@ -0,0 +1,82 @@
/*
Moved methods from SQLPlanParserSuite to BasePlanParserSuite.
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/SQLPlanParser.scala
...com/nvidia/spark/rapids/tool/planparser/photon/PhotonBroadcastNestedLoopJoinExecParser.scala
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Thanks @parthosa!
Thanks @parthosa
We can discuss the ToolsPlanGraph changes offline.
@@ -466,6 +467,7 @@ object QualOutputWriter {
  val RECOMMENDED_WORKER_NODE_TYPE = "Recommended Worker Node Type"
  val DRIVER_NODE_TYPE = "Driver Node Type"
  val TOTAL_CORE_SEC = "Total Core Seconds"
  val IS_PHOTON = "Photon App"
Is it part of the requirements to add a binary flag for "Photon App" in the output files?
If not, then this should be part of the app properties Map[String, String]. For example, for a user running on Dataproc, the column will not be useful.
Removed the binary flag as we can detect this information from spark_properties.csv
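As a rough illustration (the property key and value pattern below are assumptions, not taken from the PR), detection from the dumped Spark properties could look like:

    object PhotonDetectionSketch {
      // Heuristic check over the app's Spark properties; key and pattern are assumptions.
      def isPhotonApp(sparkProperties: Map[String, String]): Boolean = {
        sparkProperties.get("spark.databricks.clusterUsageTags.sparkVersion")
          .exists(_.toLowerCase.contains("photon"))
      }
    }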
val photonName: String,
val photonDesc: String,
sparkName: String,
sparkDesc: String,
We can revisit this because I think it is redundant to keep this information at the node level, since the map should be operator-scoped rather than node-scoped. This will also impact memory usage.
sparkName and sparkDesc are passed to SparkPlanGraphNode as name and desc respectively. This ensures that any call to node.name and node.desc returns the corresponding Spark name and desc.
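For illustration, a rough sketch of that wiring (the constructor shape is simplified and assumed, not the actual Spark/tools signature):

    // Simplified stand-in for the real node type: callers read name/desc from here.
    class SparkPlanGraphNodeSketch(val id: Long, val name: String, val desc: String)

    // The Photon node keeps the original Photon name/description, but the values passed
    // to the parent are the mapped Spark ones, so node.name / node.desc stay Spark-flavored.
    class PhotonSparkPlanGraphNodeSketch(
        id: Long,
        val photonName: String,
        val photonDesc: String,
        sparkName: String,
        sparkDesc: String)
      extends SparkPlanGraphNodeSketch(id, sparkName, sparkDesc)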
@@ -290,7 +356,8 @@ object ToolsPlanGraph {
  planInfo.nodeName,
  planInfo.simpleString,
  mutable.ArrayBuffer[SparkPlanGraphNode](),
  metrics,
  isPhotonNode = isPhotonNode)
We don't need to have isPhotonNode as an argument because it is set using an object's method. Then have different classes for Photon nodes.
Removed the addition of isPhotonNode as we have different classes for Photon nodes.
Discussed some of the main concepts offline. The actions:
- Adding a column "Is Photon App" to all SQL and application outputs seems to be invasive.
- Discussed the possibility of generating that as a property map that can be extended later to include any sort of information that can be used by QualX.
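A small sketch of that property-map direction (the key names are made up for illustration): rather than a dedicated column, per-app properties could be emitted as an extensible Map[String, String] that QualX reads.

    object AppPropertySketch {
      // Extensible per-app properties instead of a fixed "Is Photon App" column.
      def buildProperties(isPhoton: Boolean, sparkRuntime: String): Map[String, String] =
        Map(
          "spark.runtime" -> sparkRuntime,   // e.g. "DATABRICKS_PHOTON" (assumed value)
          "isPhotonApp"   -> isPhoton.toString
        )
    }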
This reverts commit 316c342.
Converting it to draft as changes in PR-1362 will affect this.
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
…a-dev
# Conflicts:
#   core/src/main/scala/org/apache/spark/sql/rapids/tool/util/ToolsPlanGraph.scala
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Removed. We decided that it is not needed since downstream processes (QualX + Python CLI) can detect a Photon app using
Contributes to #251. This PR adds support for qualifying Photon event logs in Scala.

Approach:
- com.databricks.photon.PhotonSupport.
- PhotonSparkPlanGraphNode: Extends SparkPlanGraphNode for non-WholeStageCodegen nodes.
- PhotonSparkPlanGraphCluster: Extends SparkPlanGraphCluster for Photon's WholeStageCodegen nodes.
- PhotonPlanParser for non-WholeStageCodegen nodes: parseNode method to parse Photon nodes using a Photon-specific parser (e.g., PhotonBroadcastNestedLoopJoinExecParser).
- PhotonStageExecParser for WholeStageCodegen nodes.

Limitations:

Output Changes:

Testing:
- Unit Tests:
  - QualificationSuite: Tests for Photon processing were added.
  - PhotonPlanParserSuite: Added tests for Photon nodes.
- E2E Tests: skipped to success.

Manual Test Command: