feat: retrieve and include extra field in GET /execution-events #1339
Conversation
@@ -192,5 +193,6 @@ object ExecutionEventRepositoryImpl {
  "extra.appId",
  "execPlanDetails.dataSourceUri",
  "execPlanDetails.dataSourceType",
+ "extra"
no, this field is not analysed, we don't search on it.
@wajda I added that because I would like to be able to search on the field I am adding in the extraMetaRule. I understand if that's not supported. Alternatively, could I add this right after the analyze function in the Arango query?
OR CONTAINS(LOWER(ee.extra), @searchTerm)
E.g.,
| AND (@searchTerm == null
${
SearchFields.map({
fld =>
s""" OR ANALYZER(LIKE(ee.$fld, CONCAT("%", TOKENS(@searchTerm, "norm_en")[0], "%")), "norm_en")"""
}).mkString("\n")
}
| OR CONTAINS(LOWER(ee.extra), @searchTerm)
| )
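(Aside: one caveat with the suggested `CONTAINS` clause is that in AQL, `ee.extra` is a document rather than a string, so `LOWER(ee.extra)` evaluates to `null` and the match would never succeed. A hypothetical workaround, written in the same Scala-interpolated-AQL style as the query above, would be to stringify the object first. This is a sketch only; as the maintainers note below, searching on `extra` is discouraged, and `JSON_STRINGIFY` usage here is an illustrative assumption, not part of the proposed change.)

```scala
// Hypothetical sketch only -- not part of the PR. JSON_STRINGIFY serializes
// the `extra` object so a crude case-insensitive substring match is possible:
s"""| OR CONTAINS(LOWER(JSON_STRINGIFY(ee.extra)), LOWER(@searchTerm))"""
```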
Let me know. Otherwise I will just remove the field and make do without the search capability!
The problem is that the field extra is nothing but a black-box payload for Spline. It's an extension point in the data model that enables you to carry some information through Spline, while Spline itself has no idea what it is or how to handle it. That's why we specifically don't want to look into that field - we merely carry it over, but we don't look inside it.
If you want a custom property that you can search on, then I would suggest you use labels. It's basically just another place to store custom data in the Spline model, but while extra is a wildcard, labels are a set of key-value pairs which you can filter on (not search, but filter).
Look at the example here - https://github.com/AbsaOSS/spline-spark-agent/blob/e558e2a92a72b00c3e45c0dd5dda274be837262f/examples/src/main/resources/spline.yaml#L38
And then on the UI / Consumer API you can do this - https://github.com/AbsaOSS/spline/blob/develop/consumer-rest-core/src/main/scala/za/co/absa/spline/consumer/rest/controller/LabelsController.scala and also this - https://github.com/AbsaOSS/spline/blob/develop/consumer-rest-core/src/main/scala/za/co/absa/spline/consumer/rest/controller/ExecutionEventsController.scala#L64.
Basically, the API allows you to list the available label names, as well as the distinct values for a given label, and then you can filter your events by a combination of label key:value pairs.
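(For orientation, the flow described above might look roughly like the following. The paths and parameter shapes here are assumptions for illustration only - the authoritative signatures are in the LabelsController and ExecutionEventsController sources linked above, not in this sketch.)

```scala
// Hypothetical request shapes (names/paths are assumptions, not documented API):
// 1. GET /labels/names                        -> list available label names
// 2. GET /labels/names/<labelName>/values     -> distinct values for one label
// 3. GET /execution-events?label=<name>:<v1>,<v2>
//    -> events whose label <name> matches any of the listed values
```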
Ahh, yes, it looks like the use of labels could fill the gap that we were trying to fill with the extra field. I think I missed this because we are on the latest release, which doesn't appear to support either the labels controller or the labels field in the request/response of GET /execution-events.
With that in mind, I'll pare this PR down to simply adding the extra field into the Arango query and into the response object. However, when I look at WriteEventInfo, I am not seeing the labels field there. Is this something I should add as part of this PR as well, similar to how I have done for extra?
case class WriteEventInfo
(
@ApiModelProperty(value = "Id of the execution event")
executionEventId: WriteEventInfo.Id,
@ApiModelProperty(value = "Id of the execution plan")
executionPlanId: ExecutionPlanInfo.Id,
@ApiModelProperty(value = "Name of the framework that triggered this execution event")
frameworkName: String,
@ApiModelProperty(value = "Name of the application/job")
applicationName: String,
@ApiModelProperty(value = "Id of the application/job")
applicationId: String,
@ApiModelProperty(value = "When the execution was triggered")
timestamp: WriteEventInfo.Timestamp,
@ApiModelProperty(value = "Duration of execution in nanoseconds (for successful executions)")
durationNs: Option[WriteEventInfo.DurationNs],
@ApiModelProperty(value = "Error (for failed executions)")
error: Option[WriteEventInfo.Error],
@ApiModelProperty(value = "Output data source name")
dataSourceName: String,
@ApiModelProperty(value = "Output data source URI")
dataSourceUri: String,
@ApiModelProperty(value = "Output data source (or data) type")
dataSourceType: String,
@ApiModelProperty(value = "Write mode - (true=Append; false=Override)")
append: WriteEventInfo.Append,
@ApiModelProperty(value = "Other extra info")
extra: Map[String, Any]
) {
def this() = this(null, null, null, null, null, null, null, null, null, null, null, null, null)
}
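(A minimal sketch of how a `labels` field could sit alongside `extra` in the case class above. The property text and value type are assumptions inferred from the response samples later in this thread, where each label maps to an array of strings; this is not the final merged code.)

```scala
// Sketch: hypothetical extra field for WriteEventInfo (assumed shape)
@ApiModelProperty(value = "Labels attached to the execution event")
labels: Map[String, Seq[String]]
// Note: the no-arg auxiliary constructor would then need a 14th `null`
// to keep arity in sync with the primary constructor.
```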
Yes, I don't mind, please go ahead. But please test it thoroughly on your side first.
I do see the labels filtering happening in the query:
${
lblNames.zipWithIndex.map({
case (lblName, i) =>
s"""
AND (
@lblValues[$i] ANY == ee.labels['${escapeJavaScript(lblName)}']
OR @lblValues[$i] ANY == ee.execPlanDetails.labels['${escapeJavaScript(lblName)}']
)
""".stripMargin
}).mkString("\n")
}
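(For illustration, the snippet above appears to expect two index-aligned bind collections - label names and, per name, an array of accepted values. A hypothetical example, with shapes inferred from the query rather than taken from the actual call site:)

```scala
// Hypothetical bind values (shapes inferred from the query above):
val lblNames: Seq[String] = Seq("sparkContextId")
// @lblValues[i] holds the accepted values for lblNames(i);
// `ANY ==` matches when any of them equals the stored label value.
val lblValues: Seq[Seq[String]] = Seq(Seq("8900849439166745669"))
```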
However, it would be nice to see those coming back in the API response as well. Is that something you'd be comfortable with me adding, @wajda?
@wajda I took a stab at wiring up the labels field. I won't be able to test this right away, so I think I'll just move forward with my forked spline repo with these changes. If everything is working when I get to testing, I will let you know.
@wajda The latest commit has the working code for including labels and extra in the response of GET /execution-events. I was able to get this tested. Below are the new responses and their associated userExtraMeta rules.
Let me know if there was anything else you wanted to see here.
Labels
userExtraMeta rule:
{
"executionEvent": {
"labels": {
"sparkContextId": {
"$js": "session.conf().get('spark.databricks.sparkContextId')"
}
}
}
}
Observed Response of GET /execution-events
{
"items": [
{
"executionEventId": "6d454497-338b-529e-9c9c-7363abad856b:lxtce2er",
"executionPlanId": "6d454497-338b-529e-9c9c-7363abad856b",
"frameworkName": "spark 3.3.2",
"applicationName": "Databricks Shell",
"applicationId": "...",
"timestamp": 1719255593907,
"durationNs": 4.4308673125E10,
"error": null,
"dataSourceName": "newcountries",
"dataSourceUri": "...",
"dataSourceType": "delta",
"append": true,
"extra": {
"appId": "...",
"user": "root",
"readMetrics": {},
"writeMetrics": {}
},
"labels": {
"sparkContextId": [
"8900849439166745669"
]
}
}
],
"totalCount": 9,
"pageNum": 1,
"pageSize": 10,
"totalDateRange": []
}
Extra
userExtraMeta rule:
{
"executionEvent": {
"extra": {
"sparkContextId": {
"$js": "session.conf().get('spark.databricks.sparkContextId')"
}
}
}
}
Observed Response of GET /execution-events
{
"items": [
{
"executionEventId": "3da8af99-80b9-5622-847f-209d6f9b10a0:lxtduewe",
"executionPlanId": "3da8af99-80b9-5622-847f-209d6f9b10a0",
"frameworkName": "spark 3.3.2",
"applicationName": "Databricks Shell",
"applicationId": "...",
"timestamp": 1719258036206,
"durationNs": 2.2877156976E10,
"error": null,
"dataSourceName": "...",
"dataSourceUri": "...",
"dataSourceType": "delta",
"append": true,
"extra": {
"readMetrics": {},
"sparkContextId": "6677040947717205628",
"appId": "...",
"user": "root",
"writeMetrics": {}
},
"labels": {}
}
],
"totalCount": 9,
"pageNum": 1,
"pageSize": 10,
"totalDateRange": []
}
LGTM
a minor change in typing: ArrayList -> List
Overview
Adding this feature as per discussion with @wajda on the spline-spark-agent repo. See that discussion for more details.
Changes
- Updated the WriteEventInfo model to include the extra field, which is already available within the ArangoDB
- Updated ExecutionEventRepositoryImpl to read the extra field from the execution event table in the DB