Improve StandardizationSink #210

Merged
merged 8 commits into from
Jun 15, 2023

Collaborator

@yruslan commented Jun 14, 2023

  • Add support for Delta as the format of the raw layer.
  • Add support for Delta as the format of the publish layer.
  • Add support for customizing partition columns of the publish layer.
  • Allow column transformations to use information date in expressions.
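
As a rough illustration of the second and third points, writing a publish-layer DataFrame in either Parquet or Delta with configurable partition columns could be sketched as below. This is not the code from this PR: `PublishWriteSketch`, `writePublish` and all of its parameters are hypothetical names, and the `delta` format assumes the Delta Lake library is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper for illustration only; not part of this PR.
object PublishWriteSketch {
  private val supportedFormats = Set("parquet", "delta")

  def writePublish(df: DataFrame,
                   path: String,
                   format: String,
                   partitionColumns: Seq[String]): Unit = {
    require(supportedFormats.contains(format), s"Unsupported format: $format")

    // Fail early if a requested partition column is absent from the data.
    val missing = partitionColumns.filterNot(c => df.columns.contains(c))
    require(missing.isEmpty, s"Missing partition columns: ${missing.mkString(", ")}")

    val writer = df.write.format(format).mode(SaveMode.Overwrite)

    // Partition only when columns are configured.
    val partitioned =
      if (partitionColumns.nonEmpty) writer.partitionBy(partitionColumns: _*) else writer

    partitioned.save(path)
  }
}
```

For example, `PublishWriteSketch.writePublish(df, "/tmp/publish", "parquet", Seq("INFO_DATE"))` would produce one directory per distinct `INFO_DATE` value under the target path.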

@@ -245,7 +247,13 @@ abstract class TaskRunnerBase(conf: Config,
case None => runResult.data
}

val postProcessed = task.job.postProcessing(dfWithTimestamp, task.infoDate, conf)
val dfWithInfoDate = if (dfWithTimestamp.schema.exists(f => f.name.equals(task.job.outputTable.infoDateColumn))) {
Collaborator Author

This is the only change in the framework itself. The rest is Enceladus-specific.

@@ -336,10 +336,12 @@ class TaskRunnerBaseSuite extends AnyWordSpec with SparkTestBase with TextCompar
"""[ {
| "a" : "B",
| "b" : 2,
| "INFO_DATE" : "2022-02-18",
Collaborator Author

And this is the test for the new logic - the information date is now available after a job has run, even before it is saved.
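
The diff hunk above is truncated, so the exact branches are not visible here. A minimal sketch of the idea, assuming the information date column is added as a date literal only when the job output does not already contain it, might look like this. `InfoDateSketch` and `withInfoDate` are hypothetical names, not the framework's API.

```scala
import java.time.LocalDate
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// Hypothetical helper for illustration only; the actual change lives in TaskRunnerBase.
object InfoDateSketch {
  def withInfoDate(df: DataFrame, infoDateColumn: String, infoDate: LocalDate): DataFrame = {
    // Same (case-sensitive) schema check as in the diff.
    if (df.schema.exists(f => f.name.equals(infoDateColumn))) {
      // Assumption: an existing column is left untouched.
      df
    } else {
      // Assumption: the column is added as a constant date literal.
      df.withColumn(infoDateColumn, lit(java.sql.Date.valueOf(infoDate)))
    }
  }
}
```

With this shape, a post-processing step or column transformation that runs before the save can already refer to the information date column.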


github-actions bot commented Jun 14, 2023

Unit Test Coverage

| File | Coverage [84.72%] | 🍏 |
|:--|:--|:--|
| StandardizationConfig.scala | 95.91% | 🍏 |
| HiveFormat.scala | 90.36% | 🍏 |
| StandardizationSink.scala | 83.17% | 🍏 |
| TaskRunnerBase.scala | 83.1% | 🍏 |
| Total Project Coverage | 78.92% | 🍏 |

@yruslan marked this pull request as ready for review June 15, 2023 07:17
@yruslan requested a review from jirifilip as a code owner June 15, 2023 07:17
@yruslan enabled auto-merge (rebase) June 15, 2023 07:17
@yruslan merged commit 3776d85 into main Jun 15, 2023
@yruslan deleted the feature/delta-output-std-sink branch June 15, 2023 13:18
@yruslan mentioned this pull request Jun 19, 2023