Run Python transformations on Databricks #204

Merged: 12 commits merged on Jun 28, 2023


@jirifilip jirifilip commented Jun 6, 2023

A very bare-bones implementation of running Pramen-Py transformations on Databricks. We submit a one-time transient job using a REST API.

Not sure about:

  • The Responses object, which contains the case classes used to deserialize REST API responses. The attributes are in snake_case to simplify deserialization, but we could also expand them into POJOs with JSON annotations if required.

More possible features that could be implemented in another PR (this PR was getting really large):

  • Putting all configuration for the Pramen-Py command line under (pramen.py.cmd). Databricks-specific configuration is already under (pramen.py.databricks).
  • Specifying a job cluster per Python transformation (similar to how it is done now for the command-line runner).
  • Implementing a Runner class in Pramen-Py that could simply be imported into the runner script/notebook or run as a Python wheel task (instead of copy-pasting a big ugly script from the README).
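
For readers unfamiliar with one-time runs, here is a minimal sketch of what such a submission could look like. It is not the code from this PR: the object and method names are hypothetical, the payload is reduced to a single Python task on an existing cluster, and it assumes the Databricks Jobs API 2.1 `runs/submit` endpoint with bearer-token authentication. The request is only built here, not sent.

```scala
import java.net.URI
import java.net.http.HttpRequest

// Hypothetical helper for illustration only; not the PR's DatabricksClientImpl.
object SubmitRunSketch {
  // Build the JSON body for a one-time run with a single Python task.
  // Note: values are interpolated without JSON escaping, so this is only
  // safe for simple strings; a real client would use a JSON library.
  def runPayload(runName: String, clusterId: String, pythonFile: String): String =
    s"""{
       |  "run_name": "$runName",
       |  "tasks": [{
       |    "task_key": "pramen_py",
       |    "existing_cluster_id": "$clusterId",
       |    "spark_python_task": { "python_file": "$pythonFile" }
       |  }]
       |}""".stripMargin

  // Build (but do not send) the POST request that submits the run.
  def submitRequest(host: String, token: String, payload: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"$host/api/2.1/jobs/runs/submit"))
      .header("Authorization", s"Bearer $token")
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(payload))
      .build()
}
```

Sending the request with `java.net.http.HttpClient` should return a JSON body containing a `run_id`, which can then be polled for the run state. The workspace host, token, cluster ID, and script path are placeholders that would come from configuration.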

github-actions bot commented Jun 6, 2023

Unit Test Coverage

| File | Coverage [67.52%] |
| --- | --- |
| PramenPyJobTemplate.scala | 100% 🍏 |
| DatabricksClient.scala | 100% 🍏 |
| PramenPyCmdConfig.scala | 100% 🍏 |
| StringUtils.scala | 100% 🍏 |
| PythonTransformationJob.scala | 96.01% 🍏 |
| ConfigUtils.scala | 82.96% 🍏 |
| Responses.scala | 77.27% 🍏 |
| OperationSplitter.scala | 45.44% |
| DatabricksClientImpl.scala | 12.52% |

Total Project Coverage: 78.72% 🍏

@yruslan yruslan left a comment

Just a couple of comments. Since it is a draft, I wasn't too nitpicky.

@jirifilip jirifilip force-pushed the feature/run-python-transformation-on-databricks branch from 2a4cc7c to c1cd748 Compare June 21, 2023 07:46
@jirifilip jirifilip force-pushed the feature/run-python-transformation-on-databricks branch from 4196794 to 9bb65d5 Compare June 27, 2023 10:39
@jirifilip jirifilip marked this pull request as ready for review June 27, 2023 10:44
@jirifilip jirifilip changed the title Initial draft of how the Pramen-Py and Databricks integration could work Run Pramen-Py transformations on Databricks Jun 28, 2023
@jirifilip jirifilip changed the title Run Pramen-Py transformations on Databricks Run Python transformations on Databricks Jun 28, 2023
@jirifilip jirifilip requested a review from yruslan June 28, 2023 09:26

@yruslan yruslan left a comment

This is awesome! Great job!

}

private[databricks] def replaceVariablesInMap(map: Map[String, Any]): Map[String, Any] = {
// in Typesafe Config, keys can be set to null (this function will be mainly called on Maps created from

😮

jirifilip and others added 2 commits June 28, 2023 14:32
@jirifilip

Thanks a lot! It was a long journey 😄

@jirifilip jirifilip merged commit bcc8bee into main Jun 28, 2023
@jirifilip jirifilip deleted the feature/run-python-transformation-on-databricks branch June 28, 2023 12:42