Add support of delta table for bookkeeper #475

Closed
lukas-zeman-ABSA opened this issue Aug 21, 2024 · 6 comments · Fixed by #478
Labels
DS enhancement New feature or request Pramen-Scala

Comments

@lukas-zeman-ABSA

Add Delta table support for the bookkeeper. It could be used to maintain the metastore in Databricks.

bookkeepingConfig.bookkeepingHadoopFormat match {

@yruslan
Collaborator

yruslan commented Aug 21, 2024

We actually had such an implementation 😄. It was quite slow, so we removed it. But that was a couple of years ago, so maybe now is a good time to revive it.

@yruslan yruslan added enhancement New feature or request Pramen-Scala DS labels Aug 21, 2024
@yruslan
Collaborator

yruslan commented Aug 27, 2024

I found the classes for Delta and want to restore them in the next Pramen version. Currently, though, the implementation uses Delta paths, not tables, because it needs several different subpaths to store different data. Do you want Delta Lake table support, or is a path fine?

@lukas-zeman-ABSA
Author

Well, maybe we could make it work on Databricks with just a path, but saveAsTable would be much better. It would improve speed and also allow us to store this data in Databricks managed tables.

@yruslan
Collaborator

yruslan commented Aug 27, 2024

Got it, will add support for tables

@yruslan
Collaborator

yruslan commented Aug 27, 2024

I also want to clarify that Pramen is going to use several tables for bookkeeping. So when this is implemented, you will be able to specify the database and the table prefix in the Delta table configuration.

Something like:

pramen {
  bookkeeping.enabled = true
  bookkeeping.delta.database = "my_db"
  bookkeeping.delta.table.prefix = "bk_"
}

Let me know if this is okay for you.

@lukas-zeman-ABSA
Author

Checked the implementation. Yes, this would work totally fine, thanks. Theoretically, "database" here means "catalog.schema", but it will work :)
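
For readers following the thread, here is a minimal sketch of how a database and a table prefix could combine into full table names, including the "catalog.schema" case mentioned above. This is not Pramen's actual implementation: the `DeltaTableNaming` class, its `fullName` method, and the base table names `bookkeeping` and `schemas` are all hypothetical and only illustrate the idea behind `bookkeeping.delta.database` and `bookkeeping.delta.table.prefix`.

```scala
// Hypothetical helper for illustration only; not part of Pramen.
final case class DeltaTableNaming(database: String, tablePrefix: String) {
  /** Builds the full name of one bookkeeping table from its base name. */
  def fullName(baseName: String): String = {
    val table = s"$tablePrefix$baseName"
    // An empty database means the table is resolved in the current database.
    if (database.trim.isEmpty) table else s"${database.trim}.$table"
  }
}

object DeltaTableNamingExample {
  def main(args: Array[String]): Unit = {
    // Base table names are made up; the real set of tables may differ.
    val baseNames = Seq("bookkeeping", "schemas")

    // Values taken from the config example in this thread.
    val plain = DeltaTableNaming("my_db", "bk_")
    baseNames.map(plain.fullName).foreach(println)

    // On Databricks the "database" value can carry "catalog.schema".
    val qualified = DeltaTableNaming("my_catalog.my_schema", "bk_")
    baseNames.map(qualified.fullName).foreach(println)
  }
}
```

With the values from the config example, the first naming yields `my_db.bk_bookkeeping` and `my_db.bk_schemas`. Passing a two-part value such as `my_catalog.my_schema` simply produces three-part names, which is why a single `database` setting can cover the Databricks case.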
