Added deployment state for bundles #1267
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##             main    #1267      +/-   ##
==========================================
- Coverage   52.54%   52.50%   -0.05%
==========================================
  Files         308      321      +13
  Lines       17614    18300     +686
==========================================
+ Hits         9256     9609     +353
- Misses       7664     7967     +303
- Partials      694      724      +30
Integration tests passed
if b.Config.Workspace.CurrentUser != nil {
    opts.CurrentUser = b.Config.Workspace.CurrentUser.User
}
Curious: when is this nil?
Good point. I don't think it realistically can be; I added the check only because, based on its type, the value can potentially be nil.
Thanks! This one is slightly tricky to get right.
bundle/deploy/state_pull.go
Outdated
// Create a new snapshot based on the deployment state file.
log.Infof(ctx, "Creating new snapshot")
snapshotState, err := sync.NewSnapshotState(state.Files.ToSlice(b.Config.Path))
We could instead merge the deployment state with the existing snapshot state here, so that any files deleted locally (present in the deployment state but not in the local sync snapshot state) are still accounted for and incremental sync keeps working, as alluded to above.
We can't really keep the local snapshot, as @pietern pointed out earlier, for cases where someone else has made a deployment with new files. If we keep the local snapshot, we lose track of those files because we don't sync them from the workspace to the local system. So when someone deploys in this situation, the remote files stay lingering there. If we take the deployment state as the source of truth instead, they are removed, which is in line with the local state of files when the deployment was made.
Instead, we'd better just do a full sync, which makes reconciliation a bit more straightforward.
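The reasoning above can be illustrated with a minimal sketch. It assumes the deployment state boils down to a list of remote paths and that local files are known by the same relative paths; `filesToDelete` is a hypothetical helper written for this example, not a function from the CLI.

```go
package main

import (
	"fmt"
	"sort"
)

// filesToDelete returns the paths recorded in the deployment state that no
// longer exist locally. Hypothetical helper, for illustration only.
func filesToDelete(deployed []string, local []string) []string {
	localSet := make(map[string]struct{}, len(local))
	for _, p := range local {
		localSet[p] = struct{}{}
	}

	var stale []string
	for _, p := range deployed {
		if _, ok := localSet[p]; !ok {
			stale = append(stale, p)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Someone else deployed "extra.py"; it was never synced to this machine,
	// so only the deployment state knows about it.
	deployed := []string{"main.py", "extra.py"}
	local := []string{"main.py"}

	fmt.Println(filesToDelete(deployed, local))
}
```

With a purely local snapshot, `extra.py` would never appear in the comparison and would linger in the workspace; comparing against the deployment state surfaces it as a deletion.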
@@ -49,7 +48,8 @@ func NewSnapshotState(localFiles []fileset.File) (*SnapshotState, error) {
     for _, f := range localFiles {
         // Compute the remote name the file will have in WSFS
         remoteName := filepath.ToSlash(f.Relative)
-        isNotebook, _, err := notebook.Detect(f.Absolute)
+        isNotebook, err := f.IsNotebook()
If the underlying file does not exist on the local file system, and we're restoring a remote snapshot, then the file should still be included in the snapshot such that a new sync will issue a delete for the notebook.
Yes, indeed, that's what's happening right now (confirmed with test coverage)
Ah, I understand now; this works because whether the file is a notebook or not is cached in the underlying object. The lines are blurred w.r.t. responsibilities, between what happens in the bundle code, what happens in `fileset`, and what happens here. I don't have a concrete proposal for a different approach, but this feels a bit scattered. Thoughts?
I'd argue that the way things are now is quite reasonable:
- within `snapshot_state` scope, if `IsNotebook` fails for whatever reason, we can't proceed further
- within `file` scope, we keep an internal value of `isNotebook` but provide public functions to create files of specific types, like `NewNotebookFile`
- within `deployment state` scope, we know the type of the file we work with and thus can instantiate either `NewNotebookFile` or `NewSourceFile`

I guess the confusion comes from knowing that previously we always had to make calls to the FS, and now that's not the case anymore.
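The split of responsibilities described above might look roughly like the sketch below. The names `NewNotebookFile`, `NewSourceFile`, and `IsNotebook` come from the discussion, but the struct layout, the signatures, and the behavior for an unknown type are assumptions made for illustration; the real `fileset` code may differ (for instance, it may fall back to detection on disk).

```go
package main

import (
	"errors"
	"fmt"
)

type fileType int

const (
	typeUnknown fileType = iota
	typeNotebook
	typeSource
)

// File is a simplified stand-in for a fileset entry. The type is cached so
// that callers do not need to touch the file system to learn it.
type File struct {
	Relative string
	kind     fileType
}

// NewNotebookFile creates a file that is already known to be a notebook.
func NewNotebookFile(relative string) File {
	return File{Relative: relative, kind: typeNotebook}
}

// NewSourceFile creates a file that is already known to be a regular file.
func NewSourceFile(relative string) File {
	return File{Relative: relative, kind: typeSource}
}

// IsNotebook reports the cached type. In this sketch an unknown type is an
// error; a real implementation could detect the type on disk instead.
func (f File) IsNotebook() (bool, error) {
	if f.kind == typeUnknown {
		return false, errors.New("file type unknown for " + f.Relative)
	}
	return f.kind == typeNotebook, nil
}

func main() {
	// Restoring from deployment state: the type is known, the local file
	// does not have to exist.
	for _, f := range []File{NewNotebookFile("etl.py"), NewSourceFile("util.py")} {
		isNotebook, err := f.IsNotebook()
		fmt.Println(f.Relative, isNotebook, err)
	}
}
```

Because the type is set at construction time, a file that exists only in the remote deployment state can still be placed in a snapshot and later deleted correctly.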
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
How are the integration tests looking?
bundle/phases/deploy.go
Outdated
@@ -41,6 +43,7 @@ func Deploy() bundle.Mutator {
     terraform.Load(),
     metadata.Compute(),
     metadata.Upload(),
+    deploy.StatePush(),
Can move right after the TF state push. We could also parallelize them.
Yeah, I think as a general improvement it would be nice to introduce `bundle.Parallel` and use it in various phases.
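To make the suggestion concrete, here is a rough sketch of what such a helper could look like. `bundle.Parallel` does not exist in this PR; the `Step` type below is a simplified, hypothetical stand-in for a mutator (no context or bundle argument), and the error handling via `errors.Join` is just one possible choice.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Step is a hypothetical, simplified stand-in for a bundle mutator.
type Step func() error

// Parallel returns a Step that runs all given steps concurrently, waits for
// all of them, and joins their errors (nil if every step succeeded).
func Parallel(steps ...Step) Step {
	return func() error {
		errs := make([]error, len(steps))
		var wg sync.WaitGroup
		for i, s := range steps {
			wg.Add(1)
			go func(i int, s Step) {
				defer wg.Done()
				// Each goroutine writes to its own slot, so no lock is needed.
				errs[i] = s()
			}(i, s)
		}
		wg.Wait()
		return errors.Join(errs...)
	}
}

func main() {
	terraformStatePush := func() error { return nil }
	deploymentStatePush := func() error { return errors.New("deployment state push failed") }

	err := Parallel(terraformStatePush, deploymentStatePush)()
	fmt.Println("error:", err)
}
```

Running the steps to completion and joining the errors keeps one failed push from hiding another; a variant that cancels the remaining steps on first failure would be an equally valid design.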
All green ✅
CLI:
 * Propagate correct `User-Agent` for CLI during OAuth flow ([#1264](#1264)).
 * Add usage string when command fails with incorrect arguments ([#1276](#1276)).

Bundles:
 * Include `dyn.Path` as argument to the visit callback function ([#1260](#1260)).
 * Inline logic to set a value in `dyn.SetByPath` ([#1261](#1261)).
 * Add assertions for the `dyn.Path` argument to the visit callback ([#1265](#1265)).
 * Add `dyn.MapByPattern` to map a function to values with matching paths ([#1266](#1266)).
 * Filter current user from resource permissions ([#1262](#1262)).
 * Retain location annotation when expanding globs for pipeline libraries ([#1274](#1274)).
 * Added deployment state for bundles ([#1267](#1267)).
 * Do CheckRunningResource only after terraform.Write ([#1292](#1292)).
 * Rewrite relative paths using `dyn.Location` of the underlying value ([#1273](#1273)).
 * Push deployment state right after files upload ([#1293](#1293)).
 * Make `Append` function to `dyn.Path` return independent slice ([#1295](#1295)).
 * Move bundle tests into bundle/tests ([#1299](#1299)).
 * Upgrade Terraform provider to 1.38.0 ([#1308](#1308)).

Internal:
 * Add integration test for mlops-stacks initialization ([#1155](#1155)).
 * Update actions/setup-python to v5 ([#1290](#1290)).
 * Update codecov/codecov-action to v4 ([#1291](#1291)).

API Changes:
 * Changed `databricks catalogs list` command.
 * Changed `databricks online-tables create` command.
 * Changed `databricks lakeview publish` command.
 * Added `databricks lakeview create` command.
 * Added `databricks lakeview get` command.
 * Added `databricks lakeview get-published` command.
 * Added `databricks lakeview trash` command.
 * Added `databricks lakeview update` command.
 * Moved settings related commands to `databricks settings` and `databricks account settings`.

OpenAPI commit 93763b0d7ae908520c229c786fff28b8fd623261 (2024-03-20)

Dependency updates:
 * Bump golang.org/x/oauth2 from 0.17.0 to 0.18.0 ([#1270](#1270)).
 * Bump golang.org/x/mod from 0.15.0 to 0.16.0 ([#1271](#1271)).
 * Update Go SDK to v0.35.0 ([#1300](#1300)).
 * Update Go SDK to v0.36.0 ([#1304](#1304)).
Changes
This PR introduces a new structure (and a file) that is used locally and synced remotely to the Databricks workspace to track bundle deployment related metadata.
The state is pulled from the remote, updated, and pushed back as part of the `bundle deploy` command.
This state can be used for deployment sequencing, as its `Version` field is monotonically increasing on each deployment.
Currently, it only tracks files being synced as part of the deployment.
This helps fix the issue with files not being removed during deployments on CI/CD, as the sync snapshot was never present there.
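As a rough illustration of the pull, update, push cycle described above, the sketch below models the state as a small JSON document. Only the `Version` field and the fact that files are tracked come from this description; the struct name, the JSON keys, the `nextState` helper, and the choice to start at version 1 are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeploymentState is a hypothetical, simplified shape of the state file.
type DeploymentState struct {
	Version int64    `json:"version"`
	Files   []string `json:"files"`
}

// nextState builds the state to push, given the state pulled from the remote
// (nil when no deployment has happened yet) and the files being deployed.
func nextState(remote *DeploymentState, files []string) DeploymentState {
	next := DeploymentState{Version: 1, Files: files}
	if remote != nil {
		// Each deployment bumps the version, so versions order deployments.
		next.Version = remote.Version + 1
	}
	return next
}

func main() {
	first := nextState(nil, []string{"main.py", "old.py"})
	second := nextState(&first, []string{"main.py"})

	out, err := json.Marshal(second)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Since the previous state lists `old.py` and the new one does not, a CI/CD run with no local sync snapshot can still tell that the file has to be removed from the workspace.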
Fixes #943
Tests
Added E2E (regression) test for files removal on CI/CD