Added deployment state for bundles #1267

Merged: 9 commits merged on Mar 18, 2024

@andrewnester andrewnester commented Mar 8, 2024

Changes

This PR introduces a new structure (and a file) that is used locally and synced to the Databricks workspace to track bundle deployment metadata.

The state is pulled from the remote, updated, and pushed back as part of the bundle deploy command.

This state can be used for deployment sequencing because its Version field increases monotonically with each deployment.

Currently, it only tracks the files synced as part of the deployment.

This helps fix the issue of files not being removed during deployments on CI/CD, where the sync snapshot was never present.

Fixes #943
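
For readers skimming the description, here is a minimal sketch of the pull, update, push cycle. It is not the code from this PR: the `DeploymentState` shape, its JSON field names, and the `NextState` helper are assumptions made for illustration; only the `Version` field and the tracking of synced files come from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeploymentState is a hypothetical shape for the state file. Only the
// Version field and the list of synced files are mentioned in the PR text.
type DeploymentState struct {
	// Version increases by one on every deployment.
	Version int64 `json:"version"`
	// Files lists the paths synced as part of the deployment.
	Files []string `json:"files"`
}

// NextState builds the state to push after a deployment: the version of the
// pulled (remote) state plus one, and the files synced by this deployment.
func NextState(pulled DeploymentState, syncedFiles []string) DeploymentState {
	return DeploymentState{
		Version: pulled.Version + 1,
		Files:   append([]string(nil), syncedFiles...),
	}
}

func main() {
	// Pretend this was pulled from the workspace.
	pulled := DeploymentState{Version: 3, Files: []string{"a.py", "old.py"}}

	next := NextState(pulled, []string{"a.py", "b.py"})

	out, err := json.Marshal(next)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"version":4,"files":["a.py","b.py"]}
}
```

Because the version only ever moves forward, two states can be ordered by comparing their `Version` values.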

Tests

Added E2E (regression) test for files removal on CI/CD


codecov-commenter commented Mar 8, 2024

Codecov Report

Attention: Patch coverage is 64.51613%, with 121 lines in your changes missing coverage. Please review.

Project coverage is 52.50%. Comparing base (29ab96f) to head (e5538a1).
Report is 19 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| bundle/deploy/state_pull.go | 64.94% | 24 Missing and 10 partials ⚠️ |
| bundle/deploy/state_update.go | 56.00% | 23 Missing and 10 partials ⚠️ |
| bundle/deploy/state.go | 74.69% | 14 Missing and 7 partials ⚠️ |
| bundle/deploy/state_push.go | 44.00% | 10 Missing and 4 partials ⚠️ |
| internal/bundle/helpers.go | 0.00% | 6 Missing ⚠️ |
| cmd/bundle/sync.go | 0.00% | 4 Missing ⚠️ |
| libs/fileset/file.go | 91.42% | 2 Missing and 1 partial ⚠️ |
| bundle/deploy/filer.go | 0.00% | 2 Missing ⚠️ |
| bundle/deploy/terraform/state_pull.go | 50.00% | 1 Missing ⚠️ |
| bundle/deploy/terraform/state_push.go | 50.00% | 1 Missing ⚠️ |

... and 2 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1267      +/-   ##
==========================================
- Coverage   52.54%   52.50%   -0.05%     
==========================================
  Files         308      321      +13     
  Lines       17614    18300     +686     
==========================================
+ Hits         9256     9609     +353     
- Misses       7664     7967     +303     
- Partials      694      724      +30     


@andrewnester andrewnester requested a review from pietern March 13, 2024 13:12
@andrewnester
Contributor Author

Integration tests passed


if b.Config.Workspace.CurrentUser != nil {
	opts.CurrentUser = b.Config.Workspace.CurrentUser.User
}
Contributor

Curious: when is this nil?

Contributor Author

Good point. I don't think it can realistically be nil; I added the check based on the type, which can potentially be nil.

Contributor

@shreyas-goenka shreyas-goenka left a comment

Thanks! This one is slightly tricky to get right.

// Create a new snapshot based on the deployment state file.
log.Infof(ctx, "Creating new snapshot")
snapshotState, err := sync.NewSnapshotState(state.Files.ToSlice(b.Config.Path))
Contributor

We can instead merge the deployment state with the existing snapshot state here, so that any files deleted locally (present in the deployment state but not in the local sync snapshot state) are still tracked and incremental sync keeps working (as alluded to above).

Contributor Author

We can't really keep the local snapshot, as @pietern pointed out earlier, for cases where someone else has made a deployment with new files. If we keep the local snapshot, we lose track of those files because we don't sync them from the workspace to the local system. So if someone deploys in this case, the remote files will linger there. But if we take the deployment state as the source of truth, they will be removed, which is in line with the local state of files when the deployment was made.

Instead, we'd better just do a full sync, which makes reconciliation a bit more straightforward.
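
To make the argument concrete, here is a small sketch of what "deployment state as source of truth" implies for deletions. It is an illustration under assumptions, not the PR's implementation: `FilesToDelete` is a hypothetical helper, and paths are plain strings rather than the CLI's file types.

```go
package main

import (
	"fmt"
	"sort"
)

// FilesToDelete returns the paths recorded in the deployment state that are
// no longer present locally. With the deployment state as the source of
// truth, these are the remote files a full sync would remove, even if they
// were uploaded by someone else's deployment.
func FilesToDelete(stateFiles, localFiles []string) []string {
	local := make(map[string]struct{}, len(localFiles))
	for _, f := range localFiles {
		local[f] = struct{}{}
	}

	var stale []string
	for _, f := range stateFiles {
		if _, ok := local[f]; !ok {
			stale = append(stale, f)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// The state was written by a colleague's deployment and includes extra.py,
	// which never existed in this checkout or its local snapshot.
	state := []string{"main.py", "extra.py", "util.py"}
	local := []string{"main.py", "util.py"}

	fmt.Println(FilesToDelete(state, local)) // [extra.py]
}
```

A local snapshot alone would never list `extra.py`, so it could not produce this delete.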

@@ -49,7 +48,8 @@ func NewSnapshotState(localFiles []fileset.File) (*SnapshotState, error) {
 	for _, f := range localFiles {
 		// Compute the remote name the file will have in WSFS
 		remoteName := filepath.ToSlash(f.Relative)
-		isNotebook, _, err := notebook.Detect(f.Absolute)
+		isNotebook, err := f.IsNotebook()

Contributor

If the underlying file does not exist on the local file system, and we're restoring a remote snapshot, then the file should still be included in the snapshot such that a new sync will issue a delete for the notebook.

Contributor Author

Yes, indeed, that's what's happening right now (confirmed with test coverage)

Contributor

Ah, I understand now; this works because whether the file is a notebook or not is cached in the underlying object. The lines are blurred w.r.t. responsibilities, between what happens in the bundle code, what happens in fileset, and what happens here. I don't have a concrete proposal for a different approach, but this feels a bit scattered. Thoughts?

Contributor Author

I'd argue that the way things are now is quite reasonable:

  1. within the snapshot_state scope, if IsNotebook fails for whatever reason, we can't proceed further
  2. within the file scope, we keep an internal isNotebook value but provide public functions to create files of specific types, like NewNotebookFile
  3. within the deployment state scope, we know the type of the file we're working with and can thus instantiate either NewNotebookFile or NewSourceFile

I guess the confusion comes from knowing that previously we always had to make calls to the file system, and that's no longer the case.
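
A rough sketch of the three points above, for anyone following the thread. It is not the `libs/fileset` code: the `File` struct, the `fileType` enum, and `NewFile` are assumptions, and the header check is a simplified stand-in for the CLI's `notebook.Detect`. Only the names `IsNotebook`, `NewNotebookFile`, and `NewSourceFile` come from this discussion.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

type fileType int

const (
	unknown fileType = iota // type not determined yet
	notebook
	source
)

// File caches its type so the file system is consulted at most once.
type File struct {
	Path string
	kind fileType
}

// NewFile creates a file whose type is detected lazily (hypothetical name).
func NewFile(path string) File { return File{Path: path} }

// NewNotebookFile and NewSourceFile create files of a known type, e.g. when
// restoring entries from the deployment state.
func NewNotebookFile(path string) File { return File{Path: path, kind: notebook} }
func NewSourceFile(path string) File   { return File{Path: path, kind: source} }

// IsNotebook returns the cached type if known; otherwise it reads the file.
// The header check below is a simplification, not the real detection logic.
func (f *File) IsNotebook() (bool, error) {
	if f.kind != unknown {
		return f.kind == notebook, nil
	}
	data, err := os.ReadFile(f.Path)
	if err != nil {
		return false, err
	}
	if strings.HasPrefix(string(data), "# Databricks notebook source") {
		f.kind = notebook
	} else {
		f.kind = source
	}
	return f.kind == notebook, nil
}

func main() {
	// Restored from deployment state: the file no longer exists locally,
	// yet its type is known, so no file system call is needed.
	restored := NewNotebookFile("deleted_locally.py")
	isNotebook, err := restored.IsNotebook()
	fmt.Println(isNotebook, err) // true <nil>

	// Unknown type and missing file: detection fails and the caller must stop.
	missing := NewFile("does_not_exist.py")
	_, err = missing.IsNotebook()
	fmt.Println(err != nil) // true
}
```

This is why a snapshot built from the deployment state can still include a notebook that was deleted locally, so the next sync issues a delete for it.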

andrewnester and others added 2 commits March 18, 2024 11:10
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
@andrewnester andrewnester requested a review from pietern March 18, 2024 10:55
Contributor

@pietern pietern left a comment

How are the integration tests looking?

@@ -41,6 +43,7 @@ func Deploy() bundle.Mutator {
 	terraform.Load(),
 	metadata.Compute(),
 	metadata.Upload(),
+	deploy.StatePush(),
Contributor

Can move right after the TF state push. We could also parallelize them.

Contributor Author

Yeah, I think as a general improvement it would be nice to introduce bundle.Parallel and use it in various phases

@andrewnester
Contributor Author

> How are the integration tests looking?

All green ✅

@andrewnester andrewnester added this pull request to the merge queue Mar 18, 2024
Merged via the queue into main with commit 1b0ac61 Mar 18, 2024
4 checks passed
@andrewnester andrewnester deleted the deloyment-state branch March 18, 2024 14:48
pietern added a commit that referenced this pull request Mar 25, 2024
CLI:
 * Propagate correct `User-Agent` for CLI during OAuth flow ([#1264](#1264)).
 * Add usage string when command fails with incorrect arguments ([#1276](#1276)).

Bundles:
 * Include `dyn.Path` as argument to the visit callback function ([#1260](#1260)).
 * Inline logic to set a value in `dyn.SetByPath` ([#1261](#1261)).
 * Add assertions for the `dyn.Path` argument to the visit callback ([#1265](#1265)).
 * Add `dyn.MapByPattern` to map a function to values with matching paths ([#1266](#1266)).
 * Filter current user from resource permissions ([#1262](#1262)).
 * Retain location annotation when expanding globs for pipeline libraries ([#1274](#1274)).
 * Added deployment state for bundles ([#1267](#1267)).
 * Do CheckRunningResource only after terraform.Write ([#1292](#1292)).
 * Rewrite relative paths using `dyn.Location` of the underlying value ([#1273](#1273)).
 * Push deployment state right after files upload ([#1293](#1293)).
 * Make `Append` function to `dyn.Path` return independent slice ([#1295](#1295)).
 * Move bundle tests into bundle/tests ([#1299](#1299)).
 * Upgrade Terraform provider to 1.38.0 ([#1308](#1308)).

Internal:
 * Add integration test for mlops-stacks initialization ([#1155](#1155)).
 * Update actions/setup-python to v5 ([#1290](#1290)).
 * Update codecov/codecov-action to v4 ([#1291](#1291)).

API Changes:
 * Changed `databricks catalogs list` command.
 * Changed `databricks online-tables create` command.
 * Changed `databricks lakeview publish` command.
 * Added `databricks lakeview create` command.
 * Added `databricks lakeview get` command.
 * Added `databricks lakeview get-published` command.
 * Added `databricks lakeview trash` command.
 * Added `databricks lakeview update` command.
 * Moved settings related commands to `databricks settings` and `databricks account settings`.

OpenAPI commit 93763b0d7ae908520c229c786fff28b8fd623261 (2024-03-20)

Dependency updates:
 * Bump golang.org/x/oauth2 from 0.17.0 to 0.18.0 ([#1270](#1270)).
 * Bump golang.org/x/mod from 0.15.0 to 0.16.0 ([#1271](#1271)).
 * Update Go SDK to v0.35.0 ([#1300](#1300)).
 * Update Go SDK to v0.36.0 ([#1304](#1304)).
@pietern pietern mentioned this pull request Mar 25, 2024
github-merge-queue bot pushed a commit that referenced this pull request Mar 25, 2024
Linked issue: workspace files are not removed when renamed or moved
4 participants