Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filer.Filer to read notebooks from WSFS without omitting their extension #1457

Merged
merged 30 commits into from
May 30, 2024

Conversation

shreyas-goenka
Copy link
Contributor

@shreyas-goenka shreyas-goenka commented May 27, 2024

Changes

This PR adds a filer that'll allow us to read notebooks from the WSFS using their full paths (with the extension included). The filer relies on the existing workspace filer (and consequently the workspace import/export/list APIs).

Using this filer along with a virtual filesystem layer (https://github.com/databricks/cli/pull/1452/files) will allow us to use our custom implementation (which preserves the notebook extensions) rather than the default mount available via DBR when the CLI is run from DBR.

Tests

Integration tests.

@shreyas-goenka shreyas-goenka marked this pull request as ready for review May 28, 2024 12:46
@shreyas-goenka shreyas-goenka changed the title Add filer for the workspace FUSE mount Add filer to emulate a workspace filesystem FUSE May 28, 2024
@shreyas-goenka shreyas-goenka changed the title Add filer to emulate a workspace filesystem FUSE Add filer.Filer to emulate a workspace filesystem FUSE May 28, 2024
internal/filer_test.go Outdated Show resolved Hide resolved
internal/filer_test.go Outdated Show resolved Hide resolved
internal/filer_test.go Outdated Show resolved Hide resolved
internal/filer_test.go Outdated Show resolved Hide resolved
libs/filer/workspace_fuse_client.go Outdated Show resolved Hide resolved
libs/filer/workspace_fuse_client.go Outdated Show resolved Hide resolved
libs/filer/workspace_fuse_client.go Outdated Show resolved Hide resolved
libs/filer/workspace_fuse_client.go Outdated Show resolved Hide resolved
libs/filer/workspace_fuse_client.go Outdated Show resolved Hide resolved
libs/filer/workspace_fuse_client.go Outdated Show resolved Hide resolved
@shreyas-goenka shreyas-goenka changed the title Add filer.Filer to emulate a workspace filesystem FUSE Add a filer to read files and notebooks with their extensions included May 29, 2024
@shreyas-goenka shreyas-goenka changed the title Add a filer to read files and notebooks with their extensions included Add filer.Filer to read notebooks from WSFS without omitting their extension May 29, 2024
libs/filer/workspace_files_extensions_client.go Outdated Show resolved Hide resolved
}

// Not the correct language. Return early.
if stat.Language != extensionsToLanguages[ext] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, how can this happen?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say we're trying to stat foo.scala. Without extension this is foo. It can be a Python notebook and the language returned in the stat is PYTHON. Then this check will return false, because we can't (and shouldn't) access foo.scala if it is a Python notebook, only if it is a Scala notebook.

libs/filer/workspace_files_extensions_client.go Outdated Show resolved Hide resolved
libs/filer/workspace_files_extensions_client.go Outdated Show resolved Hide resolved
libs/filer/workspace_files_extensions_client.go Outdated Show resolved Hide resolved
@pietern pietern requested a review from andrewnester May 30, 2024 11:52
@pietern pietern enabled auto-merge May 30, 2024 11:57
@pietern pietern added this pull request to the merge queue May 30, 2024
Merged via the queue into main with commit ec33a7c May 30, 2024
5 checks passed
@pietern pietern deleted the wsfs-filer branch May 30, 2024 12:06
pietern added a commit that referenced this pull request Jun 4, 2024
CLI:
 * Update OpenAPI spec ([#1466](#1466)).

Bundles:
 * Upgrade TF provider to 1.46.0 ([#1460](#1460)).
 * Add support for Lakehouse monitoring ([#1307](#1307)).
 * Make dbt-sql and default-sql templates public ([#1463](#1463)).

Internal:
 * Abstract over filesystem interaction with libs/vfs ([#1452](#1452)).
 * Add `filer.Filer` to read notebooks from WSFS without omitting their extension ([#1457](#1457)).
 * Fix listing notebooks in a subdirectory ([#1468](#1468)).

API Changes:
 * Changed `databricks account storage-credentials list` command to return .
 * Added `databricks consumer-listings batch-get` command.
 * Added `databricks consumer-providers batch-get` command.
 * Removed `databricks apps create-deployment` command.
 * Added `databricks apps deploy` command.

OpenAPI commit 37b925eba37dfb3d7e05b6ba2d458454ce62d3a0 (2024-06-03)

Dependency updates:
 * Bump github.com/hashicorp/go-version from 1.6.0 to 1.7.0 ([#1454](#1454)).
 * Bump github.com/hashicorp/hc-install from 0.6.4 to 0.7.0 ([#1453](#1453)).
@pietern pietern mentioned this pull request Jun 4, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jun 4, 2024
CLI:
* Update OpenAPI spec
([#1466](#1466)).

Bundles:
* Upgrade TF provider to 1.46.0
([#1460](#1460)).
* Add support for Lakehouse monitoring
([#1307](#1307)).
* Make dbt-sql and default-sql templates public
([#1463](#1463)).

Internal:
* Abstract over filesystem interaction with libs/vfs
([#1452](#1452)).
* Add `filer.Filer` to read notebooks from WSFS without omitting their
extension ([#1457](#1457)).
* Fix listing notebooks in a subdirectory
([#1468](#1468)).

API Changes:
* Changed `databricks account storage-credentials list` command to
return .
 * Added `databricks consumer-listings batch-get` command.
 * Added `databricks consumer-providers batch-get` command.
 * Removed `databricks apps create-deployment` command.
 * Added `databricks apps deploy` command.

OpenAPI commit 37b925eba37dfb3d7e05b6ba2d458454ce62d3a0 (2024-06-03)

Dependency updates:
* Bump github.com/hashicorp/go-version from 1.6.0 to 1.7.0
([#1454](#1454)).
* Bump github.com/hashicorp/hc-install from 0.6.4 to 0.7.0
([#1453](#1453)).
github-merge-queue bot pushed a commit that referenced this pull request Jul 3, 2024
)

## Changes

The FUSE mount of the workspace file system on DBR doesn't include file
extensions for notebooks. When these notebooks are checked into a
repository, they do have an extension. PR #1457 added a filer type that
is aware of this disparity and makes these notebooks show up as if they
do have these extensions.

This change swaps out the native `vfs.Path` with one that uses this
filer when running on DBR.

Follow up: consolidate between interfaces exported by `filer.Filer` and
`vfs.Path`.

## Tests

* Unit tests pass
* (Manually ran a snapshot build on DBR against a bundle with notebooks)

---------

Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants