Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python wheel based tasks no longer deployed correctly #783

Closed
stawo opened this issue Sep 15, 2023 · 6 comments · Fixed by #797
Closed

Python wheel based tasks no longer deployed correctly #783

stawo opened this issue Sep 15, 2023 · 6 comments · Fixed by #797
Assignees
Labels
DABs DABs related issues

Comments

@stawo
Copy link

stawo commented Sep 15, 2023

I'm using v0.205.0 to deploy python wheel based tasks over a shared cluster that uses DBR 13.3, but they are deployed as Notebook tasks, which is not the expected result.
I suspect that the issue is the PR #635, which overwrites silently the specified settings to create a Notebook that calls the wheel file.
This also causes the big issue that logs don't properly work anymore (I can't see any output from the print statements as before) and the specified arguments are not visible in the task web UI.

I personally would like such heavy modification to be left as an option for those cases that, as you mention in the PR, wheel deployments don't work, as the deployment differs significantly from what I specify in the bundle and that makes it not really dependable and opaque.

@shreyas-goenka shreyas-goenka added the DABs DABs related issues label Sep 18, 2023
@stawo
Copy link
Author

stawo commented Sep 19, 2023

Another breaking behavior is the fact that if the task specifies a list of dependent libraries, only wheel are translated into the Jupyter Notebook, while all the others (Pypi, Maven, etc.) are simply ignored.
The behavior fails in 2 points:

  • the generated Notebook will contain many "empty" pip commands like %pip install --force-reinstall, one for each non-wheel library.
    These commands fail with the error:
    CalledProcessError: Command 'pip --disable-pip-version-check install --force-reinstall' returned non-zero exit status 1.
    
  • the newly created Notebook task does not contain the dependent non-wheel libraries defined in the original Python wheel task in the bundle

@andrewnester
Copy link
Contributor

Hi @stawo ! Sorry to hear that you have issues with python wheel tasks and thanks for raising this. We're aware of the problem and actively figuring out what would be the best way forward. We'll keep this thread updated.

@stawo
Copy link
Author

stawo commented Sep 19, 2023

@andrewnester , super, thank you for the quick reply.
At the moment I had to downgrade the Databricks CLI, but please let me know if there is anything I can help you with, especially testing as I'm developing Python-wheel pipelines in my company.

@Paul-AUB
Copy link

I also have the same issue, and would be interested in a fix.

In my case, I specify a dependency to typing-extensions==4.7.1 in my wheel (in pyproject.toml), but it seems the notebook script is not able to uninstall the existing typing-extensions==4.3.0 and install the wanted version, which result to the following error here:

ImportError                               Traceback (most recent call last)
File <command-1412211438096362>, line 27
     25 with redirect_stdout(f):
     26 	if entry:
---> 27 		entry[0].load()()
     28 	else:
     29 		raise ImportError("Entry point 'train' not found")

Error is :

ImportError: cannot import name 'deprecated' from 'typing_extensions' (/databricks/python/lib/python3.10/site-packages/typing_extensions.py)

github-merge-queue bot pushed a commit that referenced this issue Sep 26, 2023
## Changes
Instead of always using notebook wrapper for Python wheel tasks, let's
make this an opt-in option.

Now by default Python wheel tasks will be deployed as is to Databricks
platform.
If notebook wrapper required (DBR < 13.1 or other configuration
differences), users can provide a following experimental setting

```
experimental:
  python_wheel_wrapper: true
```

Fixes #783,
databricks/databricks-asset-bundles-dais2023#8

## Tests
Added unit tests.

Integration tests passed for both cases

```
    helpers.go:163: [databricks stdout]: Hello from my func
    helpers.go:163: [databricks stdout]: Got arguments:
    helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two']
    ...
Bundle remote directory is ***/.bundle/ac05d5e8-ed4b-4e34-b3f2-afa73f62b021
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithWrapper3733431114/001/.databricks/bundle/default/sync-snapshots/cac1e02f3941a97b.json
Successfully deleted files!
--- PASS: TestAccPythonWheelTaskDeployAndRunWithWrapper (214.18s)
PASS
coverage: 93.5% of statements in ./...
ok      github.com/databricks/cli/internal/bundle       214.495s        coverage: 93.5% of statements in ./...

```

```
    helpers.go:163: [databricks stdout]: Hello from my func
    helpers.go:163: [databricks stdout]: Got arguments:
    helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two']
    ...
Bundle remote directory is ***/.bundle/0ef67aaf-5960-4049-bf1d-dc9e29157421
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithoutWrapper2340216760/001/.databricks/bundle/default/sync-snapshots/edf0b322cee93b13.json
Successfully deleted files!
--- PASS: TestAccPythonWheelTaskDeployAndRunWithoutWrapper (192.36s)
PASS
coverage: 93.5% of statements in ./...
ok      github.com/databricks/cli/internal/bundle       195.130s        coverage: 93.5% of statements in ./...

```
@andrewnester
Copy link
Contributor

@stawo @Paul-AUB The fix to make a notebook wrapper causing these issues optional is merged and will be released in the upcoming CLI release this week. In case you need a notebook wrapper (see details here #635) you can switch it on with this setting in your bundle configuration

experimental:
  python_wheel_wrapper: true

@andrewnester
Copy link
Contributor

The fix was just released in version 0.206.0, feel free to give it a try

hectorcast-db pushed a commit that referenced this issue Oct 13, 2023
## Changes
Instead of always using notebook wrapper for Python wheel tasks, let's
make this an opt-in option.

Now by default Python wheel tasks will be deployed as is to Databricks
platform.
If notebook wrapper required (DBR < 13.1 or other configuration
differences), users can provide a following experimental setting

```
experimental:
  python_wheel_wrapper: true
```

Fixes #783,
databricks/databricks-asset-bundles-dais2023#8

## Tests
Added unit tests.

Integration tests passed for both cases

```
    helpers.go:163: [databricks stdout]: Hello from my func
    helpers.go:163: [databricks stdout]: Got arguments:
    helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two']
    ...
Bundle remote directory is ***/.bundle/ac05d5e8-ed4b-4e34-b3f2-afa73f62b021
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithWrapper3733431114/001/.databricks/bundle/default/sync-snapshots/cac1e02f3941a97b.json
Successfully deleted files!
--- PASS: TestAccPythonWheelTaskDeployAndRunWithWrapper (214.18s)
PASS
coverage: 93.5% of statements in ./...
ok      github.com/databricks/cli/internal/bundle       214.495s        coverage: 93.5% of statements in ./...

```

```
    helpers.go:163: [databricks stdout]: Hello from my func
    helpers.go:163: [databricks stdout]: Got arguments:
    helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two']
    ...
Bundle remote directory is ***/.bundle/0ef67aaf-5960-4049-bf1d-dc9e29157421
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithoutWrapper2340216760/001/.databricks/bundle/default/sync-snapshots/edf0b322cee93b13.json
Successfully deleted files!
--- PASS: TestAccPythonWheelTaskDeployAndRunWithoutWrapper (192.36s)
PASS
coverage: 93.5% of statements in ./...
ok      github.com/databricks/cli/internal/bundle       195.130s        coverage: 93.5% of statements in ./...

```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DABs DABs related issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants