Add support for Pandas >2.0 #726

Closed
alexeyegorov opened this issue Jul 5, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@alexeyegorov

alexeyegorov commented Jul 5, 2024

Describe the bug

NumPy 2.0, the latest release, is not compatible with older versions of pandas. However, dbt-databricks 1.8.3 only supports pandas up to version 2.0.

Workaround: pin numpy to 1.26.4 (the latest release before 2.0).

Steps To Reproduce

  1. For my devcontainer setup, I use a requirements.txt with only a few entries:
     dbt-databricks==1.8.3
     sqlfluff
     sqlfluff-templater-dbt
  2. Install the above dependencies.
  3. Run dbt deps
  4. Try to run any dbt command, such as dbt compile

Expected behavior

Successful dbt run.

Screenshots and log output

The output of the commands:

[Screenshot 2024-07-05 at 15 07 16]

dbt has now installed the packages.

But any other command fails (in this case, dbt compile):

[Screenshot 2024-07-05 at 15 07 32]

Quote from the logs:

13:07:21 Running with dbt=1.8.3

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
File "/usr/local/bin/dbt", line 8, in
sys.exit(cli())
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/dbt/cli/main.py", line 148, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 138, in wrapper
result, success = func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 101, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 215, in wrapper
profile = load_profile(flags.PROJECT_DIR, flags.VARS, flags.PROFILE, flags.TARGET, threads)
File "/usr/local/lib/python3.9/site-packages/dbt/config/runtime.py", line 71, in load_profile
profile = Profile.render(
File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 403, in render
return cls.from_raw_profiles(
File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 369, in from_raw_profiles
return cls.from_raw_profile_info(
File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 325, in from_raw_profile_info
credentials: Credentials = cls._credentials_from_profile(
File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 149, in _credentials_from_profile
cls = load_plugin(typename)
File "/usr/local/lib/python3.9/site-packages/dbt/adapters/factory.py", line 239, in load_plugin
return FACTORY.load_plugin(name)
File "/usr/local/lib/python3.9/site-packages/dbt/adapters/factory.py", line 68, in load_plugin
mod: Any = import_module("." + name, "dbt.adapters")
File "/usr/local/lib/python3.9/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/usr/local/lib/python3.9/site-packages/dbt/adapters/databricks/init.py", line 3, in
from dbt.adapters.databricks.connections import DatabricksConnectionManager # noqa
File "/usr/local/lib/python3.9/site-packages/dbt/adapters/databricks/connections.py", line 26, in
from databricks.sql.client import Connection as DatabricksSQLConnection
File "/usr/local/lib/python3.9/site-packages/databricks/sql/client.py", line 3, in
import pandas
File "/usr/local/lib/python3.9/site-packages/pandas/init.py", line 23, in
from pandas.compat import (
File "/usr/local/lib/python3.9/site-packages/pandas/compat/init.py", line 27, in
from pandas.compat.pyarrow import (
File "/usr/local/lib/python3.9/site-packages/pandas/compat/pyarrow.py", line 8, in
import pyarrow as pa
File "/usr/local/lib/python3.9/site-packages/pyarrow/init.py", line 65, in
import pyarrow.lib as _lib
AttributeError: _ARRAY_API not found
13:07:21 Encountered an error:
numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
13:07:21 Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 138, in wrapper
result, success = func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 101, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 215, in wrapper
profile = load_profile(flags.PROJECT_DIR, flags.VARS, flags.PROFILE, flags.TARGET, threads)
File "/usr/local/lib/python3.9/site-packages/dbt/config/runtime.py", line 71, in load_profile
profile = Profile.render(
File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 403, in render
return cls.from_raw_profiles(
File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 369, in from_raw_profiles
return cls.from_raw_profile_info(
File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 325, in from_raw_profile_info
credentials: Credentials = cls._credentials_from_profile(
File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 149, in _credentials_from_profile
cls = load_plugin(typename)
File "/usr/local/lib/python3.9/site-packages/dbt/adapters/factory.py", line 239, in load_plugin
return FACTORY.load_plugin(name)
File "/usr/local/lib/python3.9/site-packages/dbt/adapters/factory.py", line 68, in load_plugin
mod: Any = import_module("." + name, "dbt.adapters")
File "/usr/local/lib/python3.9/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1030, in _gcd_import
File "", line 1007, in _find_and_load
File "", line 986, in _find_and_load_unlocked
File "", line 680, in _load_unlocked
File "", line 850, in exec_module
File "", line 228, in _call_with_frames_removed
File "/usr/local/lib/python3.9/site-packages/dbt/adapters/databricks/init.py", line 3, in
from dbt.adapters.databricks.connections import DatabricksConnectionManager # noqa
File "/usr/local/lib/python3.9/site-packages/dbt/adapters/databricks/connections.py", line 26, in
from databricks.sql.client import Connection as DatabricksSQLConnection
File "/usr/local/lib/python3.9/site-packages/databricks/sql/client.py", line 3, in
import pandas
File "/usr/local/lib/python3.9/site-packages/pandas/init.py", line 46, in
from pandas.core.api import (
File "/usr/local/lib/python3.9/site-packages/pandas/core/api.py", line 1, in
from pandas._libs import (
File "/usr/local/lib/python3.9/site-packages/pandas/_libs/init.py", line 18, in
from pandas._libs.interval import Interval
File "interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
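
Since importing pandas or pyarrow is exactly what crashes here, one way to see which versions ended up installed is to read the package metadata without importing anything. This is a minimal sketch, not part of dbt or dbt-databricks: the helper names `installed_versions` and `major` are hypothetical, and the package list is just the import chain visible in the traceback above.

```python
from importlib import metadata

# Packages on the import chain shown in the traceback (an assumption about
# what is worth checking; extend as needed).
PACKAGES = ("numpy", "pandas", "pyarrow", "dbt-databricks")


def installed_versions(names=PACKAGES):
    """Return {name: version or None} without importing the packages."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def major(version):
    """Major component of a version string such as '2.0.0'."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    found = installed_versions()
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
    numpy_version = found["numpy"]
    if numpy_version and major(numpy_version) >= 2:
        # Matches the hint in the NumPy message: downgrade to 'numpy<2'.
        print("numpy >= 2 is installed; consider pinning numpy<2")
```

If numpy reports 2.x while pandas and pyarrow are older builds, that matches the "compiled using NumPy 1.x" message in the log.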

System information

The output of dbt --version:

<output goes here>

The operating system you're using:

The output of python --version:

Additional context

alexeyegorov added the bug (Something isn't working) label on Jul 5, 2024
@benc-db
Collaborator

benc-db commented Jul 8, 2024

Our upstream dependencies at dbt Labs have communicated to me that they are going to be pinning to numpy < 2, so even if I remove the pandas pin, we can't expect numpy 2 to work. We need to verify that a newer Pandas works as well, as the reason we started pinning is that newer Pandas started breaking dbt-databricks. Keeping the ticket open to try upgrading Pandas again at some point.

@alexeyegorov
Author

@benc-db It's all fine. I pinned the version for myself and made sure it is known in the community. ;) Thanks.

@benc-db
Collaborator

benc-db commented Jul 9, 2024

For anyone else who sees this issue, newer versions of pandas also drop support for python 3.8, which we are not prepared to drop support for yet.

@ShaneMazur

@benc-db Can you take another look at this, given the latest pandas/numpy releases? The current state blocks any upgrade to python 3.13. It is not immediately problematic, but it would be good to know there is a solution that will eventually unblock this 👍

@benc-db
Collaborator

benc-db commented Dec 4, 2024

Thanks for bringing this back to my attention. We have dropped support for 3.8, so I can look into this again.

@ShaneMazur

@benc-db do you have a rough idea of when (or what version) we will be able to upgrade to python 3.13? I saw there was a PR up for this already 👍

@benc-db
Collaborator

benc-db commented Dec 9, 2024

We likely need to wait to support python 3.13 until dbt-adapters does. Dropping the pandas pin can happen as soon as 1.9.1, probably releasing early January.

@alexeyegorov
Author

1.9.0 still requires a numpy version between 1.23.4 and 1.26.4. I am curious to see if 1.9.1 changes this.
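
To check a given numpy version against that range, something like the following could be used. It is a rough sketch under assumptions: `parse` and `in_supported_range` are hypothetical helpers, both bounds are treated as inclusive (the comment above does not say), and only plain numeric versions such as `1.26.4` are handled.

```python
def parse(version):
    """Turn '1.26.4' into (1, 26, 4); assumes plain numeric release versions."""
    return tuple(int(part) for part in version.split("."))


def in_supported_range(version, low="1.23.4", high="1.26.4"):
    """True if low <= version <= high (inclusive bounds are an assumption)."""
    return parse(low) <= parse(version) <= parse(high)


if __name__ == "__main__":
    for candidate in ("1.23.3", "1.26.4", "2.0.0"):
        print(candidate, in_supported_range(candidate))
```

Comparing tuples of integers avoids the string-comparison trap where "1.9.0" would sort after "1.26.4".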

@alexeyegorov
Author

Update: I have updated to the latest version 1.9.1 and now it works without any workaround. I ran into some new issues/warnings, but I will create new tickets for that.
