-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix python build and pin dependencies #2268
Conversation
@staticmethod # noqa: C901 | ||
def from_(value): | ||
@staticmethod | ||
def from_(value): # noqa: C901 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these changes needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, they were one of the causes of the failures
Current Failures:
Added:
|
e52de63
to
4461847
Compare
python/setup.py
Outdated
'pyarrow==2.0.0' | ||
] | ||
|
||
# pandas doesn't get built for 3.6 anymore, we should consider deprecating 3.6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole PyData ecosystem around pandas
, numpy
and also soon pyarrow
drops / has dropped 3.6, so dropping it here would also make sense.
python/setup.py
Outdated
DEPENDENCIES = ['botocore==1.20.14', | ||
'boto3==1.17.14', | ||
'fastavro==1.3.2', | ||
'hmsclient==0.1.1', | ||
'mmh3==3.0.0', | ||
'pyparsing==2.4.7', | ||
'python-dateutil==2.8.1', | ||
'pytz==2021.1', | ||
'requests==2.25.1', | ||
'retrying==1.3.3', | ||
'pyarrow==2.0.0' | ||
] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't pin so hard, this will make it hard to use Iceberg together with something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are your recommendations for pinning @xhochy? I dont like having silently failing builds or uncertain behaviour but I see your point that pinning pytz isn't going to help anything.
I have reduced pinning to just things that could likely break the build: avro, hmsclient, pyarrow, pyparsing. Do you think that is better or still to harsh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, reducing this down to these and chaging the pins to e.g. pyarrow>=2,<3' (you can be stricter for other projects but
pyarrow` is mostly releasing on the major level). In some cases I use a strict pinning (all packages pinned) for the tests and a looser pinning in the package itself.
In pyarrow
we have quite open pins but we spent a lot of time on fixing the breakages directly. A different approach I also quite like and works for a lot of things is have the pins as strict as suggested but have a bot like dependendabot running that automatically makes PRs and updates the pins. Then we see in these bot PRs whether the new versions break things.
(Also, when I finally get time to contribute here (:sob:), I would add nightly tests that also test against pyarrow
nightlies to get earlier notifications about breakage).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In
pyarrow
we have quite open pins but we spent a lot of time on fixing the breakages directly. A different approach I also quite like and works for a lot of things is have the pins as strict as suggested but have a bot like dependendabot running that automatically makes PRs and updates the pins. Then we see in these bot PRs whether the new versions break things.
This is my preference too. I suppose dependabot + apache is a no-go though :-(
(Also, when I finally get time to contribute here (), I would add nightly tests that also test against
pyarrow
nightlies to get earlier notifications about breakage).
Is there a source of arrow nightlies or would it require building arrow? I am also looking to spend more time on the python side here, curious to see if there is anything on your todo list I can help w/?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my preference too. I suppose dependabot + apache is a no-go though :-(
You can build a poor mans dependendabot using Github actions as I did in conda-forge/miniforge#117 The only limitation with running it in the Apache org will be that CI doesn't run automatically (you need to close/reopen the PR to trigger it).
Is there a source of arrow nightlies or would it require building arrow? I am also looking to spend more time on the python side here, curious to see if there is anything on your todo list I can help w/?
We have a conda channel called arrow-nightlies
and there are also pip
packages as documented in https://arrow.apache.org/docs/python/install.html?highlight=nightlies#installing-nightly-packages
One thing we did with a lot of other packages is that we added a nightly job to the arrow project itself so that we get reports over there if we break an upstream library. Worth looking for e.g. turbodbc
in the Arrow sources.
0a4146c
to
4c500ba
Compare
5492282
to
88e1434
Compare
As per #2266 the python build is broken.
This fixes it and pins dependencies to avoid similar issues in the future