Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix python build and pin dependencies #2268

Merged
merged 5 commits into from
Mar 6, 2021
Merged

Conversation

rymurr
Copy link
Contributor

@rymurr rymurr commented Feb 24, 2021

As per #2266 the python build is broken.

This fixes it and pins dependencies to avoid similar issues in the future

@staticmethod # noqa: C901
def from_(value):
@staticmethod
def from_(value): # noqa: C901
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, they were one of the causes of the failures

@rymurr
Copy link
Contributor Author

rymurr commented Feb 24, 2021

Current Failures:

  • linters don't pass in Literals anymore
  • pandas and pyarrow version related issues

Added:

  • python 3.9 tests

@github-actions github-actions bot added the INFRA label Feb 24, 2021
python/setup.py Outdated
'pyarrow==2.0.0'
]

# pandas doesn't get built for 3.6 anymore, we should consider deprecating 3.6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole PyData ecosystem around pandas, numpy and also soon pyarrow drops / has dropped 3.6, so dropping it here would also make sense.

python/setup.py Outdated
Comment on lines 22 to 34
DEPENDENCIES = ['botocore==1.20.14',
'boto3==1.17.14',
'fastavro==1.3.2',
'hmsclient==0.1.1',
'mmh3==3.0.0',
'pyparsing==2.4.7',
'python-dateutil==2.8.1',
'pytz==2021.1',
'requests==2.25.1',
'retrying==1.3.3',
'pyarrow==2.0.0'
]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't pin so hard, this will make it hard to use Iceberg together with something else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are your recommendations for pinning @xhochy? I dont like having silently failing builds or uncertain behaviour but I see your point that pinning pytz isn't going to help anything.

I have reduced pinning to just things that could likely break the build: avro, hmsclient, pyarrow, pyparsing. Do you think that is better or still to harsh?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, reducing this down to these and chaging the pins to e.g. pyarrow>=2,<3' (you can be stricter for other projects but pyarrow` is mostly releasing on the major level). In some cases I use a strict pinning (all packages pinned) for the tests and a looser pinning in the package itself.

In pyarrow we have quite open pins but we spent a lot of time on fixing the breakages directly. A different approach I also quite like and works for a lot of things is have the pins as strict as suggested but have a bot like dependendabot running that automatically makes PRs and updates the pins. Then we see in these bot PRs whether the new versions break things.

(Also, when I finally get time to contribute here (:sob:), I would add nightly tests that also test against pyarrow nightlies to get earlier notifications about breakage).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In pyarrow we have quite open pins but we spent a lot of time on fixing the breakages directly. A different approach I also quite like and works for a lot of things is have the pins as strict as suggested but have a bot like dependendabot running that automatically makes PRs and updates the pins. Then we see in these bot PRs whether the new versions break things.

This is my preference too. I suppose dependabot + apache is a no-go though :-(

(Also, when I finally get time to contribute here (), I would add nightly tests that also test against pyarrow nightlies to get earlier notifications about breakage).

Is there a source of arrow nightlies or would it require building arrow? I am also looking to spend more time on the python side here, curious to see if there is anything on your todo list I can help w/?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my preference too. I suppose dependabot + apache is a no-go though :-(

You can build a poor mans dependendabot using Github actions as I did in conda-forge/miniforge#117 The only limitation with running it in the Apache org will be that CI doesn't run automatically (you need to close/reopen the PR to trigger it).

Is there a source of arrow nightlies or would it require building arrow? I am also looking to spend more time on the python side here, curious to see if there is anything on your todo list I can help w/?

We have a conda channel called arrow-nightlies and there are also pip packages as documented in https://arrow.apache.org/docs/python/install.html?highlight=nightlies#installing-nightly-packages

One thing we did with a lot of other packages is that we added a nightly job to the arrow project itself so that we get reports over there if we break an upstream library. Worth looking for e.g. turbodbc in the Arrow sources.

@rdblue
Copy link
Contributor

rdblue commented Mar 6, 2021

Looks good to me. I'll merge. Thanks for the fix, @rymurr!

Thanks for reviewing this, @xhochy! Looks like your guidance on versions was helpful.

@rdblue rdblue merged commit 030144e into apache:master Mar 6, 2021
@rymurr
Copy link
Contributor Author

rymurr commented Mar 8, 2021

Thanks for reviewing this, @xhochy! Looks like your guidance on versions was helpful.

Agreed! Thanks @xhochy and thanks for merging @rdblue

@rymurr rymurr deleted the python-versions branch March 8, 2021 09:29
coolderli pushed a commit to coolderli/iceberg that referenced this pull request Apr 26, 2021
stevenzwu pushed a commit to stevenzwu/iceberg that referenced this pull request Jul 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants