Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposes the use of Python's Dataclasses #12

Merged
merged 3 commits into from
Mar 9, 2021

Conversation

verdan
Copy link
Member

@verdan verdan commented Nov 10, 2020

Signed-off-by: verdan verdan.mahmood@gmail.com

Python 3.7 introduced dataclasses (PEP557). Dataclasses can be a convenient way to generate classes whose primary goal is to contain values.

We are relying on attrs for (de-)serializing data for each request within Amundsen. attrs itself is a big package with so many different features available that we don't even need.

By using Python's native dataclasses we can have one less dependency with almost the same features attrs provide that we intend to use.

Signed-off-by: verdan <verdan.mahmood@gmail.com>
@verdan verdan requested a review from a team as a code owner November 10, 2020 12:49
Signed-off-by: verdan <verdan.mahmood@gmail.com>
@feng-tao
Copy link
Member

currently Amundsen supports python36 and above. Will it force to drop python36 support?

@verdan
Copy link
Member Author

verdan commented Nov 11, 2020

currently Amundsen supports python36 and above. Will it force to drop python36 support?

@feng-tao Yes, it will. I've made it explicit in the technical details and also in the drawbacks.

I believe we should aim to support as many python versions as possible, but MUST support at least the last 3 versions. The current Python version is 3.9, so I think it is okay to drop the support of Python3.6.

@feng-tao
Copy link
Member

Given python36 is still widely used these days while python39 is just introduced, I will be hesitant to just drop python36. I take a quick look at airflow and superset which both still supports py3.6.

@feng-tao
Copy link
Member

@verdan actually it is backported to py36 (https://github.com/ericvsmith/dataclasses). I think it is fine from the python version pov.

## Unresolved questions

- Should we aim for having as less dependencies as possible? That includes the third party packages.
- Do we think of a case where we'll be doing heavy data validations, and if those can not be done with dataclasses?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is ok.

@verdan
Copy link
Member Author

verdan commented Nov 19, 2020

@verdan actually it is backported to py36 (https://github.com/ericvsmith/dataclasses). I think it is fine from the python version pov.

oh, this is nice. I did not know that before. I am still curious however why companies are still using Python3.6, as Python3.7 introduced a number of nice feature that the community was waiting for. but yes, the major share is still Python3.6 as you mentioned.

@dorianj
Copy link
Contributor

dorianj commented Nov 30, 2020

Strongly in support of this -- cleaner syntax and typing, and the python3.6 backport works well. Any reason not to push this through and start using?

@verdan
Copy link
Member Author

verdan commented Dec 10, 2020

@feng-tao do we agree to move this to the Active state, and start implementing this if no further comments?

@feng-tao
Copy link
Member

sgtm

@jornh
Copy link
Contributor

jornh commented Dec 11, 2020

To quote #12 (comment)

I believe we should aim to support as many python versions as possible, but MUST support at least the last 3 versions. The current Python version is 3.9.

Not really related to this RFC itself, but since the topic is up. We're currently only officially supporting 3.6 and 3.7. Looks like the Databuilder package works for folks with 3.8, but tests need some love + maybe addition to CI. Similarly the 3 services need some work just to be able to build with 3.8.

Time to kick off that work? I'd be happy to do some of it. I guess it doesn't need a RFC as it's basically just a maintenance chore, not introducing any new feature or breaking anything.

Comment on lines +64 to +66
- One of the main thing to consider is the Python 3.6 support. If we go for the dataclasses, we will not be supporting
Python 3.6 anymore. The minimum version requirement for Python for Amundsen will become 3.7.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- One of the main thing to consider is the Python 3.6 support. If we go for the dataclasses, we will not be supporting
Python 3.6 anymore. The minimum version requirement for Python for Amundsen will become 3.7.

@verdan
Copy link
Member Author

verdan commented Dec 23, 2020

To quote #12 (comment)

I believe we should aim to support as many python versions as possible, but MUST support at least the last 3 versions. The current Python version is 3.9.

Not really related to this RFC itself, but since the topic is up. We're currently only officially supporting 3.6 and 3.7. Looks like the Databuilder package works for folks with 3.8, but tests need some love + maybe addition to CI. Similarly the 3 services need some work just to be able to build with 3.8.

Time to kick off that work? I'd be happy to do some of it. I guess it doesn't need a RFC as it's basically just a maintenance chore, not introducing any new feature or breaking anything.

@jornh that would be awesome if you could please help out with making sure that all the services work with the latest 3 versions of Python.

In the meantime, I am taking care of the dataclasses.

@dorianj
Copy link
Contributor

dorianj commented Dec 30, 2020

Note that dataclasses were landed in the front-end repo, with the back-compat library for 3.6, and is working smoothly.

Co-authored-by: jornh <jornhansen@gmail.com>
Signed-off-by: Dorian Johnson <2020@dorianj.net>
@dorianj dorianj force-pushed the python-dataclasses branch from 8ecc7fc to 6027455 Compare March 9, 2021 01:02
@dorianj dorianj merged commit e7c09cd into amundsen-io:master Mar 9, 2021
@verdan verdan deleted the python-dataclasses branch March 23, 2021 12:12
allisonsuarez pushed a commit that referenced this pull request May 5, 2021
* Proposes the use of Python's Dataclasses

Signed-off-by: verdan <verdan.mahmood@gmail.com>

* Updates the PR number

Signed-off-by: verdan <verdan.mahmood@gmail.com>

* Update rfcs/000-python-dataclasses.md

Co-authored-by: jornh <jornhansen@gmail.com>
Signed-off-by: Dorian Johnson <2020@dorianj.net>

Co-authored-by: Tao Feng <tao.feng@databricks.com>
Co-authored-by: jornh <jornhansen@gmail.com>
Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants