Create next DataFusion release (after 7.0) - 7.1 #2095

Closed
alamb opened this issue Mar 25, 2022 · 21 comments
Assignees
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Mar 25, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We released datafusion 7.0.0 about a month ago https://crates.io/crates/datafusion/7.0.0

We should figure out when to release the next one

Describe the solution you'd like

Plan out the next release(s) of DataFusion. Also figure out if we want to do a maintenance release (e.g. 7.0.1 / 7.1.0) or a release from master (8.0.0).

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Brought up by @silence-coding here: #2066 (comment)

See notes on 7.0 release #1587

@silence-coding
Contributor

I think DataFusion can release a small version (e.g. 7.0.1) once a month and a large version (e.g. 8.0.0-alpha) when there are major changes. Pull requests to DataFusion are frequent, so a stable release plan is needed.

@alamb
Contributor Author

alamb commented Mar 26, 2022

I would be happy to help support more incremental releases of datafusion, but I probably don't have time to manage the whole thing

What I think would be needed is:

  1. A stable branch (release_7.x) -- which I can make
  2. people to make PRs that cherry-pick changes from master to that stable branch
  3. A regular release from the release branch

I am happy to do the mechanics of creating a branch and release artifacts, but I would need help from the community backporting / cherry-picking backwards compatible changes to it.

@silence-coding
Contributor

I agree with you very much. I suggest posting a notice in the README to recruit volunteers to help manage publishing. The request may not be obvious to people who would have to find it by tracking this issue.

@HaoYang670
Contributor

Maybe we could imitate the style of Apache Spark.

  1. Major release (such as 8.0.0), released irregularly, perhaps every 2~3 years: significant new features, architectural optimizations, or changes that break backward compatibility.
  2. Minor release (such as 7.1, 7.2, ...), released around every half a year: performance improvements and small features.
  3. Maintenance release (such as 7.0.1, 7.0.2), released regularly every month: focus on bugs and stability. New features should not be introduced.

@jychen7
Contributor

jychen7 commented Mar 26, 2022

Arrow C++ (official) seems to have a major release every quarter, and as of 2022-02 it is at 7.0.0.
As of 2022-03, Arrow Rust has reached 11.0.0 and Arrow DataFusion is at 7.0.0.

I think DataFusion can have a release plan similar to Arrow (C++):

  • major release (from master), every 3 months
  • minor release (from master), every 1 month
  • patch release (from the previous minor release branch) for bugs (review every week whether a release is needed)

@yahoNanJing
Contributor

Agree to make 3-layer releases: major, minor, bug fix.

Another question is whether it is necessary to maintain the same version for different modules, like Ballista and datafusion-data-access (newly split out).

@alamb
Contributor Author

alamb commented Mar 27, 2022

> Agree to make 3-layer releases: major, minor, bug fix.

Arrow C++ does quarterly major releases; I have not seen a minor release (e.g. 6.1.0) in the last year. Occasionally there are patch releases, but they are infrequent, typically once per major release.

I agree the three-tier release scheme sounds ideal as well.

> minor release (from master), every 1 month

If we intend to conform to "semantic versioning" in the Rust style, it is a challenge to release minor versions from master. For a minor release (e.g. 7.0.0 to 7.1.0) to be semantically versioned, no breaking API changes can be introduced, which would restrict what we can put on master.
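
As a rough illustration of that constraint, here is a minimal Python sketch of how the kind of change determines the smallest allowed version bump. The function name `next_version` and its boolean arguments are hypothetical and not part of any DataFusion tooling, and it assumes a major version of 1 or higher.

```python
def next_version(current: str, breaking: bool, new_features: bool) -> str:
    """Return the smallest semantic-versioning bump allowed for a set of changes.

    Hypothetical helper for illustration only; assumes `current` looks like
    "MAJOR.MINOR.PATCH" with MAJOR >= 1.
    """
    major, minor, patch = (int(part) for part in current.split("."))
    if breaking:
        # Any breaking API change forces a new major version.
        return f"{major + 1}.0.0"
    if new_features:
        # Backwards-compatible features can go into a minor release.
        return f"{major}.{minor + 1}.0"
    # Bug fixes only: a patch release is enough.
    return f"{major}.{minor}.{patch + 1}"


print(next_version("7.0.0", breaking=False, new_features=True))  # 7.1.0
print(next_version("7.0.0", breaking=True, new_features=True))   # 8.0.0
```

Under this rule, a single breaking change merged to master means the next release cut from master has to be 8.0.0 rather than 7.1.0, which is why a separate release branch is attractive for minor releases.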

> Another question is whether it is necessary to maintain the same version for different modules, like Ballista and datafusion-data-access (newly split out).

I do not think it is necessary to keep the same versions. I keep the versions of arrow-rs/arrow-flight/parquet in sync because it lowers the release overhead.

@alamb
Contributor Author

alamb commented Mar 27, 2022

The challenge I predict we will encounter is finding the time to manage the releases (i.e. reviewing PRs, deciding what to backport, backporting, and preparing release notes and version bumps).

I don't think the work is "hard" per se, but it does take sustained time and effort.

Maybe we could start with

  • major release (from master), every 3 months
  • minor/patch release (from the previous minor release branch) for bugs and minor features (released on demand / every month)

@alamb
Contributor Author

alamb commented Mar 27, 2022

Does anyone want to volunteer to manage such release(s)?

@jychen7
Contributor

jychen7 commented Mar 27, 2022

> Does anyone want to volunteer to manage such release(s)?

I would love to

@houqp
Member

houqp commented Mar 27, 2022

Agree that backporting patches to a stable branch is very time-consuming work, so it is better not to commit to it until we see a strong need from our users or we have a maintainer who can allocate dedicated time to maintaining the stable branch.

@alamb
Contributor Author

alamb commented Mar 28, 2022

Ok, since @jychen7 has volunteered, let's give it a try for a release or two of datafusion 7.x

I have created a 7.x maintenance branch

The next step would be to decide on some content to backport (via cherry-pick) that is semantically compatible.

To do so I suggest:

  1. Create a new PR for each change you would like to release in the 7.x line against the maint-7.x branch
  2. Tag me on the PR -- I'll review and merge
  3. When we are ready to release this next version, we can update the release notes / changelog and I'll propose an official release.

Sound good @jychen7 ?

@jychen7
Contributor

jychen7 commented Mar 29, 2022

@alamb if I understand correctly, our next major release will be around 2022-05-14 (2nd weekend of May), and the next possible minor release will be around 2022-04-09 (2nd weekend of Apr).
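
The two dates can be checked with a short Python sketch. It assumes that "2nd weekend" means the second Saturday of the month; the function name `second_saturday` is hypothetical.

```python
import datetime


def second_saturday(year: int, month: int) -> datetime.date:
    """Return the second Saturday of the given month.

    Assumption: the "2nd weekend" of a month is taken to start on its
    second Saturday.
    """
    first = datetime.date(year, month, 1)
    # date.weekday(): Monday is 0 ... Saturday is 5, Sunday is 6.
    days_until_first_saturday = (5 - first.weekday()) % 7
    return first + datetime.timedelta(days=days_until_first_saturday + 7)


print(second_saturday(2022, 4))  # 2022-04-09
print(second_saturday(2022, 5))  # 2022-05-14
```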

Do we ask contributors who want a minor/patch release to create a PR against the maint-* branch after the original PR is merged to master? I drafted the doc update at #2110.

As a volunteer, I would help to:

  1. Every week, check how many PRs target the maintenance branch (e.g. search by base:maint-7.x). If there are any:
    • confirm the commit is cherry-picked from master
    • confirm it is a minor (non-API-breaking) change
    • tag you for review
  2. Draft the changelog every month if a major/minor release is needed.

PS: step 1 could be automated with a GitHub workflow in the future if needed.
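
The weekly check in step 1 could be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes backports are made with `git cherry-pick -x`, which appends a "(cherry picked from commit <sha>)" line to the commit message, and the PR record fields (`base`, `commit_message`, `breaking`) are hypothetical rather than a real GitHub API shape.

```python
import re

# Line that `git cherry-pick -x` appends to the commit message.
CHERRY_PICK_TRAILER = re.compile(r"\(cherry picked from commit ([0-9a-f]{7,40})\)")


def cherry_pick_source(commit_message: str):
    """Return the original commit hash recorded by `git cherry-pick -x`, or None."""
    match = CHERRY_PICK_TRAILER.search(commit_message)
    return match.group(1) if match else None


def ready_for_review(pr: dict) -> bool:
    """Apply the checklist: maintenance branch, cherry-picked, non-breaking.

    `pr` is a hypothetical record with keys "base", "commit_message", "breaking".
    """
    return (
        pr["base"].startswith("maint-")
        and cherry_pick_source(pr["commit_message"]) is not None
        and not pr["breaking"]
    )


example_pr = {
    "base": "maint-7.x",
    "commit_message": "Fix a bug\n\n(cherry picked from commit 0123456789abcdef0123456789abcdef01234567)",
    "breaking": False,
}
print(ready_for_review(example_pr))  # True
```

Whether a change is API-breaking still needs a human decision; the sketch only records that decision in the `breaking` field.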

@alamb
Contributor Author

alamb commented Mar 29, 2022

> @alamb if I understand correctly, our next major release will be around 2022-05-14 (2nd weekend of May), and the next possible minor release will be around 2022-04-09 (2nd weekend of Apr).

I think that would be reasonable

> Do we ask contributors who want a minor/patch release to create a PR against the maint-* branch after the original PR is merged to master? I drafted the doc update at #2110.

Yes, thank you

> As a volunteer, I would help to

❤️ thank you so much!

@happysalada
Contributor

A related question: do you plan to release datafusion-cli as a crate as well? I see that the 7.0.0 datafusion-cli crate has been yanked (for reasons I am not aware of).

@alamb
Contributor Author

alamb commented Apr 8, 2022

Hi @happysalada, I don't expect we'll release datafusion-cli to crates.io.

The reason that the datafusion-cli crate was not published (to crates.io) for 7.0.0 is that it depends on ballista, which did not have a 7.0 release.

For now, you can probably install datafusion-cli from source / GitHub if you want.

As backstory, datafusion-cli is mostly a debugging / development tool for datafusion and ballista, and clients such as https://github.com/roapi/roapi or https://github.com/datafusion-contrib/datafusion-python are more appropriate for end users.

If you wanted to make a datafusion-cli crate that was publishable (or break something similar out into https://github.com/datafusion-contrib), I think it could be useful.

@matthewmturner
Contributor

@happysalada shameless plug: I'm working on a more full-featured DataFusion CLI client, https://github.com/datafusion-contrib/datafusion-tui, if you're interested. It's a new project that still has bugs, but I'm getting close to a 0.1 release. My plan is to publish it on crates.io and Homebrew.

@happysalada
Contributor

I think the easiest way to package this for now is to build from source.
With the PR that you kindly merged, it should be good. I hit a compilation error on macOS x86_64, but I'm going to test on Linux and release it for that platform (for NixOS) if all goes well.

Matthew, nice project! I'm watching out for releases and will package it for NixOS as well once I've played with it!

@alamb alamb changed the title from "Create next DataFusion release (after 7.0)" to "Create next DataFusion release (after 7.0) - 7.1" Apr 12, 2022
@alamb
Contributor Author

alamb commented Apr 12, 2022

Release candidate was created and voting started here: https://lists.apache.org/thread/kvk7688gpfofrc46zso306rdnqxfdcdc

@alamb alamb self-assigned this Apr 18, 2022
@alamb
Contributor Author

alamb commented Apr 18, 2022

@alamb alamb closed this as completed Apr 18, 2022
@alamb
Contributor Author

alamb commented Apr 18, 2022

Thanks @jychen7 for the assist getting this out 👍


8 participants