Update CHANGELOG.md, update release scripts #1807

Merged: alamb merged 1 commit into apache:master from alamb/changelog_for_7.0.0 on Feb 14, 2022

Conversation

alamb
Contributor

@alamb alamb commented Feb 10, 2022

Which issue does this PR close?

Re #1587

You can see a rendered version here: https://github.com/alamb/arrow-datafusion/blob/alamb/changelog_for_7.0.0/datafusion/CHANGELOG.md

Rationale for this change

Trying to tell the 🌎 🌍 🌏 about DataFusion

What changes are included in this PR?

It was created using

$ ./dev/release/update_change_log-datafusion.sh

🤯 what a lot of stuff happened!!!

Since this file is automatically generated: to make changes, please edit the ticket subjects / labels directly (or tag me @alamb if you don't have the permissions to do so).

I am hoping to create a release candidate over the weekend (as I will be largely offline starting next Thursday Feb 17 for a week or so) so I can do the release next week before I head out.

github-actions bot added the datafusion label (Changes in the datafusion crate) on Feb 10, 2022
@@ -21,7 +21,6 @@ Changelogs are maintained separately for each subproject. Please check out the
changelog file within each subproject folder for more details:

* [Datafusion CHANGELOG](./datafusion/CHANGELOG.md)
-* [Datafusion Python Binding CHANGELOG](./python/CHANGELOG.md)
Contributor Author

python has been moved to its own crate, I believe.

@@ -50,14 +50,14 @@ OUTPUT_PATH="${PROJECT}/CHANGELOG.md"
pushd ${SOURCE_TOP_DIR}

# reset content in changelog
-git co "${SINCE_TAG}" "${OUTPUT_PATH}"
+git checkout "${SINCE_TAG}" "${OUTPUT_PATH}"
Contributor Author

my mac didn't like git co 😢

Member

haha, sorry that's an alias i made up in my gitconfig, muscle memory...

Contributor Author

No problem. I tip my hat to you for these scripts in general (automating the RAT in particular is genius, I normally just fix it up manually in arrow-rs)
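
For readers unfamiliar with git aliases: `co` is not a built-in git command, so the original script line only works on machines whose git configuration defines it. A minimal sketch of how such an alias is typically created (an assumption about the setup, not the reviewer's actual gitconfig):

```shell
# Define a personal alias so that `git co` expands to `git checkout`
# (this writes to the current user's global git configuration)
git config --global alias.co checkout

# Show what the alias expands to
git config --global --get alias.co
```

Because aliases live in per-user configuration, shared scripts are safer spelling out the full command, as the updated line does.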

# remove license header so github-changelog-generator has a clean base to append
-sed -i '1,18d' "${OUTPUT_PATH}"
+sed -i.bak '1,18d' "${OUTPUT_PATH}"
Contributor Author

Likewise, apparently sed on mac is slightly different from sed on linux (on mac the -i flag needs a backup suffix)

https://stackoverflow.com/questions/5694228/sed-in-place-flag-that-works-both-on-mac-bsd-and-linux
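
A minimal sketch of the portable form, using a throwaway file (the file and its contents are illustrative, not from the release script). GNU sed accepts `-i` with no suffix, while BSD/macOS sed expects a suffix argument; attaching the suffix directly to the flag (`-i.bak`) is accepted by both:

```shell
# Create a throwaway file: a 3-line header followed by two lines of content
tmpfile="$(mktemp)"
printf 'header 1\nheader 2\nheader 3\nbody 1\nbody 2\n' > "${tmpfile}"

# Delete the header in place; the suffix is attached to -i with no space,
# which both GNU sed and BSD/macOS sed understand
sed -i.bak '1,3d' "${tmpfile}"

# sed leaves the original behind as <file>.bak; remove it once it is not needed
rm -f "${tmpfile}.bak"

cat "${tmpfile}"
```

The cost of this form is the leftover `.bak` file, so a script using it may want to delete the backup afterwards, as done here.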


docker run -it --rm \
-e CHANGELOG_GITHUB_TOKEN=$CHANGELOG_GITHUB_TOKEN \
-v "$(pwd)":/usr/local/src/your-app \
-githubchangeloggenerator/github-changelog-generator:1.16.2 \
+githubchangeloggenerator/github-changelog-generator \
Contributor Author

use latest

Member

might want to pin to the latest version number instead to avoid breaking changes in the future.

Contributor Author

🤔 https://hub.docker.com/r/githubchangeloggenerator/github-changelog-generator/tags

Seems like latest has no other tag (as in latest is newer than 1.16.2 but there are no other numbered versions newer than 1.16.2) 😞

Member

@xudong963 xudong963 left a comment

Thanks @alamb . I gave some suggestions.

- Simplify creating new `ListingTable` [\#1705](https://github.com/apache/arrow-datafusion/issues/1705)
- Implement TableProvider for DataFrameImpl to allow registration of logical plans [\#1698](https://github.com/apache/arrow-datafusion/issues/1698)
- Public Expr simplification API [\#1694](https://github.com/apache/arrow-datafusion/issues/1694)
- Query Optimizer: Add OUTER --\> INNER join conversion [\#1670](https://github.com/apache/arrow-datafusion/issues/1670)
Member

In fact, this is not implemented. BTW, I am worried that there are other errors similar to this one: we closed the issues as duplicates, but the features were never implemented.

Member

I have not explored this script. Can it generate the changelog based on PRs that are already merged?

- Support DataType::Decimal\(15, 2\) in TPC-H benchmark [\#174](https://github.com/apache/arrow-datafusion/issues/174)
- Make `MemoryStream` public [\#150](https://github.com/apache/arrow-datafusion/issues/150)
- Add support for Parquet schema merging [\#132](https://github.com/apache/arrow-datafusion/issues/132)
- Add SQL support for IN expression [\#118](https://github.com/apache/arrow-datafusion/issues/118)
Member

ditto

Member

@houqp houqp left a comment

automation fix looks good to me 👍 looks like we still have a bit of work left to update PRs with proper tags :D

@matthewmturner
Contributor

I'm interested in learning a bit more about the lifecycle of issues / tags / releases.

@houqp @alamb could you provide a little more info? I didn't see anything mentioned in the developers guide.

Once I understand better, maybe I could help to get things in order for the next release.

@alamb
Contributor Author

alamb commented Feb 11, 2022

> I'm interested in learning a bit more about the lifecycle of issues / tags / releases.
> once i understand better maybe i could help to get things in order for the next release.

🎉 that would be wonderful @matthewmturner

> @houqp @alamb could you provide a little more info? i didnt see anything mentioned in the developers guide.

The basic release instructions (https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#update-changelogmd) talk about:

# create the changelog
CHANGELOG_GITHUB_TOKEN=<TOKEN> ./dev/release/update_change_log-all.sh
# review change log / edit issues and labels if needed, rerun until you are happy with the result
git commit -a -m 'Create changelog for release'

But the "until you are happy with the result" leaves a lot to the imagination 😆

Basically what I did was run that script (it updates the CHANGELOG.md file locally) and then looked at the output and tried to make something that looked coherent.

Examples of things that I did:

  1. Found issues that did not have the datafusion label but were related to datafusion and added it (otherwise they don't show up in the release notes)
  2. Changed titles of PRs / issues so they were more specific (e.g. from "Fixed bug in select" to "Fixed bug when there are order by number")
  3. Applied a liberal dose of judgement to which issues / PRs should have the enhancement, bug, or api-change label on them.

The most questionable things related to labels were:

  1. API changes get listed under the "Breaking Changes" heading, but some tickets added new APIs, etc. that I didn't feel were breaking changes, and in other cases several PRs together made a "single" breaking change from the user's point of view (e.g. breaking LogicalPlan into enums)
  2. Changes with multiple labels got put under a single heading, so I removed some labels that were accurate but were obscuring what I felt was the "most important" part. For example, a ticket with an enhancement and a documentation label ended up under the Documentation heading, even when it also had code changes. I removed the documentation label for that one

It would be great to have some more help here. Some thoughts:

  1. We could probably automate adding the datafusion label to issues that are closed via a PR that also has the datafusion label (the datafusion label is already applied to PRs automatically based on the paths they change)
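
The automation idea above could start from something like the following sketch. It is not part of the release scripts: `closing_issues` is a hypothetical helper name, and it assumes the PR description links its issue with one of GitHub's closing keywords (close, fix, resolve and their variants) followed by `#<number>`.

```shell
# Hypothetical helper (not part of dev/release): read a PR description on
# stdin and print the issue numbers it closes, one per line, sorted and unique.
closing_issues() {
  grep -oiE '(close[sd]?|fix(e[sd])?|resolve[sd]?)[[:space:]]+#[0-9]+' \
    | grep -oE '[0-9]+' \
    | sort -un
}

# Example input: two closing references and one plain mention.
# "Re #99" does not use a closing keyword, so it is ignored.
printf 'Closes #12\nAlso fixes #7. Re #99\n' | closing_issues
```

With the GitHub CLI installed and authenticated, the description of a PR could be fed in with `gh pr view <PR> --json body --jq .body | closing_issues`, and each printed number could then be labelled with `gh issue edit <N> --add-label datafusion`. A PR that only says "Re #NNNN", like this one, would be skipped and would still need a manual label.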

Something else I haven't even looked into is releasing the python bindings and releasing ballista lol -- so any help you want to lend there would be awesome

@alamb alamb merged commit 81e76ed into apache:master Feb 14, 2022
@alamb alamb deleted the alamb/changelog_for_7.0.0 branch February 14, 2022 12:06