[SPARK-26415][SQL] Mark StreamSinkProvider and StreamSourceProvider as stable #23354

Closed

granthenke
Member

What changes were proposed in this pull request?

This change marks the StreamSinkProvider and StreamSourceProvider
interfaces as stable so that they can be relied on for compatibility for all of
Spark 2.x.

These interfaces have been available since Spark 2.0.0 and unchanged
since Spark 2.1.0. Additionally, the Kafka integration has been using them
since Spark 2.1.0.

Because structured streaming general availability was announced in
Spark 2.2.0, I suspect other third-party integrations are already using them
as well.

@AmplabJenkins

Can one of the admins verify this patch?

@dongjoon-hyun
Member

Hi, @granthenke. Please file an Apache Spark JIRA issue; this change is worth one.
Also, you should email dev@spark.apache.org with the link to that JIRA issue.

@dongjoon-hyun
Member

cc @tdas and @zsxwing

@granthenke changed the title from "[MINOR][SQL] Mark StreamSinkProvider and StreamSourceProvider as stable" to "[SPARK-26415][SQL] Mark StreamSinkProvider and StreamSourceProvider as stable" on Dec 19, 2018
@granthenke
Member Author

Thank you @dongjoon-hyun. I have created a Jira, updated the subject here, and sent a summary email to the dev mailing list.

@arunmahadevan
Contributor

If these are going to change as part of DataSourceV2, it is better to keep them unstable.

@granthenke
Member Author

If I understand the progress of DataSourceV2 correctly, there was a bit of a redesign and DataSourceV2 won't be available until Spark 3.0.0. If that is the case, marking these interfaces as stable in the meantime should have minimal impact. Additionally, using new interfaces for DataSourceV2 would allow a much smoother transition for those who adopted structured streaming early.

My context into DataSourceV2 comes from these locations:

Perhaps @cloud-fan could comment.

@gaborgsomogyi
Contributor

AFAIK DSv2 could affect 3.0 only. This can be done on the other branches, though it could make backports harder.

@granthenke
Member Author

Another reason to use new interfaces for DataSourceV2 is precisely to make backporting bug fixes to earlier code easier.

* Implemented by objects that can produce a streaming `Source` for a specific format or system.
*
* @since 2.0.0
*/
@Experimental
@Unstable
@Stable
Member

BTW, going from Unstable to Stable sounds like a big jump, though. Maybe Evolving, or simply removing Unstable, would be an okay compromise if we're not sure about this.

Member

cc @tdas and @zsxwing

Member Author

@HyukjinKwon I don't think it makes sense to go through the motions step by step just for the sake of taking the steps. This interface has gone unchanged since Spark 2.2.0.

* Implemented by objects that can produce a streaming `Source` for a specific format or system.
*
* @since 2.0.0
*/
@Experimental
@Unstable
@Stable
trait StreamSourceProvider {
Contributor

If this is stable, I think Source and Sink need to be stable as well?

Member Author

Source and Sink do not have any annotations on them. Does that imply they are stable? Sink and Source are exposed/returned in the DataSource class today as well.

I am happy to add a Stable annotation if that is your recommendation.
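
The question about Source and Sink can be illustrated with a minimal sketch. The traits below are simplified stand-ins, not the real Spark definitions (the Spark versions take Spark types such as SQLContext and DataFrame, which are omitted here), and `InMemorySink` and `InMemorySinkProvider` are hypothetical names. The sketch only shows that a provider's return type is part of its public contract, so a third-party implementation depends on both traits staying the same.

```scala
// Simplified stand-ins for illustration only; NOT the real Spark traits.
trait Sink {
  def addBatch(batchId: Long, rows: Seq[String]): Unit
}

trait StreamSinkProvider {
  // The return type makes Sink part of this trait's public contract.
  def createSink(parameters: Map[String, String]): Sink
}

// A hypothetical third-party integration implementing both traits.
class InMemorySink extends Sink {
  private val buffer =
    scala.collection.mutable.ArrayBuffer.empty[(Long, Seq[String])]

  override def addBatch(batchId: Long, rows: Seq[String]): Unit =
    buffer += ((batchId, rows))

  // Snapshot of everything written so far, in arrival order.
  def batches: List[(Long, Seq[String])] = buffer.toList
}

class InMemorySinkProvider extends StreamSinkProvider {
  override def createSink(parameters: Map[String, String]): Sink =
    new InMemorySink
}
```

If `Sink.addBatch` changed its signature, `InMemorySink` would stop compiling even though `StreamSinkProvider` itself was untouched.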

@zsxwing
Member

zsxwing commented Dec 20, 2018

Since there won't be a 2.5.x release, I don't think changing this makes any sense. We don't change APIs between maintenance releases (such as 2.4.1).

@granthenke
Member Author

@zsxwing Does that mean that I can rely on this interface not breaking for all of Spark 2? That's the main thing I am trying to ensure.

@zsxwing
Member

zsxwing commented Dec 20, 2018

@granthenke Yes, since 2.4 is the last minor version in Spark 2.x.

@granthenke
Member Author

Thanks @zsxwing, that's good to know.

Note: The Unstable annotation does imply that an API can be changed between maintenance releases. It implies this because the Evolving annotation claims an API can only change from one feature release to another, and Unstable is more permissive than that. Perhaps the documentation could be updated.
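
The reading in the note above can be written down as a small model. This is a sketch under stated assumptions, not Spark code: the `Stability` and `Release` types and the `mayBreak` function are hypothetical names, and the treatment of Unstable in maintenance releases follows the interpretation in the note rather than the project's stated practice of not changing APIs in maintenance releases.

```scala
// Hypothetical model of the stability levels discussed in this thread.
// These are NOT Spark's annotation classes; the names are stand-ins.
sealed trait Stability
case object Stable extends Stability
case object Evolving extends Stability
case object Unstable extends Stability

sealed trait Release
case object Maintenance extends Release // e.g. 2.4.0 to 2.4.1
case object Feature extends Release     // e.g. 2.3 to 2.4
case object Major extends Release       // e.g. 2.x to 3.0

object StabilityModel {
  // True if, under the reading in the note above, an API at `level`
  // is allowed to break in a release of kind `release`.
  def mayBreak(level: Stability, release: Release): Boolean =
    (level, release) match {
      case (_, Major)              => true  // any level may change across major versions
      case (Stable, _)             => false // compatible within a major version
      case (Evolving, Feature)     => true  // may change between feature releases
      case (Evolving, Maintenance) => false
      case (Unstable, _)           => true  // assumption: no guarantee at all
    }
}
```

Under this model, `StabilityModel.mayBreak(Unstable, Maintenance)` is true while `StabilityModel.mayBreak(Evolving, Maintenance)` is false, which is the gap the note points at.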

@granthenke
Member Author

@zsxwing Would you mind closing the Jira as Won't Fix and adding a comment with this reason?

@dongjoon-hyun
Member

dongjoon-hyun commented Dec 20, 2018

@granthenke, so if you want stability only on branch-2.4, +1 for closing this PR and the JIRA issue.

@granthenke
Member Author

@dongjoon-hyun I want it for anything in the 2.x series. This is what the stable annotation would indicate. Since @zsxwing said there would be no 2.5 and maintenance releases won't break it, it sounds like that is the case.

@dongjoon-hyun
Member

Thanks for the confirmation. Then, I'll close this for you. :)
