[SPARK-26415][SQL] Mark StreamSinkProvider and StreamSourceProvider as stable #23354
This change marks the StreamSinkProvider and StreamSourceProvider interfaces as stable so that they can be relied on for compatibility across all of Spark 2.x. These interfaces have been available since Spark 2.0.0 and unchanged since Spark 2.1.0. Additionally, the Kafka integration has used them since Spark 2.1.0. Because structured streaming general availability was announced in Spark 2.2.0, I suspect other third-party integrations already use them as well.
Can one of the admins verify this patch?
Hi, @granthenke. Please file an Apache Spark JIRA issue. This change warrants one.
Thank you @dongjoon-hyun. I have created a JIRA, updated the subject here, and sent a summary email to the dev mailing list.
If these are going to change as part of DataSourceV2, it's better to keep them unstable.
If I understand the progress of DataSourceV2 correctly, there was a bit of a redesign and DataSourceV2 won't be available until Spark 3.0.0. If that is the case, marking this as stable in the meantime should have minimal impact. Additionally, using new interfaces for DataSourceV2 would allow a much smoother transition for those who adopted structured streaming early. My context on DataSourceV2 comes from these locations:
Perhaps @cloud-fan could comment.
AFAIK DSV2 could affect 3.0 only. It can be done on the other branches, though it could make backports harder.
Another reason to use new interfaces for DataSourceV2 is precisely to make backports of bug fixes to the earlier code easier.
* Implemented by objects that can produce a streaming `Source` for a specific format or system.
*
* @since 2.0.0
*/
@Experimental
@Unstable
@Stable
BTW, going from Unstable to Stable sounds like a big jump. Maybe Evolving, or simply removing Unstable, would be an okay compromise if we're not sure about this.
@HyukjinKwon I don't think it makes sense to go through the motions step by step just for the sake of taking the steps. This interface has gone unchanged since Spark 2.2.0.
* Implemented by objects that can produce a streaming `Source` for a specific format or system.
*
* @since 2.0.0
*/
@Experimental
@Unstable
@Stable
trait StreamSourceProvider {
If this is stable, I think Source and Sink need to be stable as well?
Source and Sink do not have any annotations on them. Does that imply they are stable? Sink and Source are exposed/returned in the DataSource class today as well.
I am happy to add a Stable annotation if that is your recommendation.
Since there won't be a 2.5.x release, I don't think changing this makes any sense. We don't change APIs between maintenance releases (such as 2.4.1).
@zsxwing Does that mean that I can rely on this interface not breaking for all of Spark 2? That's the main thing I am trying to ensure.
@granthenke Yes, since 2.4 is the last minor version in Spark 2.x.
Thanks @zsxwing, that's good to know. Note: the Unstable annotation does imply that it can change between maintenance releases. It implies this because the Evolving annotation says it can only change from one feature release to another, and Unstable is more permissive than that. Perhaps the documentation could be updated.
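The release-compatibility reasoning in this exchange can be modeled in a few lines. This is a minimal Scala sketch, assuming the semantics described in the thread: Stable may break only across major releases, Evolving only across feature releases, and Unstable at any release. The types `Stability`, `Version`, `ReleaseKind` and `StabilityPolicy` are hypothetical illustrations, not Spark APIs.

```scala
// Hypothetical model of the stability levels discussed in this thread.
// These names are illustrative only; they are not Spark's annotation classes.
sealed trait Stability
case object Stable extends Stability
case object Evolving extends Stability
case object Unstable extends Stability

final case class Version(major: Int, minor: Int, patch: Int)

sealed trait ReleaseKind
case object Maintenance extends ReleaseKind // e.g. 2.4.0 -> 2.4.1
case object Feature extends ReleaseKind     // e.g. 2.3.x -> 2.4.0
case object Major extends ReleaseKind       // e.g. 2.4.x -> 3.0.0

object StabilityPolicy {
  // Classify an upgrade by the most significant version component that changes.
  def kind(from: Version, to: Version): ReleaseKind =
    if (from.major != to.major) Major
    else if (from.minor != to.minor) Feature
    else Maintenance

  // Whether an API with the given annotation is allowed to break across this upgrade.
  def mayBreak(s: Stability, from: Version, to: Version): Boolean =
    (s, kind(from, to)) match {
      case (Unstable, _)           => true  // no guarantee, even in maintenance releases
      case (Evolving, Maintenance) => false
      case (Evolving, _)           => true  // may change between feature releases
      case (Stable, Major)         => true  // only a major release may break it
      case (Stable, _)             => false
    }
}
```

Under this model, an Unstable API may technically break from 2.4.0 to 2.4.1, which is the gap pointed out above. The project's practice of not changing APIs in maintenance releases is a separate guarantee that the annotations alone do not express.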
@zsxwing Would you mind closing the JIRA as Won't Fix and adding a comment with this reason?
@granthenke. So, if you want stability only on
@dongjoon-hyun I want it for everything in the 2.x series. That is what the Stable annotation would indicate. Since @zsxwing said there will be no 2.5 and maintenance releases won't break it, it sounds like that is already the case.
Thanks for the confirmation. Then, I'll close this for you. :)