[SPARK-24630][SS] Support SQLStreaming in Spark #22575
Conversation
ok to test
Is this still a WIP?
@WangTaoTheTonic
Besides, the keyword 'stream' makes it easier to express Structured Streaming with pure SQL.
What should we do if we want to join two Kafka streams and sink the result to another stream?
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
As commented in https://issues.apache.org/jira/browse/SPARK-24630?focusedCommentId=16523064&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16523064, the current approach supports submitting a streaming job entirely in SQL, but it is mainly based on Hive table support; other data sources need more discussion.
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
sql/hive/src/test/scala/org/apache/spark/sql/hive/StreamTableDDLCommandSuite.scala
Yes, the 'stream' keyword is the only difference from normal SQL.
@WangTaoTheTonic @cloud-fan @xuanyuanking
Nice! I am looking forward to it.
@cloud-fan Hi, Wenchen. Is it ready to be merged? This PR is very useful and is exactly what I wanted to develop and need.
@tdas @zsxwing @cloud-fan |
Hi stczwd, |
SQLStreaming does not support multiple streams. In our cases, SQLStreaming is mostly used ad hoc; each case runs only one insert into stream.
@cloud-fan @zsxwing @tdas @xuanyuanking |
@stczwd Can you provide a detailed design document for this PR, mentioning the scenarios being handled and any constraints? This will give a complete picture of this PR. Thanks.
@sujithjay |
cc @koeninger |
Can you send a mail to Ryan Blue about adding this SPIP topic to tomorrow's meeting? The meeting will be held tomorrow at 05:00 pm PST. If you confirm, then we can also attend.
…On Wed, 28 Nov 2018 at 10:27 AM, stczwd ***@***.***> wrote:
(screenshot: https://user-images.githubusercontent.com/12999161/49129177-ab056680-f2f4-11e8-8f71-4695ebc045c1.png)
I have removed the 'stream' keyword.
There is a DataSource V2 community sync meetup tomorrow, coordinated by Ryan Blue; can we discuss this point there?
Yep, it's a good idea.
I have sent an email to Ryan Blue to attend this meeting.
I think you should also ask him to add your SPIP topic to tomorrow's discussion. The agenda has to be set in advance.
I have sent an email to Ryan Blue.
Tomorrow's discussion is mainly focused on the DataSource V2 API; I don't think they will spend time discussing the SQL API. However, we can mention it while discussing the Catalog API.
sql/hive/src/test/scala/org/apache/spark/sql/hive/StreamTableDDLCommandSuite.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/SQLStreamingSink.scala
@cloud-fan You can take a look at my changes here; I will update the design documentation later.
@stczwd lgtm. nit: It would be very helpful to add some instructions to the online documentation.
@yaooqinn Thanks for the support. I have written a detailed design doc, which will be published soon.
private def parseTrigger(): Trigger = {
  val trigger = Utils.timeStringAsMs(sqlConf.sqlStreamTrigger)
  Trigger.ProcessingTime(trigger, TimeUnit.MILLISECONDS)
}
Continuous processing mode is supported now; do you plan to support it? If so, I think we can traverse the logical plan to find out whether this is a continuous query and create a ContinuousTrigger.
This feature is under discussion: https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0
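The reviewer's suggestion above can be sketched as follows. This is a minimal, self-contained Scala sketch, not Spark's actual API: `Trigger`, `ProcessingTime`, `ContinuousTrigger`, and `timeStringAsMs` below are simplified stand-ins for their Spark counterparts, and the `isContinuous` flag stands in for traversing the logical plan to detect a continuous query.

```scala
// Simplified stand-ins for Spark's Trigger hierarchy (hypothetical).
sealed trait Trigger
case class ProcessingTime(intervalMs: Long) extends Trigger
case class ContinuousTrigger(checkpointIntervalMs: Long) extends Trigger

object TriggerSketch {
  // Very small stand-in for Utils.timeStringAsMs: accepts "<n>", "<n> seconds",
  // or "<n> minutes" and returns milliseconds.
  def timeStringAsMs(s: String): Long = {
    val parts = s.trim.split("\\s+")
    val n = parts(0).toLong
    parts.lift(1).map(_.toLowerCase).getOrElse("ms") match {
      case "seconds" | "second" | "s"   => n * 1000
      case "minutes" | "minute" | "min" => n * 60 * 1000
      case _                            => n
    }
  }

  // Choose between micro-batch and continuous triggers. In the real PR this
  // decision would come from inspecting the logical plan, not a flag.
  def parseTrigger(triggerConf: String, isContinuous: Boolean): Trigger = {
    val ms = timeStringAsMs(triggerConf)
    if (isContinuous) ContinuousTrigger(ms) else ProcessingTime(ms)
  }
}
```

Under this sketch, a continuous query with the same interval string simply yields a `ContinuousTrigger` instead of a `ProcessingTime` trigger.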
Can one of the admins verify this patch?
We're closing this PR because it hasn't been updated in a while. If you'd like to revive this PR, please reopen it!
What changes were proposed in this pull request?
This patch proposes new SQLStreaming support in Spark; please refer to SPARK-24630 for more details.
This patch supports:
create table kafka_sql_test using kafka options( isStreaming = 'true', subscribe = 'topic', kafka.bootstrap.servers = 'localhost:9092')
select stream * from kafka_sql_test
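As noted in the discussion, `stream` is the only syntactic difference from a normal query. A naive preprocessor illustrating that idea might look like the sketch below; this is a hypothetical illustration, not the parser change actually proposed in this PR, and `StreamKeyword` with its methods are invented names.

```scala
// Hypothetical sketch: detect the `stream` keyword after SELECT and strip it
// to recover the equivalent batch query text.
object StreamKeyword {
  private val SelectStream = "(?i)^\\s*select\\s+stream\\b".r

  // True if the query text starts with "select stream" (case-insensitive).
  def isStreamingQuery(sql: String): Boolean =
    SelectStream.findFirstIn(sql).isDefined

  // Remove the `stream` keyword, leaving a normal SELECT statement.
  def toBatchSql(sql: String): String =
    sql.replaceFirst("(?i)(select)\\s+stream\\b", "$1")
}
```

For example, `select stream * from kafka_sql_test` would be detected as a streaming query and reduce to `select * from kafka_sql_test` as its batch form. The real implementation would of course handle this in the SQL grammar rather than with string rewriting.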
How was this patch tested?
Some UTs are added to verify SQLStreaming.