
WIP: 128: produce using NATIVE_AVRO in the SQL sink when the avro format is chosen, so that Pulsar consumers can consume the records #217

Open
wants to merge 258 commits into develop
Conversation


@imaffe imaffe commented Sep 7, 2022

closes #126
closes #128

TODO: add more descriptions

Myasuka and others added 30 commits March 16, 2022 00:21
Some InternalPriorityQueue implementations need a correct key/group
set before performing poll() or remove().

In particular, ChangelogKeyGroupedPriorityQueue logs key group so that
state changes can be re-distributed or shuffled.

This change re-orders queue.poll and keyContext.setCurrentKey.
Implements the SEARCH operator in the codegen and removes
the scalar implementation of IN and NOT_IN. Now every scalar
IN/NOT_IN using a constant set is implemented through SEARCH
(following Calcite's development on the topic CALCITE-4173)
and plans will only have SEARCH.

This closes apache#19001.
Previously, the tmpWorkingDirectory was created in the current working
directory, and as a result there were directories created in the root
directories of the modules, i.e. `flink-table/flink-table-planner` which
were not cleaned up with `mvn clean`.
The user explicitly marked the cleanup retry logic to
terminate after a certain number of attempts. This should be
considered desired behavior and shouldn't make the cluster
fail fatally.
We want to retry indefinitely if nothing is specified.
…ined sink operators

Since the topology has changed between Flink 1.14 and 1.15, it might
happen that stateful upgrades are not possible if no prior operator uids
were set. With this commit, users can set operator uid hashes for the
respective operators.
…attern

Since there is no dedicated committer operator in Flink 1.14 it is safe
to use the uid pattern of 1.13 to ease upgrades from Flink 1.13 to 1.15.
…nly the path of Path instance

The issue before the fix was that using getPath would strip
off the scheme information, which causes problems in situations
where the FileSystem is not the default FileSystem.
…clean up redundant content of bounded and unbounded data.

- mvn dependency
- using namespace in schema for reflect records

[FLINK-26604][doc] bug fix

(cherry picked from commit 579e554). This closes apache#19134
… if the JobGraph is not put into the JobGraphStore, yet

This can happen if cleanup is triggered after a
failover of a dirty JobResultStore entry (i.e. of
a globally-terminated job). In that case, no
recovery of the JobGraph happens and, therefore, no
JobGraph is added to the internal addedJobGraphs
collection.

This required KubernetesStateHandleStore.releaseAndTryRemove
to work for non-existing state as well. The ZooKeeperStateHandleStore
implementation is already idempotent in this matter.

ZooKeeperStateHandleStore.releaseAndTryRemove already works like that.
syhily and others added 21 commits July 22, 2022 03:37
…ce. Avoid hanging on low incoming message rates.
drop extra empty line

Apply suggestions from code review

Co-authored-by: Huanli Meng <48120384+Huanli-Meng@users.noreply.github.com>
@imaffe imaffe requested a review from a team as a code owner September 7, 2022 04:57
@@ -62,6 +68,7 @@ public PulsarTableSerializationSchema(
this.valueSerialization = checkNotNull(valueSerialization);
this.valueFieldGetters = checkNotNull(valueFieldGetters);
this.writableMetadata = checkNotNull(writableMetadata);
// this.pulsarSchema = getPulsarSchemaFromSerialization(valueSerialization);

remember to delete this line

@@ -89,7 +96,8 @@ public PulsarMessage<?> serialize(RowData consumedRow, PulsarSinkContext sinkCon
}

byte[] serializedData = valueSerialization.serialize(valueRow);
messageBuilder.value(Schema.BYTES, serializedData);
// messageBuilder.value(Schema.BYTES, serializedData);

remember to remove comments
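The diff above drops the plain Schema.BYTES value call; per the PR title, the intent is to attach Pulsar's NATIVE_AVRO schema to the message so that plain Pulsar consumers can resolve the Avro schema and decode the payload. A minimal stand-in sketch of that change, with hypothetical FakeSchema/FakeMessageBuilder types in place of the real Pulsar client classes:

```java
// Hedged sketch: the real code would use org.apache.pulsar.client.api.Schema
// (e.g. Schema.NATIVE_AVRO in Pulsar 2.9+) and the sink's message builder.
// FakeSchema and FakeMessageBuilder here are illustrative stand-ins only.
public class NativeAvroSketch {
    // Stand-in for a Pulsar Schema<byte[]> carrying schema metadata.
    record FakeSchema(String name) {}

    // Stand-in for the sink's message builder: records which schema
    // is attached alongside the pre-serialized payload.
    static class FakeMessageBuilder {
        FakeSchema schema;
        byte[] payload;

        FakeMessageBuilder value(FakeSchema schema, byte[] payload) {
            this.schema = schema;
            this.payload = payload;
            return this;
        }
    }

    public static void main(String[] args) {
        byte[] serializedData = {1, 2, 3}; // already Avro-encoded by the format
        FakeMessageBuilder builder = new FakeMessageBuilder();
        // Before: builder.value(new FakeSchema("BYTES"), serializedData);
        // After (intended): attach the native Avro schema instead of raw bytes.
        builder.value(new FakeSchema("NATIVE_AVRO"), serializedData);
        System.out.println(builder.schema.name());
    }
}
```

The payload bytes are unchanged either way; only the schema metadata attached to the message differs, which is what lets downstream consumers decode it.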

private static Schema getPulsarSchemaFromSerialization(
        SerializationSchema<RowData> serializationSchema) {
    if (serializationSchema instanceof AvroRowDataSerializationSchema) {
        SerializationSchema<GenericRecord> nestedSchema =
                ((AvroRowDataSerializationSchema) serializationSchema).getNestedSchema();
        org.apache.avro.Schema avroSchema =
                ((AvroSerializationSchema) nestedSchema).getSchema();

this would harm the performance.


can we initialize the schema in the open() method and make it transient?
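A minimal sketch of that suggestion, in plain Java with hypothetical names (CachedSchemaSerializer and deriveSchema are not from the PR): the expensive schema derivation runs once in open(), and marking the field transient keeps the derived object out of the serialized job graph.

```java
import java.io.Serializable;

// Illustrative sketch only: mirrors Flink's SerializationSchema#open lifecycle,
// where open() is called once per task before any serialize() call.
public class CachedSchemaSerializer implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String formatName;        // serializable configuration
    private transient String cachedSchema;  // derived once, not shipped with the job

    public CachedSchemaSerializer(String formatName) {
        this.formatName = formatName;
    }

    // Called once before processing starts; does the expensive work up front.
    public void open() {
        this.cachedSchema = deriveSchema(formatName);
    }

    // Hot path: no per-record schema derivation, just use the cached value.
    public String serialize(String value) {
        return cachedSchema + ":" + value;
    }

    // Stand-in for the costly instanceof/cast chain in the snippet above.
    private static String deriveSchema(String formatName) {
        return "schema-for-" + formatName;
    }

    public static void main(String[] args) {
        CachedSchemaSerializer s = new CachedSchemaSerializer("avro");
        s.open();
        System.out.println(s.serialize("row1"));
    }
}
```

This addresses the "harm the performance" concern above: the reflection-style checks happen once at task startup instead of on every record.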

@imaffe imaffe force-pushed the affe/128-fetch-EMPTY branch from dd5537e to d4559e0 on September 7, 2022 09:50

imaffe commented Sep 7, 2022

The current fix is not OK, because we changed other components as well. We need to figure out an approach that does not modify AvroRowDataSerializationSchema.
