
Is it possible to do CDC stream UPSERT with java library? #2135

Closed
ismailsimsek opened this issue Jun 4, 2023 · 9 comments
Labels
api: bigquerystorage Issues related to the googleapis/java-bigquerystorage API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@ismailsimsek

Is it possible to do CDC stream inserts? Is this feature available in the Java library?

I am getting the following error:

    JSONArray jsonArr = new JSONArray();
    JSONObject record = new JSONObject().put("c_id", 2).put("c_string", "record-1").put("_CHANGE_TYPE", "UPSERT");
    jsonArr.put(record);
    AppendRowsResponse response = streamWriter.append(jsonArr).get();

    Exception in thread "main" com.google.cloud.bigquery.storage.v1.Exceptions$AppendSerializationError: INVALID_ARGUMENT: Append serialization failed for writer: projects/myproject-dev/datasets/stage/tables/test_CDC_stream_data_loading/_default
    	at com.google.cloud.bigquery.storage.v1.SchemaAwareStreamWriter.append(SchemaAwareStreamWriter.java:207)
    	at com.google.cloud.bigquery.storage.v1.SchemaAwareStreamWriter.append(SchemaAwareStreamWriter.java:109)
    	at com.google.cloud.bigquery.storage.v1.JsonStreamWriter.append(JsonStreamWriter.java:62)
    	at experiments.TestCDCStreamLoading.main(TestCDCStreamLoading.java:72)
@product-auto-label bot added the api: bigquerystorage label on Jun 4, 2023
@faisalhasnain

I am getting the same error.

@faisalhasnain

This CDC feature works in Python but not in Java. Any idea when it will get fixed?

@Neenu1995 added the type: feature request label on Jun 16, 2023
@ismailsimsek
Author

ismailsimsek commented Jun 21, 2023

I believe the error is related to the missing field _CHANGE_TYPE in the table schema.

It works in append mode (without failing) when unknown fields are set to be ignored: .setIgnoreUnknownFields(true)

@Neenu1995
Contributor

To use the upsert functionality, you need to specify a table schema that includes the _change_type field. You can do this by changing the constructor you use inside the DataWriter.initialize method by using link.

To pad _change_type onto the current table schema, the code can be:

    final String CHANGE_TYPE_PSEUDO_COLUMN = "_change_type";

    TableSchema updatedSchema =
        tableSchema.toBuilder()
            .addFields(
                TableFieldSchema.newBuilder()
                    .setName(CHANGE_TYPE_PSEUDO_COLUMN)
                    .setType(TableFieldSchema.Type.STRING)
                    .setMode(TableFieldSchema.Mode.NULLABLE)
                    .build())
            .build();
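For context, here is a minimal end-to-end sketch of appending an UPSERT row once the schema carries the pseudo column. The project, dataset, table, and column names are taken from the example earlier in this thread; the schema is built inline here rather than fetched from BigQuery, so adapt it to your real table:

```java
import com.google.cloud.bigquery.storage.v1.AppendRowsResponse;
import com.google.cloud.bigquery.storage.v1.BigQueryWriteClient;
import com.google.cloud.bigquery.storage.v1.JsonStreamWriter;
import com.google.cloud.bigquery.storage.v1.TableFieldSchema;
import com.google.cloud.bigquery.storage.v1.TableName;
import com.google.cloud.bigquery.storage.v1.TableSchema;
import org.json.JSONArray;
import org.json.JSONObject;

public class CdcUpsertSketch {

  // Schema mirroring the example table, plus the CDC pseudo column.
  static TableSchema paddedSchema() {
    return TableSchema.newBuilder()
        .addFields(field("c_id", TableFieldSchema.Type.INT64))
        .addFields(field("c_string", TableFieldSchema.Type.STRING))
        .addFields(field("_CHANGE_TYPE", TableFieldSchema.Type.STRING))
        .build();
  }

  static TableFieldSchema field(String name, TableFieldSchema.Type type) {
    return TableFieldSchema.newBuilder()
        .setName(name)
        .setType(type)
        .setMode(TableFieldSchema.Mode.NULLABLE)
        .build();
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical names, taken from the error message above.
    TableName table = TableName.of("myproject-dev", "stage", "test_CDC_stream_data_loading");

    // Each row carries _CHANGE_TYPE alongside the regular columns.
    JSONObject record = new JSONObject()
        .put("c_id", 2)
        .put("c_string", "record-1")
        .put("_CHANGE_TYPE", "UPSERT");
    JSONArray rows = new JSONArray().put(record);

    try (BigQueryWriteClient client = BigQueryWriteClient.create();
        JsonStreamWriter writer =
            JsonStreamWriter.newBuilder(table.toString(), paddedSchema(), client).build()) {
      AppendRowsResponse response = writer.append(rows).get();
      if (response.hasError()) {
        System.out.println("append failed: " + response.getError().getMessage());
      }
    }
  }
}
```

Note this targets the table's _default stream (the behavior of the JsonStreamWriter.newBuilder overload that takes a table name and an explicit TableSchema), which is what BigQuery's CDC ingestion expects.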

@faisalhasnain

Yeah, that's what I did to get it working. Thanks for sharing :)

@PhongChuong
Contributor

We recently added a CDC upsert sample, which can be found here:
samples/snippets/src/main/java/com/example/bigquerystorage/JsonWriterStreamCdc.java

@augi

augi commented Mar 19, 2024

@Neenu1995 @PhongChuong We are publishing JSON to Pub/Sub, and the subscription uses the Use Table Schema setting. We include the _CHANGE_TYPE field in the JSON, but it is still unrecognized.

Does this mean that we should alter the BigQuery table to have the _change_type column?

EDIT: This is not possible.

@augi

augi commented Mar 19, 2024

Just for the record, the issue was that the BigQuery table didn't have a primary key specified. The component responsible for writing then probably doesn't expect the _CHANGE_TYPE field.
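For reference, BigQuery's CDC ingestion requires the target table to declare a non-enforced primary key, and it is typically paired with a max_staleness table option; without the primary key, _CHANGE_TYPE is rejected as described above. A DDL sketch reusing the table and column names from this thread (the staleness interval is an arbitrary example value):

```sql
CREATE TABLE `myproject-dev.stage.test_CDC_stream_data_loading` (
  c_id INT64,
  c_string STRING,
  PRIMARY KEY (c_id) NOT ENFORCED
)
OPTIONS (max_staleness = INTERVAL 15 MINUTE);
```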

@AndyCorlin

Thanks for the tips about the primary key and about adding _CHANGE_TYPE to the schema!

I found code in com.google.cloud.bigquery.storage.v1.SchemaAwareStreamWriter showing how to fetch the schema from BigQuery, and then used @Neenu1995's code to add the pseudo column before creating the JsonStreamWriter:

    String streamName = tableName + "/_default";
    GetWriteStreamRequest writeStreamRequest =
        GetWriteStreamRequest.newBuilder()
            .setName(streamName)
            .setView(WriteStreamView.FULL)
            .build();
    WriteStream writeStream = this.client.getWriteStream(writeStreamRequest);
    TableSchema tableSchema =
        writeStream
            .getTableSchema()
            .toBuilder()
            .addFields(
                TableFieldSchema.newBuilder()
                    .setName(CHANGE_TYPE_PSEUDO_COLUMN)
                    .setType(TableFieldSchema.Type.STRING)
                    .setMode(TableFieldSchema.Mode.NULLABLE)
                    .build())
            .build();

    // See bigquerystorage/latest/com/google/cloud/bigquery/storage/v1/JsonStreamWriter.html
    return JsonStreamWriter.newBuilder(tableName, tableSchema, client).build();
