Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support column families, grouping multiple columns into 1 kv #147

Closed
ddorian opened this issue Apr 8, 2018 · 4 comments
Closed

Support column families, grouping multiple columns into 1 kv #147

ddorian opened this issue Apr 8, 2018 · 4 comments
Assignees
Labels
kind/question This is a question

Comments

@ddorian
Copy link
Contributor

ddorian commented Apr 8, 2018

I don't know how much more efficient the DocDB is compared to RocksDB, but cockroachdb ended up grouping all columns into 1 kv by default: https://www.cockroachlabs.com/docs/stable/column-families.html

This the same for cases when you mostly want to get back the whole array/map/set.

Maybe even using the Merge operator https://github.com/facebook/rocksdb/wiki/Merge-Operator to add non-reading-updates like tokumx https://www.percona.com/doc/percona-tokumx/fast_updates.html

Makes sense ?

@rkarthik007 rkarthik007 added the kind/question This is a question label Apr 11, 2018
@rkarthik007 rkarthik007 self-assigned this Apr 11, 2018
@rkarthik007
Copy link
Collaborator

Hi @ddorian,

Great points as always @ddorian! Here are some discussions we had originally about this architectural choice.

I don't know how much more efficient the DocDB is compared to RocksDB, but cockroachdb ended up grouping all columns into 1 kv by default: https://www.cockroachlabs.com/docs/stable/column-families.html

Read the link, seems like cockroachdb keeps one kv by default for all rows (all the columns encoded in one row). If there are FAMILY definitions, the kvs cells are split across RocksDB column families but still stored as one row. Based on our Cassandra (entire row in one cell) vs HBase (one column per cell) experience, we felt the "entire row per cell" approach is significantly better compared to the "one column per cell" approach if done carefully:

  • reads are generally more efficient because they do not need to deserialize the entire row

  • the key index is more uniform at locating data irrespective of the size of the values

  • the one column per cell is much better at handling tables with lots of columns (esp for complex types like JSON documents). This is because

    • in the "entire row per cell" approach, all the columns worth of data needs to be brought into memory, which causes significant increase in the memory used while compacting data. The "one column per cell" is much more memory-efficient because we only need one cell to compact, and the DB block can be a subset of the row.
    • Further, we prefix compress the individual cells in memory for a compact representation, so the extra bloat is pretty much not an issue.
  • Lastly, we support per column TTL efficiently without having to deserialize the whole row

This the same for cases when you mostly want to get back the whole array/map/set.

This can be achieved with the FROZEN keyword: https://docs.yugabyte.com/latest/api/cassandra/type_frozen/

Maybe even using the Merge operator https://github.com/facebook/rocksdb/wiki/Merge-Operator to add non-reading-updates like tokumx https://www.percona.com/doc/percona-tokumx/fast_updates.html

We support fast updates - most inserts in Cassandra are "upserts". We do read-modify-write only if we need to INSERT IF NOT EXISTS.

@ddorian
Copy link
Contributor Author

ddorian commented Apr 12, 2018

But FROZEN can only be updated as a whole no ?

@rkarthik007
Copy link
Collaborator

Ah I see your point. Yes FROZEN is a full update.

Our default is fine-grained updates and compactions bring data close together for efficient reads. We did not use merge operators, instead have made changes into the DocDB directly to be able to understand the layout of data on our modified RocksDB.

@ddorian
Copy link
Contributor Author

ddorian commented May 24, 2018

This can be implemented in a jsonb blob, since it's stored as a single kv #26

@ddorian ddorian closed this as completed May 24, 2018
jasonyb pushed a commit that referenced this issue Mar 18, 2024
This patch reproduce the same behavior of the regexp_* function with
devansh-ism pushed a commit to devansh-ism/yugabyte-db that referenced this issue Jul 19, 2024
This issue was reported in PR yugabyte#147 by @haizz. The proposed fix was
incomplete (as I pointed out) and I fixed add-tables option and also
format 2 (add-tables and filter-tables).
devansh-ism pushed a commit to devansh-ism/yugabyte-db that referenced this issue Jul 19, 2024
Summary:
Copyright update.

Initial commit.

The plugin was first named json_decoding_plugin but changed the name
after discussion on -hackers. It is a POC to illustrate that it is
possible for changeset extraction work just fine. The changeset
extraction feature is already in development, hence, it is expected that
this code changes a lot until the feature is set in stone.

It is based on:

repository: git://git.postgresql.org/git/users/andresfreund/postgres.git
branch: xlog-decoding-rebasing-remapping
commit: 6f819a68cd10938df126ee44a3cfef7f457c8e1b

Generate a long bytea instead of explicitly writing a long one.

It reduces the sql file size. Suggested by Merlin Moncure.

In commit ea177a3ba7a7901f6467eadb0a407e03d46462fd, the internal structure was changed.

Unbreak regression tests.

Remove contrib dependency.

The old Makefile relies heavily on been in the contrib tree. While
analysing the ethansf pull request, I realized that that premise does
not hold water because you could build it without a source code
tree.

Although I didn't have used his patch [1], this code was inspired on it.

[1] eulerto/wal2json#1

Fix regression tests.

The regression tests was very unstable. The main reason is the
associated LSN in some CONTEXT messages. Adding VERBOSITY setting to
supress those CONTEXT messages.

Update copyright.

Include LSN position for the next transaction.

This feature was discussed with @ethansf and is useful for tool that
consumes WAL using pg_recvlogical and similar tools. The LSN is
important to mark the apply progress. It stores the LSN pointing to the
end of commit record + 1. Hence, if the consuming tool crashes while
applying a transaction (JSON document), it can restart the
pg_recvlogical streaming from the last saved LSN position. This is
essential for such consuming tool to work properly.

Although, I did not have used Ethans' patch, this piece of code is
based on his idea.

Rename "next lsn" JSON key.

It is a comestic change because you can access values like:

foo['next lsn']

The other form (foo.bar) is not possible because of the blank space.

Fix handling of nonexistant columns in old tuple versions.

This fix was discoverd in bug yugabyte#13470 and commit
d47a1136e441cebe7ae7fe72d70eb8ce278d5cd6 fixes it.

pretty print option.

The new parameter 'pretty-print' chooses between a compact (without
spaces) or a pretty-print format.

The default changed to compact format since tools usually transform the
JSON objects.

Control how the stream is written.

The new parameter 'write-in-chunks' controls how to write in the
stream. If true, write occurs per tuple; else it writes at the end
of transaction.

The default is true. Hence, the behavior is the same as prior code.

Add a real README.

README has instructions to build and install wal2json. Also, there are
examples of use.

Add Coverity Scan Badge

Replace palloc with palloc0

Avoid garbage on JsonDecodingData.

Fixed tests results

Give an error instead of a warning in case of bad param

Added gitignore

Ignore .o files

Error message on unknown params reworded

Merge pull request yugabyte#13 from dvarrazzo/gitignore

Added gitignore

Merge pull request yugabyte#12 from dvarrazzo/err-on-bad-option

Give an error instead of a warning in case of bad param

Merge pull request yugabyte#11 from dvarrazzo/fix-tests

Fixed tests results

Default for write-in-chunks changed to false

Default for include-xids changed to false

style fix

Update copyright year information

Dropped include-xids=0 from test suite

Avoid leaking memory when we have some can-not-happen situations

Fix LICENSE

Forgot to replace organization.

Merge pull request yugabyte#18 from dvarrazzo/better-defaults

Better defaults

Use '1' and '0' instead of 't' and 'f' on tests

Forgot to add some static on functions

Daniele Varrazzo

Support generic WAL messages for logical decoding

This feature allows software to insert data into WAL stream that can be
read by wal2json. Those messages could be useful to control replication,
for example. Messages can be sent as transactional or not. Non-transactional messages mean that it is sent even if the transaction is rollbacked.

There was a PR yugabyte#20 for this same feature but I didn't use it. Indeed,
this code was dusty in my computer for a few months.

NOTE: 'message' test will fail on <= 9.5 because this feature was coded
in 9.6 (I don't want to complicate Makefile).

Fix generic WAL messages

As spotted in issue yugabyte#28, I committed a code that was not supposed to be
part of the original commit 645ab69.

Thanks to Julien Rouhaud (@rjuju) to bring it up.

Support type name with modifier

Issue yugabyte#31 asked for type name with type modifier (if available). This
commit implemented type modifier inclusion by default. It is possible to
use the old type name, if you use the option 'include-typmod' as 'false'.

I'll change the regression tests to use the new format later. I've added
a new test (typmod) to exercise this new option.

Fix a description

Remove superfluous includes

rel.h includes pg_class.h, index.h, and relcache.h. logical.h includes
output_plugin.h. All of the others are not necessary.

TupleDescAttr() support

Commit d34a74dd064af959acd9040446925d9d53dff15b introduced
TupleDescAttr(tupdesc, i) in back branches (9.4.14, 9.5.9, 9.6.5, 10). If the
version supports it, use this macro. This change will unbreak (future) version
11.

Support type oids

Issue yugabyte#37 asked for type oids in the output because psycopg2 uses it
directly for internal type manipulation. Although @lionel-panhaleux had
provided PR yugabyte#38, I didn't like the way it was implemented mainly because
it mixes options. The new option 'include-type-oids' has default to
false. I didn't provide tests for this feature.

Support not-null constraints information

Issue yugabyte#44 asked for not-null constraints information because the other
side could benefit from this. The new option 'include-not-null' will add
another name/value pair (columnoptionals) that contains false for
not-null constraint (it means that the value is not optional) or true,
otherwise. The default is false.

Improve JSON encoding: * use the postgres builtin escape_json() function * escape all type, schema, table, field names. Add tests. [yugabyte#35] * leave `\x` alone unless it's a bytea prefix [yugabyte#23] * escape generic logical decoding messages with tests

Update call to AllocSetContextCreate for PostgreSQL 9.6+

Merge pull request yugabyte#55 from davidfetter/pg_96_plus

Update call to AllocSetContextCreate for PostgreSQL 9.6+

Filter out tables

The new parameter 'filter-tables' is a comma separated value that
contains schema.table elements. Both schema and table can be * that
denotes all. Hence, *.table_1 means table_1 in all schemas and
schema_1.* means all tables in schema_1.

Some characters (space, single quote, comma, period, asterisk) have
special treatment because they are part of the comma separated value. If
any of these characters is part of the schema or table name, add a
backslash (\) before it. For example, table public."foo bar" should be
specified as 'public.foo\ bar'.

Select tables to retrieve data

The new parameter 'add-tables' select only data from those tables. Like
parameter 'filter-tables', it is a comma separated value that contains
schema.table elements. Both schema and table can be * that denotes all.
Escape also works for special characters. By default, all tables in all
schemas are selected.

Merge pull request yugabyte#40 from koordinates/encoding

Improve JSON encoding

Preserve unchanged toast values by default. Use `include-unchanged-toast=0` option to suppress.

Merge pull request yugabyte#50 from koordinates/include-unchanged-toast

Preserve unchanged toast values by default.

This commit introduces a behavior change since TOAST values will always be output from now on. If you want the previous behavior, add include-unchanged-toast=0.

Style fixes for last commit

Copyright update

Document parameters

Update examples

The examples don't reflect the defaults options. Also, rename tables in
SQL functions example (avoid table conflict with the other test).

Build wal2json out of the tree on Windows

Build out of the tree is a common task in Unix-like systems (PostgreSQL
has PGXS but it doesn't support Windows). This commit adds a project
file to build out of the tree on Windows. I also add minimal
instructions to build on Windows. I tested wal2json on MS Visual Studio
2017 (version 15.6.4) on a x64 platform.

Improve style

Fix oversight in copyright message

LICENSE is correct but C file has wrong copyright message.

Tweak message test

Per PR yugabyte#59, replace bytea value with valid data.

Sanity checks after selective table options

The sanity checks should be run after filter-tables and add-tables
options to avoid a bunch of unsolicited warnings (per issue yugabyte#64).

Refactor wal2json output

Almost all writes were handled in two different ways: with or without
pretty print. It means more lines of code and an error-prone code. The
new code avoids all of the pretty-print tests and also uses only one
setence per output. It reduces the code by more than 150 lines.

I also fix some comments (use only C89 comments) and improve style.

Revert include-unchanged-toast option

Per discussion in issue yugabyte#74, we can't rely on access unchanged TOAST
data because it is not in the WAL stream.

It reverts commits 947043e
ce82d73 and
d86a13b .

Add an error message to include-unchanged-toast parameter

Per issue yugabyte#74, parameter include-unchanged-toast wasn't safe, hence,
remove it. Since it is part of release 1.0, deprecate this option for
the next release.

Make wal2json logging less chatty

Suppress the "XXX argument is null" message. Decrease the TOAST message
level from WARNING to DEBUG1. Also, decrease some debug message levels.

Improve perf on deletes when repl ident full

Merge pull request yugabyte#82 from rkrage/repl-ident-full-perf

Improve perf on deletes when repl ident full

Add tests for include-timestamp option

Add tests for include-lsn option

Add tests for include-xids option

Merge pull request yugabyte#94 from paulczajka/add-tests

Add tests for timestamp, lsn, xids options

Avoid null pointer dereference with empty filter-tables

As spotted by clang:
wal2json.c:321:21: warning: Dereference of null pointer
                        rawstr = pstrdup(strVal(elem->arg));
                                         ^~~~~~~~~~~~~~~~~
/usr/include/postgresql/11/server/nodes/value.h:54:20: note: expanded from macro 'strVal'

Add option format-version

This option defines which format to use. Only version 1 is currently
supported. New formats will be available.

Note include-unchanged-toast is deprecated

Merge pull request yugabyte#102 from benjie/patch-1

Note `include-unchanged-toast` is deprecated

Merge pull request yugabyte#97 from xrmx/nullfiltertables

Avoid null pointer dereference with empty filter-tables

remove empty line

Add 'cd' command to README

Merge pull request yugabyte#107 from benjie/patch-2

Add 'cd' command to README

PATH should contain the parent folder of pg_config

Merge pull request yugabyte#108 from benjie/patch-3

PATH should contain the parent folder of pg_config

Fix add-tables to filter single-character schemas

Add extra_float_digits to tests

Postgres commit 02ddd499322ab6f2f0d58692955dc9633c2150fc for v12 and
later, changes the default number of trailing digits output for double
precision. Add extra_float_digits to keep regression tests stable.

Filter messages by prefix

The new parameter 'filter-msg-prefixes' filters messages with those
prefixes. It a comma separated value. By default no messages are
filtered. Per off-list discussion with @martinmarques

Updated copyright to 2019

Add messages by prefix

The new parameter 'add-msg-prefixes' adds only messages with these
prefixes. It is a comma separated value. By default all messages are
included. wal2json enforces this rule after 'filter-msg-prefixes'. Per
off-list discussion with @martinmarques

Merge pull request yugabyte#123 from davidfetter/copyright_2019

Updated copyright to 2019

Update instructions

Since version 10, we don't need to adjust parameters. Also, pg_hba.conf
changes its behavior. It matches normal connections instead of
(physical) replication connections.

Merge pull request yugabyte#113 from benjie/patch-4

Fix add-tables to filter single-character schemas

New wal2json format

This is a new format for wal2json. You can choose it using
option 'format-version' = 2. This format is completely different from
version 1.

Features are:

* one JSON per tuple;
* each JSON has an "action" (BEGIN, COMMIT, INSERT, UPDATE, DELETE,
MESSAGE);
* one (optional) JSON object for BEGIN/COMMIT;
* BEGIN contains xid, timestamp, and lsn;
* COMMIT contains xid, timestamp, and lsn;
* INSERT/UPDATE/DELETE contains lsn, schema, table, columns, identity;
* "columns" and "identity" are arrays of elements;
* each "columns" element is an object that contains name, type, value,
and optional;
* "identity" is an array of elements (REPLICA IDENTITY for UPDATE /
DELETE statements) that contains name, type, and value;
* MESSAGE contains xid, timestamp, lsn, transactional, prefix, and
content.

This new format solves the big transaction issue (that consumes a lot of
memory) since tuples aren't accumulate until the end of transaction to
be written. Users can control transactions using option
include-transaction that will emit a JSON at the beginning of the
transaction and another one at the end of the transaction.

Fix tests

Postgres 12 breaks random() predictability. Since we use the same tests
for all versions, replace random() with a sequence.

Improve regression test

message regression test used to fail with 9.4 and 9.5. That's because
message API is available in 9.6 and later. Filter out this test if the
major version is 9.4 or 9.5.

Add parameter include-origin

This parameter includes replication origin. Default is false. Format
version 2 adds origin information in BEGIN, COMMIT, CHANGE, and MESSAGE.
In format version 1, origin is also available. Since replication origin
was introduced in 9.5, 9.4 does not print origin information.

Support TRUNCATE command for logical decoding

This feature includes TRUNCATE commands into wal2json output. It was
introduced in PostgreSQL 11 which means that wal2json will support it
only for this version and on. Format version 2 will output it by
default. Format version 1 won't (maintain backward compatibility). (A
new option will be added to select actions -- insert, update, delete,
truncate -- and it should be possible to output TRUNCATE commands in
version 1 too).

Option actions

This new feature let users select actions they want to receive. Actions
supported are insert, update, delete, and truncate. This actions are
similar to PUBLICATION command.

Format version 1 won't enable truncate by default (to maintain backward
compatibility). Format version 2 will enable all options by default.

A new test was added (run only on Postgres v11 or later).  That's
because test includes truncate and it will only supported by v11 or
later.

Adjust default format version

Format version 1 will still remain the default version. wal2json does
not require a format version until a few commits ago (with no new
version released) and use a new format version will break calls that
does not inform format-version parameter. Let's give some time to users
adopt format-version parameter.

format-version 2 example

This new example shows the output for format-version 2.

Fix license oversight

The wal2json license is BSD 3-clause, however, there is a sentence in the
README that says it is released under PostgreSQL license (it is not). This is
an oversight in the commit 0b653ee.

Update copyright

Adjust instructions

Use default max_replication_slots value as suggestion for examples /
tests.

Fix oversight in option actions

This oversight was introduced in commit
aa431a9.

Issue yugabyte#148 reported by @dko-slapdash.

Fix memory leak while using add-tables and filter-tables

This issue was reported in PR yugabyte#147 by @haizz. The proposed fix was
incomplete (as I pointed out) and I fixed add-tables option and also
format 2 (add-tables and filter-tables).

Fix segfault on 32 bits

I forgot to pass a Datum and I'm surprised it haven't crashed yet.  This
commit fixes issue yugabyte#142.

Reported by @df7cb and confirmed by @cpaelzer.

Use correct format specifier

uint64 should use UINT64_FORMAT instead of %lu because format specifier
depends on a type probe. It suppresses a compiler warning.

Move some debug code for general function

Some debug messages were only printed if we are using format 1, let's
move them to general function.

Add macro for backward compatibility

UInt64GetDatum was introduced in 9.6. I forgot to test old stable
versions and didn't get it. Fortunately, @df7cb reported it.

Add parameter include-domain-data-type

This parameter replaces domain name with its underlying data type.
Default is false (backward compatibility). Both formats (1 and 2) are
supported. It is useful if you want to expose the "real" data type.

struct ReorderBufferTXN changed

Some struct members (including has_catalog_changes) was turned into a
single variable since v13 (cf postgres commit
a7b6ab5db1d35438112f74f3531354ddd61970b5).

Rephrase include-unchanged-toast docs

This parameter was deprecated some time ago. However, its description is
confusing users. Let's add a blink alert to NOT use it. It will
eventually be removed.

Add expected file for parameter include-domain-data-type

Commit e33df94 forgot to add expected
file.

Add parameter filter-origins

This parameter is useful in multidirectional replication solutions. It
can prevent replicating some data back and forth. Although, each
transaction has origin information, filter using this callback is more
efficient. This callback is available in 9.5+.

Improve Linux installation instructions

Since there are some issues that are opened from time to time, I add a
few instructions to Linux distros. I add instructions to install the
available packages from Debian/Ubuntu and RHEL/CentOS. I also write some
instructions to compile on those OSes. I hope it should clarify doubts
about installation.

Fix an oversight for old minor versions

Format 2 (1b0cbac) forgot to consider
the workaround I applied to TupleDescAttr
(5352cc4).  Since I did not test with
old minor versions, this bug was not caught by tests. This was reported
in issue yugabyte#162.

Improve configuration section

Add a comment explaining which parameters need adjust based on the
version users are using. Version 10 or later just need to change
wal_level, for example. Though, old versions have to adjust a few
parameters to use logical replication.

Add parameter include-column-positions

This parameter is useful to detect schema changes (since attnum are
numbered from 1 up). Both formats (1 and 2) are supported although
element names are different ("columnpositions" for format 1 and
"position" for format 2). Default is false.

Issue yugabyte#160

Add parameter include-default

This parameter adds "columndefaults" that contains all defaults values
(same as pg_get_expr). Both formats (1 and 2) are supported although
element names are different ("columndefaults" for format 1 and "default"
for format 2). Each default value is represented as string because
it can be an expression. Tests are included. Default is false.

I also added prototypes to 2 static functions.

Add parameter include-pk

This parameter adds primary key information if it is available. Both
formats (1 and 2) are supported. Each "pk" object provides column name
and its data type. Tests are included. Default is false.

Fix an oversight in format 2: nextlsn

Format 2 (1b0cbac) forgot to add
nextlsn. Format 1 includes this information if include-lsn is provided.
This information can be useful for applications that use LSN position as
a restart point.

Fix package name

As pointed by @MaxmaxmaximusAWS Debian/Ubuntu package name was wrong.
It fixes yugabyte#175.

Update instructions to Postgres 13

Since Postgres 13 was released a few days ago, let's update wal2json
instructions. I also replaced wal2json version with the latest version
(2.3).

Improve pretty print

Replace strcpy() and strncpy() with an assignment.

Improve documentation

Add 'message' usage to the examples. In comparison to the widely used
output plugins such as pgoutput and pglogical 2, wal2json is the only
plugin that supports message_cb.

Fix table names in the example

I renamed the tables in the first example but forgot to change the
script.

Remove unused code

Don't rely on index attribute names for primary key

If you rename a column that is part of the primary key, the attname for
the primary key index isn't renamed. Someone could argue that it is a
Postgres bug (I think it is a catalog inconsistency that could probably
be fixed eventually) but it seems the cure is worse than the disease
(see [1]).

The code was refactored to avoid comparison by attribute name, instead
it uses RelationGetIndexAttrBitmap() that obtains a bitmap of index
attribute numbers based on its kind (primary key, identity, ...).

Closes yugabyte#11

[1] https://www.postgresql.org/message-id/CAA8M49r%3D1XoE27tQ08sAPhft_ayBu4Vvib%2BubwX4SRqtUGJ%2B3g%40mail.gmail.com

Combine common code into functions

Instead of repeating code into different wal2json format output
functions, combine the common code into functions and use them.

Add docs on include-origin option.

document include-transaction option

Filter is not applied for TRUNCATE

Filter is failing to exclude tables that matchs the filter. Filter code
was refactored at commit a96dd31 but I
forgot to replace the filter in the truncate function. I include a
regression test to cover this code.

Merge pull request yugabyte#207 from olirice/master

Document include-transaction option

Avoid duplicate double quotes for type names

When an internal function already returns a type name with double
quotes, doesn't add another one.

Fix yugabyte#200

Stamp 2.4

default value was not always printed in v1

This bug prevents a default value to be printed in v1 after a TRUNCATE
statement.

Reported by @ls-guillaume-rebesche

Fixes yugabyte#225

Parameter write-in-chunks is only used for format version 1

If you set format-version to 2, parameter write-in-chunks has no effect.
Document this behavior.

Fixes yugabyte#228

Add Postgres 14 to the installation instructions

Refer to the latest Postgres version in the instructions.

Update copyright year

Add include-type-oids support in format-version 2

unbreak test on old versions

'default' test uses TRUNCATE that is not supported in v10 and previous
versions. Add an alternate test file to cover this case.

Array type does not print the correct type name

If the base type requires quotes and the type is an array, wal2json is
not quoting the whole type name. Instead, it was not considering the
brackets as part of the type name. Fix both format versions.

Fixes yugabyte#233

Lag tracking support

See lag times in pg_stat_replication.

Add PostgreSQL 15 support

Merge pull request yugabyte#247 from Naros/postgres15-compatibility

Add PostgreSQL 15 support

Stamp 2.5

Add option to output numeric data types as string. (yugabyte#255)

Add option to output numeric data types as string.

Data types like `numeric`, `real`, `double precision` supports `Infinity`,
`-Infinity` and `NaN` values. Currently these values output as `null` because
JSON specification does not recognize them as valid numeric values. This will
create problems for the users of wal2json who need these values to maintain
data integerity.

invalid JSON for non-transaction message in v1 (yugabyte#266)

Non-transactional messages contain only one object. It means comma
should not be provided for such JSON objects. If you are sending a
non-transactional message into a transaction, it should guarantee that
comma is only emitted for transactional commands. Hence, the previous
check is weak and it should also check for transactional information.

Fixes yugabyte#266

Correct DELETED tuple reference pg_decode_change_v1 (yugabyte#252)

Avoid variable shadowing

Fix typo

Fix lag tracking support for large transactions

Postgres fixed some time ago a logical replication timeout during large
transactions (commit
postgres/postgres@f95d53e). For
Postgres 16, a proper fix was added in the logical decoding but the
previous versions (10 to 14) a new function
(update-replication_progress) was added. It is the same function used by
pgoutput plugin.

Fixes yugabyte#270

Add version

Packages might rely on the version from source code instead of
extracting the version from the tarball file name.

If the WAL2JSON_VERSION contains "dev" suffix it means it is not a
released version. WAL2JSON_VERSION_NUM changes only for release. After a
release, the next commit will add the "dev" suffix that will be removed
only for the next release.

Replace tab with space

Update .gitignore

Fix include-pk when replica identity is full

In format 1, the primary key information was not included even though
the include-pk option is enabled. It happens when the table has a
primary key and the replica identity is full. Format 2 was not affected.

Fixes yugabyte#273

Update instructions with the latest PostgreSQL version

wal2json supports up to the latest supported version: 16.

Fix Red Hat package name

Update copyright year

Add PostgreSQL 17 support

Postgres 17 changed the ReorderBufferChange data structure (commit
postgres/postgres@08e6344).

Update build infrastructure for Windows

It updates MS Visual Studio from 2017 (v141) to 2022 (v143). It also
changes the target OS from 8.1 to 10.0 because Windows 8 was EOL for
some time (Windows 10 is also deprecated 3 months ago but since it is
the one I used to test these builds, I kept it. Feel free to update
WindowsTargetPlatformVersion if you are using a newer Windows version.)

It also includes a solution file (wal2json.sln -- although there aren't
multiple project files) that you can also use to trigger the build using
the following command:

msbuild wal2json.sln /verbosity:normal /p:Configuration=Release

I tested these builds with Postgres 10 and later on x86_64.

Stamp 2.6

Add 'src/postgres/third-party-extensions/wal2json/' from commit '75629c2e1e81a12350cc9d63782fc53252185d8d'

git-subtree-dir: src/postgres/third-party-extensions/wal2json
git-subtree-mainline: 66ed3a5
git-subtree-split: 75629c2

Test Plan: jenkins: compile only

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D36690
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/question This is a question
Projects
None yet
Development

No branches or pull requests

2 participants