Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Materialize - Plugin for row transformation and fanout #17824

Open
pranavmayekar opened this issue Feb 19, 2025 · 1 comment

Comments

@pranavmayekar
Copy link

Feature Description

A Go-based plugin system that allows users to define Materialize row transformations using Go code rather than being limited to SQL expressions. This would enable arbitrary logic to be applied to rows as they are streamed and written to the target table.

Configuration

Within the --table-settings flag for Materialize, a new field (e.g., source_transformation) could be specified:

[
...
{

   "target_table": "foo",
   "source_transformation: "myCustomTransformationFunc",
   "source_transformation_params: {
	"key1": "val1",
	"key2": "val2",
   },
   "create_ddl": "CREATE TABLE foo ( ... )"

 }
...
]

Plugin Interface

Vitess would invoke a Go function (e.g., myCustomTransformationFunc) that implements the following interface:

type SourceTableTransformation interface {
	Map(ctx context.Context, inputRow []sqltypes.Value, params map[string]string) ([][]sqltypes.Value, []key.DestinationKeyspaceID, error)
} 

Input

  • inputRow: The raw row from the vStream, as an array of sqltypes.Value.
  • params: Arbitrary user defined params as provided in the –table-settings.

Return values

  • A list of output rows - each represented by a slice of type sqltypes.Value
  • A slice of type key.DestinationKeyspaceID representing the destination shards for the output rows.
  • An error (if transformation logic fails).

Use Case(s)

We would like to use Materialize for defining materialised views (hereafter referred to as MV) on our tables but our requirements currently fall outside the bounds of what the SQL-based source table expression can support.

We have a JSON column in our source table (let's call it data) that we cannot model as a traditional relational schema. This is because the "schema" for each table is not the same and it varies from use-case to use-case. Any MVs defined on these tables would also have differing JSON schemas based on the source table schema.

With this context in place, our requirements are:

  • Complex function logic on vIndex columns: We want to apply some custom logic on the data column to derive the output column on which the vIndex function is to be defined. Currently, complex expressions (such as MySQL functions or JSON_EXTRACT) are simply unsupported in Materialize for vIndex columns. Furthermore, this logic is highly specific to our use case and is non-trivial to write as a MySQL function.
  • Row transformation: We want to parse and transform the data column before it's written to the MV.
  • One-to-many row fanout: For a few of our MV use cases, we want the capability to "explode" a source table row into multiple rows in the MV. To be more specific, we have an array inside our data column and we want multiple output rows in the MV - one for each element in that array. This can be likened to a data denormalization query.
@pranavmayekar pranavmayekar added the Needs Triage This issue needs to be correctly labelled and triaged label Feb 19, 2025
@mattlord mattlord added Type: Feature Component: VReplication and removed Needs Triage This issue needs to be correctly labelled and triaged labels Feb 19, 2025
@github-project-automation github-project-automation bot moved this to Backlog in VReplication Feb 19, 2025
@derekperkins
Copy link
Member

@mattlord can correct me if I'm wrong, but I don't think this is necessarily a near-term case for the built in Materialize. This seems like a good case for #17222, where you can consume the vstream yourself and do whatever transforms are necessary. It's fundamentally the same mechanism as vreplication, but lets you control the entire lifecycle using Go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Backlog
Development

No branches or pull requests

3 participants