Merge pull request #133 from GreenmaskIO/feat/transformation_conditions
feat: transformation conditions
wwoytenko authored Oct 27, 2024
2 parents d58dfb7 + f3553ee commit 460e924
Showing 58 changed files with 1,120 additions and 223 deletions.
1 change: 0 additions & 1 deletion .gitignore
@@ -27,4 +27,3 @@ venv
.cache
# Binaries
cmd/greenmask/greenmask
pkg/toolkit/test/test
@@ -1,14 +1,22 @@
# Template custom functions

Within Greenmask, custom functions play a crucial role, providing a wide array of options for implementing diverse
logic. Under the hood, the custom functions are based on
the [Sprig template functions for Go](https://masterminds.github.io/sprig/). Greenmask extends this set with
additional functions and transformation functions. These extensions mirror the logic found in
the [standard transformers](../../standard_transformers/index.md) but offer you the flexibility to implement intricate
and comprehensive logic tailored to your specific needs.

Currently, you can use template custom functions for the [advanced transformers](../index.md):

* [Json](../json.md)
* [Template](../template.md)
* [TemplateRecord](../template_record.md)

and for the [Transformation condition feature](../../transformation_condition.md) as well.

Custom functions are arbitrarily divided into 2 groups:

- [Core functions](core_functions.md) — custom functions that vary in purpose and include PostgreSQL driver, JSON
output, testing, and transformation functions.
- [Faker functions](faker_function.md) — custom functions of the *faker* type that generate synthetic data.
4 changes: 2 additions & 2 deletions docs/built_in_transformers/advanced_transformers/index.md
@@ -5,6 +5,6 @@ Advanced transformers are modifiable anonymization methods that users can adjust
Below you can find an index of all advanced transformers currently available in Greenmask.

1. [Json](json.md) — changes JSON content by using `delete` and `set` operations.
2. [Template](template.md) — executes a Go template of your choice and applies the result to a specified column.
3. [TemplateRecord](template_record.md) — modifies records by using a Go template of your choice and applies the changes via the PostgreSQL
driver.
160 changes: 160 additions & 0 deletions docs/built_in_transformers/transformation_condition.md
@@ -0,0 +1,160 @@
# Transformation Condition

## Description

The transformation condition feature allows you to execute a defined transformation only if a specified condition is
met.
The condition must be defined as a boolean expression that evaluates to `true` or `false`. Greenmask uses
[expr-lang/expr](https://github.com/expr-lang/expr) under the hood. You can use all functions and syntax provided by the
`expr` library.

You can also use the same functions that are described in the
[custom functions](/docs/built_in_transformers/advanced_transformers/custom_functions/index.md) section.

The transformers are executed one by one, which helps you create complex transformation pipelines. For instance,
depending on the value chosen by the previous transformer, you can decide whether to execute the next transformer.

## Record descriptors

To improve the user experience, Greenmask offers special namespaces for accessing values in different formats: either
the driver-encoded value in its real type or as a raw string.

- **`record`**: This namespace provides the record value in its actual type.
- **`raw_record`**: This namespace provides the record value as a string.

You can access a specific column’s value using `record.column_name` for the real type or `raw_record.column_name` for
the raw string value.

!!! warning

    A record may be modified by previous transformers before the condition is evaluated. Greenmask does not
    retain the original record value; the condition is evaluated against the current, modified value.

## Null values condition

To check whether a value is null, compare it with `null`. This comparison behaves like the SQL `IS NULL` and
`IS NOT NULL` operators.

```text title="Is null cond example"
record.accountnumber == null && record.date > now()
```

```text title="Is not null cond example"
record.accountnumber != null && record.date <= now()
```
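
The two namespaces and the null check can be pictured with a small Go sketch. It is only an illustration under stated assumptions: the `column` type, the `isInt` flag, and `buildNamespaces` are hypothetical names, not Greenmask's API, and representing a NULL as `nil` in both namespaces is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// column is a hypothetical stand-in for a table column: a raw textual value
// plus flags describing whether it is NULL and how it should be decoded.
type column struct {
	name   string
	raw    string
	isNull bool
	isInt  bool // hypothetical marker: decode the raw text as an integer
}

// buildNamespaces returns two maps that mimic the `record` and `raw_record`
// namespaces: typed values in the first, raw strings in the second.
// Assumption of this sketch: NULL is represented as nil in both maps.
func buildNamespaces(cols []column) (record, rawRecord map[string]any, err error) {
	record = make(map[string]any, len(cols))
	rawRecord = make(map[string]any, len(cols))
	for _, c := range cols {
		if c.isNull {
			record[c.name] = nil
			rawRecord[c.name] = nil
			continue
		}
		rawRecord[c.name] = c.raw
		if c.isInt {
			v, convErr := strconv.ParseInt(c.raw, 10, 64)
			if convErr != nil {
				return nil, nil, fmt.Errorf("decode column %q: %w", c.name, convErr)
			}
			record[c.name] = v
			continue
		}
		record[c.name] = c.raw
	}
	return record, rawRecord, nil
}

func main() {
	record, rawRecord, err := buildNamespaces([]column{
		{name: "businessentityid", raw: "1492", isInt: true},
		{name: "accountnumber", isNull: true},
	})
	if err != nil {
		panic(err)
	}
	// The same column is an int64 in one namespace and a string in the other.
	fmt.Printf("%T %T\n", record["businessentityid"], rawRecord["businessentityid"])
	// A NULL column compares equal to nil, like `record.accountnumber == null`.
	fmt.Println(record["accountnumber"] == nil)
}
```

The point of the sketch is the split itself: a condition that needs numeric comparison would read the typed namespace, while a condition that works on text would read the raw one.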

## Expression scope

An expression can be scoped to a table or to a specific transformer. If you define the condition in the table scope,
it is evaluated before any transformer is executed. If you define the condition in the transformer scope, it is
evaluated before the specified transformer is executed.

```yaml title="Table scope"
- schema: "purchasing"
  name: "vendor"
  when: 'record.accountnumber == null || record.accountnumber == "ALLENSON0001"'
  transformers:
    - name: "RandomString"
      params:
        column: "accountnumber"
        min_length: 9
        max_length: 12
        symbols: "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"
```

```yaml title="Transformer scope"
- schema: "purchasing"
  name: "vendor"
  transformers:
    - name: "RandomString"
      when: 'record.accountnumber != null || record.accountnumber == "ALLENSON0001"'
      params:
        column: "accountnumber"
        min_length: 9
        max_length: 12
        symbols: "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"
```

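
The difference between the two scopes can be sketched in Go as follows. This is a simplified model, not Greenmask's implementation: the `record`, `condition`, `transformer`, and `table` types are hypothetical, and conditions are plain Go functions rather than parsed expressions.

```go
package main

import "fmt"

// record is a simplified row: column name -> current value.
type record map[string]any

// condition reports whether a transformation should run for the record.
// A nil condition means "always run".
type condition func(r record) bool

// transformer is a hypothetical pairing of an optional condition with an action.
type transformer struct {
	when  condition
	apply func(r record)
}

// table is a hypothetical table config with an optional table-scope condition.
type table struct {
	when         condition
	transformers []transformer
}

// transform evaluates the table-scope condition once, before any transformer,
// and each transformer-scope condition right before that transformer runs.
// It returns the number of transformers that were executed.
func (t table) transform(r record) int {
	if t.when != nil && !t.when(r) {
		return 0 // table-scope condition failed: skip every transformer
	}
	executed := 0
	for _, tr := range t.transformers {
		if tr.when != nil && !tr.when(r) {
			continue // transformer-scope condition failed: skip only this one
		}
		tr.apply(r)
		executed++
	}
	return executed
}

func main() {
	vendor := table{
		when: func(r record) bool {
			return r["accountnumber"] == nil || r["accountnumber"] == "ALLENSON0001"
		},
		transformers: []transformer{
			{apply: func(r record) { r["accountnumber"] = "MASKED0001" }},
			{
				// This condition sees the value written by the previous transformer.
				when:  func(r record) bool { return r["accountnumber"] == "ALLENSON0001" },
				apply: func(r record) { r["activeflag"] = false },
			},
		},
	}

	r := record{"accountnumber": "ALLENSON0001", "activeflag": true}
	fmt.Println(vendor.transform(r), r["accountnumber"], r["activeflag"])
}
```

In this model the second transformer is skipped, because its condition is checked against the already modified `accountnumber`, which matches the earlier warning that conditions see the current value rather than the original one.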
## Int and float value definition

It is important to write integer and float values in the correct format. To define an integer value, write a
number without a dot (`1`, `2`, etc.). To define a float value, write a number with a dot (`1.0`, `2.0`, etc.).

!!! warning

    Comparing an int with a float may produce an unexpected result. For example, `1 == 1.0` returns `false`.
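
A rough intuition for this warning comes from Go itself, the language Greenmask is written in: two dynamically typed values compare equal only when both their type and their value match. The sketch below shows that Go rule only. Whether the expression engine reaches the same result through this exact mechanism is an assumption, and `strictEqual` is a hypothetical helper.

```go
package main

import "fmt"

// strictEqual compares two dynamically typed values the way Go compares
// interface values: they are equal only if both the dynamic type and the
// value match. No numeric conversion is applied.
func strictEqual(a, b any) bool {
	return a == b
}

func main() {
	var intOne any = 1     // literal without a dot: int
	var floatOne any = 1.0 // literal with a dot: float64

	fmt.Println(strictEqual(intOne, floatOne)) // false: int vs float64
	fmt.Println(strictEqual(intOne, 1))        // true: int vs int
	fmt.Println(strictEqual(floatOne, 1.0))    // true: float64 vs float64
}
```

Writing the literal in the same form as the column type (`1` for integer columns, `1.0` for float columns) keeps both sides of the comparison in the same type.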

## Architecture

Greenmask decodes values only when the condition is evaluated. This helps the performance of the transformation when
you have many conditions that use the or (`||`) or and (`&&`) operators.
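
One way to read the paragraph above is as lazy decoding combined with short-circuit evaluation: an operand that is never reached is never decoded. The following Go sketch models that reading. It is an interpretation, not Greenmask's code, and `decoder`, `lazyInt`, and `eitherEquals` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
)

// decoder counts how many raw values were actually decoded.
type decoder struct{ calls int }

func (d *decoder) decodeInt(raw string) (int64, error) {
	d.calls++
	return strconv.ParseInt(raw, 10, 64)
}

// lazyInt holds a raw column value and decodes it at most once, on demand.
type lazyInt struct {
	raw    string
	dec    *decoder
	value  int64
	cached bool
}

func (l *lazyInt) get() (int64, error) {
	if l.cached {
		return l.value, nil
	}
	v, err := l.dec.decodeInt(l.raw)
	if err != nil {
		return 0, fmt.Errorf("decode %q: %w", l.raw, err)
	}
	l.value, l.cached = v, true
	return v, nil
}

// eitherEquals models the condition `a == wantA || b == wantB`.
// The right operand is decoded only if the left comparison is false.
func eitherEquals(a, b *lazyInt, wantA, wantB int64) (bool, error) {
	av, err := a.get()
	if err != nil {
		return false, err
	}
	if av == wantA {
		return true, nil // short-circuit: b is never decoded
	}
	bv, err := b.get()
	if err != nil {
		return false, err
	}
	return bv == wantB, nil
}

func main() {
	dec := &decoder{}
	a := &lazyInt{raw: "1", dec: dec}
	b := &lazyInt{raw: "2", dec: dec}

	ok, err := eitherEquals(a, b, 1, 2)
	if err != nil {
		panic(err)
	}
	// The condition is true after decoding only the first operand.
	fmt.Println(ok, dec.calls)
}
```

With many columns and long `||` or `&&` chains, skipping the decode step for operands that are never reached is where the saving would come from under this reading.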

## Example: Choose a random value and execute one of the transformers

In the following example, the `RandomChoice` transformer is used to choose a random value from the list of values.
Depending on the chosen value, the `Replace` transformer is executed to set the `activeflag` column to `true` or
`false`.

In this case the condition scope is on the transformer level.

```yaml
- schema: "purchasing"
  name: "vendor"
  transformers:
    - name: "RandomChoice"
      params:
        column: "name"
        values:
          - "test1"
          - "test2"
    - name: "Replace"
      when: 'record.name == "test1"'
      params:
        column: "activeflag"
        value: "false"
    - name: "Replace"
      when: 'record.name == "test2"'
      params:
        column: "activeflag"
        value: "true"
```

## Example: Do not transform specific records

In the following example, the `RandomString` transformer is executed only if the `businessentityid` column value is
neither `1492` nor `1`.

```yaml
- schema: "purchasing"
  name: "vendor"
  when: '!(record.businessentityid | has([1492, 1]))'
  transformers:
    - name: "RandomString"
      params:
        column: "accountnumber"
        min_length: 9
        max_length: 12
        symbols: "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"
```

## Example: Check the JSON attribute value

In the following example, the `RandomString` transformer is executed only if the `a` attribute in the `json_data` column
is equal to `1`.

```yaml
- schema: "public"
  name: "jsondata"
  when: 'raw_record.json_data | jsonGet("a") == 1'
  transformers:
    - name: "RandomString"
      params:
        column: "accountnumber"
        min_length: 9
        max_length: 12
        symbols: "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"
```

1 change: 1 addition & 0 deletions go.mod
@@ -6,6 +6,7 @@ require (
github.com/Masterminds/sprig/v3 v3.3.0
github.com/aws/aws-sdk-go v1.55.5
github.com/dchest/siphash v1.2.3
github.com/expr-lang/expr v1.16.7
github.com/ggwhite/go-masker v1.1.0
github.com/go-faker/faker/v4 v4.5.0
github.com/google/uuid v1.6.0
2 changes: 2 additions & 0 deletions go.sum
@@ -16,6 +16,8 @@ github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/dchest/siphash v1.2.3 h1:QXwFc8cFOR2dSa/gE6o/HokBMWtLUaNDVd+22aKHeEA=
github.com/dchest/siphash v1.2.3/go.mod h1:0NvQU092bT0ipiFN++/rXm69QG9tVxLAlQHIXMPAkHc=
github.com/expr-lang/expr v1.16.7 h1:gCIiHt5ODA0xIaDbD0DPKyZpM9Drph3b3lolYAYq2Kw=
github.com/expr-lang/expr v1.16.7/go.mod h1:8/vRC7+7HBzESEqt5kKpYXxrxkr31SaO8r40VO/1IT4=
github.com/frankban/quicktest v1.14.6 h1:7Xjx+VpznH+oBnejlPUj8oUpdxnVs4f8XU8WnHkI4W8=
github.com/frankban/quicktest v1.14.6/go.mod h1:4ptaffx2x8+WTWXmUCuVU6aPUX1/Mz7zb5vbUoiM6w0=
github.com/fsnotify/fsnotify v1.7.0 h1:8JEhPFa5W2WU7YfeZzPNqzMP6Lwt7L2715Ggo0nosvA=
36 changes: 7 additions & 29 deletions internal/db/postgres/cmd/validate.go
@@ -130,9 +130,15 @@ func (v *Validate) Run(ctx context.Context) (int, error) {
		return nonZeroExitCode, fmt.Errorf("unable to build runtime context: %w", err)
	}

	if err = v.printValidationWarnings(); err != nil {
	err = toolkit.PrintValidationWarnings(
		v.context.Warnings, v.config.Validate.ResolvedWarnings, v.config.Validate.Warnings,
	)
	if err != nil {
		return nonZeroExitCode, err
	}
	if v.context.IsFatal() {
		return nonZeroExitCode, fmt.Errorf("fatal validation error")
	}

	if err = v.diffWithPreviousSchema(ctx); err != nil {
		return nonZeroExitCode, err
@@ -280,34 +286,6 @@ func (v *Validate) createDocument(ctx context.Context, t *entries.Table) (valida
	return doc, nil
}

func (v *Validate) printValidationWarnings() error {
	// TODO: Implement warnings hook, such as logging and HTTP sender
	for _, w := range v.context.Warnings {
		w.MakeHash()
		if idx := slices.Index(v.config.Validate.ResolvedWarnings, w.Hash); idx != -1 {
			log.Debug().Str("hash", w.Hash).Msg("resolved warning has been excluded")
			if w.Severity == toolkit.ErrorValidationSeverity {
				return fmt.Errorf("warning with hash %s cannot be excluded because it is an error", w.Hash)
			}
			continue
		}

		if w.Severity == toolkit.ErrorValidationSeverity {
			// The warnings with error severity must be printed anyway
			log.Error().Any("ValidationWarning", w).Msg("")
		} else {
			// Print warnings with severity level lower than ErrorValidationSeverity only if requested
			if v.config.Validate.Warnings {
				log.Warn().Any("ValidationWarning", w).Msg("")
			}
		}
	}
	if v.context.IsFatal() {
		return fmt.Errorf("fatal validation error")
	}
	return nil
}

func (v *Validate) getTablesToValidate() ([]*domains.Table, error) {
	var tablesToValidate []*domains.Table
	for _, tv := range v.config.Validate.Tables {
40 changes: 20 additions & 20 deletions internal/db/postgres/cmd/validate_utils/json_document_test.go
@@ -15,26 +15,6 @@ import (
"github.com/greenmaskio/greenmask/pkg/toolkit"
)

type testTransformer struct{}

func (tt *testTransformer) Init(ctx context.Context) error {
	return nil
}

func (tt *testTransformer) Done(ctx context.Context) error {
	return nil
}

func (tt *testTransformer) Transform(ctx context.Context, r *toolkit.Record) (*toolkit.Record, error) {
	return nil, nil
}

func (tt *testTransformer) GetAffectedColumns() map[int]string {
	return map[int]string{
		1: "name",
	}
}

func TestJsonDocument_GetAffectedColumns(t *testing.T) {
	tab, _, _ := getTableAndRows()
	jd := NewJsonDocument(tab, true, true)
@@ -87,6 +67,26 @@ func TestJsonDocument_GetRecords(t *testing.T) {
//r.SetRow(row)
}

type testTransformer struct{}

func (tt *testTransformer) Init(ctx context.Context) error {
	return nil
}

func (tt *testTransformer) Done(ctx context.Context) error {
	return nil
}

func (tt *testTransformer) Transform(ctx context.Context, r *toolkit.Record) (*toolkit.Record, error) {
	return nil, nil
}

func (tt *testTransformer) GetAffectedColumns() map[int]string {
	return map[int]string{
		1: "name",
	}
}

func getTableAndRows() (table *entries.Table, original, transformed [][]byte) {

	tableDef := `
4 changes: 3 additions & 1 deletion internal/db/postgres/context/table.go
@@ -124,7 +124,7 @@ func validateAndBuildTablesConfig(
// InitTransformation toolkit
if len(tableCfg.Transformers) > 0 {
for _, tc := range tableCfg.Transformers {
transformer, initWarnings, err := initTransformer(ctx, driver, tc, registry, types)
transformer, initWarnings, err := initTransformer(ctx, driver, tc, registry)
if len(initWarnings) > 0 {
for _, w := range initWarnings {
// Enriching the tables context into meta
@@ -155,6 +155,7 @@ func validateAndBuildTablesConfig(
func getTable(ctx context.Context, tx pgx.Tx, t *domains.Table) ([]*entries.Table, toolkit.ValidationWarnings, error) {
	table := &entries.Table{
		Table: &toolkit.Table{},
		When:  t.When,
	}
	var warnings toolkit.ValidationWarnings
	var tables []*entries.Table
@@ -204,6 +205,7 @@ func getTable(ctx context.Context, tx pgx.Tx, t *domains.Table) ([]*entries.Tabl
RootPtSchema: table.Schema,
RootPtName: table.Name,
RootOid: table.Oid,
When: table.When,
}
if err = rows.Scan(&pt.Oid, &pt.Schema, &pt.Name); err != nil {
return nil, nil, fmt.Errorf("error scanning TableGetChildPatsQuery: %w", err)
13 changes: 6 additions & 7 deletions internal/db/postgres/context/transformers.go
@@ -27,22 +27,21 @@ func initTransformer(
	ctx context.Context, d *toolkit.Driver,
	c *domains.TransformerConfig,
	r *transformersUtils.TransformerRegistry,
	types []*toolkit.Type,
) (*transformersUtils.TransformerContext, toolkit.ValidationWarnings, error) {
	var totalWarnings toolkit.ValidationWarnings
	td, ok := r.Get(c.Name)
	if !ok {
		totalWarnings = append(totalWarnings,
			toolkit.NewValidationWarning().
				SetMsg("transformer not found").
				SetSeverity(toolkit.ErrorValidationSeverity).SetTrace(&toolkit.Trace{
				SchemaName:      d.Table.Schema,
				TableName:       d.Table.Name,
				TransformerName: c.Name,
			}))
				AddMeta("SchemaName", d.Table.Schema).
				AddMeta("TableName", d.Table.Name).
				AddMeta("TransformerName", c.Name).
				SetSeverity(toolkit.ErrorValidationSeverity),
		)
		return nil, totalWarnings, nil
	}
	transformer, warnings, err := td.Instance(ctx, d, c.Params, c.DynamicParams)
	transformer, warnings, err := td.Instance(ctx, d, c.Params, c.DynamicParams, c.When)
	if err != nil {
		return nil, nil, fmt.Errorf("unable to init transformer: %w", err)
	}