From 1c8b959be677c8e6035c1d0e61005631c674a88e Mon Sep 17 00:00:00 2001
From: Vadim Voitenko
Date: Fri, 30 Aug 2024 13:31:38 +0300
Subject: [PATCH] doc: Added changelog v0.2.0b1 and v0.2.0b2. Revised some parts of doc.

---
 docs/commands/restore.md                 |  17 +--
 docs/overrides/main.html                 |   2 +-
 docs/release_notes/greenmask_0_2_0_b1.md | 119 +++++++++++++++++++++
 docs/release_notes/greenmask_0_2_0_b2.md | 125 +++++++++++++++++++++++
 mkdocs.yml                               |   2 +
 5 files changed, 257 insertions(+), 8 deletions(-)
 create mode 100644 docs/release_notes/greenmask_0_2_0_b1.md
 create mode 100644 docs/release_notes/greenmask_0_2_0_b2.md

diff --git a/docs/commands/restore.md b/docs/commands/restore.md
index 04fb7dea..01c61392 100644
--- a/docs/commands/restore.md
+++ b/docs/commands/restore.md
@@ -110,10 +110,11 @@ greenmask --config=config.yml restore DUMP_ID --inserts --overriding-system-valu
 ### Restoration in topological order
 
 By default, Greenmask restores tables in the order they are listed in the dump file. To restore tables in topological
-order, use the `--restore-in-order` flag. This is particularly useful when your schema includes foreign key references
-and
-you need to insert data in the correct order. Without this flag, you may encounter errors when inserting data into
-tables with foreign key constraints.
+order, use the `--restore-in-order` flag. This flag ensures that dependent tables are not restored until the tables they
+depend on have been restored.
+
+This is useful when the schema has already been created with foreign keys and other constraints, and you want to insert
+data into the tables in the correct order or catch up the target database with new data.
 
 !!! warning
 
@@ -143,9 +144,11 @@ greenmask --config=config.yml restore latest --pgzip
 The COPY command returns the error only on transaction commit. This means that if you have a large dump and an error
 occurs, you will have to wait until the end of the transaction to see the error message. To avoid this, you can use the
 `--batch-size` flag to specify the number of rows to insert in a single batch during the COPY command. If an error
-occurs
-during the batch insertion, the error message will be displayed immediately. The data will be committed **only if all
-batches are inserted successfully**.
+occurs during the batch insertion, the error message will be displayed immediately. The data will be committed **only
+if all batches are inserted successfully**.
+
+This is useful when you want to be notified of errors as soon as possible, without waiting for the entire
+table to be restored.
 
 !!! warning
 
diff --git a/docs/overrides/main.html b/docs/overrides/main.html
index af87952c..f9f98e99 100644
--- a/docs/overrides/main.html
+++ b/docs/overrides/main.html
@@ -1,7 +1,7 @@
 {% extends "base.html" %}
 
 {% block announce %}
-  A new major beta version 0.2.0b1 is released
+  A new major beta version 0.2.0b2 (2024.08.30) is released
 {% endblock %}
 
 {% block outdated %}
diff --git a/docs/release_notes/greenmask_0_2_0_b1.md b/docs/release_notes/greenmask_0_2_0_b1.md
new file mode 100644
index 00000000..d49e2be5
--- /dev/null
+++ b/docs/release_notes/greenmask_0_2_0_b1.md
@@ -0,0 +1,119 @@
+# Greenmask 0.2.0b1 (pre-release)
+
+This **major beta** release introduces new features and refactored transformers, significantly enhancing Greenmask's
+flexibility to better meet business needs.
+
+## Changes overview
+
+* [Introduced dynamic parameters in the transformers](../built_in_transformers/dynamic_parameters.md)
+    * Most transformers now support dynamic parameters where applicable.
+    * Dynamic parameter types are strictly enforced. If you need to cast values to another type, Greenmask provides templates
+      and predefined cast functions accessible via `cast_to`. These functions cover frequent operations such as
+      `UnixTimestampToDate` and `IntToBool`.
+* The transformation logic has been significantly refactored, making transformers more customizable and flexible than
+  before.
+* [Introduced transformation engines](../built_in_transformers/transformation_engines.md)
+    * `random` - generates transformer values based on pseudo-random algorithms.
+    * `hash` - generates transformer values using hash functions. Currently, it uses `sha3` hash functions, which
+      are secure but slow. In the stable release, there will be an option to choose between `sha3` and
+      `SipHash`.
+
+* [Introduced static parameter value templates](../built_in_transformers/parameters_templating.md)
+
+## Notable changes
+
+### Core
+
+* Introduced the `Parametrizer` interface, now implemented for both dynamic and static parameters.
+* Renamed most of the toolkit types for clarity and more comprehensive documentation coverage.
+* Refactored the `Driver` initialization logic.
+* Added validation warnings for overridden types in the `Driver`.
+* Migrated existing built-in transformers to the new `Parametrizer` interface.
+* Implemented a new abstraction, `TransformationContext`, as the first step towards the transformation conditions
+  feature (#34).
+* Optimized most transformers for performance in both dynamic and static modes. While dynamic mode offers flexibility,
+  static mode keeps performance high. Using only the transformation features you need helps keep
+  transformation time predictable.
+
+### Documentation
+
+Documentation has been significantly refactored. New information about features and updated transformer descriptions
+have been added.
+
+### Transformers
+
+* [RandomEmail](../built_in_transformers/standard_transformers/random_email.md) - A new transformer that
+  supports both random and deterministic engines. It allows flexible email value generation: you can use column
+  values in the template and choose to keep the original domain or pick one from the `domains` parameter.
+
+* [NoiseDate](../built_in_transformers/standard_transformers/noise_date.md), [NoiseFloat](../built_in_transformers/standard_transformers/noise_float.md), [NoiseInt](../built_in_transformers/standard_transformers/noise_int.md) -
+  These transformers support both random and deterministic engines, offering dynamic mode parameters that control the
+  noise thresholds within the `min` and `max` range. Unlike previous implementations, which used a single `ratio`
+  parameter, the new release provides `min_ratio` and `max_ratio` parameters to define noise values more precisely.
+  Using the `hash` engine in these transformers enhances security by making statistical analysis harder for
+  attackers, especially when the same salt is used consistently over long periods.
+
+* [NoiseNumeric](../built_in_transformers/standard_transformers/noise_numeric.md) - A newly implemented transformer,
+  sharing features with `NoiseInt` and `NoiseFloat`, but specifically designed for numeric values (large integers or
+  floats). It provides a `decimal` parameter to handle values with fractions.
+
+* [RandomChoice](../built_in_transformers/standard_transformers/random_choice.md) - Now supports the `hash` engine.
+
+* [RandomDate](../built_in_transformers/standard_transformers/random_date.md), [RandomFloat](../built_in_transformers/standard_transformers/random_float.md), [RandomInt](../built_in_transformers/standard_transformers/random_int.md) -
+  Now enhanced with hash engine support. Threshold parameters `min` and `max` have been updated to support dynamic mode,
+  allowing for more flexible configurations.
+
+* [RandomNumeric](../built_in_transformers/standard_transformers/random_numeric.md) - A new transformer specifically
+  designed for numeric types (large integers or floats), sharing similar features with `RandomInt` and `RandomFloat`,
+  but tailored for handling very large numeric values.
+
+* [RandomString](../built_in_transformers/standard_transformers/random_string.md) - Now supports the hash engine mode.
+
+* [RandomUnixTimestamp](../built_in_transformers/standard_transformers/random_unix_timestamp.md) - This new transformer
+  generates Unix timestamps with selectable units (`second`, `millisecond`, `microsecond`, `nanosecond`). Similar in
+  function to `RandomDate`, it supports the hash engine and dynamic parameters for `min` and `max` thresholds, with the
+  ability to override these units using `min_unit` and `max_unit` parameters.
+
+* [RandomUuid](../built_in_transformers/standard_transformers/random_uuid.md) - Added hash engine support.
+
+* [RandomPerson](../built_in_transformers/standard_transformers/random_person.md) - Implemented a new transformer that
+  replaces `RandomName`, `RandomLastName`, `RandomFirstName`, `RandomFirstNameMale`, `RandomFirstNameFemale`,
+  `RandomTitleMale`, and `RandomTitleFemale`. This new transformer offers enhanced customizability while providing
+  functionality similar to the previous transformers. It generates personal data such as `FirstName`, `LastName`, and
+  `Title`, based on the provided `gender` parameter, which now supports dynamic mode. Future minor versions will allow
+  overriding the default names database.
+
+* Added [tsModify](../built_in_transformers/advanced_transformers/custom_functions/core_functions.md#tsmodify) - a new
+  template function for modifying `time.Time` objects.
+
+* Introduced a new [RandomIp](../built_in_transformers/standard_transformers/random_ip.md) transformer capable of
+  generating a random IP address based on the specified netmask.
+
+* Added a new [RandomMac](../built_in_transformers/standard_transformers/random_mac.md) transformer for generating
+  random MAC addresses.
+
+* Removed `RandomMacAddress`, `RandomIPv4`, `RandomIPv6`, `RandomUnixTime`, `RandomTitleMale`, `RandomTitleFemale`,
+  `RandomFirstName`, `RandomFirstNameMale`, `RandomFirstNameFemale`, `RandomLastName`, and `RandomName` in favor of
+  the more flexible and unified transformers above.
+
+#### Full Changelog: [v0.1.14...v0.2.0b1](https://github.com/GreenmaskIO/greenmask/compare/v0.1.14...v0.2.0b1)
+
+## Playground usage for beta version
+
+If you want to run a Greenmask [playground](../playground.md) for the beta version v0.2.0b1, execute:
+
+```bash
+git checkout tags/v0.2.0b1 -b v0.2.0b1
+docker-compose run greenmask-from-source
+```
+
+## Links
+
+Feel free to reach out to us if you have any questions or need assistance:
+
+* [Greenmask Roadmap](https://github.com/orgs/GreenmaskIO/projects/6)
+* [Email](mailto:support@greenmask.io)
+* [Twitter](https://twitter.com/GreenmaskIO)
+* [Telegram](https://t.me/greenmask_community)
+* [Discord](https://discord.gg/tAJegUKSTB)
+* [DockerHub](https://hub.docker.com/r/greenmask/greenmask)
diff --git a/docs/release_notes/greenmask_0_2_0_b2.md b/docs/release_notes/greenmask_0_2_0_b2.md
new file mode 100644
index 00000000..41a83412
--- /dev/null
+++ b/docs/release_notes/greenmask_0_2_0_b2.md
@@ -0,0 +1,125 @@
+# Greenmask 0.2.0b2 (pre-release)
+
+This **major beta** release introduces new features such as the database subset, pgzip support, restoration in
+topological order, and much more. It also includes fixes and improvements.
+
+## Preface
+
+This release is a major milestone that significantly expands Greenmask's functionality, transforming it into a simple,
+extensible, and reliable solution for database security, data anonymization, and everyday operations. Our goal is to
+create a core system that can serve as a foundation for comprehensive dynamic staging environments and robust data
+security.
+
+## Notable changes
+
+* [**Database Subset**](../database_subset.md) - a new feature that allows you to define a subset of the database,
+  allowing you to scale down the dump size ([#110](https://github.com/GreenmaskIO/greenmask/issues/110)). This is
+  useful for many purposes, especially for testing and development environments. It supports:
+
+    * References with [NULL values](../database_subset.md/#references-with-null-values) - generates a LEFT JOIN query
+      for FK references with NULL values so that such rows are included in the subset.
+    * [Virtual references](../database_subset.md/#virtual-references) (virtual foreign keys) - create a logical
+      FK in Greenmask that will be used in the subset dependency graph. A virtual reference can be defined for a column
+      or an expression, allowing you to reference, for example, a value extracted from JSON.
+    * [Circular references](../database_subset.md/#circular-reference) - Greenmask automatically resolves
+      circular dependencies in the subset by generating a recursive query. The query includes integrity checks
+      that ensure the data gathered from circular dependencies is consistent.
+    * Full documentation coverage, including [troubleshooting](../database_subset.md/#troubleshooting)
+      and [examples](../database_subset.md/#example-dump-a-subset-of-the-database).
+    * FKs and PKs that have more than one column (or expression).
+    * **Multi-cycle resolution in one strongly connected component (SCC)** - Greenmask generates a
+      recursive query for the SCC whether it contains a single cycle or multiple cycles, making the subset system universal
+      for any database schema.
+
+* **pgzip** support for faster [compression](../commands/dump.md/#pgzip-compression)
+  and [decompression](../commands/restore.md/#pgzip-decompression) - setting `--pgzip` can speed up the dump and
+  restoration processes through parallel compression. In some tests, dump and restore operations were up to 5x
+  faster.
+* [**Restoration in topological order**](../commands/restore.md/#restoration-in-topological-order) - The
+  `--restore-in-order` flag ensures that dependent tables are not restored until the tables they depend on have been
+  restored. This is useful when the target schema already has foreign keys and other constraints in place.
+* **[Insert format](../commands/restore.md/#inserts-and-error-handling)** restoration - For a flexible restoration
+  process, Greenmask now supports data restoration in the `INSERT` format. It generates the insert statements based on
+  `COPY` records from the dump. You do not need to re-dump your data to use this feature; it can be enabled in the
+  `restore` command. New features related to the `INSERT` format:
+
+    * Generate `INSERT` statements with the **`ON CONFLICT DO NOTHING`** clause if the flag `--on-conflict-do-nothing`
+      is set.
+    * **[Error exclusion list](../configuration.md/#restoration-error-exclusion)** in the config to skip
+      certain errors and continue inserting subsequent rows from the dump.
+    * Use case - **incremental dump and restoration** of logical data. For example, if you have a database and you
+      want to insert data periodically from another source, this can be used together with the database subset and
+      transformations to catch up the target database.
+
+* [Restore data batching](../commands/restore.md/#restore-data-batching) ([#173](https://github.com/GreenmaskIO/greenmask/pull/174)) -
+  By default, the COPY protocol returns the error only on transaction commit. To override this behavior, use the
+  `--batch-size` flag to specify the number of rows to insert in a single batch during the COPY command. This is useful
+  when you want errors to surface as soon as possible instead of after the entire table has been sent.
+* [Introduced](https://github.com/GreenmaskIO/greenmask/pull/162) the `keep_null` parameter for the `RandomPerson` transformer.
+
+### Fixes and improvements
+
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/140) the `validate` command with the `--table` flag, which used the
+  wrong table name order `{{ table_name }}.{{ schema }}` instead of
+  `{{ schema }}.{{ table_name }}`.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/137/commits/d421d6df2b55019235c81bdd22e341aa2509400b#diff-7a8b28dfeb9522d6af581535cbf61f3d2a744a68d4558515644d746fc9d43a2bL114)
+  `Row.SetColumn` out-of-range validation.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/137/commits/d421d6df2b55019235c81bdd22e341aa2509400b#diff-ef03875763278adee04b936cae57bb51d57c4ec8e55816f73e98c0af479a2441L543)
+  a `restoreWorker` panic that occurred when the worker received an error from pgx.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/157/commits/03d7d7af3c569d629f44b29114caa74c14a47826) error
+  handling in the `restore` command.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/157/commits/03d7d7af3c569d629f44b29114caa74c14a47826) restore
+  jobs so that they start a transaction for each table and commit it after the table restoration is done.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/157/commits/03d7d7af3c569d629f44b29114caa74c14a47826)
+  `--exit-on-error` working incorrectly in the `restore` command. Now, the `--exit-on-error` flag works correctly with the
+  `data` section.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/159) transaction rollback in the `validate` command.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/143) a typo in the documentation.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/136) a CI/CD bug related to retrieving current tags.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/141) the Docker image tag for `latest` to exclude specific
+  keywords.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/161) a case where the hashing value was not set for each column
+  in the `RandomPerson` transformer.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/165) original email value parsing conditions.
+* [Subset docs revision](https://github.com/GreenmaskIO/greenmask/pull/169/files).
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/171) a case where data entries were excluded by exclusion
+  parameters such as `--exclude-table`, `--table`, etc.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/172) zero bytes that were written to the buffer due to the wrong
+  buffer limit in the `Email` transformer.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/175) a case where overriding a column type via
+  `columns_type_override` did not work.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/177) a case where an unknown option provided in the config was
+  silently ignored instead of raising an error.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/178) a case where `min` and `max` parameter values were ignored
+  in transformers `NoiseDate`, `NoiseNumeric`, `NoiseFloat`, `NoiseInt`, `RandomNumeric`, `RandomFloat`, and
+  `RandomInt`.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/180) the TOC entry COPY restoration statement by adding the missing
+  newline and semicolon. The dump is now backward compatible with plain `pg_restore`, so a call such as
+  `pg_restore 1724504511561 --file 1724504511561.sql` works as expected.
+* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/184) a case where dump/restore failed when masking tables with a
+  generated column.
+* [Updated Go version (v1.22) and dependencies](https://github.com/GreenmaskIO/greenmask/pull/188)
+* [Revised the installation section of the docs](https://github.com/GreenmaskIO/greenmask/pull/187)
+* Extensive refactoring and code cleanup to make the codebase more maintainable and readable.
+
+#### Full Changelog: [v0.2.0b1...v0.2.0b2](https://github.com/GreenmaskIO/greenmask/compare/v0.2.0b1...v0.2.0b2)
+
+## Playground usage for beta version
+
+If you want to run a Greenmask [playground](../playground.md) for the beta version v0.2.0b2, execute:
+
+```bash
+git checkout tags/v0.2.0b2 -b v0.2.0b2
+docker-compose run greenmask-from-source
+```
+
+## Links
+
+Feel free to reach out to us if you have any questions or need assistance:
+
+* [Greenmask Roadmap](https://github.com/orgs/GreenmaskIO/projects/6)
+* [Email](mailto:support@greenmask.io)
+* [Twitter](https://twitter.com/GreenmaskIO)
+* [Telegram](https://t.me/greenmask_community)
+* [Discord](https://discord.gg/tAJegUKSTB)
+* [DockerHub](https://hub.docker.com/r/greenmask/greenmask)
diff --git a/mkdocs.yml b/mkdocs.yml
index 152ec551..6470dddc 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -125,6 +125,8 @@ nav:
               - Core custom functions: built_in_transformers/advanced_transformers/custom_functions/core_functions.md
               - Faker function: built_in_transformers/advanced_transformers/custom_functions/faker_function.md
   - Release notes:
+      - Greenmask 0.2.0b2: release_notes/greenmask_0_2_0_b2.md
+      - Greenmask 0.2.0b1: release_notes/greenmask_0_2_0_b1.md
       - Greenmask 0.1.14: release_notes/greenmask_0_1_14.md
       - Greenmask 0.1.13: release_notes/greenmask_0_1_13.md
       - Greenmask 0.1.12: release_notes/greenmask_0_1_12.md
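The `--restore-in-order` flag documented in the restore.md change restores each dependent table only after the tables it references. Kahn's topological sort over the foreign key graph is one common way to compute that order. The Go sketch below is a minimal illustration, not Greenmask's implementation. It takes a hypothetical `deps` map from each table to the tables it references, ignores self-references, and reports any remaining cycle as an error.

```go
package main

import (
	"fmt"
	"sort"
)

// topoOrder returns table names ordered so that every table comes after the
// tables it references through foreign keys (Kahn's algorithm). deps maps a
// table to the tables it references. Self-references are skipped because a
// table cannot wait for itself; any other cycle is reported as an error.
func topoOrder(deps map[string][]string) ([]string, error) {
	pending := make(map[string]int)         // number of not-yet-restored dependencies
	dependents := make(map[string][]string) // referenced table -> referencing tables
	for table, targets := range deps {
		if _, ok := pending[table]; !ok {
			pending[table] = 0
		}
		for _, target := range targets {
			if target == table {
				continue
			}
			if _, ok := pending[target]; !ok {
				pending[target] = 0
			}
			pending[table]++
			dependents[target] = append(dependents[target], table)
		}
	}

	// Tables without dependencies can be restored first; sort for stable output.
	var ready []string
	for table, n := range pending {
		if n == 0 {
			ready = append(ready, table)
		}
	}
	sort.Strings(ready)

	var order []string
	for len(ready) > 0 {
		table := ready[0]
		ready = ready[1:]
		order = append(order, table)
		next := dependents[table]
		sort.Strings(next)
		for _, dep := range next {
			pending[dep]--
			if pending[dep] == 0 {
				ready = append(ready, dep)
			}
		}
	}
	if len(order) != len(pending) {
		return nil, fmt.Errorf("cyclic foreign key dependency among %d tables", len(pending)-len(order))
	}
	return order, nil
}

func main() {
	deps := map[string][]string{
		"customers":   nil,
		"products":    nil,
		"orders":      {"customers"},
		"order_items": {"orders", "products"},
	}
	order, err := topoOrder(deps)
	if err != nil {
		panic(err)
	}
	fmt.Println(order) // [customers products orders order_items]
}
```

Sorting the ready set only makes the output deterministic. Any order that respects the dependencies would be equally valid for restoration.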
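The `hash` transformation engine introduced in 0.2.0b1 derives values deterministically from the original value, so the same input with the same salt always yields the same output. The Go sketch below illustrates that property under two assumptions that are not Greenmask's code. First, SHA-256 from the standard library stands in for the `sha3` functions the release notes mention. Second, a plain modulo maps the hash into an integer range `[lo, hi]`.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// deterministicInt maps an original value to an integer in [lo, hi] using a
// salted hash, so the same salt and input always yield the same output.
// SHA-256 stands in for SHA3 here, and the modulo reduction is slightly
// biased for ranges that do not divide 2^64; both are simplifications.
// Requires lo <= hi.
func deterministicInt(salt, original []byte, lo, hi int64) int64 {
	h := sha256.New()
	h.Write(salt)
	h.Write(original)
	sum := h.Sum(nil)
	span := uint64(hi-lo) + 1
	return lo + int64(binary.BigEndian.Uint64(sum[:8])%span)
}

func main() {
	salt := []byte("change-me") // hypothetical salt value
	a := deterministicInt(salt, []byte("user-42"), 18, 90)
	b := deterministicInt(salt, []byte("user-42"), 18, 90)
	fmt.Println(a == b) // true: same salt and input give the same value
}
```

Changing the salt changes every generated value, so a real setup would keep the salt secret and stable for as long as repeatable output is wanted.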
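The `INSERT` restoration format builds insert statements from the `COPY` records stored in the dump, optionally with `ON CONFLICT DO NOTHING`. The Go sketch below illustrates that conversion under stated assumptions. It assumes PostgreSQL's COPY text format (tab-separated fields, `\N` for NULL) and handles only the common backslash escapes. It also inlines values as quoted literals for readability, whereas Greenmask may well use parameterized statements. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// copyFieldToLiteral converts one field of a COPY text-format row into an SQL
// literal: \N becomes NULL, the escapes \\, \t, \n and \r are decoded, and the
// result is single-quoted with embedded quotes doubled. Octal and hex escapes
// are not handled in this sketch.
func copyFieldToLiteral(field string) string {
	if field == `\N` {
		return "NULL"
	}
	var b strings.Builder
	for i := 0; i < len(field); i++ {
		c := field[i]
		if c == '\\' && i+1 < len(field) {
			i++
			switch field[i] {
			case 't':
				c = '\t'
			case 'n':
				c = '\n'
			case 'r':
				c = '\r'
			default:
				c = field[i] // covers \\ and any other escaped character
			}
		}
		b.WriteByte(c)
	}
	return "'" + strings.ReplaceAll(b.String(), "'", "''") + "'"
}

// buildInsert turns one tab-separated COPY line into an INSERT statement and
// optionally appends ON CONFLICT DO NOTHING. table and columns are assumed to
// be already-quoted identifiers.
func buildInsert(table string, columns []string, copyLine string, onConflictDoNothing bool) string {
	fields := strings.Split(copyLine, "\t")
	values := make([]string, len(fields))
	for i, f := range fields {
		values[i] = copyFieldToLiteral(f)
	}
	stmt := fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s)",
		table, strings.Join(columns, ", "), strings.Join(values, ", "))
	if onConflictDoNothing {
		stmt += " ON CONFLICT DO NOTHING"
	}
	return stmt + ";"
}

func main() {
	line := "1\tO'Brien\t\\N" // id, name, note (NULL) as stored by COPY
	fmt.Println(buildInsert("public.users", []string{"id", "name", "note"}, line, true))
	// INSERT INTO public.users (id, name, note) VALUES ('1', 'O''Brien', NULL) ON CONFLICT DO NOTHING;
}
```

Splitting on raw tabs is safe here because the COPY text format escapes tabs inside data as `\t`.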