
Commit

merged of changes from remote
ronanstokes-db committed Apr 9, 2023
2 parents 7de014c + 615c56b commit c859475
Showing 15 changed files with 711 additions and 114 deletions.
19 changes: 14 additions & 5 deletions CHANGELOG.md
@@ -3,23 +3,32 @@
## Change History
All notable changes to the Databricks Labs Data Generator will be documented in this file.

### Version 0.3.4

#### Changed
* Modified option to allow for range when specifying `numFeatures` with `structType='array'` to allow generation
of varying number of columns
* When generating multi-column or array valued columns, compute random seed with different name for each column
* Additional build ordering enhancements to reduce circumstances where explicit base column must be specified

#### Added
* Scripting of data generation code from schema (Experimental)
* Scripting of data generation code from dataframe (Experimental)
* Added top level `random` attribute to data generator specification constructor


### Version 0.3.3post2

#### Changed
* Fixed use of logger in _version.py and in spark_singleton.py
* Fixed template issues
* Document reformatting and updates, related code comment changes
* Modified option to allow for range when specifying `numFeatures` with `structType='array'` to allow generation
of varying number of columns
* When generating multi-column or array valued columns, compute random seed with different name for each column

### Fixed
* Apply pandas optimizations when generating multiple columns using same `withColumn` or `withColumnSpec`

### Added
* Added use of prospector to build process to validate common code issues
* Added top level `random` attribute to data generator specification constructor



### Version 0.3.2
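The 0.3.4 entries on `numFeatures` ranges and per-column random seeds can be modelled without Spark. The following is a plain-Python sketch, not dbldatagen code. It assumes the range is given as a `(min, max)` tuple, draws each row's array length uniformly from that inclusive range, and derives each column's seed from a hash of the column name (`zlib.crc32` is an arbitrary choice here). `generate_array_column` and `seed_for_column` are hypothetical names.

```python
import random
import zlib


def seed_for_column(column_name: str, base_seed: int) -> int:
    # Hypothetical: mix the column name into the base seed so that columns
    # expanded from the same spec do not share one random stream.
    return (base_seed ^ zlib.crc32(column_name.encode("utf-8"))) & 0xFFFFFFFF


def generate_array_column(column_name, num_rows, num_features,
                          min_value, max_value, base_seed=42):
    """Generate `num_rows` array values for a single column.

    `num_features` is either an int (fixed array length) or a (min, max)
    tuple, in which case each row's length is drawn from [min, max].
    """
    if isinstance(num_features, tuple):
        min_len, max_len = num_features
    else:
        min_len = max_len = num_features
    if not 0 <= min_len <= max_len:
        raise ValueError("numFeatures range must satisfy 0 <= min <= max")

    # One deterministic RNG per column, seeded from the column's name
    rng = random.Random(seed_for_column(column_name, base_seed))
    rows = []
    for _ in range(num_rows):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        rows.append([rng.randint(min_value, max_value) for _ in range(length)])
    return rows


if __name__ == "__main__":
    for row in generate_array_column("codes", 5, (2, 6), 1, 100):
        print(len(row), row)
```

In the library itself the request would presumably be expressed through `withColumn(..., numFeatures=(2, 6), structType='array')`; the accepted forms of `numFeatures` should be checked against the 0.3.4 documentation.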
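The build-ordering entry concerns the order in which columns are generated when one column refers to another (for example through `baseColumn`). The package exports a `topologicalSort` utility, but its signature is not part of this diff, so the following is a stand-alone sketch of dependency ordering using Kahn's algorithm. `build_order_sketch` and its dict-of-dependencies input are assumptions made for illustration.

```python
from collections import deque


def build_order_sketch(dependencies):
    """Order columns so each is built after the columns it references.

    `dependencies` maps a column name to an iterable of column names it
    depends on. Raises ValueError if the dependencies contain a cycle.
    """
    # Every referenced column becomes a node, even if it has no entry
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    indegree = {n: 0 for n in nodes}
    dependents = {n: [] for n in nodes}
    for col, deps in dependencies.items():
        for dep in set(deps):  # ignore duplicate references
            indegree[col] += 1
            dependents[dep].append(col)

    # Sorting keeps the output deterministic among independent columns
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for dependent in sorted(dependents[node]):
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                ready.append(dependent)

    if len(order) != len(nodes):
        raise ValueError("cyclic column dependencies")
    return order
```

For example, `{"id": [], "code": ["id"], "label": ["code", "id"], "ts": []}` yields an order in which `id` precedes `code` and `code` precedes `label`, without the caller naming a base column for `ts`.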
3 changes: 2 additions & 1 deletion README.md
@@ -51,6 +51,7 @@ used in other computations
* use of SQL expressions in synthetic data generation
* plugin mechanism to allow use of 3rd party libraries such as Faker
* Use within a Databricks Delta Live Tables pipeline as a synthetic data generation source
+* Generate synthetic data generation code from existing schema or data (experimental)

Details of these features can be found in the online documentation -
[online documentation](https://databrickslabs.github.io/dbldatagen/public_docs/index.html).
@@ -62,7 +63,7 @@ details of use and many examples.

Release notes and details of the latest changes for this specific release
can be found in the GitHub repository
-[here](https://github.com/databrickslabs/dbldatagen/blob/release/v0.3.3post2/CHANGELOG.md)
+[here](https://github.com/databrickslabs/dbldatagen/blob/release/v0.3.4/CHANGELOG.md)

# Installation

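The new README bullet describes generating data-generation code from an existing schema. The library's own entry points for this are not shown in this diff, so the following is only a hypothetical sketch of the idea: it takes a list of `(column_name, sql_type)` pairs (standing in for a Spark schema) and emits the text of a `DataGenerator` spec with one `withColumn` call per field. `script_from_schema_sketch`, its parameters, and the default `rows`/`partitions` values are assumptions.

```python
import json


def script_from_schema_sketch(fields, generator_name="dataspec",
                              rows=100000, partitions=4):
    """Return Python source text for a data generator spec.

    `fields` is a list of (column_name, sql_type) pairs; the generated
    code still needs a `spark` session in scope when it is executed.
    """
    lines = [
        "import dbldatagen as dg",
        "",
        f"{generator_name} = (",
        f"    dg.DataGenerator(spark, rows={rows}, partitions={partitions})",
    ]
    for name, sql_type in fields:
        # json.dumps yields a correctly escaped double-quoted string literal
        lines.append(f"    .withColumn({json.dumps(name)}, {json.dumps(sql_type)})")
    lines.append(")")
    return "\n".join(lines)


if __name__ == "__main__":
    print(script_from_schema_sketch([("id", "bigint"), ("name", "string")]))
```

Emitting source text rather than building the spec directly keeps the output editable: the generated script is meant as a starting point to refine with value ranges, distributions, and so on.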
2 changes: 1 addition & 1 deletion dbldatagen/__init__.py
@@ -27,7 +27,7 @@
from .datagen_constants import DEFAULT_RANDOM_SEED, RANDOM_SEED_RANDOM, RANDOM_SEED_FIXED, \
RANDOM_SEED_HASH_FIELD_NAME, MIN_PYTHON_VERSION, MIN_SPARK_VERSION
from .utils import ensure, topologicalSort, mkBoundsList, coalesce_values, \
-deprecated, parse_time_interval, DataGenError, split_list_matching_condition
+deprecated, parse_time_interval, DataGenError, split_list_matching_condition, strip_margins
from ._version import __version__
from .column_generation_spec import ColumnGenerationSpec
from .column_spec_options import ColumnSpecOptions
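This hunk newly exports `strip_margins` from `dbldatagen.utils`. Its body is not part of the diff; the name suggests a Python analogue of Scala's `stripMargin`, which is handy when writing generated code or SQL as indented multi-line strings. The sketch below assumes that behaviour: on each line, leading spaces/tabs followed by the margin character are removed, and lines without such a margin are left unchanged. `strip_margins_sketch` is hypothetical, and the library's signature and exact behaviour may differ.

```python
def strip_margins_sketch(s: str, margin_char: str = "|") -> str:
    """Strip a leading run of spaces/tabs plus `margin_char` from each line.

    Lines whose first non-blank character is not `margin_char` are kept as-is.
    """
    out = []
    for line in s.split("\n"):
        stripped = line.lstrip(" \t")
        if stripped.startswith(margin_char):
            # Drop the margin character itself; keep everything after it
            out.append(stripped[len(margin_char):])
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    query = """select *
              |from t
              |where x > 1"""
    print(strip_margins_sketch(query))
```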
2 changes: 1 addition & 1 deletion dbldatagen/_version.py
@@ -33,7 +33,7 @@ def get_version(version):
return version_info


-__version__ = "0.3.3post2" # DO NOT EDIT THIS DIRECTLY! It is managed by bumpversion
+__version__ = "0.3.4" # DO NOT EDIT THIS DIRECTLY! It is managed by bumpversion
__version_info__ = get_version(__version__)


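The version bump changes only the string literal; per the comment it is managed by bumpversion, and `__version_info__` is derived from it through `get_version`. The body of `get_version` lies outside this hunk, so the following is a hypothetical sketch of that kind of parsing: it splits a version string into numeric major/minor/patch parts and a trailing release suffix such as `post2`. `parse_version_sketch` and the `VersionInfo` fields shown are assumptions, not the package's definitions.

```python
import re
from collections import namedtuple

# Hypothetical structure; the package's own version-info fields may differ
VersionInfo = namedtuple("VersionInfo", ["major", "minor", "patch", "release"])

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(.*)$")


def parse_version_sketch(version: str) -> VersionInfo:
    """Parse strings like '0.3.4' or '0.3.3post2' into a VersionInfo."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, release = match.groups()
    return VersionInfo(int(major), int(minor), int(patch), release)


if __name__ == "__main__":
    print(parse_version_sketch("0.3.3post2"))
    print(parse_version_sketch("0.3.4"))
```

Comparing only the numeric prefix (`info[:3]`) is enough to order `0.3.4` after `0.3.3post2`; ordering release suffixes correctly in general would need a full PEP 440 parser such as `packaging.version`.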