Ronan Stokes edited this page Oct 21, 2022 · 17 revisions

Welcome to the data-generator wiki!

The Databricks data generator (dbldatagen) is available as a PyPI package at https://pypi.org/project/dbldatagen/.

Roadmap for initial release

steps:

  • soft release (with docs hosted on GitHub Pages)
  • package release (with docs hosted on GitHub Pages), with the data generator available as a package

Current release feature set:

  • Generation of data conforming to statistical distributions
  • Faker integration via a plugin mechanism
  • Support for generation of streaming data
  • Support for generation of multi-table data with consistency between primary and foreign keys
  • Support for generation of CDC-style (change data capture) data
  • Support for generation of IoT-style data
  • Streaming data generation in both the classic Databricks notebook environment and Delta Live Tables pipelines
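
To illustrate what "consistency between primary and foreign keys" means for multi-table data, here is a minimal plain-Python sketch. It does not use dbldatagen or Spark and is not how the library is implemented. The function names (`generate_customers`, `generate_orders`), the column names, and the distribution parameters are all hypothetical and chosen only for illustration.

```python
import random


def generate_customers(n, seed=42):
    """Build a parent table; customer_id is the primary key (hypothetical schema)."""
    rng = random.Random(seed)
    return [
        {"customer_id": 1000 + i, "segment": rng.choice(["retail", "business"])}
        for i in range(n)
    ]


def generate_orders(n, customers, seed=43):
    """Build a child table whose customer_id values are drawn only from the parent keys."""
    rng = random.Random(seed)
    parent_keys = [c["customer_id"] for c in customers]
    return [
        {
            "order_id": i,
            # Foreign key: always one of the existing primary keys.
            "customer_id": rng.choice(parent_keys),
            # Amount drawn from a normal distribution (assumed mean 50, std dev 10).
            "amount": round(rng.gauss(50.0, 10.0), 2),
        }
        for i in range(n)
    ]


customers = generate_customers(5)
orders = generate_orders(20, customers)
```

Because every foreign key is sampled from the parent table's keys, a join between `orders` and `customers` on `customer_id` never produces orphaned rows. Fixed seeds make the output repeatable across runs.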

Online Help
