Ronan Stokes edited this page Jul 16, 2021 · 17 revisions

Welcome to the data-generator wiki!

The test data generator is available for internal use at present.

It supports all major functionality and is code complete.

Roadmap for initial release

Steps:

  • soft release (with docs hosted on GitHub Pages)
  • package release (with docs hosted on ReadTheDocs), making the data generator installable as a package

Todo items:

  • make argument naming consistent across withColumn, withColumnSpec and withColumnSpecs

  • make function names consistent (following PySpark SQL conventions)

    • fix up the remaining public API method names (very few should remain) => will adopt camelCase throughout
    • fix up private method names => will adopt camelCase throughout
  • add doc sections on CDC and multi-table data generation
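The camelCase renaming planned in the todo items above can be illustrated with a small sketch. `to_camel_case` is a hypothetical helper, not part of the data generator. It assumes that names are converted by dropping underscores between words and capitalizing each following word. Leading underscores are kept so private methods stay private, and Python dunder methods such as `__init__` are left unchanged.

```python
def to_camel_case(name: str) -> str:
    """Hypothetical helper: convert a snake_case identifier to camelCase.

    Assumption: leading underscores (private methods) are preserved and
    dunder names such as __init__ are returned unchanged.
    """
    # Leave Python special methods (e.g. __init__) alone.
    if name.startswith("__") and name.endswith("__"):
        return name

    # Separate the leading underscores (privacy marker) from the body.
    body = name.lstrip("_")
    prefix = name[: len(name) - len(body)]

    # Split on underscores, ignoring empty fragments from doubled underscores.
    parts = [part for part in body.split("_") if part]
    if not parts:
        return name

    # Keep the first word as-is, capitalize the first letter of the rest.
    head, *tail = parts
    return prefix + head + "".join(part[:1].upper() + part[1:] for part in tail)


if __name__ == "__main__":
    for old_name in ["with_column_spec", "_get_column_type", "withColumn"]:
        print(old_name, "->", to_camel_case(old_name))
```

Names that are already camelCase, such as `withColumn`, pass through unchanged. Applying such a helper repeatedly is therefore safe during an incremental rename.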

Online Help

  • Go to the GitHub Pages site
  • click the "View deployment" link to access the latest help