fix docs #18

Merged · 1 commit · Apr 29, 2024
2 changes: 2 additions & 0 deletions README.md
@@ -35,12 +35,14 @@ Users can incorporate their logic for custom data transformation and then use the
distributed computing framework to scalably apply the transform to their data.

Features of the toolkit:

- Collection of [scalable transformations](transforms) to expedite user onboarding
- [Data processing library](data-processing-lib) designed to facilitate effortless addition and deployment of new scalable transformations
- Operates efficiently and seamlessly from laptop scale to cluster scale, supporting data processing at any data size
- [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/v1/introduction/)-based [workflow automation](kfp) of transforms

Data modalities supported:

* Code - support for code datasets as downloaded .zip files of GitHub repositories converted to
[parquet](https://arrow.apache.org/docs/python/parquet.html) files.
* Language - Future releases will provide transforms specific to natural language and, like the code transforms, will operate on parquet files.
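As a sketch of the code-modality flow described above, a downloaded repository archive can be walked with the standard library before its files are converted to parquet. The archive layout, helper name, and extension filter below are illustrative assumptions, not part of the toolkit:

```python
import io
import zipfile

def list_code_files(zip_bytes: bytes, extensions=(".py", ".java", ".c")) -> list[str]:
    """Return paths of source files inside a downloaded repository zip.

    The extension filter is an assumption for illustration; the toolkit's
    real selection logic may differ.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [name for name in zf.namelist() if name.endswith(extensions)]

# Build a tiny in-memory "repository" zip to demonstrate.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("repo-main/hello.py", "print('hi')\n")
    zf.writestr("repo-main/README.md", "# demo\n")

print(list_code_files(buf.getvalue()))  # only the .py file survives the filter
```

In the toolkit itself, the selected file contents would then be written into parquet tables for the transforms to consume.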
3 changes: 3 additions & 0 deletions data-processing-lib/doc/advanced-transform-tutorial.md
@@ -13,6 +13,7 @@ removes duplicate documents across all files. In this tutorial, we will show
the operation of our _noop_ transform.

The complete task involves the following:

* EdedupTransform - class that implements the specific transformation
* EdedupRuntime - class that implements custom TransformRuntime to create supporting Ray objects and enhance job output
statistics
@@ -39,6 +40,7 @@ First, let's define the transform class. To do this we extend
the base abstract/interface class
[AbstractTableTransform](../src/data_processing/transform/table_transform.py),
which requires definition of the following:

* an initializer (i.e. `__init__()`) that accepts a dictionary of configuration
data. For this example, the configuration data will only be defined by
command line arguments (defined below).
@@ -138,6 +140,7 @@ First, let's define the transform runtime class. To do this we extend
the base abstract/interface class
[DefaultTableTransformRuntime](../src/data_processing/ray/transform_runtime.py),
which requires definition of the following:

* an initializer (i.e. `__init__()`) that accepts a dictionary of configuration
data. For this example, the configuration data will only be defined by
command line arguments (defined below).
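The division of labor this tutorial describes, a runtime that provisions shared objects before the transforms start and enhances the job statistics afterwards, can be sketched without Ray. A plain set stands in for the Ray-side hash store, and the method names below are simplified assumptions modeled on the tutorial's description, not the library's exact API:

```python
class SketchRuntime:
    """Stand-in for a custom TransformRuntime such as EdedupRuntime.

    The real class creates supporting Ray objects (e.g. hash actors); a
    local set stands in here so the sketch runs without Ray installed.
    """

    def __init__(self, params: dict):
        self.params = params

    def get_transform_config(self, params: dict) -> dict:
        # Provision shared state for the transforms; with Ray this would be
        # an actor handle or object reference, not a local set.
        return {**params, "hash_store": set()}

    def compute_execution_stats(self, stats: dict) -> dict:
        # Enhance job output statistics, as EdedupRuntime is described doing.
        stats["unique_documents"] = len(self.params.get("hash_store", ()))
        return stats

config = SketchRuntime({}).get_transform_config({"doc_column": "contents"})
config["hash_store"].update({"h1", "h2"})   # transforms would add hashes here
runtime = SketchRuntime(config)
print(runtime.compute_execution_stats({"documents": 5}))
```

The point of the pattern is that per-worker transforms stay simple while cross-worker state lives in objects the runtime owns.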
2 changes: 2 additions & 0 deletions data-processing-lib/doc/overview.md
@@ -9,6 +9,7 @@ more complex transformations requiring coordination among transforming nodes.
This might include operations such as de-duplication, merging, and splitting.
The framework uses a plugin model for its primary functions. The key ones for
developers of data transformations are:

* [Transformation](../src/data_processing/transform/table_transform.py) - a simple, easily implemented interface that defines
the specifics of a given data transformation.
* [Transform Configuration](../src/data_processing/ray/transform_runtime.py) - defines
@@ -18,6 +19,7 @@ command line arguments specific to the transform, and the short name for the transform.
This might include provisioning of shared memory objects or creation of additional actors.

To learn more, consider the following:

* [Transform Tutorials](transform-tutorials.md)
* [Testing transformers with S3](using_s3_transformers.md)
* [Architecture Deep Dive](architecture.md)
3 changes: 3 additions & 0 deletions data-processing-lib/doc/simplest-transform-tutorial.md
@@ -15,11 +15,13 @@ in a single run of the transform.
the operation of our _noop_ transform.

We will **not** be showing the following:

* The creation of a custom TransformRuntime that would enable more global
state and/or coordination among the transforms running in other Ray actors.
This will be covered in an advanced tutorial.

The complete task involves the following:

* NOOPTransform - class that implements the specific transformation
* NOOPTableTransformConfiguration - class that provides configuration for the
NOOPTransform, specifically the command line arguments used to configure it.
@@ -37,6 +39,7 @@ First, let's define the transform class. To do this we extend
the base abstract/interface class
[AbstractTableTransform](../src/data_processing/transform/table_transform.py),
which requires definition of the following:

* an initializer (i.e. `__init__()`) that accepts a dictionary of configuration
data. For this example, the configuration data will only be defined by
command line arguments (defined below).
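The shape of the simplest transform this tutorial builds can be sketched in a few lines. The real NOOPTransform extends AbstractTableTransform and operates on pyarrow Tables; plain lists of dicts stand in here so the sketch runs anywhere, and the `sleep_sec` key is an assumed configuration name for illustration:

```python
class NOOPTransformSketch:
    """Minimal stand-in for NOOPTransform.

    The real class extends AbstractTableTransform and receives/returns
    pyarrow Tables; dict rows stand in here to avoid dependencies.
    """

    def __init__(self, config: dict):
        # Configuration normally arrives via command line arguments;
        # `sleep_sec` is a hypothetical key standing in for the real ones.
        self.sleep_sec = config.get("sleep_sec", 0)

    def transform(self, table: list[dict]):
        # A no-op: return the input unchanged, plus per-table metadata.
        metadata = {"nrows": len(table)}
        return [table], metadata

noop = NOOPTransformSketch({"sleep_sec": 0})
tables, meta = noop.transform([{"doc": "a"}, {"doc": "b"}])
print(meta)  # {'nrows': 2}
```

Note that `transform()` returns a list of tables plus metadata, which is what lets richer transforms split or drop tables while reporting what they did.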
5 changes: 5 additions & 0 deletions data-processing-lib/doc/transform-tutorials.md
@@ -13,6 +13,7 @@ In support of this model the class
[AbstractTableTransform](../src/data_processing/transform/table_transform.py)
is expected to be extended when implementing a transform.
The following methods are defined:

* ```__init__(self, config:dict)``` - an initializer through which the transform can be created
with implementation-specific configuration. For example, the location of a model, maximum number of
rows in a table, column(s) to use, etc.
@@ -37,6 +38,7 @@ not need this feature, a default implementation is provided to return an empty list.
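Why a transform would need this end-of-input hook is easiest to see with a transform that regroups rows across calls; it must be given a last chance to emit its remainder. A dependency-free sketch, with dict rows standing in for pyarrow Tables and assumed method signatures:

```python
class BatchingTransformSketch:
    """Illustrates the flush() hook: a transform that regroups rows into
    fixed-size batches holds a remainder that only flush() can emit."""

    def __init__(self, config: dict):
        self.batch_size = config.get("batch_size", 2)
        self.buffer: list[dict] = []

    def transform(self, table: list[dict]):
        # Accumulate rows, emitting only full batches.
        self.buffer.extend(table)
        out = []
        while len(self.buffer) >= self.batch_size:
            out.append(self.buffer[: self.batch_size])
            self.buffer = self.buffer[self.batch_size :]
        return out, {"emitted": len(out)}

    def flush(self):
        # Called once after the last transform(); the base class's default
        # implementation simply returns an empty list.
        out = [self.buffer] if self.buffer else []
        self.buffer = []
        return out, {"emitted": len(out)}

t = BatchingTransformSketch({"batch_size": 2})
batches, _ = t.transform([{"i": 1}, {"i": 2}, {"i": 3}])
tail, _ = t.flush()
print(len(batches), len(tail))  # 1 1
```

A purely row-wise transform never buffers anything, which is why the default empty-list `flush()` suffices for most implementations.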

### Running in Ray
When a transform is run using the Ray-based framework, a number of other capabilities are involved:

* [Transform Runtime](../src/data_processing/ray/transform_runtime.py) - this provides the ability for the
transform implementor to create additional Ray resources
and include them in the configuration used to create a transform
@@ -53,6 +55,7 @@ This also provides the ability to supplement the statistics collected by
implement `main()` that makes use of a Transform Configuration to start the Ray runtime and execute the transforms.

Roughly speaking, the following steps are completed to establish transforms in the RayWorkers:

1. Launcher parses the CLI parameters using an ArgumentParser configured with its own CLI parameters
along with those of the Transform Configuration,
2. Launcher passes the Transform Configuration and CLI parameters to the [RayOrchestrator](../src/data_processing/ray/transform_orchestrator.py)
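Step 1 above, a launcher-owned ArgumentParser extended with the Transform Configuration's own arguments, might look like the following; the flag names are illustrative assumptions, not the library's actual CLI:

```python
import argparse

def add_launcher_args(parser: argparse.ArgumentParser) -> None:
    # Arguments the launcher always owns (names here are hypothetical).
    parser.add_argument("--num_workers", type=int, default=1)

def add_transform_args(parser: argparse.ArgumentParser) -> None:
    # Contributed by one transform's Transform Configuration.
    parser.add_argument("--noop_sleep_sec", type=int, default=0)

parser = argparse.ArgumentParser(description="launcher sketch")
add_launcher_args(parser)
add_transform_args(parser)

args = parser.parse_args(["--num_workers", "3", "--noop_sleep_sec", "1"])
print(args.num_workers, args.noop_sleep_sec)  # 3 1
```

Sharing one parser is what lets a single command line configure both the framework (worker counts, data access) and the specific transform being run.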
@@ -171,6 +174,7 @@ With these basic concepts in mind, we start with a simple example and
progress to more complex transforms.
Before getting started, you may want to consult the
[transform project root readme](../../transforms/README.md) documentation.

* [Simplest transform](simplest-transform-tutorial.md) -
Here we will take a simple example to show the basics of creating a simple transform
that takes a single input Table, and produces a single Table.
@@ -180,6 +184,7 @@ resources (models, configuration, etc.) for a transform.
* [Porting from GUF 0.1.6](transform-porting.md)

Once a transform has been built, testing can be enabled with the testing framework:

* [Transform Testing](testing-transforms.md) - shows how to test a transform
independent of the Ray framework.
* [End-to-End Testing](testing-e2e-transform.md) - shows how to test the