Skip to content

DataProfiler

Steve Cote edited this page Apr 23, 2017 · 1 revision

The Data Profiler listener is a tool used to track and analyze the data frames read into and written from the framework. It is will track metrics related to each of the fields in the the Data Frames passed through the engine.

The output is broken into 2 main sections, input frames and output frames. The Input Section reports on the metrics of all the data read into the engine (from the source frame) and is sampled when the onRead(TransformContext) method is called. The Output Section represents the field metrics for the target frame and is sampled when the onWrite(TransformContext) is called. The output file is written when the onEnd(OperationalContext) is fired.

Both the input and output sections will contain SQL commands to create tables which will hold the data sampled and is useful when trying to load data into relational databases with the JDBCWriter.

Sample output:

Transform Performance:
Job Start: Wed Dec 30 10:27:57 EST 2015
Job End: Wed Dec 30 10:27:57 EST 2015
Elapsed: 21ms  -  0.021 sec
Records Read: 50 - 2,380.95 records per second
Records Written: 0 - 0.00 records per second

Input Data Profile:
Read Count: 50
Byte Count: 12293
Field Count: 12
Field         Type  Nulls    Empty    Blank    Unique   Coincidence Chars        Avg      Minimum  Maximum  Sdev     Bytes        Avg      Minimum  Maximum  SDev     
Role          STR   0        0        0        3         96.0       221          4        0        6        0        621          12       14       0        0       
FirstName     STR   0        0        0        50         2.0       328          6        0        11       1        978          19       24       0        1       
LastName      STR   0        0        0        50         2.0       342          6        0        16       2        942          18       28       0        2       
MiddleInitial STR   0        0        0        23        56.0       50           1        0        1        0        900          18       18       0        0       
Gender        STR   0        0        0        2         98.0       50           1        0        1        0        550          11       11       0        0       
Username      STR   0        0        0        50         2.0       352          7        0        8        1        952          19       20       0        1       
Password      STR   0        0        0        50         2.0       501          10       0        15       2        1101         22       27       0        2       
eMail         STR   0        0        0        50         2.0       859          17       0        20       1        1309         26       29       0        1       
License       STR   0        0        0        50         2.0       1800         36       0        36       0        2350         47       47       0        0       
Rating        STR   0        0        0        50         2.0       447          8        0        11       0        947          18       21       0        0       
Joined        STR   0        0        0        49         4.0       500          10       0        10       0        1000         20       20       0        0       
Visits        STR   0        0        0        50         2.0       143          2        0        3        0        643          12       13       0        0       
Totals                                                              5593                                             12293  12.00 KB

Table Creation for Input
H2: CREATE TABLE TableName( Role VARCHAR(6), FirstName VARCHAR(11), LastName VARCHAR(16), MiddleInitial VARCHAR(1), Gender VARCHAR(1), Username VARCHAR(8), Password VARCHAR(15), eMail VARCHAR(20), License VARCHAR(36), Rating VARCHAR(11), Joined VARCHAR(10), Visits VARCHAR(3) )
Oracle: CREATE TABLE DBUser.TableName ( Role VARCHAR2(6), FirstName VARCHAR2(11), LastName VARCHAR2(16), MiddleInitial VARCHAR2(1), Gender VARCHAR2(1), Username VARCHAR2(8), Password VARCHAR2(15), eMail VARCHAR2(20), License VARCHAR2(36), Rating VARCHAR2(11), Joined VARCHAR2(10), Visits VARCHAR2(3) )
MySQL: CREATE TABLE DBUser.TableName ( Role VARCHAR(6), FirstName VARCHAR(11), LastName VARCHAR(16), MiddleInitial VARCHAR(1), Gender VARCHAR(1), Username VARCHAR(8), Password VARCHAR(15), eMail VARCHAR(20), License VARCHAR(36), Rating VARCHAR(11), Joined VARCHAR(10), Visits VARCHAR(3) )

Output Data Profile:
Write Count: 0
Byte Count: 0
Field Count: 0

In the above example, only the input section is populated. This is because the configuration (see below) only contains a reader and a listener, no writer was specified and no output records were sampled.

It is a common practice to run a job with only a reader and a Data Profiler listener so the nature of the data can be observed. From the data in the Data Profiler output, it is easier to design an output strategy and configure the writer appropriately.

Configuration

All that is needed to use the Data Profiler is the following listener section added to your configuration:

"Listeners": {
	"DataProfiler": { "target": "users-profile.txt" }
}

The Data Profiler can be added to any existing set of listeners; it does not need to be run on its own.

Complete Example

The following is an example configuration which is included in the demo package. It illustrates a simple CSV reader reading a file and the Data Profiler collecting and analyzing data:

{
	"Class" : "coyote.dx.Job",
	"Job" : {
		"Name" : "Profiler",
		"Reader" : {
			"class" : "CSVReader",
			"source" : "[#$app.home#]/demo/users.csv",
			"header" : true
		},
		"Listeners": {
			"DataProfiler": { "target": "users-profile.txt" }
		}
	}

}

Of course, you can substitute any reader ( or writer for that matter) in the configuration and the Data Profiler will analyze their data as well.

Home

  1. Concepts
  2. Features
  3. Transform Engine
  4. Quick Start
  5. Configuration
  6. Secrets Vault
  7. Readers
  8. Writers
    • List of Writers
    • Custom Writers
  9. Filters
    • Accept
    • Reject
    • Custom Filters
  10. Tasks
    • List of Tasks
    • Custom Tasks
  11. Validators
    • List of Validators
    • Custom Validators
  12. Listeners
    • List of Listeners
    • Custom Listeners
  13. Transforms
    • List of Transforms
    • Custom Transforms
  14. Mappers
  15. Context
  16. Databases
  17. Templates
  18. Logging
  19. Encryption
  20. Usage
  21. Expressions
  22. Examples
Clone this wiki locally