-
Notifications
You must be signed in to change notification settings - Fork 2
DataProfiler
The Data Profiler listener is a tool used to track and analyze the data frames read into and written from the framework. It is will track metrics related to each of the fields in the the Data Frames passed through the engine.
The output is broken into 2 main sections, input frames and output frames. The Input Section reports on the metrics of all the data read into the engine (from the source frame) and is sampled when the onRead(TransformContext)
method is called. The Output Section represents the field metrics for the target frame and is sampled when the onWrite(TransformContext)
is called. The output file is written when the onEnd(OperationalContext)
is fired.
Both the input and output sections will contain SQL commands to create tables which will hold the data sampled and is useful when trying to load data into relational databases with the JDBCWriter
.
Sample output:
Transform Performance:
Job Start: Wed Dec 30 10:27:57 EST 2015
Job End: Wed Dec 30 10:27:57 EST 2015
Elapsed: 21ms - 0.021 sec
Records Read: 50 - 2,380.95 records per second
Records Written: 0 - 0.00 records per second
Input Data Profile:
Read Count: 50
Byte Count: 12293
Field Count: 12
Field Type Nulls Empty Blank Unique Coincidence Chars Avg Minimum Maximum Sdev Bytes Avg Minimum Maximum SDev
Role STR 0 0 0 3 96.0 221 4 0 6 0 621 12 14 0 0
FirstName STR 0 0 0 50 2.0 328 6 0 11 1 978 19 24 0 1
LastName STR 0 0 0 50 2.0 342 6 0 16 2 942 18 28 0 2
MiddleInitial STR 0 0 0 23 56.0 50 1 0 1 0 900 18 18 0 0
Gender STR 0 0 0 2 98.0 50 1 0 1 0 550 11 11 0 0
Username STR 0 0 0 50 2.0 352 7 0 8 1 952 19 20 0 1
Password STR 0 0 0 50 2.0 501 10 0 15 2 1101 22 27 0 2
eMail STR 0 0 0 50 2.0 859 17 0 20 1 1309 26 29 0 1
License STR 0 0 0 50 2.0 1800 36 0 36 0 2350 47 47 0 0
Rating STR 0 0 0 50 2.0 447 8 0 11 0 947 18 21 0 0
Joined STR 0 0 0 49 4.0 500 10 0 10 0 1000 20 20 0 0
Visits STR 0 0 0 50 2.0 143 2 0 3 0 643 12 13 0 0
Totals 5593 12293 12.00 KB
Table Creation for Input
H2: CREATE TABLE TableName( Role VARCHAR(6), FirstName VARCHAR(11), LastName VARCHAR(16), MiddleInitial VARCHAR(1), Gender VARCHAR(1), Username VARCHAR(8), Password VARCHAR(15), eMail VARCHAR(20), License VARCHAR(36), Rating VARCHAR(11), Joined VARCHAR(10), Visits VARCHAR(3) )
Oracle: CREATE TABLE DBUser.TableName ( Role VARCHAR2(6), FirstName VARCHAR2(11), LastName VARCHAR2(16), MiddleInitial VARCHAR2(1), Gender VARCHAR2(1), Username VARCHAR2(8), Password VARCHAR2(15), eMail VARCHAR2(20), License VARCHAR2(36), Rating VARCHAR2(11), Joined VARCHAR2(10), Visits VARCHAR2(3) )
MySQL: CREATE TABLE DBUser.TableName ( Role VARCHAR(6), FirstName VARCHAR(11), LastName VARCHAR(16), MiddleInitial VARCHAR(1), Gender VARCHAR(1), Username VARCHAR(8), Password VARCHAR(15), eMail VARCHAR(20), License VARCHAR(36), Rating VARCHAR(11), Joined VARCHAR(10), Visits VARCHAR(3) )
Output Data Profile:
Write Count: 0
Byte Count: 0
Field Count: 0
In the above example, only the input section is populated. This is because the configuration (see below) only contains a reader and a listener, no writer was specified and no output records were sampled.
It is a common practice to run a job with only a reader and a Data Profiler listener so the nature of the data can be observed. From the data in the Data Profiler output, it is easier to design an output strategy and configure the writer appropriately.
All that is needed to use the Data Profiler is the following listener section added to your configuration:
"Listeners": {
"DataProfiler": { "target": "users-profile.txt" }
}
The Data Profiler can be added to any existing set of listeners; it does not need to be run on its own.
The following is an example configuration which is included in the demo
package. It illustrates a simple CSV reader reading a file and the Data Profiler collecting and analyzing data:
{
"Class" : "coyote.dx.Job",
"Job" : {
"Name" : "Profiler",
"Reader" : {
"class" : "CSVReader",
"source" : "[#$app.home#]/demo/users.csv",
"header" : true
},
"Listeners": {
"DataProfiler": { "target": "users-profile.txt" }
}
}
}
Of course, you can substitute any reader ( or writer for that matter) in the configuration and the Data Profiler will analyze their data as well.
- Concepts
- Features
- Transform Engine
- Quick Start
- Configuration
- Secrets Vault
-
Readers
- List of Readers
- Custom Readers
-
Writers
- List of Writers
- Custom Writers
-
Filters
- Accept
- Reject
- Custom Filters
-
Tasks
- List of Tasks
- Custom Tasks
-
Validators
- List of Validators
- Custom Validators
-
Listeners
- List of Listeners
- Custom Listeners
-
Transforms
- List of Transforms
- Custom Transforms
- Mappers
- Context
- Databases
- Templates
- Logging
- Encryption
- Usage
- Expressions
- Examples