Config File Documentation

DonMartin76 edited this page Jul 24, 2015 · 17 revisions

The configuration of no-frills-transformation (NFT) is done using an XML file. NFT uses an XSD to validate the input; the XSD can be found in source control. If you have an XML editor which can take advantage of XSD files, it is highly recommended to put the XSD in a location where the editor can find it, so configuration files can be validated upfront.


## Basic Configuration

The most basic configuration file looks as follows:

<Transformation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://haufe.com Config.xsd">

  <Source config="[source config]">[source]</Source>
  <Target config="[target config]">[target]</Target>

  <Fields>
    <Field name="[field name1]">[field1 expression]</Field>
    <Field name="[field name2]">[field2 expression]</Field>
    ...
  </Fields>
</Transformation>

NFT reads from one source, writes to one target, using the given field definitions and expressions.

The syntax of the <Source> and <Target> tags depends on the data source to be used, which in turn depends on the Plugin; see the Plugin Documentation for more information on how to read data from various sources.


## Using Command Line Parameters

In almost any tag content or attribute, command line parameters may be used. Command line parameters are given with the NoFrillsTransformation.exe command line in the following form: parameter=value. Example:

C:\Temp> NoFrillsTransformation.exe my_transform.xml source=this_file.csv

Inside the configuration file, the parameter can be referenced using the §parameter§ notation, for the above example §source§; an example for a source tag making use of a parameter can look as follows:

<Source config="delim=';'">file://..\..\input\§source§</Source>

The content of the parameter is substituted into the source URL in this case. Parameter replacement takes place very early in the process, so it is safe to use parameters as part of file references, as shown here.

Reading Parameters from files

As an alternative, parameter values can be read from (text) files, using the @ prefix:

C:\Temp> NoFrillsTransformation.exe my_transform.xml basePath=@..\config\Dev\basePathDefinition.txt

NFT will read the content of the file ..\config\Dev\basePathDefinition.txt and assign it to the parameter basePath for further use inside the configuration file.

If the path contains space characters, use the following notation: param="@C:\My path\file.txt", i.e. put the @ character inside the double quotes.
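Both mechanisms can be combined: a parameter read from a file can be referenced anywhere in the configuration. A minimal sketch, assuming a parameter basePath was passed as basePath=@..\config\Dev\basePathDefinition.txt (the file content below is hypothetical):

```
<!-- Assuming the parameter file contains a single line such as C:\Data\Dev -->
<Source config="delim=';'">file://§basePath§\input.csv</Source>
```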

## Source Definition

There are two different tags that may be used for defining a data source for NFT:

For a single source (pattern), use:

<Source>...</Source>

or, for multiple sources (source patterns), use:

<Sources>
  <Source>...</Source>
  ...
</Sources>

Defining a single source

The normal case is that the data source consists of a single <Source> tag. The data will be read out of the defined data source:

<Source config="[configuration]">[prefix://][location]</Source>

The configuration is passed to the plugin, which in turn is selected using the prefix URI notation. See the Plugin Documentation for the concrete syntax for the supported data sources (like CSV). Examples for prefixes are file://, sqlite://, or soql:// (for Salesforce queries).

Depending on the data source, the configuration can be a reference to a configuration file (this is the case for the Salesforce and SAP plugins); in other cases it is a configuration string.
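As an illustrative sketch (file names and query are hypothetical; see the Plugin Documentation for the exact syntax), a CSV source takes a configuration string directly, while a Salesforce source would reference a configuration file:

```
<!-- Configuration string, interpreted by the CSV plugin -->
<Source config="delim=';'">file://input\contacts.csv</Source>

<!-- Reference to a configuration file, as used by the Salesforce plugin -->
<Source config="sfdc-config.xml">soql://SELECT Id, Name FROM Account</Source>
```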

Defining multiple sources

Instead of using just a single <Source> tag, multiple <Source> tags can be used within a containing <Sources> tag, like this:

<Sources>
  <Source config="[configuration1]">[prefix1://][location1]</Source>
  <Source config="[configuration2]">[prefix2://][location2]</Source>
  ...
</Sources>

The sources will be read sequentially in the order of appearance in the configuration file.

Some notes:

  • The data sources do not have to be of the same type; e.g. you can read from an SQL source first, then from a CSV file.
  • The source fields of the data sources must match! Different sets of field names are not supported. The field names are always read from the first data source defined.
  • You cannot use both <Source> and <Sources> at the top level in the configuration (this will result in an error).

Defining Sources using Wild Cards

For all data source plugins reading files (i.e. using the file:// prefix) — foremost the CSV Reader — you may use wild cards (* and ?) in the location definition. Example:

<Source config="delim=';'">file://my_table_??.csv</Source>

This will find all files matching the my_table_??.csv search pattern and create source readers for all of them, in the order they are found by the enumeration algorithm (which may, but is not guaranteed to, be alphabetical order).

Wild cards can be used in both cases: single source and multiple sources.

## Target Definition

Currently, only a single target for non-filtered output can be defined:

<Target config="[configuration]">[prefix://][location]</Target>

The logic behind the placeholders is the same as for the <Source> tag. What has to be passed in depends on the Plugin Configuration, so see there.

## FilterTarget Definition

Records which are filtered out can also be written to a separate target sink:

<FilterTarget config="[configuration]">[prefix://][location]</FilterTarget>

The logic behind the placeholders is the same as for the <Source> tag. What has to be passed in depends on the Plugin Configuration, so see there.

Records which are filtered out using the <SourceFilter> tag are written into this target. The FilterTarget may have a different field mapping than the normal Target, for logging data quality issues etc.

See also field mappings.
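A minimal sketch combining both targets (CSV plugin assumed, file names hypothetical); accepted records go to one file, filtered records to another:

```
<Target config="delim=';'">file://output\accepted.csv</Target>
<FilterTarget config="delim=';'">file://output\rejected.csv</FilterTarget>
```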

## Source Transformation Definition

A Source Transformation is defined as follows:

<SourceTransform>
  <Transform config="[configuration]">[prefix://][location]</Transform>
  <Parameters>
    <Parameter name="[param name]">[parameter expression]</Parameter>
    ...
  </Parameters>
  <Settings>
    <Setting name="[setting name]">[setting value]</Setting>
    ...
  </Settings>
</SourceTransform>

The <Transform> tag uses the same logic as the source tag, applied to the available transformers (see Plugin Documentation for a list of available transformers out of the box). How these values are interpreted is up to the plugin.

Then follows a list of parameters which are used for calling the transformation. All parameters are treated as Expressions, which are evaluated for each source record and passed on to the transformation.

In contrast to the expressions used for the parameters, the <Setting> tags are not defined in either usage or content by the NFT framework. It is up to the Plugin to define which types of settings are supported and which effects they have.

Using a transformation can be a very powerful tool; the SAP Transformer can for example be used to call an RFC with parameters from the data source, and pass on the result into the following mapping.
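A sketch of such a transformation; the prefix, configuration file name, RFC name and parameter name are illustrative assumptions, not the actual SAP plugin syntax (see the Plugin Documentation for that):

```
<SourceTransform>
  <!-- Hypothetical transformer reference and configuration file -->
  <Transform config="sap-config.xml">sap://Z_GET_CUSTOMER_DETAILS</Transform>
  <Parameters>
    <!-- Evaluated for each source record and passed to the transformation -->
    <Parameter name="CUSTOMER_ID">$CustomerId</Parameter>
  </Parameters>
</SourceTransform>
```

The fields returned by the transformation can then be used in the subsequent field mapping.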

## Outputting Source Fields

For convenience, NFT is able to output a text representation of a plain source-to-target mapping of the source data fields. Use the following tag to do this:

<OutputFields [noSizes="<true|false>"]>[true|false]</OutputFields>

The noSizes setting defaults to false; this means the output field definitions will contain "best guesses" of field lengths. The algorithm samples 100 source records and outputs the largest field length, rounded up to the nearest multiple of 10. It is recommended to switch this feature off (i.e. set noSizes to true).

The stdout output of NFT will then output a complete "nop" mapping. Please note that you must still define a dummy mapping in order for the configuration file to validate successfully against the XSD.

You can copy and paste the following configuration file snippet:

<OutputFields noSizes="true">true</OutputFields>

<Fields />

Add this and run NFT with valid source and target definitions to receive a sample mapping.

## Mapping Definition

The mapping of the output fields takes place inside the following XML structure:

<Fields [appendSource="[<true|false>]"]>
  <Field name="[field name]" [config="[field config]"]>[field expression]</Field>
  ...
</Fields>

The parameters of the field definition are as follows:

  • Attribute appendSource: Specify this in order to use the source fields in addition to the fields you specify here. The other fields will be appended after the source fields in the output target.
  • Attribute name - field name: required attribute which defines the field name in the target. This may currently not contain command line parameters.
  • Attribute config - field config: optional attribute which does not have pre-defined semantics. The target writer (see Plugin Documentation) may use this as additional information to achieve certain effects; e.g. the Salesforce Writer uses this to signal that an external ID is to be used for referencing Lookup types, or the Oracle Writer uses this as a hint at the underlying database field type.
  • field expression: An Expression defining the content of the field in the output target. This can be a source field, a transformation output field, and/or a combination, i.e. an arbitrary expression.

Example:

<Fields>
  <Field name="FirstName">$FirstName</Field>
  <Field name="LastName">$LastName</Field>
  <Field name="FullName">Concat(Concat($FirstName, " "), $LastName)</Field>
</Fields>

This simple transformation outputs three fields from two source fields ($FirstName and $LastName); the first and last names are simply copied from the source using the field operator, the FullName uses an expression using the Concat Operator. For a full list of out of the box operators, see Operator Documentation.

Example:

<Fields appendSource="true" />

This will produce a 1:1 copy of the input source.

FilterField mappings

Equivalent to the <Fields> tag, there can exist a <FilterFields> tag which gives the field mappings for the FilterTarget definition:

<FilterFields [appendSource="[<true|false>]"]>
  <Field name="[field name]" [config="[field config]"]>[field expression]</Field>
  ...
</FilterFields>

The syntax is exactly the same as above, just that these field definitions are used for the filter target.

Example: A useful standard filter fields definition is the following:

<FilterFields appendSource="true">
  <Field name="Message">FilterLogLastMessage()</Field>
</FilterFields>

This outputs the rejected source record with its FilterLog message (in case you make use of the FilterLog operator).

See also:

## Include Files

For recurring lookup maps and custom operators, it is useful to create include files. Any valid configuration file (according to the XSD) can be used as an include file:

<Includes>
  <Include>[include config file]</Include>
  ...
</Includes>

The following tags are evaluated in include files: lookup map definitions (<LookupMaps>) and custom operator definitions (<CustomOperators>).

Any other tags in the include configuration file are ignored, especially source and target definitions.

Pro-tip

It is sometimes convenient to create include files for the most common lookup maps, which tend to be used repeatedly across transformations.
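As a sketch, an include file is just a regular configuration file validating against the XSD (file, map and field names below are hypothetical); here it defines a single lookup map:

```
<!-- includes\common_lookups.xml (hypothetical include file) -->
<Transformation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://haufe.com Config.xsd">
  <LookupMaps>
    <LookupMap name="CountryName" keyField="CountryCode">
      <Source config="delim=';'">file://lookups\countries.csv</Source>
    </LookupMap>
  </LookupMaps>
</Transformation>
```

The main configuration file then references this file inside its <Includes> tag, and the CountryName lookup becomes available in its expressions.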

## Lookup Maps Definition

A lookup map can be read from an arbitrary source, which makes this a powerful way of mapping values from one domain to another.

<LookupMaps>
  <LookupMap name="[lookup name]" keyField="[key field]" [noFailOnMiss="<true|false>"]>
    <Source config="[lookup config]">[prefix://][location]</Source>
  </LookupMap>
  ...
</LookupMaps>

The lookup name defines under which name the lookup map can be called; this in effect defines an operator which can be used in any expression. The name must not collide with any other operator name. See Lookup Operator for comprehensive documentation of the lookup operator. The content of the key field in the source must be unique for the lookup map to work correctly.

The optional attribute noFailOnMiss enables configuring what happens if the lookup map does not contain a given key. Pass true for the lookup map to return an empty string "" instead of throwing an exception causing the processing to cancel. Default behaviour is failing when missing (i.e., false).

The <Source> behaves exactly the same as the normal data source tag, meaning the lookup config, prefix and location depend on the Plugin for the data source.
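A concrete sketch (map name, key field and file name are hypothetical), reading a lookup map from a CSV file and tolerating missing keys:

```
<LookupMaps>
  <LookupMap name="StatusText" keyField="StatusCode" noFailOnMiss="true">
    <!-- Missing StatusCode keys yield "" instead of cancelling the run -->
    <Source config="delim=';'">file://lookups\status_codes.csv</Source>
  </LookupMap>
</LookupMaps>
```

The map can then be called as an operator named StatusText inside any expression; see the Lookup Operator documentation for the exact call syntax.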

See also: Lookup Operator

## Custom Operators Definition

A custom operator is defined in one of the following ways:

Using the Function tag

<CustomOperators>
  <CustomOperator name="[operator name]" paramCount="[param count]" returnType="[return type]">
    <Parameters>
      <Parameter name="[param name1]" type="[param type1]"/>
      ...
    </Parameters>
    <Function>[custom operator expression]</Function>
  </CustomOperator>
</CustomOperators>

The operator name gives the name of the custom operator. It may then be used as any other operator in any expression. Please note that the custom operators are evaluated top-down, i.e. if you use custom operators from other operators, they must be defined in the correct order.

The param count denotes the number of parameters the custom operator takes. This has to match the parameters defined in the <Parameters> tag (see below).

The attribute returnType gives the return type of the custom operator. Supported types are: string, bool and int.

The parameters must be defined in name and type using the <Parameter> tag. The parameters can then be used inside the custom operator expressions using the Parameter Operator %. Example:

<CustomOperator name="MakeEMail" paramCount="2" returnType="string">
  <Parameters>
    <Parameter name="firstName" type="string"/>
    <Parameter name="lastName" type="string"/>
  </Parameters>
  <Function>Concat(Concat(Concat(%firstName, "."), %lastName), "@mydomain.com")</Function>
</CustomOperator>

Inside a field definition, this may then be used as a normal operator, operating e.g. on source data fields:

<Field name="EMail">MakeEMail($FirstName, $LastName)</Field>

Note: It is not mandatory to use parameters in a custom operator. The custom operator may have no parameters, and instead just reference data source fields using the field operator $. The custom operator is then referenced using an empty parameter list: MyOperator().
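A sketch of such a parameterless custom operator referencing source fields directly (operator and field names are hypothetical; the exact empty-parameter syntax may depend on the XSD):

```
<CustomOperators>
  <CustomOperator name="FullName" paramCount="0" returnType="string">
    <Parameters />
    <!-- References source fields directly via the field operator $ -->
    <Function>Concat(Concat($FirstName, " "), $LastName)</Function>
  </CustomOperator>
</CustomOperators>
```

In a field definition, this would then be called as <Field name="Name">FullName()</Field>.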

Using the Switch tag

<CustomOperators>
  <CustomOperator name="[operator name]" paramCount="[param count]" returnType="[return type]">
    <Parameters>
      <Parameter name="[param name1]" type="[param type1]"/>
      ...
    </Parameters>
    <Switch>
      <Case condition="[condition expression1]">[case expression1]</Case>
      ...
      <Otherwise>[otherwise expression]</Otherwise>
    </Switch>
  </CustomOperator>
</CustomOperators>

The attributes operator name, param count and return type, as well as the definition of the operator parameters in <Parameters>, are exactly the same for this syntax.

The only difference is that instead of the <Function> tag, a switch-case syntax is used.

The conditions are evaluated in order of appearance of the <Case> tags. The condition expression (following the normal Expression Syntax) must evaluate to a boolean value; if the condition returns true, the case expression is subsequently evaluated and returned by the custom operator.

If none of the <Case> conditions are met, the (required) <Otherwise> expression is evaluated (again as an expression) and returned.

Please note the existence of the Error Operator which cancels the processing in case it is called. This operator can be useful for checking for data errors.
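A sketch of a switch-based custom operator; the comparison operator (Equals) and the Error operator's signature are assumptions here, so check the Operator Documentation for the actual names:

```
<CustomOperator name="GenderFromSalutation" paramCount="1" returnType="string">
  <Parameters>
    <Parameter name="salutation" type="string"/>
  </Parameters>
  <Switch>
    <!-- Equals is an assumed comparison operator -->
    <Case condition='Equals(%salutation, "Mr.")'>"male"</Case>
    <Case condition='Equals(%salutation, "Mrs.")'>"female"</Case>
    <!-- Cancels processing on unexpected data (assumed Error signature) -->
    <Otherwise>Error("Unknown salutation")</Otherwise>
  </Switch>
</CustomOperator>
```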

See also Using Custom Operators.

## Source Filters Definition

Filters on the source data can be defined. If the filter conditions are not met, the source record is not processed, and NFT moves on to the next record. Source filters are defined as follows:

<FilterMode>[AND|OR]</FilterMode>
<SourceFilters>
  <SourceFilter>[filter expression]</SourceFilter>
  ...
</SourceFilters>

The <FilterMode> takes either AND or OR as tag text; if set to AND, all conditions in the <SourceFilters> list must be met. If set to OR, at least one condition must be met. The evaluation of the filters stops as soon as one condition is not met (in the AND case), or as soon as the first condition is met (in the OR case). The order of evaluation is the order of appearance in the configuration file.

The filter expression must follow the usual Expression Syntax and may contain any operator defined (including lookup maps and custom operators).

There exist special filter operators which may be useful for handling specific situations, like filtering for duplicates (FilterDuplicate Operator) or just finding keys once (FilterOnce Operator).
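A sketch combining filter mode and a filter operator; the FilterDuplicate call signature is an assumption (see the Operator Documentation for the actual syntax):

```
<FilterMode>AND</FilterMode>
<SourceFilters>
  <!-- Filter out records with duplicate Email values (assumed signature) -->
  <SourceFilter>FilterDuplicate($Email)</SourceFilter>
</SourceFilters>
```

Records rejected here are written to the <FilterTarget>, if one is defined.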

## Operator Configuration

Some operators support configuration (e.g., the FileWriteText Operator). The format of the configuration depends on the operator, and is treated as a plain string.

<OperatorConfigs>
  <OperatorConfig name="[operator name]">[config string]</OperatorConfig>
</OperatorConfigs>

Example:

<OperatorConfigs>
  <OperatorConfig name="FileWriteText">iso-8859-1</OperatorConfig>
</OperatorConfigs>

The string in the <OperatorConfig> text is passed to the operator for configuration. What the string actually means is defined by the operator; in this case, the FileWriteText operator expects an encoding as configuration string and will use this encoding for writing the text file.

## Logging Configuration

The syntax for the logging tag is as follows:

<Logger type="[type]" level="[level]">[configuration]</Logger>

The type of the logger depends on the available logging plugins (see Extending with Plugins), but out of the box, NFT comes with the std, file and nil loggers. See below for a detailed description.

The level has to be one of info, warning or error.

As to the configuration of the logger, this depends on the plugin providing the logging functionality. See below for details.

Note: The default, if a <Logger> is not provided, is equivalent to:

<Logger type="std" level="info"/>

Logger type std

The logger type std simply outputs logging information to the console (stdout). It does not require any configuration; the config string is ignored, if it is present.

<Logger type="std" level="warning" />

This configuration outputs all errors and warnings to the console.

If no logger is given explicitly, the default is the std logger at info level (quite verbose logging).

Logger type file

The logger type file outputs the log information to a file. The file name must be specified as the configuration string of the logger component.

<Logger type="file" level="error">C:\Temp\log.txt</Logger>

This configuration outputs only errors to a log file at C:\Temp\log.txt.

Logger type nil

The nil logger type suppresses all kinds of log messages. You still get the exit code of the process though.

<Logger type="nil" />

The log level is not important in this case, as all logs are suppressed anyway. The same applies to the configuration: there is none.

## Progress Tick Configuration

By default, NFT outputs a status message for every 1000 source records, stating Processed <n> records. The interval at which these messages (as INFO logs) occur can be tweaked using the following syntax:

<Transformation>
  ...
  <ProgressTick>[tickInterval]</ProgressTick>
  ...
</Transformation>

Use a negative value for tickInterval to completely turn off the progress ticks:

<ProgressTick>-1</ProgressTick>

Set a value which matches your input data and target speed, e.g. let NFT output a message every 25 records:

<ProgressTick>25</ProgressTick>