v0.0.22 New Transform "modify" in beta state
Context
Historically, Jolt has focused on fixing / reshaping the "format" of the input JSON. It did not have a good Transform for modifying the actual data, namely the "right hand side" values of the input.
"modify" grew out of a substantial refactoring of Shiftr.
- The "left hand side" of the spec is basically Shiftr; it does a similar parallel tree walk of the input and the spec, and reuses a lot of the Shiftr wildcard logic.
- The "right hand side" of modify determines what will be put in the data, while the "right hand side" of shift determines where the existing data should be moved to in the new output Map.
Note that Modify operates in place on the same in-memory copy of the data, whereas Shift creates a new top-level output Map to populate.
Usage and Special Characters
Modify comes in 3 "flavors" that control how it operates on the input data, both at the leaf level and while it is walking the tree.
- "modify-overwrite-beta" -- (always writes)
- "modify-default-beta" -- (writes when key/index is missing or the value at key/index is null)
- "modify-define-beta" -- (writes when key/index is missing)
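The write rule behind each flavor can be sketched in plain Java. This is a minimal model of the three rules listed above, not Jolt's implementation; the class `ModifyFlavorSketch`, the `Flavor` enum, and both method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the three flavors; none of these names come from Jolt.
final class ModifyFlavorSketch {

    enum Flavor { OVERWRITE, DEFAULT, DEFINE }

    /** Decides whether the given flavor would write to this key. */
    static boolean shouldWrite(Flavor flavor, Map<String, Object> data, String key) {
        switch (flavor) {
            case OVERWRITE: return true;                    // always writes
            case DEFAULT:   return data.get(key) == null;   // key missing, or value is null
            case DEFINE:    return !data.containsKey(key);  // key missing only
            default:        throw new IllegalArgumentException("unknown flavor");
        }
    }

    /** Writes the value in place when the flavor allows it. */
    static void apply(Flavor flavor, Map<String, Object> data, String key, Object value) {
        if (shouldWrite(flavor, data, key)) {
            data.put(key, value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>();
        data.put("state", null);                        // key present, value null

        apply(Flavor.DEFINE, data, "state", "Texas");   // no write: the key exists
        System.out.println(data);                       // prints {state=null}

        apply(Flavor.DEFAULT, data, "state", "Texas");  // writes: the value is null
        System.out.println(data);                       // prints {state=Texas}
    }
}
```

The only difference between "default" and "define" is how a key that is present with a null value is treated.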
The idea is to pick the base flavor you want, and then tweak modify's behavior on a case by case basis by applying node specific overrides.
- "+key": "..." ("+" means overwrite)
- "~key": "..." ("~" means default)
- "_key": "..." ("_" means define)
Additionally, a "?" suffix can suppress the modify operation, so that it only operates if the key/index exists in the input.
- "key?": "..." ("?" means only act if the input contains that key/index)
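One way to picture the special characters is as decorations that are stripped off a spec key before it is matched. The sketch below assumes exactly that and ignores escaping; `ModifyKeySketch`, `Mode`, and `parse` are hypothetical names, and `INHERIT` stands for "no override, use the base flavor".

```java
// Hypothetical parser for decorated spec keys; not Jolt's real key handling.
final class ModifyKeySketch {

    enum Mode { OVERWRITE, DEFAULT, DEFINE, INHERIT }

    final Mode mode;          // node-specific override, or INHERIT for the base flavor
    final String name;        // the key with the decorations removed
    final boolean mustExist;  // true when the key carried a "?" suffix

    private ModifyKeySketch(Mode mode, String name, boolean mustExist) {
        this.mode = mode;
        this.name = name;
        this.mustExist = mustExist;
    }

    /** Splits a raw spec key into override mode, plain name, and "?" flag. */
    static ModifyKeySketch parse(String rawKey) {
        Mode mode = Mode.INHERIT;
        String name = rawKey;
        if (name.startsWith("+")) {
            mode = Mode.OVERWRITE;
        } else if (name.startsWith("~")) {
            mode = Mode.DEFAULT;
        } else if (name.startsWith("_")) {
            mode = Mode.DEFINE;
        }
        if (mode != Mode.INHERIT) {
            name = name.substring(1);                       // drop the prefix character
        }
        boolean mustExist = name.endsWith("?");
        if (mustExist) {
            name = name.substring(0, name.length() - 1);    // drop the "?" suffix
        }
        return new ModifyKeySketch(mode, name, mustExist);
    }

    public static void main(String[] args) {
        ModifyKeySketch key = parse("~state");
        // prints: DEFAULT state false
        System.out.println(key.mode + " " + key.name + " " + key.mustExist);
    }
}
```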
Example: say we want to process a document that may or may not have an "address" subsection.
Spec :
{
  "address?" : { // if "address" exists in the input JSON, then match; otherwise skip
    "~state" : "Texas" // if "state" does not exist or is null, then set it to "Texas"
  }
}
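To make the effect of this spec concrete, here is a hand-written Java equivalent of what it asks for. It is a sketch of the described semantics only, not code generated by or taken from Jolt; `AddressDefaultSketch` is a hypothetical name, and treating a non-Map "address" as "skip" is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hand-written equivalent of: { "address?" : { "~state" : "Texas" } }
final class AddressDefaultSketch {

    @SuppressWarnings("unchecked")
    static Map<String, Object> apply(Map<String, Object> input) {
        Object address = input.get("address");
        // "address?" : only descend when the input already has an "address" map
        if (address instanceof Map) {
            Map<String, Object> addressMap = (Map<String, Object>) address;
            // "~state" : default semantics, write when the key is missing or null
            if (addressMap.get("state") == null) {
                addressMap.put("state", "Texas");
            }
        }
        return input; // the same map, modified in place
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Austin");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("address", address);

        // prints {address={city=Austin, state=Texas}}
        System.out.println(apply(doc));

        // A document without "address" is left untouched: prints {name=Bob}
        Map<String, Object> other = new LinkedHashMap<>();
        other.put("name", "Bob");
        System.out.println(apply(other));
    }
}
```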
Functions
Everything on the "right hand side" of modify is actually a "function". In the example above, the "right hand side" value "Texas" is actually the function "insert this literal value".
Beyond that, you can invoke "named" functions by using the "=" special character.
Example : Say the input has a list of scores, and we want to pull individual values out of the list.
input :
{
  "scores" : [ 4, 2, 8, 7, 5 ]
}
spec :
{
  // Pull individual data out of the scores array
  "firstScore" : "=firstElement(@(1,scores))",
  "lastScore" : "=lastElement(@(1,scores))",
  // Assuming that the scores array always has 5 elements, index 2 is the midpoint
  "scoreAtMidPoint" : "=elementAt(@(1,scores),2)"
}
output :
{
  "scores" : [ 4, 2, 8, 7, 5 ],
  "firstScore" : 4,
  "lastScore" : 5,
  "scoreAtMidPoint" : 8
}
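The three list functions used here behave like the following Java sketch, which reproduces the output above for the same scores. It is an illustrative model and not Jolt's code: `ListFunctionSketch` is a hypothetical class, and returning an empty `java.util.Optional` for a missing list or an out-of-range index is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for firstElement, lastElement and elementAt.
final class ListFunctionSketch {

    static Optional<Object> firstElement(List<?> list) {
        return elementAt(list, 0);
    }

    static Optional<Object> lastElement(List<?> list) {
        return list == null ? Optional.empty() : elementAt(list, list.size() - 1);
    }

    /** Zero-based lookup; empty when the list is null or the index is out of range. */
    static Optional<Object> elementAt(List<?> list, int index) {
        if (list == null || index < 0 || index >= list.size()) {
            return Optional.empty();
        }
        Object value = list.get(index);
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        List<Integer> scores = Arrays.asList(4, 2, 8, 7, 5);
        System.out.println(firstElement(scores).get());  // prints 4
        System.out.println(lastElement(scores).get());   // prints 5
        System.out.println(elementAt(scores, 2).get());  // prints 8
    }
}
```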
Available functions:
Existence
- isPresent - returns if arg[0] is present
- notNull - returns if arg[0] is not null
- isNull - returns if arg[0] is null
Strings
- toUpper - returns the Uppercased version of the input
- toLower - returns the Lowercased version of the input
- concat - String concatenate all the supplied arguments
Lists
- toList - returns args as list
- firstElement - returns first element
- lastElement - returns last element
- elementAt - returns the element at the given (zero-based) index
Math
- max(args) - returns max element from the list of args
  - supports int, long, double, and their toString() values
- min(args) - returns min element from the list of args
  - supports int, long, double, and their toString() values
- abs - returns abs of value
  - supports int, long, double, and their toString() values
  - supports list of the same inputs, returns list
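"Supports int, long, double, and their toString() values" can be read as: each argument is either a number or a String that parses as one. A minimal Java sketch of max and min under that reading follows. `MathFunctionSketch` is hypothetical, and two choices in it are assumptions rather than documented Jolt behavior: non-numeric arguments are skipped, and values are compared as doubles (which loses precision for very large longs).

```java
import java.util.Optional;

// Hypothetical model of max/min over mixed numeric and String arguments.
final class MathFunctionSketch {

    /** Returns the argument as a Number, or null when it is not numeric. */
    static Number toNumber(Object arg) {
        if (arg instanceof Integer || arg instanceof Long || arg instanceof Double) {
            return (Number) arg;
        }
        if (arg instanceof String) {
            String text = ((String) arg).trim();
            try { return Integer.valueOf(text); } catch (NumberFormatException ignored) { }
            try { return Long.valueOf(text); } catch (NumberFormatException ignored) { }
            try { return Double.valueOf(text); } catch (NumberFormatException ignored) { }
        }
        return null;
    }

    static Optional<Number> max(Object... args) {
        Number best = null;
        for (Object arg : args) {
            Number n = toNumber(arg);
            if (n != null && (best == null || n.doubleValue() > best.doubleValue())) {
                best = n;
            }
        }
        return Optional.ofNullable(best);   // empty when no argument was numeric
    }

    static Optional<Number> min(Object... args) {
        Number best = null;
        for (Object arg : args) {
            Number n = toNumber(arg);
            if (n != null && (best == null || n.doubleValue() < best.doubleValue())) {
                best = n;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        System.out.println(max(4, "8", 2.5).get());  // prints 8
        System.out.println(min(4, "8", 2.5).get());  // prints 2.5
        System.out.println(max("abc").isPresent());  // prints false
    }
}
```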
Type Conversion
- toInteger - returns toInteger()
- toLong - returns toLong()
- toDouble - returns toDouble()
Note: all of the Type Conversion functions support
- int, long, double, and their toString() values
- list of the same inputs, returns list
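The two kinds of supported input (a single value, or a list of such values) suggest a recursive shape, sketched here for toInteger. This is a model under stated assumptions, not Jolt's implementation: `TypeConversionSketch` is hypothetical, fractional values are truncated, and list items that cannot be converted are kept unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of a type conversion function that accepts values or lists.
final class TypeConversionSketch {

    static Optional<Object> toInteger(Object arg) {
        if (arg instanceof List) {
            // List in, list out: convert each item on its own
            List<Object> converted = new ArrayList<>();
            for (Object item : (List<?>) arg) {
                converted.add(toInteger(item).orElse(item)); // assumption: keep unconvertible items
            }
            return Optional.<Object>of(converted);
        }
        if (arg instanceof Integer || arg instanceof Long || arg instanceof Double) {
            return Optional.<Object>of(Integer.valueOf(((Number) arg).intValue()));
        }
        if (arg instanceof String) {
            try {
                // Parsing as double accepts the toString() form of int, long and double
                double parsed = Double.parseDouble(((String) arg).trim());
                return Optional.<Object>of(Integer.valueOf((int) parsed));
            } catch (NumberFormatException e) {
                return Optional.empty();
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(toInteger("42").get());                          // prints 42
        System.out.println(toInteger(3.7).get());                           // prints 3
        System.out.println(toInteger(Arrays.asList("1", 2L, 3.9)).get());   // prints [1, 2, 3]
    }
}
```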
Elvis Operator
Use an Array on the "right hand side" to specify a series of functions to run / try until one returns a result other than Optional.absent().
- "key": [ "=func1", "=func2" ]
Purpose: this allows for looking up a value and, if it is not found, applying a default.
Example: back to the "scores" example from above,
spec :
{
  // if the input document did not have a "scores" entry,
  // or it was empty,
  // or it did not contain any 'numbers'
  // then fall back to null and zero
  "maxScore" : [ "=max(@(1,scores))", null ],
  "minScore" : [ "=min(@(1,scores))", 0 ]
}
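The "try each entry until one produces a value" behavior can be sketched in Java as a loop over candidates. The sketch models only the "minScore" line, because `java.util.Optional` (used here in place of the Optional type mentioned above) cannot hold the null fallback of "maxScore". `ElvisSketch`, `firstPresent`, `minOf`, and `minScore` are hypothetical names, and `minOf` is a simplified stand-in for "=min(...)".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical model of the Elvis operator: first candidate with a result wins.
final class ElvisSketch {

    static Optional<Object> firstPresent(List<Supplier<Optional<Object>>> candidates) {
        for (Supplier<Optional<Object>> candidate : candidates) {
            Optional<Object> result = candidate.get();
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    /** Simplified stand-in for "=min(...)": smallest Number in a list, else empty. */
    static Optional<Object> minOf(Object maybeList) {
        if (!(maybeList instanceof List)) {
            return Optional.empty();
        }
        Number best = null;
        for (Object item : (List<?>) maybeList) {
            if (item instanceof Number
                    && (best == null || ((Number) item).doubleValue() < best.doubleValue())) {
                best = (Number) item;
            }
        }
        return Optional.<Object>ofNullable(best);
    }

    /** Models "minScore" : [ "=min(@(1,scores))", 0 ]. */
    static Object minScore(Map<String, Object> doc) {
        List<Supplier<Optional<Object>>> chain = new ArrayList<>();
        chain.add(() -> minOf(doc.get("scores")));   // first try the function
        chain.add(() -> Optional.<Object>of(0));     // then fall back to the literal 0
        return firstPresent(chain).get();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        System.out.println(minScore(doc));           // prints 0 (no "scores" entry)
        doc.put("scores", Arrays.asList(4, 2, 8, 7, 5));
        System.out.println(minScore(doc));           // prints 2
    }
}
```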
Details about the Java Implementation
Introduced a Function interface, marked as @deprecated as a warning
  - It is a work in progress, and implementing it outside Jolt is discouraged.
Changed the baseSpec#apply(...) signature to supply the availability of the input
  - via Optional, which is known at the lower level
  - Matched the signature change in Shiftr and Cardinality, so it is now possible to introduce "?" into them if needed
Beta
The new Modify Transform in this release is sufficient for the Bazaarvoice internal project that needs it. Use cases beyond that are not fully thought out / tested.
In general, it still feels like it needs some work / polish before we consider it done, but Type conversion and String concatenation are compelling and often requested features.
Plans
Things to do to finish the "beta"
- Make sure all the special characters can be "escaped".
- If possible, fully implement the existing behavior of the current "cardinality" and "default" transforms as functions of Modify, so they can be deprecated / removed.
  - The "default" transform is dated and clunky, as it has not been refactored and curated like Shiftr has.
- Expand the set of built in functions.
  - min and max work on lists
  - average
  - toBoolean
  - toString
  - length of list, string
  - String join : concat is good, but it can have a fencepost problem
  - sort
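The fencepost problem mentioned for concat shows up as soon as a list of unknown length has to be joined with a separator. The Java sketch below illustrates the difference; `JoinSketch` and its methods are hypothetical and only mimic the idea of the functions, they are not Jolt code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical illustration of why a join function differs from concat.
final class JoinSketch {

    /** concat-style: appends every argument exactly as given. */
    static String concat(Object... parts) {
        StringBuilder sb = new StringBuilder();
        for (Object part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    /** Naive use of concat over a list: one separator per item, so one too many. */
    static String naiveJoin(String separator, List<?> items) {
        String out = "";
        for (Object item : items) {
            out = concat(out, item, separator);
        }
        return out;
    }

    /** A join places separators only between items. */
    static String join(String separator, List<?> items) {
        StringJoiner joiner = new StringJoiner(separator);
        for (Object item : items) {
            joiner.add(String.valueOf(item));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c");
        System.out.println(naiveJoin(",", items));  // prints a,b,c,
        System.out.println(join(",", items));       // prints a,b,c
    }
}
```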
After beta
- Allow for users to specify their own functions.
  - Tricky, as one wants to provide some guard rails, but at the same time not limit what people can do.
- Allow for a "registry" of Transforms and Functions.
  - But not as a simple static Map, as that can cause problems if two libraries internally use Jolt.
- Maybe "backport" the functions to Shiftr to allow for some interesting use-cases.
Even further out
- Allow functions to be specified inline with the spec in some kind of scripting language, e.g. JavaScript or Groovy
  - Doubly tricky
    - Build and arrangement of Jars, so that "jolt-core" does not suddenly have a Groovy dependency
    - The interface between Jolt code and the script language.