Home
This fork includes pull request #1294 from @claussni, which extends cross(). The cross function now accepts a fourth parameter: a regular expression used as a separator to split multi-value cells when joining projects.
WARNING: This fork will not be maintained, and the code is not compatible with OpenRefine 3 or later. Another solution for the use case described below works with newer versions of OpenRefine; see this comment and this gist with Jython code.
- the original cross function expects normalized data (one foreign key per cell in the base column). If a cell holds multiple key values, you need to split them into multiple rows before you apply cross, and join the results afterwards. This can be quite "expensive" with larger datasets.
- the extended cross function in this repository integrates the split and may provide a large performance gain for this special use case
- see the (long) discussion in issue #1289
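The idea behind the integrated split can be sketched in Python. This is a conceptual illustration only, not the fork's Java implementation: the rows, the helper names build_index and cross_split, the result ordering, and the choice not to trim split values are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical target project rows; not taken from the example data.
ADDRESS_BOOK = [
    {"friend": "john", "address": "120 Main St."},
    {"friend": "mary", "address": "50 Broadway Ave."},
    {"friend": "mary", "address": "7 Side Rd."},
]

def build_index(rows, key_column):
    """Map each key value to the list of rows holding it (built once)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_column]].append(row)
    return index

def cross_split(cell, index, separator):
    """Split a multi-value cell on a regex separator and collect every
    matching row, in the order the keys appear in the cell.
    Assumption: split values are used as-is (no trimming)."""
    matches = []
    for key in re.split(separator, cell):
        matches.extend(index.get(key, []))
    return matches

index = build_index(ADDRESS_BOOK, "friend")
rows = cross_split("mary,john,anne", index, ",")
print([r["address"] for r in rows])
```

Because the lookup happens per cell, no rows have to be split and re-joined in the base project, which is where the potential performance gain comes from.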
clone this git repository
git clone https://github.com/felixlohmeier/OpenRefine.git
on Windows, type:
refine build
refine
on macOS or *nix, type:
./refine build
./refine
see also: https://github.com/OpenRefine/OpenRefine/wiki/Get-Development-Version
a precompiled Linux kit is available in releases
example data
- download and unzip example data from https://github.com/OpenRefine/OpenRefine/files/1422100/example-data-1289.zip
- create project "My Address Book" from My-Address-Book.csv
- create project "christmas gifts" from christmas-gifts.csv
- open project "christmas gifts", select function "add column based on this column..." on column "recipient"
- try the expressions below
Example 1: split multi-value cells in the from column and extract more than one value from the results:
forEach(
  value.cross(
    "My Address Book",
    "friend",
    ","
  ),
  r,
  forNonBlank(
    r.cells["address"].value,
    v,
    v,
    ""
  )
).join("|")
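To make the data flow of Example 1 easier to follow, here is an approximate Python model of the expression. The rows are hypothetical stand-ins for "My Address Book", the function name example_1 is invented for illustration, and Python's str.join may differ from GREL's join in edge cases (for instance around empty strings).

```python
import re

# Hypothetical rows standing in for "My Address Book".
ADDRESS_BOOK = [
    {"friend": "john", "address": "120 Main St."},
    {"friend": "mary", "address": ""},  # a match without an address
    {"friend": "mary", "address": "7 Side Rd."},
]

def example_1(cell, rows=ADDRESS_BOOK, separator=","):
    keys = re.split(separator, cell)
    # cross(...): every row whose "friend" equals one of the split keys
    matched = [r for k in keys for r in rows if r["friend"] == k]
    # forNonBlank(..., v, v, ""): keep the address, or "" when it is blank
    addresses = [r["address"] if r["address"] else "" for r in matched]
    # .join("|")
    return "|".join(addresses)

print(example_1("john,mary,anne"))
```

In this model, keys without a match ("anne") contribute nothing to the result, while matched rows with a blank address contribute an empty entry.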
Example 2: split multi-value cells in the from column, extract only the first value from the results, and return a custom string if there is a match in the foreign key column ("friend") but no value in the target column ("address"):
forEach(
  value.split(","),
  v,
  forNonBlank(
    v.cross(
      "My Address Book",
      "friend",
      ","
    )[0].cells["address"].value,
    x,
    x,
    "!"
  )
).join("|")
Example 3: split multi-value cells in the from column, extract more than one value from the results (joined with ","), and return a custom string if there is a match in the foreign key column ("friend") but no value in the target column ("address"):
forEach(
  value.split(","),
  v,
  forEach(
    v.cross(
      "My Address Book",
      "friend",
      ","
    ),
    r,
    forNonBlank(
      r.cells["address"].value,
      x,
      x,
      "!"
    )
  ).join(",")
).join("|")
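The nested structure of Example 3 can be modeled roughly in Python as follows. This is a sketch under assumptions: the rows are hypothetical, the function name example_3 is invented, and GREL's join may treat empty strings differently from Python's str.join. Since each v produced by value.split(",") no longer contains a comma, the "," separator passed to cross has no further effect in this model and is omitted.

```python
# Hypothetical rows standing in for "My Address Book".
ADDRESS_BOOK = [
    {"friend": "john", "address": "120 Main St."},
    {"friend": "mary", "address": ""},  # a match without an address
    {"friend": "mary", "address": "7 Side Rd."},
]

def example_3(cell, rows=ADDRESS_BOOK):
    parts = []
    for key in cell.split(","):  # value.split(",")
        # v.cross(...): all rows whose "friend" equals this key
        matched = [r for r in rows if r["friend"] == key]
        # forNonBlank(..., x, x, "!"): mark matches that lack an address
        addresses = [r["address"] if r["address"] else "!" for r in matched]
        parts.append(",".join(addresses))  # inner .join(",")
    return "|".join(parts)  # outer .join("|")

print(example_3("john,mary,anne"))
```

Grouping by key keeps the results aligned with the order of the keys in the cell: multiple matches for one key are separated by ",", and keys are separated by "|".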