This repository has been archived by the owner on Feb 21, 2020. It is now read-only.
Felix Lohmeier edited this page Feb 20, 2020 · 8 revisions

This fork includes pull request #1294 from @claussni, which extends cross(). The cross function now accepts a 4th parameter: a regular expression used as the separator for splitting multi-value field values when joining projects.

WARNING: This fork will not be maintained, and the code is not compatible with OpenRefine >= 3. For the use case described below, another solution is available that works with newer versions of OpenRefine: see this comment and this gist with Jython code.

rationale

  • the original cross function expects normalized data (one foreign key per cell in the base column). If a cell holds multiple key values, you have to split them into multiple rows before you apply cross, and join the results afterwards. This can be quite "expensive" on larger datasets.
  • the extended cross function in this repository integrates the split and may provide a massive performance gain for this special use case
  • see (long) discussion in issue #1289
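
The idea behind the integrated split can be sketched in Python. This is only a rough model of the behaviour described above, not the fork's Java implementation: the names `build_index` and `cross_split`, the sample address book, and the assumption that the separator is applied to the value of the calling cell are all illustrative.

```python
import re
from collections import defaultdict

def build_index(rows, key_column):
    """Map each key value in the target project to the rows that hold it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_column]].append(row)
    return index

def cross_split(cell_value, index, separator):
    """Hypothetical model of the extended cross: split the cell on a regex
    separator and collect the matching target rows for every part."""
    matches = []
    for key in re.split(separator, cell_value):
        # .get() avoids creating empty entries for unknown keys
        matches.extend(index.get(key, []))
    return matches

# Made-up sample data standing in for the "My Address Book" project
address_book = [
    {"friend": "john", "address": "120 Main St."},
    {"friend": "mary", "address": "50 Broadway Ave."},
    {"friend": "anne", "address": "17 Morning Crescent"},
]
index = build_index(address_book, "friend")

# One multi-value cell is resolved in a single pass, without first
# splitting it into several rows and joining the rows again afterwards.
rows = cross_split("mary,john", index, ",")
addresses = [r["address"] for r in rows]
```

In this model the index is built once and each cell costs one split plus one lookup per key, which is the kind of saving the rationale refers to; the actual gain in OpenRefine depends on the dataset.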

install

clone this git repository

git clone https://github.com/felixlohmeier/OpenRefine.git

on Windows, type:

refine build
refine

on macOS or *nix, type:

./refine build
./refine

see also: https://github.com/OpenRefine/OpenRefine/wiki/Get-Development-Version

a precompiled Linux kit is available in the releases

usage

example data

Example 1: split multi-value field values in the from column and extract more than one value from the results:

forEach(
    value.cross(
        "My Address Book",
        "friend",
        ","
    ),
    r,
    forNonBlank(
        r.cells["address"].value,
        v,
        v,
        ""
    )
).join("|")

Example 2: split multi-value field values in the from column, extract only the first value from the results, and return a custom string if there is a match for the foreign key ("friend") but no value in the target column ("address"):

forEach(
    value.split(","),
    v,
    forNonBlank(
        v.cross(
            "My Address Book",
            "friend",
            ","
        )[0].cells["address"].value,
        x,
        x,
        "!"
    )
).join("|")

Example 3: split multi-value field values in the from column, extract more than one value from the results (joined with ","), and return a custom string if there is a match for the foreign key ("friend") but no value in the target column ("address"):

forEach(
    value.split(","),
    v,
    forEach(
        v.cross(
            "My Address Book",
            "friend",
            ","
        ),
        r,
        forNonBlank(
            r.cells["address"].value,
            x,
            x,
            "!"
        )
    ).join(",")
).join("|")
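
To make the output shape of Example 3 easier to picture, here is a small Python approximation. It is a sketch under assumptions, not a translation verified against OpenRefine: the function `example3_model` and the sample rows are hypothetical, a blank address is modelled as `None` or an empty string, and a key with no matching row is assumed to contribute an empty string.

```python
def example3_model(cell_value, rows, key_column="friend", target_column="address"):
    """Approximate Example 3: per key, join all matching addresses with ",",
    replace blank addresses with "!", then join the per-key results with "|"."""
    parts = []
    for key in cell_value.split(","):
        matched = [r for r in rows if r[key_column] == key]
        # Blank target values (None or "") become the custom string "!"
        values = [r.get(target_column) or "!" for r in matched]
        parts.append(",".join(values))
    return "|".join(parts)

# Made-up rows: mary has no address, anne appears twice, bob is missing
address_book = [
    {"friend": "john", "address": "120 Main St."},
    {"friend": "mary", "address": None},
    {"friend": "anne", "address": "1 First St"},
    {"friend": "anne", "address": "2 Second St"},
]

result = example3_model("john,mary,anne,bob", address_book)
print(result)
```

The separators carry the structure here: "|" divides the results per key of the original cell, while "," divides multiple matches for the same key, so the two should be chosen so that they do not occur in the target values.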