- Java 11
- A Unix-based system (`unzip` must be installed, e.g., via `apt-get install zip unzip`)
- An active OHNLP Toolkit v3.0.0+ install (OHNLP Backbone + MedTagger). Installation instructions have been included for your convenience.
NB: If you already have an active OHNLP Toolkit installation, these instructions can be skipped; please proceed directly to section III.
NB: These steps require an internet connection to download the requisite libraries. Once the download is complete (through step 6), the entire `OHNLPTK` folder can be copied to a separate machine if execution in an isolated environment is desired.
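Before installing, you can verify the prerequisites are in place. The following is a minimal check, assuming a Debian/Ubuntu-style environment (adjust the package manager command for other distributions):

```bash
# Should report a Java 11.x runtime
java -version

# Confirm zip/unzip are available; install them if missing (Debian/Ubuntu example)
which zip unzip || sudo apt-get install zip unzip
```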
- Run `git clone https://github.com/OHNLP/OHNLPTK_SETUP.git`
- Run `cd OHNLPTK_SETUP`
- Run the installation script `./install_or_update_ohnlptk.sh` (note that you may need to first enable execution via `chmod +x install_or_update_ohnlptk.sh`)
- Change directory into the created `OHNLPTK/` directory, run `chmod +x ./run_pipeline_local.sh`, and then run `./run_pipeline_local.sh`
- Follow the instructions presented onscreen to change configuration settings/job parallelism to suit your local execution environment
- Once configuration options are changed, instead of pressing enter, press ctrl+c to exit. At this point you should have a working base OHNLP Toolkit install.
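For reference, here are the toolkit installation steps above collected into a single shell session. This is a sketch assuming the setup script creates the `OHNLPTK/` directory inside the cloned checkout; adjust the `cd` path if it is created elsewhere:

```bash
git clone https://github.com/OHNLP/OHNLPTK_SETUP.git
cd OHNLPTK_SETUP

# Enable execution if needed, then run the installer (requires an internet connection)
chmod +x install_or_update_ohnlptk.sh
./install_or_update_ohnlptk.sh

# Enter the created toolkit directory and launch the local pipeline runner;
# adjust configuration/job parallelism when prompted, then ctrl+c instead of pressing enter
cd OHNLPTK
chmod +x ./run_pipeline_local.sh
./run_pipeline_local.sh
```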
- Download `FamilyHistoryNLP.zip` from the GitHub Release
- The zip file will contain three folders, `configs/`, `modules/`, and `resources/`. Copy the contents to their respective folders in your OHNLP Toolkit installation (a shell sketch is provided after these steps)
- Go to `configs/example_fh_nlp_filesystem_to_filesystem.json` and make a copy. Do not modify this example json directly, as changes will be overwritten on updates. If desired, debugging pipelines populating sentence segmentation and entity extraction are also provided and should be similarly modified, under `configs/example_debug_fh_{entity|segments}_nlp_filesystem_to_csv.json`
- Pick one of the following:
    - If files in/files out is suitable for your use case, change lines 9, 42, 62, 81, and 100 to the appropriate input/output directories.
    - If you wish to change input/output formats, replace the CSVLoad instances with the correct Backbone input and output function respectively. Supported formats include SQL, BigQuery, HCatalog, and JSON. Please refer to the OHNLP Backbone Documentation
NB: In order to have SNOMEDCT condition codes, as is the standard, a separate mapping file is required due to SNOMEDCT licensing restrictions.
NB: This assumes you are using a local run. If you wish to run on a distributed platform, e.g., Spark or GCP, use the appropriate run script.
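Below is a shell sketch of the copy-and-configure steps above, run from the OHNLP Toolkit root directory. The folder names and the example config path come from the instructions; the extraction location and the name chosen for the copied config are illustrative assumptions:

```bash
# Extract the release zip to a temporary location (any directory works)
unzip FamilyHistoryNLP.zip -d /tmp/FamilyHistoryNLP

# Copy the contents of the three folders into the corresponding toolkit folders
cp -r /tmp/FamilyHistoryNLP/configs/* configs/
cp -r /tmp/FamilyHistoryNLP/modules/* modules/
cp -r /tmp/FamilyHistoryNLP/resources/* resources/

# Work from a copy of the example config so updates do not overwrite your changes;
# the copy's filename is arbitrary
cp configs/example_fh_nlp_filesystem_to_filesystem.json configs/fh_nlp_filesystem_to_filesystem.json
```

After copying, edit `configs/fh_nlp_filesystem_to_filesystem.json` (or whatever you named the copy) with the input/output settings described above.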
- If on Windows (via WSL) or Linux/Unix and you wish to run via interactive mode: Run `./run_pipeline_local.sh` from the OHNLP Toolkit root directory and enter the number corresponding to the fh config when prompted
- If on Mac or if you wish to run in non-interactive/headless mode: Run `./run_pipeline_local.sh your_config_name_here.json`. Note that the preceding `configs/` is intentionally omitted
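For example, to run the copied config from the earlier sketch in non-interactive mode (the filename here is the illustrative name chosen above):

```bash
# Run from the OHNLP Toolkit root directory; the configs/ prefix is omitted on purpose
./run_pipeline_local.sh fh_nlp_filesystem_to_filesystem.json
```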