Medication Extraction and Normalization (MedXN, pronounced [med-eks-en]) is an Apache UIMA-based medication information extraction system that focuses on assigning the most specific RxNorm RxCUI to a medication description. MedXN finds medications and their complete attributes and normalizes them to the most specific RxNorm RxCUI using techniques such as flexible matching, abbreviation expansion, and inference. MedXN uses externalized resources (i.e., a medication dictionary, attribute definitions, and regular-expression attribute patterns) so that end users can customize it easily.
- Java 1.8
- Apache Maven
- Apache Ant
If you do not have the prerequisites installed, you may need to download the binary files from their official download sites and add the unzipped bin directories to PATH. These steps depend on your OS.
MedXN is runnable out of the box: download the MedXN.zip corresponding to the latest release, extract it, and follow the instructions for running.
To make changes and recompile MedXN from source, first clone this repo and enter the project root directory:
git clone https://github.com/OHNLP/MedXN.git
cd MedXN
git checkout dist
To download the MedTagger and Backbone dependencies from GitHub Packages, add the following to your Maven settings.xml (typically located at ~/.m2/settings.xml):
<servers>
  <server>
    <id>medtagger</id>
    <username>your_github_username</username>
    <password>your_github_access_token</password>
  </server>
  <server>
    <id>backbone-maven</id>
    <username>your_github_username</username>
    <password>your_github_access_token</password>
  </server>
</servers>
Here, your_github_access_token is a GitHub access token with the read:packages permission.
To build the MedXN jar file, run:
mvn clean install
If the build succeeds, you will see MedXN-{$version}-SNAPSHOT-shaded.jar under target.
To further build a distributable directory, use the ant script:
ant dist
Once finished, the dist directory should contain the required resources, scripts, and the MedXN.jar to be distributed.
The org.ohnlp.medxn.Main class provides a simple command line interface (CLI) that processes a directory of input files (e.g. testdata) and writes the results into a single output file (e.g. out.txt).
To run the org.ohnlp.medxn.Main class from the command line (under dist):
In Windows:
java -cp resources;MedXN.jar org.ohnlp.medxn.Main $YOUR_INPUT_DIR $YOUR_OUTPUT_FILE
In Unix:
java -cp resources:MedXN.jar org.ohnlp.medxn.Main $YOUR_INPUT_DIR $YOUR_OUTPUT_FILE
Note: the classpath delimiter differs between Windows (;) and Unix (:).
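When MedXN is launched from a script instead of being typed by hand, the platform-specific classpath delimiter can be chosen automatically. The following is a minimal sketch, assuming it is run from the dist directory. build_medxn_command is a hypothetical helper and not part of MedXN; it relies on Python's os.pathsep, which is ; on Windows and : on Unix-like systems.

```python
import os
import subprocess


def build_medxn_command(input_dir, output_file):
    """Build the java command line for org.ohnlp.medxn.Main.

    Hypothetical helper (not part of MedXN). It assumes the current
    directory is dist, so that resources and MedXN.jar resolve.
    """
    # os.pathsep is ";" on Windows and ":" on Unix-like systems.
    classpath = os.pathsep.join(["resources", "MedXN.jar"])
    return ["java", "-cp", classpath, "org.ohnlp.medxn.Main", input_dir, output_file]


if __name__ == "__main__":
    # Example paths taken from this README: testdata as input, out.txt as output.
    command = build_medxn_command("testdata", "out.txt")
    print(" ".join(command))
    # To actually launch MedXN, uncomment the next line:
    # subprocess.run(command, check=True)
```

Passing the command as a list avoids shell quoting problems when the input directory or output file path contains spaces.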
To execute MedXN for a collection of documents, make sure you are in the dist directory built previously and run runMedXNCVD.bat (runMedXNCVD.sh) or runMedXNCPE.bat (runMedXNCPE.sh), which let you test analysis engines and collection processing engines, respectively.
In Windows:
java -Xms512M -Xmx2000M -cp resources;MedXN.jar org.apache.uima.tools.cvd.CVD
java -Xms512M -Xmx2000M -cp resources;MedXN.jar org.apache.uima.tools.cpm.CpmFrame
In Unix/Linux:
java -Xms512M -Xmx2000M -cp resources:MedXN.jar org.apache.uima.tools.cvd.CVD
java -Xms512M -Xmx2000M -cp resources:MedXN.jar org.apache.uima.tools.cpm.CpmFrame
These commands start the UIMA CAS Visual Debugger (CVD) or the collection processing engine (CPE) GUI, respectively.
To visualize a specific aggregate engine through CVD, choose Load AE under the Run menu and select
$MedXNHOME/dist/resources/desc/medxndesc/aggregate_analysis_engine/MedXNAggregateTAE.xml
To process a collection of documents, go to the File menu and open the corresponding CPE descriptor file,
available at $MedXNHOME/dist/resources/desc/collection_processing_engine/MedXN_CPE.xml
- resources/desc: example descriptors for Aggregate Analysis Engines and the Collection Processing Engine (CPE)
- testdata: test input data
- testdata_output: expected output in XMI format
- runMedXNCVD.bat (runMedXNCVD.sh): scripts for the CAS Visual Debugger (CVD)
- runMedXNCPE.bat (runMedXNCPE.sh): scripts for the CPE
Text: "Sulfasalazine [AZULFIDINE] 500-mg 2 tabs by mouth two times a day"
- Medication Extraction
  Eg) Sulfasalazine [AZULFIDINE] RxCUI="9524::IN::202770::BN"
- Attribute Extraction
  Eg) 500-mg (strength), 2 (dose), tabs (form), mouth (route), two times a day (frequency)
- Medication & Attribute Association
  Eg) <Sulfasalazine [AZULFIDINE]> + <500-mg, 2, tabs, mouth, two times a day>
- Convert to RxNorm Standard
  Eg) sulfasalazine <in>500 mg<st> oral tablet<df>azulfidine<bn>
- Convert to RxCUI Representation
  Eg) 9524<in>500 mg<st>317541<df>202770<bn>
- Normalize to Specific RxCUI
  Eg) Sulfasalazine 500 MG Oral Tablet [AZULFIDINE] RxCUI=208437::SBD
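The step from the RxNorm standard form to the RxCUI representation can be illustrated with a small sketch. This is not MedXN's implementation: it is a toy conversion under the assumption that the tagged string has the shape shown in the example, and the lookup tables are hypothetical stand-ins that contain only the RxCUIs quoted above.

```python
import re

# Toy lookup tables holding only the RxCUIs shown in the example above.
# MedXN reads such mappings from its dictionary files; these are stand-ins.
NAME_TO_RXCUI = {"sulfasalazine": "9524", "azulfidine": "202770"}
DOSE_FORM_TO_RXCUI = {"oral tablet": "317541"}

# A segment is some text followed by a tag such as <in>, <st>, <df>, or <bn>.
SEGMENT = re.compile(r"([^<>]+)<(\w+)>")


def to_rxcui_representation(tagged):
    """Replace ingredient, brand name, and dose form text with RxCUIs.

    Strength (<st>) and any other tags are kept as text.
    Raises KeyError if a name or dose form is not in the toy tables.
    """
    parts = []
    for value, tag in SEGMENT.findall(tagged):
        value = value.strip()
        if tag in ("in", "bn"):
            value = NAME_TO_RXCUI[value]
        elif tag == "df":
            value = DOSE_FORM_TO_RXCUI[value]
        parts.append(f"{value}<{tag}>")
    return "".join(parts)


if __name__ == "__main__":
    standard = "sulfasalazine <in>500 mg<st> oral tablet<df>azulfidine<bn>"
    print(to_rxcui_representation(standard))
```

Run on the RxNorm standard string from the example, the sketch reproduces the RxCUI representation shown in the following step of the walkthrough.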
Under dist/resources/desc/medxndesc/:
- Aggregate TAE: aggregate_analysis_engine/MedXNAggregateTAE.xml
- Collection Processing Engine: collection_processing_engine/MedXN_CPE.xml
- Primary Annotators
  - ACLookupDrugAE.xml: extracts medication names
  - MedAttrAE.xml: extracts medication attributes
  - MedExtAE.xml: associates a medication name with its attributes
  - MedNormAE.xml: normalizes medication information to the RxNorm standard
  - ACLookupDrugNormAE.xml: maps medication information to a specific RxNorm name
  - MedNormRxCUIAE.xml: converts medication information to the RxCUI representation
  - ACLookupRxCUIDrugNormAE.xml: maps RxCUI-represented medication information to a specific RxNorm name
- Cas Consumer
  - MedXNCC.xml: prints out results.
    - Parameters:
      - OutputFile: output file path and name
      - Delimiter: the delimiter between medication information fields in the output
    - Output format:
      filename|medication::b::e|medication Rxcui|strength::b::e|dose::b::e|form::b::e|route::b::e|frequency::b::e|duration::b::e|specific RxNorm name|specific RxCUI|sentence (b: begin offset, e: end offset)
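For downstream processing, each line of the output file can be split back into named fields. The following is a minimal sketch, assuming the Delimiter parameter is set to | and that an attribute that was not found is written as an empty field; neither assumption is confirmed here. The names FIELDS, Span, parse_span, and parse_line are hypothetical and not part of MedXN.

```python
from typing import NamedTuple, Optional

# Field order taken from the output format described above.
FIELDS = [
    "filename", "medication", "medication_rxcui", "strength", "dose", "form",
    "route", "frequency", "duration", "specific_rxnorm_name",
    "specific_rxcui", "sentence",
]
# Fields written as text::b::e (b: begin offset, e: end offset).
SPAN_FIELDS = ("medication", "strength", "dose", "form", "route",
               "frequency", "duration")


class Span(NamedTuple):
    text: str
    begin: int
    end: int


def parse_span(field: str) -> Optional[Span]:
    """Parse 'text::b::e'; an empty field is assumed to mean 'not found'."""
    if not field:
        return None
    # Split from the right so that '::' inside the text itself is preserved.
    text, begin, end = field.rsplit("::", 2)
    return Span(text, int(begin), int(end))


def parse_line(line: str, delimiter: str = "|") -> dict:
    """Split one output line into a dict keyed by the names in FIELDS."""
    # Limit the number of splits so the trailing sentence keeps any delimiters.
    parts = line.rstrip("\n").split(delimiter, len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    for name in SPAN_FIELDS:
        record[name] = parse_span(record[name])
    return record
```

A record's attributes can then be read as, for example, record["strength"].text and record["strength"].begin. Whether the offsets are relative to the document or to the sentence is not stated in the format description, so verify that against real output before slicing text with them.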
Under dist/resources/medxnresources/lookup:
- RxNorm_BNIN.alphanum.BnInPinMinSyn.txt: a dictionary of medication names compiled from RxNorm ingredients and brand names (i.e., IN, PIN, MIN, BN, and manually compiled abbreviations). It also includes any other medication variations that have the same RxCUI as the above medications.
  - Format: medication name (lower-cased, non-alphanumeric characters replaced with spaces, tokens separated by tabs)|RxCUI|RxNorm term type|RxNorm name
  - Example: aspirin|1191|IN|Aspirin
- RxNorm_Name.norm.txt: a dictionary of full medication descriptions compiled from RxNorm SCDC, SCDF, SCD, SBDC, SBDF, SBD, and SY
  - Format: full medication description (lower-cased, [] removed, tokens separated by tabs)|RxCUI|RxNorm term type|RxNorm name
  - Example: aspirin 81 mg oral tablet|243670|SCD|Aspirin 81 MG Oral Tablet
- RxCUI.norm.txt: RxCUI representation of RxNorm_Name.norm.txt, i.e., medication name and dose form are replaced with their RxCUIs
  - Format: RxCUI representation|RxCUI|RxNorm term type|RxNorm name
  - Example: 1191 81 mg 317541|243670|SCD|Aspirin 81 MG Oral Tablet
- doseDict.txt: list of RxNorm dose forms and their RxCUIs
  - Format: dose form (lower-cased)|RxCUI|RxNorm name
  - Example: oral tablet|317541|Oral Tablet
- falseMedDic.txt: list of potential false medications, i.e., terms that are in RxNorm but are likely not drugs when they appear in clinical notes
  - Format: lower-cased medication
  - Example: today
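When customizing or checking the four-column lookup files, it can help to load them into a simple index. The sketch below is an illustration only and is not how MedXN itself performs lookup: it assumes the name|RxCUI|RxNorm term type|RxNorm name layout described above, and normalize_name approximates the stated normalization (lower-casing, non-alphanumeric characters replaced with spaces). Both function names are hypothetical.

```python
import re
from collections import defaultdict


def normalize_name(name):
    """Lower-case and replace runs of non-alphanumeric characters with a space."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def load_lookup(lines):
    """Index 'name|RxCUI|term type|RxNorm name' lines by medication name.

    Returns a dict mapping the space-joined name tokens to a list of
    (RxCUI, term type, RxNorm name) tuples.
    """
    index = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, rxcui, term_type, rxnorm_name = line.split("|", 3)
        # Name tokens may be separated by tabs or spaces; join with one space.
        key = " ".join(name.split())
        index[key].append((rxcui, term_type, rxnorm_name))
    return index


if __name__ == "__main__":
    # The two example entries quoted in this README.
    entries = [
        "aspirin|1191|IN|Aspirin",
        "aspirin 81 mg oral tablet|243670|SCD|Aspirin 81 MG Oral Tablet",
    ]
    index = load_lookup(entries)
    print(index[normalize_name("Aspirin")])
```

For a real file, pass an open file handle instead of the list, for example load_lookup(open(path, encoding="utf-8")), where the encoding is an assumption to check against the actual resource files.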
Under dist/resources/medxnresources:
- regExPatterns.txt: contains medication attribute patterns written as Java regular expressions (includes usage descriptions).