Python Match/Merge for SNAC

Installing Match/Merge Code

Once Cheshire and CheshirePy have been built, set up, and installed, and VIAF has been correctly indexed by Cheshire (instructions are available in the Cheshire repository), the SNAC match/merge code can be installed and run.

Check out the snac2 code from this repository using git clone https://github.com/snac/snac2.git. All paths below are relative to the root directory of that repository.
Set up a PostgreSQL database (steps omitted). The examples below assume a database named snac_test, with user snac_user and password snac_password.

Edit the configuration files:

Copy snac2/config/db.py.tpl to snac2/config/db.py. Update snac2/config/db.py to include the user, database, and password as follows:

```
DB_NAME = "snac_test"
DB_USER = "snac_user"
DB_PASS = "snac_password"
```
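The shell banner in the test step reports a SQLAlchemy-style `postgresql+psycopg2://` URL. As a rough illustration of how the three `db.py` values could map onto such a URL, here is a minimal sketch. `build_db_url` is a hypothetical helper, not part of the snac2 code, and it assumes a local Unix-socket connection (empty host), matching the empty host shown in that banner.

```python
from urllib.parse import quote

# Example values as they would appear in snac2/config/db.py.
DB_NAME = "snac_test"
DB_USER = "snac_user"
DB_PASS = "snac_password"


def build_db_url(name: str, user: str, password: str, host: str = "") -> str:
    """Hypothetical helper: assemble a SQLAlchemy-style PostgreSQL URL.

    An empty host lets libpq fall back to its default (the local Unix socket).
    User and password are percent-encoded so characters such as '@' or ':'
    cannot break the URL.
    """
    creds = quote(user, safe="")
    if password:
        creds += ":" + quote(password, safe="")
    return f"postgresql+psycopg2://{creds}@{host}/{name}"


if __name__ == "__main__":
    print(build_db_url(DB_NAME, DB_USER, DB_PASS))
```

Percent-encoding is the main design point: a password containing `@` would otherwise be read as the start of the host part.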

Update snac2/config/app.py to include the Cheshire config file, the Cheshire index name, the log file location, any data directories (those containing EAC-CPF XML files), and the merged output directory as follows:

```
VIAF_CONFIG = "/full/path/to/config.viaf"
VIAF_INDEX_NAME = "viaf" # the name of your Cheshire database, i.e. the text inside the `FILETAG` of the Cheshire config
log = '/full/path/to/logfile.log'
data_shortname = '/full/path/to/data/directory'
# any number of shortname = path entries are allowed
merged = '/full/path/to/merged/output/directory'
```
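Before loading anything, it can help to confirm that the paths in `app.py` actually exist. The following is a minimal sketch, not part of the snac2 code: `check_app_config` is a hypothetical helper that accepts any object exposing the attributes above (for example the `snac2.config.app` module, assuming it is importable under that name) plus the list of data shortnames you defined. It also assumes the merged output directory should exist before merging, which this README does not state.

```python
import os


def check_app_config(cfg, data_names):
    """Hypothetical sanity check: return a list of problems in an app.py-style config.

    `cfg` needs VIAF_CONFIG, log and merged attributes; `data_names` lists the
    shortname attributes that should point at EAC-CPF data directories.
    """
    problems = []
    if not os.path.isfile(cfg.VIAF_CONFIG):
        problems.append(f"VIAF_CONFIG is not a file: {cfg.VIAF_CONFIG}")
    if not os.path.isabs(cfg.log):
        problems.append(f"log should be a full path: {cfg.log}")
    elif not os.path.isdir(os.path.dirname(cfg.log)):
        problems.append(f"log directory does not exist: {os.path.dirname(cfg.log)}")
    for name in data_names:
        path = getattr(cfg, name, None)
        if path is None:
            problems.append(f"no data directory defined for {name!r}")
        elif not os.path.isdir(path):
            problems.append(f"data directory for {name!r} does not exist: {path}")
    # Assumption: the merged output directory is expected to exist already.
    if not os.path.isdir(cfg.merged):
        problems.append(f"merged output directory does not exist: {cfg.merged}")
    return problems
```

Usage might look like `for problem in check_app_config(app, ["data_shortname"]): print(problem)`, where `app` is the imported config module.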

Install the SNAC code using sudo python setup.py develop. This also installs the Python packages the SNAC code depends on.
- Remove group and other write permissions from the ~/.python-eggs cache directory used in this step, to avoid warnings on later Python commands. This can be done with the command chmod go-w ~/.python-eggs.
Initialize the database using python snac2/scripts/init_db.py.

Test that everything works by running python shell.py, which should drop you into the following shell:

```
=== Welcome to SNAC Merge Tool Shell ===
using database postgresql+psycopg2://username:@/snac-test

In [1]:
```

To exit, press Control+D and answer the prompt. If there are no errors, you are ready to run the snac code.

Running Match/Merge Code

Now you are ready to run the code.

Loading Data
- Refer to the snac2/config/app.py file for the data_shortname variables set above.
- Load data into the database using python snac2/scripts/load.py data_shortname. This will read through the EAC-CPF records and import them into the database.
  - Alternatively, if the target directory is located under EAD_BASE_DIR and follows the naming convention ead_<short_name>, you can load it directly without defining a data_shortname, via python snac2/scripts/load.py <short_name>.
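The two ways of naming a load target, and the kind of record being read, can be sketched as follows. This is an illustration under assumptions, not the actual load.py logic: `resolve_data_dir`, `read_identity` and `iter_records` are hypothetical helpers, the lookup order (an app.py shortname first, then `EAD_BASE_DIR/ead_<short_name>`) is assumed, and the parser assumes the EAC-CPF 2010 namespace `urn:isbn:1-931666-33-4`, extracting the `entityType` (person, corporateBody, or family) and the name parts of the first `nameEntry`.

```python
import os
import xml.etree.ElementTree as ET

# EAC-CPF 2010 namespace (assumed to be the one used by the input records).
EAC_NS = {"eac": "urn:isbn:1-931666-33-4"}


def resolve_data_dir(cfg, name):
    """Hypothetical lookup: prefer a shortname defined in app.py, otherwise
    fall back to an ead_<name> directory under cfg.EAD_BASE_DIR."""
    path = getattr(cfg, name, None)
    if path is not None:
        return path
    return os.path.join(cfg.EAD_BASE_DIR, "ead_" + name)


def read_identity(source):
    """Return (entityType, name) from one EAC-CPF record.

    `source` is a filename or binary file object; the name is the text of the
    <part> elements of the first <nameEntry>, joined with spaces.
    """
    root = ET.parse(source).getroot()
    identity = root.find("eac:cpfDescription/eac:identity", EAC_NS)
    if identity is None:
        raise ValueError("no cpfDescription/identity: not an EAC-CPF record?")
    entity_type = identity.findtext("eac:entityType", default="", namespaces=EAC_NS).strip()
    name_entry = identity.find("eac:nameEntry", EAC_NS)
    parts = [] if name_entry is None else name_entry.findall("eac:part", EAC_NS)
    name = " ".join(p.text.strip() for p in parts if p.text and p.text.strip())
    return entity_type, name


def iter_records(directory):
    """Yield (path, entityType, name) for every .xml file under `directory`."""
    for dirpath, _dirnames, filenames in os.walk(directory):
        for filename in sorted(filenames):
            if filename.lower().endswith(".xml"):
                path = os.path.join(dirpath, filename)
                entity_type, name = read_identity(path)
                yield path, entity_type, name
```

Note that `getattr` would also match unrelated settings such as `log` or `merged`, so shortnames should not reuse those names.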
Matching Data to VIAF
- Matching is performed using the command python snac2/scripts/match.py with the following arguments:
  - Type arguments: -p for Persons, -c for Corporate Bodies, or -f for Families. Use only one type argument per run.
  - -s STARTS_AT: record to start at (optional)
  - -e ENDS_AT: record to end at (optional)
- The following command will match all three types of records:
```
python snac2/scripts/match.py -p && python snac2/scripts/match.py -c && python snac2/scripts/match.py -f
```
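If you prefer to drive the three match runs from Python rather than a shell `&&` chain, a minimal sketch might look like this. `build_match_command` and `match_all` are hypothetical; the mapping from EAC-CPF `entityType` values to the `-p`/`-c`/`-f` flags is an assumption based on the type arguments listed above, and the script is assumed to be run from the repository root. Passing the same `-s`/`-e` values to every type is only a convenience here; each match.py run receives them separately.

```python
import subprocess
import sys

MATCH_SCRIPT = "snac2/scripts/match.py"

# Assumed mapping from EAC-CPF entityType values to match.py type flags.
TYPE_FLAGS = {"person": "-p", "corporateBody": "-c", "family": "-f"}


def build_match_command(entity_type, starts_at=None, ends_at=None, python="python"):
    """Build the argv for one match.py run (exactly one type flag per run)."""
    cmd = [python, MATCH_SCRIPT, TYPE_FLAGS[entity_type]]
    if starts_at is not None:
        cmd += ["-s", str(starts_at)]
    if ends_at is not None:
        cmd += ["-e", str(ends_at)]
    return cmd


def match_all(starts_at=None, ends_at=None):
    """Match persons, corporate bodies, then families.

    check=True raises CalledProcessError on a non-zero exit, so later types
    are skipped after a failure, like the `&&` chain above.
    """
    for entity_type in ("person", "corporateBody", "family"):
        cmd = build_match_command(entity_type, starts_at, ends_at, python=sys.executable)
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    match_all()
```

Using `sys.executable` keeps the child runs on the same interpreter that `setup.py develop` installed into, rather than whatever `python` happens to be first on `PATH`.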
Merging Record Groups
- Merging is performed in two steps:
  1. Merge: this step creates the matched records from the record groups and assigns unique ARK IDs. For testing, it is very important to run without the -r command-line argument; without it, the script requests temporary ARKs. The real merge command is
```
python snac2/scripts/merge.py -r -m
```
  while the command that should be used for testing is
```
python snac2/scripts/merge.py -m
```
  If the merge record creation process throws an exception saying the NOID API is offline, use the -n switch to skip requesting ARKs live and only create the merge records. Then obtain a text file of ARKs, one per line, and use the -f switch to fill in the missing canonical IDs.
  2. Assemble: this step exports the matched records to XML files. It is run as
```
python snac2/scripts/merge.py -a
```
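The merge section can be sketched in Python as well. `merge_commands` and `run_merge` are hypothetical helpers that only reproduce the invocations shown above (with `-r` added for a real run), and `read_ark_file` is a hypothetical pre-check for the one-ARK-per-line file used with `-f`. How merge.py itself reads that file, and whether it expects bare `ark:/...` identifiers or full resolver URLs, is not documented here, so the check only requires each line to contain `ark:/`.

```python
import subprocess
import sys

MERGE_SCRIPT = "snac2/scripts/merge.py"


def merge_commands(real=False, python="python"):
    """Return the two merge.py invocations in order: merge (-m), then assemble (-a).

    -r is added only for a real run; leave it off while testing so that
    temporary ARKs are requested instead.
    """
    merge = [python, MERGE_SCRIPT, "-m"]
    if real:
        merge.insert(2, "-r")
    return [merge, [python, MERGE_SCRIPT, "-a"]]


def run_merge(real=False):
    """Run merge, then assemble; stop if either step exits non-zero."""
    for cmd in merge_commands(real=real, python=sys.executable):
        subprocess.run(cmd, check=True)


def read_ark_file(path):
    """Read an ARK list (one per line), skipping blank lines.

    Hypothetical validation: every non-blank line must contain 'ark:/',
    and no ARK may appear twice.
    """
    arks = []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            ark = line.strip()
            if not ark:
                continue
            if "ark:/" not in ark:
                raise ValueError(f"line {lineno}: not an ARK: {ark!r}")
            arks.append(ark)
    if len(set(arks)) != len(arks):
        raise ValueError("duplicate ARKs in file")
    return arks
```

Checking the ARK file before the `-f` run catches blank or malformed lines early, instead of partway through filling in canonical IDs.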
