Once Cheshire and CheshirePy have been built and installed, and VIAF has been correctly indexed by Cheshire (instructions are available at the Cheshire repository), the snac match-merge code can be installed and run.
- Check out the snac2 code from this repository using `git clone https://github.com/snac/snac2.git`. All of the following paths are relative to the root directory of that repository.
- Set up a postgres database (steps omitted). We will assume the database `snac_test` has been set up with user `snac_user` and password `snac_password`.
- Edit the configuration files:
  - Copy `snac2/config/db.py.tpl` to `snac2/config/db.py`. Update `snac2/config/db.py` to include the user, database, and password as follows:

        DB_NAME = "snac_test"
        DB_USER = "snac_user"
        DB_PASS = "snac_password"

  - Update `snac2/config/app.py` to include the cheshire config file, cheshire index name, log locations, any data directories (those containing EAC-CPF xml files), and merged output directory as follows:

        VIAF_CONFIG = "/full/path/to/config.viaf"
        VIAF_INDEX_NAME = "viaf" # the name of your Cheshire database, aka the text inside the `FILETAG` Cheshire config.
        log = '/full/path/to/logfile.log'
        data_shortname = '/full/path/to/data/directory' # any number of shortname = path are allowed
        merged = '/full/path/to/merged/output/directory'
- Install the snac code using `sudo python setup.py develop`. This installs the Python packages that the snac code needs.
  - Change permissions on the package manager directory used in this step to avoid command-line warnings on future python commands. This can be done with the command `chmod go-w ~/.python-eggs`.
- Initialize the database using `python snac2/scripts/init_db.py`.
- Test that everything works by running `python shell.py`, which should drop you into the following shell:

      === Welcome to SNAC Merge Tool Shell ===
      using database postgresql+psycopg2://username:@/snac-test
      In [1]:

  To exit, press Control+D and answer the prompt. If there are no errors, you are ready to run the snac code.
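
If the shell or a later script fails on a missing file, checking the paths entered in `snac2/config/app.py` can help narrow things down. The following is a minimal sketch and not part of the snac2 code: `missing_paths` is a hypothetical helper, and the paths shown are the placeholders from the configuration step, to be replaced by hand with your own values.

```python
import os


def missing_paths(files=(), directories=()):
    """Return the given paths that do not exist on disk.

    `files` must be existing regular files and `directories` must be
    existing directories. This helper is hypothetical, not a snac2 API.
    """
    missing = [p for p in files if not os.path.isfile(p)]
    missing += [p for p in directories if not os.path.isdir(p)]
    return missing


# Placeholder values copied by hand from snac2/config/app.py.
problems = missing_paths(
    files=["/full/path/to/config.viaf"],
    directories=[
        "/full/path/to/data/directory",
        "/full/path/to/merged/output/directory",
    ],
)
for path in problems:
    print("missing: %s" % path)
```

The log file is left out of the check because it may not exist before the first run; its parent directory could be added to `directories` instead.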
Now you are ready to run the code.
- Loading Data
  - Refer to the `snac2/config/app.py` file for the `data_shortname` variables set above.
  - Load data into the database using `python snac2/scripts/load.py data_shortname`. This will read through the EAC-CPF records and import them into the database.
    - Alternatively, if the target to be loaded is located within `EAD_BASE_DIR` and has the naming convention `ead_<short_name>`, you can load it directly without using a `data_shortname`, via `load.py <short_name>`.
- Matching Data to VIAF
  - Matching is performed using the command `python snac2/scripts/match.py` with the following arguments:
    - Type arguments: `-p` for Persons, `-c` for Corporate Bodies, or `-f` for Families. Note: only one at a time should be used.
    - `-s STARTS_AT`: record to start at (optional)
    - `-e ENDS_AT`: record to end at (optional)
  - The following command will match all three types of records: `python snac2/scripts/match.py -p && python snac2/scripts/match.py -c && python snac2/scripts/match.py -f`
- Merging Record Groups
  - Merging is performed in two steps:
    - Merge: this step creates the matched records from the record groups and assigns unique ARK ids. For testing purposes, it is very important to run without the `-r` command line argument. Without it, the script will request temporary ARKs. The real merge command can be run as `python snac2/scripts/merge.py -r -m`, while the command that should be used for testing is

      ```
      python snac2/scripts/merge.py -m
      ```

      If the merge record creation process throws an exception about the NOID API being offline, use the `-n` switch to avoid requesting ARKs live and just create the merge records. Get a text file of ARKs, one per line, then use the `-f` switch to fill in the missing canonical IDs.
    - Assemble: this step exports the matched records out to XML files. It is run as `python snac2/scripts/merge.py -a`.
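
The load, match, merge, and assemble steps above can be chained in a small driver script. This is a sketch under stated assumptions rather than part of the snac2 code: `build_pipeline` and `run_pipeline` are hypothetical names, the script is assumed to run from the repository root, and only the commands and flags documented above are used. By default it builds the testing merge command (without `-r`), so temporary ARKs are requested.

```python
import subprocess


def build_pipeline(data_shortname, real_arks=False):
    """Return the documented commands, in order, as argument lists.

    Hypothetical helper. `data_shortname` is a name configured in
    snac2/config/app.py; `real_arks=True` adds -r to the merge step.
    """
    commands = [["python", "snac2/scripts/load.py", data_shortname]]
    # One matching pass per record type, since the type flags
    # should only be used one at a time.
    for type_flag in ("-p", "-c", "-f"):
        commands.append(["python", "snac2/scripts/match.py", type_flag])
    merge = ["python", "snac2/scripts/merge.py"]
    if real_arks:
        merge.append("-r")
    merge.append("-m")
    commands.append(merge)
    # Assemble: export the merged records to XML files.
    commands.append(["python", "snac2/scripts/merge.py", "-a"])
    return commands


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in turn, stopping at the first failure.

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit status, which mirrors chaining commands with `&&`.
    """
    for command in commands:
        runner(command, check=True)
```

Calling `run_pipeline(build_pipeline("data_shortname"))` would run the whole sequence. The `runner` parameter exists only so the command list can be inspected or logged without executing anything.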