POWO data archives can be managed through the POWO dashboard at /admin.
Log in using the password.harvester value from the appropriate secrets.yaml file in the powo-secrets repository.
When creating a new resource, only Title and URL are required. Image Prefix is used when the image paths in your archive are relative (e.g., they need a CDN prefix). It is not used for Digifolia image archives.
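To decide whether a new resource needs an Image Prefix, it can help to check whether the image paths in the archive are relative. The sketch below assumes you have already extracted the image path column from the archive into a one-path-per-line list (which column holds the paths depends on the archive and is not covered here). `count_relative_paths` is a hypothetical helper, not part of the POWO tooling.

```shell
# Hypothetical helper: reads image paths (one per line) on stdin and prints
# how many are relative, i.e. do not start with http:// or https://.
# A non-zero count suggests the resource needs an Image Prefix.
count_relative_paths() {
  # Drop absolute http(s) URLs, then count the remaining non-empty lines.
  # grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
  grep -vE '^https?://' | grep -c . || true
}
```

For example, `printf '%s\n' 'images/K000001.jpg' 'https://cdn.example.org/K000002.jpg' | count_relative_paths` prints `1`.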
Skip indexing can be used to optimise load times. When running a collection of large jobs, it is much more efficient to skip indexing for the individual resources and do a full re-index at the end.
Select Harvest names when loading the backbone names and Harvest taxonomy when loading the taxonomy. Names must be loaded before the taxonomy, since the taxonomy harvest job only creates links between existing names.
A harvester job configuration is created for each resource. These can be run from the Jobs tab. By default there is also a Re index all taxa job configuration, which runs a full re-index.
Job Lists can be created to run a list of jobs in sequence. By default there is a "Load everything" list that loads all resources and does a full re-index at the end. When adding a new resource, always add its job to this list so that it is loaded during each data refresh.
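The ordering rules above (names before taxonomy, one full re-index at the end) can be written down as a small check when planning a job list. This is only a sketch: the dashboard does not expose job kinds in this form, so the kind labels (`names`, `taxonomy`, `reindex`, `other`) and the function `check_job_order` are hypothetical, assigned by hand from the job descriptions.

```shell
# Hypothetical check: takes job kinds in run order and verifies that
#  - no "names" job runs after a "taxonomy" job (taxonomy links existing names)
#  - the last job is the full "reindex"
# Prints a message and returns 1 on the first violation, 0 otherwise.
check_job_order() {
  local kind seen_taxonomy=0 last=""
  for kind in "$@"; do
    case "$kind" in
      taxonomy) seen_taxonomy=1 ;;
      names)
        if [ "$seen_taxonomy" -eq 1 ]; then
          echo "a names job runs after a taxonomy job" >&2
          return 1
        fi
        ;;
    esac
    last=$kind
  done
  if [ "$last" != "reindex" ]; then
    echo "the list should end with the full re-index" >&2
    return 1
  fi
  return 0
}
```

For example, `check_job_order names other taxonomy reindex` succeeds, while `check_job_order taxonomy names reindex` reports the ordering problem.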
Data configuration can be exported via Settings (gear icon) -> Export. This exports a JSON representation of all organisations, resources, jobs, and job lists.
Data configuration can only be imported into a blank database.
The automated POWO rebuild uses this configuration file to load the full data set, pulling it from the powo-data repository.
For data changes to persist through a refresh, you must commit the updated configuration export to the appropriate file in powo-data.
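Before committing an export, it is worth confirming that the downloaded file is valid JSON. A minimal sketch, assuming `python3` is available (its standard `json.tool` module is used as a validator) and that you have a local clone of powo-data. The destination file inside powo-data is left as a parameter because this guide does not name it, and both function names are hypothetical.

```shell
# Hypothetical helper: returns non-zero if the given file is not valid JSON.
validate_export() {
  python3 -m json.tool "$1" > /dev/null
}

# Hypothetical helper: validate the export, copy it over the chosen file in a
# local powo-data clone, and commit it. Pushing is left as a manual step.
commit_export() {
  local export_file=$1 repo_dir=$2 dest=$3
  if ! validate_export "$export_file"; then
    echo "not valid JSON: $export_file" >&2
    return 1
  fi
  cp "$export_file" "$repo_dir/$dest" &&
    git -C "$repo_dir" add "$dest" &&
    git -C "$repo_dir" commit -m "Update data configuration export"
}
```

Keeping validation separate means it can also be run on its own, e.g. `validate_export ~/Downloads/export.json`, before touching the repository.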
When troubleshooting data load errors, always start by checking the harvester logs. Errors such as DwCA metadata errors and harvester "resource file not found" errors show up there.
The second place to look is the annotation table. Proper debugging tools have not yet been built into the admin interface, so for now the most useful information is found in the database.
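Harvester logs can be long, so a quick filter helps surface the relevant lines. This sketch assumes you already have the log as text (this guide does not say where your deployment exposes it) and keeps only lines mentioning errors or exceptions. The keyword pattern and the helper name `harvester_errors` are assumptions, since the exact log format is not documented here.

```shell
# Hypothetical filter: reads a harvester log on stdin and prints only lines
# that mention "error" or "exception" (case-insensitive), prefixed with their
# line numbers so they can be located in the full log.
# Extend the pattern if other failure messages matter to you.
harvester_errors() {
  # grep exits 1 when nothing matches; "|| true" keeps that from looking
  # like a failure of the filter itself.
  grep -inE 'error|exception' || true
}
```

For example, `harvester_errors < harvester.log` lists the matching lines with their line numbers.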
If using Cloud Shell, authenticate to the POWO clusters (you only need to do this once):
make NAME=<cluster> get-credentials
Then connect to the required cluster. List the available cluster contexts with:
kubectl config get-contexts
Then select the one to use with:
kubectl config use-context gke_powop-1349_europe-west1-d_powo-dev
or
kubectl config use-context gke_powop-1349_europe-west1-d_powo-prod
Once authenticated and connected to the correct cluster, run
bin/powo-db
to connect to the database. It will prompt for a password, which can be found in the corresponding secrets.yaml file.
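Switching context and opening the database are usually done together, so a small wrapper can save typing. This is a hypothetical convenience function, not part of the repository: it maps `dev` and `prod` to the two context names listed above, assumes it is run from the directory that contains `bin/powo-db`, and still relies on `bin/powo-db` to prompt for the password.

```shell
# Hypothetical helper: print the kubectl context name for an environment.
powo_context() {
  case "$1" in
    dev|prod) echo "gke_powop-1349_europe-west1-d_powo-$1" ;;
    *) echo "usage: powo_context dev|prod" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: switch to the environment's cluster, then connect
# to its database with the bin/powo-db script.
powo_db() {
  local context
  # Declared separately so "|| return 1" sees powo_context's exit status.
  context=$(powo_context "$1") || return 1
  kubectl config use-context "$context" && bin/powo-db
}
```

Usage: `powo_db dev` or `powo_db prod`. An unknown environment name fails before any context is changed.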
Find job configurations and the job ID of their last run:
select description, lastJobExecution, jobStatus, jobExitCode from jobconfiguration;
For a given job, check the status of the harvested records, e.g. for job ID 10:
select code, count(*) from annotation where jobId = 10 group by code;
If there are error codes, find the error messages associated with them, e.g. for code BadField:
select text from annotation where jobId = 10 and code = 'BadField';
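The per-job queries above are often run one after another for the same job. As a convenience, this hypothetical helper prints them for a given job ID and, optionally, an annotation code, ready to paste into the `bin/powo-db` session. It rejects non-numeric IDs and non-alphanumeric codes so the values can be interpolated safely. The message query is a variant of the one above: it groups identical messages with a count, which keeps repeated errors to one line each. It assumes only the `jobId`, `code` and `text` columns already used above.

```shell
# Hypothetical helper: print diagnostic SQL for one harvest job.
# Usage: annotation_queries JOB_ID [CODE]
annotation_queries() {
  local job_id=$1 code=${2:-}
  case "$job_id" in
    ''|*[!0-9]*) echo "job id must be numeric" >&2; return 1 ;;
  esac
  case "$code" in
    *[!A-Za-z0-9_]*) echo "code must be alphanumeric" >&2; return 1 ;;
  esac
  # Record counts per annotation code for the job.
  printf '%s\n' "select code, count(*) from annotation where jobId = $job_id group by code;"
  if [ -n "$code" ]; then
    # Distinct messages for one code, most frequent first.
    printf '%s\n' "select text, count(*) as n from annotation where jobId = $job_id and code = '$code' group by text order by n desc;"
  fi
}
```

For example, `annotation_queries 10 BadField` prints the count query for job 10 followed by the grouped message query for BadField.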