The Agents that Crossref operates for Event Data. An Agent talks to a data source (usually an API) and builds Evidence Records, which are sent to the Percolator.
This codebase includes all Agents that Crossref operates. As such, all Agents are versioned in step. Prior to this repository all Agents had different codebases and versioning.
Most agents run on a schedule:
lein run start-schedule «agent-name»*
Some require daemons (e.g. Wikipedia and Twitter)
lein run run-daemons «agent-name»*
Run a one-off cycle of the scheduled job:
lein run run-schedule-once «agent-name»*
Current options:
Schedule
- hypothesis
- newsfeed
- reddit-links
- stackexchange
- twitter (update rules on schedule)
- carberry (FOR TESTING ONLY!)
- f1000
Daemon
- twitter (ingest stream)
- wikipedia
The following environment variables must be set:
AGENT_CHECKPOINT_S3_BUCKET_NAME
AGENT_CHECKPOINT_S3_KEY
AGENT_CHECKPOINT_S3_REGION_NAME
AGENT_CHECKPOINT_S3_SECRET
GLOBAL_ARTIFACT_URL_BASE
, e.g. https://artifact.eventdata.crossref.orgGLOBAL_JWT_SECRETS
GLOBAL_KAFKA_BOOTSTRAP_SERVERS
GLOBAL_STATUS_TOPIC
PERCOLATOR_INPUT_EVIDENCE_RECORD_TOPIC
PERCOLATOR_ROBOTS_CACHE_REDIS_HOST
PERCOLATOR_ROBOTS_CACHE_REDIS_PORT
TWITTER_GNIP_PASSWORD
TWITTER_GNIP_RULES_URL
TWITTER_GNIP_USERNAME
TWITTER_POWERTRACK_ENDPOINT
F1000_DUMP_PATH