For regular updates, subscribe to our Google group at: https://groups.google.com/forum/#!forum/proppr
==============
2.0 QUICKSTART
==============
1. Write a rulefile as *.ppr:
$ cat > test.ppr
predict(X,Y) :- hasWord(X,W),isLabel(Y),related(W,Y) {r}.
related(W,Y) :- {w(W,Y)}.
^D
2. Compile a rulefile:
$ python src/scripts/compile.py serialize test.ppr | tee test.wam
0 comment predict(-1,-2) :- hasWord(-1,-3), isLabel(-2), related(-3,-2) {r} #v:['X', 'Y', 'W'].
1 predict/2 allocate 3 ['W', 'Y', 'X']
2 initfreevar -1 -2
3 initfreevar -2 -1
4 fclear
5 fpushstart r 0
6 freport
7 pushboundvar -1
8 pushfreevar -3
9 callp hasWord/2
10 pushboundvar -2
11 callp isLabel/1
12 pushboundvar -3
13 pushboundvar -2
14 callp related/2
15 returnp
16 comment related(-1,-2) :- {w(-1,-2)} #v:['W', 'Y'].
17 related/2 allocate 2 ['Y', 'W']
18 initfreevar -1 -2
19 initfreevar -2 -1
20 fclear
21 fpushstart w 2
22 fpushboundvar -1
23 fpushboundvar -2
24 freport
25 returnp
3. Write arity-2 facts in a database file as *.graph:
$ cat > test.graph
hasWord dh a
hasWord dh pricy
hasWord dh doll
hasWord dh house
hasWord ft a
hasWord ft little
hasWord ft red
hasWord ft fire
hasWord ft truck
hasWord rw a
hasWord rw red
hasWord rw wagon
hasWord sc a
hasWord sc pricy
hasWord sc red
hasWord sc sports
hasWord sc car
...
^D
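The hasWord edges above are just a bag-of-words encoding of each document. A file in this format can be generated with a few lines of Python (a sketch; the doc ids and texts are illustrative and this helper is not part of the ProPPR distribution):

```python
# Sketch: emit tab-separated hasWord edges for a *.graph file.
# Doc ids and texts below are illustrative examples only.
docs = {
    "dh": "a pricy doll house",
    "ft": "a little red fire truck",
}

def graph_lines(docs):
    """Yield 'edge<TAB>source<TAB>dest' lines, one per word token."""
    for doc_id, text in docs.items():
        for word in text.split():
            yield "hasWord\t%s\t%s" % (doc_id, word)

lines = list(graph_lines(docs))
```

Writing `"\n".join(lines)` to test.graph reproduces the layout shown above.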
4. Write arity-N facts in a database file as *.cfacts:
$ cat > test.cfacts
isLabel neg
isLabel pos
^D
5. Write training examples:
$ cat > test_train.data
predict(dh,Y) -predict(dh,neg) +predict(dh,pos)
predict(ft,Y) -predict(ft,neg) +predict(ft,pos)
predict(rw,Y) -predict(rw,neg) +predict(rw,pos)
predict(sc,Y) -predict(sc,neg) +predict(sc,pos)
...
^D
6. Ground training examples:
$ java -cp conf:bin:lib/* edu.cmu.ml.proppr.Grounder --programFiles test.wam:test.graph:test.cfacts --queries test_train.data --grounded test_train.grounded
Time 461 msec
Done.
7. Train parameters:
$ java -cp conf:bin:lib/* edu.cmu.ml.proppr.Trainer --train test_train.grounded --params test.wts
INFO [Trainer] edu.cmu.ml.proppr.util.ModuleConfiguration:
Walker: edu.cmu.ml.proppr.learn.L2PosNegLossTrainedSRW
Trainer: edu.cmu.ml.proppr.Trainer
Weighting Scheme: edu.cmu.ml.proppr.learn.tools.ReLUWeightingScheme
INFO [Trainer] Training model parameters on test_train.grounded...
INFO [Trainer] epoch 1 ...
INFO [Trainer] epoch 2 ...
INFO [Trainer] epoch 3 ...
INFO [Trainer] epoch 4 ...
INFO [Trainer] epoch 5 ...
INFO [Trainer] Finished training in 650 ms
INFO [Trainer] Saving parameters to test.wts...
8. Write testing examples:
$ cat > test_testing.data
predict(pb,Y) -predict(pb,neg) +predict(pb,pos)
predict(yc,Y) -predict(yc,neg) +predict(yc,pos)
predict(rb2,Y) -predict(rb2,neg) +predict(rb2,pos)
...
^D
9. Get untrained rankings:
$ java -cp conf:bin:lib/* edu.cmu.ml.proppr.QueryAnswerer --programFiles test.wam:test.graph:test.cfacts --queries test_testing.data --solutions pre.testing.solutions.txt
edu.cmu.ml.proppr.QueryAnswerer.QueryAnswererConfiguration:
Prover: edu.cmu.ml.proppr.prove.DprProver
Weighting Scheme: edu.cmu.ml.proppr.learn.tools.ReLUWeightingScheme
INFO [QueryAnswerer] Running queries from test_testing.data; saving results to pre.testing.solutions.txt
INFO [QueryAnswerer] Querying: predict(pb,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(yc,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(rb2,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(rp,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(bp,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(he,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(wt,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
10. Measure untrained performance:
$ python scripts/answermetrics.py --data test_testing.data --answers pre.testing.solutions.txt --metric mrr --metric recall
==============================================================================
metric mrr (Mean Reciprocal Rank): averages 1/rank for all positive answers
. micro: 0.5
. macro: 0.5
. details:
. . predict(he,-1) #v:[?]. 0.5
. . predict(pb,-1) #v:[?]. 0.5
. . predict(yc,-1) #v:[?]. 0.5
. . predict(bp,-1) #v:[?]. 0.5
. . predict(rb2,-1) #v:[?]. 0.5
. . predict(wt,-1) #v:[?]. 0.5
. . predict(rp,-1) #v:[?]. 0.5
==============================================================================
metric recall (Recall): fraction of positive examples that are proposed as solutions anywhere in the ranking
. micro: 1.0
. macro: 1.0
. details:
. . predict(he,-1) #v:[?]. 1.0
. . predict(pb,-1) #v:[?]. 1.0
. . predict(yc,-1) #v:[?]. 1.0
. . predict(bp,-1) #v:[?]. 1.0
. . predict(rb2,-1) #v:[?]. 1.0
. . predict(wt,-1) #v:[?]. 1.0
. . predict(rp,-1) #v:[?]. 1.0
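Both metrics are easy to restate in code. A minimal sketch of what MRR and recall compute for a single query's ranking (illustrative only, not the actual implementation in answermetrics.py):

```python
def mrr(ranked, positives):
    """Average of 1/rank over all positive answers (rank is 1-based)."""
    ranks = [ranked.index(p) + 1 for p in positives if p in ranked]
    return sum(1.0 / r for r in ranks) / len(positives) if positives else 0.0

def recall(ranked, positives):
    """Fraction of positives appearing anywhere in the ranking."""
    found = sum(1 for p in positives if p in ranked)
    return float(found) / len(positives) if positives else 0.0

# Untrained ranking for one query: the negative label outranks the positive,
# so the positive answer sits at rank 2 -- giving the 0.5 MRR seen above.
untrained = ["neg", "pos"]
positives = ["pos"]
```

Since the positive answer is still proposed somewhere in the ranking, recall stays at 1.0 even before training.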
11. Get trained rankings (note the added --params option and the new --solutions filename):
$ java -cp conf:bin:lib/* edu.cmu.ml.proppr.QueryAnswerer --programFiles test.wam:test.graph:test.cfacts --queries test_testing.data --solutions post.testing.solutions.txt --params test.wts
edu.cmu.ml.proppr.QueryAnswerer.QueryAnswererConfiguration:
Prover: edu.cmu.ml.proppr.prove.DprProver
Weighting Scheme: edu.cmu.ml.proppr.learn.tools.ReLUWeightingScheme
INFO [QueryAnswerer] Running queries from test_testing.data; saving results to post.testing.solutions.txt
INFO [QueryAnswerer] Querying: predict(pb,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(yc,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(rb2,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(rp,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(bp,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(he,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(wt,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
12. Measure trained performance:
$ python scripts/answermetrics.py --data test_testing.data --answers post.testing.solutions.txt --metric mrr --metric recall
==============================================================================
metric mrr (Mean Reciprocal Rank): averages 1/rank for all positive answers
. micro: 1.0
. macro: 1.0
. details:
. . predict(he,-1) #v:[?]. 1.0
. . predict(pb,-1) #v:[?]. 1.0
. . predict(yc,-1) #v:[?]. 1.0
. . predict(bp,-1) #v:[?]. 1.0
. . predict(rb2,-1) #v:[?]. 1.0
. . predict(wt,-1) #v:[?]. 1.0
. . predict(rp,-1) #v:[?]. 1.0
==============================================================================
metric recall (Recall): fraction of positive examples that are proposed as solutions anywhere in the ranking
. micro: 1.0
. macro: 1.0
. details:
. . predict(he,-1) #v:[?]. 1.0
. . predict(pb,-1) #v:[?]. 1.0
. . predict(yc,-1) #v:[?]. 1.0
. . predict(bp,-1) #v:[?]. 1.0
. . predict(rb2,-1) #v:[?]. 1.0
. . predict(wt,-1) #v:[?]. 1.0
. . predict(rp,-1) #v:[?]. 1.0
==============================================
ProPPR: PROGRAMMING WITH PERSONALIZED PAGERANK
==============================================
This is a Java package for using graph-walk algorithms to perform inference tasks over local groundings of first-order logic programs. The package uses parallelization to speed processing substantially, making it practical even for large databases.
Contents:
1. Build
2. Run
2.0. Overview of Java main() classes
2.0.0. Grounder: Construct a proof graph for each query
2.0.1. Trainer: Train feature weights on the proof graphs
2.0.2. QueryAnswerer: Generate [un]trained ranked candidate solutions for queries
2.1. Utilities
2.1.0. compile.py: Convert ProPPR rulefiles (.ppr) to WAM instructions (.wam)
2.1.1. answermetrics.py: Measure performance
2.1.2. sparseGraphTools: Construct memory- and CPU-efficient ProPPR databases
2.2. Run utilities
2.2.0. QueryAnswerer: Answer a list of queries with a compiled program
2.2.1. Prompt: Examine individual queries interactively
3. File formats: *.rules, *.facts, *.graph, *.data
1. BUILD
========
ProPPR $ ant clean build
2. RUN
======
For all run phases, control logging output using conf/log4j.properties.
2.0. RUN: JAVA MAIN CLASSES
===========================
edu.cmu.ml.proppr.Grounder
edu.cmu.ml.proppr.Trainer
edu.cmu.ml.proppr.QueryAnswerer
2.0.0. RUN: MAIN CLASSES: GROUNDER
==================================
$ java -cp conf:bin:lib/* edu.cmu.ml.proppr.Grounder --queries \
inputFile --grounded outputFile.grounded --programFiles \
file.wam:file.cfacts:file.graph [--ternaryIndex true|false] \
[--threads integer] [--prover ppr[:depth] | \
dpr[:eps[:alph[:strat]]] | tr[:depth] ]
Grounder will read the list of queries from inputFile, the WAM program
from file.wam, and the various database plugin files file.cfacts and
file.graph, and produce the proof graph for each query in
outputFile.grounded.
Optional parameters:
* If your database contains facts of arity 3 or more, use
`--ternaryIndex true` to trade some extra memory for faster
lookups.
* If you are on a multi-core machine, set --threads up to (#cores-2)
to ground queries in parallel (one thread is used as the controller,
one for writing output, and the others are worker threads).
* The default prover is dpr:1e-4:0.1, which will fail on graphs with
a maximum out-degree above 10. Reduce alpha to 1/(max out-degree) to
suit your dataset.
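To pick a safe alpha, you can scan your *.graph file for the maximum out-degree. A sketch (the suggest_alpha helper is hypothetical, not part of ProPPR):

```python
from collections import Counter

def suggest_alpha(graph_lines):
    """Count out-edges per source node; return 1/(max out-degree)."""
    out_degree = Counter()
    for line in graph_lines:
        edge, src, dst = line.rstrip("\n").split("\t")
        out_degree[src] += 1
    return 1.0 / max(out_degree.values())

# Illustrative edges: node 'dh' has out-degree 2, 'ft' has out-degree 1.
sample = ["hasWord\tdh\ta", "hasWord\tdh\tpricy", "hasWord\tft\tred"]
```

The suggested value would then be passed as, e.g., --prover dpr:1e-4:0.5.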
2.0.1. RUN: MAIN CLASSES: TRAINER
=================================
$ java -cp conf:bin:lib/* edu.cmu.ml.proppr.Trainer --train inputFile.grounded
--params outputFile [--threads integer] [--epochs integer]
[--traceLosses] [--force] [--weightingScheme linear | sigmoid
| tanh | ReLU | exp]
Trainer will read the grounded proof graphs from inputFile.grounded, then perform stochastic gradient descent to optimize the weights assigned to the edge labels, storing the resulting parameter vector in outputFile.
Optional arguments:
* If you are on a multi-core machine, specify --threads up to (#cores-2) and ProPPR will process examples in parallel (1 controller thread, 1 thread for managing output, N worker threads). Training is thread-safe, but currently (fall 2014) programs with a small number of non-db features (or a large number of db lookups) may experience reduced parallelization speedup due to resource contention.
* Increase or decrease the number of training iterations using --epochs.
* Turn on --traceLosses to view a readout of log loss and regularization loss every epoch.
* Turn on --force to use different settings for training than were used for grounding (not recommended unless you know what you're doing).
* Set --weightingScheme to your desired wrapper function, controlling how the weight of an edge is computed from its features.
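All of the listed weighting schemes share one shape: sum the weights of the features on an edge, then apply the named wrapper function. A sketch of that computation (the function names mirror the option values; the exact Java implementations may differ):

```python
import math

def edge_weight(feature_weights, scheme="ReLU"):
    """Sum an edge's feature weights, then apply the wrapper function."""
    s = sum(feature_weights)
    wrappers = {
        "linear": lambda x: x,
        "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
        "tanh": math.tanh,
        "ReLU": lambda x: max(0.0, x),  # the default scheme in the logs above
        "exp": math.exp,
    }
    return wrappers[scheme](s)
```

Schemes that clamp or squash the sum (ReLU, sigmoid) keep edge weights non-negative, which the random-walk semantics requires.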
2.2. RUN: UTILITIES
===================
2.2.0. RUN: UTILITIES: QUERYANSWERER
====================================
If you want to use a program to answer a series of queries, you can
use the QueryAnswerer class. If you are running this step you should
already have a compiled program and a file containing a list of
queries, one per line. Each query is a single goal.
ProPPR $ cat testcases/family.queries
sim(william,X)
sim(rachel,X)
ProPPR $ java -cp conf:bin:lib/* edu.cmu.ml.proppr.QueryAnswerer \
--programFiles testcases/family.cfacts:testcases/family.crules \
--queries testcases/family.queries --solutions answers.txt
INFO [Component] Loading from file 'testcases/family.cfacts' with alpha=0.0 ...
INFO [Component] Loading from file 'testcases/family.crules' with alpha=0.0 ...
ProPPR $ cat answers.txt
# proved sim(william,-1) 47 msec
1 0.8838968104504825 -1=c[william]
2 0.035512510088781264 -1=c[lottie]
3 0.035512510088781264 -1=c[rachel]
4 0.035512510088781264 -1=c[sarah]
5 0.002391414820793351 -1=c[poppy]
6 0.0017935611155950133 -1=c[lucas]
7 0.0017935611155950133 -1=c[charlotte]
8 0.0017935611155950133 -1=c[caroline]
9 0.0017935611155950133 -1=c[elizabeth]
# proved sim(rachel,-1) 18 msec
1 0.9094251636624519 -1=c[rachel]
2 0.0452874181687741 -1=c[caroline]
3 0.0452874181687741 -1=c[elizabeth]
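The solutions file above follows a simple layout: a '# proved' header per query, then one 'rank score binding' line per solution. A sketch of a post-processing reader (the parse_solutions helper is hypothetical, not a ProPPR utility):

```python
def parse_solutions(lines):
    """Map each proved query to its [(rank, score, binding), ...] list."""
    answers = {}
    current = None
    for line in lines:
        parts = line.split()
        if line.startswith("# proved"):
            current = parts[2]            # e.g. 'sim(rachel,-1)'
            answers[current] = []
        elif parts and current is not None:
            answers[current].append(
                (int(parts[0]), float(parts[1]), parts[2]))
    return answers

sample = [
    "# proved sim(rachel,-1) 18 msec",
    "1 0.9094251636624519 -1=c[rachel]",
    "2 0.0452874181687741 -1=c[caroline]",
]
```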
2.2.1. RUN: UTILITIES: PROMPT
=============================
An interactive prompt can be useful while debugging logic program issues, because you can examine a single query in detail. If you are running this step you should already have a compiled program.
Starting up the prompt:
"""
ProPPR $ java -cp conf/:bin/:lib/* edu.cmu.ml.proppr.prove.Prompt --programFiles ${PROGRAMFILES%:}
Starting up beanshell...
prv set: edu.cmu.ml.proppr.prove.TracingDfsProver@57fdc2d
INFO [Component] Loading from file 'kbp_prototype/doc.crules' with alpha=0.0 ...
INFO [Component] Loading from file 'kbp_prototype/kb.cfacts' with alpha=0.0 ...
INFO [Component] Loading from file 'kbp_prototype/lp_predicate_SF_ENG_001-50doc.graph' with alpha=0.0 ...
lp set: edu.cmu.ml.proppr.prove.LogicProgram@2225a091
Type 'help();' for help, 'quit();' to quit; 'list();' for a variable listing.
BeanShell 2.0b4 - by Pat Niemeyer (pat@pat.net)
bsh %
"""
When it starts up, Prompt instantiates the logic program from the command line as 'lp', and a default prover which prints a depth-first-search-style proof of a query (default maximum depth is 5). You can specify a different prover on the command line if you wish. For information on built-in commands and interpreter syntax, type 'help();':
"""
bsh % help();
This is a beanshell, a command-line interpreter for java. A full beanshell manual is available at <http://www.beanshell.org/manual/contents.html>.
Type java statements and expressions at the prompt. Don't forget semicolons.
Type 'help();' for help, 'quit();' to quit; 'list();' for a variable listing.
'show();' will toggle automatic printing of the results of expressions. Otherwise you must use 'print( expr );' to see results.
'javap( x );' will list the fields and methods available on an object. Be warned; beanshell has trouble locating methods that are only defined on the superclass.
'[sol = ]run(prover,logicprogram,"functor(arg,arg,...,arg)")' will prove the associated state.
'pretty(sol)' will list solutions first, then intermediate states in descending weight order.
bsh %
"""
3. FILE FORMATS
===============
****** File format: *.rules
Example:
predict(X,Y) :- hasWord(X,W),isLabel(Y),related(W,Y) #r.
related(W,Y) :- # w(W,Y).
Grammar:
line= lhs ':-' rhs ('#' featureList)? '.'
lhs= goal
rhs=
|= goal (',' goal)*
featureList=
|= goal (',' goal)*
goal= functor
|= functor '(' argList ')'
argList= constantArgList
|= variableArgList
|= constantArgList ',' variableArgList
constantArgList= constantArg (',' constantArg)*
variableArgList= variableArg (',' variableArg)*
constantArg= [a-z][a-zA-Z0-9]*
variableArg= [A-Z][a-zA-Z0-9]*
functor= [a-z][a-zA-Z0-9]*
****** File format: *.facts
Example:
isLabel(pos)
isLabel(neg)
Grammar:
line= goal
****** File format: *.graph
Example:
hasWord bk punk
hasWord bk queen
hasWord bk barbie
hasWord bk and
hasWord bk ken
hasWord rb a
hasWord rb little
hasWord rb red
hasWord rb bike
hasWord mv a
hasWord mv big
hasWord mv 7-seater
hasWord mv minivan
hasWord mv with
hasWord mv an
hasWord mv automatic
hasWord mv transmission
hasWord hs a
hasWord hs big
hasWord hs house
hasWord hs in
hasWord hs the
hasWord hs suburbs
hasWord hs with
hasWord hs crushing
hasWord hs mortgage
Grammar:
line= edge '\t' sourcenode '\t' destnode
edge= functor
sourcenode,destnode= constantArg
****** File format: *.data
Example:
predict(bk,Y) -predict(bk,neg) +predict(bk,pos)
predict(rb,Y) -predict(rb,neg) +predict(rb,pos)
predict(mv,Y) +predict(mv,neg) -predict(mv,pos)
predict(hs,Y) +predict(hs,neg) -predict(hs,pos)
Grammar:
line= query '\t' exampleList
query= goal
exampleList= example ('\t' example)*
example= positiveExample
|= negativeExample
positiveExample= '+' goal
negativeExample= '-' goal
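A line in this grammar can be parsed mechanically. A sketch of a *.data line reader (the parse_data_line helper is hypothetical, not the parser ProPPR uses; it assumes tab-separated fields as specified above):

```python
def parse_data_line(line):
    """Split a training line into (query, positives, negatives)."""
    fields = line.rstrip("\n").split("\t")
    query, examples = fields[0], fields[1:]
    pos = [e[1:] for e in examples if e.startswith("+")]
    neg = [e[1:] for e in examples if e.startswith("-")]
    return query, pos, neg

query, pos, neg = parse_data_line(
    "predict(bk,Y)\t-predict(bk,neg)\t+predict(bk,pos)")
```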