Working smf #151

Open
wants to merge 18 commits into
base: master
3243d24
Tentative working SMF
DanielTakeshi Feb 16, 2017
53051bc
Renamed file
DanielTakeshi Feb 16, 2017
bb5ebe3
I think the MH test is working with SMF w/random walk proposer.
DanielTakeshi Mar 8, 2017
af060e0
changed *@ to ddot since doubles are more precise than floats
DanielTakeshi Mar 8, 2017
9c22543
Slight updates, mostly debugging.
DanielTakeshi Mar 8, 2017
bbbeba8
Really confused. Better ask John. =(
DanielTakeshi Mar 9, 2017
1d1bb21
Wow, I think I finally get SMF ... well, the main idea.
DanielTakeshi Mar 15, 2017
8bd3018
More documentation to myself. Not ready for integration into master
DanielTakeshi Mar 15, 2017
0ab7a3e
Updated MH Test, I think ADAGrad and SMF work now, but ...
DanielTakeshi Mar 18, 2017
b1178e4
OK the memory allocation stuff is fine, not really worried now. And I…
DanielTakeshi Mar 18, 2017
ec31c76
OK enough debugprints. Now figure out what to do for the paper
DanielTakeshi Mar 18, 2017
ae2472a
OK this should be the style of script to look for different values.
DanielTakeshi Mar 18, 2017
9bb1b30
Let's try this to run these in batch mode.
DanielTakeshi Mar 19, 2017
888d677
more slight script updates
DanielTakeshi Mar 21, 2017
0701ec6
fixed logu --> psi
DanielTakeshi Mar 22, 2017
b0f0aeb
Tried my way with energy function and momentum, wasn't working. =(
DanielTakeshi Mar 22, 2017
54cbc2c
Well, got MALA but it's not really working ..
DanielTakeshi Mar 23, 2017
685822d
Fixed bug, 1/(4*tau). Better to just use sigma which is what I do.
DanielTakeshi Mar 23, 2017
Slight updates, mostly debugging.
My confusion before was about why the model matrices kept updating even though I wasn't accepting anything in the updater. It turns out that the SMF code updates them in its mupdate method. Ugh ...
DanielTakeshi committed Mar 8, 2017
commit 9c225433b44d8f57ab910cd9cf79b72d3f9a9a4f
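The situation described in the commit message can be reproduced in a toy setting. Below is a minimal sketch in plain Scala, not BIDMach code: `ToyModel`, `modelSideUpdate`, and `rejectingUpdater` are hypothetical names, and the only assumption is that a model-side step (the role `mupdate` plays in the message) writes to the parameters in place before the updater runs.

```scala
// Hypothetical stand-ins, not BIDMach code: a model whose per-minibatch
// step mutates its parameters in place, and an updater that rejects
// every proposal and restores what it saved.
object RejectedButStillChanging {
  final class ToyModel(val theta: Array[Double]) {
    // Plays the role of a model-side update that runs on every minibatch,
    // regardless of what the updater decides afterwards.
    def modelSideUpdate(step: Double): Unit =
      for (i <- theta.indices) theta(i) += step
  }

  // Saves theta, applies a proposal, then rejects it by restoring the copy.
  def rejectingUpdater(model: ToyModel): Unit = {
    val saved = model.theta.clone()
    for (i <- model.theta.indices) model.theta(i) += 1.0 // proposal
    Array.copy(saved, 0, model.theta, 0, saved.length)   // reject: restore
  }

  // Runs nBatches minibatches and returns the final parameters.
  def run(nBatches: Int): Array[Double] = {
    val model = new ToyModel(Array(0.0, 0.0))
    for (_ <- 0 until nBatches) {
      model.modelSideUpdate(0.5) // happens before the updater sees theta
      rejectingUpdater(model)    // restores only to the post-update state
    }
    model.theta
  }
}
```

Calling `RejectedButStillChanging.run(3)` leaves both parameters at 1.5 even though every proposal was rejected, because the restore only undoes the updater's own change.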
29 changes: 25 additions & 4 deletions scripts/daniel_smf_netflix_mhtest.ssc
@@ -3,7 +3,28 @@ import BIDMach.models.SMF

  /**
  * Test SMF code on netflix data. This will use OUR MHTest updater, which I put
- * in as a new updater (SMF.learner2) to make this script more concise.
+ * in as a new updater (SMF.learner2) to make this script more concise. Some
+ * notes on the netflix dataset:
+ *
+ * size(a) = (17770,480189)
+ * a.nnz = 90430138
+ * min=0, max=5
+ *
+ * (a == 1).nnz = 4156151
+ * (a == 2).nnz = 9120198
+ * (a == 3).nnz = 25928920
+ * (a == 4).nnz = 30375037
+ * (a == 5).nnz = 20849832
+ * mean (of nonzeros) = 3.6042476
+ * sqrt((diff ddot diff) / diff.nn) = 1.0852 // Train RMSE using mean predictor
+ *
+ * (ta == 1).nnz = 461839
+ * (ta == 2).nnz = 1011882
+ * (ta == 3).nnz = 2882327
+ * (ta == 4).nnz = 3375921
+ * (ta == 5).nnz = 2318400
+ * mean (of nonzeros) = 3.6046705
+ * sqrt((diff ddot diff) / diff.nn) = 1.0851 // Test RMSE using mean predictor
  */

  // Get random seed set up.
@@ -40,10 +61,10 @@ opts.matrixOfScores = true
  // Daniel Seita: actually, a batch size of 2000 means we may get 100k "elements"
  // due to the sparsity. So I'm thinking we stick to batch sizes of 1000 or less.
  opts.batchSize = 1000
- opts.npasses = 2
  opts.uiter = 5
  opts.urate = 0.05f
  opts.lrate = 0.05f
+ opts.npasses = 3
  val lambda = 4f
  opts.lambdau = lambda
  opts.regumean = lambda
@@ -57,7 +78,7 @@ nn.train

  val model = nn.model.asInstanceOf[SMF]
  val xa = (ta != 0)
- val (mm, mopts) = SMF.predictor1(model, a, xa)
+ val (mm, mopts) = SMF.predictor1(model, a, xa) // Provide `a` or `ta` as input?
  mopts.batchSize = 10000
  mopts.uiter = 5
  mopts.urate = opts.urate
@@ -68,5 +89,5 @@ val pa = SMat(mm.preds(1));
  min(pa.contents,5,pa.contents)
  max(pa.contents,1,pa.contents)
  val diff = ta.contents - pa.contents
- val rmse = sqrt((diff ^* diff) / diff.length)
+ val rmse = sqrt((diff ddot diff) / diff.length)
  println("rmse = %f" format rmse.v)
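For reference, the closing lines of the script compute a clipped RMSE on the held-out ratings. A sketch of the formula, assuming `ddot` is the dot product of the flattened contents (so `diff ddot diff` is the sum of squared residuals) and `diff.length` is the number of held-out nonzeros $n$. The symbols $t_i$, $p_i$, and $\hat{p}_i$ are notation introduced here for the entries of `ta.contents`, the raw predictions, and the clipped predictions; they are not names from the script.

```latex
\hat{p}_i = \max\bigl(1,\ \min(5,\ p_i)\bigr),
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(t_i - \hat{p}_i\bigr)^2}
```

The header comment of the script quotes 1.0851 as the test RMSE of the mean predictor, which gives a baseline for reading the printed value.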
19 changes: 12 additions & 7 deletions src/main/scala/BIDMach/models/SMF.scala
@@ -241,12 +241,10 @@ class SMF(override val opts:SMF.Opts = new SMF.Options) extends FactorModel(opts
  * if we're assuming a Gaussian error distribution.
  *
  * Note: it looks scary to subtract iavg+avg from sdata0, but we don't add
- * that to preds so we can still directly compare sdata and preds. I'll leave
- * it here since John may have had a reason or doing that.
+ * that to preds so we can still directly compare sdata and preds.
  */
- def evalfun(sdata0:Mat, user:Mat, ipass:Int, pos:Long):FMat = {
- val sdata = sdata0 - (iavg + avg);
- val preds = DDS(mm, user, sdata);
+ def evalfun(sdata:Mat, user:Mat, ipass:Int, pos:Long):FMat = {
+ val preds = DDS(mm, user, sdata) + (iavg + avg);
  if (ogmats != null) {
  ogmats(0) = user;
  if (ogmats.length > 1) {
@@ -255,11 +253,17 @@ class SMF(override val opts:SMF.Opts = new SMF.Options) extends FactorModel(opts
  }
  val dc = sdata.contents
  val pc = preds.contents
- val diff = dc - pc;
+ val diff = DMat(dc - pc);
  if (opts.matrixOfScores) {
  // TODO Temporary but should be OK for now (b/c we almost never increment MB).
  val sigma_sq = variance(diff).dv
- -(1.0f/(2*sigma_sq)).v * (diff ddot diff)
+
+ //println("evalfun, sdata.contents.length = " +dc.length)
+ //println("mean of squared diffs = " +(diff ddot diff)/diff.length)
+ //println("sigma_sq = " +sigma_sq)
+ //println("result = " +mean(-(1.0f/(2*sigma_sq)).v * FMat(diff *@ diff)))
+
+ -(1.0f/(2*sigma_sq)).v * FMat(diff *@ diff)
  } else {
  val vv = diff ddot diff;
  -sqrt(row(vv/sdata.nnz))
@@ -284,6 +288,7 @@ class SMF(override val opts:SMF.Opts = new SMF.Options) extends FactorModel(opts
  ogmats(1) = xpreds;
  }
  }
+ println("TESTING evalfun, spreds.nnz="+spreds.nnz+", xpreds.nnz="+xpreds.nnz)
  preds.contents <-- xpreds.contents;
  -sqrt(row(vv/sdata.nnz))
  }
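The `matrixOfScores` branch of `evalfun` can be read as an unnormalized Gaussian log-likelihood per observed rating, in line with the method's comment about a Gaussian error distribution. A sketch, assuming `*@` is BIDMat's element-wise product and `ddot` its dot product; $d_i$, $p_i$, and $s_i$ are notation introduced here for the entries of `dc`, `pc`, and the returned scores.

```latex
\log \mathcal{N}(d_i \mid p_i, \sigma^2)
  = -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(d_i - p_i)^2}{2\sigma^2},
\qquad
s_i = -\frac{(d_i - p_i)^2}{2\sigma^2},
\qquad
\sum_i s_i = -\frac{1}{2\sigma^2}\,(\mathrm{diff}\cdot\mathrm{diff})
```

The last identity relates the two versions of the return value: the `ddot` version yields the summed score, while the `*@` version yields one score per rating. The dropped $-\tfrac{1}{2}\log(2\pi\sigma^2)$ term cancels between two parameter settings only if both use the same $\sigma^2$; here `sigma_sq` is re-estimated from the residuals on each call, which may be part of what the TODO marks as temporary.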
3 changes: 2 additions & 1 deletion src/main/scala/BIDMach/updaters/MHTest.scala
@@ -79,6 +79,7 @@ class MHTest(override val opts:MHTest.Opts = new MHTest.Options) extends Updater
  * Note that the file for the norm2logdata should be in the correct directory.
  */
  override def init(model0:Model) = {
+ setseed(1)
  model = model0;
  modelmats = model.modelmats
  updatemats = model.updatemats
@@ -218,7 +219,7 @@ class MHTest(override val opts:MHTest.Opts = new MHTest.Options) extends Updater
  modelmats(i) <-- tmpTheta(i) // Now modelmats back to old theta.
  }
  }
- if (newMinibatch && accept) afterEachMinibatch()
+ if (newMinibatch) afterEachMinibatch()
  }