bgdorsal: added a td q-learning reference sim -- tracks nicely with the model.
rcoreilly committed Jan 16, 2025
1 parent 61a988a commit bd58c0c
Showing 13 changed files with 822 additions and 58 deletions.
2 changes: 1 addition & 1 deletion axon/layerparams.go
@@ -76,7 +76,7 @@ type LayerParams struct {
Type LayerTypes

// Index of this layer in [Layers] list.
Index uint32 `display:"-"`
Index uint32 `edit:"-"`

// MaxData is the maximum number of data parallel elements.
MaxData uint32 `display:"-"`
2 changes: 1 addition & 1 deletion axon/pathparams.go
@@ -118,7 +118,7 @@ type PathParams struct {
Type PathTypes

// Index is the index of the pathway in global path list: [Layer][SendPaths]
Index uint32
Index uint32 `edit:"-"`

pad, pad1 int32

11 changes: 10 additions & 1 deletion sims/bgdorsal/README.md
@@ -18,6 +18,8 @@ Once the dust settles, a summary of the biology and implementation in the model

# Results

Key overall point about performance: this is a very "self-organizing" kind of learning and thus susceptible to random effects. Large numbers of runs (now 50) must be used, and even then there is considerable variation in performance. For example, the original 2024-04-05 parameters can fail 4/25 times vs. 1/25 with the same random seed.
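As a sanity check on this run-count point, here is a standalone binomial sketch (the pooled 5/50 ≈ 0.1 failure rate is an illustrative assumption, not from the sim): if the true failure rate is 0.1, a 4/25 outcome is well within chance.

```go
package main

import (
	"fmt"
	"math"
)

// binomTail returns P(X >= k) for X ~ Binomial(n, p).
func binomTail(n, k int, p float64) float64 {
	cdf := 0.0
	c := 1.0 // running binomial coefficient C(n, i)
	for i := 0; i < k; i++ {
		cdf += c * math.Pow(p, float64(i)) * math.Pow(1-p, float64(n-i))
		c = c * float64(n-i) / float64(i+1)
	}
	return 1 - cdf
}

func main() {
	// With a true failure rate of 0.1, 25 runs show 4+ failures roughly a
	// quarter of the time, so 4/25 vs. 1/25 does not reliably distinguish
	// parameter sets on its own.
	fmt.Printf("P(>=4 fails in 25 runs | p=0.1) = %.3f\n", binomTail(25, 4, 0.1))
}
```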

As of 2024-02-29, the default parameters with 49 units (7x7) per layer result in:

* 22/25 learn on SeqLen=4, NActions=5, which has 5^4 = 625 total space to be searched
@@ -30,7 +32,8 @@ The learned weights to the BG clearly show that it is disinhibiting the appropri

# TODO:

* Set number of cycles per trial in terms of BG motor gating timing: constant offset from onset of VM gating timing, with a cutoff for "nothing happening" trials. This is likely especially important with new linear approx to SynCa learning, which has degraded learning at 300 cycles, which had been significantly better.
* Set number of cycles per trial in terms of BG motor gating timing: constant offset from onset of VM gating timing, with a cutoff for "nothing happening" trials.
    * Attempted an implementation, but it is difficult with CaBins -- tried shifting the bins, but that was awkward.

* "CL" not beneficial (implemented as direct MotorBS -> Matrix pathways): rel weight of 0.002 is OK but starts to actually impair above that. Likely that a functional cerebellum is needed to make this useful. Also, investigate other modulatory inputs to CL that might alter its signal. Key ref for diffs between CL and PF: LaceyBolamMagill07: C. J. Lacey, J. P. Bolam, P. J. Magill, Novel and distinct operational principles of intralaminar thalamic neurons and their striatal pathways. J. Neurosci. 27, 4374–4384 (2007).

@@ -40,6 +43,12 @@ The learned weights to the BG clearly show that it is disinhibiting the appropri

# Param search notes

## 01/14/2025: after fixes

* PF looks weaker in the new version vs. the old; unclear why. Fixed PF weights are not helpful (but not too bad either).

* Most runs go up in performance but then often come back down: added an asymmetric learning rate for rewpred to deal with this -- works well! `RewPredLRateUp` = 0.5 > 1. Went from 11 failures to 3 on 3x8.
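The asymmetric update described in this bullet can be sketched as follows (a minimal standalone Go sketch echoing the `MotorSeqEnv` field names; not the sim code itself):

```go
package main

import "fmt"

// updateRewPred applies the asymmetric reward-prediction update:
// positive RPEs are scaled by an extra lRateUp factor, so with
// lRateUp < 1 the prediction rises more slowly than it falls,
// counteracting the tendency to climb and then come back down.
func updateRewPred(rewPred, rew, lRate, lRateUp, rewPredMin float32) float32 {
	rpe := rew - rewPred
	if rpe > 0 {
		rewPred += lRateUp * lRate * rpe
	} else {
		rewPred += lRate * rpe
	}
	if rewPred < rewPredMin { // floor on the prediction
		rewPred = rewPredMin
	}
	return rewPred
}

func main() {
	fmt.Println(updateRewPred(0.5, 1, 0.01, 0.5, 0.1)) // slow rise on reward
	fmt.Println(updateRewPred(0.5, 0, 0.01, 0.5, 0.1)) // full-rate fall on no reward
}
```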

## 01/14/2025: Hebbian learning in MotorBS, with STN SKCa = 80 instead of 150, State layer size.

In general, performance since switching to the linear SynCa approximation has been significantly worse. Is it the Ca or something else that changed?
9 changes: 5 additions & 4 deletions sims/bgdorsal/bg-dorsal.go
@@ -221,15 +221,16 @@ func (ss *Sim) ConfigNet(net *axon.Network) {
ev := ss.Envs.ByModeDi(Train, 0).(*MotorSeqEnv)

np := 1
nu := ss.Config.Params.NUnits
nuPer := ev.NUnitsPer
nAct := ev.NActions
nSeq := ev.SeqLen
maxSeqAct := max(nAct, nSeq) // layer size

nuX := 6
nuY := 6
nuCtxY := 6
nuCtxX := 6
nuX := nu
nuY := nu
nuCtxY := nu
nuCtxX := nu
space := float32(2)

p1to1 := paths.NewPoolOneToOne()
15 changes: 4 additions & 11 deletions sims/bgdorsal/config.go
@@ -28,15 +28,8 @@ type EnvConfig struct {
// ParamConfig has config parameters related to sim params.
type ParamConfig struct {

// Tweak means to perform automated parameter tweaking for
// parameters marked Hypers Tweak = log,incr, or [vals].
Tweak bool

// Baseline for Tweak, if true, first run a baseline with current default params.
Baseline bool

// DryRun for Tweak, if true, only print what would be done, don't run.
DryRun bool
// NUnits is the number of units per X,Y dim, for cortex and BG.
NUnits int `default:"6"`

// Script is an interpreted script that is run to set parameters in Layer and Path
// sheets, by default using the "Script" set name.
@@ -100,13 +93,13 @@ type RunConfig struct {
Sequences int `default:"128"`

// Cycles is the total number of cycles per trial: at least 200.
Cycles int `default:"200"`
Cycles int `default:"300"`

// PlusCycles is the total number of plus-phase cycles per trial. For Cycles=300, use 100.
PlusCycles int `default:"50"`

// CaBinCycles is the number of cycles per CaBin: how fine-grained the synaptic Ca is.
CaBinCycles int `default:"25"`
CaBinCycles int `default:"10"`
}

// LogConfig has config parameters related to logging data.
18 changes: 13 additions & 5 deletions sims/bgdorsal/mseq-env.go
@@ -34,14 +34,17 @@ type MotorSeqEnv struct {
// number of distinct actions represented: determines the difficulty
// of learning in terms of the size of the space that must be searched.
// effective size = NActions ^ SeqLen
// 4 ^ 3 = 64 or 7 ^2 = 49 are reliably solved
// 4^4 = 256 or 10^3 = 1000 are reliably solved
NActions int

// learning rate for reward prediction
RewPredLRate float32
RewPredLRate float32 `default:"0.01"`

// additional learning rate factor for going up vs. down -- going up slower is better?
RewPredLRateUp float32 `default:"0.5"` // 0.5 > 0.8 > 0.2 > 1

// minimum rewpred value
RewPredMin float32
RewPredMin float32 `default:"0.1"`

// give reward with probability in proportion to number of
// correct actions in sequence, above given threshold. If 0, don't use
Expand Down Expand Up @@ -104,7 +107,8 @@ func (ev *MotorSeqEnv) Defaults() {
ev.PartialCreditAt = 1 // 1 default: critical for seq len = 3
ev.PartialGraded = true // key for seq 3
ev.RewPredLRate = 0.01 // GPU 16 0.01 > 0.02 >> 0.05 > 0.1, 0.2 for partial, seq3
ev.RewPredMin = 0.1 // 0.1 > 0.05 > 0.2
ev.RewPredLRateUp = 1
ev.RewPredMin = 0.1 // 0.1 > 0.05 > 0.2
ev.NUnitsPer = 5
ev.NUnits = ev.NUnitsPer * ev.NActions
}
@@ -246,7 +250,11 @@ func (ev *MotorSeqEnv) ComputeReward() {
}
}
ev.RPE = ev.Rew - ev.RewPred
ev.RewPred += ev.RewPredLRate * (ev.Rew - ev.RewPred)
if ev.RPE > 0 {
ev.RewPred += ev.RewPredLRateUp * ev.RewPredLRate * ev.RPE
} else {
ev.RewPred += ev.RewPredLRate * ev.RPE
}
if ev.RewPred < ev.RewPredMin {
ev.RewPred = ev.RewPredMin
}
54 changes: 29 additions & 25 deletions sims/bgdorsal/params.go
@@ -15,14 +15,14 @@ var LayerParams = axon.LayerSheets{
ly.Acts.Noise.On.SetBool(true)
ly.Acts.Noise.Ge = 0.0001 // 0.0001 > others; could just be noise ;)
ly.Acts.Noise.Gi = 0.0001 // 0.0001 perhaps better than others
ly.Learn.RLRate.SigmoidLinear.SetBool(false) // orig = true
ly.Learn.RLRate.SigmoidLinear.SetBool(false) // false >> true; orig = true
}},
{Sel: ".PFCLayer", Doc: "pfc",
Set: func(ly *axon.LayerParams) {
ly.Learn.NeuroMod.DAMod = axon.NoDAMod // D1Mod
ly.Learn.NeuroMod.DAModGain = 0.005 // 0.005 > higher
ly.Learn.NeuroMod.DipGain = 0 // 0 > higher
ly.Learn.RLRate.SigmoidLinear.SetBool(false)
ly.Learn.NeuroMod.DAMod = axon.NoDAMod // NoDAMod > D1Mod
ly.Learn.NeuroMod.DAModGain = 0.005 // 0.005 > higher
ly.Learn.NeuroMod.DipGain = 0 // 0 > higher
ly.Learn.RLRate.SigmoidLinear.SetBool(false) // false >> true; orig = true
}},
{Sel: ".MatrixLayer", Doc: "all mtx",
Set: func(ly *axon.LayerParams) {
@@ -37,7 +37,7 @@
ly.Acts.Init.GeBase = 0.1
ly.Acts.Kir.Gbar = 10 // 10 > 5 > 2 -- key for pause
ly.Acts.SKCa.Gbar = 2 // 2 > 5 >> 1 (for Kir = 10)
ly.Acts.SKCa.CaRDecayTau = 150 // was 80 -- key diff!
ly.Acts.SKCa.CaRDecayTau = 150 // 150 > 180 > 200 > 130 >> 80 def -- key param!
ly.Inhib.Layer.On.SetBool(true) // actually needs this
ly.Inhib.Layer.Gi = 0.5
ly.Learn.NeuroMod.AChDisInhib = 0
@@ -59,37 +59,30 @@
Set: func(ly *axon.LayerParams) {
ly.Inhib.Layer.Gi = 0.8 // 0.8 def
ly.CT.GeGain = 0.05 // 0.05 def
ly.CT.DecayTau = 50 // was 100 -- 50 in orig -- OFCposPT ??
ly.CT.DecayTau = 100 // was 100 -- 50 in orig -- OFCposPT ??
}},
{Sel: ".CTLayer", Doc: "",
Set: func(ly *axon.LayerParams) {
ly.Inhib.Layer.Gi = 1.4 // 0.8 def
ly.CT.GeGain = 5 // 2 def
ly.CT.DecayTau = 50 // was 100 -- 50 in orig -- OFCposPT ??
ly.CT.DecayTau = 100 // was 100 -- 50 in orig -- OFCposPT ??
}},
{Sel: "#MotorBS", Doc: "",
Set: func(ly *axon.LayerParams) {
ly.Learn.NeuroMod.DAMod = axon.NoDAMod // D1Mod not beneficial here
ly.Learn.NeuroMod.DAModGain = 0.01 // up to 0.04 good
ly.Learn.NeuroMod.DipGain = 0.1 // 0.1 > 0 > 0.2
ly.Inhib.Layer.On.SetBool(true)
ly.Inhib.Pool.On.SetBool(false)
ly.Inhib.Layer.Gi = 0.2 // 0.2 def
ly.Acts.Clamp.Ge = 2 // 2 > 1.5, >> 1 -- absolutely critical given GPi inhib
// ly.Learn.RLRate.Diff.SetBool(false) // true > false
// ly.Learn.RLRate.SigmoidLinear.SetBool(false) // false >> true; orig = true
// ly.Learn.RLRate.SigmoidMin = 0.05 // 0.05 def > 0.1 > 0.2 > 0.02
}},
// {Sel: "#M1", Doc: "",
// Set: func(ly *axon.LayerParams) {
// ly.Learn.NeuroMod.DAMod = axon.D1Mod // not good here.
// ly.Learn.NeuroMod.DAModGain = 0.03 // up to 0.04 good
// ly.Learn.NeuroMod.DipGain = 0.1 // 0.1 > 0 > 0.2
// }},
// {Sel: "#VL", Doc: "",
// Set: func(ly *axon.LayerParams) {
// // not obviously beneficial here
// // ly.Learn.NeuroMod.DAMod = axon.D1Mod
// // ly.Learn.NeuroMod.DAModGain = 0.02
// // ly.Learn.NeuroMod.DipGain = 0 // 0 > higher
// }},
{Sel: "#DGPeAk", Doc: "arkypallidal",
Set: func(ly *axon.LayerParams) {
ly.Acts.Init.GeBase = 0.2 // 0.2 > 0.3, 0.1
@@ -110,8 +103,8 @@
"Base": {
{Sel: "Path", Doc: "",
Set: func(pt *axon.PathParams) {
pt.Learn.LRate.Base = 0.04 // 0.04 def -- works best
pt.Learn.DWt.CaPScale = 0.95 // normal default for most cases
pt.Learn.LRate.Base = 0.04 // 0.04 > 0.03
pt.Learn.DWt.CaPScale = 0.95 // 0.95 > 1 in cur
pt.Learn.DWt.Tau = 1 // 1 > 2
}},
{Sel: ".CTtoPred", Doc: "",
@@ -144,18 +137,24 @@
Set: func(pt *axon.PathParams) {
pt.PathScale.Abs = 3.0 // 3
}},
// {Sel: ".FmState", Doc: "",
// Set: func(pt *axon.PathParams) {
// pt.PathScale.Rel = 0.5 // abs, rel < 1 worse
// }},
{Sel: ".ToM1", Doc: "",
Set: func(pt *axon.PathParams) {
pt.PathScale.Abs = 1.5 // now 1.5 > 2 > 1 ..
pt.Learn.LRate.Base = 0.04 // 0.04 > 0.02
}},
{Sel: ".ToMotor", Doc: "all paths to MotorBS and VL",
Set: func(pt *axon.PathParams) {
pt.Learn.LRate.Base = 0.02 // 0.02 > 0.04 > 0.01
pt.Learn.LRate.Base = 0.02 // 0.02 > 0.04 > 0.01 -- still key
// note: MotorBS is a target, key for learning; SWts not used.
}},
{Sel: ".VLM1", Doc: "",
Set: func(pt *axon.PathParams) {
pt.Learn.LRate.Base = 0.02 // 0.02 > 0.04 > 0.01
pt.Learn.LRate.Base = 0.02 // 0.02 > 0.04 > 0.01 -- still key
// note: VL is a target layer; SWts not used.
}},
{Sel: "#StateToM1", Doc: "",
Set: func(pt *axon.PathParams) {
@@ -165,6 +164,11 @@
Set: func(pt *axon.PathParams) {
pt.PathScale.Abs = 1 // 1 > 1.1 > 0.9 >> 0.5
pt.Learn.LRate.Base = 0.04 // 0.04 > 0.02
// fixed is not better:
// pt.Learn.Learn.SetBool(false)
// pt.SWts.Init.SPct = 0
// pt.SWts.Init.Mean = 0.8
// pt.SWts.Init.Var = 0.0
}},
{Sel: "#DGPiToM1VM", Doc: "final inhibition",
Set: func(pt *axon.PathParams) {
@@ -196,11 +200,11 @@
pt.PathScale.Abs = 1 // 1
pt.PathScale.Rel = 0.1 // 0.1 > 0.2, .05, 0
}},
{Sel: "#M1PTToM1PT", Doc: "",
{Sel: "#M1PTToM1PT", Doc: "self path",
Set: func(pt *axon.PathParams) {
pt.Learn.LRate.Base = 0.0001 // 0.0001 in orig
pt.Learn.LRate.Base = 0.0001 // 0.0001 > .04 but not a major diff
}},
// {Sel: "#M1PTpToMotorBS", Doc: "",
// {Sel: "#M1PTpToMotorBS", Doc: "not used",
// Set: func(pt *axon.PathParams) {
// pt.PathScale.Abs = 2
// pt.PathScale.Rel = 1
37 changes: 37 additions & 0 deletions sims/bgdorsal/td/README.md
@@ -0,0 +1,37 @@
# TD

TD provides a simple TD Q-learning solution to the motor sequence learning problem, to get a sense of its overall learning complexity.

It is a very simple problem, so it is expected to be solved easily; the question is how fast it is learned, given the amount of state-space exploration required.

The trial / epoch setup is the same as the bgdorsal sim, so epochs are comparable units (128 trials per epoch).
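The setup can be sketched in tabular form (a standalone illustrative sketch, not the actual td code: the target sequence, keying the Q-table by action prefix, and applying the decay schedules per epoch are all assumptions chosen to echo the flags used below):

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	seqLen   = 3
	nActions = 4
)

func argmax(v []float64) int {
	best := 0
	for i, x := range v {
		if x > v[best] {
			best = i
		}
	}
	return best
}

func maxOf(v []float64) float64 {
	m := v[0]
	for _, x := range v[1:] {
		if x > m {
			m = x
		}
	}
	return m
}

// runTD does epsilon-greedy tabular Q-learning on the sequence task:
// reward 1 only if the whole action sequence matches the target, so the
// Q-table is keyed by the action prefix so far. It returns the first
// epoch at which >= 100/128 trials were correct, or -1 if never.
func runTD(seed int64, maxEpochs int) int {
	rng := rand.New(rand.NewSource(seed))
	target := []int{2, 0, 3} // arbitrary target sequence
	q := map[int][]float64{} // prefix code -> action values
	lRate, lRateDecay := 0.5, 0.0001
	eps, epsDecay, epsMin := 1.0, 0.2, 0.01
	for epoch := 0; epoch < maxEpochs; epoch++ {
		correct := 0
		for trial := 0; trial < 128; trial++ {
			states := make([]int, seqLen)
			acts := make([]int, seqLen)
			code := 1 // prefix code: leading 1, then base-nActions digits
			for s := 0; s < seqLen; s++ {
				if q[code] == nil {
					q[code] = make([]float64, nActions)
				}
				states[s] = code
				if rng.Float64() < eps { // epsilon-greedy exploration
					acts[s] = rng.Intn(nActions)
				} else {
					acts[s] = argmax(q[code])
				}
				code = code*nActions + acts[s]
			}
			g := 1.0 // terminal reward: all-or-nothing sequence match
			for s := range acts {
				if acts[s] != target[s] {
					g = 0
					break
				}
			}
			if g > 0 {
				correct++
			}
			// back up the terminal reward through the visited prefixes
			for s := seqLen - 1; s >= 0; s-- {
				qs := q[states[s]]
				qs[acts[s]] += lRate * (g - qs[acts[s]])
				g = maxOf(qs) // bootstrap target for the previous step
			}
		}
		lRate /= 1 + lRateDecay
		if eps > epsMin {
			eps /= 1 + epsDecay
		}
		if correct >= 100 {
			return epoch
		}
	}
	return -1
}

func main() {
	fmt.Println("solved at epoch:", runTD(1, 1000))
}
```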

# Results

## Seq4x6

* Epsilon Min = 0.01 seems pretty optimal -- any higher and it fails to converge
* Epsilon Decay = 0.2 or 0.5 work well -- helps converge more quickly.
* LRate Decay = 0.0001 or 0.00001 seems best: an overall variance vs. speed tradeoff

```
./td -runs 50 -epochs 1000 -env-seq-len 4 -env-n-actions 6 -td-l-rate-decay 0.0001 -td-epsilon-decay 0.5 -td-epsilon-min 0.01
```

The fastest solutions here take about 10 epochs, but there are also runs that take 400+ epochs.

## Seq3x10

Similar parameter issues arise; in general, the same performance is obtained with less variability overall:

```
./td -runs 50 -epochs 1000 -env-seq-len 3 -env-n-actions 10 -td-l-rate-decay 0.0001 -td-epsilon-decay 0.5 -td-epsilon-min 0.01
```

## Exploring larger spaces

* 4x10 is more difficult for sure: have to reduce epsilon decay to 0.1

In general, learning time seems sensibly related to the size of the space being searched.

