bgdorsal: added a td q-learning reference sim -- tracks nicely with the model.
rcoreilly committed Jan 16, 2025
1 parent 61a988a commit bd58c0c
Showing 13 changed files with 822 additions and 58 deletions.
2 changes: 1 addition & 1 deletion axon/layerparams.go
@@ -76,7 +76,7 @@ type LayerParams struct {
Type LayerTypes

// Index of this layer in [Layers] list.
Index uint32 `display:"-"`
Index uint32 `edit:"-"`

// MaxData is the maximum number of data parallel elements.
MaxData uint32 `display:"-"`
2 changes: 1 addition & 1 deletion axon/pathparams.go
@@ -118,7 +118,7 @@ type PathParams struct {
Type PathTypes

// Index is the index of the pathway in global path list: [Layer][SendPaths]
Index uint32
Index uint32 `edit:"-"`

pad, pad1 int32

11 changes: 10 additions & 1 deletion sims/bgdorsal/README.md
@@ -18,6 +18,8 @@ Once the dust settles, a summary of the biology and implementation in the model

# Results

Key overall point about performance: this is a very "self-organizing" kind of learning and thus susceptible to random effects. Large numbers of runs (now 50) must be used, and even then there is considerable variation in performance. For example, the original 2024-04-05 parameters can fail 4/25 times vs. 1/25 with the same random seed.
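As a sanity check on this run-count point, here is a standalone binomial sketch (the pooled 5/50 ≈ 0.1 failure rate is an illustrative assumption, not from the sim): if the true failure rate is 0.1, a 4/25 outcome is well within chance.

```go
package main

import (
	"fmt"
	"math"
)

// binomTail returns P(X >= k) for X ~ Binomial(n, p).
func binomTail(n, k int, p float64) float64 {
	cdf := 0.0
	c := 1.0 // running binomial coefficient C(n, i)
	for i := 0; i < k; i++ {
		cdf += c * math.Pow(p, float64(i)) * math.Pow(1-p, float64(n-i))
		c = c * float64(n-i) / float64(i+1)
	}
	return 1 - cdf
}

func main() {
	// With a true failure rate of 0.1, 25 runs show 4+ failures roughly a
	// quarter of the time, so 4/25 vs. 1/25 does not reliably distinguish
	// parameter sets on its own.
	fmt.Printf("P(>=4 fails in 25 runs | p=0.1) = %.3f\n", binomTail(25, 4, 0.1))
}
```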

As of 2024-02-29, the default parameters with 49 units (7x7) per layer result in:

* 22/25 learn on SeqLen=4, NActions=5, which has 5^4 = 625 total space to be searched
@@ -30,7 +32,8 @@ The learned weights to the BG clearly show that it is disinhibiting the appropri

# TODO:

* Set number of cycles per trial in terms of BG motor gating timing: constant offset from onset of VM gating timing, with a cutoff for "nothing happening" trials. This is likely especially important with new linear approx to SynCa learning, which has degraded learning at 300 cycles, which had been significantly better.
* Set number of cycles per trial in terms of BG motor gating timing: constant offset from onset of VM gating timing, with a cutoff for "nothing happening" trials.
    * Attempted an implementation, but it is difficult with CaBins -- tried shifting the bins, but that was awkward.

* "CL" not beneficial (implemented as direct MotorBS -> Matrix pathways): rel weight of 0.002 is OK but starts to actually impair above that. Likely that a functional cerebellum is needed to make this useful. Also, investigate other modulatory inputs to CL that might alter its signal. Key ref for diffs between CL and PF: LaceyBolamMagill07: C. J. Lacey, J. P. Bolam, P. J. Magill, Novel and distinct operational principles of intralaminar thalamic neurons and their striatal pathways. J. Neurosci. 27, 4374–4384 (2007).

@@ -40,6 +43,12 @@ The learned weights to the BG clearly show that it is disinhibiting the appropri

# Param search notes

## 01/14/2025: after fixes

* PF looks weaker in the new version vs. the old; unclear why. Fixed PF weights are not helpful (but not too bad either).

* Most runs go up in performance but then often come back down: added an asymmetric learning rate for rewpred to deal with this -- works well! `RewPredLRateUp` = 0.5 > 1. Went from 11 failures to 3 on 3x8.
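The asymmetric update described in this bullet can be sketched as follows (a minimal standalone Go sketch echoing the `MotorSeqEnv` field names; not the sim code itself):

```go
package main

import "fmt"

// updateRewPred applies the asymmetric reward-prediction update:
// positive RPEs are scaled by an extra lRateUp factor, so with
// lRateUp < 1 the prediction rises more slowly than it falls,
// counteracting the tendency to climb and then come back down.
func updateRewPred(rewPred, rew, lRate, lRateUp, rewPredMin float32) float32 {
	rpe := rew - rewPred
	if rpe > 0 {
		rewPred += lRateUp * lRate * rpe
	} else {
		rewPred += lRate * rpe
	}
	if rewPred < rewPredMin { // floor on the prediction
		rewPred = rewPredMin
	}
	return rewPred
}

func main() {
	fmt.Println(updateRewPred(0.5, 1, 0.01, 0.5, 0.1)) // slow rise on reward
	fmt.Println(updateRewPred(0.5, 0, 0.01, 0.5, 0.1)) // full-rate fall on no reward
}
```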

## 01/14/2025: Hebbian learning in MotorBS, with STN SKCa = 80 instead of 150, State layer size.

In general, performance since switching to the linear SynCa approximation has been significantly worse. Is it the Ca or something else that changed?
9 changes: 5 additions & 4 deletions sims/bgdorsal/bg-dorsal.go
@@ -221,15 +221,16 @@ func (ss *Sim) ConfigNet(net *axon.Network) {
ev := ss.Envs.ByModeDi(Train, 0).(*MotorSeqEnv)

np := 1
nu := ss.Config.Params.NUnits
nuPer := ev.NUnitsPer
nAct := ev.NActions
nSeq := ev.SeqLen
maxSeqAct := max(nAct, nSeq) // layer size

nuX := 6
nuY := 6
nuCtxY := 6
nuCtxX := 6
nuX := nu
nuY := nu
nuCtxY := nu
nuCtxX := nu
space := float32(2)

p1to1 := paths.NewPoolOneToOne()
15 changes: 4 additions & 11 deletions sims/bgdorsal/config.go
@@ -28,15 +28,8 @@ type EnvConfig struct {
// ParamConfig has config parameters related to sim params.
type ParamConfig struct {

// Tweak means to perform automated parameter tweaking for
// parameters marked Hypers Tweak = log,incr, or [vals].
Tweak bool

// Baseline for Tweak, if true, first run a baseline with current default params.
Baseline bool

// DryRun for Tweak, if true, only print what would be done, don't run.
DryRun bool
// NUnits is the number of units per X,Y dim, for cortex and BG.
NUnits int `default:"6"`

// Script is an interpreted script that is run to set parameters in Layer and Path
// sheets, by default using the "Script" set name.
@@ -100,13 +93,13 @@ type RunConfig struct {
Sequences int `default:"128"`

// Cycles is the total number of cycles per trial: at least 200.
Cycles int `default:"200"`
Cycles int `default:"300"`

// PlusCycles is the total number of plus-phase cycles per trial. For Cycles=300, use 100.
PlusCycles int `default:"50"`

// CaBinCycles is the number of cycles per CaBin: how fine-grained the synaptic Ca is.
CaBinCycles int `default:"25"`
CaBinCycles int `default:"10"`
}

// LogConfig has config parameters related to logging data.
18 changes: 13 additions & 5 deletions sims/bgdorsal/mseq-env.go
@@ -34,14 +34,17 @@ type MotorSeqEnv struct {
// number of distinct actions represented: determines the difficulty
// of learning in terms of the size of the space that must be searched.
// effective size = NActions ^ SeqLen
// 4 ^ 3 = 64 or 7 ^2 = 49 are reliably solved
// 4^4 = 256 or 10^3 = 1000 are reliably solved
NActions int

// learning rate for reward prediction
RewPredLRate float32
RewPredLRate float32 `default:"0.01"`

// additional learning rate factor for going up vs. down -- going up slower is better?
RewPredLRateUp float32 `default:"0.5"` // 0.5 > 0.8 > 0.2 > 1

// minimum rewpred value
RewPredMin float32
RewPredMin float32 `default:"0.1"`

// give reward with probability in proportion to number of
// correct actions in sequence, above given threshold. If 0, don't use
Expand Down Expand Up @@ -104,7 +107,8 @@ func (ev *MotorSeqEnv) Defaults() {
ev.PartialCreditAt = 1 // 1 default: critical for seq len = 3
ev.PartialGraded = true // key for seq 3
ev.RewPredLRate = 0.01 // GPU 16 0.01 > 0.02 >> 0.05 > 0.1, 0.2 for partial, seq3
ev.RewPredMin = 0.1 // 0.1 > 0.05 > 0.2
ev.RewPredLRateUp = 1
ev.RewPredMin = 0.1 // 0.1 > 0.05 > 0.2
ev.NUnitsPer = 5
ev.NUnits = ev.NUnitsPer * ev.NActions
}
@@ -246,7 +250,11 @@ func (ev *MotorSeqEnv) ComputeReward() {
}
}
ev.RPE = ev.Rew - ev.RewPred
ev.RewPred += ev.RewPredLRate * (ev.Rew - ev.RewPred)
if ev.RPE > 0 {
ev.RewPred += ev.RewPredLRateUp * ev.RewPredLRate * ev.RPE
} else {
ev.RewPred += ev.RewPredLRate * ev.RPE
}
if ev.RewPred < ev.RewPredMin {
ev.RewPred = ev.RewPredMin
}
54 changes: 29 additions & 25 deletions sims/bgdorsal/params.go
@@ -15,14 +15,14 @@ var LayerParams = axon.LayerSheets{
ly.Acts.Noise.On.SetBool(true)
ly.Acts.Noise.Ge = 0.0001 // 0.0001 > others; could just be noise ;)
ly.Acts.Noise.Gi = 0.0001 // 0.0001 perhaps better than others
ly.Learn.RLRate.SigmoidLinear.SetBool(false) // orig = true
ly.Learn.RLRate.SigmoidLinear.SetBool(false) // false >> true; orig = true
}},
{Sel: ".PFCLayer", Doc: "pfc",
Set: func(ly *axon.LayerParams) {
ly.Learn.NeuroMod.DAMod = axon.NoDAMod // D1Mod
ly.Learn.NeuroMod.DAModGain = 0.005 // 0.005 > higher
ly.Learn.NeuroMod.DipGain = 0 // 0 > higher
ly.Learn.RLRate.SigmoidLinear.SetBool(false)
ly.Learn.NeuroMod.DAMod = axon.NoDAMod // NoDAMod > D1Mod
ly.Learn.NeuroMod.DAModGain = 0.005 // 0.005 > higher
ly.Learn.NeuroMod.DipGain = 0 // 0 > higher
ly.Learn.RLRate.SigmoidLinear.SetBool(false) // false >> true; orig = true
}},
{Sel: ".MatrixLayer", Doc: "all mtx",
Set: func(ly *axon.LayerParams) {
@@ -37,7 +37,7 @@
ly.Acts.Init.GeBase = 0.1
ly.Acts.Kir.Gbar = 10 // 10 > 5 > 2 -- key for pause
ly.Acts.SKCa.Gbar = 2 // 2 > 5 >> 1 (for Kir = 10)
ly.Acts.SKCa.CaRDecayTau = 150 // was 80 -- key diff!
ly.Acts.SKCa.CaRDecayTau = 150 // 150 > 180 > 200 > 130 >> 80 def -- key param!
ly.Inhib.Layer.On.SetBool(true) // actually needs this
ly.Inhib.Layer.Gi = 0.5
ly.Learn.NeuroMod.AChDisInhib = 0
@@ -59,37 +59,30 @@
Set: func(ly *axon.LayerParams) {
ly.Inhib.Layer.Gi = 0.8 // 0.8 def
ly.CT.GeGain = 0.05 // 0.05 def
ly.CT.DecayTau = 50 // was 100 -- 50 in orig -- OFCposPT ??
ly.CT.DecayTau = 100 // was 100 -- 50 in orig -- OFCposPT ??
}},
{Sel: ".CTLayer", Doc: "",
Set: func(ly *axon.LayerParams) {
ly.Inhib.Layer.Gi = 1.4 // 0.8 def
ly.CT.GeGain = 5 // 2 def
ly.CT.DecayTau = 50 // was 100 -- 50 in orig -- OFCposPT ??
ly.CT.DecayTau = 100 // was 100 -- 50 in orig -- OFCposPT ??
}},
{Sel: "#MotorBS", Doc: "",
Set: func(ly *axon.LayerParams) {
ly.Learn.NeuroMod.DAMod = axon.NoDAMod // D1Mod not beneficial here
ly.Learn.NeuroMod.DAModGain = 0.01 // up to 0.04 good
ly.Learn.NeuroMod.DipGain = 0.1 // 0.1 > 0 > 0.2
ly.Inhib.Layer.On.SetBool(true)
ly.Inhib.Pool.On.SetBool(false)
ly.Inhib.Layer.Gi = 0.2 // 0.2 def
ly.Acts.Clamp.Ge = 2 // 2 > 1.5, >> 1 -- absolutely critical given GPi inhib
// ly.Learn.RLRate.Diff.SetBool(false) // true > false
// ly.Learn.RLRate.SigmoidLinear.SetBool(false) // false >> true; orig = true
// ly.Learn.RLRate.SigmoidMin = 0.05 // 0.05 def > 0.1 > 0.2 > 0.02
}},
// {Sel: "#M1", Doc: "",
// Set: func(ly *axon.LayerParams) {
// ly.Learn.NeuroMod.DAMod = axon.D1Mod // not good here.
// ly.Learn.NeuroMod.DAModGain = 0.03 // up to 0.04 good
// ly.Learn.NeuroMod.DipGain = 0.1 // 0.1 > 0 > 0.2
// }},
// {Sel: "#VL", Doc: "",
// Set: func(ly *axon.LayerParams) {
// // not obviously beneficial here
// // ly.Learn.NeuroMod.DAMod = axon.D1Mod
// // ly.Learn.NeuroMod.DAModGain = 0.02
// // ly.Learn.NeuroMod.DipGain = 0 // 0 > higher
// }},
{Sel: "#DGPeAk", Doc: "arkypallidal",
Set: func(ly *axon.LayerParams) {
ly.Acts.Init.GeBase = 0.2 // 0.2 > 0.3, 0.1
@@ -110,8 +103,8 @@
"Base": {
{Sel: "Path", Doc: "",
Set: func(pt *axon.PathParams) {
pt.Learn.LRate.Base = 0.04 // 0.04 def -- works best
pt.Learn.DWt.CaPScale = 0.95 // normal default for most cases
pt.Learn.LRate.Base = 0.04 // 0.04 > 0.03
pt.Learn.DWt.CaPScale = 0.95 // 0.95 > 1 in cur
pt.Learn.DWt.Tau = 1 // 1 > 2
}},
{Sel: ".CTtoPred", Doc: "",
@@ -144,18 +137,24 @@
Set: func(pt *axon.PathParams) {
pt.PathScale.Abs = 3.0 // 3
}},
// {Sel: ".FmState", Doc: "",
// Set: func(pt *axon.PathParams) {
// pt.PathScale.Rel = 0.5 // abs, rel < 1 worse
// }},
{Sel: ".ToM1", Doc: "",
Set: func(pt *axon.PathParams) {
pt.PathScale.Abs = 1.5 // now 1.5 > 2 > 1 ..
pt.Learn.LRate.Base = 0.04 // 0.04 > 0.02
}},
{Sel: ".ToMotor", Doc: "all paths to MotorBS and VL",
Set: func(pt *axon.PathParams) {
pt.Learn.LRate.Base = 0.02 // 0.02 > 0.04 > 0.01
pt.Learn.LRate.Base = 0.02 // 0.02 > 0.04 > 0.01 -- still key
// note: MotorBS is a target, key for learning; SWts not used.
}},
{Sel: ".VLM1", Doc: "",
Set: func(pt *axon.PathParams) {
pt.Learn.LRate.Base = 0.02 // 0.02 > 0.04 > 0.01
pt.Learn.LRate.Base = 0.02 // 0.02 > 0.04 > 0.01 -- still key
// note: VL is a target layer; SWts not used.
}},
{Sel: "#StateToM1", Doc: "",
Set: func(pt *axon.PathParams) {
@@ -165,6 +164,11 @@
Set: func(pt *axon.PathParams) {
pt.PathScale.Abs = 1 // 1 > 1.1 > 0.9 >> 0.5
pt.Learn.LRate.Base = 0.04 // 0.04 > 0.02
// fixed is not better:
// pt.Learn.Learn.SetBool(false)
// pt.SWts.Init.SPct = 0
// pt.SWts.Init.Mean = 0.8
// pt.SWts.Init.Var = 0.0
}},
{Sel: "#DGPiToM1VM", Doc: "final inhibition",
Set: func(pt *axon.PathParams) {
@@ -196,11 +200,11 @@
pt.PathScale.Abs = 1 // 1
pt.PathScale.Rel = 0.1 // 0.1 > 0.2, .05, 0
}},
{Sel: "#M1PTToM1PT", Doc: "",
{Sel: "#M1PTToM1PT", Doc: "self path",
Set: func(pt *axon.PathParams) {
pt.Learn.LRate.Base = 0.0001 // 0.0001 in orig
pt.Learn.LRate.Base = 0.0001 // 0.0001 > .04 but not a major diff
}},
// {Sel: "#M1PTpToMotorBS", Doc: "",
// {Sel: "#M1PTpToMotorBS", Doc: "not used",
// Set: func(pt *axon.PathParams) {
// pt.PathScale.Abs = 2
// pt.PathScale.Rel = 1
37 changes: 37 additions & 0 deletions sims/bgdorsal/td/README.md
@@ -0,0 +1,37 @@
# TD

TD provides a simple TD Q-learning solution to the motor sequence learning problem, to get a sense of its overall learning complexity.

It is a very simple problem, so it is expected to be solved easily; the question is how fast it is learned, given the amount of state-space exploration required.

The trial / epoch setup is the same as the bgdorsal sim, so epochs are comparable units (128 trials per epoch).
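The setup can be sketched in tabular form (a standalone illustrative sketch, not the actual td code: the target sequence, keying the Q-table by action prefix, and applying the decay schedules per epoch are all assumptions chosen to echo the flags used below):

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	seqLen   = 3
	nActions = 4
)

func argmax(v []float64) int {
	best := 0
	for i, x := range v {
		if x > v[best] {
			best = i
		}
	}
	return best
}

func maxOf(v []float64) float64 {
	m := v[0]
	for _, x := range v[1:] {
		if x > m {
			m = x
		}
	}
	return m
}

// runTD does epsilon-greedy tabular Q-learning on the sequence task:
// reward 1 only if the whole action sequence matches the target, so the
// Q-table is keyed by the action prefix so far. It returns the first
// epoch at which >= 100/128 trials were correct, or -1 if never.
func runTD(seed int64, maxEpochs int) int {
	rng := rand.New(rand.NewSource(seed))
	target := []int{2, 0, 3} // arbitrary target sequence
	q := map[int][]float64{} // prefix code -> action values
	lRate, lRateDecay := 0.5, 0.0001
	eps, epsDecay, epsMin := 1.0, 0.2, 0.01
	for epoch := 0; epoch < maxEpochs; epoch++ {
		correct := 0
		for trial := 0; trial < 128; trial++ {
			states := make([]int, seqLen)
			acts := make([]int, seqLen)
			code := 1 // prefix code: leading 1, then base-nActions digits
			for s := 0; s < seqLen; s++ {
				if q[code] == nil {
					q[code] = make([]float64, nActions)
				}
				states[s] = code
				if rng.Float64() < eps { // epsilon-greedy exploration
					acts[s] = rng.Intn(nActions)
				} else {
					acts[s] = argmax(q[code])
				}
				code = code*nActions + acts[s]
			}
			g := 1.0 // terminal reward: all-or-nothing sequence match
			for s := range acts {
				if acts[s] != target[s] {
					g = 0
					break
				}
			}
			if g > 0 {
				correct++
			}
			// back up the terminal reward through the visited prefixes
			for s := seqLen - 1; s >= 0; s-- {
				qs := q[states[s]]
				qs[acts[s]] += lRate * (g - qs[acts[s]])
				g = maxOf(qs) // bootstrap target for the previous step
			}
		}
		lRate /= 1 + lRateDecay
		if eps > epsMin {
			eps /= 1 + epsDecay
		}
		if correct >= 100 {
			return epoch
		}
	}
	return -1
}

func main() {
	fmt.Println("solved at epoch:", runTD(1, 1000))
}
```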

# Results

## Seq4x6

* Epsilon Min = 0.01 seems pretty optimal -- any higher and it fails to converge
* Epsilon Decay = 0.2 or 0.5 work well -- helps converge more quickly.
* LRate Decay = 0.0001 or 0.00001 seems best: an overall variance vs. speed tradeoff

```
./td -runs 50 -epochs 1000 -env-seq-len 4 -env-n-actions 6 -td-l-rate-decay 0.0001 -td-epsilon-decay 0.5 -td-epsilon-min 0.01
```

The fastest solutions here take about 10 epochs, but there are also runs that take 400+ epochs.

## Seq3x10

Similar parameter issues arise; in general, the same performance is obtained with less variability overall:

```
./td -runs 50 -epochs 1000 -env-seq-len 3 -env-n-actions 10 -td-l-rate-decay 0.0001 -td-epsilon-decay 0.5 -td-epsilon-min 0.01
```

## Exploring larger spaces

* 4x10 is more difficult for sure: have to reduce epsilon decay to 0.1

In general, learning time seems sensibly related to the size of the space being searched.

