# Notes 📔

## Observations

### Activation functions

- Sigmoid generally performs well across all of the networks tried here
- TanH works well with simpler networks, but struggles with complexity
  - On the MNIST data it caps out at a 50% moving-average success rate for some reason
  - RNN performance is also pretty bad
  - Maybe better for the CNN
- ReLU
  - Maybe better for the CNN

(These are probably issues with this implementation, since it was built around sigmoid and the other activations were forced in afterwards; see the sketch below.)
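The notes above compare the activations only qualitatively. A minimal sketch of the three functions and their derivatives (plain NumPy, standard definitions, not code from this repo) makes the difference concrete: sigmoid and tanh both saturate, which may be related to the MNIST plateau, while ReLU does not saturate for positive inputs.

```python
import numpy as np

# Sketch only: textbook definitions, not this repo's implementation.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # max gradient 0.25 at x = 0, saturates for large |x|

def tanh_prime(x):
    return 1.0 - np.tanh(x) ** 2    # max gradient 1.0 at x = 0, also saturates

def relu(x):
    return np.maximum(0.0, x)

def relu_prime(x):
    return (x > 0).astype(x.dtype)  # gradient is 0 or 1, no saturation for x > 0

if __name__ == "__main__":
    xs = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
    print("sigmoid'", sigmoid_prime(xs))
    print("tanh'   ", tanh_prime(xs))
    print("relu'   ", relu_prime(xs))
```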

### Learning rate

- Generally 0.1 is a good value
- The recurrent net performs better with learning rates of 0.05-0.08 (see the SGD sketch below)
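For reference, this is where the learning rate enters a plain SGD weight update. This is a generic sketch, not this repo's training loop; `grad` stands for whatever gradient the backward pass produces.

```python
import numpy as np

def sgd_step(weights, grad, learning_rate=0.1):
    """One vanilla SGD update; learning_rate is the value discussed above."""
    return weights - learning_rate * grad

# A smaller step (0.05-0.08) shrinks each update, which the notes suggest
# helps the recurrent net, where gradients can swing more widely.
w = np.zeros(3)
g = np.array([0.5, -0.2, 0.1])
print(sgd_step(w, g, learning_rate=0.1))
print(sgd_step(w, g, learning_rate=0.05))
```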

## TODO

- Complete the CNN
- CNN input topology validation (overflows are not handled yet, so an assert fires instead; see the validation sketch below)
- CNN padding to avoid lossy pooling and convolution operations
- Remove all asserts
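One possible shape for the input topology validation item: check that the input size, kernel size, stride, and padding produce a whole-number output size before running the layer, and report an error instead of asserting. The function and parameter names here are hypothetical, not taken from this repo.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Return the conv/pool output size along one dimension, or raise
    ValueError if the window does not tile the (padded) input cleanly."""
    span = input_size + 2 * padding - kernel_size
    if span < 0:
        raise ValueError(
            f"kernel {kernel_size} is larger than padded input {input_size + 2 * padding}")
    if span % stride != 0:
        raise ValueError(
            f"stride {stride} does not evenly cover input {input_size} "
            f"(kernel {kernel_size}, padding {padding}); the operation would be lossy")
    return span // stride + 1

# Example: 28x28 MNIST input, 5x5 kernel, stride 1, no padding -> 24
print(conv_output_size(28, 5))

# Example that would currently trip an assert: stride 3 does not tile 28 with a 5x5 kernel
try:
    conv_output_size(28, 5, stride=3)
except ValueError as err:
    print("rejected:", err)
```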