Watch this space.
- decide between char-level and word-level input
- infrastructure for parameters
- some parameters (sequence length, vocab size, ...) have to be derived from the data: how to handle?
- how to deal with unknown max sequence length (i.e. no padding)
- initializing weights
- initial (unsupervised) training step
- much later: add the query branch
- first pass: if not labeled, query
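
One possible way to handle the data-dependent parameters from the list above, as a rough sketch only. Everything here is hypothetical: `ModelParams`, `tokenize`, `fit_params` and the field names are made up for illustration, and whitespace splitting stands in for a real word tokenizer. Hand-set values live next to data-derived ones, and the latter stay `None` until a pass over the data fills them in.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ModelParams:
    # Hypothetical parameter container; names are illustrative only.
    level: str = "char"                # "char" or "word"
    embed_dim: int = 64                # set by hand
    vocab_size: Optional[int] = None   # derived from data
    max_seq_len: Optional[int] = None  # derived from data; None means unknown


def tokenize(text: str, level: str) -> List[str]:
    """Split text into characters or whitespace-separated words."""
    if level == "char":
        return list(text)
    if level == "word":
        return text.split()
    raise ValueError(f"unknown level: {level!r}")


def fit_params(texts: Iterable[str], params: ModelParams) -> ModelParams:
    """Return a copy of params with the data-dependent fields filled in."""
    vocab = set()
    longest = 0
    for text in texts:
        tokens = tokenize(text, params.level)
        vocab.update(tokens)
        longest = max(longest, len(tokens))
    # Assumption: one extra index is reserved for unknown tokens.
    return replace(params, vocab_size=len(vocab) + 1, max_seq_len=longest)
```

Calling `fit_params(texts, ModelParams(level="word"))` leaves the hand-set fields untouched and returns a new object, so the same defaults can be reused for a char-level run. If the maximum sequence length is to stay unknown (the no-padding case), `max_seq_len` could simply be left as `None`.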