How are batch statistics computed? #6

OverLordGoldDragon · 2019-12-20T04:53:48Z

I'm implementing recurrent BN in Keras, but after reading the original paper and those citing it, one detail remains unclear to me: how are batch statistics computed? In the original, the authors state (pg. 3, emphasis mine):

At training time, the statistics E[h] and Var[h] are estimated by the sample mean and sample variance of the current minibatch

Yet another paper (pg. 3) using and citing it describes:

We subscript BN by time (BN_t) to indicate that each time step tracks its own mean and variance. In practice, we track these statistics as they change over the course of training using an exponential moving average (EMA)
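
To make the two quoted descriptions concrete, here is a minimal NumPy sketch, not taken from either paper. It assumes activations `h` of shape (batch, timesteps, features), a separate mean and variance per timestep, and a hypothetical EMA `momentum` of 0.99. The function names are illustrative only.

```python
import numpy as np

def batch_stats_per_timestep(h):
    """h: (batch, timesteps, features) -> mean, var of shape (timesteps, features)."""
    mean = h.mean(axis=0)
    var = h.var(axis=0)  # biased sample variance over the batch axis
    return mean, var

def ema_update(running, current, momentum=0.99):
    """Exponential moving average: keep `momentum` of the old value."""
    return momentum * running + (1.0 - momentum) * current

rng = np.random.default_rng(0)
timesteps, features = 5, 3
running_mean = np.zeros((timesteps, features))
running_var = np.ones((timesteps, features))

for _ in range(10):  # ten hypothetical minibatches
    h = rng.normal(loc=2.0, scale=1.5, size=(32, timesteps, features))
    mean, var = batch_stats_per_timestep(h)
    # Training-time normalization uses the current minibatch statistics only.
    h_norm = (h - mean) / np.sqrt(var + 1e-5)
    # The EMA is tracked on the side, one entry per timestep (assumed for inference).
    running_mean = ema_update(running_mean, mean)
    running_var = ema_update(running_var, var)
```

Under this reading the two quotes are not in conflict: the minibatch statistics normalize during training, while the per-timestep EMA only accumulates values for later use.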

My question's thus two-fold:

1. Are minibatch statistics computed per immediate minibatch, or as an EMA?
2. How are gamma and beta, the inference parameters shared across all timesteps, computed? Is the computation in (1) simply averaged across all timesteps (e.g. the average of EMA_t over all t)?

Existing implementations: the Keras and TF ones below are all outdated, and I am unsure of their correctness.

Keras, TF-A, and TF-B
All above agree that during training, immediate minibatch statistics are used, and that beta and gamma are updated as an EMA of these minibatches
Problem: the bn operation (in A, and presumably B & C) is applied on a single timestep slice, to be passed to the K.rnn control flow for re-iteration. Hence, EMA is computed w.r.t. minibatches and timesteps - which I find questionable:
EMA is used in place of a simple average when population statistics are dynamic (e.g. minibatch-to-minibatch), whereas we have access to all timesteps in a minibatch prior to having to update gamma and beta
EMA is a worse, though at times necessary, alternative to a simple average; per the above, we can use the latter, so why don't we? Timestep statistics can be cached, averaged at the end, then discarded; this also holds for stateful=True
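
The difference between the two update schemes can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions: a single shared running mean, a hypothetical `momentum` of 0.9, and one EMA update per minibatch after averaging the cached timestep means. None of the function names come from the linked implementations.

```python
import numpy as np

def ema_over_timesteps(step_means, running, momentum=0.9):
    """Update one shared running mean once per timestep (order-dependent)."""
    for m in step_means:
        running = momentum * running + (1.0 - momentum) * m
    return running

def average_then_ema(step_means, running, momentum=0.9):
    """Cache per-timestep means, average them, then update once per minibatch."""
    batch_mean = np.mean(step_means, axis=0)
    return momentum * running + (1.0 - momentum) * batch_mean

rng = np.random.default_rng(0)
h = rng.normal(size=(32, 4, 2))  # (batch, timesteps, features)
h[:, -1, :] += 10.0              # make the last timestep differ strongly
step_means = [h[:, t, :].mean(axis=0) for t in range(h.shape[1])]

running0 = np.zeros(2)
per_step = ema_over_timesteps(step_means, running0)
per_step_reversed = ema_over_timesteps(step_means[::-1], running0)
averaged = average_then_ema(step_means, running0)
averaged_reversed = average_then_ema(step_means[::-1], running0)
```

In this sketch the per-timestep EMA weights later timesteps more heavily, so reversing the timestep order changes its result, while the cache-and-average variant treats all timesteps equally.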
