Incorrect iteration_to_epoch calculation with batch_accumulation #1240
Hello,

I'm not 100% sure, but I think the calculation of iteration_to_epoch is incorrect when using batch_accumulation. I suspect this is because the calculation of solver.max_iter does not take batch_accumulation into account.

Comments
I'm pretty sure we're doing it right. Please elaborate on why you think what we're doing is incorrect.
As stated in https://github.com/NVIDIA/caffe/blob/caffe-0.15/src/caffe/solver.cpp#L280-L282, the internal iter_ counter in Caffe "indicates the number of times the weights have been updated." Therefore, the number of samples processed during one Caffe iteration should be batch_size * iter_size (the batch size times the batch-accumulation factor). This is not accounted for in DIGITS/digits/model/tasks/caffe_train.py, lines 527 to 531 at 713cb83, nor in lines 889 to 890 of the same file.

Please correct me if I'm wrong.
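A minimal sketch of the relation described above, assuming Caffe's semantics where iter_size is the batch-accumulation factor (DIGITS calls it "batch accumulation"). The helper names are hypothetical illustrations, not the actual DIGITS code:

```python
def iterations_per_epoch(dataset_size, batch_size, iter_size=1):
    """Solver iterations needed for one pass over the data.

    Each Caffe solver iteration accumulates gradients over iter_size
    mini-batches, so it consumes batch_size * iter_size samples before
    performing a single weight update.
    """
    samples_per_iteration = batch_size * iter_size
    # Ceiling division: a partial final accumulation still counts as one iteration.
    return -(-dataset_size // samples_per_iteration)


def max_iter_for_epochs(dataset_size, batch_size, iter_size, train_epochs):
    """solver.max_iter value that covers train_epochs true epochs."""
    return iterations_per_epoch(dataset_size, batch_size, iter_size) * train_epochs
```

Dividing by batch_size alone, without the iter_size factor, would overestimate the number of iterations per epoch by a factor of iter_size.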
What @bachma is saying makes sense. I trained the same model with and without batch accumulation, and it does indeed take much longer to go through the same number of "epochs" (in DIGITS speak) when batch accumulation is enabled.
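To make the mismatch concrete (hypothetical numbers, not from the reports above): with 6400 training samples, batch_size = 32, and batch accumulation iter_size = 4, Caffe consumes 32 × 4 = 128 samples per solver iteration, so a true epoch is 6400 / 128 = 50 iterations. If solver.max_iter is instead computed as 6400 / 32 = 200 iterations per "epoch", each reported epoch actually covers 200 × 128 = 25,600 samples, that is, four real epochs, which is why training appears to take much longer.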
Oh wow, great point! I'm surprised that's how Caffe counts things.