-
-
Notifications
You must be signed in to change notification settings - Fork 132
The mechanism of optimization
The optimizer applies Maximum Likelihood Estimation and Backpropagation Through Time to estimate the stability of memory and learn the laws of memory from time-series review logs. Then, it can find the optimal retention to minimize the repetitions via the stochastic shortest path algorithm.
FSRS is based on the DSR (Difficulty, Stability, Retrievability) memory model. The DSR model uses seventeen parameters and six equations to describe memory dynamics during spaced repetition practice (For details, see The Algorithm).
Here is a brief introduction to the training process of FSRS.
The database schema of Anki's review logs is as follows (Copy from Database Structure):
-- revlog is a review history; it has a row for every review you've ever done!
CREATE TABLE revlog (
id integer primary key,
-- epoch-milliseconds timestamp of when you did the review
cid integer not null,
-- cards.id
usn integer not null,
-- update sequence number: for finding diffs when syncing.
-- See the description in the cards table for more info
ease integer not null,
-- which button you pushed to score your recall.
-- review: 1(wrong), 2(hard), 3(ok), 4(easy)
-- learn/relearn: 1(wrong), 2(ok), 3(easy)
ivl integer not null,
-- interval (i.e. as in the card table)
lastIvl integer not null,
-- last interval (i.e. the last value of ivl. Note that this value is not necessarily equal to the actual interval between this review and the preceding review)
factor integer not null,
-- factor
time integer not null,
-- how many milliseconds your review took, up to 60000 (60s)
type integer not null
-- 0=learn, 1=review, 2=relearn, 3=cram
);
A revlog records card cid reviewed at time id with rating ease. It is easy to trace the entire review history of a card. The memory state of a card depends on its review history. Group the revlogs by card, sort them by time and concatenate the ratings and intervals between each adjacent review; it can generate the history of ratings and history of intervals for each card. Here are the samples after pre-processing:
In machine learning terms, the histories of ratings and intervals are time-series features. I want to use this time-series data to train the DSR model. The inputs are the r_history and t_history, and the outputs are Stability and Difficulty. I built a time-series model via PyTorch to handle this data type, which could automatically apply Backpropagation through time (BPTT) to train this model.
There is another problem with training the model from the time-series data. One main goal of FSRS is to predict the memory state after a sequence of reviews. However, time-series data only records ratings instead of Stability or Difficulty. Stability is a statistical variable of a group of reviews with the same review history. Is it possible to estimate the stability of memory with the raw data? Maximum likelihood estimation (MLE) fits in this case.
Let's say that the function of forgetting curve is
According to the MLE method,
Set
which is the loss function for updating
At this point, we then replace
where
With MLE and BPTT, we can then train the parameters of DSR directly from the time-series data.
My representative paper at ACMKDD: A Stochastic Shortest Path Algorithm for Optimizing Spaced Repetition Scheduling
My fantastic research experience on spaced repetition algorithm: How did I publish a paper in ACMKDD as an undergraduate?
The largest open-source dataset on spaced repetition with time-series features: open-spaced-repetition/FSRS-Anki-20k