Burr Settles - A Trainable Spaced Repetition Model for Language Learning (2016)
Created: February 1, 2018 / Updated: November 2, 2024 / Status: finished / 2 min read (~366 words)
- The spacing effect is the observation that people tend to remember things more effectively if they use spaced repetition practice (short study periods spread out over time) as opposed to massed practice (i.e., "cramming")
- The lag effect is the related observation that people learn even better if the spacing between practices gradually increases
- Once a lesson is completed, all the target words being taught in the lesson are added to the student model. This model captures what the student has learned, and estimates how well she can recall this knowledge at any given time
- Graduated-interval recall, whereby new vocabulary is introduced and then tested at exponentially increasing intervals, interspersed with the introduction or review of other vocabulary
- This approach is limited since the schedule is pre-recorded and cannot adapt to the learner's actual ability
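To make the exponential schedule concrete, here is a minimal sketch; the 5-second first gap and base-5 growth factor roughly follow Pimsleur's published intervals, but the exact values are illustrative assumptions:

```python
def graduated_intervals(first_gap_seconds: float = 5.0,
                        factor: float = 5.0,
                        n_reviews: int = 11) -> list[float]:
    """Exponentially increasing review gaps: first_gap * factor**i.

    Base-5 growth approximates Pimsleur's graduated-interval schedule
    (5 s, 25 s, ~2 min, ... up to years); exact values are assumptions.
    """
    return [first_gap_seconds * factor ** i for i in range(n_reviews)]

for gap in graduated_intervals():
    print(f"review after {gap:>12.0f} s (~{gap / 86400:.3f} days)")
```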
- The main idea is to have a few boxes that correspond to different practice intervals: 1-day, 2-day, 4-day, and so on
- All cards start out in the 1-day box, and if the student can remember an item after one day, it gets "promoted" to the 2-day box. Conversely, if she is incorrect, the card gets "demoted" to a shorter interval box
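The promotion/demotion rule is easy to sketch. The snippet below is a minimal illustration of the Leitner system, not Duolingo's implementation; the five-box ladder and the demote-by-one-box rule are assumptions (some variants send a missed card all the way back to the first box):

```python
INTERVALS = [1, 2, 4, 8, 16]  # review interval in days per box (assumed ladder)

def update_box(box: int, correct: bool) -> int:
    """Promote a card one box on a correct answer, demote one box on a miss."""
    if correct:
        return min(box + 1, len(INTERVALS) - 1)
    return max(box - 1, 0)

def next_interval_days(box: int) -> int:
    """Days until the card in this box should be reviewed again."""
    return INTERVALS[box]

# A new card starts in the 1-day box.
box = 0
box = update_box(box, correct=True)    # promoted to the 2-day box
box = update_box(box, correct=False)   # demoted back to the 1-day box
print(next_interval_days(box))         # -> 1
```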
- Ebbinghaus model, also known as the forgetting curve
- Memory decays exponentially over time
$$ p = 2^{-\delta/h}$$
- $p$ is the probability of correctly recalling an item (e.g., a word)
- $\delta$ is the lag time since the item was last practiced
- $h$ is the half-life, or measure of strength, in the learner's long-term memory
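A quick worked example of the curve: with these definitions, recall probability is 1 at the moment of practice, falls to 0.5 after exactly one half-life, and to 0.25 after two:

```python
def recall_probability(delta: float, half_life: float) -> float:
    """Ebbinghaus forgetting curve: p = 2**(-delta / h), with delta and h in the same time unit."""
    return 2 ** (-delta / half_life)

# With a half-life of 4 days:
print(recall_probability(0, 4))  # 1.0
print(recall_probability(4, 4))  # 0.5  (one half-life elapsed)
print(recall_probability(8, 4))  # 0.25 (two half-lives elapsed)
```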
$$ \hat{h}_{\Theta} = 2^{\Theta \cdot \mathbf{x}}$$
- $\mathbf{x}$ denotes a feature vector that summarizes a student's previous exposure to a particular word
- $\Theta$ contains weights that correspond to each feature variable in $\mathbf{x}$
- We want to fit $\Theta$ empirically to learning trace data, and accommodate an arbitrarily large set of interesting features
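A minimal sketch of the prediction side of half-life regression, combining the two equations above. The feature vector and weights here are hypothetical toy values, not the paper's fitted model (the paper's features include counts of past correct and incorrect recalls plus lexeme tags, and $\Theta$ is fit to the learning traces by gradient-based optimization):

```python
def predicted_half_life(theta: list[float], x: list[float]) -> float:
    """Half-life regression: h_hat = 2**(theta . x)."""
    return 2 ** sum(t * xi for t, xi in zip(theta, x))

def predicted_recall(theta: list[float], x: list[float], delta_days: float) -> float:
    """Plug h_hat into the forgetting curve: p_hat = 2**(-delta / h_hat)."""
    return 2 ** (-delta_days / predicted_half_life(theta, x))

# Hypothetical features for one word: [bias, # correct recalls, # incorrect recalls]
x = [1.0, 5.0, 1.0]
theta = [0.5, 0.4, -0.3]  # toy weights, not fitted values

print(predicted_half_life(theta, x))    # 2**2.2, about 4.6 days
print(predicted_recall(theta, x, 2.0))  # about 0.74 recall probability after 2 days
```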
- Settles, Burr, and Brendan Meeder. "A Trainable Spaced Repetition Model for Language Learning." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016.