diff --git a/README.md b/README.md
index c75c699..08129a2 100644
--- a/README.md
+++ b/README.md
@@ -100,7 +100,7 @@ example using
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
-from cappr.huggingface.classify import cache_model, predict_proba
+from cappr.huggingface.classify import cache_model, predict
 
 # Load model and tokenizer
 model = AutoModelForCausalLM.from_pretrained("gpt2")
@@ -118,18 +118,17 @@ Complete this sequence:'''
 prompts = ["a, b, c =>", "X, Y =>"]
 completions = ["d", "Z", "Hi"]
 
-# Cache
+# Cache prompt_prefix because it's used for all prompts
 cached_model_and_tokenizer = cache_model(
     model_and_tokenizer, prompt_prefix
 )
 
 # Compute
-pred_probs = predict_proba(
+preds = predict(
     prompts, completions, cached_model_and_tokenizer
 )
-print(pred_probs.round(2))
-# [[1.   0.   0.  ]
-#  [0.01 0.99 0.  ]]
+print(preds)
+# ['d', 'Z']
 ```
diff --git a/docs/source/why_probability.rst b/docs/source/why_probability.rst
index 5bc21f2..b0be556 100644
--- a/docs/source/why_probability.rst
+++ b/docs/source/why_probability.rst
@@ -17,9 +17,9 @@ At a higher level, probabilities are useful when making cost-sensitive decisions
 Another application where predicting probabilities turns out to be useful is in
 "multilabel" tasks. In these tasks, a single piece of text can be labeled or tagged
 with multiple categories. For example, a tweet can express multiple emotions at the same
-time. A simple way to have an LLM tag a tweet's emotions is to predict the probability
-of each emotion, and then threshold each probability. All possible emotions can be
-processed in parallel to save time.
+time. A simple way to have an LLM tag a tweet's negative emotions is to predict the
+probability of each one, and then threshold each probability. All negative emotions are
+processed in parallel to save time, which is also how I power through most days.
 
 Examples
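
For context on the why_probability.rst change: the multilabel recipe it describes (predict a probability for each label, then threshold each one) maps directly onto the API shown in the README. Below is a minimal sketch, not part of the patch, assuming `predict_proba` accepts a `normalize` flag that keeps each label's probability independent rather than summing to 1; the prompt, emotion labels, and 0.5 threshold are purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba

# Load model and tokenizer (same setup as the README example)
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model_and_tokenizer = (model, tokenizer)

# Hypothetical tweet and negative-emotion labels
prompt = "I dropped my phone in the toilet again. This tweet expresses"
emotions = ["anger", "sadness", "fear"]

# Assumed: normalize=False keeps each emotion's probability independent,
# which is what multilabel tagging needs (the emotions aren't mutually exclusive)
pred_probs = predict_proba(
    [prompt], emotions, model_and_tokenizer, normalize=False
)

# Threshold each probability independently; 0.5 is an illustrative cutoff
threshold = 0.5
tags = [
    emotion
    for emotion, prob in zip(emotions, pred_probs[0])
    if prob >= threshold
]
print(tags)
```

Thresholding each label separately, rather than picking the single most probable one, is what makes this multilabel instead of single-label classification.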