DOC: show adaptive algorithm performance (#92)
* BUG: Reverse posterior calculation
* ENH: add query plot on dashboard
* MAINT: add embedding algorithm SOE
* MAINT: Remove sample_weight from active offline
* DOC: Rework adaptive benchmark
stsievert authored Feb 2, 2021
1 parent 88bb858 commit f9eae7e
Showing 44 changed files with 381 additions and 561 deletions.
42 changes: 21 additions & 21 deletions docs/source/benchmarks/adaptive.rst
@@ -6,31 +6,31 @@ about a random question like random sampling. This can mean that higher
accuracies are reached sooner, or that fewer human responses are required to
reach a particular accuracy.

- Illustrative result
- -------------------
-
- Let's run a quick benchmark with Salmon to see how well adaptive sampling
- performs in the crowdsourcing context. This benchmark will accurately simulate
- a crowdsourcing setting:
-
- * Answers will be received by Salmon at a rate of 4 responses/second.
- * The answers will come from the Zappos shoe dataset, an exhaustively sampled
-   triplet dataset with 4 human responses to every possible question.
- * This dataset has :math:`n = 85` shoes, and I mirror Heim et al. and embed
-   into :math:`d = 2` dimensions [1]_.
- * The random and adaptive algorithms will be the same in every respect except
-   how they select queries.
-
- With that setup, how much of a difference does query selection make? Here's
- a result that illustrates the benefit of adaptive algorithms:
-
- .. image:: imgs/adaptive.png
-    :width: 400px
+ Synthetic simulation
+ --------------------
+
+ Let's compare adaptive sampling and random sampling. Specifically, let's use
+ Salmon like an experimentalist would:
+
+ 1. Launch Salmon with the "alien eggs" dataset (with :math:`n=50` objects,
+    embedded into :math:`d=2` dimensions).
+ 2. Simulate human users (6 users with a mean response time of 1 second; a
+    sketch of this simulation is below).
+ 3. Download the human responses from Salmon.
+ 4. Generate the embedding offline.

+ Let's run this process for adaptive and random sampling. When we do that, this
+ is the graph that's produced:
+
+ .. image:: imgs/synth-eg-acc.png
+    :width: 600px
+    :align: center
+
+ These are synthetic results, though they use a human noise model. These
+ experiments provide evidence that Salmon works well with adaptive sampling.

These measurements provide evidence to support the hypothesis that Salmon has
better performance than NEXT for adaptive triplet embeddings. For reference, in NEXT's
- introduction paper, the authors provided "no evidence for gains from adaptive
+ introduction paper, the authors found "no evidence for gains from adaptive
sampling" for the triplet embedding problem [2]_.

.. [1] "Active Perceptual Similarity Modeling with Auxiliary Information" by E.
Binary file removed docs/source/benchmarks/imgs/adaptive.png
Binary file added docs/source/benchmarks/imgs/synth-eg-acc.png
Binary file removed docs/source/imgs/logo.graffle/data.plist
Binary file removed docs/source/imgs/logo.graffle/image1.tiff
Binary file removed docs/source/imgs/logo.graffle/image2.tiff
Binary file removed docs/source/imgs/logo.graffle/image3.tiff
Binary file modified docs/source/imgs/logo.pages
Binary file modified docs/source/imgs/query.graffle/data.plist
Binary file modified docs/source/imgs/query.png
39 changes: 3 additions & 36 deletions docs/source/offline.rst
@@ -34,12 +34,6 @@ Install Salmon
Generate embeddings
-------------------

- First, let's cover random sampling. Adaptive algorithms require some special
- attention.
-
- Random embeddings
- """""""""""""""""

This code will generate an embedding:

.. code-block:: python
@@ -52,42 +46,16 @@ This code will generate an embedding:
n = int(X.max() + 1) # number of targets
d = 2 # embed into 2 dimensions
- X_train, X_test = train_test_split(X, random_state=0, test_size=0.2)
+ X_train, X_test = train_test_split(X, random_state=42, test_size=0.2)
model = OfflineEmbedding(n=n, d=d)
model.fit(X_train, X_test)
model.embedding_ # embedding
model.history_ # to view information on how well train/test performed
Some customization can be done with ``model.history_``; it may not be necessary
- to train for 200 epochs, for example. ``model.history_`` will include
- validation and training scores, which might help limit the number of epochs.

- Adaptive embeddings
- """""""""""""""""""
-
- Adaptive embeddings are mostly the same, but require the following:
-
- 1. Re-weighting the adaptively selected samples.
- 2. Splitting train/test properly.
-
- Re-weighting the samples is required because we don't want to overfit the
- adaptively selected samples.
-
- .. code-block:: python
-
- import pandas as pd  # import needed for read_csv below
-
- df = pd.read_csv("responses.csv") # downloaded from dashboard
- test = df.alg_ident == "RandomSampling" # random responses form the test set
- train = df.alg_ident == "TSTE" # an adaptive algorithm
- cols = ["head", "winner", "loser"]
- X_test = df.loc[test, cols].to_numpy()
- X_train = df.loc[train, cols].to_numpy()
- model = OfflineEmbedding(n=int(df["head"].max() + 1), d=2, weight=True)
- model.fit(X_train, X_test, scores=df.loc[train, "score"])
+ to train for 1,000,000 epochs. ``model.history_`` will include validation and
+ training scores, which might help limit the number of epochs.
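
One way to act on that advice is to plot the history and look for where the
validation score flattens out. A minimal sketch, assuming each entry of
``model.history_`` records per-epoch training and validation scores (the
column names below are hypothetical):

.. code-block:: python

   import pandas as pd
   import matplotlib.pyplot as plt

   # Each row is one epoch's record; the score column names are assumptions.
   history = pd.DataFrame(model.history_)
   ax = history.plot(y=["train_score", "validation_score"])
   ax.set_xlabel("epoch")
   ax.set_ylabel("score")
   plt.show()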

Embedding visualization
-----------------------
@@ -107,4 +75,3 @@ the embedding, which might be `Matplotlib`_, the `Pandas visualization API`_,
.. _Bokeh: https://bokeh.org/
.. _Matplotlib: https://matplotlib.org/
.. _Altair: https://altair-viz.github.io/
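
For a quick look at the embedding with `Matplotlib`_, a minimal sketch
(assuming ``model.embedding_`` is an ``(n, 2)`` NumPy array, one row per
target, as with ``d = 2`` above):

.. code-block:: python

   import matplotlib.pyplot as plt

   em = model.embedding_  # shape (n, 2) when d=2
   fig, ax = plt.subplots()
   ax.scatter(em[:, 0], em[:, 1])
   for i, (x, y) in enumerate(em):
       ax.annotate(str(i), (x, y))  # label each point with its target index
   plt.show()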

2 changes: 1 addition & 1 deletion examples/datasets.py
@@ -5,7 +5,7 @@
from sklearn.utils import check_random_state


- def strange_fruit(head, left, right, random_state=None):
+ def alien_egg(head, left, right, random_state=None):
"""
Parameters
----------
Binary file added examples/datasets/alien-eggs-triplets/images.zip
@@ -623,7 +623,7 @@
"from sklearn.utils import check_random_state\n",
"\n",
"\n",
"def strange_fruit(head, left, right, random_state=None):\n",
"def alien_egg(head, left, right, random_state=None):\n",
" \"\"\"\n",
" Parameters\n",
" ----------\n",
@@ -672,7 +672,7 @@
}
],
"source": [
"strange_fruit(0, 1, 3)"
"alien_egg(0, 1, 3)"
]
},
{
@@ -686,7 +686,7 @@
"n_targets = 600\n",
"num_ans = 100_000\n",
"X = rng.choice(n_targets, size=(num_ans, 3))\n",
"y = [strange_fruit(h, l, r, random_state=rng) for h, l, r in X]"
"y = [alien_egg(h, l, r, random_state=rng) for h, l, r in X]"
]
},
{
Binary file added examples/datasets/faces.zip
