DOC: update adaptive benchmarks (#96)

stsievert · Feb 27, 2021 · 4db75d8
1 parent 7e15065

Showing 46 changed files with 226 additions and 235 deletions.
diff --git a/docs/source/algorithms.rst b/docs/source/algorithms.rst
@@ -5,8 +5,7 @@ Algorithm configuration
 
 .. warning::
 
-   These adaptive algorithms are (currently) experimental and may change at any
-   time.
+   The API for these algorithms is (currently) unstable.
 
 There are many queries to ask about in triplet embedding tasks. Most of these
 queries aren't useful; chances are most queries will have obvious answers and

diff --git a/docs/source/api.rst b/docs/source/api.rst
@@ -5,7 +5,7 @@ Algorithm API
 
 .. warning::
 
-   These APIs are experimental and may change at any time.
+   These APIs are unstable.
 
 All triplet embedding algorithms must conform to this API:
 
@@ -44,11 +44,24 @@ Active Algorithms
    :toctree: generated/
    :template: only-init.rst
 
+   salmon.triplets.algs.RR
    salmon.triplets.algs.TSTE
    salmon.triplets.algs.STE
    salmon.triplets.algs.CKL
    salmon.triplets.algs.GNMDS
 
+These adaptive algorithms differ only in the underlying noise model, with the
+exception of :class:`~salmon.triplets.algs.RR`.
+:class:`~salmon.triplets.algs.RR` introduces some randomness by fixing the head
+and adding the top ``5 * n`` triplets to the database. This is useful because
+the information gain measure used by all of these algorithms (by default) is a
+rule of thumb.
+
+.. note::
+
+   Use of :class:`~salmon.triplets.algs.RR` is recommended as it performs well
+   in :ref:`the experiments we have run <experiments>`.
+
 Interface
 ---------
 

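The ``RR`` description above ("fixing the head and adding the top ``5 * n`` triplets") can be read roughly as the sketch below. This is one reading of that sentence, not Salmon's implementation: the function name, the ``per_object`` parameter, and ``toy_score`` (a stand-in for the information gain estimate) are all hypothetical.

```python
import heapq
import itertools
import random


def top_queries_for_random_head(n, score, rng, per_object=5):
    """Fix one random head, score every query with that head, keep the best 5 * n.

    `score` is a hypothetical callable standing in for an information gain estimate.
    """
    head = rng.randrange(n)  # the randomness: which head gets searched
    others = [i for i in range(n) if i != head]
    candidates = [
        (head, left, right) for left, right in itertools.combinations(others, 2)
    ]
    return heapq.nlargest(per_object * n, candidates, key=score)


def toy_score(query):
    # Toy stand-in only: prefer queries whose two choices look equally far from the head.
    head, left, right = query
    return -abs(abs(head - left) - abs(head - right))


queries = top_queries_for_random_head(n=30, score=toy_score, rng=random.Random(0))
print(len(queries), "queries, all with head", queries[0][0])
```

Because the head is drawn at random, the retained queries are not simply the globally highest-scoring ones, which is where the randomness mentioned above would come from under this reading.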
diff --git a/docs/source/benchmarks/adaptive.rst b/docs/source/benchmarks/adaptive.rst
@@ -1,3 +1,5 @@
+.. _experiments:
+
 Adaptive algorithms
 ===================
 
@@ -6,15 +8,20 @@ about a random question like random sampling. This can mean that higher
 accuracies are reached sooner, or that less human responses are required to
 reach a particular accuracy.
 
+.. note::
+
+   This page shows results of experiments run with Salmon.
+   For complete details, see https://github.com/stsievert/salmon-experiments
+
 Synthetic simulation
 --------------------
 
 Let's compare adaptive sampling and random sampling. Specifically, let's use
 Salmon like an experimentalist would:
 
-1. Launch Salmon with the "alien eggs" dataset (with :math:`n=50` objects and
-   using :math:`d=2` dimensions).
-2. Simulate human users (6 users with mean response time of 1s).
+1. Launch Salmon with the "alien eggs" dataset, with :math:`n=30` objects
+   embedded into :math:`d=2` dimensions.
+2. Simulate human users (10 users with mean response time of 1s).
 3. Download the human responses from Salmon
 4. Generate the embedding offline.
 
@@ -28,57 +35,51 @@ is the graph that's produced:
 These are synthetic results, though they use a human noise model. These
 experiments provide evidence that Salmon works well with adaptive sampling.
 
-This measure provide evidence to support the hypothesis that Salmon has better
-performance than NEXT for adaptive triplet embeddings. For reference, in NEXT's
-introduction paper, the authors found "no evidence for gains from adaptive
-sampling" for the triplet embedding problem [2]_.
+This measure provides evidence that Salmon's active sampling approach
+outperforms random sampling. If true, this is an improvement over existing
+software to deploy triplet queries to crowdsourced audiences: in NEXT's
+introduction paper, [2]_ the authors found "no evidence for gains from adaptive
+sampling" for (nearly) the same problem. [#same]_
 
-.. [1] "Active Perceptual Similarity Modeling with Auxiliary Information" by E.
-       Heim, M. Berger, and L. Seversky, and M. Hauskrecht. 2015.
-       https://arxiv.org/pdf/1511.02254.pdf
-
-.. [2] "NEXT: A System for Real-World Development, Evaluation, and Application
-       of Active Learning" by K. Jamieson, L. Jain, C. Fernandez, N. Glattard
-       and R. Nowak. 2017.
-       http://papers.nips.cc/paper/5868-next-a-system-for-real-world-development-evaluation-and-application-of-active-learning.pdf
 
+Simulation with human responses
+-------------------------------
 
-Search efficacy
----------------
+The Zappos shoe dataset has :math:`n=85` shoes, and every possible triplet was
+asked 4 times to crowdsourcing users. Let's run a simulation with Salmon on that
+dataset. We'll embed into :math:`d = 3` dimensions, and have a response rate of
+about 2.5 responses/sec (5 users with an average response time of 2.5 seconds).
 
-Adaptive algorithms are more adaptive if they search more queries. Random sampling
-can be thought of as an adaptive algorithm that only searches over one possible
-query. An algorithm that searches over 50,000 queries is more adaptive than a
-algorithm that can only search 50 queries.
+Let's again compare adaptive sampling and random sampling:
 
-How much do these searches matter? Let's run another experiment with this setup:
+.. image:: imgs/zappos.png
+   :width: 600px
+   :align: center
 
-* Dataset: strange fruit dataset. The response model will be determined from human
-  responses. There will be :math:`n=200` objects and that will be embedded into :math:`d=2`
-  dimensions.
-* Let's measure **search efficacy.** To aid this, let's say model updates run instantly.
-  That means we'll run offline using essentially this code:
+The likelihood of a true response conveys the "margin by which the models
+adhere to all responses." [1]_ The performance above mirrors the performance
+reported by Heim et al. in their Figure 3. [1]_
 
-.. code-block:: python
 
-   responses_per_search = 10
-   n_search = 10
-   alg = TSTE(n=n, d=d, ...)
+.. rubric:: References
 
-   for k in itertools.count():
-       queries, scores = alg.score_queries(num=n_search * responses_per_search)
-       queries = _get_top_N_queries(queries, scores, N=responses_per_search)
-       answers = [_get_answer(query) for query in queries]
+.. [1] "Active Perceptual Similarity Modeling with Auxiliary Information" by E.
+       Heim, M. Berger, L. Seversky, and M. Hauskrecht. 2015.
+       https://arxiv.org/pdf/1511.02254.pdf
 
-       alg.partial_fit(answers)  # performs 1 pass over all answers received thus far
+.. [2] "NEXT: A System for Real-World Development, Evaluation, and Application
+       of Active Learning" by K. Jamieson, L. Jain, C. Fernandez, N. Glattard
+       and R. Nowak. 2017.
+       http://papers.nips.cc/paper/5868-next-a-system-for-real-world-development-evaluation-and-application-of-active-learning.pdf
 
-With that, we see this performance:
 
-.. image:: imgs/search-efficacy.png
-   :width: 600px
-   :align: center
 
-If you only have the budget for 4,000 queries the most complete search will reach about 82% accuracy. The least complete search will only reach about 60% accuracy.
+.. rubric:: Footnotes
 
-If you want to reach 80% accuracy, the most complete searches will require about 3,800 queries. The least complete searches will require 5,100 queries.
+.. [#same] Both experiments use :math:`n=30` objects and embed into :math:`d=2`
+           dimensions. The human noise model used in the Salmon experiments is
+           generated from the responses collected during NEXT's experiment. They
+           are the same experiment, up to different responses (NEXT
+           actually runs crowdsourcing experiments; Salmon's noise model is
+           generated from those responses).
 
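The simulated users in the synthetic experiment above (10 users with a mean response time of 1s) can be pictured with the sketch below. It is not the code used for the benchmarks: the exponential waiting time and the ``simulate_response_times`` helper are assumptions made for illustration.

```python
import random


def simulate_response_times(n_users, mean_response_time, duration, seed=0):
    """Return sorted arrival times of responses from independent simulated users.

    Assumption: each user waits an exponentially distributed time between answers.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n_users):
        t = rng.expovariate(1 / mean_response_time)
        while t < duration:
            times.append(t)
            t += rng.expovariate(1 / mean_response_time)
    return sorted(times)


times = simulate_response_times(n_users=10, mean_response_time=1.0, duration=1000.0)
rate = len(times) / 1000.0
print(f"{rate:.2f} responses/sec")
```

Under this assumption the aggregate rate is roughly the number of users divided by the mean response time, so about 10 responses/sec for the synthetic setup.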
diff --git a/docs/source/benchmarks/imgs/search-efficacy.png b/docs/source/benchmarks/imgs/search-efficacy.png
diff --git a/docs/source/benchmarks/imgs/synth-eg-acc.graffle/data.plist b/docs/source/benchmarks/imgs/synth-eg-acc.graffle/data.plist
diff --git a/docs/source/benchmarks/imgs/synth-eg-acc.graffle/image11.png b/docs/source/benchmarks/imgs/synth-eg-acc.graffle/image11.png
diff --git a/docs/source/benchmarks/imgs/synth-eg-acc.graffle/image13.png b/docs/source/benchmarks/imgs/synth-eg-acc.graffle/image13.png
diff --git a/docs/source/benchmarks/imgs/synth-eg-acc.graffle/image14.tiff b/docs/source/benchmarks/imgs/synth-eg-acc.graffle/image14.tiff
diff --git a/docs/source/benchmarks/imgs/synth-eg-acc.graffle/image6.png b/docs/source/benchmarks/imgs/synth-eg-acc.graffle/image6.png
diff --git a/docs/source/benchmarks/imgs/synth-eg-acc.png b/docs/source/benchmarks/imgs/synth-eg-acc.png
diff --git a/docs/source/benchmarks/imgs/zappos-afrl.png b/docs/source/benchmarks/imgs/zappos-afrl.png
diff --git a/docs/source/benchmarks/imgs/zappos.graffle/data.plist b/docs/source/benchmarks/imgs/zappos.graffle/data.plist
diff --git a/docs/source/benchmarks/imgs/zappos.graffle/image1.png b/docs/source/benchmarks/imgs/zappos.graffle/image1.png
diff --git a/docs/source/benchmarks/imgs/zappos.graffle/image3.png b/docs/source/benchmarks/imgs/zappos.graffle/image3.png
diff --git a/docs/source/benchmarks/imgs/zappos.graffle/image4.png b/docs/source/benchmarks/imgs/zappos.graffle/image4.png
diff --git a/docs/source/benchmarks/imgs/zappos.png b/docs/source/benchmarks/imgs/zappos.png
diff --git a/docs/source/generated/salmon.triplets.algs.RR.rst b/docs/source/generated/salmon.triplets.algs.RR.rst
@@ -0,0 +1,7 @@
+:mod:`salmon.triplets.algs`.RR
+=====================================
+
+.. currentmodule:: salmon.triplets.algs
+
+.. autoclass:: RR
+   :members: __init__
diff --git a/docs/source/getting-started.rst b/docs/source/getting-started.rst
@@ -10,7 +10,7 @@ the following options:
 2. Upload of a YAML file describing experiment, and ZIP file for the targets.
 3. Upload of a database dump from Salmon.
 
-.. warning::
+.. note::
 
    By default, Salmon does not support HTTPS. Make sure the URL begins with
    ``http://``, not ``https://``.
@@ -35,7 +35,7 @@ page:
 
    This image is almost certainly out of date.
 
-.. warning::
+.. note::
 
    Please include the version in any bug reports or feature requests.
    The version number is available at ``http://[url]:8421/docs`` and should look

diff --git a/docs/source/imgs/face-embedding.png b/docs/source/imgs/face-embedding.png
diff --git a/docs/source/imgs/query.graffle/data.plist b/docs/source/imgs/query.graffle/data.plist
diff --git a/docs/source/imgs/query.png b/docs/source/imgs/query.png
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -6,13 +6,21 @@ are of the form "is object :math:`a` more similar to object :math:`b` or
 :math:`b`?" An example is shown below with facial similarities:
 
 .. image:: imgs/query.png
-   :width: 400px
+   :width: 300px
    :align: center
 
 These queries are interesting because they provide some relative similarity
 structure: a response might indicate that object :math:`a` is closer to object
 :math:`b` than object :math:`c` as determined by humans and the instructions
-they are given.
+they are given. For example, these triplet queries have been used by
+psychologists to determine what facial emotions humans find similar:
+
+.. image:: imgs/face-embedding.png
+   :width: 500px
+   :align: center
+
+Only distance is relevant in this embedding, not the vertical/horizontal axes.
+However, if you look closely, you can see two axes: positivity and intensity.
 
 Salmon provides efficient methods for collecting these triplet queries.  Salmon
 can be configured to only require (say) 10,000 answers from crowdsourcing
@@ -28,15 +36,7 @@ the same confidence.
    getting-started
    monitoring
    offline
-
-.. toctree::
-   :maxdepth: 2
-   :caption: Algorithms
-
    algorithms
-   adaptive
-   api
-   developers
 
 .. toctree::
    :maxdepth: 2
@@ -45,6 +45,14 @@ the same confidence.
    benchmarks/server
    benchmarks/adaptive
 
+.. toctree::
+   :maxdepth: 2
+   :caption: Algorithm Developers
+
+   adaptive
+   developers
+   api
+
 
 Indices and tables
 ==================

diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -8,10 +8,6 @@ machine. After you get Salmon running, detail on how to launch experiments in
 Experimentalist
 ---------------
 
-.. warning::
-
-   This process is only ready for testing. It is **not** ready for deployment.
-
 1. Sign into Amazon AWS (http://aws.amazon.com/)
 2. Select the "Oregon" region (or ``us-west-2``) in the upper right.
 3. Go to Amazon EC2
@@ -69,7 +65,7 @@ To start using Salmon, these endpoints will be available:
      Download all files when stopping or terminating the machine -- especially
      the responses and experiment file.
 
-.. warning::
+.. note::
 
    If you have an issue with the machine running Salmon, be sure to include the
    logs when contacting the Salmon developers. They'd also appreciate it if

diff --git a/docs/source/offline.rst b/docs/source/offline.rst
@@ -47,7 +47,7 @@ This code will generate an embedding:
    d = 2  # embed into 2 dimensions
 
    X_train, X_test = train_test_split(X, random_state=42, test_size=0.2)
-   model = OfflineEmbedding(n=n, d=d)
+   model = OfflineEmbedding(n=n, d=d, max_epochs=1_000_000)
    model.fit(X_train, X_test)
 
    model.embedding_  # embedding
@@ -66,9 +66,11 @@ the dashboard by downloading the "embeddings" file (or visiting
 the embedding coordinates and the name of the embedding that generated the
 algorithm.
 
-To visualize the embedding, I would use standard plotting tools to visualize
+To visualize the embedding, standard plotting tools can be used to visualize
 the embedding, which might be `Matplotlib`_, the `Pandas visualization API`_,
-`Bokeh`_ or `Altair`_. Salmon uses Bokeh for it's visualization.
+`Bokeh`_ or `Altair`_. The Pandas visualization API is likely the easiest to
+use, but won't support showing HTML (images/video/etc). To do that, Salmon uses
+Bokeh for its visualization.
 
 
 .. _Pandas visualization API: https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html

diff --git a/examples/colors/colors.py b/examples/colors/colors.py
@@ -59,16 +59,20 @@
     {"r": 0.9829, "g": 0.65643, "b": 0.078187},
 ]
 
+
 def _fmt_color(c):
     c = hex(int(255 * c))[2:]
     if len(c) == 1:
         c = f"0{c}"
     assert len(c) == 2
     return c
+
+
 def _convert(r, g, b):
     r, g, b = map(_fmt_color, [r, g, b])
     return f"#{r}{g}{b}"
 
+
 def _fmt(hc):
     target = (
         "<div class='center col-md-5 mx-auto'"
@@ -79,11 +83,13 @@ def _fmt(hc):
     )
     return target
 
+
 htmlcolors = [_convert(c["r"], c["g"], c["b"]) for c in colors]
 targets = [_fmt(hc) for hc in htmlcolors]
 with open("colors.csv", "w") as f:
     f.write("\n".join(targets))
 import os
+
 os.system("zip colors.csv.zip colors.csv")
 os.system("rm colors.csv")
 

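For reference, the ``_fmt_color`` and ``_convert`` helpers in the hunk above turn RGB floats in [0, 1] into an HTML hex color. A compact equivalent is sketched below; ``to_hex_color`` is a hypothetical name that does not appear in ``colors.py``, and the sketch assumes inputs stay within [0, 1].

```python
def to_hex_color(r, g, b):
    """Convert RGB floats in [0, 1] to a '#rrggbb' string.

    Like the helpers in colors.py, each channel is truncated with int() and
    zero-padded to two hex digits.
    """
    return "#" + "".join(f"{int(255 * c):02x}" for c in (r, g, b))


# The last color listed in the hunk above.
example = to_hex_color(0.9829, 0.65643, 0.078187)
print(example)
```

The ``02x`` format spec does the zero-padding that ``_fmt_color`` handles with an explicit length check.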
diff --git a/examples/queries-searched/offline.py b/examples/queries-searched/offline.py
@@ -104,7 +104,7 @@ def _partial_fit(self):
         # Do one process_answers call
         for k in itertools.count():
             #  if k == 90:
-                #  breakpoint()
+            #  breakpoint()
             if k >= 100:
                 raise ValueError("infinite loop?")
             query, score = self.alg.get_query()

diff --git a/examples/queries-searched/run.py b/examples/queries-searched/run.py
@@ -246,7 +246,7 @@ def _score_fruit(self, X, y):
 if __name__ == "__main__":
     import salmon
 
-    assert salmon.__version__ == 'v0.4.1+8.geafdca2.dirty'
+    assert salmon.__version__ == "v0.4.1+8.geafdca2.dirty"
 
     queries_per_search = 10
     #  _searches = [[1 * 10 ** i, 2 * 10 ** i, 5 * 10 ** i] for i in range(0, 5 + 1)]

diff --git a/salmon/backend/alg.py b/salmon/backend/alg.py
@@ -110,11 +110,7 @@ def submit(fn: str, *args, allow_other_workers=True, **kwargs):
 
                 if hasattr(self, "get_queries"):
                     f_search = submit(
-                        "get_queries",
-                        self_future,
-                        random_state=k,
-                        stop=done,
-                        workers=workers[2],
+                        "get_queries", self_future, stop=done, workers=workers[2],
                     )
                 else:
                     f_search = client.submit(lambda x: ([], []), 0)

diff --git a/salmon/frontend/private.py b/salmon/frontend/private.py
@@ -485,8 +485,7 @@ def _fmt_embedding(
 
 @app.get("/embeddings", tags=["private"])
 async def get_embeddings(
-    authorized: bool = Depends(_authorize),
-    alg: Optional[str] = None,
+    authorized: bool = Depends(_authorize), alg: Optional[str] = None,
 ):
     """
     Get the embeddings for algorithms.

diff --git a/salmon/frontend/utils.py b/salmon/frontend/utils.py
@@ -20,7 +20,6 @@ def __init__(self, msg):
         raise HTTPException(status_code=500, detail=msg)
 
 
-
 def _extract_zipfile(raw_zipfile, directory="targets"):
     p = Path(__file__).absolute().parent  # directory to this file
     imgs = p / "static" / directory