Improvements around realistic crowdsourcing simulations #110

Merged: 68 commits merged on May 21, 2021


stsievert (Owner) commented Apr 7, 2021

What does this PR implement?
In salmon-experiments, I plan to run a set of experiments that use realistic response timing in simulations (made possible by the data I have collected). This PR contains the improvements those experiments need. The changes include:

  • MAINT: rework the examples.
  • MAINT: remove duplicate queries (queries [0, 4, 5] and [0, 5, 4] are duplicates because the last two objects are shown in random order).
  • MAINT: clean up the dashboard:
    • only show endpoint timings for endpoints with more than one hit
    • reorganize the dashboard
    • reduce the size of the dashboard (instead of plotting every model update, plot the median over 2-minute windows)
  • ENH: show histograms of the response rate and the response gap on the dashboard.
  • MAINT: don't run RR or RandomSampling; instead, set their stopped variable in the DB right away.
  • MAINT: launch as many workers as there are cores (each with n_threads=1).
  • DOC: clean up the docs for Sampler and SOE.
  • API: set the default mu in CKL to 0.05 (following the original NEXT paper).
  • MAINT: run isort and black.
  • API: change the default noise_model of salmon.triplets.offline.OfflineEmbedding to CKL.
  • (private) API: allow setting n_search in AdaptiveRunner.
  • MAINT: sort out the dependencies (remove numba, etc.).
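The duplicate-query rule can be sketched as follows. This is a minimal illustration, assuming a query is a [head, left, right] list whose last two entries are shown in random order; `canonical` and `remove_duplicates` are hypothetical helper names, not Salmon's actual code.

```python
from typing import Iterable, List, Sequence, Tuple


def canonical(query: Sequence[int]) -> Tuple[int, int, int]:
    """Canonical key for a triplet query [head, left, right].

    The two comparison objects are shown in random order, so [h, a, b]
    and [h, b, a] are the same query; sorting the last two entries maps
    both to one key. (Hypothetical helper, for illustration only.)
    """
    head, a, b = query
    return (head, min(a, b), max(a, b))


def remove_duplicates(queries: Iterable[Sequence[int]]) -> List[Tuple[int, int, int]]:
    """Drop duplicate queries, keeping the first occurrence of each key."""
    seen = set()
    unique = []
    for q in queries:
        key = canonical(q)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

For example, `remove_duplicates([[0, 4, 5], [0, 5, 4], [1, 4, 5]])` returns `[(0, 4, 5), (1, 4, 5)]`: the second query collapses onto the first.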
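Plotting the median per 2-minute window instead of every model update could look like the minimal sketch below. It assumes each sample is a (timestamp in seconds, value) pair; `median_per_window` is a hypothetical name, and the dashboard's real code may bucket differently.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def median_per_window(
    samples: Iterable[Tuple[float, float]], window_s: float = 120.0
) -> Dict[float, float]:
    """Median of (timestamp_s, value) samples in fixed, non-overlapping windows.

    Each window starts at a multiple of window_s seconds (120 s = 2 min);
    the result maps window start -> median value, in time order.
    Windows with no samples are simply absent.
    """
    buckets: Dict[float, List[float]] = defaultdict(list)
    for t, value in samples:
        start = (t // window_s) * window_s  # left edge of this sample's window
        buckets[start].append(value)
    return {start: median(vals) for start, vals in sorted(buckets.items())}
```

With pandas, `df.resample("2min").median()` on a DataFrame with a DatetimeIndex gives the same reduction (empty windows appear as NaN instead of being dropped).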
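One way to compute the data behind the response-rate and response-gap histograms is sketched below. It assumes the "gap" is the time between consecutive responses and the "rate" is responses per second within fixed windows; both definitions, and the helper names, are assumptions for illustration, and the dashboard may define them differently.

```python
from collections import Counter
from typing import Dict, List, Sequence


def response_gaps(timestamps: Sequence[float]) -> List[float]:
    """Seconds between consecutive responses (timestamps in seconds)."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def response_rates(timestamps: Sequence[float], window_s: float = 60.0) -> List[float]:
    """Responses per second in each non-empty window of length window_s."""
    counts = Counter((t // window_s) * window_s for t in timestamps)
    return [c / window_s for _, c in sorted(counts.items())]


def histogram(values: Sequence[float], bin_width: float) -> Dict[float, int]:
    """Count values into bins [k*w, (k+1)*w); keys are the bins' left edges."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))
```

For responses at 0, 1, 3, 3.5 and 6 s, `response_gaps` gives `[1, 2, 0.5, 2.5]`, and `histogram` of those gaps with 1 s bins gives `{0.0: 1, 1.0: 1, 2.0: 2}`.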
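The worker-sizing rule (one single-threaded worker per core) is sketched below. `WorkerSpec` and `plan_workers` are hypothetical and only illustrate the worker count that the TODO below asks to verify; they are not Salmon's launch code.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkerSpec:
    """Hypothetical config record for one worker (not Salmon's API)."""
    name: str
    n_threads: int = 1


def plan_workers(n_cores: Optional[int] = None) -> List[WorkerSpec]:
    """Plan one single-threaded worker per core."""
    if n_cores is None:
        # os.cpu_count() can return None when the count is undetermined.
        n_cores = os.cpu_count() or 1
    return [WorkerSpec(name=f"worker-{i}") for i in range(n_cores)]
```

If the workers are Dask workers (an assumption here), `dask.distributed.LocalCluster(n_workers=n, threads_per_worker=1)` expresses the same configuration.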
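For reference, here is a sketch of the crowd kernel learning (CKL) response probability in its standard form from Tamuz et al. (2011), showing where mu enters. The argument names `head`, `winner` and `loser` are illustrative, and Salmon's own implementation may parameterize the model differently.

```python
import math
from typing import Sequence


def ckl_probability(
    head: Sequence[float],
    winner: Sequence[float],
    loser: Sequence[float],
    mu: float = 0.05,
) -> float:
    """P(head is judged closer to `winner` than to `loser`) under CKL.

        p = (mu + ||h - l||^2) / (2*mu + ||h - w||^2 + ||h - l||^2)

    Larger mu pulls every probability toward 1/2 (noisier responses).
    """
    d_hw = math.dist(head, winner) ** 2  # squared distance head -> winner
    d_hl = math.dist(head, loser) ** 2   # squared distance head -> loser
    return (mu + d_hl) / (2 * mu + d_hw + d_hl)
```

For example, `ckl_probability([0, 0], [0, 0], [1, 0])` is 1.05 / 1.1 ≈ 0.955 with mu = 0.05, while any query whose two distances are equal gives 1/2.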

TODO:

  • check that n_cores workers are launched.

Reference issues/PRs

stsievert (Owner, Author) commented Apr 11, 2021

In total, the CI run time has dropped from 22–23 min to 15–16 min, largely thanks to these basic speed improvements:

  • Setting stopped-random right away in e32947c: tests in salmon/tests: 7.35 min → 4.92 min.
  • Basic cleanup of the deps in salmon.yml in 6bab90f:
    • installing deps: 5.53 min → 3.53 min.
    • building the Salmon server w/ Docker: 8.6 min → 6.38 min.
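A quick check of the timings quoted above: the three listed steps together save about 6.65 min, in line with the roughly 7 min drop in total CI time. The numbers are copied from the comment; only the arithmetic is new, and `savings` is a hypothetical helper.

```python
# Before/after durations in minutes, copied from the list above.
TIMINGS = {
    "tests in salmon/tests (e32947c)": (7.35, 4.92),
    "installing deps (6bab90f)": (5.53, 3.53),
    "building the Salmon server w/ Docker (6bab90f)": (8.6, 6.38),
}


def savings(timings):
    """Map each step to (minutes saved, fraction of the original time saved)."""
    return {
        step: (before - after, (before - after) / before)
        for step, (before, after) in timings.items()
    }


if __name__ == "__main__":
    result = savings(TIMINGS)
    for step, (saved, frac) in result.items():
        print(f"{step}: {saved:.2f} min saved ({frac:.0%})")
    print(f"total: {sum(s for s, _ in result.values()):.2f} min saved")
```

This reports savings of 2.43 min (33%), 2.00 min (36%) and 2.22 min (26%), for a total of 6.65 min.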

stsievert (Owner, Author) commented

This PR is almost ready to merge. I've updated the docs to describe some experiments from salmon-experiments. In short, they show a clear benefit from Salmon's active sampling:

[Figure: N-accuracy plot]

Some jobs are still running; all the active runs should eventually reach 20,000 responses.

stsievert (Owner, Author) commented

Salmon is working on my own machine; I think the tests have a bug (sigh).

stsievert merged commit f0eb9e1 into master on May 21, 2021.
stsievert deleted the examples-cleanup branch on May 21, 2021 at 19:08.