
2020: Diego & Kunal


Students

  • Diego -- active learning user interface for labeling & validating candidate calls
  • Kunal -- machine learning models for orca call classification

Mentor team

  • Valentina Staneva (University of Washington, eScience Institute) -- machine learning & data visualization
  • Jesse Lopez (Axiom Data Science) -- computational data science & machine learning
  • Val Veirs (Beam Reach) -- Orcasound Lab hydrophone host, machine learning & noise analysis
  • Scott Veirs (Beam Reach) -- Orcasound coordinator, marine bioacoustics

For more info, see the Orcasound Hacker Hall of Fame.

Advisors

  • Abhishek Singh (GSoC 2019 alum at ESIP; final-year Computer Science & Engineering student at NIT Durgapur, India)
  • Dan Olsen (North Gulf Oceanic Society) -- killer whale bioacoustics
  • Hannah Meyers (University of Alaska) -- marine biology
  • Paul Cretu (Freelance software dev) -- lead Orcasound/orcasite dev for v1 UI
  • Shima Abadi (University of Washington, Mechanical Engineering) -- acoustical oceanography & machine learning

Handy links

Meeting procedures

  1. Report progress on goals from last week
  2. Discuss any blocking issues or strategic decisions (e.g. upcoming scheduled events, code reviews, etc.)
  3. Set new goals for next week

Meeting synopses

7/3/20 Friday GSoC call (10-11 Pacific; Kunal, Diego, Jesse, Abhishek, Val, Scott, Hannah)

Kunal update

  • Working on the model and scripts
  • Confirmed Jesse's question: a basic model is ready, along with scripts to run it from the command line
  • Scott asked whether the first unlabeled data have been chosen. Kunal will work with Scott to prioritize an event from Scott's spreadsheet of labeling candidates; we decided an S3 bucket like the Acoustic Sandbox would be a good place to store the unlabeled and labeled audio data, at least initially (see the sketch after this list)
  • Jesse asked Kunal to polish up the scripts, then share them for feedback.
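
Uploading to such a bucket could be as simple as the boto3 sketch below; the bucket name, key layout, and local paths are illustrative assumptions, not the agreed-upon organization:

```python
import boto3  # assumes AWS credentials are already configured locally

# Illustrative bucket/prefix names only; the real Acoustic Sandbox
# bucket and key layout would be decided with Scott.
s3 = boto3.client("s3")
s3.upload_file(
    "labeled/clip_001.wav",           # local audio file
    "acoustic-sandbox",               # hypothetical bucket name
    "gsoc2020/labeled/clip_001.wav",  # object key within the bucket
)
```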

Diego update

  • Implemented a performance metric as a function of the active learning training period (showed a new plot, but hasn't pushed it to GitHub yet)
  • Published blog post
  • Gave current UI tour to Hannah+
    • Hannah mentioned that the main classification task for Alaska is orca vs. not
    • Subsequent classifications that would be useful include orca vs. humpback vs. boat, and then within orca: resident vs. general transient vs. AT1 (unique-sounding) calls

General notes & discussion

  • Thanks to all for completing the first GSoC evaluations on time (due today at 11 Pacific)
  • Oliver of Meridian plans to join next Friday

6/26/20 Friday GSoC call (10-11:15 Pacific)

Kunal's update

  • Working on Valentina's guidance:
    • ROC (0.83, 0.2)
    • Precision-recall plot
  • Jesse: prepare to automate the active learning
    • Use argparse, or newer libraries like Click (more efficient than argparse) or Typer (requires newer Python versions), to create a command-line interface; see the sketch after this list
  • Blocking:
    • Error/exception when running with more than 1 batch in TensorBoard
    • Abhishek will help troubleshoot via DM
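
A minimal sketch of such a CLI using Click, assuming a saved Keras model and preprocessed spectrograms stored as .npy files; the command name, flags, and paths are hypothetical, not Kunal's actual script:

```python
from pathlib import Path

import click
import numpy as np
from tensorflow.keras.models import load_model


@click.command()
@click.option("--model-path", type=click.Path(exists=True), required=True,
              help="Path to the saved Keras model (e.g. an HDF5 file).")
@click.option("--spec-dir", type=click.Path(exists=True), required=True,
              help="Directory of preprocessed spectrograms saved as .npy files.")
@click.option("--threshold", default=0.5, show_default=True,
              help="Score at or above which a clip is labeled as a call.")
def classify(model_path, spec_dir, threshold):
    """Run the trained classifier over preprocessed clips and print scores."""
    model = load_model(model_path)
    for npy in sorted(Path(spec_dir).glob("*.npy")):
        spec = np.load(npy)[np.newaxis, ...]  # add a batch dimension
        score = float(model.predict(spec, verbose=0)[0][0])
        label = "call" if score >= threshold else "no call"
        click.echo(f"{npy.name}\t{score:.3f}\t{label}")


if __name__ == "__main__":
    classify()
```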

Diego's update

  • Documenting API (orcagsoc/tree/feature/statistics/api)
  • Added a date field in order to plot the number of sounds validated
    • Idea: track the speed of each labeler (a future feature when/if gamification is used to motivate citizen scientists?)
    • Idea: track the evolution of model performance along with the number of sounds validated (possibly on the same time-series graph?)

General discussion

  • Kunal: looking ahead, after a few rounds of active learning, could we use a much larger non-validated set of predictions to train in subsequent rounds?
  • Jesse: that is done in practice, but it is preferable to validate at least some of the predictions
  • Valentina: Ming used an algorithm to get predictions of beluga signals, then fed the training data to a deep learning model; other examples of good practice may be found in the click detection literature.
  • Valentina: plot idea for machine learning scientists: the spread or distribution of prediction probabilities or scores across samples (e.g. lots of 0s and 1s with nothing in the middle) to show over-fitting vs. confidence; see the sketch after this list
  • Ideas for other (domain expert or data owner) user options:
    • Scott: Maybe specify what portion of your data set you want to validate during each active learning iteration?
      • Jesse: Maybe good to indicate confidence for each prediction, or whether a threshold is met or not
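
A minimal sketch of Valentina's plot idea; the Beta-distributed placeholder scores are an assumption standing in for real per-sample model probabilities:

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder scores: a Beta(0.3, 0.3) draw mimics the "lots of 0s and 1s"
# pattern; in practice these would be the model's per-sample probabilities.
scores = np.random.default_rng(0).beta(0.3, 0.3, size=1000)

plt.hist(scores, bins=50, range=(0, 1))
plt.xlabel("Predicted probability of a call")
plt.ylabel("Number of samples")
plt.title("Distribution of prediction scores")
plt.show()
```

A U-shaped histogram (mass near 0 and 1) indicates confident or over-fit predictions, while mass near 0.5 flags uncertain samples that are good candidates for validation.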

Scott's updates & questions:

  • When/if to utilize Dan's and Hannah's data this summer, and how to iteratively improve Abhishek's model and training/test data?
  • Do either of you need more feedback from me, e.g. user feature specification and priority in the Trello board?
  • How did the Pod.Cast team choose the format of the TSV files and the organization of the tarballs?
  • How different are the Pod.Cast label format and metadata from other training data sets in bioacoustics generally?
  • DFO meeting #1 synopsis
    • Oliver would like to join call on 2nd Fri in July
    • DFO wants differentiation between ecotypes (SRKWs and Bigg’s)
  • Are there any/many Bigg's signals in the OrcaCNN data set?
    • Abhishek: maybe, but if so very few
    • Jesse/Abhishek: general stats/format of OrcaCNN data/labels?
      • About 2000 KW labels (Abhishek generated samples; Dan provided small test set)
      • Humpback train/test data from Monterey Bay (GPL-like usage, so not fully open)

6/19/20 Friday GSoC call (10-11:30 Pacific)

Diego report

  • Pytest implemented; tabling click tests for later (via Praful)
  • Enabled extension on the backend
  • Tested on the Edge browser (fixed a bug); now works on Firefox & Chrome
  • Added a code snippet to handle the expertise tag
  • Deployed the backend on Heroku and the front end to GitHub Pages
  • GitHub admin needs to publish
  • Using Postman and pgAdmin

Diego goals

  • Valentina: add documentation showing the Heroku setup and how to deploy the Flask app on Linux and in Docker
  • Jesse: document the API, including endpoints
  • Charting libraries (8 default charts; will work with dummy data first)
    • Valentina: look at TensorBoard charts (are ML measures useful for expert users, i.e. scientists like Hannah, or could they be simplified for a general audience?)
    • Grids to analyze/verify confusion matrix results (e.g. true vs. false positives)
    • Valentina: plot model performance over time (choose one score to track during internal validation, e.g. for each epoch); see the sketch after this list
  • Table JavaScript testing for 2 weeks. Scott's suggestion: ask Praful for the Thursday hack group invite & timing (to jumpstart JS testing next week and/or the following week)
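
A minimal sketch of that per-epoch plot, assuming Keras training via model.fit and that validation accuracy is among the tracked metrics (the metric name is an assumption):

```python
import matplotlib.pyplot as plt


def plot_score_per_epoch(history, metric="val_accuracy"):
    """Plot one validation score per epoch from a Keras History object."""
    scores = history.history[metric]  # assumes this metric was tracked
    epochs = range(1, len(scores) + 1)
    plt.plot(epochs, scores, marker="o")
    plt.xlabel("Epoch")
    plt.ylabel(metric)
    plt.title("Model performance during internal validation")
    plt.show()


# Usage (after training): plot_score_per_epoch(history, "val_accuracy")
```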

Kunal report

  • Working on documentation
  • Trying ResNet, VGG16, and Inception models
  • Has not used WHOI data, only Pod.Cast rounds 2 & 3
  • Discussed pre-training on WHOI data vs. other labeled orca data

Scott: in anticipation of experiments with different combinations of orca training data, add to the orcadata wiki the sizes of related data sets (with links to them)?

  • OrcaCNN (Alaskan residents)
  • OrcaSPOT (NRKWs with data from Orchive)
  • OBI Lime Kiln data (SRKWs)

Kunal goals

  • Valentina: start looking into and documenting formats for importing/exporting models and performance comparisons (open-source formats? HDF5?)
  • Valentina: plot model performance over time (choose one score to track during internal validation, e.g. for each epoch)
  • Jesse: create a callback for checkpoints, but also an accuracy threshold (stop training if accuracy > 0.95); see the sketch after this list
  • Valentina: do a little more tuning, but the main reason for over-fitting is that we need more data…
  • Jesse: ~70% accuracy is a good place to start; then try to improve through the active learning process
  • Valentina: do you have more negatives that you haven’t used for training? If so, does the model suggest that some are “interesting” -- possibly ones that are near your decision boundary?
  • Kunal: all Orcasound negatives have been used in training, but Ketos background sounds (from NRKWs) might be a possibility
  • Scott: let me know if Google Cloud services would help with Colab logistics. This week Beam Reach (my social purpose corp) was granted k credits that need to be used within the next year...
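
A hedged sketch of Jesse's checkpoint/threshold suggestion in Keras, saving checkpoints in HDF5 per Valentina's format question; it assumes validation accuracy is tracked, and the file and metric names are illustrative:

```python
from tensorflow.keras.callbacks import Callback, ModelCheckpoint


class StopAtAccuracy(Callback):
    """Stop training once the monitored accuracy exceeds a threshold."""

    def __init__(self, threshold=0.95, metric="val_accuracy"):
        super().__init__()
        self.threshold = threshold
        self.metric = metric

    def on_epoch_end(self, epoch, logs=None):
        score = (logs or {}).get(self.metric)
        if score is not None and score > self.threshold:
            self.model.stop_training = True


callbacks = [
    # Save the best weights so far in HDF5 format after each epoch.
    ModelCheckpoint("model_best.h5", monitor="val_accuracy", save_best_only=True),
    StopAtAccuracy(threshold=0.95),
]
# model.fit(x_train, y_train, validation_data=..., callbacks=callbacks)
```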

6/12/20 Friday GSoC call (10-11:15 Pacific)

Mentor thoughts on process for weekly Friday meetings?

  • Jesse: report on progress and blocking issues
  • Scott: include goals for next week
  • Valentina: also schedule (code) review events

Kunal report:

Scott chat links:

How to visualize the model performance?

  • ROC curves
  • Confusion matrices
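
Both can be produced with scikit-learn; a minimal sketch with placeholder labels and scores standing in for real validation output:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import RocCurveDisplay, confusion_matrix

# Placeholder labels/scores standing in for real validation output.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(0.3 * y_true + 0.7 * rng.random(200), 0, 1)

# ROC curve directly from the scores.
RocCurveDisplay.from_predictions(y_true, y_score)
plt.show()

# The confusion matrix requires hard labels, so threshold the scores first.
y_pred = (y_score >= 0.5).astype(int)
print(confusion_matrix(y_true, y_pred))
```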

Diego Q for Kunal: what is the difference in performance if mp3 is used instead of WAV?

Scott's thought: two experiment ideas to seek an answer --

  1. Stream both HLS and FLAC when SRKWs are next calling, and
  2. Go back to the WAV files used in training (e.g. Pod.Cast rounds), convert the WAV samples to mp3, then re-run the model (see the sketch after this list)... Ask Val for ideas, too...
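
One way experiment 2 could start, assuming pydub (which wraps ffmpeg) and illustrative directory names:

```python
from pathlib import Path

from pydub import AudioSegment  # requires ffmpeg on the system path

# Convert each Pod.Cast WAV sample to mp3 so the model can be retrained
# on lossy audio; directory names are illustrative.
wav_dir = Path("podcast_round2_wav")
mp3_dir = Path("podcast_round2_mp3")
mp3_dir.mkdir(exist_ok=True)

for wav in sorted(wav_dir.glob("*.wav")):
    clip = AudioSegment.from_wav(wav)
    clip.export(mp3_dir / (wav.stem + ".mp3"), format="mp3", bitrate="192k")
```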

Kunal goals:

  • VGG may be best, but also trying ResNet and a 2D-convolution model
  • Jesse: look at how to make the code reusable (e.g. Orca); Kunal will convert from Colab notebooks to Python scripts…
  • Valentina: Add markdown cells to document code (including organizing packages), and even images
  • Abhishek: For next notebook, add subsections in notebook and a top-level README

Diego report:

  • Added Bigg’s KWs to classification UI
  • Added option to indicate experience level of labeler
  • Table of labels, including mp3 filename, label, and user experience level

Diego goals:

  • Test the GUI with Kunal’s processed data (e.g. put it in an S3 bucket)
  • Jesse: include tests (for Flask you can use libraries to mock a POST and ensure something is returned) and embed them in continuous integration; see the sketch after this list
  • Abhishek: did you have a chance to look at the JS library?
  • Valentina: look into each cloud environment’s app service…
  • Diego: Heroku is easier (GitHub integration vs. ssh into an Ubuntu instance), but more expensive
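
A minimal sketch of Jesse's suggestion using Flask's built-in test client with pytest; the app import, /labels endpoint, and payload shape are assumptions, not the actual orcagsoc API:

```python
import pytest

# Assumes the Flask app object is importable from app.py; the endpoint
# and payload below are illustrative, not the actual orcagsoc API.
from app import app


@pytest.fixture
def client():
    app.config["TESTING"] = True
    with app.test_client() as client:
        yield client


def test_post_label_returns_ok(client):
    payload = {"filename": "clip_001.mp3", "label": "orca", "expertise": "expert"}
    response = client.post("/labels", json=payload)
    assert response.status_code == 200
    assert response.get_json() is not None
```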

6/5/20 Friday GSoC call (#1)

Diego updates:

  • UI branch with J, K, L and non-orca categories (bird, ship…)
  • SV: send goals for the “expert user (SRKW, orca)” to Diego
  • Error testing Pod.Cast
    • Goal: will compare with Valentina
    • SV: share Akash/Prakruti emails?

Kunal updates:

  • Will share notebook with the Ketos error
  • Goal: figure out why the error occurs in Ketos (share with Jesse to document for Fabio/Oliver)

Abhishek: keep documenting in the orcagsoc repo README!

Val: experimenting with edge computing (with Fabio!)

5/20/20 Weekly Wednesday meet-up

Kunal, Val, and Scott discussed Kunal's initial call-modeling efforts (training with the Pod.Cast round 3 set) and Val's latest pre-processing approaches.

Scott's list of insights from the discussion: new open-source bioacoustic labeling tools should provide guidance about the decisions made by domain experts (e.g. when validating predictions in a tool like Pod.Cast) and move towards standardization of annotation metadata:

  1. Time bounds (fixed duration or variable procedure, start-bound time vs. signal start time, how much background noise to include before/after...) and resolution
  2. Frequency bounds and resolution
  3. Whether to exclude calls that overlap with clicks, whistles, or snaps
  4. What signal-to-noise ratio is sufficient to qualify as a call vs. a faint call vs. a possible call
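
One hypothetical way those decisions could be captured as standardized annotation metadata, sketched as a Python dataclass (all field names are illustrative):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallAnnotation:
    """One possible schema capturing the decisions listed above (illustrative)."""

    start_s: float                        # start bound in seconds (bound vs. signal onset!)
    end_s: float                          # end bound in seconds
    low_hz: Optional[float] = None        # lower frequency bound, if annotated
    high_hz: Optional[float] = None       # upper frequency bound, if annotated
    label: str = "call"                   # e.g. "call", "faint call", "possible call"
    excludes_clicks: bool = False         # whether overlapping clicks/whistles/snaps were excluded
    annotator_expertise: str = "citizen"  # e.g. "citizen" vs. "expert"
```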

Kunal ended with a good question about what to do next to improve his model performance...