
Intern tasks

Sehyun Oh edited this page Apr 23, 2024 · 5 revisions

Browse the ClickHouse database with the Tabix web client at http://dash.tabix.io/, using these connection settings:

URL: http://34.69.72.244:8123
User: reader
Password: reader

Example:

SELECT * 
FROM gene_families 
WHERE feature='UniRef90_A0A174Y6G0';

The gene_families table is loaded: 11 billion rows, 136 GB compressed.
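The example query above can also be sent from a script. The following is a minimal Python sketch, not a tested client for this server: it assumes the server exposes the standard ClickHouse HTTP interface on the URL listed above and accepts the `X-ClickHouse-User` / `X-ClickHouse-Key` headers. The helper names (`feature_query`, `build_request`, `parse_tsv`, `run_query`) and the `LIMIT` clause are illustrative additions, not part of the original example.

```python
import csv
import io
import urllib.request

# Connection details copied from this page.
CLICKHOUSE_URL = "http://34.69.72.244:8123"
USER = "reader"
PASSWORD = "reader"


def feature_query(feature, limit=10):
    """Build the example query for one feature (hypothetical helper).

    The LIMIT is an addition for safety: the table holds billions of rows.
    """
    # Escape backslashes and single quotes for a ClickHouse string literal.
    safe = feature.replace("\\", "\\\\").replace("'", "\\'")
    return f"SELECT * FROM gene_families WHERE feature = '{safe}' LIMIT {int(limit)}"


def build_request(sql):
    """Wrap `sql` in a POST request for the ClickHouse HTTP interface."""
    # Ask for tab-separated output with a header row.
    body = sql.rstrip().rstrip(";") + " FORMAT TabSeparatedWithNames"
    return urllib.request.Request(
        CLICKHOUSE_URL + "/",
        data=body.encode("utf-8"),
        headers={"X-ClickHouse-User": USER, "X-ClickHouse-Key": PASSWORD},
    )


def parse_tsv(text):
    """Turn a TabSeparatedWithNames response into a list of dicts (all values are strings)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def run_query(sql, timeout=60):
    """Send `sql` to the server and return the parsed rows."""
    with urllib.request.urlopen(build_request(sql), timeout=timeout) as response:
        return parse_tsv(response.read().decode("utf-8"))


# Usage (requires network access to the server):
# rows = run_query(feature_query("UniRef90_A0A174Y6G0"))
# columns = run_query("DESCRIBE TABLE gene_families")
```

Running `DESCRIBE TABLE gene_families` first shows the available columns, which is one way to find the column to filter on when restricting a query to particular samples.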

Tasks:

  1. Document an example like the one above using the website. Include filtering by sample, and show which data are available and what the query returns.
  2. Repeat task 1 using the R and Python clients.
  3. Repeat tasks 1-2 using the BigQuery database (https://console.cloud.google.com/bigquery?project=omicidx-338300&ws=!1m5!1m4!4m3!1somicidx-338300!2sbiodatalake!3scmgd).
  4. Check that the data match those from the curatedMetagenomicData Bioconductor package (on a small subset of samples, not all of them).
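For the consistency check in task 4, one possible approach is to reduce both sources to a mapping from (sample, feature) to abundance and compare them. The sketch below assumes that shape; the function name `compare_abundances`, the key layout, and the tolerance are hypothetical choices, and exporting the values from the database and from curatedMetagenomicData is left out.

```python
import math


def compare_abundances(db_values, reference_values, rel_tol=1e-6):
    """Compare two {(sample, feature): abundance} mappings (hypothetical layout).

    Returns the keys missing on either side and the keys whose values differ
    by more than the relative tolerance.
    """
    db_keys = set(db_values)
    ref_keys = set(reference_values)
    mismatched = sorted(
        key
        for key in db_keys & ref_keys
        if not math.isclose(
            db_values[key], reference_values[key], rel_tol=rel_tol, abs_tol=1e-12
        )
    )
    return {
        "missing_in_db": sorted(ref_keys - db_keys),
        "missing_in_reference": sorted(db_keys - ref_keys),
        "mismatched": mismatched,
    }
```

An empty list under every key would indicate that the two subsets agree within the chosen tolerance.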