Create benchmark data sets #11

ejsegall · 2016-07-26T23:30:51Z

Go through the data we have and select a range of benchmark formulations. The formulations should be based on the actual data but designed to be diverse, e.g. a small number of positives in a lot of data, a lot of positives, a smaller number of genes, a larger number of genes, different gene expression distributions, etc. @dhimmel: please review this description.
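To make the idea concrete, here is a minimal sketch of how such formulations might be enumerated. It is only an illustration, not an agreed design: the function names (`make_benchmark`, `benchmark_grid`), the label format (a dict mapping sample ID to 0/1 status), and the toy data are all assumptions. It varies only the positive fraction and the number of genes, and does not cover different gene expression distributions.

```python
import random
from itertools import product


def make_benchmark(labels, genes, n_samples, positive_fraction, n_genes, seed=0):
    """Pick sample IDs and gene IDs for one benchmark formulation.

    labels: dict mapping sample ID -> 0/1 status (hypothetical format).
    genes: list of gene IDs available as features (hypothetical format).
    """
    rng = random.Random(seed)
    positives = sorted(s for s, y in labels.items() if y == 1)
    negatives = sorted(s for s, y in labels.items() if y == 0)

    # Number of positive and negative samples needed for the target balance.
    n_pos = round(n_samples * positive_fraction)
    n_neg = n_samples - n_pos
    if n_pos > len(positives) or n_neg > len(negatives) or n_genes > len(genes):
        raise ValueError("not enough data for this formulation")

    samples = rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)
    return {
        "positive_fraction": positive_fraction,
        "samples": sorted(samples),
        "genes": sorted(rng.sample(genes, n_genes)),
    }


def benchmark_grid(labels, genes, n_samples, positive_fractions, gene_counts, seed=0):
    """Build one formulation per (positive fraction, gene count) combination."""
    return [
        make_benchmark(labels, genes, n_samples, frac, k, seed=seed + i)
        for i, (frac, k) in enumerate(product(positive_fractions, gene_counts))
    ]


# Toy stand-in for real data: 200 samples (40 positive) and 100 genes.
toy_labels = {f"sample_{i}": int(i < 40) for i in range(200)}
toy_genes = [f"gene_{j}" for j in range(100)]

grid = benchmark_grid(
    toy_labels,
    toy_genes,
    n_samples=50,
    positive_fractions=[0.02, 0.5],  # few positives vs. many positives
    gene_counts=[10, 100],           # smaller vs. larger number of genes
)
```

Each entry in `grid` lists the sample and gene IDs for one formulation, so the same selections could be applied to whatever expression matrix and labels are actually used.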

rdvelazquez · 2017-05-10T01:05:07Z

What's the status of this? I've looked for a benchmark data set before but haven't found one.

@dhimmel or @ejsegall: let me know if this is something I should take a crack at. If so, we could discuss some specifics: which data sets/queries to include, and whether to implement this as a notebook in the explore repo, as a feature for cognoml (which may help with cognoma/cognoml#4 (comment)), or just as saved file(s) in the format outlined for the MVP in #31 (comment).

dhimmel · 2017-05-26T18:01:27Z

@rdvelazquez let's focus on #94 as a priority. Then we can reassess our needs regarding this issue, especially after #93 is merged.

athril · 2017-05-30T23:43:31Z

You may consider using PMLB as a benchmark instead of creating a new one: https://github.com/EpistasisLab/penn-ml-benchmarks

rdvelazquez · 2017-09-26T22:24:22Z

Thanks for the heads up about PMLB, @athril. It is quite a long list of benchmark datasets, but the benchmark dataset needed for testing cognoma is fairly specific, and I didn't see anything on PMLB that would meet our needs.

rdvelazquez · 2017-09-26T22:27:06Z

@brankaj provided a very nice list of applicable genes in #52. I also expanded this by subsetting queries by disease in #113. I think we can close this issue for now.

ejsegall mentioned this issue Jul 26, 2016

Characterize predictive performance of each ML algorithm in our toolbox vs various data sets for a representative set of queries #12

Closed

rdvelazquez mentioned this issue Jul 11, 2017

Selecting the number of components returned by PCA #106

Closed

rdvelazquez closed this as completed Sep 26, 2017
