Create benchmark data sets #11

Closed
ejsegall opened this issue Jul 26, 2016 · 5 comments

Comments

@ejsegall

ejsegall commented Jul 26, 2016

Go through the data we have
Select a deliberately diverse range of formulations based on the actual data, e.g. a small number of positives in a lot of data, a lot of positives, a smaller number of genes, a larger number of genes, different gene expression distributions, etc. @dhimmel: please review description
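
One way to picture the range of formulations described above is a small grid of specifications that varies class balance, gene count, and expression distribution. The following is only a minimal sketch: `BenchmarkSpec`, `SPECS`, and `make_benchmark` are hypothetical names, the sizes and rates are placeholders, and the values are synthetic random draws standing in for subsets of the actual data, which this issue does not specify.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkSpec:
    """Hypothetical description of one benchmark formulation."""
    name: str
    n_samples: int
    n_genes: int
    positive_rate: float  # fraction of samples labelled positive
    distribution: str     # "gaussian" or "lognormal" expression values


# Illustrative grid only; sizes and rates are placeholders, not project decisions.
SPECS = [
    BenchmarkSpec("rare_positives", 500, 50, 0.02, "gaussian"),
    BenchmarkSpec("common_positives", 500, 50, 0.50, "gaussian"),
    BenchmarkSpec("few_genes", 200, 5, 0.20, "lognormal"),
    BenchmarkSpec("many_genes", 200, 200, 0.20, "lognormal"),
]


def make_benchmark(spec, seed=0):
    """Return (X, y): synthetic expression rows and 0/1 labels for one spec."""
    rng = random.Random(seed)
    # Fix the number of positives exactly so the class balance matches the spec.
    n_pos = max(1, round(spec.n_samples * spec.positive_rate))
    y = [1] * n_pos + [0] * (spec.n_samples - n_pos)
    rng.shuffle(y)
    # Synthetic stand-in for real expression values (assumption, not real data).
    draw = rng.gauss if spec.distribution == "gaussian" else rng.lognormvariate
    X = [[draw(0.0, 1.0) for _ in range(spec.n_genes)]
         for _ in range(spec.n_samples)]
    return X, y
```

In practice each spec would more likely select rows and columns from the real expression and mutation matrices rather than draw random values; the fixed seed is there so that a benchmark set can be regenerated identically.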

@rdvelazquez
Member

What's the status of this? I've looked for a benchmark data set before but haven't found one.

@dhimmel or @ejsegall: Let me know if this is something I should take a crack at. If so, we could discuss some specifics: which data sets/queries to include, and whether to implement this as a notebook in the explore repo, as a feature for cognoml (which may help with cognoma/cognoml#4 (comment)), or by just saving file(s) in the format outlined for the MVP in #31 (comment).

@dhimmel
Member

dhimmel commented May 26, 2017

@rdvelazquez let's focus on #94 as a priority. Then we can reassess our needs regarding this issue, especially after #93 is merged.

@athril

athril commented May 30, 2017

You may consider using PMLB as a benchmark instead of creating a new one: https://github.com/EpistasisLab/penn-ml-benchmarks

@rdvelazquez
Member

Thanks for the heads up about PMLB, @athril. It is quite a long list of benchmark data sets, but the benchmark data set for testing cognoma is fairly specific, and I didn't see anything on PMLB that would meet our needs.

@rdvelazquez
Member

@brankaj provided a very nice list of applicable genes in #52. I also expanded this by subsetting queries by disease in #113. I think we can close this issue for now.


4 participants