Create benchmark data sets #11
What's the status of this? I've looked for a benchmark data set before but haven't found one. @dhimmel or @ejsegall: let me know if this is something I should take a crack at. If so, we could discuss some specifics: which data sets/queries to include, and whether to implement it as a notebook in the explore repo, as a feature for cognoml (which may help with cognoma/cognoml#4 (comment)), or just save file(s) in the format outlined for the MVP in #31 (comment).
@rdvelazquez let's focus on #94 as a priority. Then we can reassess our needs regarding this issue, especially after #93 is merged.
You may consider using PMLB as a benchmark instead of creating a new one: https://github.com/EpistasisLab/penn-ml-benchmarks
Thanks for the heads up about PMLB @athril. PMLB is quite a long list of benchmark data sets, but the benchmark data set for testing cognoma is fairly specific, and I didn't see anything there that would meet our needs.
Go through the data we have
Select a diverse range of possible formulations based on the actual data, for example: a small number of positives in a lot of data, a lot of positives, a smaller number of genes, a larger number of genes, different gene expression distributions, etc. @dhimmel: please review description
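As a rough illustration of the selection step described above, here is a minimal Python sketch of how formulations with different class balances and gene counts might be drawn from existing data. Everything named here is an assumption for illustration only: `make_formulation`, the toy labels, and the `gene_…` identifiers are hypothetical and are not part of cognoml or the cognoma data.

```python
import random


def make_formulation(labels, genes, n_positives, n_negatives, n_genes, seed=0):
    """Pick sample indices and a gene subset for one benchmark formulation.

    labels: list of 0/1 statuses, one per sample (hypothetical input).
    genes:  list of gene identifiers, i.e. the expression-matrix columns.
    """
    rng = random.Random(seed)  # seeded so the benchmark is reproducible
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if n_positives > len(positives) or n_negatives > len(negatives):
        raise ValueError("not enough samples for the requested class sizes")
    if n_genes > len(genes):
        raise ValueError("not enough genes for the requested subset size")
    samples = sorted(
        rng.sample(positives, n_positives) + rng.sample(negatives, n_negatives)
    )
    return {"samples": samples, "genes": sorted(rng.sample(genes, n_genes))}


# Toy stand-in for the real data: 40 positive and 160 negative samples, 500 genes.
toy_labels = [1] * 40 + [0] * 160
toy_genes = [f"gene_{i}" for i in range(500)]

formulations = {
    # a small number of positives in a lot of data
    "few_positives": make_formulation(toy_labels, toy_genes, 5, 150, 500),
    # a lot of positives
    "many_positives": make_formulation(toy_labels, toy_genes, 40, 40, 500),
    # a smaller number of genes
    "few_genes": make_formulation(toy_labels, toy_genes, 20, 80, 20),
}
```

This sketch only covers class balance and gene count. Varying the gene expression distributions would need the actual expression values, which are not shown here.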