Big Data testing #309

Open
zachmayer opened this issue Aug 10, 2024 · 0 comments

This is a placeholder for someone (myself or a volunteer) to do a small project to test and improve caretEnsemble with large datasets.

Functions to test:

  • caretList
  • caretStack
  • caretEnsemble

At least 3 test cases:

  • Tall data: 1,000,000+ rows
  • Wide data: 10,000+ columns
  • Many models: caretList of 1,000+ models
  • Others optional
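
To make the shapes above concrete, here is a minimal sketch of how tall and wide inputs could be simulated in base R while a script is being developed. `make_synthetic` is a hypothetical helper, not part of caretEnsemble, and the sizes are scaled down so it runs in seconds. At the full sizes listed above a dense numeric matrix gets very large, so real test data would more likely come from files than from a single `rnorm` call.

```r
# Hypothetical helper (not part of caretEnsemble): simulate a regression
# dataset with a given number of rows and columns.
make_synthetic <- function(n_rows, n_cols, seed = 42L) {
  set.seed(seed)
  # as.numeric() avoids integer overflow when n_rows * n_cols is large.
  n_values <- as.numeric(n_rows) * n_cols
  x <- matrix(rnorm(n_values), nrow = n_rows, ncol = n_cols)
  colnames(x) <- paste0("x", seq_len(n_cols))
  # The response depends on the first column plus noise.
  y <- x[, 1L] + rnorm(n_rows)
  list(x = x, y = y)
}

# Scaled-down shapes for a quick check; the test cases above call for
# 1,000,000+ rows (tall) and 10,000+ columns (wide).
tall <- make_synthetic(n_rows = 10000L, n_cols = 5L)
wide <- make_synthetic(n_rows = 50L, n_cols = 2000L)

dim(tall$x)
dim(wide$x)
```

The returned `x` and `y` are in the matrix and vector form that model-fitting functions commonly accept, so the same helper could feed each of the functions under test.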

These tests should be run via a script stored in this repo, and the data should be added via Git LFS. The test results should then be analyzed to identify bottlenecks in:

  • RAM
  • run time
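
As a rough sketch of how both quantities could be captured in one pass, the helper below wraps any fitting function with `system.time()` and `gc()`. `profile_fit` is a hypothetical name, and `lm` stands in for the functions under test so the example runs with base R alone; a `caretList` or `caretStack` call would be swapped in for the real tests. Note that `gc()` only reports memory managed by the current R process, so it would miss memory used by parallel workers.

```r
# Hypothetical helper: run a fitting function once and report elapsed time,
# peak R memory during the run, and the size of the returned object.
profile_fit <- function(fit_fun, ...) {
  gc(reset = TRUE)  # reset the "max used" counters before the run
  timing <- system.time(fit <- fit_fun(...))
  mem <- gc()
  list(
    elapsed_sec = unname(timing[["elapsed"]]),
    # The last column of gc() output is "max used" in Mb (Ncells + Vcells).
    peak_mb = sum(mem[, ncol(mem)]),
    object_mb = as.numeric(object.size(fit)) / 1024^2
  )
}

# Stand-in workload: a small linear model on simulated data.
set.seed(1)
df <- data.frame(x = rnorm(1000))
df$y <- 2 * df$x + rnorm(1000)

res <- profile_fit(function(d) lm(y ~ x, data = d), df)
str(res)
```

Reporting the size of the fitted object separately from peak memory helps distinguish "the fit needs a lot of RAM" from "the returned model object keeps too much data".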

Based on those results, we may, for example, replace do.call, use data.table in more places, or trim more data out of the model object, but it is premature to decide what to do until we have done some analysis.

See also:
#155
#81
#70

@zachmayer zachmayer added enhancement 4.1 4.1 release labels Aug 10, 2024
@zachmayer zachmayer added this to the 4.1 milestone Aug 10, 2024