Add spam dataset from Elements of Statistical Learning #1294
Add spam_dataset from Elements of Statistical Learning (GSoC-25 tasks)
Description
This PR adds a dataset implementation for the spam email classification data from the book "Elements of Statistical Learning". The dataset contains 4,601 emails, each described by 57 features and a binary label indicating whether the email is spam.
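To make the shape described above concrete, here is a minimal sketch of splitting such a file into features and labels. It is not the code in this PR. The function name `parse_spam_rows`, the whitespace-separated layout, and the label sitting in the last column are all assumptions made for illustration, and the rows below are synthetic.

```python
from typing import List, Tuple

N_FEATURES = 57  # feature count stated in the description


def parse_spam_rows(text: str) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical helper: split rows into 57-feature vectors and 0/1 labels.

    Assumes whitespace-separated numeric rows with the label in the last
    column; the real file layout may differ.
    """
    features: List[List[float]] = []
    labels: List[int] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        values = [float(v) for v in line.split()]
        if len(values) != N_FEATURES + 1:
            raise ValueError(
                f"line {line_no}: expected {N_FEATURES + 1} values, got {len(values)}"
            )
        if values[-1] not in (0.0, 1.0):
            raise ValueError(f"line {line_no}: label must be 0 or 1")
        features.append(values[:-1])
        labels.append(int(values[-1]))
    return features, labels


# Two synthetic rows (not real data): one labelled spam, one not.
sample = "\n".join([
    " ".join(["0.5"] * N_FEATURES + ["1"]),
    " ".join(["0"] * N_FEATURES + ["0"]),
])
X, y = parse_spam_rows(sample)
print(len(X), len(X[0]), y)
```

Running the same checks over the full file should yield 4,601 rows if the description's counts hold.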
Features
spam_dataset() function with documentation
Use Case
This dataset can be useful for binary classification exercises and demos, as it's a well-known dataset in the statistical learning community. It's relatively small (compared to image datasets) but provides a realistic classification problem.
Testing