Add spam dataset from Elements of Statistical Learning #1294

Open: 1 commit to merge into base: main

plato-12

Add spam_dataset from Elements of Statistical Learning (GSoC-25 tasks)

Description

This PR adds a new dataset implementation for the spam email classification dataset from the book "Elements of Statistical Learning". The dataset contains 4601 emails, each with 57 features and a binary label indicating whether the email is spam.

Features

  • Implementation of spam_dataset() function with documentation
  • Test suite to verify dataset functionality
  • Example vignette showing how to use the dataset for a classification task

Use Case

This dataset can be useful for binary classification exercises and demos, as it's a well-known dataset in the statistical learning community. It's relatively small (compared to image datasets) but provides a realistic classification problem.

Testing

  • All tests for the new dataset pass
  • Manual verification of the dataset loader and example usage has been performed

dfalbel (Member) commented Mar 18, 2025

Hi @plato-12,

Can you direct your PR to torchdatasets? https://github.com/mlverse/torchdatasets
