Exploratory analysis of attachments in the Enron email corpus

b-gran/enron

Exploratory analysis of the attachments in the Enron email corpus

All image attachment media from the Enron email corpus

Usage

  1. Figure out the requirements and pip install them. Sorry 😅
  2. Download the full dataset with attachments (Internet Archive).
  3. python data.py --enron-root $path_to_download --media-dir $image_df_output_path --hidden
  4. python viz.py --input $image_df_output_path --output $big_image_output_path
  5. [for tagging emails with OpenAI] python tags.py -i $path_to_kaggle_text_dataframe_joblib (this costs ~$50 as of September 2024)
  6. Check out results.ipynb for some hints on how to work with the tagged text data

References
