Mystery Text Discussion: web.txt #107
Comments
As I looked into the web text file, I found that as I increased the ngram size, the words in the output became more "interesting". Here is ngram 6, compared to:

The words in ngram 6 are less frequent, but it's cool to see how an author's style can be caught by these higher ngrams. Ngram 2, by contrast, just outputs generic words that do not really provide context. More than anything, the higher ngrams gave me an idea of what text I was reading. But I will give credit to ngram 2 and 3 for showing me the key characters in this text.

KWIC Aspect

Narrowing in on the character name "Mr. Heathcliff" unveiled several details of the text that I wouldn't have picked up from the ngram portion of my findings. I was able to learn more about the time period in which this text is set, and the general premise of the story.

<img width="1211" alt="Screenshot 2024-11-13 at 9 13 48 AM" src="https://github.com/user-attachments/assets/1751c141-e7f4-4018-a7be-62c2cf6d5c75">

Eventually, I found out the text with some serious scanning. Anyone who has read Pride and Prejudice has most definitely read Wuthering Heights!
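For anyone who wants to try the same two views outside the class tool, here is a rough sketch of how ngram counts and a KWIC (keyword in context) listing could be computed. This is not the course's actual code: the function names `tokenize`, `top_ngrams`, and `kwic` are made up for illustration, the tokenizer is a deliberately simple assumption, and the sample sentence is a placeholder rather than a line from web.txt.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def top_ngrams(tokens, n, k=5):
    """Return the k most frequent n-grams as (tuple_of_words, count) pairs."""
    # zip over n shifted copies of the token list to build sliding windows
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams).most_common(k)


def kwic(tokens, keyword, width=4):
    """List every occurrence of keyword with up to `width` words of context per side."""
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


# Placeholder sample text, not taken from web.txt
sample = "the cat sat on the mat and the cat ran"
tokens = tokenize(sample)
print(top_ngrams(tokens, 2, 1))
print(kwic(tokens, "cat", 2))
```

With a small `n`, the top results tend to be common word pairs and names, while a larger `n` surfaces rarer, more distinctive phrases, which matches the pattern described above. To run it on the real file, replace `sample` with the contents of web.txt.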
Post your screenshots and discuss your findings about web.txt here!