Mystery Text Discussion: web.txt #107

Open
ebeshero opened this issue Oct 25, 2024 · 3 comments
Comments

@ebeshero

Post your screenshots and discuss your findings about web.txt here!

@GabVoz13

GabVoz13 commented Nov 4, 2024

For this assignment, I analyzed the content of the "web.txt" file using AntConc. As I experimented with the software, I observed that many of the most frequently repeated words in the text were "fillers" such as "and," "but," "for," and "a." These words dominated the results in the N-Gram tab, with most appearing over 100 times. When I switched to the KWIC (Key Word in Context) view, one word in particular stood out: "for." The KWIC search showed that "for" commonly appears in set phrases such as "for the most part" (11 times), "for the first time" (9 times), and "for a moment" (19 times). This showed me how often certain phrases are used in everyday language, even if we don't consciously notice their frequency.
Screenshot 2024-11-04 at 3 00 15 PM
Screenshot 2024-11-04 at 3 00 21 PM
Screenshot 2024-11-04 at 3 08 24 PM
Screenshot 2024-11-04 at 2 58 52 PM
Screenshot 2024-11-04 at 2 59 04 PM
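
For anyone who wants to see roughly what a KWIC search does under the hood, here is a minimal Python sketch. It is not how AntConc itself is implemented; the `kwic` function, its `width` parameter, and the sample sentence are all made up for illustration, and the tokenizer is a simple regex that only keeps letters and apostrophes.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, match, right) context tuples for each hit of keyword.

    Hypothetical helper for illustration only, not AntConc's algorithm.
    """
    # Lowercase and split into simple word tokens (letters and apostrophes).
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Invented sample sentence, not taken from web.txt.
sample = "He paused for a moment. For the first time, she waited for a moment too."

for left, word, right in kwic(sample, "for"):
    print(f"{left:>25} | {word} | {right}")
```

Lining up the right-hand contexts like this is what makes recurring phrases such as "for a moment" easy to spot.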

@rashemish

As I looked into the web text file, I found that as I increased the n-gram size, the words in the output became more "interesting".

Here is n-gram size 6:
Screenshot 2024-11-13 at 9 19 12 AM

Compared to n-gram size 2:
Screenshot 2024-11-13 at 9 19 22 AM

The phrases at n-gram size 6 are less frequent, but it's cool to see how an author's style can be caught by these higher n-grams. N-gram size 2, by contrast, just outputs generic word pairs that do not really provide context. More than anything, the higher n-grams gave me an idea of what text I was reading. But I will give credit to n-gram sizes 2 and 3 for showing me the key characters in this text.
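
The trade-off between n-gram size and frequency can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `ngram_counts` is a hypothetical helper, the sample sentence is invented rather than taken from web.txt, and AntConc may tokenize differently.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count every run of n consecutive word tokens in text.

    Hypothetical helper for illustration only.
    """
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    # zip over n shifted copies of the token list to form sliding windows.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Invented sample, not taken from web.txt.
sample = ("the master said no and the master said yes "
          "and the master went out")

for n in (2, 6):
    counts = ngram_counts(sample, n)
    print(f"n={n}: {len(counts)} distinct, top = {counts.most_common(1)}")
```

Even in this tiny sample the pattern shows up: the short n-grams repeat, while every 6-word phrase occurs only once.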

KWIC Aspect

Narrowing in on the character name "Mr. Heathcliff" revealed several details of the text that I hadn't picked up in the n-gram portion of my findings. I was able to learn more about the period in which this text is set and the general premise of the story.

Screenshot 2024-11-13 at 9 13 07 AM

<img width="1211" alt="Screenshot 2024-11-13 at 9 13 48 AM" src="https://github.com/user-attachments/assets/1751c141-e7f4-4018-a7be-62c2cf6d5c75">

Screenshot 2024-11-13 at 9 12 09 AM

Eventually, I identified the text after some serious scanning. Anyone who has read Pride and Prejudice has most definitely read Wuthering Heights!

@Temiii857

Using AntConc, I explored the word "master" and found it used frequently in different contexts. It often reflects authority or relationships in the story. For example: "master’s assistance in coaxing him out of bad ways." and "master availed himself of his privilege to walk straight in." This shows that "master" is connected to power and control in the character's interactions with others. I also checked for repeated phrases using the N-Gram tab. Most longer phrases (6 words) appeared only once, such as "master angrily get him something that" and "master and mistress whose flushed cheeks". Shorter phrases appeared more often, which might highlight recurring themes or actions.
kwic_master
ngram_mas

The word "said" appears a lot in the dialogue. Some examples are "Said the young lady, scowling and turning her face." and "Said the master, angrily: 'Get him something that he can.'" This shows that "said" is often paired with emotional or descriptive actions, making dialogue a big part of the story. With "said," too, most longer phrases appeared only once in the N-Gram tab. Shorter phrases were more common, showing patterns in how characters speak.
kwic_said
ngram_said
