Notes Anonymizer

I completed this project while working as a Writing Consultant at the Texas A&M University Writing Center. We collected a dataset of session notes from our writing center as well as others in the region. Our goal was to analyze the data to glean insights which would help us improve the effectiveness of our sessions. The first step was to process the data so that the dataset did not compromise the identities of the students, and this program accomplishes exactly this.

The program reads through every word in the session notes and decides whether the given word is a name or not. The words identified as names are replaced by the [NAME] token.

The program first detects capitalized words in the notes, and then queries the WordNet database for them. The program uses the information obtained from the WordNet database to decide whether the given word is a name or not. The program also allows the user to manually correct the program output, by pre-specifying certain words as names (or common words), which overrides the program output, and can be used to correct its mistakes.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
README.md		README.md
commonWords.txt		commonWords.txt
fakeSessionNotes_dummyFile.xlsx		fakeSessionNotes_dummyFile.xlsx
fewCommonWords.txt		fewCommonWords.txt
names.txt		names.txt
patch.txt		patch.txt
replaceNames.py		replaceNames.py
replaceNames_docxOutput.py		replaceNames_docxOutput.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Notes Anonymizer

About

Releases

Packages

Languages

vorzawk/uwcProject

Folders and files

Latest commit

History

Repository files navigation

Notes Anonymizer

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages