Text-Ease, an app which predicts the next word in a sentence and provide the user with 3 possible options to choose from.
Link to the app - Text-Ease
Link to the Slide-Deck - Text-Ease Slide Deck
Data was made available as part of JHU Data Science Capstone Course in collaboration with Microsoft Swiftkey and can be downloaded from here
1. Leek J, Peng RD, Caffo B, Cross S, SwiftKey. Data science capstone course. https://www.coursera.org/learn/data-science-project/supplement/4phKX/about-the-corpora
2. Silge J, Robinson D. Text Mining with r. O’Reilly Media; 2017.
3. Hvitfeldt E, Silge J. Tokenization. In: Supervised Machine Learning for Text Analysis in r. Chapman; Hall/CRC; 2021:9-36. doi:10.1201/9781003093459-3
4. Manning CD, Raghavan P, Schütze H. Introduction to Information Retrieval. Cambridge University Press; 2008. doi:10.1017/cbo9780511809071
5. Chen SF, Goodman J. An empirical study of smoothing techniques for language modeling. Computer Speech & Language. 1999;13(4):359-393. doi:10.1006/csla.1999.0128
6. Jurafsky D, Martin JH. Speech & Language Processing. Pearson Education India; 2000.
- R : A sub-directory to store all the raw R scripts and R markdown
files used to create the report
- download.R : File to define function fro downloading the data
- eda.R : File meant for the author, contains all the rough exploratory data analysis done
- kgrams_model.R : Very slow model built using
kgrams
R package - milestone_report.Rmd : Raw Rmarkdown file for the Milestone report describing data cleaning steps and exploratory data analysis
- mkn_model.R : Code which was used behind the final model used in the app
- quiz1.R : Code used to answer 1st week quiz questions
- tidyData.R : Preliminary data cleaning steps
- data : A sub-directory to store the raw data used for the project.
- results : A sub-directory to store the Milestone Report as an HTML file. A markdown version of the file and a figures sub-directory containing all the figures used in the report in PNG format can also be found here. HTML version of the Slide-Deck is also stored here