We will use the Open American National Corpus (OANC), which consists of roughly 15 million words of spoken and written text from a variety of sources. Specifically, we will use the Slate subcorpus: 4,531 Slate magazine articles published from 1996 to 2000 (approximately 4.2 million words).
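As a quick sanity check on the figures above, the article and word counts can be approximated with a few lines of Python. This is a minimal sketch, not part of the case study notebook: it assumes the Slate articles have been extracted as plain-text `.txt` files under a directory (the `data/slate` path is hypothetical), and it counts whitespace-separated tokens, so the total may differ somewhat from the official word count.

```python
from pathlib import Path


def corpus_stats(root):
    """Return (number of .txt files, total whitespace-separated tokens) under root."""
    n_files = 0
    n_words = 0
    # Walk the directory tree recursively; every .txt file counts as one article.
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        n_files += 1
        n_words += len(text.split())
    return n_files, n_words


if __name__ == "__main__":
    # Hypothetical location of the extracted Slate subcorpus; adjust as needed.
    root = Path("data/slate")
    if root.is_dir():
        files, words = corpus_stats(root)
        print(f"{files} articles, {words} words")
    else:
        print(f"{root} not found; point root at the extracted corpus")
```

Splitting on whitespace is a deliberately crude tokenizer; it is enough to confirm the order of magnitude before any real preprocessing.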
The Docker image for this case study is hosted on Docker Hub. Running the docker run command below automatically downloads the image and starts a Jupyter notebook server, published on port 8888.
Run the Docker image:
docker run -p 8888:8888 --rm springernlp/chapter_5:latest
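Alternatively, build the image locally from the directory containing the Dockerfile:
docker build -t chapter_5:latest .
The locally built image is tagged chapter_5:latest, so use that name in place of springernlp/chapter_5:latest when running it.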
More information can be found in Deep Learning for NLP and Speech Recognition (Springer).