Analyzing unstructured data (text) with Python's Natural Language Toolkit, along with a little RegEx, Pandas (for tabular data), and Altair (for bar charts)
Material created by Lucy Havens and updated by Xandra Dave Cochran for the Centre for Data, Culture, & Society
Folder contents:
- Slides: PDF documents
- Notebooks: Jupyter Notebooks demoing content from the slides and completing the assignments
- Assignments: PDF document linking to external resources
The material in this repo is licensed under a Creative Commons Attribution 4.0 International License.
If you are part of the University of Edinburgh, you can use Noteable, the cloud-based computational notebook system that works in your browser from any device.
Getting started:
- Open the following link in a new tab: https://noteable.edina.ac.uk/login
- Log in with your EASE credentials
- Under 'Standard Notebook' click 'Start'
- From the Noteable home page, click on the 'Git' button in the top bar
- Now click the 'Clone a Repository' button to copy the content of this repository
- Enter the link to this repository: https://github.com/DCS-training/nltk-intro-2025
- Leave 'Include submodules' checked and 'Download the repository' unchecked
- Click on Clone
- On the left-hand side you can now see a folder with the same name as this repo, containing all the material
- Click on it, then on 'Notebooks', and then select the .ipynb file you want to use
Open Google Colab: https://colab.research.google.com
If you are not already logged in, you will be prompted to log in via Gmail.
- Go to the GitHub tab, paste the link to this repo, select the notebook you want to use, and press Enter
The Notebook contains paragraphs of explanatory text interspersed with grey cells containing code blocks. To run a code block and see the result:
- Place your cursor within the cell
- Click the 'Run' button on the top menu
- The results of running this code will appear below
- If the results don't appear immediately, check the icon in the browser tab. An egg-timer icon indicates that the code is still being processed.
- It is best to follow the Notebook from top to bottom as some code blocks will depend on results from previous cells
- You can edit code blocks yourself and run them to see the results of your changes
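To illustrate why the top-to-bottom order matters, here is a minimal sketch that imitates two notebook cells using only Python's standard re module. The sample sentence and the variable names are invented for illustration and are not taken from the course notebooks.

```python
import re

# --- Cell 1: define some text and split it into word tokens ---
# The sentence is an invented example, not course data.
text = "Run the cells in order, from top to bottom."
tokens = re.findall(r"[A-Za-z]+", text)

# --- Cell 2: depends on `tokens` created in Cell 1 ---
# Running this cell first would raise a NameError,
# because `tokens` would not exist yet.
lowercase_tokens = [token.lower() for token in tokens]
print(lowercase_tokens)
```

If you edit the text in the first cell, you need to re-run both cells for the second result to reflect the change.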
To clear the results and run the code again, use the 'Cell' menu in the top menu bar
- To clear the results of the current cell: Cell > Current Outputs > Clear
- To clear the results of all cells: Cell > All Output > Clear
Python is great for general-purpose programming and is also a popular language for scientific computing. Installing all of the packages required for these lessons individually can be a bit difficult, however, so we recommend the all-in-one installer Anaconda.
Regardless of how you choose to install it, please make sure you install Python version 3.x (preferably Python 3.11 or higher).
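One way to confirm which version you have is to ask the interpreter itself. The sketch below is only a suggestion: the describe_python helper and its messages are hypothetical names made up for this example, and the 3.11 threshold simply mirrors the preference stated above.

```python
import sys

# Preferred minimum version, as suggested in the text above.
PREFERRED = (3, 11)


def describe_python(version_info=sys.version_info, preferred=PREFERRED):
    """Return a short message saying how this interpreter compares to the suggestion."""
    current = tuple(version_info[:2])
    label = f"{current[0]}.{current[1]}"
    if current[0] < 3:
        return f"Python {label}: too old, Python 3.x is required"
    if current < preferred:
        return f"Python {label}: is Python 3, but {preferred[0]}.{preferred[1]}+ is preferred"
    return f"Python {label}: meets the preferred version"


print(describe_python())
```

You can run this in a notebook cell or save it to a file and run it from the terminal.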
Windows - Video tutorial
- Open anaconda.com/download with your web browser.
- Download the Python 3 installer for Windows.
- Double-click the executable and install Python 3 using MOST of the default settings. The only exception is to check the Make Anaconda the default Python option.
macOS - Video tutorial
- Open anaconda.com/download with your web browser.
- Download the Python 3 installer for macOS.
- Install Python 3 using all of the defaults for installation.
To start Jupyter Notebook, open Anaconda Navigator and launch Jupyter Notebook.
If you wish, you can create a Python virtual environment and install all dependencies for the course by navigating to the course folder in the terminal and running sh setup.sh.
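After installing, it can be useful to check that the main libraries are visible to Python. The sketch below assumes the package list from the libraries named in the course title (NLTK, Pandas, Altair); the actual dependencies may differ, and missing_packages is a hypothetical helper written for this example.

```python
from importlib.util import find_spec

# Assumed package list, taken from the libraries named in the course title;
# the actual list of course dependencies may differ.
ASSUMED_PACKAGES = ("nltk", "pandas", "altair")


def missing_packages(names=ASSUMED_PACKAGES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All assumed packages are installed.")
```

If anything is reported as not installed, make sure you are running Python from the environment in which you installed the dependencies.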
- Download the notebook to your machine
- Go to Upload
- Navigate to where you have downloaded your file
- Select Upload again
- Double-click on the uploaded file