DATASCI W266: Natural Language Processing With Deep Learning

Course Overview
Course Evaluation
Final Project
Course Resources
Tentative Schedule and Readings

Course Overview

Course Description

Understanding language is fundamental to human interaction. Our brains have evolved language-specific circuitry that helps us learn language very quickly; however, this also means that we have great difficulty explaining exactly how meaning arises from sounds and symbols. This course provides an introduction to natural language processing, linguistic phenomena, and our attempts to analyze them using modern deep learning approaches. We cover a wide range of concepts, with a focus on practical applications such as information extraction, machine translation, sentiment analysis, and summarization. Throughout, the course prioritizes practical application over theory. Each week centers on Python notebooks that include exercises and code examples, and the asynchronous lectures provide a foundation to build upon in the live sessions.

Course Prerequisites

  • MIDS 207 (Machine Learning): We assume you know what gradient descent is. We review simple linear classifiers and softmax at a high level, but make sure you've at least heard of these! You should also be comfortable with linear algebra, which we use for vector representations and when we discuss deep learning.
  • Language: All assignments are in Python using Jupyter notebooks, Google Colab, NumPy, TensorFlow, and Keras.
  • Time: There are three to four substantial assignments in this course as well as a term project. Make sure you give yourself enough time to be successful! In particular, you may be in for a rough semester if you have other significant commitments at work or home, or take both this course and any of 210 (Capstone), 261, or 271.

Course Goals and Objectives

By the completion of this course, students will be able to:

  • Understand and describe multiple facets of linguistic phenomena related to natural language processing.
  • Describe fundamental concepts, techniques, problems, and modern approaches in the domain of natural language processing (NLP).
  • Understand the assumptions, strengths, and limitations of NLP and related deep learning techniques, and make appropriate decisions about application of techniques and solutions to NLP problems.
  • Analyze textual data using a number of machine learning and deep learning-based NLP techniques.
  • Demonstrate familiarity with and comprehension of existing NLP techniques for solving practical problems.
  • Demonstrate an ability to stay current with a constantly evolving field by developing a familiarity with the neural network architectures that underpin the current state of the art, and an ability to seek out and understand new advances published in the field.
  • Demonstrate expertise in using existing libraries and tools related to NLP work, along with an ability to familiarize oneself with new libraries and tools that become key to NLP practitioners as the domain evolves.

Communication and Resources

Live Sessions

  • Section 1: Tuesday 2:00 - 3:30pm PST (Peter Grabowski)
  • Section 2: Wednesday 4 - 5:30pm PST (Natalie Ahn)
  • Section 3: Tuesday 4 - 5:30pm PST (Daniel Cer)
  • Section 4: Wednesday 4 - 5:30pm PST (Mark Butler)
  • Section 5: Monday 6:30 - 8pm PST (Jennifer Zhu)
  • Section 6: Wednesday 6:30 - 8pm PST (Mike Tamir/Paul Spiegelhalter)

Teaching Staff Office Hours

  • Daniel Cer: Monday at noon PST
  • Jennifer Zhu: Thursday at 6:30pm PST
  • Mike Tamir/Paul Spiegelhalter: Wednesday immediately after the live session
  • Natalie Ahn: Wednesday at 6pm PST
  • Peter Grabowski: Tuesday immediately after his live session
  • Mark Butler: Friday at 5pm PST
  • Gurdit Chahal: Wednesday at 2:30pm PST
  • Rajiv Nair: Monday at 5pm PST

Office hours are for the whole class; students from any section are welcome to attend any of the times above.

Async Instructors

  • James Kunz
  • Joachim Rahmfeld
  • Mark Butler

Course Evaluation

Breakdown

| Assignment | Topic | Release | Deadline | Weight |
|---|---|---|---|---|
| Assignment 0 | Course Set Up: GitHub, Ed Discussion, Google Cloud | Aug 22 | Aug 28 | 0% |
| Assignment 1 | Basic Neural Nets | Aug 27 | Sep 4 | 5% |
| Assignment 2 | Text Classification | Sep 10 | Sep 25 | 15% |
| Assignment 3 | Question Answering | Oct 1 | Oct 16 | 15% |
| Assignment 4 | Multimodal NLP | Oct 21 | Nov 6 | 10% |
| Final Project | Final Project Guidelines | | Dec 3 | 55% |

Your assignment grade report can be found at https://w266grades.appspot.com.
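As a rough illustration of how the weights in the breakdown table combine, here is a minimal Python sketch that computes a weighted average from per-deliverable percentages. The function name `weighted_score` and the 0-100 score scale are assumptions for illustration, not course tooling. The result is only a number: the syllabus publishes no letter-grade cutoffs, and, as the grading philosophy below explains, the final project is what usually separates letter grades.

```python
# Weights (in percent) copied from the breakdown table above.
WEIGHTS = {
    "Assignment 0": 0,
    "Assignment 1": 5,
    "Assignment 2": 15,
    "Assignment 3": 15,
    "Assignment 4": 10,
    "Final Project": 55,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-deliverable scores (each 0-100) into one weighted percentage.

    Hypothetical helper for illustration only; not part of the course tooling.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Integer percent weights keep the arithmetic exact for whole-number scores.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


example = {
    "Assignment 0": 100, "Assignment 1": 100, "Assignment 2": 90,
    "Assignment 3": 80, "Assignment 4": 100, "Final Project": 85,
}
print(weighted_score(example))  # 87.25
```

The 55% project weight dominates: lowering the final project score by 10 points costs 5.5 points overall, more than Assignment 1 can contribute in total.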

General Grading Philosophy

A word of warning: Given that we (effectively) release solutions to some parts of assignments in the form of unit tests, it shouldn't be surprising that most students earn high scores. Since the variance is so low, assignment scores aren't the primary driver of the final letter grade for most students. A good assignment score is necessary but not sufficient for a strong grade in the class. A well-structured, novel project with good analysis is what makes the difference between a high B/B+ and an A-/A.

As mentioned above, this course is a lot of work. Give it the time it deserves and you'll be rewarded intellectually and on your transcript.

Late Submission Policy

We recognize that sometimes things happen in life outside the course, especially in MIDS, where we all have full-time jobs and family responsibilities to attend to. To help with these situations, we are giving you 5 "late days" to use throughout the term as you see fit. Each late day extends a deliverable's deadline by 24 hours, and any part of a 24-hour period counts as a full late day. Late days can be used on any deliverable in the course except the final project presentation or report. (UC Berkeley needs grades submitted very shortly after the end of classes.)

Once you run out of late days, each 24-hour period (or any part thereof) results in a 10 percentage-point deduction on that deliverable's grade.

You can use a maximum of 2 late days on any single deliverable. We will not be accepting any submissions more than 48 hours past the original due date, even if you have late days. (We want to be more flexible here, but your fellow students also want their graded assignments back promptly!)
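The late-day rules interact in a few ways (whole 24-hour periods, a cap of 2 late days per deliverable, and a hard 48-hour cutoff), so the following Python sketch works through them for a single deliverable. `late_outcome` is a hypothetical helper, not an official course tool. It assumes that available late days are always applied before any deduction, and that the deliverable is not the final project report or presentation, which late days do not cover.

```python
import math

LATE_DAYS_PER_TERM = 5        # late days available for the whole term
MAX_LATE_DAYS_PER_ITEM = 2    # at most 2 late days on any single deliverable
PENALTY_PER_PERIOD = 10       # percentage points per uncovered 24-hour period
HARD_CUTOFF_HOURS = 48        # nothing accepted later than this past the due date


def late_outcome(hours_late: float, late_days_remaining: int = LATE_DAYS_PER_TERM):
    """Return (accepted, late_days_used, penalty_points) for one deliverable.

    Hypothetical helper; assumes available late days are always applied first.
    """
    if hours_late <= 0:
        return True, 0, 0
    if hours_late > HARD_CUTOFF_HOURS:
        return False, 0, 0
    # Any part of a 24-hour period counts as a full period.
    periods = math.ceil(hours_late / 24)
    used = min(periods, late_days_remaining, MAX_LATE_DAYS_PER_ITEM)
    penalty = PENALTY_PER_PERIOD * (periods - used)
    return True, used, penalty


print(late_outcome(30, late_days_remaining=5))  # (True, 2, 0)
print(late_outcome(30, late_days_remaining=1))  # (True, 1, 10)
print(late_outcome(49))                          # (False, 0, 0)
```

For example, submitting 30 hours late with only one late day left uses that day and costs 10 percentage points for the second, uncovered period.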

We don't anticipate granting extensions beyond these policies. Plan your time accordingly!

More Serious Issues

If you run into a more serious issue that will affect your ability to complete the course, please email the instructors mailing list and cc MIDS Student Services. A word of warning: In previous sections, we have had students ask for an Incomplete (INC) grade because their lives were otherwise busy. Mostly we have declined, opting instead for the student to complete the course to the best of their ability and have a grade assigned based on that work. (MIDS prefers to avoid giving INCs, as they have been abused in the past.) The sooner you start this process, the more options we (and the department) have to help. Don't wait until you're suffering from the consequences to tell us what's going on!

Collaboration Policy/Academic Integrity

All students (undergraduate, graduate, professional, full-time, part-time, law, etc.) must be familiar with and abide by the provisions of the "Student Code of Conduct", including those provisions relating to Academic Misconduct. No form of academic misconduct, including cheating, fabrication, plagiarism, or facilitating academic dishonesty, will be tolerated. The full text of the UC Berkeley Honor Code is available at: https://teaching.berkeley.edu/berkeley-honor-code and the Student Code of Conduct is available at: https://sa.berkeley.edu/student-code-of-conduct#102.01_Academic_Misconduct

We encourage studying in groups of two to four people. This applies to working on homework, discussing labs and projects, and studying. However, students must always adhere to the UC Berkeley Code of Conduct (http://sa.berkeley.edu/code-of-conduct) and the UC Berkeley Honor Code (https://teaching.berkeley.edu/berkeley-honor-code). In particular, all materials that are turned in for credit or evaluation must be written solely by the submitting student or group. Similarly, you may consult books, publications, or online resources to help you study, but you must always credit and acknowledge all consulted sources in your submission (including other persons, books, resources, etc.).

Final Project

See the Final Project Guidelines

Attendance and Participation

We believe in the importance of the social aspects of learning, both between students and between students and instructors. We recognize that knowledge-building does not occur solely on an individual level; it is built through social activity among the people engaged in it. Participation and communication are key aspects of this course and are vital to the learning experience of you and your classmates.

Therefore, we would like to remind all students of the following requirements for live class sessions:

  • Students are required to join live class sessions from a study environment with video turned on and with a headset for clear audio, without background movement or background noise, and with an internet connection suitable for video streaming.

  • You are expected to engage in class discussions, breakout room discussions and exercises, and to be present and attentive for your and other teams’ in-class presentations.

  • Keep your microphone on mute when not talking to avoid background noise. Do your best to minimize distractions in the background video, and ensure that your camera is on while you are engaged in discussions.

That said, in exceptional circumstances, if you are unable to meet in a space with no background movement, or if your connection is poor, make arrangements with your instructor (beforehand if possible) to explain your situation. Sometimes connections and circumstances make turning off video the best option. If this is a recurring issue in your study environment, you are responsible for finding a different environment that will allow you to fully participate in classes, without distraction to your classmates.

Failure to adhere to these requirements will result in an initial warning from your instructor(s), followed by a possible reduction in grades or a failing grade in the course.

Diversity and Inclusion

Integrating a diverse set of experiences is important for a more comprehensive understanding of data science. We make an effort to read papers and hear from a diverse group of practitioners; still, the diversity we can draw on is limited by the field of data science itself. We acknowledge that the material may contain both overt and covert biases due to the lens through which it was created. We would like to nurture a learning environment that supports a diversity of thoughts, perspectives and experiences, and honors your identities (including race, gender, class, sexuality, religion, ability, veteran status, etc.) in the spirit of the UC Berkeley Principles of Community https://diversity.berkeley.edu/principles-community

To help us accomplish this, please contact us or submit anonymous feedback through I School channels if you have any suggestions to improve the quality of the course. If something was said in class (by anyone) or you experience anything that makes you feel uncomfortable, please talk to your instructors about it. If you feel like your performance in the class is being impacted by experiences outside of class, please don’t hesitate to come and talk with us. We want to be a resource for you. Also, anonymous feedback is always an option and may lead us to make a general announcement to the class, if necessary, to address your concerns. As a participant in teamwork and course discussions, you should also strive to honor the diversity of your classmates.

If you prefer to speak with someone outside of the course, the MIDS Academic Director Drew Paulin, the I School Assistant Dean of Academic Programs Catherine Cronquist Browning, and the UC Berkeley Office for Graduate Diversity are excellent resources. Also see the following: https://www.ischool.berkeley.edu/about/community.

Disability Services and Accommodations

If you need disability-related accommodations in this class, if you have emergency medical information you wish to share with us, or if you need special arrangements in case the building must be evacuated, please inform us as soon as possible.

The I School recognizes disability in the context of diversity, and the Disabled Students’ Program (DSP) equips students with appropriate accommodations and services to remove barriers to educational access. Students seeking accommodations in this class are responsible for completing the DSP application process to obtain an accommodation letter. You may reach the DSP at (510) 642-0518, or visit the website: https://dsp.berkeley.edu

Publishing Your Work

You are highly encouraged to use your program coursework to build an academic/professional portfolio.

  • Blog about your coursework (and other ideas) and share on the I School Medium channel
  • Publish projects to your I School project portfolio gallery (for more than just the capstone).
  • Publish your work on LinkedIn and tag @UC Berkeley School of Information. Do NOT publish your homework assignments!
  • Publish in academic journals: Contact your professors for assistance. (Note that multiple review iterations are usually required; this can be a time-intensive endeavor.)
  • Publish your news (e.g., conference talks, awards, scholarships) to the I School internal newsletter.

Course Resources

We are not using any particular textbook for this course. We’ll list some relevant readings each week. Here are some general resources:

We’ll be posting materials to the course GitHub repo.

Note: This syllabus may be subject to change. We'll be sure to announce anything major on Ed Discussion.

Code References

The course will be taught in Python, and we'll be making heavy use of NumPy, TensorFlow, Keras, and Jupyter (IPython) notebooks. We'll also be using Git for distributing and submitting materials. If you want to brush up on any of these, we recommend:

Miscellaneous Deep Learning and NLP References

Here are a few useful resources and papers that don’t fit under a particular week -- all optional, but interesting!


Tentative Schedule and Readings

We'll update the table below with assignments as they become available, as well as additional materials throughout the semester. Keep an eye on GitHub for updates!

Dates are tentative: Assignments in particular may change topics and dates. (Updated slides for each week will be posted during the live session week.)

Deliverables

Note: we will update this table as we release assignments. Each assignment will be released around the last live session of the week and due approximately 1 week later (for simple assignments) or 2 to 3 weeks later (for complex assignments).

| Deliverable | Topic | Release | Deadline |
|---|---|---|---|
| Assignment 0 | Course Set-up: GitHub, Ed Discussion, Google Cloud | Aug 22 | Aug 28 |
| Assignment 1 | Neural Networks | Aug 27 | Sep 4 |
| Assignment 2 | Text Classification | Sep 9 | Sep 25 |
| Project Proposal | Final Project Guidelines | | Oct 1 |
| Assignment 3 | Multiclass Classification, Summarization, Question Answering | Sep 30 | Oct 16 |
| Assignment 4 | Image Captioning | Oct 21 | Nov 6 |
| Project Reports | | | Dec 3 (hard deadline) |
| Project Presentations | In-class | | Dec 5-9 |

Course Schedule

| Week | Async Material to Watch | Topics | Materials |
|---|---|---|---|
| Week 1 (Aug 22) | Introduction | Overview of NLP applications; NLP tasks, model structures, and neural architectures; Ambiguity and grounding in language; Introduction to word embeddings | |
| Week 2 (Aug 29) | Text Classification | Text classification approaches with neural networks; Text classification with CNNs | |
| Week 3 (Sep 5) | Language and Context | Recurrent Neural Nets (RNNs) and language modeling; Attention; Context awareness and embeddings | |
| Week 4 (Sep 12) | Pretrained Transformers | Transformer Architecture; Pretrained Transformers and Language Models; Pretrained Transformers and Context-Based Embeddings | |
| Week 5 (Sep 19) | Text Generation Models | Sequence-to-Sequence Architectures; Pretrained Encoder-Decoder Transformer Architectures | |
| Interlude (Extra Material) | Units of Meaning: Words, Morphology, Sentences | Edit distance for strings; Tokenization; Sentence splitting | |
| Week 6 (Sep 26) | Machine Translation | Challenges of Language; Encoder-Decoder Architectures for Neural Machine Translation; Evaluation; Subword Models | |
| Week 7 (Oct 3) | Question Answering and Summarization | Question Answering Architectures; Extractive and Abstractive Summarization; Evaluation | |
| Week 8 (Oct 10) | Linguistic Representation | Elements of Language; Grammars; Dependency Parsing; Phrase-Based Parsing; Evaluation | |
| Week 9 (Oct 17) | Entities and Linking | Named Entity Recognition; Coreference Resolution; Entity Linking; Relation Extraction | |
| Week 10 (Oct 24) | Embedding-based Retrieval | Single- vs. multi-document retrieval; Two-tower models | |
| Week 11 (Oct 31) | Multimodality in NLP | Multimodal Applications; Image Captioning; Visual Question Answering | |
| Break (Nov 7) | No async | No class | No readings |
| Week 12 (Nov 14) | ML Fairness and Privacy | Fairness; Privacy | |
| Thanksgiving Break (Nov 21) | No async | No class | No readings |
| Week 13 (Nov 28) | NLP in the Real World | Productization at Scale; Infrastructure; Metrics; Failure Modes; NLP Review; TBD | |
| Week 14 (Dec 5) | | In-class project presentations | |

Thanks for a great semester!