The lectures that we will cover will help you learn all the background to complete this semester-long course project.
The overall goal of this semester-long project is to learn the foundations of using TensorFlow/Keras to build, train, and evaluate convolutional neural networks on an image dataset. If you are learning machine learning for the first time, a multi-class classification problem will probably be easier than a regression problem. A problem is 'multi-class classification' if your output column has multiple options (for example cat, dog, horse, or house). The main objective of the project is to design, implement, debug, evaluate, and benchmark deep convolutional neural network (CNN) architectures. You will create and curate your own dataset with at least 1000 images. You may NOT use any pre-cleaned datasets, but you can collect images from the internet. (Professor Andrew Ng talks about the value of working on your own dataset in this podcast.) You will also compare the accuracy and speed of various CNN architectures. Finally, you will study how data augmentation, regularization, and transfer learning can be used to improve the accuracy.
- You will work on your projects individually (i.e. group submissions are not allowed).
- Reports for all phases (including the final report) must be prepared using Overleaf. Non-Overleaf submissions will receive a 0 (zero). You are free to use any template you want. Here is an example. You can learn more about Overleaf here. If you have accessibility needs, please email me and I will waive this requirement.
In each phase you are expected to submit:
- An HTML version of the notebook
  - If you are using Google Colab, please convert the notebook to .html files and submit the .html files, for example using htmltopdf.
- A PDF report describing your findings (downloaded from your Overleaf project). The reports for the first three phases can be as long as you make them, but the final report has a limit on the number of pages.
- A link to view your Overleaf project.
Below is the list of all phases and the outline of what you will be working on in each phase.
- Watch the lectures in Module 5.
- In this phase the first task is to decide on a dataset for your project. If you don't have any other project in mind, please choose to work on a "mood classification" project. For the mood classification project, you will need to decide on a few moods you want to detect (smiling, laughing, crying, neutral, etc.) and take a few hundred pictures for each mood. For example, you will need to take around 200 pictures of yourself smiling in various lighting conditions, in various clothing, and in various places. It may occur to you to record a video instead and extract frames as images, but previous students have achieved almost 100% accuracy with that approach, so I don't encourage it.
- The next step is to organize the dataset and visualize the images. A clean way to organize the images is to put them in folders by their categories. For example, put all 'smiling' pictures in one folder. The next step is to visualize sample images (a few images from your ~1000 images) in a Jupyter Notebook.
- In your report you should discuss the distribution of output labels, e.g., with a bar diagram (or a table) showing how many images belong to each category.
- In your report you should also discuss how you plan to normalize your input images.
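The label distribution and normalization steps above can be sketched as follows. This is a minimal illustration, assuming one subfolder per category (as suggested above) and 8-bit images; the helper names `count_images_per_class` and `normalize_pixels`, the extension list, and the choice of scaling to [0, 1] are assumptions for this example, not course requirements.

```python
from collections import Counter
from pathlib import Path

import numpy as np

# Assumed image file types; extend this set if your dataset uses others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_class(dataset_dir):
    """Count image files in each class subfolder of dataset_dir."""
    counts = Counter()
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts


def normalize_pixels(images):
    """Scale 8-bit pixel values from [0, 255] to floats in [0, 1]."""
    return np.asarray(images, dtype=np.float32) / 255.0
```

The counts returned by `count_images_per_class` can be copied into a table or passed to a bar plot for the report. Dividing by 255 is only one common normalization; per-channel standardization is another option you could discuss.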
- Watch the lectures in Module 6 (the last lecture, in particular).
- Using all the data (i.e. without splitting), obtain close to 100% accuracy. Build as large a model as you need (with many filters and many layers). Here is an example:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

    model = Sequential()
    # Convolutional blocks; the input shape is taken from one training image
    model.add(Conv2D(64, (3, 3), activation='relu', input_shape=xtrain[0, :, :, :].shape))
    model.add(MaxPool2D(4, 4))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPool2D(4, 4))
    model.add(Conv2D(16, (3, 3), activation='relu'))
    # Classifier head; set the final unit count to your number of classes
    model.add(Flatten())
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='softmax'))

- In your report you should discuss how the performance (accuracy, precision, recall, etc.) changes when the numbers of filters and layers are increased or decreased.
- Plot your learning curves and include them in your report
- [ONLY FOR GRADUATE STUDENTS] If you provide the output as the input (as an additional channel) what is the smallest architecture (minimum number of layers and filters) you need to overfit the data?

    # Example of how to use output labels as an additional input channel
    import numpy as np

    N, L, _, C = xtrain.shape  # assumes square L x L images with C channels
    xtrain_with_outputlabels = np.zeros((N, L, L, C + 1))
    for i in range(N):
        existing = xtrain[i, :, :, :]
        newchannel = np.full((L, L), ytrain_original[i]).reshape(L, L, 1)
        x = np.concatenate((existing, newchannel), axis=-1)
        print(existing.shape, newchannel.shape, x.shape)
        xtrain_with_outputlabels[i] = x
        break  # demo only: stops after the first image; remove to fill every image

If you are using data generators, you can do something like the following to obtain your xtrain and ytrain_original:

    # Empty placeholders for 1000 RGB images and their labels
    mydatax = np.zeros((1000, 256, 256, 3))
    mydatay = np.zeros((1000, 1))
    # Read everything from your generator
    for i in range(1000):
        x, y = your_generator()
        mydatax[i] = x
        mydatay[i] = y

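For the performance discussion asked for in this phase (accuracy, precision, recall), the following is a minimal NumPy sketch of how those numbers can be derived from a confusion matrix. The function names and the toy labels are hypothetical and chosen only for illustration. With a trained Keras classifier, integer predictions would typically come from taking the argmax of the predicted class probabilities.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def per_class_precision_recall(cm):
    """Return (precision, recall) arrays with one entry per class."""
    true_pos = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how many true images each class has
    # Leave the value at 0 for classes that were never predicted or never present.
    precision = np.divide(true_pos, predicted,
                          out=np.zeros_like(true_pos), where=predicted > 0)
    recall = np.divide(true_pos, actual,
                       out=np.zeros_like(true_pos), where=actual > 0)
    return precision, recall


# Toy integer labels for 3 classes (illustrative values, not real results).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
precision, recall = per_class_precision_recall(cm)
accuracy = np.trace(cm) / cm.sum()
```

Reporting per-class values alongside overall accuracy makes it easier to see which categories a smaller or larger architecture confuses.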
- Watch the lecture in Module 4.
- Split your data into training, validation (development), and test sets
- Train your model using the training set, apply early stopping using the validation set, and evaluate on the test set
- Study the performance when the numbers of filters and layers are increased or otherwise changed
- Plot your learning curves and include them in your report
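The split and early-stopping steps in this phase can be sketched as below. This is a plain NumPy/Python illustration under assumed 60/20/20 proportions. `split_indices` and `early_stopping_epoch` are hypothetical helpers, and the second one only mimics the patience rule on a list of validation losses; it is not the Keras early-stopping callback itself.

```python
import numpy as np


def split_indices(n_samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle sample indices and cut them into train/val/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    n_test = int(round(n_samples * test_fraction))
    val_idx = order[:n_val]
    test_idx = order[n_val:n_val + n_test]
    train_idx = order[n_val + n_test:]
    return train_idx, val_idx, test_idx


def early_stopping_epoch(val_losses, patience=3):
    """Return the 0-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # rule never triggered: all epochs were run
```

Indexing your image and label arrays with the three index arrays gives disjoint sets, so the test images never influence training or the stopping decision.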
- Watch the lectures in Module 5.
- With the best model obtained from the previous step, apply various data augmentation techniques (image generators) and study the improvement in accuracy
- Plot your learning curves and include them in your report
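To make concrete what augmentation does to a single image, here is a NumPy-only sketch of two common transformations, a random horizontal flip and a random shift. It is a hand-rolled stand-in for illustration, not the Keras image generators referred to above; `random_augment` and `max_shift` are assumed names.

```python
import numpy as np


def random_augment(image, rng, max_shift=8):
    """Return a randomly flipped and shifted copy of an (H, W, C) image."""
    out = image.copy()
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Random translation by up to max_shift pixels; uncovered pixels become 0.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w = out.shape[:2]
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    shifted = np.zeros_like(out)
    shifted[dst_y, dst_x] = out[src_y, src_x]
    return shifted


rng = np.random.default_rng(0)
image = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
augmented = random_augment(image, rng)
```

Augmentation should be applied to training images only, so that validation and test accuracy are measured on unmodified data.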
- Watch the lectures in Module 8.
- With the best model obtained from the previous step, apply various regularization techniques (batch normalization, dropout, L2 regularization, etc.) and study the improvement in accuracy
- Plot your learning curves and include them in your report
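As one possible starting point for this phase, the sketch below combines the three techniques named above in a small Keras model. The layer sizes, the L2 strength, the dropout rate, and the function name `build_regularized_cnn` are assumptions for illustration. The loss assumes integer class labels; use categorical cross-entropy instead if your labels are one-hot encoded.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers


def build_regularized_cnn(input_shape, num_classes,
                          l2_strength=1e-4, dropout_rate=0.5):
    """Small CNN with L2 weight penalties, batch normalization, and dropout."""
    l2 = regularizers.l2(l2_strength)
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), kernel_regularizer=l2),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), kernel_regularizer=l2),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(dropout_rate),  # active only during training
        layers.Dense(num_classes, activation="softmax", kernel_regularizer=l2),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Enabling one technique at a time, and comparing the learning curves against the unregularized model, makes it easier to attribute any change in accuracy.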
- Watch the lecture in Module 9.
- Use pretrained models such as VGG16 or ResNet50 and retrain using your dataset.
- Use recent architectures such as ResNet, DenseNet, or NASNet to train a model and study the improvement in accuracy
- Plot your learning curves and include them in your report
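A common way to retrain a pretrained model is to freeze its convolutional base and train only a new classification head. The sketch below shows that pattern with VGG16. The head design and the name `build_transfer_model` are assumptions, and inputs are assumed to have already been passed through VGG16's own `preprocess_input` function.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_transfer_model(input_shape, num_classes, weights="imagenet"):
    """Frozen VGG16 feature extractor with a new softmax head."""
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape
    )
    base.trainable = False  # keep the pretrained filters fixed

    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    # Assumes integer class labels, as in sparse categorical cross-entropy.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With `weights="imagenet"` the pretrained weights are downloaded on first use. After the head converges, unfreezing some of the top layers of the base and continuing with a small learning rate is a further option to study.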
- Your report must not be very long; 10 to 12 pages at most.
- All tables and figures must be numbered and captioned/labelled.
- Don't fill an entire page with a picture or have pictures hanging outside of the page borders.
- You are encouraged, but not required, to host your project (and report) on GitHub.
- Turn off dark mode in your notebook before you copy images/plots (the labels and ticks are hard to see in dark mode).
- Your report should include an abstract and a conclusion (each 250 words minimum).
The goal of this project is to develop a convolutional neural network model that can identify my mood by looking at a picture of my face. Here are the steps involved:
- Take 1000 pictures of my face in various settings - smiling, laughing, sad, crying, and neutral - 200 images each. Then, label each of these pictures.
- Crop images to 256 x 256 dimensions.
- Write Python matplotlib code to visualize all 1000 images.
- Randomly split the data into 600 pictures for training, 200 for validation, and 200 for testing.
- Build a single layer CNN model with 64 filters, train the model, and evaluate the model on the test set. It is worth noting that the 5-class accuracy of a random classifier is 20% (baseline for the project).
- Apply data augmentation techniques and regularization techniques to improve performance.
- Build and test newer architectures such as ResNets and pre-trained models such as VGG-16.
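The cropping step in the proposal above could be implemented as a center crop, sketched below with NumPy. Cropping around the center is an assumption made for this example; the proposal does not say which region of each picture is kept, and `center_crop` is a hypothetical helper name.

```python
import numpy as np


def center_crop(image, size=256):
    """Crop the central size x size region from an (H, W, C) image."""
    h, w = image.shape[:2]
    if h < size or w < size:
        raise ValueError(f"image {h}x{w} is smaller than crop size {size}")
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]
```

Images smaller than the target size raise an error here; resizing them first would be an alternative.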
- Mood Detection - by Jeff Killgore
- Categorizing Equus Members with Deep Learning - by Miguel Corona
- Finger Digit Classification - by Khanh Vong