Explicit-Content-Classifier-using-ResNet

Objective

A very large amount of explicit content is available on the internet, and it can end up being uploaded to websites that do not permit it or that are not restricted to adults (18+). Classifying this type of content is therefore important.

The objective of this project is to classify images that contain explicit content, such as pornography and hentai. The classifiers used are ResNet50 and ResNet101, two Residual Neural Network (ResNet) architectures. The models are trained on five categories: Porn, Hentai, Sexy, Drawing, and Neutral. Porn, Hentai, and Sexy can be further grouped as NSFW (Not Safe For Work), and the other two as SFW (Safe For Work).
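The grouping of the five classes into NSFW and SFW can be sketched as a small lookup. The class names come from the description above; the helper `to_binary_label` is a hypothetical name used only for illustration and is not taken from the repository.

```python
# Class names follow the README; the helper below is a hypothetical illustration.
NSFW_CLASSES = {"Porn", "Hentai", "Sexy"}
SFW_CLASSES = {"Drawing", "Neutral"}


def to_binary_label(class_name: str) -> str:
    """Collapse one of the five fine-grained classes into NSFW or SFW."""
    if class_name in NSFW_CLASSES:
        return "NSFW"
    if class_name in SFW_CLASSES:
        return "SFW"
    raise ValueError(f"Unknown class: {class_name!r}")


print(to_binary_label("Hentai"))   # NSFW
print(to_binary_label("Drawing"))  # SFW
```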

Dataset

The dataset was assembled from several sources, including Kaggle, the Internet Archive, and GitHub. These sources contained explicit images of different genres, so images were selected according to the five classes used. The final dataset contained around 170,000 images, of which 10,000 were used for testing (2,000 per class). Of the remaining 160,000 images, about 10% were used for validation.
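The split sizes implied by these figures can be worked out with a few lines of arithmetic. This is only a sketch: the totals are the approximate numbers quoted above, and `split_sizes` is a hypothetical helper, not the logic in Split.py.

```python
# Approximate figures from the README; the real counts may differ slightly.
TOTAL_IMAGES = 170_000
NUM_CLASSES = 5
TEST_PER_CLASS = 2_000
VAL_PERCENT = 10  # "about 10%" of the non-test images


def split_sizes(total, num_classes, test_per_class, val_percent):
    """Return approximate train/validation/test sizes for the dataset."""
    test = num_classes * test_per_class
    remaining = total - test
    val = remaining * val_percent // 100
    train = remaining - val
    return {"train": train, "val": val, "test": test}


sizes = split_sizes(TOTAL_IMAGES, NUM_CLASSES, TEST_PER_CLASS, VAL_PERCENT)
print(sizes)  # {'train': 144000, 'val': 16000, 'test': 10000}
```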

Data augmentation was also applied to the Drawing and Hentai classes, as they had fewer images. Four attributes were changed at random: brightness, horizontal flip, grayscale, and rotation of 30 degrees.
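A minimal sketch of how the four attributes might be drawn at random for each image is shown below. Everything numeric here is an assumption: the README does not give the brightness range or the flip and grayscale probabilities, and "rotation of 30 degrees" is read as an angle within ±30 degrees. The names `AugmentParams` and `sample_augment_params` are hypothetical; applying the sampled values to pixels would be left to an image library.

```python
import random
from dataclasses import dataclass


@dataclass
class AugmentParams:
    brightness_factor: float  # 1.0 leaves brightness unchanged
    horizontal_flip: bool
    grayscale: bool
    rotation_degrees: float


def sample_augment_params(rng: random.Random) -> AugmentParams:
    """Draw one random setting for each of the four augmentations."""
    return AugmentParams(
        brightness_factor=rng.uniform(0.7, 1.3),    # assumed range
        horizontal_flip=rng.random() < 0.5,         # assumed probability
        grayscale=rng.random() < 0.2,               # assumed probability
        rotation_degrees=rng.uniform(-30.0, 30.0),  # assumed reading of "30 degrees"
    )


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
for _ in range(3):
    print(sample_augment_params(rng))
```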

Files

  • config.py - Configuration file.
  • Data_Augmentation.py - Contains the code for data augmentation as explained above.
  • Split.py - Splits the dataset into train and test sets.
  • dataset_prep.py - Contains the preprocessing code.
  • engine.py - Contains the training and validation functions for the model.
  • metrics.py - Contains the evaluation metrics, such as the confusion matrix plot.
  • model.py - Contains the PyTorch models: ResNet50 and ResNet101.
  • predict.py - Uses the model to predict on the test dataset.
  • train.py - Main run file.

Preprocessing

For preprocessing, the images were resized to 224x224 to match the model input size, normalized (see dataset_prep.py), and finally converted to a Tensor.
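The normalization step can be illustrated with plain arithmetic on a single pixel. The README does not say which statistics are used, so the ImageNet channel means and standard deviations below are an assumption (they are the values commonly paired with Torchvision's pretrained ResNets). `normalize_pixel` and `tensor_shape` are hypothetical helpers for illustration only.

```python
# Assumption: ImageNet channel statistics; the project's actual values are not stated.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)
INPUT_SIZE = (224, 224)  # height, width, as stated in the README


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale 8-bit RGB values to [0, 1], then standardize each channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


def tensor_shape(size=INPUT_SIZE, channels=3):
    """Shape of one preprocessed image in PyTorch's channels-first layout."""
    height, width = size
    return (channels, height, width)


print(tensor_shape())                    # (3, 224, 224)
print(normalize_pixel((255, 255, 255)))  # a white pixel after normalization
```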

Model

Two models, ResNet50 and ResNet101, were trained using the implementations from the Torchvision model library. Accuracy, classification reports, and confusion matrices are used to judge the performance of each model. Each model is trained for 2 and 5 epochs with learning rates of 0.001 and 0.0001. The loss curves and confusion matrix plots are given in the Images folder.
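To make the evaluation concrete, here is a minimal plain-Python sketch of a five-class confusion matrix and the accuracy derived from it. It is not the code in metrics.py, and the labels below are made up purely for illustration; they are not project results.

```python
CLASSES = ["Porn", "Hentai", "Sexy", "Drawing", "Neutral"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Made-up labels for illustration only.
y_true = ["Porn", "Sexy", "Neutral", "Hentai", "Drawing", "Sexy"]
y_pred = ["Porn", "Porn", "Neutral", "Hentai", "Hentai", "Sexy"]

matrix = confusion_matrix(y_true, y_pred)
for name, row in zip(CLASSES, matrix):
    print(f"{name:8s} {row}")
print("accuracy:", accuracy(matrix))
```

Off-diagonal cells show which classes are confused with each other; in the toy labels above, one Sexy image is predicted as Porn and one Drawing image as Hentai.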

The framework used for models is PyTorch.

How to Run

Run the gui.py file in the GUI folder.
The GUI is kept simple and has three options:

  • Choose Folder: Choose the folder that contains the images.
  • Predict: Run predictions using the saved model in the Output folder. [New models can also be trained.]
  • Exit: Close the GUI.