This repository contains assignments covering different aspects of Natural Language Processing (NLP). Each assignment includes a Python program that demonstrates a specific NLP concept or algorithm. A sample input dataset is included in each code file.
Assignment_1. Tokenization and Stemming
- Perform tokenization (whitespace, punctuation-based, Treebank, Tweet, MWE) using the NLTK library. Use the Porter stemmer and the Snowball stemmer for stemming. Use any technique for lemmatization.
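The tokenizers and stemmers listed above can be sketched as follows. This is a minimal illustration, not the assignment's actual code: the sample sentence is made up, and mapping "punctuation-based" to NLTK's `WordPunctTokenizer` is an assumption. Lemmatization is left out because NLTK's `WordNetLemmatizer` needs the WordNet corpus to be downloaded first.

```python
from nltk.stem import PorterStemmer, SnowballStemmer
from nltk.tokenize import (
    MWETokenizer,
    TreebankWordTokenizer,
    TweetTokenizer,
    WhitespaceTokenizer,
    WordPunctTokenizer,
)

# Hypothetical sample sentence, not taken from the assignment files
text = "The runners weren't running in New York :) #fun"

# These tokenizers are rule-based and need no extra NLTK data downloads
tokenizers = {
    "whitespace": WhitespaceTokenizer(),
    "punctuation": WordPunctTokenizer(),  # assumed meaning of "punctuation-based"
    "treebank": TreebankWordTokenizer(),
    "tweet": TweetTokenizer(),
}
tokens = {name: tok.tokenize(text) for name, tok in tokenizers.items()}

# MWETokenizer merges known multi-word expressions in an already tokenized list
mwe = MWETokenizer([("New", "York")], separator="_")
tokens["mwe"] = mwe.tokenize(text.split())

# Compare the two stemmers on a few words
porter = PorterStemmer()
snowball = SnowballStemmer("english")
stems = {w: (porter.stem(w), snowball.stem(w)) for w in ["running", "runners", "easily"]}

for name, toks in tokens.items():
    print(f"{name:12s} {toks}")
print(stems)
```

Printing the results side by side makes the differences visible: the Treebank tokenizer splits contractions, the Tweet tokenizer keeps emoticons and hashtags together, and the MWE tokenizer joins "New York" into one token.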
Assignment_2. Bag-of-Words and TF-IDF
- Apply the bag-of-words approach (raw occurrence counts and normalized occurrence counts) and TF-IDF to the data.
Assignment_3. Text Cleaning and TF-IDF Representation
- Clean the text, lemmatize it (any method), remove stop words (any method), and label-encode the target. Create TF-IDF representations and save the outputs.
Assignment_4. Creating a Transformer with PyTorch
- Create a transformer from scratch using the PyTorch library.
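To give a sense of what "from scratch" involves, here is a minimal sketch of a single encoder block with multi-head self-attention. It is not the assignment's implementation: the class names and dimensions are hypothetical, and positional encoding, masking, dropout, and the decoder are all left out.

```python
import math

import torch
import torch.nn as nn


class MultiHeadSelfAttention(nn.Module):
    """Scaled dot-product self-attention split across several heads."""

    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # joint Q, K, V projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        batch, seq_len, d_model = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):
            # (batch, seq, d_model) -> (batch, heads, seq, d_head)
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        weights = scores.softmax(dim=-1)
        # Merge heads back: (batch, heads, seq, d_head) -> (batch, seq, d_model)
        context = (weights @ v).transpose(1, 2).reshape(batch, seq_len, d_model)
        return self.out(context)


class EncoderBlock(nn.Module):
    """Self-attention and feed-forward sublayers, each with residual + LayerNorm."""

    def __init__(self, d_model=32, num_heads=4, d_ff=64):
        super().__init__()
        self.attn = MultiHeadSelfAttention(d_model, num_heads)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.norm1(x + self.attn(x))
        return self.norm2(x + self.ff(x))


torch.manual_seed(0)
x = torch.randn(2, 5, 32)  # (batch, sequence length, d_model)
block = EncoderBlock()
y = block(x)
print(y.shape)
```

The block preserves the input shape, which is what allows several of them to be stacked into a full encoder.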
To run these programs, you need to have Python installed along with the necessary libraries.
- Python 3.9.x
Install the required libraries with pip. The installation may take some time.
pip install nltk pandas gensim scikit-learn torch
'x' should be replaced with the actual assignment number.
Contributions are welcome! If you find any bugs or have suggestions for improvements, feel free to open an issue or create a pull request.