This repository contains assignments that each cover a different aspect of Natural Language Processing (NLP). Each assignment includes a Python program that demonstrates a specific NLP concept or algorithm. A sample input dataset is included in each code file.
Assignment_1. Tokenization and Stemming
- Perform tokenization (whitespace, punctuation-based, Treebank, Tweet, and MWE, i.e. multi-word expression) using the NLTK library. Use the Porter and Snowball stemmers for stemming, and any technique for lemmatization.
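A minimal sketch of the tokenizers and stemmers listed above, assuming a recent NLTK release. The sample sentence and the ("New", "York") multi-word expression are made up for illustration. None of these tokenizers or stemmers need extra NLTK data downloads. Lemmatization is left out here because NLTK's `WordNetLemmatizer` first requires the WordNet corpus (`nltk.download("wordnet")`).

```python
from nltk.stem import PorterStemmer, SnowballStemmer
from nltk.tokenize import (
    MWETokenizer,
    TreebankWordTokenizer,
    TweetTokenizer,
    WhitespaceTokenizer,
    WordPunctTokenizer,
)

# Hypothetical sample input; the assignment files ship their own data.
text = "I can't wait to visit New York! :-) #excited"

tokenizers = {
    "whitespace": WhitespaceTokenizer(),  # splits on whitespace only
    "punctuation": WordPunctTokenizer(),  # splits into runs of word chars and runs of punctuation
    "treebank": TreebankWordTokenizer(),  # Penn Treebank rules, e.g. "can't" -> "ca", "n't"
    "tweet": TweetTokenizer(),            # keeps emoticons, hashtags and @mentions intact
}
tokens = {name: tok.tokenize(text) for name, tok in tokenizers.items()}

# The MWE tokenizer merges known multi-word expressions in an already tokenized list.
mwe = MWETokenizer([("New", "York")], separator="_")
tokens["mwe"] = mwe.tokenize(tokens["punctuation"])

for name, toks in tokens.items():
    print(f"{name:>12}: {toks}")

# Compare the two stemmers on a few words.
porter = PorterStemmer()
snowball = SnowballStemmer("english")
for word in ["running", "studies", "generously"]:
    print(f"{word}: porter={porter.stem(word)}, snowball={snowball.stem(word)}")
```

Running the MWE tokenizer on the punctuation-based output (rather than the whitespace output) matters here: whitespace tokenization leaves "York!" as one token, so the ("New", "York") pair would never match.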
Assignment_2. Bag-of-Words and TF-IDF
- Apply the bag-of-words approach (count occurrence and normalized count occurrence) and TF-IDF to the data.
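The three representations can be computed by hand to show what each number means. This is a plain-Python sketch on a made-up three-sentence corpus, using the unsmoothed formula tf-idf(t, d) = (count of t in d / number of terms in d) × log(N / df(t)), where N is the number of documents and df(t) the number of documents containing t. Library implementations differ: scikit-learn's `TfidfVectorizer`, for example, uses a smoothed IDF and L2-normalizes each row by default, so its values will not match these.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already lower-cased and whitespace-tokenizable.
docs = ["the cat sat", "the dog sat", "the dog barked"]
tokenized = [doc.split() for doc in docs]
vocab = sorted({word for doc in tokenized for word in doc})

# Bag of words: raw occurrence counts, one row per document, one column per term.
counts = [[freq[word] for word in vocab] for freq in map(Counter, tokenized)]

# Normalized counts: each count divided by the document length (term frequency).
normalized = [[c / sum(row) for c in row] for row in counts]

# Document frequency and inverse document frequency for every vocabulary term.
n_docs = len(tokenized)
df = {word: sum(word in doc for doc in tokenized) for word in vocab}
idf = {word: math.log(n_docs / df[word]) for word in vocab}

# TF-IDF: term frequency weighted by how rare the term is across the corpus.
tfidf = [[tf * idf[word] for tf, word in zip(row, vocab)] for row in normalized]

print(vocab)
for row in tfidf:
    print([round(x, 3) for x in row])
```

Because "the" appears in every document, its IDF is log(3/3) = 0 and it gets a TF-IDF weight of zero everywhere, which is exactly the down-weighting of common words that TF-IDF is meant to provide.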
Assignment_3. Text Cleaning and TF-IDF Representation
- Perform text cleaning, lemmatization (any method), stop-word removal (any method), and label encoding. Create representations using TF-IDF and save the outputs.
Assignment_4. Creating a Transformer with PyTorch
- Create a transformer from scratch using the PyTorch library.
To run these programs, you need Python and the following libraries:
- Python 3.9.x
- nltk library
- sklearn library
- gensim library
- pandas library
- PyTorch library
To install the required libraries, you can use pip (installation may take some time):
pip install nltk pandas gensim scikit-learn torch
To run an assignment, use the following command, replacing 'x' with the assignment number:
python Assignment_x.py
Contributions are welcome! If you find any bugs or have suggestions for improvements, feel free to open an issue or create a pull request.