To tackle these challenges, I will use the VS Code Jupyter extension, and Google Colab whenever I need to build more sophisticated solutions that require more computational power.
I'd like to try my hand at all three challenges (NLP, email classification/extraction and object detection), but as a computational linguistics student I am particularly interested in NLP.
Below is a list of the Python NLP libraries I use in my work as a computational linguist. However, not all of them fit the requirements of this internship, mainly because of the limited number of languages some of them support.
Useful libraries for solving NLP problems:
library | description | fits our needs |
---|---|---|
stanza | Collection of tools for linguistic analysis created by the Stanford NLP team. Supports multiple languages; for the NER task, for example, stanza has pre-trained models for 34 languages. | ✅ |
spaCy | Powerful Python NLP library with support for 70+ languages. It has built-in word vectors and tools for tokenization, NER, POS-tagging, dependency parsing, text classification, lemmatization, morphological analysis, etc. (see the NER sketch after this table). | ✅ |
pymorphy2 | Morphological analyser written in Python. It is used for fetching information about the grammatical properties of a particular word (POS, case, gender, number). Supports only two languages: Ukrainian and Russian. | ⛔ |
gensim | Python library for topic modelling, document indexing and similarity retrieval with large corpora. | ✅ |
fasttext | Useful Python library for working with word embeddings. Contains pre-trained word vectors for 157 (!) languages. | ✅ |
langdetect | Niche Python library designed exclusively for the language detection task. Able to detect 55 languages. | ✅ |
Polyglot | Polyglot supports various multilingual applications and offers a wide range of analyses. Applications: language detection, tokenization, NER, POS-tagging, sentiment analysis. | ✅ |
nltk | Suite of libraries and programs for symbolic and statistical NLP written in Python; focused primarily on English. | ⛔ |
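
To illustrate why I rate spaCy so highly, here is a minimal NER sketch; it assumes the small English model `en_core_web_sm` has already been installed (`python -m spacy download en_core_web_sm`), and the example sentence is my own.

```python
# Minimal spaCy NER sketch; assumes en_core_web_sm is installed.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The Stanford NLP team released Stanza in 2020.")

# Each detected entity carries its surface text and a label (ORG, DATE, ...).
for ent in doc.ents:
    print(ent.text, ent.label_)
```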
I personally prefer spaCy and, to a lesser extent, stanza for their versatility and overall accuracy across different tasks. When dealing with word vectors I use fasttext. Whenever I need language detection I use langdetect. For example, I have recently been working on a hate-speech project and had to filter for posts written in Ukrainian only (a small sketch of that filter follows below).
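
Below is a minimal sketch of the kind of language filter I used on that project; the `posts` list is illustrative, not real project data.

```python
# Minimal langdetect sketch: keep only posts detected as Ukrainian ("uk").
from langdetect import detect, DetectorFactory
from langdetect.lang_detect_exception import LangDetectException

DetectorFactory.seed = 0  # make detection deterministic across runs

posts = [
    "Доброго ранку, як справи?",   # Ukrainian
    "Good morning, how are you?",  # English
]

ukrainian_posts = []
for post in posts:
    try:
        if detect(post) == "uk":
            ukrainian_posts.append(post)
    except LangDetectException:
        # Very short or emoji-only posts can fail detection; skip them.
        continue

print(ukrainian_posts)
```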
Tasks related to email classification and extraction deal primarily with text data, so the libraries listed in the NLP section will come in handy here as well. For email classification we can use scikit-learn and TensorFlow/Keras (see the classification sketch after the table below).
library | description | fits our needs |
---|---|---|
Scikit-learn | Open-source Python library which includes implementations of many traditional ML algorithms. | ✅ |
TensorFlow | Open-source framework for prototyping and assessing machine learning models, primarily neural networks. | ✅ |
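
As a sketch of what email classification with scikit-learn could look like, here is a minimal TF-IDF + logistic regression pipeline; the tiny inline dataset and the spam/ham labels are purely illustrative.

```python
# Minimal email classification sketch with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

emails = [
    "Win a free prize, click the link now",
    "Meeting moved to 3 pm, see the updated agenda",
    "Limited offer: claim your reward today",
    "Please review the attached quarterly report",
]
labels = ["spam", "ham", "spam", "ham"]

# TF-IDF features feed a simple linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(emails, labels)

print(model.predict(["Claim your free reward now"]))
```

In practice I would pick the classifier and tune the vectoriser via cross-validation on the real email corpus rather than commit to logistic regression up front.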
TensorFlow and scikit-learn can be used for object detection and NLP as well. For instance, TensorFlow CNNs come in handy when working with images and video, while RNNs and LSTMs are often used for NLP problems (a minimal CNN sketch follows).
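
To make the CNN point concrete, here is a minimal Keras image-classification sketch; the input shape and the number of classes are placeholder assumptions, not values from a real dataset.

```python
# Minimal Keras CNN sketch for image classification.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 3)),          # small RGB images (assumed size)
    layers.Conv2D(16, 3, activation="relu"),  # learn local visual features
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # 10 output classes (assumed)
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```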
Useful Python packages for image & video data processing:
library | description | fits our needs |
---|---|---|
OpenCV | CV library focused on real-time applications. The library has a modular structure and includes several hundred computer vision algorithms (see the sketch after this table). | ✅ |
Scikit-Image | Includes a collection of algorithms for image processing. Image processing toolbox for SciPy. | ✅ |
matplotlib | Library for creating static, animated and interactive visualisations. | ✅ |
Pillow | Contains all the basic image processing functionality; intuitive and easy-to-use. | ✅ |
numpy | While not a CV library per se, numpy provides powerful data structures and algorithms for easy manipulation of image data. | ✅ |
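
Here is a minimal OpenCV + numpy sketch of the kind of preprocessing I have in mind: load an image, convert it to grayscale and run edge detection; `input.jpg` is a placeholder path.

```python
# Minimal OpenCV preprocessing sketch: grayscale conversion + Canny edges.
import cv2
import numpy as np

image = cv2.imread("input.jpg")  # placeholder path; returns a BGR array
if image is None:
    raise FileNotFoundError("input.jpg not found")

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)  # lower/upper hysteresis thresholds

# numpy makes it easy to inspect or manipulate the pixel data directly.
print(edges.shape, np.count_nonzero(edges))
cv2.imwrite("edges.jpg", edges)
```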
Kyrylo Klychliiev
Kyiv, Ukraine