alasdairforsythe / tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript

tokenizer vocabulary vocabulary-builder tokenize tokenization tokenisation tokenizing text-tokenization vocabulary-generator

Updated Jul 2, 2024
Go

twardoch / split-markdown4gpt

A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows the models to handle the data in manageable chunks.

Updated Jul 1, 2025
Python
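
The splitting idea in the split-markdown4gpt description above can be illustrated with a minimal sketch. This is not the tool's actual implementation or API: `split_markdown` and `count_tokens` are hypothetical names, and whitespace-separated words stand in for model tokens (a real GPT tokenizer would count differently).

```python
import re


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate model tokens.
    return len(text.split())


def split_markdown(text: str, limit: int) -> list[str]:
    """Group heading-delimited sections into chunks of at most `limit` tokens."""
    # Split just before each ATX heading so every section keeps its heading.
    sections = [s for s in re.split(r"(?m)^(?=#{1,6} )", text) if s.strip()]
    chunks, current, used = [], [], 0
    for section in sections:
        size = count_tokens(section)
        # Start a new chunk when adding this section would exceed the limit.
        if current and used + size > limit:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(section)
        used += size
    if current:
        chunks.append("".join(current))
    return chunks


doc = "# A\none two\n## B\nthree four\n## C\nfive\n"
for chunk in split_markdown(doc, limit=8):
    print(repr(chunk))
```

In this sketch a single section that is larger than the limit is kept whole as its own oversized chunk; splitting inside a section would need an extra pass.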

SayamAlt / Resume-Classification-using-fine-tuned-BERT

Successfully developed a resume classification model which can accurately classify the resume of any person into its corresponding job with a tremendously high accuracy of more than 99%.

nlp exploratory-data-analysis word-embeddings model-evaluation text-preprocessing bert-model text-tokenization fine-tuning-bert

Updated Jan 13, 2023
Jupyter Notebook

muthu-kumar-u / Speech-Recognition-RNN

Deep learning-based subtitle generation model that processes audio datasets to generate accurate text transcriptions. Includes audio feature extraction, encoder-decoder architecture, training pipelines, and evaluation metrics for subtitle alignment.

natural-language-processing deep-learning speech-recognition rnn speech-to-text audio-processing transformer-models encoder-decoder-architecture text-tokenization subtitle-generation

Updated Jan 16, 2025

SayamAlt / Symptoms-Disease-Text-Classification

Successfully developed a fine-tuned BERT transformer model which can accurately classify symptoms to their corresponding diseases with an accuracy of up to 89%.

natural-language-processing text-classification exploratory-data-analysis multiclass-classification text-preprocessing text-tokenization bert-fine-tuning hugging-face-transformers fine-tune-bert-tensorflow model-inference model-architecture-and-implementation model-training-and-evaluation data-exploration-and-preprocessing

Updated May 6, 2024
Jupyter Notebook

SayamAlt / Financial-News-Sentiment-Analysis

Successfully developed a fine-tuned DistilBERT transformer model which can predict the overall sentiment of a piece of financial news with an accuracy of nearly 81.5%.

natural-language-processing sentiment-analysis multiclass-classification text-preprocessing text-tokenization distilbert-model hugging-face-transformers fine-tune-bert-tensorflow model-inference model-architecture-and-implementation model-training-and-evaluation data-exploration-and-preprocessing

Updated May 6, 2024
Jupyter Notebook

SayamAlt / Cyberbullying-Classification-using-fine-tuned-DistilBERT

Successfully fine-tuned a pretrained DistilBERT transformer model that can classify social media text data into one of 4 cyberbullying labels i.e. ethnicity/race, gender/sexual, religion and not cyberbullying with a remarkable accuracy of 99%.

natural-language-processing text-classification exploratory-data-analysis data-exploration multiclass-classification cyberbullying-detection text-preprocessing text-tokenization distilbert-model llm fine-tune-bert-tensorflow model-inference model-training-and-evaluation

Updated Jun 10, 2024
Jupyter Notebook

markiskorova / Machine-Learning-NLP-Predict-Author

Machine Learning & Natural Language Processing: Predict the author of literary text snippets. Built with TensorFlow and Keras, this project trains an LSTM model on classic literature to identify writing style and authorship.

python machine-learning natural-language-processing tensorflow keras text-vectorization text-tokenization

Updated May 25, 2025
Python

SayamAlt / Fake-News-Classification-using-fine-tuned-BERT

Successfully developed a text classification model to predict whether a given news text is fake or not by fine-tuning a pretrained BERT transformer model imported from Hugging Face.

deep-learning text-classification data-visualization data-analysis model-evaluation text-preprocessing bert-model bert-embeddings text-tokenization wordcloud-visualization fine-tuning-bert tokenizer-nlp model-training-and-evaluation

Updated Dec 10, 2024
Jupyter Notebook

Software-Research-Lab / dropsuit-tok

The tok function is a JavaScript and Node.js function that processes object instances and tokenizes text arrays. It returns the number of tokenized words, an array of the tokenized words, and the tokenized words concatenated into a single string. It is part of the open-source DropSuit NLP library, released under the Apache License 2.0.

text-analysis text-processing language-understanding text-tokenization

Updated May 1, 2023
JavaScript

katanabana / Nihotip

Nihotip is a web app that lets users explore Japanese text through interactive tokenization and detailed insights. Built with React and Python, it offers a dynamic way to analyze words and symbols with tooltips for deeper understanding.

react tooltips python nlp language japanese text-analysis webapp japanese-language mecab tokenization japanese-characters wanakana text-tokenization japanese-learning sudachipy jmdictfurigana

Updated Sep 26, 2024
JavaScript

adilrasheed139 / AI-Powered-Resume-Screening-using-BERT

Successfully developed a resume classification model which can accurately classify the resume of any person into its corresponding job with a tremendously high accuracy of more than 99%.

nlp deep-learning word-embeddings nlp-machine-learning model-evaluation text-preprocessing bert-model text-tokenization fine-tuning-bert exploratory-data-analysis-eda word-embeddings-for-nlp

Updated Dec 14, 2024
Jupyter Notebook

SayamAlt / News-Category-Classification

Successfully developed a news category classification model using fine-tuned BERT which can accurately classify any news text into its respective category i.e. Politics, Business, Technology and Entertainment.

nlp text-classification exploratory-data-analysis feature-engineering model-evaluation text-cleaning text-preprocessing bert-embeddings text-tokenization fine-tuning-bert

Updated Jan 17, 2023
Jupyter Notebook

victoryosiobe / kingchop

Kingchop ⚔️ is a JavaScript library for tokenizing (chopping) English text. It applies a large set of tokenization rules, which you can adjust easily.

nodejs javascript natural-language-processing text-processing sentence-tokenizer text-tokenization word-tokenizer tokenizers paragraph-tokenizer

Updated May 26, 2025
JavaScript

LokeshKenche / ISP_ChatBot

ISPY ChatBot: ISPY is a chatbot designed for ISP customer service, providing automated responses and assistance for queries such as connection issues, payments, and service requests. Built in Python with libraries such as nltk and newspaper3k, it simulates conversation and handles customer interactions.

machine-learning chatbot nltk cosine-similarity webscraping nlp-machine-learning textanalysis customer-services newspaper3k text-tokenization text-based-chatbot article-parsing

Updated Apr 14, 2024
Jupyter Notebook

SayamAlt / Mental-Health-Classification-using-fine-tuned-DistilBERT

Successfully established a multiclass text classification model by fine-tuning a pretrained DistilBERT transformer model to classify several distinct types of mental health statuses such as anxiety, stress, personality disorder, etc. with an accuracy of 77%.

natural-language-processing deep-learning text-classification data-visualization model-evaluation text-preprocessing text-tokenization multiclass-text-classification distilbert-model model-inference model-training-and-evaluation distilbert-fine-tuning

Updated Jan 6, 2025
Jupyter Notebook

SayamAlt / Global-News-Headlines-Text-Summarization

Successfully established a text summarization model using Seq2Seq modeling with Luong Attention, which can give a short and concise summary of the global news headlines.

natural-language-processing text-generation text-summarization attention-mechanism seq2seq-model luong-attention text-tokenization model-inference model-architecture-and-implementation data-exploration-and-preprocessing

Updated May 6, 2024
Jupyter Notebook

gudashashank / EqualEyes

The project aimed to push image captioning technology forward by combining recent advances in image recognition and language modeling to generate novel, descriptive captions that go beyond just naming objects and actions.

vectorization cnn-model image-augmentation bleu-score blip cnn-rnn text-tokenization vision-transformer

Updated Jul 30, 2024

SayamAlt / Customer-Support-Chatbot-using-NLTK

Successfully developed a chatbot model which can provide accurate and concise responses to a wide variety of customer queries regarding the services offered by a particular company as well as general topics.

nlp deep-neural-networks deep-learning nltk chatbots text-tokenization

Updated Mar 29, 2023
Python

Aalaa4444 / Text_Processing-and-Unique_Word_Extraction_fromHTML

Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook utilizes various text processing techniques including cleaning, normalization, tokenization, lemmatization or stemming, and stop words removal.

tokenizer text-extraction requests data-extraction beautifulsoup text-processing tokenization stemming lemmatization stopwords-removal text-cleaning text-normalization extract-html text-tokenization text-lemmatization

Updated Apr 5, 2024
Jupyter Notebook
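
The pipeline described in the entry above (extract text from HTML, clean and normalize it, tokenize, drop stop words, collect unique words) can be sketched as follows. This is an illustrative sketch, not the notebook's code: it uses only the Python standard library in place of requests and BeautifulSoup, `TextExtractor` and `unique_words` are hypothetical names, the stop-word list is a tiny placeholder, and stemming or lemmatization is left out.

```python
import re
from html.parser import HTMLParser

# Assumption: a tiny illustrative stop-word list, not a complete one.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def unique_words(html: str) -> list[str]:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Normalize: join text fragments and lowercase them.
    text = " ".join(parser.parts).lower()
    # Tokenize: keep runs of ASCII letters only (a simplification).
    tokens = re.findall(r"[a-z]+", text)
    # Deduplicate and remove stop words.
    return sorted(set(tokens) - STOP_WORDS)


page = "<html><body><p>The cat and the Hat.</p><script>var x=1;</script></body></html>"
print(unique_words(page))
```

The regular expression keeps only ASCII letters, so accented or non-Latin words would need a broader pattern.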

text-tokenization

Here are 25 public repositories matching this topic...
