Resume Analysis Model System, ml-talent-match

Description

This project is designed to automate the processes of analyzing and evaluating resumes using cutting-edge natural language processing (NLP) approaches. The system consists of three key components:

Resume Corpus Collection and Preprocessing: The model was trained on a corpus of approximately 135,000 resumes collected from the internet.
LaBSE (Language-agnostic BERT Sentence Embeddings): A multilingual model pretrained for feature extraction. It can be used to map 109 languages to a shared vector space.
LoRA (Low-Rank Adaptation): An efficient fine-tuning method that doesn't require a significant increase in the number of model parameters.

Components

01. Resume Corpus and LaBSE

We assembled a resume corpus and utilized LaBSE for "post-pretraining" the model, allowing it to adapt to the resume format.

02. LaBSE Fine-tuning

The LaBSE model underwent rigorous fine-tuning on our collected corpus to enhance the accuracy of named entity recognition.

03. LoRA Adapter

The use of LoRA enables model fine-tuning without a significant increase in parameter count, making it a "lighter" method compared to traditional approaches.

Results

After fine-tuning LaBSE, our results show:

Precision for named entity recognition: 0.94
F1-score for named entity recognition: 0.95

Application

The model can be used for automated processing and analysis of resumes in HR, improving job search and recommendation systems, and in other areas where resume information extraction is necessary.

Repo

.py files:

model.py - NERModel used for NER in the service
preprocessing.py - preprocessor is used for cleaning the input string
parser.py - ResumeParser our parser that can parse different data formats into text: doc, docx, pdf, md (useful for github), and even png and jpg images with OCR

.ipynb files:

pretraining.ipynb - notebok for performing pretraining on large Resume Corpus
finetuning.ipynb - notebok for model finetuning

Contact

For additional information and inquiries, please contact us:

Email: malbashich@yandex.ru,
Phone: +7 985 475 86 50

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
bot		bot
.gitignore		.gitignore
README.md		README.md
finetuning.ipynb		finetuning.ipynb
model.py		model.py
parser.py		parser.py
preprocessing.py		preprocessing.py
pretraining.ipynb		pretraining.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Resume Analysis Model System, ml-talent-match

Description

Components

01. Resume Corpus and LaBSE

02. LaBSE Fine-tuning

03. LoRA Adapter

Results

Application

Repo

Contact

About

Releases

Packages

Contributors 3

Languages

leffff/ml-talent-match

Folders and files

Latest commit

History

Repository files navigation

Resume Analysis Model System, ml-talent-match

Description

Components

01. Resume Corpus and LaBSE

02. LaBSE Fine-tuning

03. LoRA Adapter

Results

Application

Repo

Contact

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages