Multi-Label Genre and Form Classification of Book Reviews

This repository contains code and resources for fine-tuning a BERT-based model for multi-label genre and form classification of book reviews. It uses BERT-based language models from Nasjonalbiblioteket (Norwegian National Library) and a dataset drawn from the open API of Biblioteksentralen. The dataset is highly imbalanced.

Overview

Multi-Label Classification: Each book review can carry multiple genre and form labels.
Fine-Tuning BERT: A chosen BERT-based language model is fine-tuned on the dataset.
Evaluation: The model is evaluated using metrics such as the macro-averaged F1 score.

Resources

Bibbi Metadata REST API: Used for collecting book metadata, including reviews, genre and form labels (https://bibliografisk.bs.no/).
Norwegian Thesaurus on Genre and Form: Used for the genre and form vocabulary (https://www.nb.no/nbvok/ntsf/en/).
NB-BERT-base: Pre-trained Norwegian language model used for fine-tuning (https://huggingface.co/NbAiLab/nb-bert-base).
NB-BERT-large: Pre-trained Norwegian language model used for fine-tuning (https://huggingface.co/NbAiLab/nb-bert-large).

Getting Started

1. Install Python (Mac)

Install pyenv:

brew install pyenv

Install xz (if using M1 or M2 Mac):

brew install xz

Install Python (max version 3.12.*):

pyenv install 3.12.7

Switch to Python version:

pyenv global 3.12.7

Verify the Python version:

python --version

2. Set Up the Virtual Environment

In the root folder of the project, create a virtual environment for managing dependencies:

python -m venv env

Activate the virtual environment:

source env/bin/activate

Install requirements:

pip install -r requirements.txt

3. Install JupyterLab Desktop

https://github.com/jupyterlab/jupyterlab-desktop

Open the project in JupyterLab and activate the newly created virtual environment (upper right corner).

4. Create Dataset

The dataset contains metadata including reviews and associated genre and form labels. Since the dataset is highly imbalanced, techniques such as oversampling, undersampling, or data augmentation may be applied to improve the performance of the model.

Run the create_dataset.ipynb notebook to create the dataset.
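
Besides resampling, a common way to counter label imbalance is to weight rare labels more heavily in the loss. The sketch below is an illustration only and is not taken from the notebooks: it computes, for each label, the ratio of negative to positive examples, which is the kind of value often passed as `pos_weight` to a binary cross-entropy loss. The function name `positive_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def positive_weights(label_sets, labels):
    """Return the negative/positive ratio per label; rare labels get larger weights."""
    total = len(label_sets)
    # Count in how many reviews each label occurs (at most once per review).
    counts = Counter(label for review in label_sets for label in set(review))
    weights = {}
    for label in labels:
        positives = counts.get(label, 0)
        # Assumption: labels that never occur fall back to 1.0 to avoid division by zero.
        weights[label] = (total - positives) / positives if positives else 1.0
    return weights


# Toy data (made up): each entry holds the labels attached to one review.
reviews = [{"crime"}, {"crime"}, {"crime", "novel"}, {"fantasy"}]
weights = positive_weights(reviews, ["crime", "novel", "fantasy"])
print(weights)
```

In this toy example the frequent label gets a weight below 1 and the two rare labels get a weight of 3, so errors on rare labels would cost more during training.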

5. Describe Dataset

Run the describe_dataset.ipynb notebook to explore and visualize the dataset distribution.
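
As a rough idea of what exploring the label distribution can look like, the following sketch counts how often each label occurs and prints a small text histogram. It is not the notebook's code; the function name `label_distribution` and the sample labels are hypothetical.

```python
from collections import Counter


def label_distribution(label_lists):
    """Return (label, count, share of reviews) tuples, most frequent label first."""
    counts = Counter(label for labels in label_lists for label in labels)
    total = len(label_lists)
    return [(label, count, count / total) for label, count in counts.most_common()]


# Toy data (made up): one list of labels per review.
reviews = [["crime"], ["crime", "novel"], ["crime"], ["fantasy"]]
distribution = label_distribution(reviews)

for label, count, share in distribution:
    # One '#' per review that carries the label.
    print(f"{label:<10} {count:>3} {share:6.1%} {'#' * count}")
```

Because a review can carry several labels, the shares do not need to sum to 100 percent.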

6. Fine-Tuning

Choose and fine-tune a model by running the fine_tune_model.ipynb notebook. This notebook will:

load the dataset.
process the data for multi-label classification.
handle data imbalance using appropriate techniques.
fine-tune the model on the prepared dataset.
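
Processing data for multi-label classification usually means turning each review's label list into a fixed-length multi-hot vector, with one position per label in the vocabulary. The sketch below shows that encoding under the assumption that the notebook does something similar; `multi_hot`, `label_index` and the three labels are hypothetical names chosen for illustration.

```python
def multi_hot(review_labels, label_index):
    """Encode a list of labels as a float vector with 1.0 at each label's position."""
    vector = [0.0] * len(label_index)
    for label in review_labels:
        vector[label_index[label]] = 1.0
    return vector


# Hypothetical label vocabulary; the real one comes from the genre and form thesaurus.
labels = ["crime", "fantasy", "novel"]
label_index = {label: position for position, label in enumerate(labels)}

encoded = multi_hot(["crime", "novel"], label_index)
print(encoded)  # [1.0, 0.0, 1.0]
```

Floats are used rather than integers because binary cross-entropy losses, the usual choice for multi-label training, generally expect floating-point targets.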

7. Classification

Once the model has been fine-tuned, you can use the genre_classification.ipynb notebook to classify new book reviews by genre and form. This notebook allows you to:

load the fine-tuned model and checkpoint.
input book reviews for genre classification.
output the predicted genre and form labels for the reviews.
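
In a multi-label setting, each label is typically decided independently: the model's raw score (logit) for a label is passed through a sigmoid, and the label is kept if the resulting probability reaches a threshold. The sketch below illustrates that decoding step with made-up logits; the function name `decode_predictions`, the 0.5 threshold and the label names are assumptions, not values taken from the notebook.

```python
import math


def decode_predictions(logits, labels, threshold=0.5):
    """Return the labels whose sigmoid probability reaches the threshold."""
    predicted = []
    for logit, label in zip(logits, labels):
        probability = 1.0 / (1.0 + math.exp(-logit))
        if probability >= threshold:
            predicted.append(label)
    return predicted


# Hypothetical label vocabulary and model output for a single review.
labels = ["crime", "fantasy", "novel"]
predicted = decode_predictions([2.1, -3.0, 0.4], labels)
print(predicted)  # ['crime', 'novel']
```

Lowering the threshold trades precision for recall, which can matter for rare labels in an imbalanced dataset.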

8. Evaluation and F1 Macro Score

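The model's performance is evaluated using several metrics, including the macro-averaged F1 score (F1 macro). Because it averages the per-label F1 scores with equal weight, rare labels count as much as frequent ones, which makes it well suited to imbalanced datasets like this one.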

After training the NB-BERT-base model for one epoch, the F1 macro score was 0.83.

After training the NB-BERT-large model for one epoch, the F1 macro score was 0.89.
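
To make the metric concrete, here is a minimal sketch of how a macro-averaged F1 score can be computed from multi-hot label vectors: one F1 score per label, then an unweighted mean. It is written for illustration and is not the evaluation code from the notebook; the function name `macro_f1` and the toy data are hypothetical, and scoring a label with no true or predicted positives as 0.0 is an assumption.

```python
def macro_f1(y_true, y_pred):
    """Average the per-label F1 scores, giving every label equal weight."""
    num_labels = len(y_true[0])
    scores = []
    for j in range(num_labels):
        pairs = [(truth[j], pred[j]) for truth, pred in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denominator = 2 * tp + fp + fn
        # Assumption: a label with no true or predicted positives scores 0.0.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / num_labels


# Toy data: label 0 is frequent and always predicted correctly,
# label 1 is rare and is missed by the predictions.
y_true = [[1, 0], [1, 0], [1, 1], [1, 0]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 0]]
score = macro_f1(y_true, y_pred)
print(score)  # 0.5
```

The frequent label scores 1.0 and the missed rare label scores 0.0, so the macro average is 0.5 even though most individual predictions are correct.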
