AI German Easy Language Browsing
- About the Project
- Browser Extension
- Evaluation
- Authors
- License
- Citation
This project was created as part of my master's thesis in Computer Science at the Munich University of Applied Sciences. It consists of two parts:
- `browser-extension`: Implementation of a browser extension that uses local LLMs to translate web content into German "Easy Language", also known as "Leichte Sprache".
- `evaluation`: Python-based scripts to assess the suitability of different LLMs for the German "Easy Language" use case.
TODO
TODO
Running the Python scripts requires you to have modern versions of
- Python as the programming language and
- Poetry as the dependency management tool
installed on your system. Please check the Python documentation and the Poetry documentation for installation instructions.
The exact compatible versions of Python and Poetry can be found in the `pyproject.toml` file inside the `evaluation` directory.
Once these requirements are met, you only need to run `poetry install` inside the `evaluation` directory to download the required packages.
All mentioned scripts can be run via Poetry using the following command: `poetry run python <script-name>.py`
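Assuming you start from the repository root, the setup steps above can be sketched as follows (`<script-name>.py` is a placeholder for one of the evaluation scripts):

```shell
# Move into the evaluation part of the repository
cd evaluation

# Install the dependency versions pinned in pyproject.toml
poetry install

# Run any of the evaluation scripts inside Poetry's virtual environment
poetry run python <script-name>.py
```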
Before starting the evaluation, you can define which models you want to evaluate in the `models.csv` file inside the `evaluation` directory. Only models from the HuggingFace platform can be used.
The file has the following columns:
- `huggingface_repo`: repository name of the model (e.g. `google/gemma-3-4b-it`).
- `gguf_filename` (optional): only required when a `.gguf`-based model should be downloaded from the repository; if left empty, `huggingface_repo` is assumed to be a standard model compatible with the `transformers` library.
- `gated`: `True` or `False`, depending on whether the model is gated (e.g. when your HuggingFace account must consent to a license agreement before accessing the model).
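Based on the columns described above, a `models.csv` entry can be parsed like this. The snippet is a minimal sketch: the first row uses the `gemma` example from the text, while the second row's repository and filename are purely hypothetical, and the actual evaluation scripts may read the file differently.

```python
import csv
import io

# Illustrative models.csv content (column names as described above;
# the second row's repository and filename are hypothetical examples).
MODELS_CSV = """\
huggingface_repo,gguf_filename,gated
google/gemma-3-4b-it,,True
example-org/example-model-GGUF,example-model-Q4_K_M.gguf,False
"""

rows = list(csv.DictReader(io.StringIO(MODELS_CSV)))
for row in rows:
    # An empty gguf_filename means a standard transformers-compatible model
    is_gguf = bool(row["gguf_filename"])
    print(row["huggingface_repo"], "GGUF" if is_gguf else "transformers")
```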
The `.env` file inside the `evaluation` directory allows further customization of the evaluation behaviour. The relevant environment variables are the following:
- `HF_TOKEN` (optional): HuggingFace token for your account, used to fetch gated models that you have been granted access to on the platform. See the HuggingFace documentation for further information.
- `USE_CPU`: `True` or `False`, depending on whether you want to use your CPU or GPU for LLM inference.
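A minimal `.env` could look like this (the token value is a placeholder, not a real token):

```
# Optional: HuggingFace access token, only needed for gated models
HF_TOKEN=hf_your_token_here
# Use the CPU instead of the GPU for LLM inference
USE_CPU=False
```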
Note: To make GPU inference work on your machine, you might have to take additional steps to enable your GPU backend in `llama-cpp-python` (used for GGUF inference). See the official documentation for further information. `transformers` should auto-detect your GPU backend, since PyTorch is used under the hood.
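As an example, on NVIDIA GPUs `llama-cpp-python` can be reinstalled with its CUDA backend enabled. The exact CMake flag depends on your `llama-cpp-python` version and GPU backend (Metal, Vulkan, etc. use different flags), so treat this as a sketch and check the official installation documentation:

```shell
# Reinstall llama-cpp-python with the CUDA backend enabled
# (flag per the llama-cpp-python installation docs; other backends
# such as Metal or Vulkan use different CMake flags)
CMAKE_ARGS="-DGGML_CUDA=on" poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
```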
To download the models you selected for evaluation, run the download script with `poetry run python src/01_download_models.py` from inside the `evaluation` directory.
The script will read the content of the `models.csv` file and ask you to confirm the download before starting.
The downloaded models will be stored in the `.cache` folder inside the `evaluation` directory for later use.
Tip: If you interrupt the model downloads by quitting the script, it will automatically resume the downloads where they stopped the next time you run it.
When you experiment with different models, your `.cache` folder might fill up quickly, and unused models unnecessarily take up storage space. You can run the cleanup script with `poetry run python src/cleanup.py` to get rid of all the models in your `.cache` directory.
TBD
TBD
TBD
- Tobias Stadler - devtobi
Distributed under the MIT License. See LICENSE for more information.
If you reuse my work, please cite my thesis as follows:
If you are interested in reading the thesis, you can find it at ADD TITLE.