
Toxicity detector

A simple HTTP server for checking text toxicity (specifically for banned words).

Model

This project uses the one-for-all-toxicity-v3 model, which is distributed under the CC-BY-4.0 license. The model supports multilingualism (55 languages). It was trained on the toxi-text-3M dataset.
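
If you just want to experiment with the model outside this server, it can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch, not part of the project; it assumes the model files have been downloaded to ./model as described in the Installation section.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and classifier from the local ./model directory
# (see the download step in the Installation section).
tokenizer = AutoTokenizer.from_pretrained("./model")
model = AutoModelForSequenceClassification.from_pretrained("./model")
model.eval()

with torch.no_grad():
    inputs = tokenizer("some text to check", return_tensors="pt")
    logits = model(**inputs).logits  # two values, as shown in the log output section
    prediction = int(torch.argmax(logits, dim=-1))  # 0 or 1, as in the Usage section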

Installation

Warning

The project is designed to run on CPU. If you want to use a GPU, you will have to replace the torch dependency in pyproject.toml.

Tip

Recommendation when choosing a VPS: the model runs much faster on Apple processors.

Local Run

Important

Minimum requirement: Python 3.9.

This project uses Rye for dependency management, but the dependencies can also be installed via pip, so installing Rye is not strictly required.

  1. Clone the repository

    git clone https://github.com/twirapp/toxicity-detector.git && cd toxicity-detector
  2. Download the model. Run this in the project root directory (an optional Python alternative using huggingface_hub is sketched after this list):

    mkdir -p ./model && cd ./model && \
    curl -L -O https://huggingface.co/FredZhang7/one-for-all-toxicity-v3/resolve/main/config.json && \
    curl -L -O https://huggingface.co/FredZhang7/one-for-all-toxicity-v3/resolve/main/pytorch_model.bin && \
    curl -L -O https://huggingface.co/FredZhang7/one-for-all-toxicity-v3/resolve/main/special_tokens_map.json && \
    curl -L -O https://huggingface.co/FredZhang7/one-for-all-toxicity-v3/resolve/main/tokenizer.json && \
    curl -L -O https://huggingface.co/FredZhang7/one-for-all-toxicity-v3/resolve/main/tokenizer_config.json && \
    curl -L -O https://huggingface.co/FredZhang7/one-for-all-toxicity-v3/resolve/main/vocab.txt && \
    cd ..
  3. Install dependencies

    This will automatically create the virtual environment in the .venv directory and install the required dependencies.

    rye sync
    Alternative (not recommended): install via pip. Create a virtual environment and activate it:

    python3 -m venv .venv && source .venv/bin/activate

    Install only the required dependencies:

    pip3 install --no-deps -r requirements.lock
  4. Run the server

    With auto-reload:

    rye run dev-server

    Without auto-reload:

    rye run server

    Without Rye:

    With auto-reload:

    uvicorn app.server:app --reload

    Without auto-reload:

    uvicorn app.server:app
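
As an optional alternative to the curl commands in step 2, the model files can also be fetched from Python with the huggingface_hub package (an extra dependency, not part of this project). A minimal sketch:

from huggingface_hub import snapshot_download

# Download all files of the model repository into ./model,
# the default location expected by MODEL_PATH.
snapshot_download(
    repo_id="FredZhang7/one-for-all-toxicity-v3",
    local_dir="./model",
)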

Docker Hub

You can pull the pre-built Docker image from Docker Hub:

docker pull twirapp/toxicity-detector

And run it with the command:

docker run --rm -p 8000:8000 --name toxicity-detector twirapp/toxicity-detector

Docker Build

  1. Clone the repository

    git clone https://github.com/twirapp/toxicity-detector.git && cd toxicity-detector
  2. Build the Docker image

    docker build -t toxicity-detector .
  3. Run the container

    docker run --rm -p 8000:8000 --name toxicity-detector toxicity-detector

Docker Compose

Create a docker-compose.yml file with the following content:

services:
  toxicity-detector:
    image: twirapp/toxicity-detector
    ports:
      - "8000:8000"
    environment:
      TOXICITY_THRESHOLD: 0
      # WEB_CONCURRENCY: 1 # uvicorn workers count

Then run:

docker compose up -d

Usage

Make a GET request to / or /predict (preferred) with the query parameter ?text=your text here. The result will be 0 or 1: 0 means the text is considered non-toxic, 1 means it is considered toxic. Curl command for testing:

curl -G 'http://localhost:8000/predict' --data-urlencode 'text=test text'

Note

--data-urlencode is needed for non-English characters to work, for example Russian (Cyrillic).
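
The same request can be made programmatically. The snippet below uses the Python requests library (not a project dependency) against the default host and port; query parameters are URL-encoded automatically, so non-English text also works here.

import requests

# Ask the server whether the given text is toxic.
resp = requests.get(
    "http://localhost:8000/predict",
    params={"text": "test text"},  # requests URL-encodes this for you
)
resp.raise_for_status()
print(resp.text)  # "0" (non-toxic) or "1" (toxic), as described above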

Environment variables

  • MODEL_PATH - path to the directory where the model files (which you should have downloaded) are stored. Default: ./model
  • TOXICITY_THRESHOLD - the level below which the text will be considered toxic. This is a float value, for example: -0.2, -0.05, 1. Default: 0, in which case the argmax function is used instead.
  • WEB_CONCURRENCY - number of uvicorn worker processes. Defaults to 1 if the variable is not set. Note: not compatible with the --reload option.
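
To make the TOXICITY_THRESHOLD behaviour concrete, here is an illustrative sketch of the decision logic implied by the description above and by the log format below. It is not the project's actual code; the two numbers correspond to the pair of values printed in the logs.

import os

def is_toxic(first: float, second: float) -> bool:
    # first/second are the two model outputs shown in the log line below.
    threshold = float(os.getenv("TOXICITY_THRESHOLD", "0"))
    if threshold != 0:
        # With a threshold set, the text counts as toxic when the first
        # value falls below it (more negative = more toxic).
        return first < threshold
    # Default (0): plain argmax over the two values.
    return second > first

print(is_toxic(9.583, -9.616))  # False, matching the example log line below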

Explanation of the log output

01-24 19:01:34 | 0.568 sec | 9.583, -9.616 | False | 'text'

  1. 01-24 19:01:34 - date and time of the log message: month, day, and time.
  2. 0.568 sec - execution time of the model call.
  3. 9.583, -9.616 - values returned by the model. The first relates to how toxic the text is; the second is the opposite of the first. When setting TOXICITY_THRESHOLD, look at the first number: the more negative it is, the more toxic the text.
  4. False - prediction result based on TOXICITY_THRESHOLD (if set) or on the argmax function.
  5. 'text' - the text that was passed to the model, after removing emoji and converting to lowercase.