Collection of language identification libraries for .NET: FastText, CLD2, CLD3, MediaPipe, Lingua, Whatlang

gluschenko/panlingo

Panlingo

Overview

Welcome to the Panlingo repository! 🚀

This project is a collection of language identification libraries for .NET. Its purpose is to bring popular language identification models to the .NET ecosystem so that developers can add language detection to their applications.

Libraries

Each library is released as a NuGet package:

  • Panlingo.LanguageIdentification.CLD2
  • Panlingo.LanguageIdentification.CLD3
  • Panlingo.LanguageIdentification.FastText
  • Panlingo.LanguageIdentification.Whatlang
  • Panlingo.LanguageIdentification.MediaPipe
  • Panlingo.LanguageIdentification.Lingua
  • Panlingo.LanguageCode
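
As a minimal sketch, the packages listed above can be added to a project with the standard `dotnet add package` command. This assumes the .NET SDK is installed and the commands are run from a directory that contains a project file. The two packages chosen here are only an illustration; install whichever detectors you need.

```shell
# Assumption: run from a directory containing a .csproj file, with the .NET SDK on PATH.
# Each detector is a separate package, so add only the ones you plan to use.
dotnet add package Panlingo.LanguageIdentification.CLD2
dotnet add package Panlingo.LanguageIdentification.FastText
```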

Contents

  1. Models
    1. CLD2
    2. CLD3
    3. FastText
    4. Whatlang
    5. MediaPipe
    6. Lingua
  2. Features
  3. Platform Support
  4. Key Concerns
  5. TODO

Models

CLD2

CLD3

FastText

Whatlang

MediaPipe

Lingua

Key concerns

  • Zero-dependency development.
  • The original code of the libraries (CLD2, CLD3, FastText, Whatlang) is used via submodules, without additional modifications or improvements. Third-party code is not included in this repository.
  • Preserve the original library behavior without breaking changes.

Features

| Feature | CLD2 | CLD3 | FastText* | Whatlang | MediaPipe** | Lingua |
|---|---|---|---|---|---|---|
| Single language prediction | Yes | Yes | Yes | Yes | Yes | Yes |
| Multi language prediction | Yes | Yes | Yes | No | Yes | Yes |
| Supported languages | 80 | 107 | 176 or 217 | 69 | 110 | 75 |
| Unknown language detection | Yes | Yes | No | No | Yes | No |
| Algorithm | quadgrams | neural network | neural network | trigrams | neural network | trigrams |
| Script detection | No | No | Yes (only lid218e) | Yes | No | No |

* When using these models: lid176, lid218e

** When using MediaPipe Language Detector

Platform support

| Model | Linux | Windows | macOS | Blazor WASM |
|---|---|---|---|---|
| CLD2 | ✅ | 🚧 | ❌ | ❌ |
| CLD3 | ✅ | 🚧 | ❌ | ❌ |
| FastText | ✅ | 🚧 | ❌ | ❌ |
| Whatlang | ✅ | 🚧 | ❌ | ❌ |
| MediaPipe | ✅ | 🚧 | ❌ | ❌ |
| Lingua | ✅ | 🚧 | ❌ | ❌ |

✅ Full support | ❌ No support | 🚧 Under research

TODO

  • Research support for other platforms (Windows, macOS).
  • Add more unit tests.
  • Implement more native methods (FastText).
  • Self-contained models (FastText + MediaPipe).
  • Remove protobuf dependency (CLD3).

Feel free to open issues or contribute to the repository. Together, let's enhance the .NET language identification capabilities! 🌐


Happy hacking! 👩‍💻👨‍💻