Chinese corpus cleaning and quality assessment for large model pre-training
Updated Jul 25, 2024 - Java
Accurate, fast, lightweight, multilingual, free and open-source next word prediction library
🍰 A library for creating n-grams, skip-grams, bag of words, bag of n-grams, bag of skip-grams.
Search API with spelling correction based on an n-gram index algorithm, implemented with Java Spring Boot and MySQL's ngram full-text search index
N-grams with basic smoothing
A project about search fill-up using Elasticsearch
Sample project for next word predictions using n-grams
Compares keyword frequency analyses between two bodies of text
Java n-gram cross-entropy (naturalness) calculation at line-level granularity.
Implementation of trainable ngram speech prediction as described in OSU Linguistics 3802
Performs an ngram frequency analysis on a text corpus stored in a spreadsheet