KonText (czcorpus/kontext): An advanced, extensible web front-end for the Manatee-open corpus search engine

Introduction

KonText is an advanced corpus query interface and corpus data integration platform built around the Manatee-open corpus search engine. It is written in Python 3 and TypeScript and runs on any major Linux distribution. Its development is maintained by the Institute of the Czech National Corpus.

Features

fully editable query chain
- any operation in a user-defined sequence (e.g. query -> filter -> sample -> sorting) can be changed, after which the whole sequence is re-executed
multiple search modes:
- concordance
- paradigmatic query
- word list
- keyword analysis
simple and advanced query types
- advanced CQL editor with syntax highlighting and attribute recognition
- interactive PoS tag composing tool for positional and key-value tagsets
- customizable query suggestions and refinement of simple queries (e.g. for homonym disambiguation)
support for spoken corpora
- defined text segments can be played back as audio
- KWIC detail with easily distinguishable speeches
rich concordance view options and tools
- any positional attribute can be set as primary
- multiple ways to display other attributes
- user-defined line groups (filtering, reviewing group ratios)
- tokens and KWICs can be connected to external data services (e.g. dictionaries, encyclopedias)
- individual tokens can be linked to each other using an external service (e.g. for word translation equivalents)
rich subcorpus-related functionality
- any subcorpus is accessible to other users who obtain its URL (by default, a subcorpus is not otherwise discoverable)
  - once a public description is set, the subcorpus can be discovered on the "public subcorpora" page
- text type metadata can be gradually refined to define a specific subcorpus ("which publishers are there if only fiction is selected?")
- a custom text types ratio can be defined ("give me 20% fiction and 80% journalism")
- unused subcorpora can be archived (URLs referring to the subcorpus remain valid) or completely removed (such URLs become invalid)
- searching within a subcorpus can be further refined with ad-hoc text type selection
- a subcorpus can be created with respect to aligned corpora ("give me fiction in Czech but only if there is an English translation for it")
frequency distribution
- univariate
  - positional attributes (including tuples of multiple attributes per token)
  - structural attributes
- multivariate distribution (2 dimensions) for both positional and structural attributes
collocation analysis
persistent URLs - any result page can be easily shared even if the original query is megabytes long
access to query history and named queries
convenient corpus access
- finding corpora by keyword (tag), size or description
- adding corpora to favorites (incl. subcorpora and aligned corpora)
saving results to Excel, CSV, XML, JSONL or TXT
HTTP API access
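The custom text types ratio feature ("give me 20% fiction and 80% journalism") comes down to a small piece of arithmetic. The following Python sketch only illustrates that arithmetic and is not KonText's actual subcorpus algorithm. It assumes the token count of each text type is known and that any number of tokens can be taken from each type. A real subcorpus is built from whole documents, so actual sizes would only approximate these numbers. The function name `max_mix` is hypothetical.

```python
from typing import Dict


def max_mix(available: Dict[str, int], percent: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical helper: tokens to take from each text type so that the
    requested percentages hold and the subcorpus is as large as possible.

    available -- number of tokens available per text type
    percent   -- requested share of each text type in whole percent (sums to 100)
    """
    if sum(percent.values()) != 100:
        raise ValueError("percentages must sum to 100")
    # Text type t caps the total size T via T * percent[t] / 100 <= available[t],
    # so the largest feasible T is the smallest cap. Integer arithmetic avoids
    # floating-point rounding errors.
    total = min(available[t] * 100 // p for t, p in percent.items() if p > 0)
    return {t: total * p // 100 for t, p in percent.items()}


# "give me 20% fiction and 80% journalism"
print(max_mix({"fiction": 1_000_000, "journalism": 2_000_000},
              {"fiction": 20, "journalism": 80}))
# {'fiction': 500000, 'journalism': 2000000}
```

In this example journalism is the limiting text type: 2,000,000 tokens can cover at most 80% of a 2,500,000-token subcorpus, so only 500,000 of the 1,000,000 fiction tokens are used.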

Internal features

modern client-side application (written in TypeScript, event stream architecture, React components, extensible)
server side written using the Sanic framework, with fully decoupled background calculation of concordances, frequencies and collocations (using an integrated Rq worker server)
modular code design with dynamically loadable plug-ins providing custom implementations of functionality (e.g. custom database adapters, authentication methods, corpus listing widgets, HTTP session management)
- easy integration with existing information systems
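The following minimal Python sketch shows what "dynamically loadable plug-ins" can mean in practice, using the standard `importlib` module. It is not KonText's plug-in API. The configuration shape (a mapping from a plug-in slot to a dotted module path) and the `create_instance()` factory that each module is expected to expose are assumptions made for this example, and the `demo_auth` module is a hypothetical stand-in.

```python
import importlib
import sys
import types
from typing import Any, Dict


def load_plugins(config: Dict[str, str]) -> Dict[str, Any]:
    """Import each configured plug-in module and instantiate it.

    config maps a plug-in slot (e.g. "auth") to a dotted module path.
    Each module is assumed (hypothetically) to expose create_instance().
    """
    instances: Dict[str, Any] = {}
    for slot, module_path in config.items():
        module = importlib.import_module(module_path)  # resolved at runtime
        factory = getattr(module, "create_instance", None)
        if factory is None:
            raise ImportError(f"plug-in {module_path!r} has no create_instance()")
        instances[slot] = factory()
    return instances


class DemoAuth:
    """Hypothetical authentication plug-in accepting only the 'guest' user."""

    def validate_user(self, username: str, password: str) -> bool:
        return username == "guest"


# Register a throw-away in-memory module so the example runs without files on disk.
demo_module = types.ModuleType("demo_auth")
demo_module.create_instance = DemoAuth  # the class itself serves as the factory
sys.modules["demo_auth"] = demo_module

plugins = load_plugins({"auth": "demo_auth"})
print(plugins["auth"].validate_user("guest", ""))  # True
```

With this design, swapping an implementation (e.g. using a different authentication backend) only requires pointing the configuration at another module. The calling code stays unchanged.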

Installation

Docker

Running KonText as a set of Docker containers is the most convenient and flexible way to deploy it. Docker Compose V2 is required. To run an instance with a basic configuration (i.e. no MySQL/MariaDB server and no WebSocket server), use:

docker compose up

To run a production-grade instance:

docker compose -f docker-compose.yml -f docker-compose.mysql.yml --env-file .env.mysql up

(the .env.mysql file allows configuring custom MySQL/MariaDB credentials and a custom KonText configuration file)

Manual installation

Key requirements

Python 3.6 (or newer)
Manatee corpus search engine, version 2.167.8 or newer (for KonText v0.17, Manatee v2.2xx is recommended)
a key-value store
- Redis (recommended), SQLite (supported), custom implementations possible
a task queue - Rq
HTTP proxy server
- Nginx (recommended), Apache, ...

Ubuntu users are advised to use the install script, which should perform most of the steps necessary to install and run KonText. On other Linux distributions, we recommend running KonText in a container or a virtual machine. Please refer to the doc/INSTALL.md file for details.
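Before a manual installation, a quick check of the key requirements listed above can save some debugging. The following Python sketch is an unofficial helper, not part of KonText. It only tests whether the relevant Python modules can be found and whether an `nginx` executable is on `PATH`. It does not verify versions, and it does not check whether a Redis server is actually running. It assumes the Manatee-open Python bindings are importable as `manatee` and that Redis and Rq are used through their usual Python packages `redis` and `rq`. The function name `check_requirements` is hypothetical.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_requirements() -> Dict[str, bool]:
    """Unofficial heuristic: report which key requirements look available."""
    return {
        "python>=3.6": sys.version_info >= (3, 6),
        # Manatee-open Python bindings (assumed module name: manatee)
        "manatee": importlib.util.find_spec("manatee") is not None,
        # Python client for the Redis key-value store (the server is not contacted)
        "redis": importlib.util.find_spec("redis") is not None,
        # Rq task queue
        "rq": importlib.util.find_spec("rq") is not None,
        # HTTP proxy server executable on PATH (Nginx recommended)
        "nginx": shutil.which("nginx") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:12} {'OK' if ok else 'missing'}")
```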

Customization and contribution

Please refer to our Wiki.

Notable users

Institute of the Czech National Corpus
LINDAT/CLARIAH-CZ
CLARIN-PL
CLARIN-SI
Serbski Institut (API version of KonText)

How to cite KonText

Tomáš Machálek (2020) - KonText: Advanced and Flexible Corpus Query Interface

@inproceedings{machalek-2020-kontext,
    title = "{K}on{T}ext: Advanced and Flexible Corpus Query Interface",
    author = "Mach{\'a}lek, Tom{\'a}{\v{s}}",
    booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.865",
    pages = "7003--7008",
    language = "English",
    ISBN = "979-10-95546-34-4",
}
