mendab1e/SolrCountriesSynonyms

SolrCountriesSynonyms

This repo contains a synonyms file covering all countries in all languages, for analyzers that accept the Solr synonym format. It can be used to configure a synonym token filter that explicitly maps country names in various languages to their English names.


The country data was gathered from country-list.

Usage

If you use Elasticsearch, you can define a synonym token filter like this:

"filter" : {
  "countries_synonyms" : {
    "type" : "synonym",
    "synonyms_path" : "countries_synonyms.txt"
  }
}

Then use countries_synonyms in any custom analyzer. You can find more information about the Synonym Token Filter in the Elasticsearch documentation.
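
As a rough sketch of that step, the index settings below wire the filter into a custom analyzer. Only the countries_synonyms filter definition comes from this README; the analyzer name `countries_analyzer`, the choice of the `standard` tokenizer, and the helper `countries_index_settings` are illustrative assumptions. The settings are built as a Ruby hash and printed as JSON, which could then be sent as the body of an index creation request.

```ruby
require 'json'

# Hypothetical index settings. The "countries_synonyms" filter mirrors the
# README; the analyzer name and the standard tokenizer are assumptions.
def countries_index_settings(synonyms_path = 'countries_synonyms.txt')
  {
    'settings' => {
      'analysis' => {
        'filter' => {
          'countries_synonyms' => {
            'type' => 'synonym',
            'synonyms_path' => synonyms_path
          }
        },
        'analyzer' => {
          'countries_analyzer' => {
            'type' => 'custom',
            'tokenizer' => 'standard',
            'filter' => ['countries_synonyms']
          }
        }
      }
    }
  }
end

# Print the JSON body for inspection.
puts JSON.pretty_generate(countries_index_settings)
```

In practice you would likely add further filters (for example lowercasing) to the analyzer; they are left out here to keep the sketch minimal.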

Tokenization to other languages

If you need country names mapped to a language other than English, you can generate a synonyms file yourself:

Download countries data

wget https://github.com/umpirsky/country-list/archive/master.zip

Extract it

unzip master.zip

Check all available languages

ls country-list-master/data

Run the generator with a language option. Here is an example for Russian:

ruby main.rb ru_RU
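
The repo's main.rb is not reproduced here. The following is only a minimal sketch of how such a generator could work, assuming each locale's data is a hash of country code to country name (the shape assumed for country-list's per-locale data) and that the output uses Solr's explicit mapping syntax `a, b => c`. The method name `synonym_lines` and the sample data are hypothetical.

```ruby
# Build Solr-format explicit mappings: every name a country has in the other
# locales maps to its name in the target locale.
# Assumed input shape (not taken from main.rb):
#   { "en" => { "RU" => "Russia" }, "ru_RU" => { "RU" => "Россия" } }
def synonym_lines(names_by_locale, target_locale)
  target = names_by_locale.fetch(target_locale)
  target.map do |code, target_name|
    variants = names_by_locale
               .reject { |locale, _| locale == target_locale }
               .map { |_, names| names[code] }
               .compact
               .uniq
               .reject { |name| name == target_name }
    # Skip countries that have no distinct name in any other locale.
    next if variants.empty?

    "#{variants.join(', ')} => #{target_name}"
  end.compact
end

# Tiny hand-written sample instead of the downloaded data.
sample = {
  'en'    => { 'RU' => 'Russia', 'DE' => 'Germany' },
  'de'    => { 'RU' => 'Russland', 'DE' => 'Deutschland' },
  'ru_RU' => { 'RU' => 'Россия', 'DE' => 'Германия' }
}

puts synonym_lines(sample, 'ru_RU')
```

Country names that themselves contain a comma would need extra handling, since the comma separates synonyms in this format; the sketch does not cover that case.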
