expand umlauts #146

Closed
wants to merge 2 commits into from

@missinglink (Member) commented Jul 21, 2016

[DO NOT MERGE] at this time

"umlaut" : {
  "type" : "mapping",
  "mappings" : [
    "ä=>ae",
    "ö=>oe",
    "ü=>ue",
    "Ä=>Ae",
    "Ö=>Oe",
    "Ü=>Ue",
    "ß=>ss"
  ]
}
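Elasticsearch's `mapping` char filter rewrites matching characters before the tokenizer runs. The Python sketch below is a rough, hypothetical emulation of the `umlaut` filter above, not Pelias or Elasticsearch code. The name `expand_umlauts` is made up for illustration. Because every key in the mapping is a single character, a per-character `str.translate` table should produce the same substitutions.

```python
# Hypothetical emulation of the "umlaut" mapping char filter (illustration only).
UMLAUT_MAPPINGS = {
    "ä": "ae",
    "ö": "oe",
    "ü": "ue",
    "Ä": "Ae",
    "Ö": "Oe",
    "Ü": "Ue",
    "ß": "ss",
}

# str.maketrans accepts a dict mapping single characters to replacement strings.
_UMLAUT_TABLE = str.maketrans(UMLAUT_MAPPINGS)


def expand_umlauts(text: str) -> str:
    """Replace each umlaut (and ß) with its two-letter expansion."""
    return text.translate(_UMLAUT_TABLE)


if __name__ == "__main__":
    for word in ["Österreich", "Müller", "Straße", "Zürich"]:
        print(word, "->", expand_umlauts(word))
```

This assumes precomposed (NFC) input. An `Ö` typed as `O` followed by a combining diaeresis would pass through unchanged, just as it would not match the literal keys in the mapping above.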

closes #74
closes pelias/pelias#319

@missinglink (Member, Author)

Fun fact of the day:

In Germany, a Big Mac used to be a Big Mäc.

And the Filet-o-Fish was the Fishmäc. The spelling with the umlaut actually gets German speakers a little closer to the English pronunciation of “Mac.” But in 2007 McDonald’s took away the umlauts, and now Germans have to get boring ol’ Big Macs like the rest of us.


@missinglink (Member, Author)

This PR would break some existing behavior, e.g.:

Records indexed as Österreich would be queryable using the input Oesterreich but not Osterreich.

Prior to this PR, the term Österreich was retrievable using the input Osterreich, so this would be a breaking change.
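The breaking change can be reproduced with a toy model. This sketch assumes, as an approximation rather than the actual pelias/schema analyzers, that pre-PR analysis stripped diacritics (roughly what ASCII folding does) while this PR expands umlauts instead. A query matches only when index-time and query-time analysis produce the same string. `fold`, `expand` and `matches` are hypothetical helpers.

```python
import unicodedata

UMLAUT_TABLE = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def fold(text: str) -> str:
    """Pre-PR model: decompose (NFKD), drop combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def expand(text: str) -> str:
    """This PR's model: expand umlauts to two letters, then lowercase."""
    return unicodedata.normalize("NFC", text).translate(UMLAUT_TABLE).lower()


def matches(analyzer, indexed: str, query: str) -> bool:
    """A query hits a record when both sides analyze to the same term."""
    return analyzer(indexed) == analyzer(query)


if __name__ == "__main__":
    for query in ["Österreich", "Oesterreich", "Osterreich"]:
        print(f"{query:12} fold={matches(fold, 'Österreich', query)} "
              f"expand={matches(expand, 'Österreich', query)}")
```

Under this model each analyzer supports exactly two of the three spellings: folding loses Oesterreich, and expansion loses Osterreich.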

It seems like a better strategy would be to import two versions of the same token: once as Oesterreich and once as Osterreich.
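A minimal sketch of importing both versions, assuming a toy model in which each name is indexed as two lowercased variants (umlauts expanded, and diacritics stripped). A query matches if any of its variants equals any indexed variant. The helpers `fold`, `expand`, `variants` and `matches` are hypothetical, not Pelias code.

```python
import unicodedata

UMLAUT_TABLE = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def fold(text: str) -> str:
    """Drop diacritics (Ö -> O) and lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def expand(text: str) -> str:
    """Expand umlauts to two letters (Ö -> Oe) and lowercase."""
    return unicodedata.normalize("NFC", text).translate(UMLAUT_TABLE).lower()


def variants(text: str) -> set[str]:
    """Both normalized forms; they collapse to one for names without umlauts."""
    return {fold(text), expand(text)}


def matches(indexed: str, query: str) -> bool:
    """Match if any index-time variant equals any query-time variant."""
    return bool(variants(indexed) & variants(query))


if __name__ == "__main__":
    print(sorted(variants("Österreich")))
    for query in ["Österreich", "Oesterreich", "Osterreich"]:
        print(query, matches("Österreich", query))
```

With both variants stored, Österreich, Oesterreich and Osterreich all match. Names without umlauts yield a single variant, so nothing extra is indexed for them.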

@orangejulius (Member)

Do we want to close this PR in favor of something ICU-related?

@missinglink (Member, Author)

I don't believe the ICU library will handle this type of expansion.

@missinglink (Member, Author)

This feature will require alt-names, as the country name above can have three forms:

Österreich
Oesterreich
Osterreich

Moving this to the alt-names feature, since until that lands only two of the three forms can be supported.

@trescube (Contributor) commented May 30, 2017

Closed due to the new hotness.

@trescube closed this May 30, 2017
Linked issues (this pull request would have closed them if merged):

expand umlaut, eg. 'ö' -> 'oe'
Diacriticals expanded into two characters in query don't work
3 participants