Sorting Arabic, Farsi or Hebrew numbers is not "natural" #52
I can make this work for strings that start with a number by extending the regular expressions that check for digits so that they accept runs of multiple Unicode digits.
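As a rough illustration of why extending the digit regex helps, assuming Python 3: in a `str` pattern, `\d` matches any Unicode decimal digit (category Nd), while the `re.ASCII` flag restricts it to `[0-9]`. In Python 2 the default is the other way round and `re.UNICODE` is needed. The helper `digit_runs` is hypothetical and not part of natsort.

```python
import re

# Python 3 str patterns: \d matches any Unicode decimal digit (category Nd).
UNICODE_DIGITS = re.compile(r"\d+")
# With re.ASCII, \d matches only the ASCII digits 0-9.
ASCII_DIGITS = re.compile(r"\d+", re.ASCII)


def digit_runs(pattern, text):
    """Hypothetical helper: return each maximal digit run in text as an int.

    int() accepts any run of Unicode decimal digits, so "۱۲" converts to 12.
    """
    return [int(run) for run in pattern.findall(text)]


print(digit_runs(UNICODE_DIGITS, "street ۱۲"))    # [12]
print(digit_runs(ASCII_DIGITS, "street ۱۲"))      # []
print(digit_runs(UNICODE_DIGITS, "12th street"))  # [12]
```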
With numbers at the end of the strings, however, things fail when there is more than one Unicode digit, as in "street ۱۲". This causes the same "TypeError: unorderable types: float() < str()" as in issue #7. It turns out that the fake_fastnumbers implementation, too, only identifies single-digit Unicode numbers as "int" and otherwise returns "string". After fixing the fastnumbers version check (issue #51) so that the real fastnumbers functions are used, things work as expected after all.
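A minimal sketch of the failure mode described above, assuming Python 3. Both helpers are hypothetical: `only_single_char` mimics the reported fallback behavior (only a one-character digit run becomes a number), and `to_int_if_decimal` accepts a run of any length. Neither is the actual fastnumbers or natsort code.

```python
def to_int_if_decimal(token):
    """Hypothetical helper: convert a run of Unicode decimal digits to int.

    str.isdecimal() is True for a non-empty run of category-Nd characters of
    any length, so "12", "۱۲" (Extended Arabic-Indic) and "١٢" (Arabic-Indic)
    all convert; anything else is returned unchanged.
    """
    return int(token) if token.isdecimal() else token


def only_single_char(token):
    """Mimics the reported fallback bug: only one-character runs convert."""
    return int(token) if len(token) == 1 and token.isdecimal() else token


# With the buggy fallback, "۱۲" stays a str while "7" becomes an int, so
# comparing the two raises TypeError in Python 3.
try:
    only_single_char("7") < only_single_char("۱۲")
except TypeError as exc:
    print("TypeError:", exc)

# With the fixed conversion both sides are ints: 7 < 12.
print(to_int_if_decimal("7") < to_int_if_decimal("۱۲"))  # True
```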
My regex change makes one test fail under Python 3.5. With Python 2.7, all tests pass.
The reason I chose not to combine non-ASCII digits into numbers, as is done for ASCII numbers, is that it was not clear to me whether they could be treated the same way 100% of the time. For example, I know that ⅐ would need special care, but it is classified in Unicode as a number rather than a digit, so perhaps I should not worry. Is it safe to treat any Unicode digit as its ASCII equivalent when converting to integers? For example, I am assuming that
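To make the digit/number distinction concrete, here is a small sketch using Python 3's standard `unicodedata` module (the helper `describe` is hypothetical). Only characters in category Nd have a `decimal` value, and those are the characters `int()` accepts as digits; ² has only a `digit` value and ⅐ only a `numeric` value.

```python
import unicodedata


def describe(ch):
    """Hypothetical helper: (category, decimal, digit, numeric) for ch.

    Each unicodedata lookup raises ValueError when the property is undefined;
    that case is reported as None here.
    """
    def prop(fn):
        try:
            return fn(ch)
        except ValueError:
            return None

    return (unicodedata.category(ch), prop(unicodedata.decimal),
            prop(unicodedata.digit), prop(unicodedata.numeric))


for ch in ["۲", "²", "⅐"]:
    print(repr(ch), describe(ch))
# '۲' ('Nd', 2, 2, 2.0)
# '²' ('No', None, 2, 2.0)
# '⅐' ('No', None, None, 0.14285714285714285)
```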
I only care about integers at this point, not floats. The background is that I was looking for a more natural sorting of street names in the street indexes generated by MapOSMatic. This now looks good for my two original test cases, New York City (https://maposmatic.osm-baustelle.de/maps/16968) and a town in Iran (https://maposmatic.osm-baustelle.de/maps/16961).
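The integer-only natural sort described here can be sketched roughly as follows, assuming Python 3; `street_key` is a hypothetical helper, not the natsort or MapOSMatic implementation. Splitting on a capturing group keeps text and numbers in fixed, alternating positions, which avoids the str-versus-number comparisons behind the TypeError.

```python
import re

_DIGIT_RUN = re.compile(r"(\d+)")


def street_key(name):
    """Hypothetical natural-sort key: ints for digit runs, casefolded text elsewhere.

    re.split with one capturing group always yields text at even indices and
    digit runs at odd indices, so two keys never compare a str with an int.
    """
    parts = _DIGIT_RUN.split(name)
    return tuple(int(p) if i % 2 else p.casefold() for i, p in enumerate(parts))


streets = ["street ۱۲", "street ۲", "street 10", "street ۱"]
print(sorted(streets, key=street_key))
# ['street ۱', 'street ۲', 'street 10', 'street ۱۲']
```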
I have now checked the OpenStreetMap database for cities that use ½ in their street numbering scheme and found https://maposmatic.osm-baustelle.de/maps/1699.
Seeing 8½th sorted between 80th and 81st is a bit odd, but as only about 1,300 roads worldwide use ½ (and none use the other fractions like 1/3 or 1/4), I can live with that for now.
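If ½ ever needed proper placement, one possible approach is to treat a digit run followed by a vulgar-fraction character as a single number via `unicodedata.numeric`, so that 8½ sorts as 8.5. This is not what natsort or MapOSMatic does; it is only a sketch under stated assumptions. The pattern and the helpers `to_number` and `fraction_aware_key` are hypothetical, and a fraction with no leading digit (a bare "½") is deliberately left as text.

```python
import re
import unicodedata

# Hypothetical pattern: a run of decimal digits, optionally followed by one
# vulgar-fraction character (U+00BC-U+00BE: ¼ ½ ¾; U+2150-U+215E: ⅐ through ⅞).
_NUM_WITH_FRACTION = re.compile(r"(\d+[\u00bc-\u00be\u2150-\u215e]?)")


def to_number(token):
    """Convert '80' -> 80.0 and '8½' -> 8.5 (floats, so all numbers compare)."""
    if token[-1].isdecimal():
        return float(int(token))
    # Last character is the fraction; everything before it is the digit run.
    return int(token[:-1]) + unicodedata.numeric(token[-1])


def fraction_aware_key(name):
    """Natural-sort key; split() puts text at even and numbers at odd indices."""
    parts = _NUM_WITH_FRACTION.split(name)
    return tuple(to_number(p) if i % 2 else p for i, p in enumerate(parts))


streets = ["81st", "8½th", "80th", "8th", "9th"]
print(sorted(streets, key=fraction_aware_key))
# ['8th', '8½th', '9th', '80th', '81st']
```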
That should be safe. So far I have only been able to check with native speakers from Iran (for Persian/Farsi) and Syria (for Arabic), but I also have contacts from Malaysia, Israel, South Korea, and maybe Japan and China whom I can ask to verify that street lists come out correctly (if there are examples of numbered streets in those countries at all).
That seems to be a lot trickier, at least judging from https://en.wikipedia.org/wiki/Decimal_separator ...
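One quick way to see part of the problem, assuming Python 3: the built-in `float()` accepts any Unicode decimal digits but only the ASCII full stop as a decimal point, so the Arabic decimal separator U+066B and the comma used in many locales are both rejected. `try_float` is a hypothetical helper.

```python
def try_float(text):
    """Hypothetical helper: float(text), or None if float() rejects it."""
    try:
        return float(text)
    except ValueError:
        return None


# Arabic-Indic digits one and five with an ASCII full stop: accepted.
print(try_float("\u0661.\u0665"))       # 1.5
# Same digits with U+066B ARABIC DECIMAL SEPARATOR: rejected.
print(try_float("\u0661\u066b\u0665"))  # None
# Comma as decimal separator: rejected.
print(try_float("1,5"))                 # None
```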
It sounds like the validation you are using to assess whether changing the … Having said that, I do agree that some update should be made, especially considering that, as you point out, … I think that perhaps just updating the regex and also …
I had forgotten that I added code in … Thanks for reporting this.
@hholzgra I think I am going to go forward with this release in the next few days. |