Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix international strings test / mishandling of international strings #12494

Merged
merged 1 commit into from
Nov 5, 2018

Conversation

eileenmcnaughton
Copy link
Contributor

Overview

Fix mishandling of non-English charsets in very specific searches - e.g organization_name

Before

Specific mysql / server environments cannot successfully search for organization_name of 'Scarabée'

This is replicated by the environment-specifc errors on the testInternationalStrings test

After

testInternationalStrings now passes locally & hopefully will also pass on Jenkins where it was failing.... holds breath.

Technical Details

In trying to review PR #12364 I dug into the strotolower behaviour. I found that
in the path altered in this PR we are setting the value to lower case but not the
mysql clause.

I ran through ContactTest without these lines & found it
only hit this strtolower in one test - testInternationalStrings which searches
by organization name. I found the this test actually failed for me
when I ran it locally - but it passed when I removed the strotlower lines.

Wonder if jenkins finds the same

Comments

@seamuslee001 @monishdeb

@civibot
Copy link

civibot bot commented Jul 17, 2018

(Standard links)

@eileenmcnaughton
Copy link
Contributor Author

@seamuslee001 @monishdeb @colemanw So we now have some more information about the use of LOWER in mysql.

What we knew

  1. wrapping a where param in LOWER() means the index cannot be used, resulting in a performance hit
  2. we have removed various instances of the lower casing over time with no blow back or bug reports
  3. lower casing strings was messing with regex

What we ALSO know now

  1. the use of LOWER actively hurts our searches on some non-english alpha strings and that is the cause of the intermittent issues on those strings

Where to?

  1. I think we should actively remove the last 6 places in the query object where LOWER is used. I'm happy to spread that over a few months just in case anything pops up

eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jul 18, 2018
Per civicrm#12494 the use of LOWER
- hurts performance
- fails to return results on some char sets
- messes with REGEX

This is part of a continued (we removed from contribution search fields last year)
staggered approach to removing this old mechanism
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jul 18, 2018
Per civicrm#12494 the use of LOWER
- hurts performance
- fails to return results on some char sets
- messes with REGEX

This is part of a continued (we removed from contribution search fields last year)
staggered approach to removing this old mechanism
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jul 18, 2018
Per civicrm#12494 the use of LOWER
- hurts performance
- fails to return results on some char sets
- messes with REGEX

This is part of a continued (we removed from contribution search fields last year)
staggered approach to removing this old mechanism
@seamuslee001
Copy link
Contributor

@eileenmcnaughton i still have a question about whether the locale of the server plays some role in interpreting the string (especially with accents)

@eileenmcnaughton
Copy link
Contributor Author

@seamuslee001 I don't know - what I DO know is that the strtolower() php function seems to mess with the string locally. Perhaps when mbstrtolower is installed it gets 'lowered' without mucking up the string

@eileenmcnaughton
Copy link
Contributor Author

@seamuslee001 I propose to close this for now & we merge #12503 for 5.5 & then another one of a similar scope next month

@eileenmcnaughton
Copy link
Contributor Author

I'm going to close this for now as we have a plan to phase out lower slowly based on this I believe

eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Aug 2, 2018
…ysql non-case-sensitivity.

Note I could not find any way to call this so I added deprecation notices

Per civicrm#12494 the use of LOWER

hurts performance
fails to return results on some char sets
messes with REGEX
This is part of a continued (we removed from contribution search fields last year)
staggered approach to removing this old mechanism
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Aug 2, 2018
…ysql non-case-sensitivity.

Note I could not find any way to call this so I added deprecation notices

Per civicrm#12494 the use of LOWER

hurts performance
fails to return results on some char sets
messes with REGEX
This is part of a continued (we removed from contribution search fields last year)
staggered approach to removing this old mechanism
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Aug 3, 2018
…ysql non-case-sensitivity.

Note I could not find any way to call this so I added deprecation notices

Per civicrm#12494 the use of LOWER

hurts performance
fails to return results on some char sets
messes with REGEX
This is part of a continued (we removed from contribution search fields last year)
staggered approach to removing this old mechanism
xurizaemon pushed a commit that referenced this pull request Aug 10, 2018
…ysql non-case-sensitivity.

Note I could not find any way to call this so I added deprecation notices

Per #12494 the use of LOWER

hurts performance
fails to return results on some char sets
messes with REGEX
This is part of a continued (we removed from contribution search fields last year)
staggered approach to removing this old mechanism
mattwire pushed a commit to mattwire/civicrm-core that referenced this pull request Aug 22, 2018
…ysql non-case-sensitivity.

Note I could not find any way to call this so I added deprecation notices

Per civicrm#12494 the use of LOWER

hurts performance
fails to return results on some char sets
messes with REGEX
This is part of a continued (we removed from contribution search fields last year)
staggered approach to removing this old mechanism
@civibot civibot bot added the master label Oct 23, 2018
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Oct 23, 2018
Per civicrm#12494 mysql handles lcase.

Adding LOWER to mysql queries makes them slower. lowercasing php strings
breaks under some character sets with some server configs. Less is more
@eileenmcnaughton
Copy link
Contributor Author

I have re-opened this because we have done as discussed above & removed some other places without any issues arising. We have had a related PR arise (#12930) so that is encouragement to push on with phasing out lower.

At the moment this will fail as I'm trying to flush out the tested places that touch on this

eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Oct 23, 2018
Per civicrm#12494 mysql handles lcase.

Adding LOWER to mysql queries makes them slower. lowercasing php strings
breaks under some character sets with some server configs. Less is more
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Oct 23, 2018
Per civicrm#12494 mysql handles lcase.

Adding LOWER to mysql queries makes them slower. lowercasing php strings
breaks under some character sets with some server configs. Less is more
@eileenmcnaughton eileenmcnaughton force-pushed the strtolower branch 2 times, most recently from 5b1185b to 53c7499 Compare November 2, 2018 00:38
… strings test.

Adding LOWER to mysql queries makes them slower. lowercasing php strings
breaks under some character sets with some server configs.

In trying to review PR 12364 I dug into the strotolower behaviour. I found that
in the path altered in this PR we are setting the value to lower case but not the
mysql clause. I ran through ContactTest without these lines & found it
only hit this strtolower in one test - testInternationalStrings which searches
by organization name. I found the this test actually failed for me
when I ran it locally - but it passed when I removed the strotlower lines.

Wonder in jenkins finds the same
@eileenmcnaughton
Copy link
Contributor Author

@mfb @seamuslee001 OK - this is the last one! I think we should merge this & it will be done!

I searched for LOWER( & there are only 2-3 places left and they all seem very contained - ie. Contribution_BAO_Query still does the bad thing when you search on contribution note. I was going to leave that out of scope though unless on of you want to do it.

@seamuslee001
Copy link
Contributor

This is all consistent with all the other work going on. These parts all look to be well backed up by tests so the fact this has passed Jenkins makes me feel comfortable with this. I note we are re-enabling a couple of tests and all tests have passed and i'm good with this enough to merge

@seamuslee001 seamuslee001 merged commit c010360 into civicrm:master Nov 5, 2018
@seamuslee001 seamuslee001 deleted the strtolower branch November 5, 2018 00:53
@eileenmcnaughton
Copy link
Contributor Author

Thanks @seamuslee001

@mfb I'm not going to work on the last few places that LOWER still happens at the moment but am happy to review if you wish to

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants