dev/core#2127 - Don't accidentally trim à characters when importing files #19241

Merged
merged 1 commit into from
Dec 29, 2020

Conversation

demeritcowboy
Contributor

Overview

https://lab.civicrm.org/dev/core/-/issues/2127

When importing data with columns ending in à the à disappears.

This is similar to @sluc23's PR #18780 but also handles latin1 encoding and shores up the tests.

Before

Letters like à get trimmed.

After

Letters like à get to stay.

Technical Details

The byte pattern for à is 0xc3 0xa0, which is very close to the byte pattern for a non-breaking space, 0xc2 0xa0. The trim() function operates on bytes, and its second parameter is a list of individual bytes to trim, so when you give it 0xc2 0xa0 it will trim either byte from the ends of the string. When the string ends in à, the trailing 0xa0 is stripped, which corrupts the string so that it ends in a lone 0xc3 and gets truncated.
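To make the byte-level behaviour concrete, here is a minimal sketch in Python rather than PHP. It relies on the assumption that Python's `bytes.strip(chars)` is a fair stand-in for PHP's `trim()` with a character list, since both treat the argument as a set of individual bytes. The helper names `is_valid_utf8` and `strip_nbsp_sequences` are hypothetical and are not taken from this PR.

```python
# Illustration only: bytes.strip() stands in for PHP's trim() here.
NBSP_UTF8 = b"\xc2\xa0"  # UTF-8 encoding of a non-breaking space


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def strip_nbsp_sequences(data: bytes) -> bytes:
    """Remove only whole 0xc2 0xa0 sequences from both ends (hypothetical helper)."""
    while data.startswith(NBSP_UTF8):
        data = data[len(NBSP_UTF8):]
    while data.endswith(NBSP_UTF8):
        data = data[:-len(NBSP_UTF8)]
    return data


value = "voil\u00e0".encode("utf-8")  # b"voil\xc3\xa0", i.e. it ends in à

# Byte-set stripping removes the trailing 0xa0 and leaves a lone 0xc3.
naive = value.strip(NBSP_UTF8)

# Sequence-aware stripping leaves the à alone and removes real NBSPs.
padded = NBSP_UTF8 + value + NBSP_UTF8
careful = strip_nbsp_sequences(padded)

print(naive, is_valid_utf8(naive))
print(careful, is_valid_utf8(careful))
```

The naive result ends in a dangling lead byte and no longer decodes as UTF-8, which matches the corruption described above.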

Further, note that in latin1 encoding a non-breaking space is just the single byte 0xa0. Further still, PHP seems unable to detect the encoding when the input contains that byte, so applying the regex fails.
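One way to picture handling both encodings is sketched below, again in Python as an illustration. It is an assumption, not the approach taken in this PR: it tries UTF-8 first, falls back to latin1 when decoding fails, and only then strips the non-breaking space as a character rather than as bytes. The function name `trim_nbsp` is hypothetical.

```python
def trim_nbsp(raw: bytes) -> str:
    """Decode as UTF-8 if possible, else latin1, then strip NBSP characters.

    Hypothetical helper for illustration; not the code from this PR.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # In latin1 every byte maps to a character, so this cannot fail.
        text = raw.decode("latin-1")
    return text.strip("\u00a0")


utf8_input = b"voil\xc3\xa0\xc2\xa0"  # à followed by a UTF-8 NBSP
latin1_input = b"voil\xe0\xa0"        # à followed by a latin1 NBSP

print(trim_nbsp(utf8_input))
print(trim_nbsp(latin1_input))
```

Stripping after decoding means the 0xa0 inside à is never considered on its own, whichever encoding the file used.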

I also updated the original test to check the actual value instead of just the length. The lengths are now more varied with the extra tests, and checking the full value seemed better anyway. I considered using bin2hex, which might be a more robust way of showing what's happening, but JSON seemed more readable and should be just as good given the environment Civi runs under. Just note that JSON represents à as a Unicode escape (\u00e0) rather than as a UTF-8 byte sequence.
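The difference between the two representations can be seen in a short Python sketch. It assumes that Python's `json.dumps` (with its default ASCII escaping) and `bytes.hex()` are reasonable analogues of PHP's `json_encode` and `bin2hex`; the variable names are illustrative only.

```python
import json

value = "voil\u00e0"  # a string ending in à

# JSON view: non-ASCII characters appear as Unicode escapes.
as_json = json.dumps(value)

# Hex view: the raw UTF-8 bytes, where à shows up as c3 a0.
as_hex = value.encode("utf-8").hex()

print(as_json)  # "voil\u00e0"
print(as_hex)   # 766f696cc3a0
```

The JSON form is easier to read in a test assertion, while the hex form shows exactly which bytes are present.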

Comments

Shored up the tests so they also check a file with a BOM, and also pulled the trim function out in order to throw some more tests at it.

@civibot

civibot bot commented Dec 20, 2020

(Standard links)

@civibot civibot bot added the master label Dec 20, 2020
@eileenmcnaughton
Contributor

@sluc23 any chance you can confirm this - looks pretty sound to me

@sluc23
Contributor

sluc23 commented Dec 29, 2020

@eileenmcnaughton tested this PR on top of 5.32.2 and it worked fine!!
tx @demeritcowboy 💪

@demeritcowboy
Contributor Author

Thanks @sluc23

@sluc23
Contributor

sluc23 commented Jan 5, 2021

still on time to apply this to 5.33.x ? would be nice to have this fix on next ESR release

@eileenmcnaughton
Contributor

It's merged for 5.34 - we normally merge regressions into the ESR rather than bug fixes

4 participants