Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix integer to string conversion when generating code #7941

Merged
merged 4 commits into from
Apr 17, 2023

Conversation

grendello
Copy link
Contributor

@grendello grendello commented Apr 6, 2023

Fixes: #7939
Context: dotnet/runtime#13363
Context: https://www.unicode.org/charts/PDF/U2200.pdf (page 2)

LLVM IR generator outputs a number of integer values, relying on the
standard ToString() method to convert the value to string. However,
since NET6, this conversion is culture-sensitive and in certain
cultures (list below) it will output the minus sign not as the standard
ASCII 0x2D character but rather as the Unicode 0x2212 (mathematical
operator "minus") character. This breaks LLVM LLC:

llc: environment.arm64-v8a.ll:554:7: error: expected value token
  stderr |                 i32 −1, ; apk_fd

Fix the problem by using invariant culture when converting all
integers and unknown objects to strings. The reason all of them are
converted in this way is to avoid future changes to ICU and/or .NET
to-string conversion that might affect the resulting integer format.

Locales/cultures affected by this issue are as follows:

  • (RTL "-") 26 cultures
    • ar
    • ar-001
    • ar-BH
    • ar-DJ
    • ar-EG
    • ar-ER
    • ar-IL
    • ar-IQ
    • ar-JO
    • ar-KM
    • ar-KW
    • ar-LB
    • ar-MR
    • ar-OM
    • ar-PS
    • ar-QA
    • ar-SA
    • ar-SD
    • ar-SO
    • ar-SS
    • ar-SY
    • ar-TD
    • ar-YE
    • sd
    • sd-Arab
    • sd-Arab-PK
  • "‎-": 10 cultures
    • ar-AE
    • ar-DZ
    • ar-EH
    • ar-LY
    • ar-MA
    • ar-TN
    • he
    • he-IL
    • ur
    • ur-PK
  • (RTL "-"): 3 cultures
    • ckb
    • ckb-IQ
    • ckb-IR
  • "−": 38 cultures
    • et
    • et-EE
    • eu
    • eu-ES
    • fi
    • fi-FI
    • fo
    • fo-DK
    • fo-FO
    • gsw
    • gsw-CH
    • gsw-FR
    • gsw-LI
    • hr
    • hr-BA
    • hr-HR
    • ksh
    • ksh-DE
    • lt
    • lt-LT
    • nb
    • nb-NO
    • nb-SJ
    • nn
    • nn-NO
    • no
    • rm
    • rm-CH
    • se
    • se-FI
    • se-NO
    • se-SE
    • sl
    • sl-SI
    • sv
    • sv-AX
    • sv-FI
    • sv-SE
  • "‎−": 3 cultures
    • fa
    • fa-AF
    • fa-IR
  • "‎-‎": 16 cultures
    • ks
    • ks-Arab
    • ks-Arab-IN
    • lrc
    • lrc-IQ
    • lrc-IR
    • mzn
    • mzn-IR
    • pa-Arab
    • pa-Arab-PK
    • ps
    • ps-AF
    • ps-PK
    • ur-IN
    • uz-Arab
    • uz-Arab-AF

Fixes: dotnet#7939
Context: dotnet/runtime#13363
Context: https://www.unicode.org/charts/PDF/U2200.pdf (page 2)

LLVM IR generator outputs a number of integer values, relying on the
standard `ToString()` method to convert the value to string.  However,
since NET6, this conversion is culture-sensitive and in certain
cultures (list below) it will output the minus sign not as the standard
ASCII 0x2D character but rather as the Unicode 0x2212 (mathematical
operator "minus") character.  This breaks LLVM LLC:

    llc: environment.arm64-v8a.ll:554:7: error: expected value token
      stderr |                 i32 −1, ; apk_fd

Fix the problem by using invariant culture when converting **all**
integers and unknown objects to strings.  The reason all of them are
converted in this way is to avoid future changes to ICU and/or .NET
to-string conversion that might affect the resulting integer format.

Locales/cultures affected by this issue are as follows:

  * et
  * et-EE
  * eu
  * eu-ES
  * fi
  * fi-FI
  * fo
  * fo-DK
  * fo-FO
  * gsw
  * gsw-CH
  * gsw-FR
  * gsw-LI
  * hr
  * hr-BA
  * hr-HR
  * ksh
  * ksh-DE
  * lt
  * lt-LT
  * nb
  * nb-NO
  * nb-SJ
  * nn
  * nn-NO
  * no
  * rm
  * rm-CH
  * se
  * se-FI
  * se-NO
  * se-SE
  * sl
  * sl-SI
  * sv
  * sv-AX
  * sv-FI
  * sv-SE
@grendello grendello requested a review from jonpryor April 6, 2023 17:44
@grendello grendello marked this pull request as ready for review April 6, 2023 17:44
Copy link
Member

@jonathanpeppers jonathanpeppers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I did in: dotnet/maui@c40c6e7

Can we add these rules as errors in the .editorconfig?

dotnet_diagnostic.CA1307.severity = error
dotnet_diagnostic.CA1309.severity = error

@grendello
Copy link
Contributor Author

Like I did in: dotnet/maui@c40c6e7

Can we add these rules as errors in the .editorconfig?

dotnet_diagnostic.CA1307.severity = error
dotnet_diagnostic.CA1309.severity = error

We should definitely do it, but in a separate PR, I think. It's not directly related to this PR.

@jonpryor
Copy link
Member

@grendello: I would like a unit test of some form for this. Could we alter one of the existing app build + deploy tests in tests/MSBuildDeviceIntegration to set the locale for the entire build process, including app deploy?

@dellis1972, @jonathanpeppers?

@jonathanpeppers
Copy link
Member

For Mac, we could add test of a Release build, using the option to pass the $LANG env var:

https://github.com/xamarin/xamarin-android/blob/59e1e466bcb6739e976f61a65ce14a24bb6e3bf6/src/Xamarin.Android.Build.Tasks/Tests/Xamarin.ProjectTools/Common/ProjectBuilder.cs#L63

But then we'd want to do some equivalent for Windows, too?

@grendello
Copy link
Contributor Author

@jonpryor Build stage would be sufficient, we don't need to deploy.
@jonathanpeppers I think mac would be enough, after all we can assume we're using the same dotnet version for both, so the ICU behavior will be identical.

@jonathanpeppers
Copy link
Member

Windows what I'm reading seems like a system-wide weird setting, someone recommends creating a user to run a process under...

So maybe it's just not worth it, and we make a new test that only runs on Mac.

* main:
  Bump to xamarin/Java.Interop/main@554d819 (dotnet#7951)
  [Microsoft.Android.Sdk.ILLink] fix crash when TZ changes (dotnet#7956)
  [tests] Port 'Xamarin.Android.JcwGen-Tests.JcwGen-Tests' to .NET (dotnet#7949)
  [Xamarin.Android.Build.Tasks] remove `pdb2mdb` (dotnet#7950)
  [ci] Add some extra params to configure the test templates (dotnet#7955)
  Convert `/tools` and `/build-tools` projects from `net472` to `$(DotNetStableTargetFramework)` (dotnet#7943)
  [Xamarin.Android.Build.Tasks] fix cases of missing `@(Reference)` (dotnet#7947)
  Bump com.android.tools:r8 from 4.0.52 to 8.0.40 (dotnet#7934)
  Bump to xamarin/Java.Interop/main@a172402 (dotnet#7944)
  [Xamarin.Android] Remove OpenTK, sqlite-xamarin, System.EnterpriseServices. (dotnet#7940)
  [ci] Stop building classic test suites. (dotnet#7938)
  Bumping to the correct monodroid commit
  Trying to bump monodroid to run debugger-tests
  Pass timeout to runtime
Copy link
Member

@jonathanpeppers jonathanpeppers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if the new test passes. 👍

@jonpryor
Copy link
Member

jonpryor commented Apr 15, 2023

Draft commit message:

Fixes: https://github.com/xamarin/xamarin-android/issues/7939

Context: https://github.com/dotnet/runtime/issues/13363
Context: https://www.unicode.org/charts/PDF/U2200.pdf (page 2)
Context: 5271f3e109a2242caa3bd7801dfad9c1683488d7

The LLVM IR generator (5271f3e1) outputs a number of integer values,
relying on the standard `int.ToString()` method to convert the value
to a string.  However, since .NET 6, this conversion is culture-
sensitive and in certain cultures (list below) it will output the
minus sign not as the "standard" ASCII \u002d character `-` but
instead as something else (see complete list below for details).

This breaks LLVM LLC:

	llc: environment.arm64-v8a.ll:554:7: error: expected value token
	  stderr |                 i32 −1, ; apk_fd

Fix the problem by using invariant culture when converting ***all***
integers and unknown objects to strings.  The reason all of them are
converted in this way is to avoid future changes to ICU and/or .NET
to-string conversion that might affect the resulting integer format.

Workaround: export `$LANG` to a locale that uses 0x2D `-` for
negation, e.g. `LANG=C`.

Locales/cultures affected by this issue are as follows:

  * ARABIC LETTER MARK + HYPHEN-MINUS `؜-` (\u061c \u002d): 26 cultures
    * ar
    * ar-001
    * ar-BH
    * ar-DJ
    * ar-EG
    * ar-ER
    * ar-IL
    * ar-IQ
    * ar-JO
    * ar-KM
    * ar-KW
    * ar-LB
    * ar-MR
    * ar-OM
    * ar-PS
    * ar-QA
    * ar-SA
    * ar-SD
    * ar-SO
    * ar-SS
    * ar-SY
    * ar-TD
    * ar-YE
    * sd
    * sd-Arab
    * sd-Arab-PK
  * LEFT-TO-RIGHT MARK + HYPHEN-MINUS `‎-` (\u200e \u002d): 10 cultures
    * ar-AE
    * ar-DZ
    * ar-EH
    * ar-LY
    * ar-MA
    * ar-TN
    * he
    * he-IL
    * ur
    * ur-PK
  * RIGHT-TO-LEFT MARK + HYPHEN-MINUS `‏-` (\u200f \u002d): 3 cultures
    * ckb
    * ckb-IQ
    * ckb-IR
  * MINUS SIGN `` (\u2212): 38 cultures
    * et
    * et-EE
    * eu
    * eu-ES
    * fi
    * fi-FI
    * fo
    * fo-DK
    * fo-FO
    * gsw
    * gsw-CH
    * gsw-FR
    * gsw-LI
    * hr
    * hr-BA
    * hr-HR
    * ksh
    * ksh-DE
    * lt
    * lt-LT
    * nb
    * nb-NO
    * nb-SJ
    * nn
    * nn-NO
    * no
    * rm
    * rm-CH
    * se
    * se-FI
    * se-NO
    * se-SE
    * sl
    * sl-SI
    * sv
    * sv-AX
    * sv-FI
    * sv-SE
  * LEFT-TO-RIGHT MARK + MINUS SIGN `‎−` (\u200e \u2212): 3 cultures
    * fa
    * fa-AF
    * fa-IR
  * LEFT-TO-RIGHT MARK + HYPHEN-MINUS + LEFT-TO-RIGHT MARK `‎-‎` (\u200e \u002d \u002e): 16 cultures
    * ks
    * ks-Arab
    * ks-Arab-IN
    * lrc
    * lrc-IQ
    * lrc-IR
    * mzn
    * mzn-IR
    * pa-Arab
    * pa-Arab-PK
    * ps
    * ps-AF
    * ps-PK
    * ur-IN
    * uz-Arab
    * uz-Arab-AF

Update `BuildTest2.BuildBasicApplication()` to export
`LANG=sv_SE.UTF-8` as part of the `dotnet build` command to test this
scenario on macOS and Linux.  (This change is ignored on Windows.)

@jonpryor
Copy link
Member

@grendello: will my above suggested workaround of LANG=C work? Or would we need LANG=en_US.UTF-8? (Why suggest LANG=C at all? To be ever so slightly less US-centric. Which is probably laughable, but still.)

@grendello
Copy link
Contributor Author

@jonpryor yes, LANG=C should work fine, as it will be as ASCII-based as en_US.UTF-8 is and should be available on all Unix systems.

@grendello
Copy link
Contributor Author

@jonpryor the commit message needs to be updated with the full list of affected cultures (see the OP here) and it needs to mention that it's not just one Unicode character, but several, that replace the ASCII hyphen as the "minus" sign.

@jonpryor jonpryor merged commit b1079a0 into dotnet:main Apr 17, 2023
@grendello grendello deleted the issue-7939 branch April 17, 2023 21:04
grendello added a commit to grendello/xamarin-android that referenced this pull request Apr 24, 2023
* main:
  [Xamarin.Android.Build.Tasks] enable ForceInterpretedInvoke switch (dotnet#7972)
  [Mono.Android] Bind API-UpsideDownCake Beta 1 (dotnet#7980)
  Bump to xamarin/xamarin-android-tools/main@8bc07503 (dotnet#7863)
  [automation] Add 'xaSourcePath' to yaml so they can be used from monodroid. (dotnet#7978)
  Bump to dotnet/installer@16c10f8115 8.0.100-preview.4.23218.1 (dotnet#7969)
  [docs] Add UnitTest.md  (dotnet#7877)
  [ci] Suppress fork PR build warnings (dotnet#7973)
  [Xamarin.Android.Build.Tasks] Bump ZipFlushFilesLimit (dotnet#7957)
  Bump to dotnet/installer@3ca7ad1c79 8.0.100-preview.4.23211.1 (dotnet#7946)
  [CI] Allow passing xamarin-android checkout dir to nested templates. (dotnet#7961)
  [Xamarin.Android.Build.Tasks] Fix `-int.ToString()` for locales (dotnet#7941)
  [ci] Automatically retry failed apk-instrumentation tests. (dotnet#7963)
jonathanpeppers pushed a commit that referenced this pull request Apr 25, 2023
Fixes: #7939

Context: dotnet/runtime#13363
Context: https://www.unicode.org/charts/PDF/U2200.pdf (page 2)
Context: 5271f3e

The LLVM IR generator (5271f3e) outputs a number of integer values,
relying on the standard `int.ToString()` method to convert the value
to a string.  However, since .NET 6, this conversion is culture-
sensitive and in certain cultures (list below) it will output the
minus sign not as the "standard" ASCII \u002d character `-` but
instead as something else (see complete list below for details).

This breaks LLVM LLC:

	llc: environment.arm64-v8a.ll:554:7: error: expected value token
	  stderr |                 i32 −1, ; apk_fd

Fix the problem by using invariant culture when converting ***all***
integers and unknown objects to strings.  The reason all of them are
converted in this way is to avoid future changes to ICU and/or .NET
to-string conversion that might affect the resulting integer format.

Workaround: export `$LANG` to a locale that uses 0x2D `-` for
negation, e.g. `LANG=C`.

Locales/cultures affected by this issue are as follows:

  * ARABIC LETTER MARK + HYPHEN-MINUS `؜-` (\u061c \u002d): 26 cultures
    * ar
    * ar-001
    * ar-BH
    * ar-DJ
    * ar-EG
    * ar-ER
    * ar-IL
    * ar-IQ
    * ar-JO
    * ar-KM
    * ar-KW
    * ar-LB
    * ar-MR
    * ar-OM
    * ar-PS
    * ar-QA
    * ar-SA
    * ar-SD
    * ar-SO
    * ar-SS
    * ar-SY
    * ar-TD
    * ar-YE
    * sd
    * sd-Arab
    * sd-Arab-PK
  * LEFT-TO-RIGHT MARK + HYPHEN-MINUS `‎-` (\u200e \u002d): 10 cultures
    * ar-AE
    * ar-DZ
    * ar-EH
    * ar-LY
    * ar-MA
    * ar-TN
    * he
    * he-IL
    * ur
    * ur-PK
  * RIGHT-TO-LEFT MARK + HYPHEN-MINUS `‏-` (\u200f \u002d): 3 cultures
    * ckb
    * ckb-IQ
    * ckb-IR
  * MINUS SIGN `−` (\u2212): 38 cultures
    * et
    * et-EE
    * eu
    * eu-ES
    * fi
    * fi-FI
    * fo
    * fo-DK
    * fo-FO
    * gsw
    * gsw-CH
    * gsw-FR
    * gsw-LI
    * hr
    * hr-BA
    * hr-HR
    * ksh
    * ksh-DE
    * lt
    * lt-LT
    * nb
    * nb-NO
    * nb-SJ
    * nn
    * nn-NO
    * no
    * rm
    * rm-CH
    * se
    * se-FI
    * se-NO
    * se-SE
    * sl
    * sl-SI
    * sv
    * sv-AX
    * sv-FI
    * sv-SE
  * LEFT-TO-RIGHT MARK + MINUS SIGN `‎−` (\u200e \u2212): 3 cultures
    * fa
    * fa-AF
    * fa-IR
  * LEFT-TO-RIGHT MARK + HYPHEN-MINUS + LEFT-TO-RIGHT MARK `‎-‎` (\u200e \u002d \u002e): 16 cultures
    * ks
    * ks-Arab
    * ks-Arab-IN
    * lrc
    * lrc-IQ
    * lrc-IR
    * mzn
    * mzn-IR
    * pa-Arab
    * pa-Arab-PK
    * ps
    * ps-AF
    * ps-PK
    * ur-IN
    * uz-Arab
    * uz-Arab-AF

Update `BuildTest2.BuildBasicApplication()` to export
`LANG=sv_SE.UTF-8` as part of the `dotnet build` command to test this
scenario on macOS and Linux.  (This change is ignored on Windows.)
@github-actions github-actions bot locked and limited conversation to collaborators Jan 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't build .NET 7 android projects
4 participants