Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use StandardCharsets instead of Charset.forName() #1603

Merged
merged 1 commit into from
Nov 27, 2018

Conversation

ivangalkin
Copy link
Contributor

@ivangalkin ivangalkin commented Nov 26, 2018

  • class StandardCharsets provides constants, which help to avoid typos
    (compile-time check vs run-time check) and reduce number
    of lookups

This change is Reviewable

* class `StandardCharsets` provides constants, which help to avoid typos
  (compile-time check vs run-time check) and reduce number
  of lookups
@ivangalkin ivangalkin added this to the 1.2.1 milestone Nov 26, 2018
@ivangalkin ivangalkin self-assigned this Nov 26, 2018
@ivangalkin ivangalkin requested review from Bertk and guwirth November 26, 2018 22:50
Copy link
Collaborator

@guwirth guwirth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivangalkin thx looks good. Only thing I'm wondering is if there is no SQ constant/method to ask for the default?

@ivangalkin
Copy link
Contributor Author

@guwirth SonarQube requires UTF-8 for the database (on the server side). Grep for UTF8 on https://docs.sonarqube.org/display/SONARqube71/Requirements#Requirements-SupportedPlatforms for more details. Although I believe, that we are pretty safe with UTF-8 as default, there is indeed the standard SonarQube way to determine the project's charset (on the client side):

See org.sonar.api.batch.fs.FileSystem.encoding(). It reads the data from the property sonar.sourceEncoding.

We use it in order to create the CxxConfiguration. Afterwards all visitors, which must respect the encoding (see CxxCharsetAwareVisitor) receive the mentioned encoding through the method CxxCharsetAwareVisitor::setCharset(Charset charset). UTF-8 is just used as a reasonable default for these visitors.

There are still visitors, which still ignore the FileSystem.encoding() and use UTF-8:

  • MissingNewLineAtEndOfFileCheck
  • FileRegularExpressionCheck (there might be some order issues between setCharset() and init(), must be doube-checked or refactored)

Also all our importing sensors ignore the FileSystem.encoding(). For some 3rd party reports we allow to specify the file charset (ClangTidy, VC, GCC, please grep for REPORT_CHARSET_DEF). Some of them just hardcode the UTF-8 (e.g. CxxDrMemorySensor). The remaining ones use the default (system) charset for parsing I assume. Not sure if and how should we make it uniform.

@guwirth
Copy link
Collaborator

guwirth commented Nov 27, 2018

@ivangalkin thanks for investigation. Think sonar.sourceEncoding is only for source header files. Think in newer version SQ is also able to handle BOMs? On the other hand reports can have a different encoding (could be from a different machine with a different encoding). So it makes sense to be able to overwrite the default for each sensor individual.

Think we should merge this one. It's for sure an improvement and create a new issue for encoding clean-up.

@guwirth guwirth merged commit 1b9362c into SonarOpenCommunity:master Nov 27, 2018
@guwirth guwirth mentioned this pull request Dec 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants