use StandardCharsets instead of Charset.forName() #1603

ivangalkin · 2018-11-26T22:50:46Z

class StandardCharsets provides constants, which help to avoid typos
(compile-time check vs run-time check) and reduce number
of lookups

This change is

* class `StandardCharsets` provides constants, which help to avoid typos (compile-time check vs run-time check) and reduce number of lookups

guwirth

@ivangalkin thx looks good. Only thing I'm wondering is if there is no SQ constant/method to ask for the default?

ivangalkin · 2018-11-27T09:32:13Z

@guwirth SonarQube requires UTF-8 for the database (on the server side). Grep for UTF8 on https://docs.sonarqube.org/display/SONARqube71/Requirements#Requirements-SupportedPlatforms for more details. Although I believe, that we are pretty safe with UTF-8 as default, there is indeed the standard SonarQube way to determine the project's charset (on the client side):

See org.sonar.api.batch.fs.FileSystem.encoding(). It reads the data from the property sonar.sourceEncoding.

We use it in order to create the CxxConfiguration. Afterwards all visitors, which must respect the encoding (see CxxCharsetAwareVisitor) receive the mentioned encoding through the method CxxCharsetAwareVisitor::setCharset(Charset charset). UTF-8 is just used as a reasonable default for these visitors.

There are still visitors, which still ignore the FileSystem.encoding() and use UTF-8:

MissingNewLineAtEndOfFileCheck
FileRegularExpressionCheck (there might be some order issues between setCharset() and init(), must be doube-checked or refactored)

Also all our importing sensors ignore the FileSystem.encoding(). For some 3rd party reports we allow to specify the file charset (ClangTidy, VC, GCC, please grep for REPORT_CHARSET_DEF). Some of them just hardcode the UTF-8 (e.g. CxxDrMemorySensor). The remaining ones use the default (system) charset for parsing I assume. Not sure if and how should we make it uniform.

guwirth · 2018-11-27T09:59:58Z

@ivangalkin thanks for investigation. Think sonar.sourceEncoding is only for source header files. Think in newer version SQ is also able to handle BOMs? On the other hand reports can have a different encoding (could be from a different machine with a different encoding). So it makes sense to be able to overwrite the default for each sensor individual.

Think we should merge this one. It's for sure an improvement and create a new issue for encoding clean-up.

use StandardCharsets instead of Charset.forName()

30e983b

* class `StandardCharsets` provides constants, which help to avoid typos (compile-time check vs run-time check) and reduce number of lookups

ivangalkin added the enhancement label Nov 26, 2018

ivangalkin added this to the 1.2.1 milestone Nov 26, 2018

ivangalkin self-assigned this Nov 26, 2018

ivangalkin requested review from Bertk and guwirth November 26, 2018 22:50

guwirth reviewed Nov 27, 2018

View reviewed changes

guwirth approved these changes Nov 27, 2018

View reviewed changes

guwirth merged commit 1b9362c into SonarOpenCommunity:master Nov 27, 2018

guwirth mentioned this pull request Nov 27, 2018

Refactoring: clean-up encoding usage/handling #1604

Closed

guwirth mentioned this pull request Dec 21, 2018

V1.2.1 draft #1639

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use StandardCharsets instead of Charset.forName() #1603

use StandardCharsets instead of Charset.forName() #1603

ivangalkin commented Nov 26, 2018 •

edited by SonarOpenCommunityAdmin

Loading

guwirth left a comment

ivangalkin commented Nov 27, 2018

guwirth commented Nov 27, 2018

use StandardCharsets instead of Charset.forName() #1603

use StandardCharsets instead of Charset.forName() #1603

Conversation

ivangalkin commented Nov 26, 2018 • edited by SonarOpenCommunityAdmin Loading

guwirth left a comment

Choose a reason for hiding this comment

ivangalkin commented Nov 27, 2018

guwirth commented Nov 27, 2018

ivangalkin commented Nov 26, 2018 •

edited by SonarOpenCommunityAdmin

Loading