-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Rf_charIsASCII for IS_ASCII instead of testing LEVELS on R >= 4.5 #6422
Conversation
Since data.table now depends on R >= 3.3, the backports are no longer needed. Moreover, MAYBE_SHARED is currently a function, while MAYBE_REFERENCED expands to !NO_REFERENCES (which is a function). In debugging output, show MAYBE_REFERENCED (NAMED > 0) instead of NAMED.
getCharCE appeared in R-2.7, making it possible to check for strings _marked_ as UTF-8 or Latin-1. There is no marking as ASCII, so fixing IS_ASCII will have to wait for R >= 4.5.
There's no explicit encoding code for ASCII, so use charIsASCII() ("eapi", expected to appear in R-4.5.0).
Generated via commit 090dc37 Download link for the artifact containing the test results: ↓ atime-results.zip
|
Don't mind that, we maintainers will take care of it :) |
set #6420 as the target of this PR to make the chain clearer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it an issue that the pre 4.5.0 version returns 1 or 0 whereas the new version returns an Rboolean
?
Good catch! It's a bit worse: LEVELS(x) & 64 returns 0 or 64.
Here are the only users of IS_ASCII:
data.table.h:
#define NEED2UTF8(s) !(IS_ASCII(s) || (s)==NA_STRING || IS_UTF8(s))
fwriteR.c:
#define TO_NATIVE(s) (native && (s)!=NA_STRING && !IS_ASCII(s))
forder.c:
if (!anynotutf8 && /*...*/ !IS_ASCII(s)) {
They all use the result of the macro in boolean context (if / || / &&),
so any non-zero result is interpreted as true.
|
@@ -42,7 +42,11 @@ | |||
/* we mean the encoding bits, not CE_NATIVE in a UTF-8 locale */ | |||
#define IS_UTF8(x) (getCharCE(x) == CE_UTF8) | |||
#define IS_LATIN(x) (getCharCE(x) == CE_LATIN1) | |||
#define IS_ASCII(x) (LEVELS(x) & 64) // API expected in R >= 4.5 | |||
#if R_VERSION < R_Version(4, 5, 0) || R_SVN_REVISION < 86789 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping this here even though there's another (4,5,0)
check above at L21
- The
R_SVN_REVISION
s are different - The
IS_UTF8()
andIS_LATIN()
macros are also right here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ready to go now, too, right? Please merge if you agree. Thanks!
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #6422 +/- ##
=======================================
Coverage 98.59% 98.59%
=======================================
Files 79 79
Lines 14661 14661
=======================================
Hits 14455 14455
Misses 206 206 ☔ View full report in Codecov by Sentry. |
This part is split from #6420 (but includes #6420; can rebase if needed) because the R version containing
Rf_isCharASCII
is not currently released. The overall issue is #6180. Not yet sure how to changeNEWS.md
in case we're expectingdata.table
releases in the meantime.