Handle numbers formatted with underscores in tokenizer #1819

Merged May 31, 2018 (110 commits)
Commits
a764bc9 Undo changes (Feb 13, 2018)
9d1b2cc Test fixes (Feb 13, 2018)
a91291a Increase timeout (Mar 2, 2018)
bf266af Remove double event listening (Mar 7, 2018)
7bc6bd6 Remove test (Mar 7, 2018)
8ce8b48 Revert "Remove test" (Mar 7, 2018)
e3a549e Revert "Remove double event listening" (Mar 7, 2018)
92e8c1e #1096 The if statement is automatically formatted incorrectly (Mar 27, 2018)
b540a1d Merge fix (Mar 27, 2018)
7b0573e Add more tests (Mar 27, 2018)
facb106 More tests (Mar 27, 2018)
f113881 Typo (Mar 27, 2018)
3e76718 Test (Mar 28, 2018)
6e85dc6 Also better handle multiline arguments (Mar 28, 2018)
99e037c Add a couple missing periods (brettcannon, Mar 28, 2018)
3caeab7 Undo changes (Feb 13, 2018)
eeb1f11 Test fixes (Feb 13, 2018)
f5f78c7 Increase timeout (Mar 2, 2018)
88744da Remove double event listening (Mar 7, 2018)
65dde44 Remove test (Mar 7, 2018)
c513f71 Revert "Remove test" (Mar 7, 2018)
ccb3886 Revert "Remove double event listening" (Mar 7, 2018)
106f4db Merge fix (Mar 27, 2018)
9e5cb43 Merge branch 'master' of https://github.com/MikhailArkhipov/vscode-py… (Apr 5, 2018)
e1da6a6 #1257 On type formatting errors for args and kwargs (Apr 5, 2018)
e78f0fb Handle f-strings (Apr 5, 2018)
725cf71 Stop importing from test code (Apr 5, 2018)
5cd6d45 #1308 Single line statements leading to an indentation on the next line (Apr 5, 2018)
27613db #726 editing python after inline if statement invalid indent (Apr 5, 2018)
8061a20 Undo change (Apr 5, 2018)
17dc292 Move constant (Apr 5, 2018)
65964b9 Harden LS startup error checks (Apr 10, 2018)
4bf5a4c #1364 Intellisense doesn't work after specific const string (Apr 10, 2018)
6f7212c Merge branch 'master' of https://github.com/Microsoft/vscode-python (Apr 12, 2018)
ddbd295 Telemetry for the analysis enging (Apr 12, 2018)
ffd1d3f Merge branch 'master' of https://github.com/Microsoft/vscode-python (Apr 12, 2018)
d4afb6c PR feedback (Apr 13, 2018)
12186b8 Fix typo (Apr 16, 2018)
ca90529 Test baseline update (Apr 16, 2018)
a7267b5 Jedi 0.12 (Apr 16, 2018)
cfee109 Priority to goto_defition (Apr 16, 2018)
1285789 Merge branch 'master' of https://github.com/Microsoft/vscode-python i… (Apr 17, 2018)
d1ff1d9 News (Apr 17, 2018)
1bd1651 Replace unzip (Apr 17, 2018)
a69b6fd Merge branch 'master' of https://github.com/Microsoft/vscode-python i… (Apr 17, 2018)
f916ace Linux flavors + test (Apr 18, 2018)
28ca25f Grammar check (Apr 19, 2018)
ad9a3c9 Grammar test (Apr 20, 2018)
ff8dd35 Test baselines (Apr 20, 2018)
26726f8 Merge branch 'master' of https://github.com/Microsoft/vscode-python i… (Apr 20, 2018)
d7806ca Add news (Apr 20, 2018)
0b3f316 Pin dependency (brettcannon, Apr 23, 2018)
28a8950 Merge branch 'grammar' of https://github.com/MikhailArkhipov/vscode-p… (Apr 23, 2018)
1804617 Merge branch 'master' of https://github.com/Microsoft/vscode-python i… (Apr 23, 2018)
f000e5d Specify markdown as preferable format (Apr 26, 2018)
a06fd79 Merge branch 'master' of https://github.com/Microsoft/vscode-python (Apr 26, 2018)
ef7c5c7 Improve function argument detection (Apr 26, 2018)
f4e88c0 Specify markdown (Apr 27, 2018)
d420c34 Merge branch 'master' of https://github.com/Microsoft/vscode-python (May 3, 2018)
b819d57 Merge branch 'master' into analysis (May 3, 2018)
abff213 Pythia setting (May 3, 2018)
d140b3a Baseline updates (May 3, 2018)
4b394d9 Baseline update (May 3, 2018)
a397b11 Improve startup (May 4, 2018)
e54eaf8 Handle missing interpreter better (May 4, 2018)
3b8ddd5 Handle interpreter change (May 4, 2018)
9a4500d Merge branch 'master' of https://github.com/Microsoft/vscode-python i… (May 4, 2018)
41f9624 Delete old file (May 4, 2018)
3627b85 Fix LS startup time reporting (May 4, 2018)
486d11d Remove Async suffix from IFileSystem (May 8, 2018)
cf5cf9c Merge branch 'master' of https://github.com/Microsoft/vscode-python i… (May 8, 2018)
4913e28 Merge branch 'master' of https://github.com/Microsoft/vscode-python i… (May 18, 2018)
84214e1 Remove Pythia (May 17, 2018)
9c1adb1 Remove pre-packaged MSIL (May 18, 2018)
5a6e546 Exe name on Unix (May 18, 2018)
1f2ae09 Plain linux (May 18, 2018)
f972614 Fix casing (May 19, 2018)
e0021a9 Merge branch 'analysis' of https://github.com/MikhailArkhipov/vscode-… (May 19, 2018)
d2721cd Fix message (May 19, 2018)
b8bc0a2 Merge branch 'analysis' of https://github.com/MikhailArkhipov/vscode-… (May 19, 2018)
56d34f7 Update PTVS engine activation steps (May 21, 2018)
9aab160 Merge branch 'analysis' of https://github.com/MikhailArkhipov/vscode-… (May 21, 2018)
981290f Type formatter eats space in from . (May 22, 2018)
d279e96 fIX CASING (May 22, 2018)
6b466a9 Remove flag (May 23, 2018)
d8c6193 Merge branch 'master' of https://github.com/Microsoft/vscode-python i… (May 23, 2018)
2904f3b Don't wait for LS (May 24, 2018)
c7d34c9 Small test fixes (May 28, 2018)
17775bd Update hover baselines (May 28, 2018)
2edeb3c Merge branch 'master' of https://github.com/Microsoft/vscode-python i… (May 29, 2018)
2fd5387 Rename the engine (May 29, 2018)
a06b993 Merge branch 'master' of https://github.com/Microsoft/vscode-python i… (May 29, 2018)
0190078 Formatting 1 (May 29, 2018)
5b93e34 Add support for 'rf' strings (May 30, 2018)
781e6b1 Add two spaces before comment per PEP (May 30, 2018)
e795309 Fix @ operator spacing (May 30, 2018)
d300d0c Handle module and unary ops (May 30, 2018)
dd09087 Type hints (May 30, 2018)
46b6dfd Fix typo (May 30, 2018)
cf264b8 Trailing comma (May 30, 2018)
3e341e9 Require space after if (May 30, 2018)
3788f1d underscore numbers (May 30, 2018)
e5a588e Merge branch 'master' of https://github.com/Microsoft/vscode-python i… (May 30, 2018)
06e7140 Update list of keywords (May 30, 2018)
5f4eba4 Merge branch 'format' into nums (May 30, 2018)
c597d82 PR feedback (May 31, 2018)
361e106 Merge branch 'format' into nums (May 31, 2018)
e5163ba Merge master (May 31, 2018)
480069a News (May 31, 2018)
f722a03 Use a bit more Markdown in the news entry (brettcannon, May 31, 2018)
news/2 Fixes/1779.md (1 change: 1 addition, 0 deletions)
@@ -0,0 +1 @@
+`editor.formatOnType` no longer breaks numbers formatted with underscores.
src/client/language/characters.ts (12 changes: 8 additions, 4 deletions)
@@ -83,18 +83,22 @@ export function isLineBreak(ch: number): boolean {
     return ch === Char.CarriageReturn || ch === Char.LineFeed;
 }
 
+export function isNumber(ch: number): boolean {
+    return ch >= Char._0 && ch <= Char._9 || ch === Char.Underscore;
+}
+
 export function isDecimal(ch: number): boolean {
-    return ch >= Char._0 && ch <= Char._9;
+    return ch >= Char._0 && ch <= Char._9 || ch === Char.Underscore;
 }
 
 export function isHex(ch: number): boolean {
-    return isDecimal(ch) || (ch >= Char.a && ch <= Char.f) || (ch >= Char.A && ch <= Char.F);
+    return isDecimal(ch) || (ch >= Char.a && ch <= Char.f) || (ch >= Char.A && ch <= Char.F) || ch === Char.Underscore;
 }
 
 export function isOctal(ch: number): boolean {
-    return ch >= Char._0 && ch <= Char._7;
+    return ch >= Char._0 && ch <= Char._7 || ch === Char.Underscore;
 }
 
 export function isBinary(ch: number): boolean {
-    return ch === Char._0 || ch === Char._1;
+    return ch === Char._0 || ch === Char._1 || ch === Char.Underscore;
 }
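The tokenizer reads a number by advancing while one of these predicates holds, so adding `_` is what keeps a literal like `10_000` in a single token. Before the change the scan stopped at the underscore, leaving `_000` (a valid Python identifier) as the next token for the formatter to space around. The sketch below is a simplified model, not the extension's code: `oldIsDecimal`, `newIsDecimal` and `runLength` are hypothetical names, and it compares one-character strings instead of the `typescript-char` `Char` codes.

```typescript
// Hypothetical, simplified model of the predicates in characters.ts.
// Uses one-character strings instead of the typescript-char Char codes.
type CharPredicate = (c: string) => boolean;

// Before the PR: only the digits 0-9 count as decimal.
const oldIsDecimal: CharPredicate = (c) => c >= "0" && c <= "9";

// After the PR: "_" is also accepted, anywhere in the run.
const newIsDecimal: CharPredicate = (c) => oldIsDecimal(c) || c === "_";

// Length of the run starting at `pos` whose characters satisfy `pred`,
// mirroring the tokenizer's `while (pred(currentChar)) moveNext()` loops.
function runLength(text: string, pos: number, pred: CharPredicate): number {
    let end = pos;
    while (end < text.length && pred(text.charAt(end))) {
        end += 1;
    }
    return end - pos;
}

console.log(runLength("10_000", 0, oldIsDecimal)); // 2: stops before "_"
console.log(runLength("10_000", 0, newIsDecimal)); // 6: the whole literal
console.log(runLength("1__0_", 0, newIsDecimal)); // 5: doubled and trailing "_" pass too
```

The last call shows the trade-off: the predicates accept `_` anywhere, so spellings Python rejects, such as `1__0_`, still scan as one number. For a formatter that only needs token boundaries this is arguably harmless.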
src/client/language/tokenizer.ts (66 changes: 46 additions, 20 deletions)
@@ -4,7 +4,7 @@
 
 // tslint:disable-next-line:import-name
 import Char from 'typescript-char';
-import { isBinary, isDecimal, isHex, isIdentifierChar, isIdentifierStartChar, isOctal } from './characters';
+import { isBinary, isDecimal, isHex, isIdentifierChar, isIdentifierStartChar, isOctal, isWhiteSpace } from './characters';
 import { CharacterStream } from './characterStream';
 import { TextRangeCollection } from './textRangeCollection';
 import { ICharacterStream, ITextRangeCollection, IToken, ITokenizer, TextRange, TokenizerMode, TokenType } from './types';
@@ -29,13 +29,8 @@ class Token extends TextRange implements IToken {
 export class Tokenizer implements ITokenizer {
     private cs: ICharacterStream = new CharacterStream('');
     private tokens: IToken[] = [];
-    private floatRegex = /[-+]?(?:(?:\d*\.\d+)|(?:\d+\.?))(?:[Ee][+-]?\d+)?/;
     private mode = TokenizerMode.Full;
 
-    constructor() {
-        //this.floatRegex.compile();
-    }
-
     public tokenize(text: string): ITextRangeCollection<IToken>;
     public tokenize(text: string, start: number, length: number, mode: TokenizerMode): ITextRangeCollection<IToken>;
 
@@ -224,43 +219,74 @@ export class Tokenizer implements ITokenizer {
 
         if (this.cs.currentChar === Char._0) {
             let radix = 0;
-            // Try hex
-            if (this.cs.nextChar === Char.x || this.cs.nextChar === Char.X) {
+            // Try hex => hexinteger: "0" ("x" | "X") (["_"] hexdigit)+
+            if ((this.cs.nextChar === Char.x || this.cs.nextChar === Char.X) && isHex(this.cs.lookAhead(2))) {
                 this.cs.advance(2);
                 while (isHex(this.cs.currentChar)) {
                     this.cs.moveNext();
                 }
                 radix = 16;
             }
-            // Try binary
-            if (this.cs.nextChar === Char.b || this.cs.nextChar === Char.B) {
+            // Try binary => bininteger: "0" ("b" | "B") (["_"] bindigit)+
+            if ((this.cs.nextChar === Char.b || this.cs.nextChar === Char.B) && isBinary(this.cs.lookAhead(2))) {
                 this.cs.advance(2);
                 while (isBinary(this.cs.currentChar)) {
                     this.cs.moveNext();
                 }
                 radix = 2;
             }
-            // Try octal
-            if (this.cs.nextChar === Char.o || this.cs.nextChar === Char.O) {
+            // Try octal => octinteger: "0" ("o" | "O") (["_"] octdigit)+
+            if ((this.cs.nextChar === Char.o || this.cs.nextChar === Char.O) && isOctal(this.cs.lookAhead(2))) {
                 this.cs.advance(2);
                 while (isOctal(this.cs.currentChar)) {
                     this.cs.moveNext();
                 }
                 radix = 8;
             }
+            if (radix > 0) {
+                const text = this.cs.getText().substr(start + leadingSign, this.cs.position - start - leadingSign);
+                if (!isNaN(parseInt(text, radix))) {
+                    this.tokens.push(new Token(TokenType.Number, start, text.length + leadingSign));
+                    return true;
+                }
+            }
+        }
+
+        let decimal = false;
+        // Try decimal int =>
+        // decinteger: nonzerodigit (["_"] digit)* | "0" (["_"] "0")*
+        // nonzerodigit: "1"..."9"
+        // digit: "0"..."9"
+        if (this.cs.currentChar >= Char._1 && this.cs.currentChar <= Char._9) {
+            while (isDecimal(this.cs.currentChar)) {
+                this.cs.moveNext();
+            }
+            decimal = this.cs.currentChar !== Char.Period && this.cs.currentChar !== Char.e && this.cs.currentChar !== Char.E;
+        }
+
+        if (this.cs.currentChar === Char._0) { // "0" (["_"] "0")*
+            while (this.cs.currentChar === Char._0 || this.cs.currentChar === Char.Underscore) {
+                this.cs.moveNext();
+            }
+            decimal = this.cs.currentChar !== Char.Period && this.cs.currentChar !== Char.e && this.cs.currentChar !== Char.E;
+        }
+
+        if (decimal) {
             const text = this.cs.getText().substr(start + leadingSign, this.cs.position - start - leadingSign);
-            if (radix > 0 && parseInt(text.substr(2), radix)) {
+            if (!isNaN(parseInt(text, 10))) {
                 this.tokens.push(new Token(TokenType.Number, start, text.length + leadingSign));
                 return true;
             }
         }
 
-        if (isDecimal(this.cs.currentChar) || this.cs.currentChar === Char.Period) {
-            const candidate = this.cs.getText().substr(this.cs.position);
-            const re = this.floatRegex.exec(candidate);
-            if (re && re.length > 0 && re[0] && candidate.startsWith(re[0])) {
-                this.tokens.push(new Token(TokenType.Number, start, re[0].length + leadingSign));
-                this.cs.position = start + re[0].length + leadingSign;
+        // Floating point
+        if ((this.cs.currentChar >= Char._0 && this.cs.currentChar <= Char._9) || this.cs.currentChar === Char.Period) {
+            while (!isWhiteSpace(this.cs.currentChar)) {
+                this.cs.moveNext();
+            }
+            const text = this.cs.getText().substr(start, this.cs.position - start);
+            if (!isNaN(parseFloat(text))) {
+                this.tokens.push(new Token(TokenType.Number, start, this.cs.position - start));
                 return true;
             }
         }
@@ -380,7 +406,7 @@ export class Tokenizer implements ITokenizer {
             case 'rf':
             case 'ur':
             case 'br':
-                return 2;
+                return 2;
             default:
                 break;
         }
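With the new `lookAhead(2)` guard, a radix prefix only counts when a digit (or `_`) follows it. Otherwise the tokenizer falls through to the decimal branch, which consumes just the `0`, and the letter after it is tokenized separately; that is why the updated tests expect `0x` to produce a one-character `Number` followed by an `Identifier`. A minimal string-based sketch of that rule, assuming a hypothetical `scanPrefixedInteger` helper that returns the literal's length or 0. It models only the hex, binary, and octal branches, not signs or the decimal and float paths.

```typescript
// Hypothetical helper: a string-based model of the PR's hex/binary/octal branches.
// As in characters.ts after this PR, each digit predicate also accepts "_".
type CharPredicate = (c: string) => boolean;

const digitsOrUnderscore = (lo: string, hi: string): CharPredicate =>
    (c) => (c >= lo && c <= hi) || c === "_";

const isBin = digitsOrUnderscore("0", "1");
const isOct = digitsOrUnderscore("0", "7");
const isDec = digitsOrUnderscore("0", "9");
const isHex: CharPredicate = (c) => isDec(c) || (c >= "a" && c <= "f") || (c >= "A" && c <= "F");

// Digit class introduced by the character after the leading "0", if any.
function digitClassFor(marker: string): CharPredicate | undefined {
    switch (marker) {
        case "x": case "X": return isHex;
        case "b": case "B": return isBin;
        case "o": case "O": return isOct;
        default: return undefined;
    }
}

/** Length of a radix-prefixed integer starting at `pos`, or 0 when there is none. */
function scanPrefixedInteger(text: string, pos: number): number {
    if (text.charAt(pos) !== "0") {
        return 0;
    }
    const isDigit = digitClassFor(text.charAt(pos + 1));
    // Look two characters ahead: a bare "0x", "0b" or "0o" is not a prefixed literal.
    // charAt returns "" past the end, which no predicate accepts.
    if (isDigit === undefined || !isDigit(text.charAt(pos + 2))) {
        return 0;
    }
    let end = pos + 2;
    while (end < text.length && isDigit(text.charAt(end))) {
        end += 1;
    }
    return end - pos;
}

for (const literal of ["0X2", "0xCAFE_F00D", "0b_0011", "0x", "0b3", "0oO"]) {
    console.log(literal, scanPrefixedInteger(literal, 0));
}
```

This prints lengths 3, 11 and 7 for the first three inputs and 0 for `0x`, `0b3` and `0oO`, which matches the test expectations that those three split into a `Number` of length 1 plus an identifier.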
src/test/language/tokenizer.test.ts (38 changes: 29 additions, 9 deletions)
@@ -193,7 +193,7 @@ suite('Language.Tokenizer', () => {
     test('Hex number', () => {
         const t = new Tokenizer();
         const tokens = t.tokenize('1 0X2 0x3 0x');
-        assert.equal(tokens.count, 4);
+        assert.equal(tokens.count, 5);
 
         assert.equal(tokens.getItemAt(0).type, TokenType.Number);
         assert.equal(tokens.getItemAt(0).length, 1);
@@ -204,13 +204,16 @@ suite('Language.Tokenizer', () => {
         assert.equal(tokens.getItemAt(2).type, TokenType.Number);
         assert.equal(tokens.getItemAt(2).length, 3);
 
-        assert.equal(tokens.getItemAt(3).type, TokenType.Unknown);
-        assert.equal(tokens.getItemAt(3).length, 2);
+        assert.equal(tokens.getItemAt(3).type, TokenType.Number);
+        assert.equal(tokens.getItemAt(3).length, 1);
+
+        assert.equal(tokens.getItemAt(4).type, TokenType.Identifier);
+        assert.equal(tokens.getItemAt(4).length, 1);
     });
     test('Binary number', () => {
         const t = new Tokenizer();
         const tokens = t.tokenize('1 0B1 0b010 0b3 0b');
-        assert.equal(tokens.count, 6);
+        assert.equal(tokens.count, 7);
 
         assert.equal(tokens.getItemAt(0).type, TokenType.Number);
         assert.equal(tokens.getItemAt(0).length, 1);
@@ -227,13 +230,16 @@ suite('Language.Tokenizer', () => {
         assert.equal(tokens.getItemAt(4).type, TokenType.Identifier);
         assert.equal(tokens.getItemAt(4).length, 2);
 
-        assert.equal(tokens.getItemAt(5).type, TokenType.Unknown);
-        assert.equal(tokens.getItemAt(5).length, 2);
+        assert.equal(tokens.getItemAt(5).type, TokenType.Number);
+        assert.equal(tokens.getItemAt(5).length, 1);
+
+        assert.equal(tokens.getItemAt(6).type, TokenType.Identifier);
+        assert.equal(tokens.getItemAt(6).length, 1);
     });
     test('Octal number', () => {
         const t = new Tokenizer();
         const tokens = t.tokenize('1 0o4 0o077 -0o200 0o9 0oO');
-        assert.equal(tokens.count, 7);
+        assert.equal(tokens.count, 8);
 
         assert.equal(tokens.getItemAt(0).type, TokenType.Number);
         assert.equal(tokens.getItemAt(0).length, 1);
@@ -253,8 +259,11 @@ suite('Language.Tokenizer', () => {
         assert.equal(tokens.getItemAt(5).type, TokenType.Identifier);
         assert.equal(tokens.getItemAt(5).length, 2);
 
-        assert.equal(tokens.getItemAt(6).type, TokenType.Unknown);
-        assert.equal(tokens.getItemAt(6).length, 3);
+        assert.equal(tokens.getItemAt(6).type, TokenType.Number);
+        assert.equal(tokens.getItemAt(6).length, 1);
+
+        assert.equal(tokens.getItemAt(7).type, TokenType.Identifier);
+        assert.equal(tokens.getItemAt(7).length, 2);
     });
     test('Decimal number', () => {
         const t = new Tokenizer();
@@ -301,6 +310,17 @@ suite('Language.Tokenizer', () => {
         assert.equal(tokens.getItemAt(5).type, TokenType.Number);
         assert.equal(tokens.getItemAt(5).length, 5);
     });
+    test('Underscore numbers', () => {
+        const t = new Tokenizer();
+        const tokens = t.tokenize('+1_0_0_0 0_0 .5_00_3e-4 0xCAFE_F00D 10_000_000.0 0b_0011_1111_0100_1110');
+        const lengths = [8, 3, 10, 11, 12, 22];
+        assert.equal(tokens.count, 6);
+
+        for (let i = 0; i < tokens.count; i += 1) {
+            assert.equal(tokens.getItemAt(i).type, TokenType.Number);
+            assert.equal(tokens.getItemAt(i).length, lengths[i]);
+        }
+    });
     test('Simple expression, leading minus', () => {
         const t = new Tokenizer();
         const tokens = t.tokenize('x == -y');
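Every literal in the 'Underscore numbers' test is valid Python 3.6+ syntax (PEP 515), while the tokenizer's predicates also accept spellings Python rejects, such as `1__0`. A sketch, not part of the PR, that checks a literal against the integer and float grammar from the Python Language Reference by translating each rule into a regular expression (imaginary literals are omitted). `isValidPythonNumber` and the constant names are hypothetical. `DECINTEGER` follows the reference's `"0"+ (["_"] "0")*`, which also admits `00`; the comment in the PR abbreviates it as `"0" (["_"] "0")*`.

```typescript
// Sketch: Python integer/float literal grammar (PEP 515 underscores) as regular expressions.
const DIGITPART = "[0-9](?:_?[0-9])*";
const DECINTEGER = "(?:[1-9](?:_?[0-9])*|0+(?:_?0)*)";
const BININTEGER = "0[bB](?:_?[01])+";
const OCTINTEGER = "0[oO](?:_?[0-7])+";
const HEXINTEGER = "0[xX](?:_?[0-9a-fA-F])+";
const FRACTION = `\\.${DIGITPART}`;
const POINTFLOAT = `(?:(?:${DIGITPART})?${FRACTION}|${DIGITPART}\\.)`;
const EXPONENT = `[eE][+-]?${DIGITPART}`;
const EXPONENTFLOAT = `(?:${DIGITPART}|${POINTFLOAT})${EXPONENT}`;

// Anchored so the whole string must be exactly one literal.
const PYTHON_NUMBER = new RegExp(
    `^(?:${HEXINTEGER}|${BININTEGER}|${OCTINTEGER}|${EXPONENTFLOAT}|${POINTFLOAT}|${DECINTEGER})$`,
);

// In Python a leading sign is a unary operator, not part of the literal, so strip it first.
function isValidPythonNumber(literal: string): boolean {
    return PYTHON_NUMBER.test(literal.replace(/^[+-]/, ""));
}

const fromTest = ["+1_0_0_0", "0_0", ".5_00_3e-4", "0xCAFE_F00D", "10_000_000.0", "0b_0011_1111_0100_1110"];
console.log(fromTest.every(isValidPythonNumber)); // true
console.log(["1__0", "1_", "0x_", "0_7", "1_.5"].some(isValidPythonNumber)); // false
```

A strict check like this could reject malformed input, but the tokenizer only has to find token boundaries for on-type formatting, so accepting a superset of valid literals is a reasonable simplification.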