~Fix failing tests for invalid control characters in comments~ invalid #617

jidicula · 2021-10-05T01:38:48Z

Issue: #613

Modifies scanComment() to error out early if an illegal character is found
Adds illegal characters to scanComment()

Note that the tests for the NUL and US control characters weren't failing before I made my changes. I took a quick look but couldn't work out why they pass on my machine (TestTOMLTest_Invalid_Control_CommentNull and TestTOMLTest_Invalid_Control_CommentUs).
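For illustration, the early-exit behaviour described above could look roughly like the sketch below. It is not the PR's actual diff: `scanCommentSketch` and `isIllegalCommentByte` are hypothetical names, and the set of rejected bytes is taken from the TOML v1.0.0 rule that control characters other than tab (U+0000 to U+0008, U+000A to U+001F, U+007F) are not permitted in comments.

```go
package main

import "fmt"

// isIllegalCommentByte reports whether c is a control character that
// TOML v1.0.0 forbids inside a comment. Tab (0x09) is allowed.
func isIllegalCommentByte(c byte) bool {
	return c <= 0x08 || (c >= 0x0a && c <= 0x1f) || c == 0x7f
}

// scanCommentSketch is a hypothetical stand-in for the scanner's comment
// routine. b is expected to start with '#'. It returns the comment bytes
// (without the line ending) and the remaining input, and it stops at the
// first illegal byte instead of scanning the rest of the line.
func scanCommentSketch(b []byte) (comment, rest []byte, err error) {
	for i := 1; i < len(b); i++ {
		c := b[i]
		// A newline (LF or CRLF) terminates the comment.
		if c == '\n' || (c == '\r' && i+1 < len(b) && b[i+1] == '\n') {
			return b[:i], b[i:], nil
		}
		if isIllegalCommentByte(c) {
			return nil, nil, fmt.Errorf("invalid character 0x%02x in comment at offset %d", c, i)
		}
	}
	// The comment runs to the end of the input.
	return b, b[len(b):], nil
}

func main() {
	for _, in := range []string{"# fine\nkey = 1", "# bad\x7f", "# nul\x00", "# us\x1f"} {
		c, _, err := scanCommentSketch([]byte(in))
		fmt.Printf("%q -> comment=%q err=%v\n", in, c, err)
	}
}
```

In this sketch NUL and US are rejected by the same range check as every other control character, so there is no separate code path for them.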


* Use pointers instead of copying around ast.Node

  Node is a 56B struct that is constantly in the hot path. Passing nodes around by copy had a cost that started to add up. This change replaces them by pointers. Using unsafe pointer arithmetic and converting sibling/child indexes to relative offsets, it removes the need to carry around a pointer to the root of the tree. This saves 8B per Node. This space will be used to store an extra []byte slice to provide contextual error handling on all nodes, including the ones whose data is different than the raw input (for example: strings with escaped characters), while staying under the size of a cache line.

* Remove conditional
* Add Raw to track range in data for parsed values
* Simplify reference tracking
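The relative-offset idea in the commit message above can be sketched as follows. This is a simplified illustration, not the real ast.Node: the `node` type, its fields, and the `at` helper are hypothetical, and it assumes all nodes live in one contiguous slice.

```go
package main

import (
	"fmt"
	"unsafe"
)

// node is a hypothetical, simplified stand-in for ast.Node. Siblings and
// children are stored as distances (in nodes) from the current node, so a
// node can find its relatives without a pointer to the root of the tree.
type node struct {
	data  []byte
	next  int32 // distance to the next sibling; 0 means none
	child int32 // distance to the first child; 0 means none
}

// at returns the node located off elements after n in the same backing
// array, or nil when off is 0. It is only valid while n points into a
// contiguous []node that contains the target.
func (n *node) at(off int32) *node {
	if off == 0 {
		return nil
	}
	size := unsafe.Sizeof(node{})
	return (*node)(unsafe.Pointer(uintptr(unsafe.Pointer(n)) + uintptr(off)*size))
}

func (n *node) Next() *node  { return n.at(n.next) }
func (n *node) Child() *node { return n.at(n.child) }

func main() {
	// Flat storage for a root with two children, "a" and "b".
	nodes := []node{
		{data: []byte("root"), child: 1},
		{data: []byte("a"), next: 1},
		{data: []byte("b")},
	}
	root := &nodes[0] // passed around by pointer, never copied
	for c := root.Child(); c != nil; c = c.Next() {
		fmt.Printf("%s\n", c.data)
	}
}
```

The trade-off is that a `*node` is only meaningful while it points into the slice that holds its relatives, which is why this sketch never copies a node out of `nodes`.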


Tests are hidden behind a "testsuite" build tag for now since many tests are failing. Use `go test -tags testsuite` to activate. Use `go generate` to regenerate toml_testgen_test.go. Co-authored-by: Thomas Pelletier <thomas@pelletier.codes>

When scanning comments, it makes more sense to stop and return immediately as soon as an illegal character is encountered. This saves work in the pathological case of an extremely long comment with an early offending character. Related to: pelletier#613


jidicula · 2021-10-05T01:40:18Z

Sorry, opened a PR against the wrong branch 😓

pelletier added 30 commits January 30, 2021 09:07

test

b4bb91f

wip: string parsing

abe1005

Track ABNF file

d54ad15

Refactor to use parser state

07aa85e

Dotted keys

1c7e9fe

Boolean values

fd96110

Check for allocs

b96c535

Default to use bytes instead of runes

2ab0f8c

benchmark old ns/op new ns/op delta BenchmarkParseAll-8 3238 1941 -40.06%

Remove error handling for rune

7b4d82a

Parse rvalue string

91d7afb

Array implementation

bac65cc

Inline tables

44f7a7a

Standard Table

aae4656

Array tables

7300b6a

Rename to lexer and split in files

1e8b0dc

wip

94ad175

Add tokens to Document

b123c35

Trying the scanner approach

0ee0fe7

wip parsing

ca12c06

Multiline basic string parsing

736a757

Multiline literal strings

a466f0c

Fix parsing bugs + boolean impl

540c2a7

Implement tables

165f654

Implement array values

b1e11f8

Implement inline tables

9fa2fd4

Very beginning of unmarshaler + builder interface

89052d6

Parse tables

bd8df24

Simple table array

a197513

Add more tests for unmarshal array tables

70d41bd

Move tests out of the package

0e8fd64

pelletier and others added 26 commits June 1, 2021 09:10

Add benchmarks results to readme (pelletier#548)

b202375

Don't use bytes.Buffer when not necessary (pelletier#549)

b0d6c62

When parsing strings, they can be referenced directly from the document when they don't contain escaped characters. This avoids paying the cost of allocating (and sometimes growing) the bytes buffer unnecessarily.
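A minimal sketch of that fast path, assuming a hypothetical helper `unquoteSketch` that receives the bytes between the quotes of a basic string and supports only a handful of escapes (`\\`, `\"`, `\n`, `\t`); the real parser handles the full TOML escape set:

```go
package main

import (
	"bytes"
	"fmt"
)

// unquoteSketch returns the decoded content of a basic string. When raw
// contains no backslash it returns raw itself, so the result aliases the
// document and nothing is allocated. Only a few escapes are supported here.
func unquoteSketch(raw []byte) ([]byte, error) {
	if bytes.IndexByte(raw, '\\') < 0 {
		return raw, nil // fast path: reference the document directly
	}
	var buf bytes.Buffer
	buf.Grow(len(raw)) // decoded output is never longer than the input
	for i := 0; i < len(raw); i++ {
		c := raw[i]
		if c != '\\' {
			buf.WriteByte(c)
			continue
		}
		i++
		if i >= len(raw) {
			return nil, fmt.Errorf("unterminated escape sequence")
		}
		switch raw[i] {
		case '\\', '"':
			buf.WriteByte(raw[i])
		case 'n':
			buf.WriteByte('\n')
		case 't':
			buf.WriteByte('\t')
		default:
			return nil, fmt.Errorf("unsupported escape \\%c", raw[i])
		}
	}
	return buf.Bytes(), nil
}

func main() {
	for _, in := range []string{`plain`, `a\tb`, `bad\q`} {
		out, err := unquoteSketch([]byte(in))
		fmt.Printf("%q -> %q, err=%v\n", in, out, err)
	}
}
```

Because the fast path returns a slice of the input, callers of this sketch must not modify the document while the returned bytes are still in use.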

Benchmark marshal (pelletier#550)

f3bb20e

Unmarshal recursive structs (pelletier#557)

773f101

Co-authored-by: Nabetani <takenori@nabetani.sakura.ne.jp>

Provide own implementation of Local* (pelletier#558)

f6b38c3

* Reduces the public API.
* Reuses optimized parsing functions.
* Removes reliance on Google code under Apache license.

Set up Dependabot for GitHub actions and docker (pelletier#570)

9c24fbe

Unicode parsing optimization (pelletier#568)

a93b34d

Inlines the call to hexToRune and uses specialized parsing, as found in encoding/json.

Co-authored-by: Thomas Pelletier <thomas@pelletier.codes>

Add LocalTime to interface{} decode support (pelletier#567)

8be357d

Co-authored-by: Thomas Pelletier <thomas@pelletier.codes>

Add installation instructions (pelletier#572)

fa07960

Go 1.17 release (pelletier#574)

69ab7e1

Minimum supported version: Go 1.16.

unmarshal: make copy of non addressable values (pelletier#576)

1230ca4

When unmarshaling into a nested struct in a map, the value is not addressable. In that case, make a copy of it and modify it instead. Fixes pelletier#575
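The addressability problem and the copy-then-store workaround can be reproduced with the reflect package alone. The sketch below is an assumption about the general shape of such a fix, not the library's code; the `server` type and `setPortInMap` function are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// server is a hypothetical struct stored by value in a map.
type server struct {
	Host string
	Port int
}

// setPortInMap sets the Port field of m[key] through reflection. Values
// returned by MapIndex are not addressable, so their fields cannot be set
// in place: the sketch copies the element into a fresh addressable value,
// modifies the copy, and stores it back into the map.
func setPortInMap(m map[string]server, key string, port int64) {
	mv := reflect.ValueOf(m)
	k := reflect.ValueOf(key)

	elem := mv.MapIndex(k)                     // elem.CanAddr() is false
	cp := reflect.New(mv.Type().Elem()).Elem() // addressable zero value
	if elem.IsValid() {
		cp.Set(elem) // start from the existing entry, if there is one
	}
	cp.FieldByName("Port").SetInt(port)
	mv.SetMapIndex(k, cp)
}

func main() {
	m := map[string]server{"a": {Host: "example.com"}}
	setPortInMap(m, "a", 8080)
	fmt.Printf("%+v\n", m["a"])
}
```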

parser: don't crash on unterminated table key (pelletier#580)

40cfb6f

* parser: don't crash on unterminated table key

  Fixes pelletier#579

* parser: fix format of error returned by expect

  EOF was missing the format string and %U is not very human friendly.

unmarshal: fix non-terminated array error

7e2fa1b

Fixes pelletier#581

errors: fix context generation with only one line

4a5ae9e

unmarshal: don't crash on unterminated inline table (pelletier#587)

a0d685d

Fixes pelletier#586

scanner: fix error reporting for last comments (pelletier#591)

f34c9c3

When an invalid TOML expression ends with a comment before the end of file, the decode error would take a nil from scanComment, which is not part of the document. Fixes pelletier#588

parser: don't overflow when parsing bad times (pelletier#593)

fa56f48

Fixes pelletier#585

unmarshal: convert ints if target type is compatible (pelletier#594)

ee9b902

This is required to support custom types. Fixes pelletier#590
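As an illustration of what "compatible" could mean here, the sketch below stores a parsed int64 into any destination whose underlying kind is a signed integer, which covers user-defined types. It is an assumption about the approach, not the library's code; `port`, `assignInt`, and `setInt` are hypothetical names.

```go
package main

import (
	"fmt"
	"reflect"
)

// port is a hypothetical user-defined integer type.
type port int16

// assignInt stores v into dst when dst's underlying kind is a signed
// integer, converting to dst's exact type. dst must be settable.
func assignInt(dst reflect.Value, v int64) error {
	switch dst.Kind() {
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		if dst.OverflowInt(v) {
			return fmt.Errorf("integer %d overflows %s", v, dst.Type())
		}
		dst.Set(reflect.ValueOf(v).Convert(dst.Type()))
		return nil
	}
	return fmt.Errorf("cannot store integer into %s", dst.Type())
}

// setInt is a small convenience wrapper that takes a pointer to the target.
func setInt(ptr interface{}, v int64) error {
	return assignInt(reflect.ValueOf(ptr).Elem(), v)
}

func main() {
	var p port
	if err := setInt(&p, 8080); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p)
}
```

Checking the kind rather than the exact type is what lets `port` be treated like a plain integer, while the overflow check keeps out-of-range values from being silently truncated.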

unmarshal: support lowercase 'T' and 'Z' in date-time parsing (pelletier#601)

476492a

RFC 3339 allows for lowercase 't' and 'z' in date-time values. Fixes pelletier#600
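For readers who want to see the lowercase forms in action, here is a small sketch built on the standard library rather than on go-toml's own date-time parser. Go's `time.RFC3339` layout spells both separators in uppercase, so the hypothetical helper `parseDateTimeLenient` normalizes them before parsing.

```go
package main

import (
	"fmt"
	"time"
)

// parseDateTimeLenient accepts a lowercase 't' separator and a lowercase
// 'z' offset by uppercasing them before handing the value to time.Parse.
func parseDateTimeLenient(s string) (time.Time, error) {
	b := []byte(s)
	// In "YYYY-MM-DDTHH:MM:SS...", the date/time separator is at index 10.
	if len(b) > 10 && b[10] == 't' {
		b[10] = 'T'
	}
	// A trailing 'z' stands for the UTC offset.
	if n := len(b); n > 0 && b[n-1] == 'z' {
		b[n-1] = 'Z'
	}
	return time.Parse(time.RFC3339, string(b))
}

func main() {
	ts, err := parseDateTimeLenient("1987-07-05t17:45:00z")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ts.Year(), ts.Month(), ts.Day(), ts.Hour(), ts.Minute())
}
```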

fix(comment test): Fail if DEL 0x7f char is found

f7f02b0

Partially resolves: pelletier#613

fix(comment test): Fail if LF 0x0a char is found

851d9c4

Partially resolves: pelletier#613

fix(comment test): Fail if NUL 0x00 char is found

6bc6702

Oddly enough, the test passes when it shouldn't. Partially resolves: pelletier#613

fix(comment test): Fail if US 0x1f char is found

1c407d1

Oddly enough, the test passes when it shouldn't. Partially resolves: pelletier#613

jidicula closed this Oct 5, 2021

jidicula changed the title from "Fix failing tests for invalid control characters in comments" to "~~Fix failing tests for invalid control characters in comments~~ invalid" Oct 5, 2021

jidicula changed the title from "~~Fix failing tests for invalid control characters in comments~~ invalid" to "~Fix failing tests for invalid control characters in comments~ invalid" Oct 5, 2021
