
Explore new text buffer for performance improvement #41042

Closed
rebornix opened this issue Jan 2, 2018 · 6 comments
@rebornix
Member

rebornix commented Jan 2, 2018

Currently our text buffer is built on top of an array that stores all lines of a file. It behaves well in most cases (accessing, editing, etc. are all fast), but as the file size or the number of line breaks grows, building the text buffer takes longer, because we split the content by line breaks, and it holds more memory. We'll explore other text buffer implementations to improve performance and memory usage for this case, with a reasonable time trade-off for line accessing, editing, etc.
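For reference, a minimal sketch of what such a line-array buffer roughly looks like (illustrative names, not the actual VS Code classes): the whole file is split once, line access is a plain array read, and a single-line edit is a cheap string tweak.

```ts
// Minimal sketch of a line-array text buffer (illustrative, not the real model).
class LineArrayBuffer {
  private lines: string[];

  constructor(content: string) {
    // Splitting on line breaks is what dominates model-building time and what
    // duplicates memory for files with many lines.
    this.lines = content.split(/\r\n|\r|\n/);
  }

  getLineCount(): number {
    return this.lines.length;
  }

  getLineContent(lineNumber: number): string {
    // 1-based line numbers, O(1) array access.
    return this.lines[lineNumber - 1];
  }

  // Single-line insert only; a multi-line edit would also splice the array.
  insert(lineNumber: number, column: number, text: string): void {
    const line = this.lines[lineNumber - 1];
    this.lines[lineNumber - 1] =
      line.substring(0, column - 1) + text + line.substring(column - 1);
  }
}
```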

Benchmark: Performance and Memory Usage Metrics

  • Model Builder (for file loading)
  • GetLineContent (for display and tokenization)
  • GetLineContent after edits
  • Edits
  • Search and Replace
@vscodebot vscodebot bot added the editor label Jan 2, 2018
@rebornix rebornix added this to the December 2017/January 2018 milestone Jan 2, 2018
@alexdima alexdima added the plan-item VS Code - planned item for upcoming label Jan 10, 2018
@rebornix
Copy link
Member Author

rebornix commented Jan 25, 2018

Our benchmark suite runs against multiple file sizes (1B, 1KB, 64KB, 32MB); the data is good for verifying code changes and potential improvements, but since it's mock data it may not reflect real-world use cases. So here I try to benchmark against some realistic files

and some manually created large files (I'll replace these files once I find realistic large files):

  • heap snapshot of newly opened VSCode Insider, 54MB, 3M lines.
  • checker.ts * 128, 184MB, 3M lines

Text Buffers

  • Line Model
    • In July 2017 we made a huge memory usage improvement (Improve memory usage for large files #30180 (comment)). "Avoid sliced strings" is an interesting one: after a Buffer.from().toString() round trip, each split string no longer keeps a sliced-string reference to its parent, which saves quite a lot of memory when the file has many lines. However, it has one side effect: it slows down file opening, as Buffer.from().toString() is costly compared to the other operations. https://bugs.chromium.org/p/v8/issues/detail?id=2869
    • This month we disabled the sliced string optimization.
  • Piece Tree (Piece Table + red-black tree); a rough node layout is sketched after this list
  • Edcore (chunk-based buffer)
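For context, here is a rough sketch of the data layout a piece tree needs (field names are illustrative, not the actual implementation): each piece points into either the read-only original content or an append-only added buffer, and each red-black tree node caches aggregates of its left subtree so offset and line lookups stay O(log n).

```ts
// Illustrative sketch, assuming a classic piece table stored in a red-black tree.
interface Piece {
  bufferIndex: number;    // 0 = original file content, >0 = append-only "added" buffers
  start: number;          // offset of the piece inside that buffer
  length: number;         // number of characters the piece covers
  lineFeedCount: number;  // number of line breaks inside the piece
}

interface PieceTreeNode {
  piece: Piece;
  left: PieceTreeNode | null;
  right: PieceTreeNode | null;
  color: 'red' | 'black';
  // Aggregates of the left subtree, maintained on every edit and rotation, so that
  // "which node contains offset X / line Y" is a single root-to-leaf descent.
  leftSubtreeLength: number;
  leftSubtreeLineFeedCount: number;
}
```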

1. Model Building

File opening ~= Convert character encodings + Model building + View rendering

| File | line model w/ sliced string opt | line model w/o sliced string opt | piece tree | edcore |
| --- | --- | --- | --- | --- |
| vscode-loader | 7.018 ms | 5.2 ms | 7.338 ms | 7.473 ms |
| checker.ts | 35.988 ms | 22.388 ms | 22.481 ms | 24.004 ms |
| sqlite.c | 124.626 ms | 60.413 ms | 53.458 ms | 50.700 ms |
| Russian Dict | 577.276 ms | 311.615 ms | 212.439 ms | 217.534 ms |
| heapsnapshot | 2217.365 ms | 1203.953 ms | 518.287 ms | 538.233 ms |
| checker128 | 3033.738 ms | 1871.421 ms | 1520.666 ms | 1594.009 ms |

Fun facts here:

  • If we don't do the sliced string optimization, most of the time is spent splitting the string into arrays.
  • If we do the sliced string optimization, the model building time almost doubles as the file size grows.
  • For Piece Tree and Edcore, most of the time is spent finding the offsets of line breaks.
  • Checking charCode is cheaper than splitting (see the scanning sketch below).
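A small sketch of what that charCode scan looks like (a sketch of the idea, not the shipped model builder): walk the chunk once, record the offset just after each line break, and never allocate per-line strings.

```ts
// Scan a chunk for line breaks and record line-start offsets, keeping the
// original string intact instead of splitting it.
function computeLineStartOffsets(chunk: string): number[] {
  const lineStarts: number[] = [0];
  for (let i = 0; i < chunk.length; i++) {
    const ch = chunk.charCodeAt(i);
    if (ch === 13 /* \r */) {
      if (i + 1 < chunk.length && chunk.charCodeAt(i + 1) === 10 /* \n */) {
        i++; // \r\n counts as a single line break
      }
      lineStarts.push(i + 1);
    } else if (ch === 10 /* \n */) {
      lineStarts.push(i + 1);
    }
  }
  return lineStarts;
}
```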

2. Mem usage

Here we don't compare heap size: for small files we do tokenization and keep multiple copies of the buffer (worker, etc.), while we stop doing that for large files, so we only check the memory usage of the text buffer itself.

| File | file size | line model w/ sliced string opt | line model w/o sliced string opt | piece tree | edcore |
| --- | --- | --- | --- | --- | --- |
| vscode-loader | 77 KB | 147 KB | 106 KB | 83 KB | 76 KB |
| checker.ts | 1.47 MB | 2.5 MB | 2.8 MB | 1.5 MB | 1.5 MB |
| sqlite.c | 4.3 MB | 9.7 MB | 11 MB | 4.8 MB | 5 MB |
| Russian Dict | 14 MB | 42 MB | 52 MB | 21.8 MB | 22.8 MB |
| heapsnapshot | 52 MB | 195 MB | 242 MB | 61.7 MB | 67.7 MB |
| checker128 | 184 MB | 331 MB | 373 MB | 201 MB | 207 MB |

As we can see from the above, whether to do the sliced string optimization in the Line Model is a difficult trade-off: sliced strings use more memory (proportional to line count), but getting rid of them costs time during model building.

Piece Tree and Edcore don't have this trouble as they don't split strings. The reason Piece Tree uses slightly less memory than Edcore is that in Piece Tree I use a Uint16Array to store numbers when they are smaller than 65536, just a small trick.
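A sketch of that trick (the fallback type is my assumption): line-start offsets are computed per chunk, so for 64KB chunks they typically fit in 16 bits and can be packed at 2 bytes per entry.

```ts
// Pack ascending line-start offsets for one chunk: use a Uint16Array when every
// offset fits in 16 bits, otherwise fall back to a wider typed array.
function packLineStarts(lineStarts: number[]): Uint16Array | Uint32Array {
  const max = lineStarts.length > 0 ? lineStarts[lineStarts.length - 1] : 0;
  return max < 65536 ? new Uint16Array(lineStarts) : new Uint32Array(lineStarts);
}
```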


Once the file is opened, we don't need to worry about the sliced string problem, so in the following benchmarks we only compare Line Model w/o sliced string opt, Piece Tree and Edcore.

3. Edits

For now I don't have data on users' behavior while using the editor, but there are two typical workflows I'm familiar with:

  • Jumping around in the document and making edits, for instance changing a function signature.
  • Typing in sequence. It's not frequent, but I do enter zen mode sometimes when writing blogs.

So here I try to mimic these two scenarios: apply 1000 random edits and 1000 sequential inserts to the document, and measure how much time each text buffer needs. We apply the edits one by one on purpose, to avoid any bulk-edit optimization (a sketch of the driver loops is below).
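A hedged sketch of those driver loops (the TestBuffer interface and the replace signature are assumptions, not the exact benchmark code): edits are applied one at a time so no implementation gets the chance to batch them.

```ts
interface TestBuffer {
  getLength(): number;
  replace(offset: number, length: number, text: string): void;
}

// Scenario 1: jump around and edit at random positions.
function applyRandomEdits(buffer: TestBuffer, count: number): void {
  for (let i = 0; i < count; i++) {
    const offset = Math.floor(Math.random() * buffer.getLength());
    buffer.replace(offset, 0, 'a'); // insert at a random position
  }
}

// Scenario 2: keep typing in sequence at the end of the document.
function applySequentialInserts(buffer: TestBuffer, count: number): void {
  for (let i = 0; i < count; i++) {
    buffer.replace(buffer.getLength(), 0, 'a\n');
  }
}
```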

| File | | line model | piece tree | edcore |
| --- | --- | --- | --- | --- |
| loader.js | apply 1000 random edits | 4.031 ms | 10.728 ms | 134.135 ms |
| | apply 1000 sequential inserts | 43.830 ms | 4.080 ms | 146.190 ms |
| checker.ts | apply 1000 random edits | 22.095 ms | 21.397 ms | 232.597 ms |
| | apply 1000 sequential inserts | 43.393 ms | 7.943 ms | 172.539 ms |
| sqlite3.c | apply 1000 random edits | 63.947 ms | 20.166 ms | 256.309 ms |
| | apply 1000 sequential inserts | 333.583 ms | 14.632 ms | 265.432 ms |
| Russian-English Bilingual.dic | apply 1000 random edits | 218.894 ms | 15.317 ms | 279.063 ms |
| | apply 1000 sequential inserts | 998.764 ms | 7.257 ms | 215.604 ms |
| heapfile | apply 1000 random edits | 1470.209 ms | 13.379 ms | 530.236 ms |
| | apply 1000 sequential inserts | 11451.717 ms | 23.012 ms | 490.390 ms |
| checker128 | apply 1000 random edits | 1268.692 ms | 16.569 ms | 970.225 ms |
| | apply 1000 sequential inserts | 12581.204 ms | 18.034 ms | 896.744 ms |

Line Model wins when the file size is small (KBs). This is expected: accessing a random position in a small array and tweaking a string of around 100-150 characters is really fast. It starts to choke when the file has a lot of lines (100k+); you might think the array size shouldn't affect performance, but it does, as an array of objects usually doesn't sit in a linear memory space.


Sequential inserts in large files make this situation worse, as the JS engine does heavy lifting when resizing the super large array.

Piece Tree behaves stably for these megabyte-sized files, as each edit is just a string append plus some red-black tree operations. Edcore's numbers don't look good because I apply the edits one by one, so it loses the chance to do bulk edits.

As long as each edit takes less than 1ms, text buffer performance is the last thing to blame. However, if an edit takes ~10ms, it will affect the frame rate.

4. Get line content

getLineContent is one of the hottest methods in the text buffer: the tokenizer, LSP, link detector, etc., any component relying on document content will eventually invoke it. When we switch the text buffer implementation, we should make sure there is no significant regression.

To prevent the JS engine from optimizing away the reads, every time we get the content of a line we do some simple math on it:

```js
// read both ends of the line so the string is actually materialized,
// then swap the two values without a temporary so nothing is dead code
let firstChar = str.charCodeAt(0);
let lastChar = str.charCodeAt(str.length - 1);
firstChar = firstChar - lastChar;
lastChar = firstChar + lastChar;
firstChar = lastChar - firstChar;
```

| File | Read all lines | line model | piece tree | edcore |
| --- | --- | --- | --- | --- |
| loader.js | after 1000 random edits | 1.207 ms | 4.272 ms | 1.893 ms |
| | after 1000 sequential inserts | 0.627 ms | 2.556 ms | 1.617 ms |
| checker.ts | after 1000 random edits | 4.493 ms | 20.356 ms | 9.219 ms |
| | after 1000 sequential inserts | 3.608 ms | 11.665 ms | 6.729 ms |
| sqlite3.c | after 1000 random edits | 24.891 ms | 70.447 ms | 36.427 ms |
| | after 1000 sequential inserts | 10.960 ms | 61.909 ms | 38.286 ms |
| Russian-English Bilingual.dic | after 1000 random edits | 24.226 ms | 234.301 ms | 140.226 ms |
| | after 1000 sequential inserts | 7.712 ms | 249.257 ms | 138.410 ms |
| heapfile | after 1000 random edits | 25.148 ms | 1117.376 ms | 649.585 ms |
| | after 1000 sequential inserts | 31.662 ms | 1343.143 ms | 862.602 ms |
| checker128 | after 1000 random edits | 324.857 ms | 1618.054 ms | 1087.511 ms |
| | after 1000 sequential inserts | 449.868 ms | 1788.849 ms | 1222.106 ms |

Piece Tree looks pretty bad in this round of the competition. The time complexity of a lookup in a balanced tree is O(log M) (M is the tree node count), and after 1000 random edits we may have ~2000 nodes in the Piece Tree. That doesn't mean there is no way to optimize this scenario:

  • We can add a search cache (sketched below); then when we or the tokenizer read content from top to bottom, we only need to do M searches instead of N (line count).
  • A red-black tree may not be the best balanced tree for an editor, but picking another one requires a good understanding of user behavior: an AVL tree may be slower for editing, and a splay tree may perform too many rotations when the tokenizer just reads from top to bottom.
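A sketch of the search-cache idea (the shape and field names are mine, not the shipped code): remember the last few node hits so a top-to-bottom read pays the O(log M) tree walk only once per node instead of once per line.

```ts
interface CacheEntry<TNode> {
  node: TNode;                  // tree node that answered the last lookup
  nodeStartOffset: number;      // document offset where that node begins
  nodeStartLineNumber: number;  // document line where that node begins
  nodeLineCount: number;        // lines covered by that node
}

class PieceTreeSearchCache<TNode> {
  private entries: CacheEntry<TNode>[] = [];

  getByLine(lineNumber: number): CacheEntry<TNode> | undefined {
    return this.entries.find(e =>
      e.nodeStartLineNumber <= lineNumber &&
      lineNumber < e.nodeStartLineNumber + e.nodeLineCount);
  }

  set(entry: CacheEntry<TNode>): void {
    this.entries.push(entry);
    if (this.entries.length > 3) {
      this.entries.shift(); // keep only a handful of recent lookups
    }
  }

  // Any cached node at or after an edited offset may have shifted; drop it.
  invalidate(editOffset: number): void {
    this.entries = this.entries.filter(e => e.nodeStartOffset < editOffset);
  }
}
```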

However, IMHO the current Piece Tree's performance is not a problem in practice. Take the last row of the table above as an example: reading all lines after 1000 sequential inserts in the checker*128.ts file takes 1.8s, so reading one line takes roughly 0.6 microseconds; when you jump around in the document, reading the hundreds of lines around your cursor takes less than 0.1ms. Not to mention the tokenizer, which may need tens of minutes to tokenize all those lines anyway.

5. Read random windows

This benchmark tries to mimic the behavior of Go To Definition/Page Down/etc, which refreshes the whole viewport.

| File | Read 10 random windows | line model | piece tree | edcore |
| --- | --- | --- | --- | --- |
| loader.js | after 1000 random edits | 0.148 ms | 0.445 ms | 0.250 ms |
| | after 1000 sequential inserts | 0.124 ms | 0.300 ms | 0.256 ms |
| checker.ts | after 1000 random edits | 0.210 ms | 0.356 ms | 0.222 ms |
| | after 1000 sequential inserts | 0.118 ms | 0.281 ms | 0.225 ms |
| sqlite3.c | after 1000 random edits | 0.184 ms | 0.317 ms | 0.232 ms |
| | after 1000 sequential inserts | 0.155 ms | 0.280 ms | 0.243 ms |
| Russian-English Bilingual.dic | after 1000 random edits | 0.209 ms | 0.430 ms | 0.231 ms |
| | after 1000 sequential inserts | 0.144 ms | 0.556 ms | 0.263 ms |
| heapfile | after 1000 random edits | 0.174 ms | 0.301 ms | 0.391 ms |
| | after 1000 sequential inserts | 0.153 ms | 1.405 ms | 1.397 ms |
| checker128 | after 1000 random edits | 0.236 ms | 0.357 ms | 0.362 ms |
| | after 1000 sequential inserts | 0.574 ms | 2.583 ms | 1.130 ms |

The performance of all three text buffers is acceptable here, and adding a search cache can boost the Piece Tree as described above.

6. Replace All

The difference between replace-all and random edits is that the former is a bulk edit, so the text buffer can do some optimization if possible (one possible single-pass approach is sketched below).
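For illustration, here is a minimal single-pass replace-all over a plain string (a sketch under the assumption of non-overlapping matches, not the actual buffer code): instead of applying N independent edits, the result is rebuilt once from the unchanged slices plus the replacement text.

```ts
interface Replacement {
  offset: number; // start offset of the match in the original content
  length: number; // length of the matched text
  text: string;   // replacement text
}

// Assumes the matches do not overlap; sorting makes the single left-to-right pass valid.
function replaceAll(content: string, edits: Replacement[]): string {
  const sorted = [...edits].sort((a, b) => a.offset - b.offset);
  const parts: string[] = [];
  let last = 0;
  for (const e of sorted) {
    parts.push(content.slice(last, e.offset), e.text);
    last = e.offset + e.length;
  }
  parts.push(content.slice(last));
  return parts.join('');
}
```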

| File | | line buffer | piece table | edcore |
| --- | --- | --- | --- | --- |
| loader.js | replace 10 occurrences | 2.064 ms | 5.477 ms | 4.566 ms |
| | replace 100 occurrences | 3.495 ms | 4.078 ms | 4.376 ms |
| | replace 500 occurrences | 10.239 ms | 21.802 ms | 11.296 ms |
| checker.ts | replace 10 occurrences | 2.104 ms | 4.963 ms | 4.688 ms |
| | replace 100 occurrences | 6.073 ms | 11.645 ms | 4.907 ms |
| | replace 500 occurrences | 18.205 ms | 24.180 ms | 33.752 ms |
| sqlite3.c | replace 10 occurrences | 3.106 ms | 4.474 ms | 5.380 ms |
| | replace 100 occurrences | 3.321 ms | 14.700 ms | 6.948 ms |
| | replace 500 occurrences | 12.224 ms | 14.927 ms | 38.226 ms |
| Russian-English Bilingual.dic | replace 10 occurrences | 3.382 ms | 3.805 ms | 5.114 ms |
| | replace 100 occurrences | 2.036 ms | 14.354 ms | 22.637 ms |
| | replace 500 occurrences | 7.087 ms | 11.225 ms | 74.017 ms |
| heapfile | replace 10 occurrences | 4.244 ms | 4.093 ms | 8.081 ms |
| | replace 100 occurrences | 3.341 ms | 4.739 ms | 30.527 ms |
| | replace 500 occurrences | 15.566 ms | 9.953 ms | 259.491 ms |
| checker128 | replace 10 occurrences | 4.186 ms | 4.008 ms | 7.457 ms |
| | replace 100 occurrences | 3.577 ms | 4.551 ms | 30.715 ms |
| | replace 500 occurrences | 31.212 ms | 11.769 ms | 144.375 ms |

@rebornix
Member Author

It's difficult to tell from the tables alone which implementation is better, whether two buffers behave similarly, or whether a text buffer scales. Here I converted them into charts so we can get an intuitive feel for the benchmarks.

Model Building/File loading


Mem usage


Memory usage of Piece Tree is close to the real file size.

Edits

1000 random edits


1000 sequential inserts

Line Model chokes when the file size increases, while Piece Tree is really stable.

Read All Lines


This is a big TODO for the Piece Tree: search caching can boost it a lot, and we'll revisit this benchmark once that's done.

Read 10 random windows


Similar to Read All Lines, we can optimize this case, but single-digit milliseconds is always good enough.

Replace All


At this moment, our Piece Tree doesn't do bulk edits, which gives us some space to improve.

@rebornix rebornix mentioned this issue Jan 29, 2018
3 tasks
@alexdima alexdima modified the milestones: January 2018, February 2018 Feb 1, 2018
@rebornix
Member Author

rebornix commented Feb 7, 2018

As mentioned above, we have some ways to improve the speed of getLineContent. Adding a search cache is one obvious solution, but after profiling I found that more time is spent elsewhere:

  • Originally there was one internal API on the Piece Tree buffer, getLineRawContent, which includes the line break. To make the buffer match the text model API, I created a new function which trims the line break at the end, and unsurprisingly, when you have a lot of lines and the lines are long, trimming is costly.
  • Instead of taking (position: Position) as a parameter, taking (line: number, column: number) reduces object creation and GC pressure (both points are illustrated in the sketch below).
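A small illustrative sketch of both points (the function names are assumptions based on the description above, not the exact internal API): trim at most one trailing line break by charCode instead of doing anything heavier, and take primitive line/column arguments on hot paths so no throwaway Position objects are allocated.

```ts
// assumed internal accessor: returns the line including its trailing \n / \r\n
declare function getLineRawContent(lineNumber: number): string;

function getLineContent(lineNumber: number): string {
  const raw = getLineRawContent(lineNumber);
  let end = raw.length;
  if (end > 0 && raw.charCodeAt(end - 1) === 10 /* \n */) { end--; }
  if (end > 0 && raw.charCodeAt(end - 1) === 13 /* \r */) { end--; }
  return raw.substring(0, end);
}

// hot-path lookups: prefer primitives over an allocated Position object
// before: nodeAt(new Position(lineNumber, column))
// after:  nodeAt(lineNumber, column)
```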


This optimization is easy and effective: just avoid bad code.

@alexdima alexdima removed their assignment Feb 8, 2018
@jens1o
Contributor

jens1o commented Feb 12, 2018

> and some manually created large files (I'll replace these files once I find realistic large files)

Password lists ftw
https://github.com/danielmiessler/SecLists/tree/master/Passwords

btw: Why can't you make this modular and use the text buffer best suited to the file size?

@rebornix
Member Author

rebornix commented Feb 12, 2018

@jens1o ideally we should do that, and we already do similar things: for example, when the document is readonly we can opt into a simpler version of the text model. However, in the current situation switching between Line Model and Piece Tree is not a good idea, because

  • Line Model only works better when both the file size and the line count are small.
  • Users can't tell the difference between Line Model and Piece Tree when the file is small.
  • Knowing whether a file is small is easy, as we check the file metadata at the very beginning; knowing how many lines it has is costly, as it requires going through the whole content, so we can't know that in advance.
    • Piece Tree Model Builder ~= going through the content and finding line breaks
    • Line Model Builder ~= going through the content, finding line breaks and doing line splits (a heavy operation); the two builders are contrasted in the sketch after this list
  • Model switching simply means more time spent opening the file.
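Here is a hedged sketch of that contrast (the helper shapes and names are made up for illustration, not the real builders): both accept the file chunk by chunk and scan for line breaks, but only the line-model builder pays for allocating one string per line.

```ts
// Piece-tree style: keep the chunk intact, remember only where its lines start.
function acceptChunkPieceTree(chunk: string, chunks: string[], lineStarts: number[][]): void {
  chunks.push(chunk);
  const starts: number[] = [0];
  for (let i = 0; i < chunk.length; i++) {
    const ch = chunk.charCodeAt(i);
    if (ch === 13 /* \r */ || ch === 10 /* \n */) {
      if (ch === 13 && i + 1 < chunk.length && chunk.charCodeAt(i + 1) === 10) { i++; }
      starts.push(i + 1);
    }
  }
  lineStarts.push(starts);
}

// Line-model style: the same scan, plus allocating one string per line (the heavy part).
function acceptChunkLineModel(chunk: string, lines: string[], carry: { text: string }): void {
  const pieces = (carry.text + chunk).split(/\r\n|\r|\n/);
  carry.text = pieces.pop() ?? ''; // the last piece may continue in the next chunk
  lines.push(...pieces);
}
```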

Let me explain the last bullet point: why model switching makes file opening slower. Say the line-count magic number is 1 million and the file size is 64MB.

If we choose the Line Model Builder as the default, it's simple when no model switching needs to happen. When it does:

  • Say we have received 1000 chunks (64KB each) and we know the line count is already larger than 1M.
  • We handle the following chunks with the Piece Tree Model Builder.
  • We need to sanitize the first 1000 chunks, which have already been split into 1M lines; we can't simply use those 1M lines as 1M chunks, as that would slow down all operations.
  • The way to do that sanitization is string concatenation: reconstructing 1000 chunks from 1M lines.

The last step is costly, and on top of that, the next time we access these chunks V8 will flatten the cons-strings and GC will kick in. In short, going this way leads to longer file opening and a potential UI freeze of ~100ms for large files.

If we choose the Piece Tree Model Builder as the default, then for all small files our file opening is slower than before, as

  • We receive all chunks (64KB each), and for each one we only search for line breaks.
  • After getting the last chunk, we know we need to use the Line Model, so we then have to do line splitting on each chunk and create an array of lines from them.

The additional work is usually fast, but if your machine is busy or not powerful enough, you can see the lag every time you open a file.

As long as the Piece Tree behaves well for most operations, it's a good idea to use one single text buffer implementation. Hope my explanation makes sense @jens1o

@jens1o
Contributor

jens1o commented Feb 12, 2018

Thanks for your detailed explanation, and yes, it makes sense. Thanks for the heads up!

@rebornix rebornix closed this as completed Mar 2, 2018
@vscodebot vscodebot bot locked and limited conversation to collaborators Apr 16, 2018