
Explore new text buffer for performance improvement #41042

Closed
rebornix opened this issue Jan 2, 2018 · 6 comments
@rebornix
Member

rebornix commented Jan 2, 2018

Currently our text buffer is built on top of an array that stores all lines of a file. It behaves well in most cases (accessing, editing, etc. are all fast), but as the file size or the number of line breaks grows, building the text buffer takes longer, because we split the content by line breaks, and it holds more memory. We'll explore other text buffer implementations to improve performance and memory usage for this case, with a reasonable time trade-off for line accessing, editing, etc.
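For reference, a minimal sketch of what such a line-array buffer roughly looks like (illustrative names, not the actual VS Code classes): the whole file is split once, line access is a plain array read, and a single-line edit is a cheap string tweak.

```ts
// Minimal sketch of a line-array text buffer (illustrative, not the real model).
class LineArrayBuffer {
  private lines: string[];

  constructor(content: string) {
    // Splitting on line breaks is what dominates model-building time and what
    // duplicates memory for files with many lines.
    this.lines = content.split(/\r\n|\r|\n/);
  }

  getLineCount(): number {
    return this.lines.length;
  }

  getLineContent(lineNumber: number): string {
    // 1-based line numbers, O(1) array access.
    return this.lines[lineNumber - 1];
  }

  // Single-line insert only; a multi-line edit would also splice the array.
  insert(lineNumber: number, column: number, text: string): void {
    const line = this.lines[lineNumber - 1];
    this.lines[lineNumber - 1] =
      line.substring(0, column - 1) + text + line.substring(column - 1);
  }
}
```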

Benchmark: Performance and Memory Usage Metrics

  • Model Builder (for file loading)
  • GetLineContent (for display and tokenization)
  • GetLineContent after edits
  • Edits
  • Search and Replace
@vscodebot vscodebot bot added the editor label Jan 2, 2018
@rebornix rebornix added this to the December 2017/January 2018 milestone Jan 2, 2018
@alexdima alexdima added the plan-item VS Code - planned item for upcoming label Jan 10, 2018
@rebornix
Copy link
Member Author

rebornix commented Jan 25, 2018

Our benchmark suite runs against multiple file sizes (1B, 1KB, 64KB, 32MB); the data is good for verifying code changes and potential improvements, but since it's mock data it may not reflect real-world use cases. So here I try to benchmark against some realistic files

and some manually created large files (I'll replace these files once I find realistic large files):

  • heap snapshot of newly opened VSCode Insider, 54MB, 3M lines.
  • checker.ts * 128, 184MB, 3M lines

Text Buffers

  • Line Model
    • In July 2017 we made a huge memory usage improvement (Improve memory usage for large files #30180 (comment)). "Avoid sliced strings" is an interesting one: after a Buffer.from().toString() round trip, each split string no longer keeps a sliced-string reference to its parent, which saves quite a lot of memory when the file has many lines. However, it has one side effect: it slows down file opening, as Buffer.from().toString() is costly compared to the other operations. https://bugs.chromium.org/p/v8/issues/detail?id=2869
    • This month we disabled the sliced string optimization.
  • Piece Tree (Piece Table + red-black tree); a rough node layout is sketched after this list
  • Edcore (chunk-based buffer)
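For context, here is a rough sketch of the data layout a piece tree needs (field names are illustrative, not the actual implementation): each piece points into either the read-only original content or an append-only added buffer, and each red-black tree node caches aggregates of its left subtree so offset and line lookups stay O(log n).

```ts
// Illustrative sketch, assuming a classic piece table stored in a red-black tree.
interface Piece {
  bufferIndex: number;    // 0 = original file content, >0 = append-only "added" buffers
  start: number;          // offset of the piece inside that buffer
  length: number;         // number of characters the piece covers
  lineFeedCount: number;  // number of line breaks inside the piece
}

interface PieceTreeNode {
  piece: Piece;
  left: PieceTreeNode | null;
  right: PieceTreeNode | null;
  color: 'red' | 'black';
  // Aggregates of the left subtree, maintained on every edit and rotation, so that
  // "which node contains offset X / line Y" is a single root-to-leaf descent.
  leftSubtreeLength: number;
  leftSubtreeLineFeedCount: number;
}
```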

1. Model Building

File opening ~= Convert character encodings + Model building + View rendering

| File | line model w/ sliced string opt | line model w/o sliced string opt | piece tree | edcore |
| --- | --- | --- | --- | --- |
| vscode-loader | 7.018 ms | 5.2 ms | 7.338 ms | 7.473 ms |
| checker.ts | 35.988 ms | 22.388 ms | 22.481 ms | 24.004 ms |
| sqlite.c | 124.626 ms | 60.413 ms | 53.458 ms | 50.700 ms |
| Russian Dict | 577.276 ms | 311.615 ms | 212.439 ms | 217.534 ms |
| heapsnapshot | 2217.365 ms | 1203.953 ms | 518.287 ms | 538.233 ms |
| checker128 | 3033.738 ms | 1871.421 ms | 1520.666 ms | 1594.009 ms |

Fun facts here:

  • If we don't do the sliced string optimization, most of the time is spent splitting the string into arrays.
  • If we do the sliced string optimization, the model building time almost doubles as the file size grows.
  • For Piece Tree and Edcore, most of the time is spent finding the offsets of line breaks.
  • Checking charCode is cheaper than splitting (see the scanning sketch below).
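A small sketch of what that charCode scan looks like (a sketch of the idea, not the shipped model builder): walk the chunk once, record the offset just after each line break, and never allocate per-line strings.

```ts
// Scan a chunk for line breaks and record line-start offsets, keeping the
// original string intact instead of splitting it.
function computeLineStartOffsets(chunk: string): number[] {
  const lineStarts: number[] = [0];
  for (let i = 0; i < chunk.length; i++) {
    const ch = chunk.charCodeAt(i);
    if (ch === 13 /* \r */) {
      if (i + 1 < chunk.length && chunk.charCodeAt(i + 1) === 10 /* \n */) {
        i++; // \r\n counts as a single line break
      }
      lineStarts.push(i + 1);
    } else if (ch === 10 /* \n */) {
      lineStarts.push(i + 1);
    }
  }
  return lineStarts;
}
```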

2. Mem usage

Here we don't compare heap size: for small files we do tokenization and keep multiple copies of the buffer (worker, etc.), while we stop doing that for large files, so we only check the memory usage of the text buffer itself.

| File | file size | line model w/ sliced string opt | line model w/o sliced string opt | piece tree | edcore |
| --- | --- | --- | --- | --- | --- |
| vscode-loader | 77 KB | 147 KB | 106 KB | 83 KB | 76 KB |
| checker.ts | 1.47 MB | 2.5 MB | 2.8 MB | 1.5 MB | 1.5 MB |
| sqlite.c | 4.3 MB | 9.7 MB | 11 MB | 4.8 MB | 5 MB |
| Russian Dict | 14 MB | 42 MB | 52 MB | 21.8 MB | 22.8 MB |
| heapsnapshot | 52 MB | 195 MB | 242 MB | 61.7 MB | 67.7 MB |
| checker128 | 184 MB | 331 MB | 373 MB | 201 MB | 207 MB |

As we can see from the above, whether to do the sliced string optimization in the Line Model is a difficult trade-off: sliced strings use more memory (proportional to line count), but getting rid of them costs time during model building.

Piece Tree and Edcore don't have this trouble as they don't split strings. The reason Piece Tree uses slightly less memory than Edcore is that in Piece Tree I use a Uint16Array to store numbers when they are smaller than 65536, just a small trick.
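A sketch of that trick (the fallback type is my assumption): line-start offsets are computed per chunk, so for 64KB chunks they typically fit in 16 bits and can be packed at 2 bytes per entry.

```ts
// Pack ascending line-start offsets for one chunk: use a Uint16Array when every
// offset fits in 16 bits, otherwise fall back to a wider typed array.
function packLineStarts(lineStarts: number[]): Uint16Array | Uint32Array {
  const max = lineStarts.length > 0 ? lineStarts[lineStarts.length - 1] : 0;
  return max < 65536 ? new Uint16Array(lineStarts) : new Uint32Array(lineStarts);
}
```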


Once the file is opened, we don't need to worry about the sliced string problem, so in the following benchmarks we only compare Line Model w/o sliced string opt, Piece Tree and Edcore.

3. Edits

For now I don't have data on users' behavior while using the editor, but there are two typical workflows I'm familiar with:

  • Jumping around in the document and making edits, for instance changing a function signature.
  • Typing in sequence. It's not frequent, but I do enter zen mode sometimes when writing blogs.

So here I try to mimic these two scenarios: apply 1000 random edits and 1000 sequential inserts to the document, and measure how much time each text buffer needs. We apply the edits one by one on purpose, to avoid any bulk-edit optimization (a sketch of the driver loops is below).
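A hedged sketch of those driver loops (the TestBuffer interface and the replace signature are assumptions, not the exact benchmark code): edits are applied one at a time so no implementation gets the chance to batch them.

```ts
interface TestBuffer {
  getLength(): number;
  replace(offset: number, length: number, text: string): void;
}

// Scenario 1: jump around and edit at random positions.
function applyRandomEdits(buffer: TestBuffer, count: number): void {
  for (let i = 0; i < count; i++) {
    const offset = Math.floor(Math.random() * buffer.getLength());
    buffer.replace(offset, 0, 'a'); // insert at a random position
  }
}

// Scenario 2: keep typing in sequence at the end of the document.
function applySequentialInserts(buffer: TestBuffer, count: number): void {
  for (let i = 0; i < count; i++) {
    buffer.replace(buffer.getLength(), 0, 'a\n');
  }
}
```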

| File | | line model | piece tree | edcore |
| --- | --- | --- | --- | --- |
| loader.js | apply 1000 random edits | 4.031 ms | 10.728 ms | 134.135 ms |
| | apply 1000 sequential inserts | 43.830 ms | 4.080 ms | 146.190 ms |
| checker.ts | apply 1000 random edits | 22.095 ms | 21.397 ms | 232.597 ms |
| | apply 1000 sequential inserts | 43.393 ms | 7.943 ms | 172.539 ms |
| sqlite3.c | apply 1000 random edits | 63.947 ms | 20.166 ms | 256.309 ms |
| | apply 1000 sequential inserts | 333.583 ms | 14.632 ms | 265.432 ms |
| Russian-English Bilingual.dic | apply 1000 random edits | 218.894 ms | 15.317 ms | 279.063 ms |
| | apply 1000 sequential inserts | 998.764 ms | 7.257 ms | 215.604 ms |
| heapfile | apply 1000 random edits | 1470.209 ms | 13.379 ms | 530.236 ms |
| | apply 1000 sequential inserts | 11451.717 ms | 23.012 ms | 490.390 ms |
| checker128 | apply 1000 random edits | 1268.692 ms | 16.569 ms | 970.225 ms |
| | apply 1000 sequential inserts | 12581.204 ms | 18.034 ms | 896.744 ms |

Line Model wins when the file size is small (KBs). This is expected: accessing a random position in a small array and tweaking a string of around 100-150 characters is really fast. It starts to choke when the file has a lot of lines (100k+); you might think the array size shouldn't affect performance, but it does, as an array of objects usually doesn't sit in a linear memory space.


Sequential inserts in large files make this situation worse, as the JS engine does heavy lifting when resizing the super large array.

Piece Tree behaves stably for these megabyte-sized files, as each edit is just a string append plus some red-black tree operations. Edcore's numbers don't look good because I apply the edits one by one, so it loses the chance to do bulk edits.

As long as each edit takes less than 1ms, text buffer performance is the last thing to blame. However, if an edit takes ~10ms, it will affect the frame rate.

4. Get line content

getLineContent is one of the hottest methods in the text buffer: the tokenizer, LSP, link detector, etc., any component relying on document content will eventually invoke it. When we switch the text buffer implementation, we should make sure there is no significant regression.

To prevent the JS engine from optimizing away the reads, every time we get the content of a line we do some simple math on it:

```js
// read both ends of the line so the string is actually materialized,
// then swap the two values without a temporary so nothing is dead code
let firstChar = str.charCodeAt(0);
let lastChar = str.charCodeAt(str.length - 1);
firstChar = firstChar - lastChar;
lastChar = firstChar + lastChar;
firstChar = lastChar - firstChar;
```

| File | Read all lines | line model | piece tree | edcore |
| --- | --- | --- | --- | --- |
| loader.js | after 1000 random edits | 1.207 ms | 4.272 ms | 1.893 ms |
| | after 1000 sequential inserts | 0.627 ms | 2.556 ms | 1.617 ms |
| checker.ts | after 1000 random edits | 4.493 ms | 20.356 ms | 9.219 ms |
| | after 1000 sequential inserts | 3.608 ms | 11.665 ms | 6.729 ms |
| sqlite3.c | after 1000 random edits | 24.891 ms | 70.447 ms | 36.427 ms |
| | after 1000 sequential inserts | 10.960 ms | 61.909 ms | 38.286 ms |
| Russian-English Bilingual.dic | after 1000 random edits | 24.226 ms | 234.301 ms | 140.226 ms |
| | after 1000 sequential inserts | 7.712 ms | 249.257 ms | 138.410 ms |
| heapfile | after 1000 random edits | 25.148 ms | 1117.376 ms | 649.585 ms |
| | after 1000 sequential inserts | 31.662 ms | 1343.143 ms | 862.602 ms |
| checker128 | after 1000 random edits | 324.857 ms | 1618.054 ms | 1087.511 ms |
| | after 1000 sequential inserts | 449.868 ms | 1788.849 ms | 1222.106 ms |

Piece Tree looks pretty bad in this round of the competition. The time complexity of a lookup in a balanced tree is O(log M) (M is the tree node count), and after 1000 random edits we may have ~2000 nodes in the Piece Tree. That doesn't mean there is no way to optimize this scenario:

  • We can add a search cache (sketched below); then when we or the tokenizer read content from top to bottom, we only need to do M searches instead of N (line count).
  • A red-black tree may not be the best balanced tree for an editor, but picking another one requires a good understanding of user behavior: an AVL tree may be slower for editing, and a splay tree may perform too many rotations when the tokenizer just reads from top to bottom.
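A sketch of the search-cache idea (the shape and field names are mine, not the shipped code): remember the last few node hits so a top-to-bottom read pays the O(log M) tree walk only once per node instead of once per line.

```ts
interface CacheEntry<TNode> {
  node: TNode;                  // tree node that answered the last lookup
  nodeStartOffset: number;      // document offset where that node begins
  nodeStartLineNumber: number;  // document line where that node begins
  nodeLineCount: number;        // lines covered by that node
}

class PieceTreeSearchCache<TNode> {
  private entries: CacheEntry<TNode>[] = [];

  getByLine(lineNumber: number): CacheEntry<TNode> | undefined {
    return this.entries.find(e =>
      e.nodeStartLineNumber <= lineNumber &&
      lineNumber < e.nodeStartLineNumber + e.nodeLineCount);
  }

  set(entry: CacheEntry<TNode>): void {
    this.entries.push(entry);
    if (this.entries.length > 3) {
      this.entries.shift(); // keep only a handful of recent lookups
    }
  }

  // Any cached node at or after an edited offset may have shifted; drop it.
  invalidate(editOffset: number): void {
    this.entries = this.entries.filter(e => e.nodeStartOffset < editOffset);
  }
}
```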

However, IMHO the current Piece Tree's performance is not a problem in practice. Take the last row of the table above as an example: reading all lines after 1000 sequential inserts in the checker*128.ts file takes 1.8s, so reading one line takes roughly 0.6 microseconds; when you jump around in the document, reading the hundreds of lines around your cursor takes less than 0.1ms. Not to mention the tokenizer, which may need tens of minutes to tokenize all those lines anyway.

5. Read random windows

This benchmark tries to mimic the behavior of Go To Definition/Page Down/etc, which refreshes the whole viewport.

| File | Read 10 random windows | line model | piece tree | edcore |
| --- | --- | --- | --- | --- |
| loader.js | after 1000 random edits | 0.148 ms | 0.445 ms | 0.250 ms |
| | after 1000 sequential inserts | 0.124 ms | 0.300 ms | 0.256 ms |
| checker.ts | after 1000 random edits | 0.210 ms | 0.356 ms | 0.222 ms |
| | after 1000 sequential inserts | 0.118 ms | 0.281 ms | 0.225 ms |
| sqlite3.c | after 1000 random edits | 0.184 ms | 0.317 ms | 0.232 ms |
| | after 1000 sequential inserts | 0.155 ms | 0.280 ms | 0.243 ms |
| Russian-English Bilingual.dic | after 1000 random edits | 0.209 ms | 0.430 ms | 0.231 ms |
| | after 1000 sequential inserts | 0.144 ms | 0.556 ms | 0.263 ms |
| heapfile | after 1000 random edits | 0.174 ms | 0.301 ms | 0.391 ms |
| | after 1000 sequential inserts | 0.153 ms | 1.405 ms | 1.397 ms |
| checker128 | after 1000 random edits | 0.236 ms | 0.357 ms | 0.362 ms |
| | after 1000 sequential inserts | 0.574 ms | 2.583 ms | 1.130 ms |

The performance of all three text buffers is acceptable here, and adding a search cache can boost the Piece Tree as described above.

6. Replace All

The difference between replace-all and random edits is that the former is a bulk edit, so the text buffer can do some optimization if possible (one possible single-pass approach is sketched below).
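For illustration, here is a minimal single-pass replace-all over a plain string (a sketch under the assumption of non-overlapping matches, not the actual buffer code): instead of applying N independent edits, the result is rebuilt once from the unchanged slices plus the replacement text.

```ts
interface Replacement {
  offset: number; // start offset of the match in the original content
  length: number; // length of the matched text
  text: string;   // replacement text
}

// Assumes the matches do not overlap; sorting makes the single left-to-right pass valid.
function replaceAll(content: string, edits: Replacement[]): string {
  const sorted = [...edits].sort((a, b) => a.offset - b.offset);
  const parts: string[] = [];
  let last = 0;
  for (const e of sorted) {
    parts.push(content.slice(last, e.offset), e.text);
    last = e.offset + e.length;
  }
  parts.push(content.slice(last));
  return parts.join('');
}
```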

| File | | line buffer | piece table | edcore |
| --- | --- | --- | --- | --- |
| loader.js | replace 10 occurrences | 2.064 ms | 5.477 ms | 4.566 ms |
| | replace 100 occurrences | 3.495 ms | 4.078 ms | 4.376 ms |
| | replace 500 occurrences | 10.239 ms | 21.802 ms | 11.296 ms |
| checker.ts | replace 10 occurrences | 2.104 ms | 4.963 ms | 4.688 ms |
| | replace 100 occurrences | 6.073 ms | 11.645 ms | 4.907 ms |
| | replace 500 occurrences | 18.205 ms | 24.180 ms | 33.752 ms |
| sqlite3.c | replace 10 occurrences | 3.106 ms | 4.474 ms | 5.380 ms |
| | replace 100 occurrences | 3.321 ms | 14.700 ms | 6.948 ms |
| | replace 500 occurrences | 12.224 ms | 14.927 ms | 38.226 ms |
| Russian-English Bilingual.dic | replace 10 occurrences | 3.382 ms | 3.805 ms | 5.114 ms |
| | replace 100 occurrences | 2.036 ms | 14.354 ms | 22.637 ms |
| | replace 500 occurrences | 7.087 ms | 11.225 ms | 74.017 ms |
| heapfile | replace 10 occurrences | 4.244 ms | 4.093 ms | 8.081 ms |
| | replace 100 occurrences | 3.341 ms | 4.739 ms | 30.527 ms |
| | replace 500 occurrences | 15.566 ms | 9.953 ms | 259.491 ms |
| checker128 | replace 10 occurrences | 4.186 ms | 4.008 ms | 7.457 ms |
| | replace 100 occurrences | 3.577 ms | 4.551 ms | 30.715 ms |
| | replace 500 occurrences | 31.212 ms | 11.769 ms | 144.375 ms |

@rebornix
Member Author

It's difficult to tell from the tables alone which implementation is better, whether two buffers behave similarly, or whether a text buffer scales. Here I converted them into charts so we can get an intuitive feel for the benchmarks.

Model Building/File loading


Mem usage


Memory usage of Piece Tree is close to the real file size.

Edits

1000 random edits


1000 sequential inserts

Line Model chokes when the file size increases, while Piece Tree is really stable.

Read All Lines


This is a big TODO for the Piece Tree: search caching can boost it a lot, and we'll revisit this benchmark once that's done.

Read 10 random windows


Similar to Read All Lines, we can optimize this case, but single-digit milliseconds is always good enough.

Replace All


At this moment, our Piece Tree doesn't do bulk edits, which gives us some space to improve.

@rebornix rebornix mentioned this issue Jan 29, 2018
3 tasks
@alexdima alexdima modified the milestones: January 2018, February 2018 Feb 1, 2018
@rebornix
Member Author

rebornix commented Feb 7, 2018

As mentioned above, we have some ways to improve the speed of getLineContent. Adding a search cache is one obvious solution, but after profiling I found that more time is spent elsewhere:

  • Originally there was one internal API on the Piece Tree buffer, getLineRawContent, which includes the line break. To make the buffer match the text model API, I created a new function which trims the line break at the end, and unsurprisingly, when you have a lot of lines and the lines are long, trimming is costly.
  • Instead of taking (position: Position) as a parameter, taking (line: number, column: number) reduces object creation and GC pressure (both points are illustrated in the sketch below).
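A small illustrative sketch of both points (the function names are assumptions based on the description above, not the exact internal API): trim at most one trailing line break by charCode instead of doing anything heavier, and take primitive line/column arguments on hot paths so no throwaway Position objects are allocated.

```ts
// assumed internal accessor: returns the line including its trailing \n / \r\n
declare function getLineRawContent(lineNumber: number): string;

function getLineContent(lineNumber: number): string {
  const raw = getLineRawContent(lineNumber);
  let end = raw.length;
  if (end > 0 && raw.charCodeAt(end - 1) === 10 /* \n */) { end--; }
  if (end > 0 && raw.charCodeAt(end - 1) === 13 /* \r */) { end--; }
  return raw.substring(0, end);
}

// hot-path lookups: prefer primitives over an allocated Position object
// before: nodeAt(new Position(lineNumber, column))
// after:  nodeAt(lineNumber, column)
```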


This optimization is easy and effective: just avoid bad code.

@alexdima alexdima removed their assignment Feb 8, 2018
@jens1o
Contributor

jens1o commented Feb 12, 2018

> and some manually created large files (I'll replace these files once I find realistic large files)

Password lists ftw
https://github.com/danielmiessler/SecLists/tree/master/Passwords

btw: Why can't you make this modular and use the text buffer best suited to the file size?

@rebornix
Member Author

rebornix commented Feb 12, 2018

@jens1o ideally we should do that, and we already do similar things: for example, when the document is readonly we can opt into a simpler version of the text model. However, in the current situation switching between Line Model and Piece Tree is not a good idea, because

  • Line Model only works better when both the file size and the line count are small.
  • Users can't tell the difference between Line Model and Piece Tree when the file is small.
  • Knowing whether a file is small is easy, as we check the file metadata at the very beginning; knowing how many lines it has is costly, as it requires going through the whole content, so we can't know that in advance.
    • Piece Tree Model Builder ~= going through the content and finding line breaks
    • Line Model Builder ~= going through the content, finding line breaks and doing line splits (a heavy operation); the two builders are contrasted in the sketch after this list
  • Model switching simply means more time spent opening the file.
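Here is a hedged sketch of that contrast (the helper shapes and names are made up for illustration, not the real builders): both accept the file chunk by chunk and scan for line breaks, but only the line-model builder pays for allocating one string per line.

```ts
// Piece-tree style: keep the chunk intact, remember only where its lines start.
function acceptChunkPieceTree(chunk: string, chunks: string[], lineStarts: number[][]): void {
  chunks.push(chunk);
  const starts: number[] = [0];
  for (let i = 0; i < chunk.length; i++) {
    const ch = chunk.charCodeAt(i);
    if (ch === 13 /* \r */ || ch === 10 /* \n */) {
      if (ch === 13 && i + 1 < chunk.length && chunk.charCodeAt(i + 1) === 10) { i++; }
      starts.push(i + 1);
    }
  }
  lineStarts.push(starts);
}

// Line-model style: the same scan, plus allocating one string per line (the heavy part).
function acceptChunkLineModel(chunk: string, lines: string[], carry: { text: string }): void {
  const pieces = (carry.text + chunk).split(/\r\n|\r|\n/);
  carry.text = pieces.pop() ?? ''; // the last piece may continue in the next chunk
  lines.push(...pieces);
}
```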

Let me explain the last bullet point: why model switching makes file opening slower. Say the line-count magic number is 1 million and the file size is 64MB.

If we choose the Line Model Builder as the default, it's simple when no model switching needs to happen. When it does:

  • Say we have received 1000 chunks (64KB each) and we know the line count is already larger than 1M.
  • We handle the following chunks with the Piece Tree Model Builder.
  • We need to sanitize the first 1000 chunks, which have already been split into 1M lines; we can't simply use those 1M lines as 1M chunks, as that would slow down all operations.
  • The way to do that sanitization is string concatenation: reconstructing 1000 chunks from 1M lines.

The last step is costly, and on top of that, the next time we access these chunks V8 will flatten the cons-strings and GC will kick in. In short, going this way leads to longer file opening and a potential UI freeze of ~100ms for large files.

If we choose the Piece Tree Model Builder as the default, then for all small files our file opening is slower than before, as

  • We receive all chunks (64KB each), and for each one we only search for line breaks.
  • After getting the last chunk, we know we need to use the Line Model, so we then have to do line splitting on each chunk and create an array of lines from them.

The additional work is usually fast, but if your machine is busy or not powerful enough, you can see the lag every time you open a file.

As long as the Piece Tree behaves well for most operations, it's a good idea to use one single text buffer implementation. Hope my explanation makes sense @jens1o

@jens1o
Contributor

jens1o commented Feb 12, 2018

Thanks for your detailed explanation, and yes, it makes sense. Thanks for the heads up!

@rebornix rebornix closed this as completed Mar 2, 2018
@vscodebot vscodebot bot locked and limited conversation to collaborators Apr 16, 2018