Explore new text buffer for performance improvement #41042
Our benchmark suite runs against multiple file sizes (1B, 1KB, 64KB, 32MB); the data is good for verifying code changes and potential improvements. But since those are mock files, they may not reflect real-world use cases, so here I try to benchmark against some realistic files and some manually created large files (I'll replace these files once I find realistic large ones).
Text Buffers
1. Model Building
Fun facts here:
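For orientation, here is a minimal sketch of how a model-building benchmark can drive each builder (Line Model, Piece Tree, Edcore). The `ITextBufferBuilder` interface and the chunked reading are assumptions for illustration, not the actual benchmark harness:

```ts
// Hypothetical benchmark driver; `ITextBufferBuilder` stands in for the
// three builders under test (Line Model, Piece Tree, Edcore).
interface ITextBufferBuilder {
    acceptChunk(chunk: string): void; // called as file chunks stream in from disk
    finish(): unknown;                // builds the final text buffer
}

function benchmarkModelBuilding(builder: ITextBufferBuilder, chunks: string[]): number {
    const start = Date.now();
    // Files arrive as a series of chunks rather than one big string,
    // which matches how the editor reads files from disk.
    for (const chunk of chunks) {
        builder.acceptChunk(chunk);
    }
    builder.finish();
    return Date.now() - start;
}
```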
2. Mem usage
Here we don't compare heap size: for small files we do tokenization and keep multiple copies of the buffer (worker, etc.), while we stop doing that for large files, so we just check the memory usage of the text buffer itself.
As we can see from above, whether to do the sliced-string optimization in Line Model is a difficult decision to make. Piece Tree and Edcore don't have this trouble as they don't do string splitting. The reason Piece Tree uses slightly less memory than Edcore is that in Piece Tree I use a Uint16Array to store numbers when they are smaller than 65536, just a small trick (sketched below). Once the file is opened we don't need to worry about the sliced-string problem, so in the following benchmarks we only compare Line Model without the sliced-string optimization, Piece Tree, and Edcore.
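A minimal sketch of that trick: per-chunk line-start offsets can live in a Uint16Array whenever the chunk itself is under 64KB, at half the memory of a Uint32Array. The function name and exact layout here are assumptions, not the actual implementation:

```ts
// Hypothetical helper: store a chunk's line-start offsets in the smallest
// typed array that fits. Offsets are relative to the chunk, so any chunk
// shorter than 65536 characters fits in a Uint16Array.
function createLineStartsArray(chunk: string): Uint16Array | Uint32Array {
    const lineStarts: number[] = [0];
    for (let i = 0; i < chunk.length; i++) {
        const ch = chunk.charCodeAt(i);
        if (ch === 13 /* \r */) {
            if (i + 1 < chunk.length && chunk.charCodeAt(i + 1) === 10 /* \n */) {
                i++; // \r\n counts as a single line break
            }
            lineStarts.push(i + 1);
        } else if (ch === 10 /* \n */) {
            lineStarts.push(i + 1);
        }
    }
    return chunk.length < 65536
        ? new Uint16Array(lineStarts)
        : new Uint32Array(lineStarts);
}
```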
3. Edits
For now I don't have data on users' real behavior in the editor, but there are two typical workflows I'm familiar with: random edits scattered around a document, and sequential inserts (typing or appending at the end). So here I try to mimic these two scenarios: apply 1000 random edits and 1000 sequential inserts to the document, and see how much time each text buffer needs. Besides, we apply the edits one by one on purpose, to avoid any bulk-edit optimization; a sketch of the benchmark loops follows.
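This is roughly what the two loops look like; `IEditableBuffer`, `applyEdit`, and the helper names are illustrative stand-ins, not the actual benchmark code:

```ts
// Sketch of the edit benchmarks, assuming a minimal offset-based edit API
// on the buffer under test.
interface IEditableBuffer {
    getLength(): number;
    applyEdit(offset: number, deleteLength: number, text: string): void;
}

function randomEdits(buffer: IEditableBuffer, editCount: number): void {
    for (let i = 0; i < editCount; i++) {
        // Each edit replaces a short random range with a short random string.
        const offset = Math.floor(Math.random() * buffer.getLength());
        const deleteLength = Math.min(
            Math.floor(Math.random() * 10),
            buffer.getLength() - offset
        );
        buffer.applyEdit(offset, deleteLength, randomString(10));
    }
}

function sequentialInserts(buffer: IEditableBuffer, editCount: number): void {
    for (let i = 0; i < editCount; i++) {
        // Typing/logging workload: always append at the end of the document.
        buffer.applyEdit(buffer.getLength(), 0, randomString(10) + '\n');
    }
}

function randomString(maxLength: number): string {
    const length = Math.ceil(Math.random() * maxLength);
    let result = '';
    for (let i = 0; i < length; i++) {
        result += String.fromCharCode(97 + Math.floor(Math.random() * 26)); // a-z
    }
    return result;
}
```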
Line Model wins when the file size is small (KBs), which is expected: accessing a random position in a small array and tweaking a string of around 100~150 characters is really fast. It starts to choke when the file has a lot of lines (100k+). You might think the array size should not affect performance, but it does: an array of objects usually doesn't sit in a linear memory space. Sequential inserts in large files make this worse, as the JS engine does heavy lifting when resizing a very large array. Piece Tree behaves stably on these megabyte-scale files, as each edit is just a string append and some red-black tree operations. Edcore's numbers don't look good because I apply edits one by one, so it loses the chance to do bulk edits. As long as each edit takes less than 1ms, text buffer performance is the last thing to blame; however, if an edit takes ~10ms, it will affect frame rates.
4. Get line content
To keep the JS engine from optimizing the read away, every time we get the content of a line we do some simple math on it:

```ts
// Arithmetic swap of the first and last char codes, so the line read has an
// observable result and cannot be eliminated as dead code.
let firstChar = str.charCodeAt(0);
let lastChar = str.charCodeAt(str.length - 1);
firstChar = firstChar - lastChar;
lastChar = firstChar + lastChar;
firstChar = lastChar - firstChar;
```
Piece Tree looks pretty bad in this round of the competition: the time complexity of a lookup in a balanced tree is O(log M), where M is the tree node count, and after 1000 random edits we may have ~2000 nodes in the Piece Tree, since an edit landing in the middle of a piece splits it. That doesn't mean there is no way to optimize this scenario; a rough lookup sketch follows to show where the cost comes from.
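Here is roughly how a line lookup descends the tree using cached line-break counts. Field and function names are illustrative, and edge cases (such as a line starting in a piece's tail after its last line break) are glossed over:

```ts
// Each node caches the number of line breaks in its left subtree (lf_left),
// so a lookup is a root-to-leaf walk: O(log M) comparisons for M nodes.
interface PieceTreeNode {
    left: PieceTreeNode | null;
    right: PieceTreeNode | null;
    lf_left: number;    // line breaks in the left subtree
    pieceLFCnt: number; // line breaks inside this node's own piece
}

function findNodeForLine(root: PieceTreeNode | null, lineNumber: number): PieceTreeNode | null {
    let node = root;
    let remaining = lineNumber - 1; // line breaks to skip before the target line
    while (node) {
        if (remaining < node.lf_left) {
            node = node.left; // target line is in the left subtree
        } else if (remaining < node.lf_left + node.pieceLFCnt) {
            return node; // target line starts inside this node's piece
        } else {
            // Skip the left subtree and this piece, continue to the right.
            remaining -= node.lf_left + node.pieceLFCnt;
            node = node.right;
        }
    }
    return null;
}
```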
However, IMHO the current Piece Tree's performance is not a problem in practice. Take the last row of the table above as an example: reading all lines after 1000 sequential inserts in the checker*128.ts file takes 1.8s, so reading one line takes roughly 0.6 microseconds, and when you jump around in the document, reading the hundreds of lines around your cursor takes less than 0.1ms. Not to mention the tokenizer, which may need tens of minutes to tokenize all the lines.
5. Read random windows
This benchmark tries to mimic the behavior of Go To Definition/Page Down/etc., which refreshes the whole viewport.
The performance of all three text buffers is acceptable, and adding a search cache can boost Piece Tree as described above.
6. Replace All
The difference between Replace All and random edits is that the former is a bulk edit, so the text buffer can do some optimization if possible, as sketched below.
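For illustration, this is the kind of single-pass rewrite a bulk Replace All makes possible. It's a hedged sketch on a plain string (a real buffer would do the equivalent on its own structure), and the helper name is hypothetical:

```ts
// With all edits known up front, the content can be rebuilt in one pass
// instead of shifting text once per match.
function replaceAll(
    text: string,
    ranges: { start: number; end: number }[], // sorted, non-overlapping matches
    replacement: string
): string {
    const parts: string[] = [];
    let lastEnd = 0;
    for (const range of ranges) {
        parts.push(text.substring(lastEnd, range.start), replacement);
        lastEnd = range.end;
    }
    parts.push(text.substring(lastEnd));
    return parts.join('');
}
```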
It's difficult to tell which implementation is better, whether two buffers behave similarly, or whether the text buffers scale, just by reading tables. Here I converted them to charts so we can get an intuitive feeling for the benchmarks:
- Model Building/File loading
- Mem usage: memory usage of Piece Tree is close to the real file size.
- Edits: with 1000 random edits, Line Model chokes as the file size increases, while Piece Tree is really stable.
- Read All Lines: this is a big TODO for Piece Tree; search caching can boost it a lot and we'll revisit this benchmark once that's done.
- Read 10 random windows: similar to Read All Lines, we can optimize this case, but single-digit milliseconds is always good.
- Replace All: at this moment our Piece Tree doesn't do bulk edits, which gives us some room to improve.
As mentioned above, we have some ways to improve the speed of reading lines (see the search-cache sketch below). This kind of optimization is easy and effective: mostly it's about avoiding bad code.
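A minimal sketch of the search-cache idea: remember the result of the last line lookup so that reading neighboring lines, the common pattern when rendering a viewport, can skip the tree descent. The class and field names are illustrative (`pieceLFCnt` is as in the lookup sketch above), not the actual cache:

```ts
interface CacheableNode {
    pieceLFCnt: number; // line breaks inside this node's piece
}

class LineSearchCache<T extends CacheableNode> {
    private node: T | null = null;
    private nodeStartLine = 0; // first line that begins inside the cached node

    get(lineNumber: number): T | null {
        if (this.node &&
            lineNumber >= this.nodeStartLine &&
            lineNumber < this.nodeStartLine + this.node.pieceLFCnt) {
            return this.node; // still inside the cached piece, no tree walk needed
        }
        return null;
    }

    set(node: T, nodeStartLine: number): void {
        this.node = node;
        this.nodeStartLine = nodeStartLine;
    }

    invalidate(): void {
        // Edits can split, merge, or rebalance nodes, so throw the cache away.
        this.node = null;
    }
}
```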
Password lists ftw, btw. Why can't you make this modular and use the text buffer best suited to the file size?
@jens1o Ideally we should do that, and we already do similar things; for example, when the document is read-only we can opt into a simpler version of the text model. However, in the current situation, switching between Line Model and Piece Tree is not a good idea for a few reasons, the last of which is that model switching makes file opening slower.
Let me explain that last point: why does model switching make file opening slower? Say the line-count magic number is 1 million and the file size is 64MB. If we choose Line Model Builder as the default, things are simple when no model switching needs to happen. When it does, we have to stop splitting lines and concatenate the lines we've already split back into chunks for the Piece Tree.
The last step is costly, and on top of that, when we access these chunks next time, V8 will flatten the cons strings and GC will kick in. In short, going this way leads to longer file opening and potential UI freezes of ~100ms for large files. If we choose Piece Tree Model Builder as the default, then for all small files our file opening is slower than before, as we have to do additional conversion work (e.g., splitting the chunks into lines) after loading.
The additional work is usually fast, but if your machine is busy or not powerful enough, you'll see the lag every time you open a file. As long as Piece Tree behaves well for most operations, it's a good idea to use one single text buffer implementation. Hope my explanation makes sense @jens1o.
Thanks for your detailed explanation, and yes, it makes sense. Thanks for the heads up!
Currently our text buffer is built on top of an array that stores all the lines of a file. It behaves well in most cases (accessing, editing, etc. are all fast), but as the size of a file or its line-break count increases, it takes more time to build the text buffer, since we split the content by line breaks, and it holds more memory. We'll explore other text buffer implementations to improve the performance and memory usage of this case, with a reasonable time cost for line accessing, editing, etc.
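A minimal sketch of the line-array approach described above; the class and method names are illustrative, not the actual model:

```ts
// The whole file is split on line breaks up front, so line access is a plain
// array read, while building scales with the number of line breaks.
class LineArrayBuffer {
    private lines: string[];

    constructor(content: string) {
        // Splitting here is the costly step for large files with many lines.
        this.lines = content.split(/\r\n|\r|\n/);
    }

    getLineCount(): number {
        return this.lines.length;
    }

    getLineContent(lineNumber: number): string {
        return this.lines[lineNumber - 1]; // line numbers are 1-based in the editor
    }

    insertAt(lineNumber: number, column: number, text: string): void {
        // Single-line insert: tweak one string in place; fast for small files.
        const line = this.lines[lineNumber - 1];
        this.lines[lineNumber - 1] =
            line.substring(0, column - 1) + text + line.substring(column - 1);
    }
}
```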
Benchmark: Performance and Memory Usage Metrics