Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[std] Add StringBuf.clear() #11848

Open
wants to merge 2 commits into
base: development
Choose a base branch
from

Conversation

Frixuu
Copy link
Contributor

@Frixuu Frixuu commented Dec 1, 2024

Motivation

StringBuf was created as an efficient way to concatenate multiple strings. Its main characteristic is that, on all targets, it's never slower than naively joining those strings, often being much faster.

Given that a substantial portion of Haxe developers are gamedevs, who tend to obsess over extraneous allocations (GC is unpredictable), another important fact about StringBuf is that it also typically allocates less memory than the alternative approach.

However, even in these performance-aware contexts it is necessary to construct many new StringBuf instances. There is no way to reuse the existing instance, perhaps with an already extended internal buffer.

Implementation

This PR introduces a new method, StringBuf.clear(): Void.

Its purpose is to remove all characters from that StringBuf. After clearing a buffer, its length is 0 and toString() returns an empty string. The buffer can then be used again.

  • On JS, PHP and Flash, where the internal buffer is a String and add(x) is implemented as a naive concat b += x, clear simply sets the buffer to an empty string. (This StringBuf instance does not have to be reallocated.)
  • On Neko, clear reallocates the internal buffer. (The buffer's wrapper abstract and this StringBuf instance do not have to be reallocated.)
  • On C++, Eval, HL, Java, Lua and Python, clear moves the buffer index/pointer to 0 (beginning). For performance purposes, the previous data in the buffer is not zeroed.

Closing thoughts

Another suggested name for such a method is reset(). I have not chosen this name, because it implies to me that the whole implementation is being brought back to its original state, including zeroing the internal buffer – which, for performance, is not the case here.

Reusable StringBufs can, potentially, be pooled. However, given many different project needs, especially with regards to threaded usage support, such a pool was not implemented in this PR.

@@ -101,6 +101,10 @@
throw "Invalid unicode char " + c;
}

public function clear():Void {
pos = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the hl.Bytes instance should also be reinitialized for memory reasons. In theory you could have a huge StringBuf that never releases its memory after being cleared.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The StringBuf staying huge after being cleared is intentional! For example, if you built a huge string once:

hl.Gc.major();
Sys.println("before start: " + hl.Gc.stats());

var sb = new StringBuf();
for (i in 0...200000) {
	sb.add("hello");
}

hl.Gc.major();
Sys.println("after filling: " + hl.Gc.stats());
before start: {currentMemory : 589824, allocationCount : 137, totalAllocated : 7104}
after filling: {currentMemory : 3145728, allocationCount : 191, totalAllocated : 7706696}

You may also want to build another one in the future. In that case, you don't pay the cost of allocating buffers. Multiple, if dealing with small elements and the internals need to constantly reallocate and blit.

@:privateAccess sb.pos = 0; // clear()
for (i in 0...200000) {
	sb.add("world");
}

hl.Gc.major();
Sys.println("after reusing: " + hl.Gc.stats());
after reusing: {currentMemory : 3145728, allocationCount : 212, totalAllocated : 7707912}

If you worry however about hl.Bytes leaking, the semantics around it did not change - if Bytes are indeed GC-able, they should leak if and only if the StringBuf also leaks (user error 😅), the same as it currently functions. (Ditto for compacting the heap/releasing memory back to the OS - there's currently nothing stopping users from creating big, non-clearable StringBufs. If this was a concern, this would be a concern right now on stable.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying, it's just that this is not exactly the behavior I'd expect when clearing a buffer. From a design perspective, I expect existingBuf = new StringBuf() and existingBuf.clear() to have the same memory behavior. Introducing a subtle difference like this can easily lead to confusion.

@Frixuu Frixuu force-pushed the stringbuf-shenanigans branch from eb20fac to b658699 Compare December 4, 2024 23:00
@Simn
Copy link
Member

Simn commented Dec 5, 2024

Documentation isn't the solution here, I'll always expect a clear function to just clear the buffer without keeping some reserved memory around. Let's please make sure that this is the standard behavior on all targets.

What you're describing has merit, but it is an orthogonal problem: we would want to be able to pre-allocate memory on a newly created buffer as well, without having to fill it once. Something like Java's ensureCapacity would work for that.

@skial skial mentioned this pull request Dec 9, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants