
Optimize testee memory allocation #215

Merged 3 commits into dvyukov:master on Mar 12, 2019
Conversation

josharian (Collaborator)

As a data point, before this PR, fuzzing github.com/dvyukov/go-fuzz-corpus/png maxes out on my machine at around 16k execs/sec. After this PR, that increases to about 26k execs/sec.

Please review carefully; the "avoid the giant buffer" commit in particular is subtle.

josharian (Collaborator, Author) commented Mar 2, 2019

    @@ -182,22 +182,51 @@ retry:
    	// This goroutine also collects crash output.
    	ticker := time.NewTicker(time.Second)
    	const N = 1 << 20
    	data := make([]byte, N)
    	buf := make([]byte, 4096)
dvyukov (Owner):

You are saying this does not allocate? That's because... we store stdoutPipe as a concrete type rather than an interface?

dvyukov (Owner):

But this should cause stack growth and a copy, right? As far as I remember, goroutines used to start with a 512-byte stack.
Stacks are not garbage collected, but that can still produce some overhead.
Could you test a Fuzz function that produces no output, comparing this change against replacing this whole goroutine with t.outputC <- nil? If that shows any noticeable difference, then we could start with a small read buffer as well.

    	data = make([]byte, N)
    	copy(data, tmp[:filled])
    }
    copy(data[filled:], buf[:n])
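
For context, here is the grow-on-demand pattern this hunk implements, as a self-contained sketch (the function boundary and any names not in the excerpt are assumptions, not the actual go-fuzz code):

    package readall

    import "io"

    const maxOutput = 1 << 20 // the N above

    // collect reads through a small 4KB buffer and allocates the large
    // 1MB backing slice only once output actually appears; most runs
    // produce no output, so most runs never pay for the big allocation.
    func collect(r io.Reader) ([]byte, error) {
        buf := make([]byte, 4096)
        var data []byte
        filled := 0
        for {
            n, err := r.Read(buf)
            if n > 0 {
                if filled+n > maxOutput {
                    n = maxOutput - filled // keep only the first 1MB
                }
                if filled+n > len(data) {
                    grown := make([]byte, maxOutput)
                    copy(grown, data[:filled])
                    data = grown
                }
                copy(data[filled:], buf[:n])
                filled += n
            }
            if err == io.EOF {
                return data[:filled], nil
            }
            if err != nil {
                return data[:filled], err
            }
        }
    }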
dvyukov (Owner):

We probably could avoid copying the data as well, but I am not sure it matters (if the main intention is to avoid the "no output" case).

dvyukov (Owner) commented Mar 4, 2019

> After this, https://go-review.googlesource.com/c/go/+/164960 and https://go-review.googlesource.com/c/go/+/164961 will also help.

Nice! I like efficient standard library!

dvyukov (Owner) commented Mar 4, 2019

Have you considered plumbing sync.Pool somewhere here?
I understand that we return the output to the caller. But if we restart the binary after 10K iters, we could recycle the buffer in that case.
If we actually crashed, we have a choice of either giving the buffer to the caller, or making a smaller copy and recycling the original large buffer.
Just thinking maybe there are some other low-hanging fruit here. I have not seen the profiles, though.

Also, potentially we could tune the 1MB buffer size and the 10K-iteration restart threshold. These were chosen pretty much by a fair dice roll very early in development.

There is also another interesting option of not collecting output at all... unless the binary has crashed. Just as we run sonar binary only when we actually need its results, we could collect output only when triaging/minimizing a crash.

Another interesting possibility is to enlarge the pipe buffer, then just read the buffered byte count with ioctl(FIONREAD), and only start doing anything if we see that the buffer is close to full. Otherwise we can simply keep the data in the pipe itself. That may have some portability issues, though.
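
A minimal sketch of that probe, assuming Linux and the golang.org/x/sys/unix package (the fd/threshold plumbing is invented for illustration):

    package pipeprobe

    import "golang.org/x/sys/unix"

    // pendingBytes reports how many bytes are currently sitting in the
    // kernel buffer behind fd, without consuming them.
    func pendingBytes(fd int) (int, error) {
        return unix.IoctlGetInt(fd, unix.FIONREAD)
    }

    // drainIfNearFull reads from the pipe only once its kernel buffer is
    // close to full; otherwise the data just stays queued in the pipe.
    func drainIfNearFull(fd, threshold int, buf []byte) (int, error) {
        pending, err := pendingBytes(fd)
        if err != nil || pending < threshold {
            return 0, err
        }
        return unix.Read(fd, buf)
    }

FIONREAD on a pipe works on Linux and the BSDs but is not guaranteed everywhere, which is the portability concern mentioned above.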

josharian (Collaborator, Author):

> Have you considered plumbing sync.Pool somewhere here?

I didn't use sync.Pool here because it tends not to work well for managing a few large allocations (where "large" is measured relative to heap size). Discussion at golang/go#22950. Ideally we could fix golang/go#29696 and golang/go#18802 and then build a sync.Pool replacement better suited to that kind of use. :)

(Also interesting is that the real cost here isn't so much the allocation as the memclr.)

However, re-reading your comments, it occurs to me that we can just hang a giant buffer off of the worker and then have each new testee use it in turn. We wouldn't even need to clear it.

I'll try that approach and report back.
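
Roughly this shape, with simplified stand-in types rather than the real go-fuzz ones:

    package worker

    const outputBufSize = 1 << 20

    // Worker owns a single reusable output buffer for its whole
    // lifetime and lends it to each testee it starts, so periodic
    // restarts allocate (and clear) nothing.
    type Worker struct {
        outputBuf []byte
    }

    func NewWorker() *Worker {
        return &Worker{outputBuf: make([]byte, outputBufSize)}
    }

    // Testee stands in for the child-process handle.
    type Testee struct {
        buffer []byte // borrowed from the worker, not owned
    }

    func (w *Worker) startTestee() *Testee {
        // Reuse the worker's buffer on every restart. Readers only ever
        // look at the prefix they filled, so stale bytes are harmless.
        return &Testee{buffer: w.outputBuf}
    }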

> Also, potentially we could tune the 1MB buffer size and the 10K-iteration restart threshold.

If the worker approach sketched above works, then the 1MB buffer will serve as useful heap ballast (see golang/go#19839 (comment) and golang/go#23044).

I do plan to revisit the 10k iters. I want to do two things:

  • Add a way for a Fuzz function to only execute once. Short term, I was thinking a go-fuzz flag. Medium term, I'd like to migrate to Fuzz functions taking an object rather than a byte slice; that object could have an "I can only run once" method on it, which moves that information into the code, where it belongs. (I'll open an issue to discuss this; a rough sketch of that shape follows this list.)

  • Have go-fuzz monitor the child process, e.g. memory usage or average execution time over some sliding window, and make a decision dynamically about when a restart is required.
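
A rough sketch of the object-based shape from the first bullet (every name here is hypothetical, not go-fuzz API; the issue referenced below, #218, tracks the real discussion):

    package fuzz

    // F is a hypothetical argument object for Fuzz functions.
    type F struct {
        Data    []byte
        runOnce bool
    }

    // RunOnce lets the Fuzz function itself declare that the testee must
    // be restarted after every execution, moving that knowledge into the
    // code, where it belongs.
    func (f *F) RunOnce() { f.runOnce = true }

    // Fuzz shows the intended call shape for a non-idempotent target.
    func Fuzz(f *F) int {
        f.RunOnce() // this target cannot safely run twice in one process
        _ = f.Data  // ... exercise the target with f.Data ...
        return 0
    }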

From the commit messages:

These are small, but they get used a lot.

Also, binary.Read and binary.Write go through reflect, which allocates. Not a lot, but this is very high volume. For a fast-running Fuzz function, we might restart a testee multiple times a second. This giant allocation gets expensive in that scenario.

It is also a significant slowdown once go-fuzz starts finding crashers.

Making testee restarts cheaper is also an important step towards supporting Fuzz functions that are not idempotent, e.g. because they are hacked into an existing large code base, such as cmd/compile.

Instead of allocating 1MB on every restart, allocate a single reusable buffer up front. We don't even need to clear it between runs, which makes this particularly cheap and effective.

The io.Reader contract specifies that Read may return n > 0 and err != nil, and that in that scenario the caller should process those n bytes. This is rare enough that it probably wasn't causing an actual problem. Fix it nevertheless; one fewer latent bug.
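
For reference, the loop shape that contract requires, as a generic sketch (not the go-fuzz code): consume the n bytes first, then act on err.

    package readloop

    import "io"

    // drain consumes the n returned bytes before inspecting err.
    // Checking err first silently drops the final chunk whenever a
    // Reader returns data together with io.EOF, which the io.Reader
    // contract explicitly allows.
    func drain(r io.Reader, process func([]byte)) error {
        buf := make([]byte, 4096)
        for {
            n, err := r.Read(buf)
            if n > 0 {
                process(buf[:n])
            }
            if err == io.EOF {
                return nil
            }
            if err != nil {
                return err
            }
        }
    }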
josharian (Collaborator, Author):

> I'll try that approach and report back.

Done. Much nicer. Thanks for pushing back.

> Add a way for a Fuzz function to only execute once.

#218

> make a decision dynamically about when a restart is required.

Future work. :)

    @@ -184,8 +190,7 @@ retry:
     	// deadlock we periodically read out stdout.
     	// This goroutine also collects crash output.
     	ticker := time.NewTicker(time.Second)
    -	const N = 1 << 20
    -	data := make([]byte, N)
    +	data := buffer
dvyukov (Owner):

Eeee. This buffer is returned from TestBinary.test and then leaked to crasherQueue; we can't just do this.
We can do something like this, but we can reuse the buffer only on periodic testee restart. On crashes we can either hand off the buffer to the caller (assuming crashes are not so frequent, or if they are, perf does not matter), or make a copy of the used part of the buffer and then reuse the original large buffer.

josharian (Collaborator, Author):

I apologize, I don't follow. A few lines down, around line 215, I see:

		trimmed := make([]byte, filled)
		copy(trimmed, data)
		t.outputC <- trimmed

So it seems like data/buffer doesn't leak. What am I missing?

dvyukov (Owner):

You are right!
I assumed we send the buffer to the caller as-is. I guess I did it to not pin the large buffer... perhaps...

dvyukov merged commit dea5d38 into dvyukov:master on Mar 12, 2019