-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[THOG-315] Replace bytes.buffer with strings.builder. #533
Conversation
pkg/detectors/deepai/deepai.go
Outdated
@@ -61,7 +61,7 @@ func (s Scanner) FromData(ctx context.Context, verify bool, data []byte) (result | |||
continue | |||
} | |||
writer.Close() | |||
req, err := http.NewRequestWithContext(ctx, "POST", "https://api.deepai.org/api/text-tagging", bytes.NewReader(body.Bytes())) | |||
req, err := http.NewRequestWithContext(ctx, "POST", "https://api.deepai.org/api/text-tagging", bytes.NewReader([]byte(body.String()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the fact that we're now converting a string back into bytes? I suspect the benchmarks did not capture the full picture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before it was just copying bytes around I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea actually the performance benefit here might not be worth the change.
I was more curious about this portion:
trufflehog/pkg/sources/git/git.go
Lines 334 to 351 in 9a4d139
for _, frag := range file.TextFragments { | |
var sb strings.Builder | |
newLineNumber := frag.NewPosition | |
for _, line := range frag.Lines { | |
if line.Op == gitdiff.OpAdd { | |
sb.WriteString(line.Line) | |
} | |
} | |
log.WithField("fragment", sb.String()).Trace("detecting fragment") | |
metadata := s.sourceMetadataFunc(fileName, email, hash, when, urlMetadata, newLineNumber) | |
chunksChan <- &sources.Chunk{ | |
SourceName: s.sourceName, | |
SourceID: s.sourceID, | |
SourceType: s.sourceType, | |
SourceMetadata: metadata, | |
Data: []byte(sb.String()), | |
Verify: s.verify, | |
} |
Because when i was profiling with large repos the WriteString was eating a lot of memory in comparison to strings.Builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like some profiling files got committed by accident: |
THanks! Not sure how those slipped in. |
I could imagine the the git source change being a small improvement because it's handling strings multiple times, but probably not the detector changes, because those are pure byte operations. Also, |
Thats interesting that |
I believe one of the major changes is a copy check mechanism which saves a good deal of space. It also prevents it's internal byte slice from escaping such that it can only grow and reset making it immutable. |
Makes me wonder if we'd benefit from reusing the buffer and calling Reset(). |
Ohhhh indeed. I think that actually might even be better. Let me submit this and test that as well. |
What?
Replace the use of bytes.buffer with strings.Builder.
Why?
strings.Builder is a decent bit more efficient. It uses less memory and performs better overall.
How?
Replace usage directly.
Testing?
After running some profiling tests on some of the larger repos/orgs I was able to determine during the gitScan as we were adding git diffs to be scanned the memory allocation for bytest.buffer was ballooning. After changing it to use strings.builder that huge memory allocation was gone.
Screenshots (optional)
Benchmark of the two.
Anything Else?