-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pkg/stanza] - Performance improvements while comparing fingerprints in fileconsumer #29273
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Summarizing @swiatekm-sumo 's proposal:
Using the above solution + trie wouldn't make much sense. as we'd need to run multiple checks on the trie with different hashes. |
I'm happy to see a competition of ideas here. The current tracking mechanism cannot possibly be the best, so I'm open to anything which improves upon the situation. I think the right way to evaluate this is to clearly define the requirements. Then we only need validate correctness and compare performance. Here's my best attempt. This incorporates the notion of "singleton files", which I believe is critical to properly hardening this package. This was loosely introduced in #27823 but has not yet been strictly enforced. Please call out if I've missed something.
|
To illustrate what I mean by the above, with a simplified management struct, I think we should end up with something like the following: type Tracker struct {
readingFiles FileSet
openFiles FileSet
closedFiles []FileSet
}
// When opening a new file to determine its contents and whether to read
func (t Tracker) readFile(path string) {
// respect max_concurrent_files
if len(r.readingFiles) + len(r.openFiles) >= t.maxConcurrentFiles {
t.closeOne() // maybe need to also track order in which files were added to set
}
id, handle := readID(path string)
if id == nil { // don't bother with empty files
handle.Close()
return
}
// first check if we're already reading it
if f := t.readingFiles(id); f != nil {
handle.Close()
t.readingFiles.Add(f) // readd, since it was removed by Get
return
}
// next best a file that's still open
if f := t.openFiles(id); f != nil {
handle.Close()
t.readingFiles.Add(f)
return true
}
// if we remember it, we can pick up from a checkpoint
for i := 0; i < len(t.closedFiles[i]); i++ {
if md := t.closedFiles[i]; md != nil {
f := reader.NewFromMetadata(md, handle)
t.readingFiles.Add(f)
return
}
}
// No memory of this file
f := reader.New(id, handle)
t.readingFiles.Add(f)
return
}
// When we've read to the end of a file
func (t Tracker) doneReading(id FileID) {
f := t.readingFiles.Get(f.ID)
t.openFiles.Add(id, f)
}
// When we need to close a file
func (t Tracker) closeFile(id FileID) {
f := t.openFiles.Get(f.ID)
md := f.Close()
t.closedFiles.Add(id, md)
}
// When we reach the end of a poll cycle
func (t Tracker) forgetOldest() {
// drop oldest, shift each set, instantiate new at [0]
} |
I like the simplicity here. Thanks for clarifying! |
@djaglowski I compared both approaches.
The results are almost identical considering normal cases. TRIE sometimes falls short, because the writing complexity is always |
Thanks for the analysis @VihasMakwana |
**Description:** Following up from #30219. Adding a new package for fileset. **Link to tracking Issue:** [29273](#29273 (comment)) **Testing:** <Describe what testing was performed and which tests were added.> **Documentation:** <Describe the documentation added.>
**Description:** Following up from open-telemetry#30219. Adding a new package for fileset. **Link to tracking Issue:** [29273](open-telemetry#29273 (comment)) **Testing:** <Describe what testing was performed and which tests were added.> **Documentation:** <Describe the documentation added.>
Pinging code owners for receiver/filelog: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself. |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Closing based on #31317 (comment). If anyone else wants to make a PR with another implementation, please feel free to ping me and I'll reopen the issue. Otherwise, this is unplanned. |
Originally posted by @swiatekm-sumo in #29106 (comment)
I'm converting the above discussion to a new issue—more in the comments.
The text was updated successfully, but these errors were encountered: