-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix filebeat file rotation registrar issue #1281 #1375
Conversation
There are some test failures which might or might not be related. |
Windows errors are definitively related. I need to exclude these tests for windows as I check for the inode specifically. For Jenkins it is strange that the tests were working locally and on travis, but no Jenkins. Will look into it, could be a race condition. |
@@ -140,6 +140,9 @@ func (p *ProspectorLog) scanGlob(glob string) { | |||
// Track the stat data for this file for later comparison to check for | |||
// rotation/etc | |||
p.prospectorList[h.Path] = *h.Stat | |||
// TODO: Registrar now updated also if not changes happened. Improve efficiency here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/not/no/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@urso @andrewkroh Seems to be finally stable. Will rebase and squash. |
|
||
copy := make(map[string]FileState) | ||
for k, v := range r.state { | ||
copy[k] = *v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check out definition of FileState. The FileStateOS is not copied (only pointer). Is there a chance the content in FileStateOS can be overwritten when operations are run on this copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All occurence here except for Decode which happens only once at the begining, should be read only. So this should not be an issue right now, but I should still refactor and decouple it in the future to prevent potential problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though "Decode" happens once at startup, the method that does it, LoadState
, is public and could be called at any time. The write operation within LoadState
should be protected by the mutex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Lets see if it breaks something ;-)
98fff3d
to
2ac9daa
Compare
fdd0243
to
91b6957
Compare
@@ -25,7 +26,8 @@ import ( | |||
type Harvester struct { | |||
Path string /* the file path to harvest */ | |||
Config *config.HarvesterConfig | |||
Offset int64 | |||
offset int64 | |||
offsetLock sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why still required? Due to prospector asking for state in order to forward to registrar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I hope to remove this in the future again.
ignoreOlder="1s", | ||
closeOlder="1s", | ||
ignoreOlder="10s", | ||
closeOlder="2s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just making sure, are these changes intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is intentional, as in a previous version of the change the file was sometimes closed too fast. I will reset it to 1s again as I think with the most recent changes filebeat should be faster again with rotating files. But I renamed the tests to close_older instead of ignore_older as the name is not accurate anymore. It tests close_older. Will push a new commit.
h.offsetLock.Lock() | ||
defer h.offsetLock.Unlock() | ||
|
||
h.offset = offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just thinking, but wouldn't atomic.Store/Load be as good as mutex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably also work. Would it make a difference? If not, would keep it like this for the reason that it seems to working now and that it is hopefully removed anyways in the future ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just leave it as is.
When multiple files were rotated at the same time, the first rotation was overwriting the states for the other rotated files. This happened because the new file directly had new content inside and the state for all the other files was fetched from disk, resetting the changes which were in memory. The problem is now fixed by not writing the state to disk every time state changes are discovered by the registrar, as it is up to the prospector to decide if these changes are the most recent ones. Now changes get written to disk every time a scan is done. The downside of this solution is that the registrar is also updated, if no changes did happen. * System test added to verify that issue is fixed * Closes elastic#1281 * Refactor some variables for better sync handling and fix tests * Make offset private and only set it through method * Rename system tests to close_older as this is what is actually checked * Update registrar only when file was rotated or changed * Add test to check state between shutdown and rotation.Only update registry when changes happen.
A direct commit backport was not possible as too many things changed in the prospector and harvester
* New versions for the stack elements * Remove link to github repo for dashboards * Change config example that shows how to turn off process monitoring * Clarify meaning of spool_size and harvester_buffer_size (#1388) * Add more info to doc about configuring TLS (#1414) * Backport of #1375 (#1418) A direct commit backport was not possible as too many things changed in the prospector and harvester
When multiple files were rotated at the same time, the first rotation was overwriting the states for the other rotated files. This happened because the new file directly had new content inside and the state for all the other files was fetched from disk, resetting the changes which were in memory.
The problem is now fixed by not writing the state to disk every time state changes are discovered by the registrar, as it is up to the prospector to decide if these changes are the most recent ones. Now changes get written to disk every time a scan is done. The downside of this solution is that the registrar is also updated, if no changes did happen.