Skip to content
This repository has been archived by the owner on May 25, 2022. It is now read-only.

Add possibility to force flush logs after certain period of time #216

Merged
merged 15 commits into from
Jul 15, 2021
Merged

Add possibility to force flush logs after certain period of time #216

merged 15 commits into from
Jul 15, 2021

Conversation

sumo-drosiek
Copy link
Member

Fixes #215

This is work in progress PR. General idea is to keep information about when last log has been properly flushed and if it was long ago, buffered line should be flushed as we do not except more lines

I'm going to make minor refactor, add unit tests and more comments to code

Dominik Rosiek added 5 commits July 9, 2021 17:41
Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
@sumo-drosiek sumo-drosiek requested a review from a team July 9, 2021 15:44
@sumo-drosiek sumo-drosiek marked this pull request as draft July 9, 2021 15:45
operator/helper/multiline.go Outdated Show resolved Hide resolved
operator/helper/multiline.go Outdated Show resolved Hide resolved
operator/helper/multiline.go Outdated Show resolved Hide resolved
sumo-drosiek and others added 3 commits July 12, 2021 08:49
Co-authored-by: Patryk Małek <pmalek@sumologic.com>
Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
@codecov
Copy link

codecov bot commented Jul 12, 2021

Codecov Report

Merging #216 (b2faa2c) into main (2ffe032) will increase coverage by 0.1%.
The diff coverage is 94.3%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main    #216     +/-   ##
=======================================
+ Coverage   75.6%   75.8%   +0.1%     
=======================================
  Files         95      95             
  Lines       4408    4464     +56     
=======================================
+ Hits        3334    3384     +50     
- Misses       747     751      +4     
- Partials     327     329      +2     
Impacted Files Coverage Δ
operator/builtin/input/file/file.go 72.2% <55.5%> (-1.3%) ⬇️
operator/builtin/input/file/config.go 77.3% <100.0%> (+0.3%) ⬆️
operator/builtin/input/file/reader.go 63.5% <100.0%> (-1.0%) ⬇️
operator/builtin/input/tcp/tcp.go 74.5% <100.0%> (ø)
operator/builtin/input/udp/udp.go 71.1% <100.0%> (ø)
operator/helper/multiline.go 89.5% <100.0%> (+5.2%) ⬆️

Dominik Rosiek added 3 commits July 12, 2021 12:06
After this change there is an issue only with double following force flushes
IMO if this occurs, there is issue with regexes configuration, because if
two following flushes from application to file doesn't match the regex,
something seems to be invalid

Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
@sumo-drosiek sumo-drosiek marked this pull request as ready for review July 12, 2021 11:22
@@ -156,7 +157,7 @@ func (c InputConfig) Build(context operator.BuildContext) ([]operator.Operator,
InputOperator: inputOperator,
Include: c.Include,
Exclude: c.Exclude,
SplitFunc: splitFunc,
Multiline: c.Multiline,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this is just a naming issue, but I'll have a difficult time accepting that the splitfunc is being delegated to a Multiline struct. Put another way, we need a split func regardless of whether or not we are using multiline splitting.

I suspect that we can accomplish the same functionality as you are implementing, but by composing things in a better way.

Would it make sense to create a new struct which would pair the concept of a split func with the concept of a flush? Maybe something like:

type Splitter struct {
    splitFunc // can be simple like '\n' or multiline config
    flusher     // forces a flush based on timing
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at this further, I'm not convinced we even need to do what I've suggested here. Please see my related comment on reader.go. I'll leave this comment in place as a possible alternative, but I think we need to start by justifying the need to alter splitFunc behavior in any way.

Comment on lines 42 to 43
set `force_flush_period` option to [duration string](https://golang.org/pkg/time/#ParseDuration),
eg: `5s`, `1m`. It's by default `0s` which means, that no force flushing will be performed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, I think we should use helper.Duration instead of time.Duration. We can then link to https://github.com/open-telemetry/opentelemetry-log-collection/blob/main/docs/types/duration.md, which explains formatting sufficiently.

This is basically the same thing, but with a default unit of seconds, since nanoseconds is basically never reasonable in this context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated documentation

@@ -116,7 +120,7 @@ func (r *Reader) ReadToEnd(ctx context.Context) {
return
}

scanner := NewPositionalScanner(r, r.fileInput.MaxLogSize, r.Offset, r.fileInput.SplitFunc)
scanner := NewPositionalScanner(r, r.fileInput.MaxLogSize, r.Offset, r.multiline.SplitFunc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the only place we are using the split func, so I'm not sure why it needs to be part of the multiline config.

We have two concepts here:

  1. How to split lines
  2. When to force a flush

I'm not convinced these are dependent on each other, or that either is necessarily a concern of multiline behavior. (Of course a split func may implement multiline splitting, but it does not always.)

Can we leave splitFunc as it was and just add a new config that controls flushing?

Copy link
Member Author

@sumo-drosiek sumo-drosiek Jul 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some relations I was unsure how to resolve, so I put both functionality into one struct:

  • splitFunc needs to be aware when to force a flush (so new structure is created and pass to the splitFunc)
  • every file should be flushed separately (so splitFunc is created for every unique reader)

Regarding line splitting. Of course it is not always a multiline behavior, but looking at it from user perspective it is intuitive to use multline config for it, so Multiline struct underneath, than mixing Splitter struct with multiline config, or using splitter config.

I will exclude force a flush to separate config, as logic of it is mostly out of the splitFunc, but not sure how to resolve other dependencies in nice way

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes in requested direction, but after all I think Splitter struct seems to be reasonable consensus.

You can see current implementation without Splitter. There are two dependent entities: splitFunc and ForceFlush and I feel that they should be connected together via additional struct. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think consolidating into one struct makes sense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the latest code, please

Dominik Rosiek added 2 commits July 13, 2021 08:37
Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
Dominik Rosiek added 2 commits July 14, 2021 11:16
…ltiline as Splitter eventually

Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
@djaglowski
Copy link
Member

Thanks @sumo-drosiek, nice improvement!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

file_input: last log is not being flushed if multiline is used
3 participants