
Commit

Merge remote-tracking branch 'upstream/master' into feature/support-flaky-test-analyser

* upstream/master:
  Add new licence status: expired (elastic#22180)
  [filebeat][okta] Make cursor optional for okta and update docs (elastic#22091)
  Add documentation of filestream input (elastic#21615)
  [Ingest Manager] Skip flaky gateway tests elastic#22177
  [CI] set env variable for the params (elastic#22143)
  Fix zeek connection pipeline (elastic#22151)
  Fix Google Cloud Function configuration file issues (elastic#22156)
  Remove old TODO on kubernetes node update (elastic#22074)
v1v committed Oct 27, 2020
2 parents df4057f + f0da681 commit 0757395
Showing 19 changed files with 764 additions and 15 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.next.asciidoc
@@ -188,6 +188,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- The `o365input` and `o365` module now recover from an authentication problem or other fatal errors, instead of terminating. {pull}21259[21258]
- Orderly close processors when processing pipelines are not needed anymore to release their resources. {pull}16349[16349]
- Fix memory leak and events duplication in docker autodiscover and add_docker_metadata. {pull}21851[21851]
- Fix parsing of expired licences. {issue}21112[21112] {pull}22180[22180]

*Auditbeat*

@@ -287,6 +288,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- Fix checkpoint module when logs contain time field. {pull}20567[20567]
- Add field limit check for AWS Cloudtrail flattened fields. {pull}21388[21388] {issue}21382[21382]
- Fix syslog RFC 5424 parsing in the CheckPoint module. {pull}21854[21854]
- Fix incorrect connection state mapping in zeek connection pipeline. {pull}22151[22151] {issue}22149[22149]

*Heartbeat*

@@ -400,6 +402,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- Do not need Google credentials if not required for the operation. {issue}17329[17329] {pull}21072[21072]
- Fix dependency issues of GCP functions. {issue}20830[20830] {pull}21070[21070]
- Fix catchall bucket config errors by adding more validation. {issue}17572[16282] {pull}20887[16287]
- Fix Google Cloud Function configuration issue. {issue}20864[20864] {pull}22156[22156]

==== Added

@@ -638,6 +641,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- Adding support for FIPS in s3 input {pull}21446[21446]
- Add SSL option to checkpoint module {pull}19560[19560]
- Add max_number_of_messages config into s3 input. {pull}21993[21993]
- Update Okta documentation for new stateful restarts. {pull}22091[22091]

*Heartbeat*

3 changes: 2 additions & 1 deletion Jenkinsfile
@@ -12,6 +12,7 @@ pipeline {
  agent { label 'ubuntu-18 && immutable' }
  environment {
    AWS_ACCOUNT_SECRET = 'secret/observability-team/ci/elastic-observability-aws-account-auth'
    AWS_REGION = "${params.awsRegion}"
    REPO = 'beats'
    BASE_DIR = "src/github.com/elastic/${env.REPO}"
    DOCKERELASTIC_SECRET = 'secret/observability-team/ci/docker-registry/prod'
@@ -437,7 +438,7 @@ def withCloudTestEnv(Closure body) {
error("${AWS_ACCOUNT_SECRET} doesn't contain 'secret_key'")
}
maskedVars.addAll([
[var: "AWS_REGION", password: params.awsRegion],
[var: "AWS_REGION", password: "${env.AWS_REGION}"],
[var: "AWS_ACCESS_KEY_ID", password: aws.access_key],
[var: "AWS_SECRET_ACCESS_KEY", password: aws.secret_key],
])
2 changes: 2 additions & 0 deletions filebeat/docs/filebeat-options.asciidoc
@@ -94,6 +94,8 @@ include::inputs/input-container.asciidoc[]

include::inputs/input-docker.asciidoc[]

include::inputs/input-filestream.asciidoc[]

include::../../x-pack/filebeat/docs/inputs/input-google-pubsub.asciidoc[]

include::../../x-pack/filebeat/docs/inputs/input-http-endpoint.asciidoc[]
394 changes: 394 additions & 0 deletions filebeat/docs/inputs/input-filestream-file-options.asciidoc

Large diffs are not rendered by default.

143 changes: 143 additions & 0 deletions filebeat/docs/inputs/input-filestream-reader-options.asciidoc
@@ -0,0 +1,143 @@
//////////////////////////////////////////////////////////////////////////
//// This content is shared by Filebeat inputs that use the input
//// but do not process files (the options for managing files
//// on disk are not relevant)
//// If you add IDs to sections, make sure you use attributes to create
//// unique IDs for each input that includes this file. Use the format:
//// [id="{beatname_lc}-input-{type}-option-name"]
//////////////////////////////////////////////////////////////////////////

[float]
===== `encoding`

The file encoding to use for reading data that contains international
characters. See the encoding names http://www.w3.org/TR/encoding/[recommended by
the W3C for use in HTML5].

Valid encodings:

* `plain`: plain ASCII encoding
* `utf-8` or `utf8`: UTF-8 encoding
* `gbk`: simplified Chinese characters
* `iso8859-6e`: ISO8859-6E, Latin/Arabic
* `iso8859-6i`: ISO8859-6I, Latin/Arabic
* `iso8859-8e`: ISO8859-8E, Latin/Hebrew
* `iso8859-8i`: ISO8859-8I, Latin/Hebrew
* `iso8859-1`: ISO8859-1, Latin-1
* `iso8859-2`: ISO8859-2, Latin-2
* `iso8859-3`: ISO8859-3, Latin-3
* `iso8859-4`: ISO8859-4, Latin-4
* `iso8859-5`: ISO8859-5, Latin/Cyrillic
* `iso8859-6`: ISO8859-6, Latin/Arabic
* `iso8859-7`: ISO8859-7, Latin/Greek
* `iso8859-8`: ISO8859-8, Latin/Hebrew
* `iso8859-9`: ISO8859-9, Latin-5
* `iso8859-10`: ISO8859-10, Latin-6
* `iso8859-13`: ISO8859-13, Latin-7
* `iso8859-14`: ISO8859-14, Latin-8
* `iso8859-15`: ISO8859-15, Latin-9
* `iso8859-16`: ISO8859-16, Latin-10
* `cp437`: IBM CodePage 437
* `cp850`: IBM CodePage 850
* `cp852`: IBM CodePage 852
* `cp855`: IBM CodePage 855
* `cp858`: IBM CodePage 858
* `cp860`: IBM CodePage 860
* `cp862`: IBM CodePage 862
* `cp863`: IBM CodePage 863
* `cp865`: IBM CodePage 865
* `cp866`: IBM CodePage 866
* `ebcdic-037`: IBM CodePage 037
* `ebcdic-1040`: IBM CodePage 1140
* `ebcdic-1047`: IBM CodePage 1047
* `koi8r`: KOI8-R, Russian (Cyrillic)
* `koi8u`: KOI8-U, Ukrainian (Cyrillic)
* `macintosh`: Macintosh encoding
* `macintosh-cyrillic`: Macintosh Cyrillic encoding
* `windows1250`: Windows1250, Central and Eastern European
* `windows1251`: Windows1251, Russian, Serbian (Cyrillic)
* `windows1252`: Windows1252, Legacy
* `windows1253`: Windows1253, Modern Greek
* `windows1254`: Windows1254, Turkish
* `windows1255`: Windows1255, Hebrew
* `windows1256`: Windows1256, Arabic
* `windows1257`: Windows1257, Estonian, Latvian, Lithuanian
* `windows1258`: Windows1258, Vietnamese
* `windows874`: Windows874, ISO/IEC 8859-11, Latin/Thai
* `utf-16-bom`: UTF-16 with required BOM
* `utf-16be-bom`: big endian UTF-16 with required BOM
* `utf-16le-bom`: little endian UTF-16 with required BOM

The `plain` encoding is special, because it does not validate or transform any input.
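
For example, to read a file written in a legacy Windows charset, you might set the following (a minimal sketch; the charset choice is illustrative):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  encoding: windows1252
----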

[float]
[id="{beatname_lc}-input-{type}-exclude-lines"]
===== `exclude_lines`

A list of regular expressions to match the lines that you want {beatname_uc} to
exclude. {beatname_uc} drops any lines that match a regular expression in the
list. By default, no lines are dropped. Empty lines are ignored.

The following example configures {beatname_uc} to drop any lines that start with
`DBG`.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  exclude_lines: ['^DBG']
----

See <<regexp-support>> for a list of supported regexp patterns.

[float]
[id="{beatname_lc}-input-{type}-include-lines"]
===== `include_lines`

A list of regular expressions to match the lines that you want {beatname_uc} to
include. {beatname_uc} exports only the lines that match a regular expression in
the list. By default, all lines are exported. Empty lines are ignored.

The following example configures {beatname_uc} to export any lines that start
with `ERR` or `WARN`:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  include_lines: ['^ERR', '^WARN']
----

NOTE: If both `include_lines` and `exclude_lines` are defined, {beatname_uc}
executes `include_lines` first and then `exclude_lines`, regardless of the
order in which the two options appear in the config file.

The following example exports all log lines that contain `sometext`,
except for lines that begin with `DBG` (debug messages):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  include_lines: ['sometext']
  exclude_lines: ['^DBG']
----

See <<regexp-support>> for a list of supported regexp patterns.

[float]
===== `buffer_size`

The size in bytes of the buffer that each harvester uses when fetching a file.
The default is 16384.

[float]
===== `message_max_bytes`

The maximum number of bytes that a single log message can have. All bytes after
`message_max_bytes` are discarded and not sent. The default is 10MB (10485760).
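
For instance, a sketch that enlarges the read buffer and lowers the message cap (the values are illustrative, not recommendations):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  buffer_size: 32768          # 32 KiB read buffer per harvester
  message_max_bytes: 1048576  # bytes beyond 1 MiB per message are discarded
----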
165 changes: 165 additions & 0 deletions filebeat/docs/inputs/input-filestream.asciidoc
@@ -0,0 +1,165 @@
:type: filestream

[id="{beatname_lc}-input-{type}"]
=== filestream input

experimental[]

++++
<titleabbrev>filestream</titleabbrev>
++++

Use the `filestream` input to read lines from active log files. It is the
new, improved alternative to the `log` input. However, a few features are
still missing from it, e.g. `multiline` and other special parsing capabilities.
These missing options will probably be added again. We strive to achieve
feature parity, if possible.

To configure this input, specify a list of glob-based <<filestream-input-paths,`paths`>>
that must be crawled to locate and fetch the log lines.

Example configuration:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: filestream
  paths:
    - /var/log/messages
    - /var/log/*.log
----


You can apply additional
<<{beatname_lc}-input-{type}-options,configuration settings>> (such as `fields`,
`include_lines`, `exclude_lines` and so on) to the lines harvested
from these files. The options that you specify are applied to all the files
harvested by this input.

To apply different configuration settings to different files, you need to define
multiple input sections:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: filestream <1>
  paths:
    - /var/log/system.log
    - /var/log/wifi.log
- type: filestream <2>
  paths:
    - "/var/log/apache2/*"
  fields:
    apache: true
----

<1> Harvests lines from two files: `system.log` and
`wifi.log`.
<2> Harvests lines from every file in the `apache2` directory, and uses the
`fields` configuration option to add a field called `apache` to the output.


[[filestream-file-identity]]
==== Reading files on network shares and cloud providers

WARNING: Filebeat does not support reading from network shares and cloud providers.

However, one of the limitations of these data sources can be mitigated
if you configure Filebeat adequately.

By default, {beatname_uc} identifies files based on their inodes and
device IDs. However, on network shares and cloud providers these
values might change during the lifetime of the file. If this happens,
{beatname_uc} thinks that the file is new and resends the whole content
of the file. To solve this problem you can configure the `file_identity` option.
Possible values besides the default `inode_deviceid` are `path` and `inode_marker`.

Selecting `path` instructs {beatname_uc} to identify files based on their
paths. This is a quick way to avoid rereading files if inode and device IDs
might change. However, keep in mind that if the files are rotated (renamed), they
will be reread and resubmitted.
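
A minimal sketch of an input using path-based identity (assuming the files are never rotated by rename; the paths are illustrative):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: filestream
  paths:
    - /logs/*.log
  file_identity.path: ~
----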

The option `inode_marker` can be used if the inodes stay the same even if
the device ID changes. If your files are rotated, you should choose this
method instead of `path` if possible. You have to configure a marker file
readable by {beatname_uc} and set the path in the `path` option of `inode_marker`.

The content of this file must be unique to the device. You can put the
UUID of the device or the mountpoint where the input is stored. Note that
you should not use this option on Windows, as file identifiers might be
more volatile. The following example one-liner generates a hidden marker
file for the selected mountpoint `/logs`:

["source","sh",subs="attributes"]
----
$ lsblk -o MOUNTPOINT,UUID | grep /logs | awk '{print $2}' >> /logs/.filebeat-marker
----

To set the generated file as a marker for `file_identity`, configure
the input as follows:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: filestream
  paths:
    - /logs/*.log
  file_identity.inode_marker.path: /logs/.filebeat-marker
----


[[filestream-rotating-logs]]
==== Reading from rotating logs

When dealing with file rotation, avoid harvesting symlinks. Instead
use the <<filestream-input-paths>> setting to point to the original file, and specify
a pattern that matches the file you want to harvest and all of its rotated
files. Also make sure your log rotation strategy prevents lost or duplicate
messages. For more information, see <<file-log-rotation>>.

Furthermore, to avoid duplicated rotated log messages, do not use the
`path` method for `file_identity`, or exclude the rotated files with the
`exclude_files` option.
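
For example, a sketch using the `exclude_files` option named above to skip compressed rotated copies (it assumes rotated files are gzipped; the paths and pattern are illustrative):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: filestream
  paths:
    - /var/log/app/app.log*
  exclude_files: ['\.gz$']
----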

[id="{beatname_lc}-input-{type}-options"]
==== Prospector options

The `filestream` input supports the following configuration options plus the
<<{beatname_lc}-input-{type}-common-options>> described later.

[float]
[[filestream-input-paths]]
===== `paths`

A list of glob-based paths that will be crawled and fetched. All patterns
supported by https://golang.org/pkg/path/filepath/#Glob[Go Glob] are also
supported here. For example, to fetch all files from a predefined level of
subdirectories, the following pattern can be used: `/var/log/*/*.log`. This
fetches all `.log` files from the subfolders of `/var/log`. It does not
fetch log files from the `/var/log` folder itself.
It is possible to recursively fetch all files in all subdirectories of a directory
using the optional <<filestream-recursive-glob,`recursive_glob`>> setting.

{beatname_uc} starts a harvester for each file that it finds under the specified
paths. You can specify one path per line. Each line begins with a dash (-).
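
For instance, a minimal sketch using the pattern described above to pick up `.log` files one level below `/var/log`:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: filestream
  paths:
    - /var/log/*/*.log
----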

[float]
[[filestream-recursive-glob]]
===== `prospector.scanner.recursive_glob`

Enable expanding `**` into recursive glob patterns. With this feature enabled,
the rightmost `**` in each path is expanded into a fixed number of glob
patterns. For example: `/foo/**` expands to `/foo`, `/foo/*`, `/foo/*/*`, and so
on. If enabled, it expands a single `**` into an 8-level deep `*` pattern.

This feature is enabled by default. Set `prospector.scanner.recursive_glob` to false to
disable it.
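
A minimal sketch that disables the expansion, so `**` in paths is no longer expanded (the paths are illustrative):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: filestream
  prospector.scanner.recursive_glob: false
  paths:
    - /var/log/*.log
----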

include::../inputs/input-filestream-reader-options.asciidoc[]

include::../inputs/input-filestream-file-options.asciidoc[]

[id="{beatname_lc}-input-{type}-common-options"]
include::../inputs/input-common-options.asciidoc[]

:type!:
15 changes: 9 additions & 6 deletions filebeat/docs/modules/okta.asciidoc
@@ -32,12 +32,6 @@ the logs while honoring any
https://developer.okta.com/docs/reference/rate-limits/[rate-limiting] headers
sent by Okta.

NOTE: This module does not persist the timestamp of the last read event in
order to facilitate resuming on restart. This feature will be coming in a future
version. When you restart, the module will read events from the beginning of the
log. To minimize duplicate documents, the module uses the event's Okta UUID
value as the Elasticsearch `_id`.

This is an example configuration for the module.

[source,yaml]
@@ -99,6 +93,15 @@ information.
supported_protocols: [TLSv1.2]
----

*`var.initial_interval`*::

An initial interval can be defined. The first time the module starts, it will fetch events from the current moment minus the initial interval value. Subsequent restarts will fetch events starting from the last event read. It defaults to `24h`.
+
[source,yaml]
----
var.initial_interval: 24h # will fetch events starting 24h ago.
----

[float]
=== Example dashboard
