config: multiline: in_tail: filter_multiline: Add configurable buffer limit for multiline interface #10653

cosmo0920 · 2025-07-28T04:09:39Z

We added an interface for configurable buffer limit for multiline.
Also, we implemented robust processing for multiline concatenations.

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

Example configuration file for the change
Debug log output from testing the change

Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Run local packaging test showing all targets (including any new ones) build.
Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

Documentation required for this feature

Backporting

Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

New Features
- Configurable multiline buffer size limit (accepts KiB/MiB/GiB binary-size strings).
- New per-input and per-filter truncated multiline metrics.
Behavior Changes / Bug Fixes
- Multiline truncation is now signaled, added to event metadata, and emits warnings when limits are reached.
Tests
- Added tests for buffer-limit truncation and binary-size parsing.

coderabbitai · 2025-08-01T11:28:13Z

Walkthrough

Adds a configurable multiline buffer limit and binary-size parser; enforces per-group buffer limits with truncation tracking and metadata; propagates FLB_MULTILINE_TRUNCATED through multiline core and plugins; exposes truncation metrics; introduces a params-based multiline parser API and related tests.

Changes

Cohort / File(s)	Change Summary
Config field `include/fluent-bit/flb_config.h`, `src/flb_config.c`	Added `FLB_CONF_STR_MULTILINE_BUFFER_LIMIT` and `char *multiline_buffer_limit`; registered the config entry and manage its lifetime.
Multiline core types & defaults `include/fluent-bit/multiline/flb_ml.h`, `src/multiline/flb_ml.h`	Added buffer-limit defaults (`FLB_ML_BUFFER_LIMIT_DEFAULT_STR`, `FLB_ML_BUFFER_LIMIT_DEFAULT`), status codes (`FLB_MULTILINE_OK`, `FLB_MULTILINE_PROCESSED`, `FLB_MULTILINE_TRUNCATED`), `flb_ml.buffer_limit`, and `flb_ml_stream_group.truncated` + `stream`.
Group API & impl `include/fluent-bit/multiline/flb_ml_group.h`, `src/multiline/flb_ml_group.c`	Added `int flb_ml_group_cat(struct flb_ml_stream_group group, const char data, size_t len)` to append with buffer-limit enforcement and return OK/TRUNCATED status.
Multiline processing & rules `src/multiline/flb_ml.c`, `src/multiline/flb_ml_rule.c`, `src/multiline/flb_ml_stream.c`, `src/multiline/flb_ml_group.c`	Propagated truncation status through append/processing and flush paths; replaced direct buffer writes with `flb_ml_group_cat()`; set/clear group truncated flag; add `multiline_truncated` metadata on flush; initialize `ml->buffer_limit` from config.
Multiline parser API (params) `include/fluent-bit/multiline/flb_ml_parser.h`, `src/multiline/flb_ml_parser.c`	Added `struct flb_ml_parser_params`, `flb_ml_parser_params_default()`, and `flb_ml_parser_create_params()`; legacy `flb_ml_parser_create()` now delegates to params API.
Filter plugin metrics `plugins/filter_multiline/ml.h`, `plugins/filter_multiline/ml.c`	Added truncated metric ID and counter (`FLB_MULTILINE_METRIC_TRUNCATED`, `cmt_truncated`); register and increment truncated metric alongside emitted metric.
Tail plugin metrics & instrumentation `plugins/in_tail/tail_config.h`, `plugins/in_tail/tail_config.c`, `plugins/in_tail/tail_file.c`	Added `FLB_TAIL_METRIC_M_TRUNCATED` and `struct cmt_counter *cmt_multiline_truncated`; register legacy metric; emit warning and increment per-input truncated metric on truncation.
Size parsing utility & tests `include/fluent-bit/flb_utils.h`, `src/flb_utils.c`, `tests/internal/utils.c`	Added `flb_utils_size_to_binary_bytes()` (KiB/MiB/GiB, 1024-based parsing with overflow guards) and unit tests covering binary-size parsing.
Slist robustness & config map cleanup `src/flb_slist.c`, `src/flb_config_map.c`	Added error handling on slist token additions (propagate failures and free temporary SDS) and ensured slist cleanup on parse_string_map_to_list error.
Tests - multiline truncation `tests/internal/multiline.c`	Added `buffer_limit_truncation` and `buffer_limit_disabled` tests and adjusted fixtures/flush callback guard.

Sequence Diagram(s)

sequenceDiagram
    participant Config
    participant Tail as TailInput
    participant ML as MultilineCore
    participant Group as StreamGroup
    participant Filter
    participant Metrics

    Config->>ML: init buffer_limit (string -> bytes)
    Tail->>ML: flb_ml_append_text/append_object(data)
    ML->>Group: flb_ml_group_cat(data, len)
    alt appended fully
        Group-->>ML: FLB_MULTILINE_OK
        ML->>Filter: emit/process event
        Filter->>Metrics: inc emitted metric
    else truncated or partially appended
        Group-->>ML: FLB_MULTILINE_TRUNCATED
        ML->>Tail: log warning
        ML->>Metrics: inc truncated metric
        Note right of ML #aqua: mark group->truncated and attach metadata on flush
    end
    ML->>Group: flush -> include "multiline_truncated": true if set

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

in_tail: fix memory leak when using generic unicode conversion (backport #10781) #10785 — Modifies plugins/in_tail/tail_file.c; related overlap in tail/multiline handling and instrumentation.

Suggested reviewers

leonardo-albertovich
edsiper
koleini
fujimotos

Poem

I nibble bytes beneath the moon,
I stitch long lines and trim too soon.
When buffers brim and rules collide,
I flag the cut and hop aside.
Metrics hum — the pipeline sighs. 🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title clearly and accurately captures the primary change — adding a configurable buffer limit for multiline processing — and names the relevant components (config, in_tail, filter_multiline), so it relates directly to the changes described in the PR summary and objectives; it is specific and understandable to a reviewer scanning history.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch cosmo0920-add-limit-for-multiline-concatenation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

cosmo0920 · 2025-08-04T11:40:21Z

include/fluent-bit/multiline/flb_ml.h

+/* Return codes */
+#define FLB_MULTILINE_OK         0
+#define FLB_MULTILINE_PROCESSED  1 /* Reserved */
+#define FLB_MULTILINE_TRUNCATED  2


Changing to status code 2 is needed because status code 1 will be collided for FLB_TRUE status.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/flb_config_map.c (1)

662-666: Bug: list-type range check uses || (always true) — can mis-handle non-list types

These conditions should use && to bound the range; with || they evaluate true for almost all types, risking wrong assignments (e.g., VARIANT) and hard-to-trace bugs.

Apply this diff:

-        else if (m->type >= FLB_CONFIG_MAP_CLIST ||
-                 m->type <= FLB_CONFIG_MAP_SLIST_4) {
+        else if (m->type >= FLB_CONFIG_MAP_CLIST &&
+                 m->type <= FLB_CONFIG_MAP_SLIST_4) {
           m_list = (struct mk_list **) (base + m->offset);
           *m_list = m->value.val.list;
         }
@@
-            else if (m->type >= FLB_CONFIG_MAP_CLIST ||
-                     m->type <= FLB_CONFIG_MAP_SLIST_4) {
+            else if (m->type >= FLB_CONFIG_MAP_CLIST &&
+                     m->type <= FLB_CONFIG_MAP_SLIST_4) {
 
                 list = parse_string_map_to_list(m, kv->val);
@@
-            else if (m->type >= FLB_CONFIG_MAP_CLIST ||
-                     m->type <= FLB_CONFIG_MAP_SLIST_4) {
+            else if (m->type >= FLB_CONFIG_MAP_CLIST &&
+                     m->type <= FLB_CONFIG_MAP_SLIST_4) {
                 list = parse_string_map_to_list(m, kv->val);

Also applies to: 740-742, 814-816

🧹 Nitpick comments (1)

src/flb_config_map.c (1)
662-666: Optional: centralize list-type check to avoid regressions

Create a tiny helper (static inline) like is_list_type(int type) and replace repeated range checks. This reduces future chances of reintroducing ||.

Example helper:
static inline int is_list_type(int type)
{
    return (type >= FLB_CONFIG_MAP_CLIST && type <= FLB_CONFIG_MAP_SLIST_4);
}

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36dbee5 and 43bfd9d.

📒 Files selected for processing (1)

src/flb_config_map.c (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/flb_config_map.c (1)

src/flb_slist.c (1)

flb_slist_destroy (327-339)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)

GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
GitHub Check: pr-compile-centos-7
GitHub Check: PR - fuzzing test

🔇 Additional comments (1)

src/flb_config_map.c (1)

98-103: Good catch: free slist entries on error path

Calling flb_slist_destroy(list) before freeing the list head prevents leaks when split functions partially append entries. Looks correct and consistent with destroy_map_val.

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

src/multiline/flb_ml.c (3)

293-303: Enforce buffer_limit for ENDSWITH/EQ (currently bypassed).

These paths concatenate with flb_sds_cat_safe, ignoring ml->buffer_limit. This defeats the new safety cap and can grow unbounded.

Apply:

@@ /* ENDSWITH concatenation */
-            /* Concatenate value */
-            if (val_content) {
-                flb_sds_cat_safe(&stream_group->buf,
-                                 val_content->via.str.ptr,
-                                 val_content->via.str.size);
-            }
-            else {
-                flb_sds_cat_safe(&stream_group->buf, buf_data, buf_size);
-            }
+            /* Concatenate value with limit enforcement */
+            {
+                int rc;
+                if (val_content) {
+                    rc = flb_ml_group_cat(stream_group,
+                                          val_content->via.str.ptr,
+                                          val_content->via.str.size);
+                }
+                else {
+                    rc = flb_ml_group_cat(stream_group, buf_data, buf_size);
+                }
+                if (rc == FLB_MULTILINE_TRUNCATED) {
+                    truncated = FLB_TRUE;
+                }
+            }
@@ /* EQ concatenation */
-        /* Concatenate value */
-        if (val_content) {
-            flb_sds_cat_safe(&stream_group->buf,
-                             val_content->via.str.ptr,
-                             val_content->via.str.size);
-        }
-        else {
-            flb_sds_cat_safe(&stream_group->buf, buf_data, buf_size);
-        }
+        /* Concatenate value with limit enforcement */
+        {
+            int rc;
+            if (val_content) {
+                rc = flb_ml_group_cat(stream_group,
+                                      val_content->via.str.ptr,
+                                      val_content->via.str.size);
+            }
+            else {
+                rc = flb_ml_group_cat(stream_group, buf_data, buf_size);
+            }
+            if (rc == FLB_MULTILINE_TRUNCATED) {
+                truncated = FLB_TRUE;
+            }
+        }

Also applies to: 327-336

641-647: Initialize ret on stream lookup failure to avoid undefined return.

If mst is NULL, we goto exit; without setting ret. Ensure -1 is returned.

Apply:

-    if (!mst) {
+    if (!mst) {
         flb_error("[multiline] invalid stream_id %" PRIu64 ", could not "
                   "append content to multiline context", stream_id);
-        goto exit;
+        ret = -1;
+        goto exit;
     }
@@
-    return ret;
+    return ret;

Also applies to: 662-662

552-579: Avoid NULL deref on key_content in map pre-parse path.

len = flb_sds_len(parser->key_content); is executed before checking parser->key_content. Compute length only when non-NULL and use a separate key_len.

Apply:

@@
-        /* lookup key_content */
-
-        len = flb_sds_len(parser->key_content);
+        /* lookup key_content */
+        int key_len = 0;
+        if (parser->key_content) {
+            key_len = flb_sds_len(parser->key_content);
+        }
         map_size = map->via.map.size;
         for(i = 0; i < map_size; i++) {
@@
-            if (key.type == MSGPACK_OBJECT_STR &&
-                parser->key_content &&
-                key.via.str.size == len &&
-                strncmp(key.via.str.ptr, parser->key_content, len) == 0) {
+            if (parser->key_content &&
+                key.type == MSGPACK_OBJECT_STR &&
+                key.via.str.size == key_len &&
+                strncmp(key.via.str.ptr, parser->key_content, key_len) == 0) {

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43bfd9d and 51b3f41.

📒 Files selected for processing (2)

src/multiline/flb_ml.c (13 hunks)
tests/internal/multiline.c (4 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-09-04T12:35:22.872Z

Learnt from: shadowshot-x
PR: fluent/fluent-bit#10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

src/multiline/flb_ml.c

🧬 Code graph analysis (2)

tests/internal/multiline.c (4)

src/flb_config.c (2)

flb_config_init (216-427)

flb_config_exit (429-606)

src/multiline/flb_ml.c (3)

flb_ml_create (869-921)

flb_ml_append_text (665-755)

flb_ml_destroy (982-1007)

src/multiline/flb_ml_parser.c (3)

flb_ml_parser_params_default (32-44)

flb_ml_parser_create_params (47-129)

flb_ml_parser_instance_create (261-312)

src/multiline/flb_ml_stream.c (1)

flb_ml_stream_create (223-276)

src/multiline/flb_ml.c (3)

src/multiline/flb_ml_stream.c (2)

flb_ml_stream_get (278-292)

flb_ml_stream_group_get (91-135)

src/multiline/flb_ml_group.c (1)

flb_ml_group_cat (89-122)

src/flb_utils.c (1)

flb_utils_size_to_binary_bytes (610-694)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)

GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
GitHub Check: PR - fuzzing test
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
GitHub Check: pr-compile-centos-7

🔇 Additional comments (13)

tests/internal/multiline.c (5)
114-115: Confirm expected ordering/merging in container_mix_output.

The added outputs "bbccdd-out\n" and "dd-err\n" change the merge/order semantics. Please confirm this matches the new truncation-aware pipeline behavior and is not a regression in per-stream isolation.

395-397: Good: safe early-return when no expected-result context.

Prevents dereferencing NULL in tests that intentionally pass NULL to the flush callback.

1537-1559: LGTM: disabled buffer limit path is correct.

"false" → 0 and asserting ml->buffer_limit == 0 aligns with unlimited behavior.

1571-1572: LGTM: tests registered.

New tests are added to TEST_LIST.

1463-1535: Fix test to actually cap concatenated 'log' content; remove unused var.

As written, the parser doesn't pre-parse JSON, so you're concatenating raw JSON lines, not the "log" field. The assertion may pass by accident and won't catch regressions. Also, p is declared but unused.

Update to use the docker parser for JSON pre-parse and lower the limit so concatenated log content exceeds it; skip gracefully if docker parser is unavailable.

Apply:
@@
 static void test_buffer_limit_truncation()
 {
     int ret;
     uint64_t stream_id;
     struct flb_config *config;
     struct flb_ml *ml;
     struct flb_ml_parser *mlp;
     struct flb_ml_parser_ins *mlp_i;
-    struct flb_parser *p;
+    struct flb_parser *p;
     struct flb_time tm;
@@
-    char *line1 = "{\"log\": \"12345678901234567890\", \"stream\": \"stdout\"}";
-    char *line2 = "{\"log\": \"abcdefghijklmnopqrstuvwxyz\", \"stream\": \"stdout\"}";
+    char *line1 = "{\"log\": \"12345678901234567890\", \"stream\": \"stdout\"}";
+    char *line2 = "{\"log\": \"abcdefghijklmnopqrstuvwxyz\", \"stream\": \"stdout\"}";
@@
     config = flb_config_init();
-    /* The buffer limit is for the concatenated 'log' content, not the full JSON */
+    /* The buffer limit is for the concatenated 'log' content, not the full JSON */
+    /* Guard: require single-line docker parser availability for JSON pre-parse */
+    p = flb_parser_get("docker", config);
+    TEST_CHECK(p != NULL);
+    if (!p) {
+        flb_config_exit(config);
+        return;
+    }
@@
-    config->multiline_buffer_limit = flb_strdup("80");
+    /* 40 < length("12345678901234567890" + "abcdefghijklmnopqrstuvwxyz") = 46 */
+    config->multiline_buffer_limit = flb_strdup("40");
@@
-    /* --- New params-based initializer --- */
+    /* --- New params-based initializer --- */
     struct flb_ml_parser_params params = flb_ml_parser_params_default("test-concat");
     params.type        = FLB_ML_REGEX;
     params.negate      = FLB_FALSE;
     params.flush_ms    = 1000;
     params.key_content = "log";
-    params.parser_ctx  = NULL;
-    params.parser_name = NULL;
+    params.parser_ctx  = NULL;
+    params.parser_name = "docker";  /* pre-parse JSON and extract key_content */
@@
     mlp = flb_ml_parser_create_params(config, &params);
     TEST_CHECK(mlp != NULL);
+    /* Link params.parser_name -> parser ctx */
+    ret = flb_ml_parsers_init(config);
+    TEST_CHECK(ret == 0);
+    if (ret != 0) {
+        flb_ml_destroy(ml);
+        flb_config_exit(config);
+        return;
+    }
src/multiline/flb_ml.c (8)
27-27: LGTM: needed headers added.

flb_utils.h for size parsing and <stdint.h> for fixed-width types.

Also applies to: 34-34

218-273: Good: propagate truncation status from rule processing.

Capturing FLB_MULTILINE_TRUNCATED and returning it preserves signal to callers.

Also applies to: 348-351

421-424: LGTM: return raw status for text path.

Allows callers to observe truncation vs OK.

671-693: LGTM: status propagation and fallback single-line packaging.

Correctly retains TRUNCATED and flushes pending on non-match before emitting standalone message with limit enforcement.

Also applies to: 697-722, 724-755

766-786: LGTM: object path mirrors text semantics and preserves status.

Validates map type, propagates TRUNCATED, and performs proper flush-before-standalone behavior.

Also applies to: 789-799, 803-827, 830-856

872-894: LGTM: buffer_limit config parsing honors 0=false (unlimited) and -1=default.

Matches the intended semantics from size parser.

1439-1445: LGTM: emits multiline_truncated metadata flag.

Good observability addition; pairs with group->truncated.

1347-1385: Guard against NULL key_content during repack to avoid crash.

flb_sds_len(parser_i->key_content) is called unconditionally; if key_content is unset, this derefs NULL.

Apply:
-        if (flb_sds_len(group->buf) > 0) {
-            /* Take the first line keys and repack */
-            len = flb_sds_len(parser_i->key_content);
+        if (flb_sds_len(group->buf) > 0) {
+            /* Take the first line keys and repack */
+            int key_len = parser_i->key_content ? flb_sds_len(parser_i->key_content) : 0;
             size = map.via.map.size;
             msgpack_pack_map(&mp_pck, size);
@@
-                if (k.type == MSGPACK_OBJECT_STR &&
-                    parser_i->key_content &&
-                    k.via.str.size == len &&
-                    strncmp(k.via.str.ptr, parser_i->key_content, len) == 0) {
+                if (parser_i->key_content &&
+                    k.type == MSGPACK_OBJECT_STR &&
+                    k.via.str.size == key_len &&
+                    strncmp(k.via.str.ptr, parser_i->key_content, key_len) == 0) {

src/multiline/flb_ml.c

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

src/multiline/flb_ml.c (3)

291-307: Enforce buffer_limit in ENDSWITH/EQ paths (stop bypassing flb_ml_group_cat).

Direct flb_sds_cat_safe writes skip the new buffer limit and won’t set group->truncated. Route concatenations through flb_ml_group_cat and propagate TRUNCATED status.

Apply:

@@
-            /* Concatenate value */
-            if (val_content) {
-                flb_sds_cat_safe(&stream_group->buf,
-                                 val_content->via.str.ptr,
-                                 val_content->via.str.size);
-            }
-            else {
-                flb_sds_cat_safe(&stream_group->buf, buf_data, buf_size);
-            }
+            /* Concatenate value honoring buffer limit */
+            if (val_content) {
+                ret = flb_ml_group_cat(stream_group,
+                                       val_content->via.str.ptr,
+                                       val_content->via.str.size);
+            }
+            else {
+                ret = flb_ml_group_cat(stream_group, buf_data, buf_size);
+            }
+            if (ret == FLB_MULTILINE_TRUNCATED) {
+                truncated = FLB_TRUE;
+            }
@@
-        /* Concatenate value */
-        if (val_content) {
-            flb_sds_cat_safe(&stream_group->buf,
-                             val_content->via.str.ptr,
-                             val_content->via.str.size);
-        }
-        else {
-            flb_sds_cat_safe(&stream_group->buf, buf_data, buf_size);
-        }
+        /* Concatenate value honoring buffer limit */
+        if (val_content) {
+            ret = flb_ml_group_cat(stream_group,
+                                   val_content->via.str.ptr,
+                                   val_content->via.str.size);
+        }
+        else {
+            ret = flb_ml_group_cat(stream_group, buf_data, buf_size);
+        }
+        if (ret == FLB_MULTILINE_TRUNCATED) {
+            truncated = FLB_TRUE;
+        }

Also applies to: 327-341

191-199: Append newline via flb_ml_group_cat to respect buffer_limit.

This can overflow the cap by one byte today.

Apply:

-    if (stream_group->buf[len - 1] != '\n') {
-        flb_sds_cat_safe(&stream_group->buf, "\n", 1);
-    }
+    if (stream_group->buf[len - 1] != '\n') {
+        (void) flb_ml_group_cat(stream_group, "\n", 1);
+    }

552-579: Avoid NULL deref when key_content is unset in map path.

flb_sds_len(parser->key_content) runs without checking key_content.

Apply:

     if (parser->ml_parser->parser) {
-        /* lookup key_content */
-
-        len = flb_sds_len(parser->key_content);
+        /* lookup key_content */
+        if (parser->key_content == NULL) {
+            /* No content key to parse; fall back to raw map handling */
+            *out_buf = buf;
+            *out_size = size;
+            return 0;
+        }
+
+        len = flb_sds_len(parser->key_content);

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 51b3f41 and a185802.

📒 Files selected for processing (1)

src/multiline/flb_ml.c (13 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/multiline/flb_ml.c (3)

src/multiline/flb_ml_stream.c (2)

flb_ml_stream_get (278-292)

flb_ml_stream_group_get (91-135)

src/multiline/flb_ml_group.c (1)

flb_ml_group_cat (89-122)

src/flb_utils.c (1)

flb_utils_size_to_binary_bytes (610-694)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)

GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
GitHub Check: pr-compile-centos-7
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
GitHub Check: PR - fuzzing test

🔇 Additional comments (6)

src/multiline/flb_ml.c (6)
886-894: 0 = unlimited semantics implemented correctly.

Treating -1 as invalid/unset and 0 as “no limit” is correct.

609-615: Good: sub-parser failure no longer aborts text processing.

Keeps multiline logic robust; lets rules decide on raw text.

688-717: Good: propagate TRUNCATED and flush-on-no-match in append_text.

Status handling and LRU updates look correct; fallback path is clear.

Also applies to: 724-755

793-825: Good: mirror truncation-aware flow for append_object.

Consistent status propagation, LRU update, and fallback flush.

Also applies to: 830-856

1439-1445: Good: emit multiline_truncated metadata and reset state.

Metadata emission and post-flush resets prevent leakage into next records.

Also applies to: 1485-1487

1347-1379: Guard parser_i->key_content to avoid NULL deref during repack.

len = flb_sds_len(parser_i->key_content) can crash when key_content is unset.

Apply:
-        if (flb_sds_len(group->buf) > 0) {
-            /* Take the first line keys and repack */
-            len = flb_sds_len(parser_i->key_content);
+        if (flb_sds_len(group->buf) > 0) {
+            /* Take the first line keys and repack */
+            if (parser_i->key_content) {
+                len = flb_sds_len(parser_i->key_content);
+            }
+            else {
+                len = 0; /* skip replacement below */
+            }
@@
-                if (k.type == MSGPACK_OBJECT_STR &&
-                    parser_i->key_content &&
+                if (k.type == MSGPACK_OBJECT_STR &&
+                    parser_i->key_content &&

coderabbitai · 2025-09-22T04:58:35Z

src/multiline/flb_ml.c

    ml->config = ctx;
+    limit = flb_utils_size_to_binary_bytes(ml->config->multiline_buffer_limit);
+    if (limit >= 0) {
+        ml->buffer_limit = (size_t) limit;
+    }
+    else {
+        ml->buffer_limit = FLB_ML_BUFFER_LIMIT_DEFAULT;
+    }
    ml->last_flush = time_ms_now();


⚠️ Potential issue

Add overflow guard when casting limit to size_t.

Very large values can wrap on 32‑bit or constrained platforms.

Apply:

- limit = flb_utils_size_to_binary_bytes(ml->config->multiline_buffer_limit); - if (limit >= 0) { - ml->buffer_limit = (size_t) limit; - } - else { - ml->buffer_limit = FLB_ML_BUFFER_LIMIT_DEFAULT; - } + limit = flb_utils_size_to_binary_bytes(ml->config->multiline_buffer_limit); + if (limit == -1) { + ml->buffer_limit = FLB_ML_BUFFER_LIMIT_DEFAULT; + } + else if (limit > 0 && (uint64_t) limit > (uint64_t) SIZE_MAX) { + flb_warn("[multiline] buffer limit too large (%" PRId64 "), capping to %zu", + limit, (size_t) SIZE_MAX); + ml->buffer_limit = SIZE_MAX; + } + else { + ml->buffer_limit = (size_t) limit; /* 0 => unlimited */ + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

ml->config = ctx;

limit = flb_utils_size_to_binary_bytes(ml->config->multiline_buffer_limit);

if (limit >= 0) {

ml->buffer_limit = (size_t) limit;

}

else {

ml->buffer_limit = FLB_ML_BUFFER_LIMIT_DEFAULT;

}

ml->last_flush = time_ms_now();

ml->config = ctx;

limit = flb_utils_size_to_binary_bytes(ml->config->multiline_buffer_limit);

if (limit == -1) {

ml->buffer_limit = FLB_ML_BUFFER_LIMIT_DEFAULT;

}

else if (limit > 0 && (uint64_t) limit > (uint64_t) SIZE_MAX) {

flb_warn("[multiline] buffer limit too large (%" PRId64 "), capping to %zu",

limit, (size_t) SIZE_MAX);

ml->buffer_limit = SIZE_MAX;

}

else {

ml->buffer_limit = (size_t) limit; /* 0 => unlimited */

}

ml->last_flush = time_ms_now();

🤖 Prompt for AI Agents

In src/multiline/flb_ml.c around lines 886 to 894, the code casts a signed long long limit to size_t without checking for overflow which can wrap on 32-bit/constrained platforms; update it to validate the limit before casting: after computing limit, check that limit is >= 0 and that (unsigned long long)limit <= (unsigned long long)SIZE_MAX (or compare against (long long)SIZE_MAX if appropriate), and only then assign ml->buffer_limit = (size_t) limit; otherwise set ml->buffer_limit = FLB_ML_BUFFER_LIMIT_DEFAULT; keep the existing else branch and ensure you include <limits.h> if not already available.

This is corresponding to fluent/fluent-bit#10653. Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

cosmo0920 requested review from edsiper, leonardo-albertovich, fujimotos and koleini as code owners July 28, 2025 04:09

github-actions bot added the docs-required label Jul 28, 2025

cosmo0920 temporarily deployed to pr July 28, 2025 04:09 — with GitHub Actions Inactive

cosmo0920 temporarily deployed to pr July 28, 2025 04:27 — with GitHub Actions Inactive

cosmo0920 marked this pull request as draft July 29, 2025 12:06

cosmo0920 force-pushed the cosmo0920-add-limit-for-multiline-concatenation branch from 578f46a to 3a8abd9 Compare August 1, 2025 11:28

cosmo0920 temporarily deployed to pr August 1, 2025 11:28 — with GitHub Actions Inactive

cosmo0920 temporarily deployed to pr August 1, 2025 11:44 — with GitHub Actions Inactive

cosmo0920 temporarily deployed to pr August 1, 2025 11:45 — with GitHub Actions Inactive

cosmo0920 force-pushed the cosmo0920-add-limit-for-multiline-concatenation branch from 3a8abd9 to dec20f1 Compare August 1, 2025 17:37

cosmo0920 temporarily deployed to pr August 1, 2025 17:37 — with GitHub Actions Inactive

cosmo0920 temporarily deployed to pr August 1, 2025 17:53 — with GitHub Actions Inactive

cosmo0920 temporarily deployed to pr August 1, 2025 17:54 — with GitHub Actions Inactive

cosmo0920 marked this pull request as ready for review August 4, 2025 11:39

cosmo0920 added this to the Fluent Bit v4.1 milestone Aug 4, 2025

cosmo0920 commented Aug 4, 2025

View reviewed changes

cosmo0920 temporarily deployed to pr September 18, 2025 07:03 — with GitHub Actions Inactive

coderabbitai bot reviewed Sep 18, 2025

View reviewed changes

cosmo0920 temporarily deployed to pr September 18, 2025 07:26 — with GitHub Actions Inactive

cosmo0920 added 2 commits September 22, 2025 11:23

multiline: ml: Disable multiline truncation with zero limit

1dea0a6

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

ml: tests: Add disabling case for multiline truncation

51b3f41

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

cosmo0920 temporarily deployed to pr September 22, 2025 02:26 — with GitHub Actions Inactive

coderabbitai bot reviewed Sep 22, 2025

View reviewed changes

src/multiline/flb_ml.c Show resolved Hide resolved

cosmo0920 temporarily deployed to pr September 22, 2025 02:45 — with GitHub Actions Inactive

cosmo0920 temporarily deployed to pr September 22, 2025 02:46 — with GitHub Actions Inactive

multiline: ml: Reset length of msgpack unpacking buffer

a185802

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

cosmo0920 temporarily deployed to pr September 22, 2025 04:52 — with GitHub Actions Inactive

cosmo0920 had a problem deploying to pr September 22, 2025 04:52 — with GitHub Actions Failure

coderabbitai bot reviewed Sep 22, 2025

View reviewed changes

cosmo0920 temporarily deployed to pr September 22, 2025 06:08 — with GitHub Actions Inactive

cosmo0920 temporarily deployed to pr September 22, 2025 06:25 — with GitHub Actions Inactive

edsiper merged commit b75f3bc into master Sep 22, 2025
64 of 66 checks passed

edsiper deleted the cosmo0920-add-limit-for-multiline-concatenation branch September 22, 2025 14:59

cosmo0920 added a commit to fluent/fluent-bit-docs that referenced this pull request Oct 7, 2025

multiline: Add a description for the buffer limit of multiline

8dc8d66

This is corresponding to fluent/fluent-bit#10653. Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

cosmo0920 added a commit to fluent/fluent-bit-docs that referenced this pull request Oct 7, 2025

multiline: Add a description for the buffer limit of multiline

a22fd3a

This is corresponding to fluent/fluent-bit#10653. Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

cosmo0920 mentioned this pull request Oct 7, 2025

multiline: Add a description for the buffer limit of multiline fluent/fluent-bit-docs#2095

Open

cosmo0920 added a commit to fluent/fluent-bit-docs that referenced this pull request Oct 7, 2025

multiline: Add a description for the buffer limit of multiline

e48eb29

This is corresponding to fluent/fluent-bit#10653. Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

config: multiline: in_tail: filter_multiline: Add configurable buffer limit for multiline interface #10653

config: multiline: in_tail: filter_multiline: Add configurable buffer limit for multiline interface #10653

Uh oh!

cosmo0920 commented Jul 28, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Aug 1, 2025 •

edited

Loading

Uh oh!

cosmo0920 Aug 4, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Sep 22, 2025

Uh oh!

Uh oh!

Uh oh!

config: multiline: in_tail: filter_multiline: Add configurable buffer limit for multiline interface #10653

config: multiline: in_tail: filter_multiline: Add configurable buffer limit for multiline interface #10653

Uh oh!

Conversation

cosmo0920 commented Jul 28, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Aug 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

cosmo0920 Aug 4, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 22, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cosmo0920 commented Jul 28, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Aug 1, 2025 •

edited

Loading