Skip to content

Conversation

cosmo0920
Copy link
Contributor

@cosmo0920 cosmo0920 commented Jul 28, 2025

We added an interface for configurable buffer limit for multiline.
Also, we implemented robust processing for multiline concatenations.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Configurable multiline buffer size limit (accepts KiB/MiB/GiB binary-size strings).
    • New per-input and per-filter truncated multiline metrics.
  • Behavior Changes / Bug Fixes

    • Multiline truncation is now signaled, added to event metadata, and emits warnings when limits are reached.
  • Tests

    • Added tests for buffer-limit truncation and binary-size parsing.

Copy link

coderabbitai bot commented Aug 1, 2025

Walkthrough

Adds a configurable multiline buffer limit and binary-size parser; enforces per-group buffer limits with truncation tracking and metadata; propagates FLB_MULTILINE_TRUNCATED through multiline core and plugins; exposes truncation metrics; introduces a params-based multiline parser API and related tests.

Changes

Cohort / File(s) Change Summary
Config field
include/fluent-bit/flb_config.h, src/flb_config.c
Added FLB_CONF_STR_MULTILINE_BUFFER_LIMIT and char *multiline_buffer_limit; registered the config entry and manage its lifetime.
Multiline core types & defaults
include/fluent-bit/multiline/flb_ml.h, src/multiline/flb_ml.h
Added buffer-limit defaults (FLB_ML_BUFFER_LIMIT_DEFAULT_STR, FLB_ML_BUFFER_LIMIT_DEFAULT), status codes (FLB_MULTILINE_OK, FLB_MULTILINE_PROCESSED, FLB_MULTILINE_TRUNCATED), flb_ml.buffer_limit, and flb_ml_stream_group.truncated + stream.
Group API & impl
include/fluent-bit/multiline/flb_ml_group.h, src/multiline/flb_ml_group.c
Added int flb_ml_group_cat(struct flb_ml_stream_group *group, const char *data, size_t len) to append with buffer-limit enforcement and return OK/TRUNCATED status.
Multiline processing & rules
src/multiline/flb_ml.c, src/multiline/flb_ml_rule.c, src/multiline/flb_ml_stream.c, src/multiline/flb_ml_group.c
Propagated truncation status through append/processing and flush paths; replaced direct buffer writes with flb_ml_group_cat(); set/clear group truncated flag; add multiline_truncated metadata on flush; initialize ml->buffer_limit from config.
Multiline parser API (params)
include/fluent-bit/multiline/flb_ml_parser.h, src/multiline/flb_ml_parser.c
Added struct flb_ml_parser_params, flb_ml_parser_params_default(), and flb_ml_parser_create_params(); legacy flb_ml_parser_create() now delegates to params API.
Filter plugin metrics
plugins/filter_multiline/ml.h, plugins/filter_multiline/ml.c
Added truncated metric ID and counter (FLB_MULTILINE_METRIC_TRUNCATED, cmt_truncated); register and increment truncated metric alongside emitted metric.
Tail plugin metrics & instrumentation
plugins/in_tail/tail_config.h, plugins/in_tail/tail_config.c, plugins/in_tail/tail_file.c
Added FLB_TAIL_METRIC_M_TRUNCATED and struct cmt_counter *cmt_multiline_truncated; register legacy metric; emit warning and increment per-input truncated metric on truncation.
Size parsing utility & tests
include/fluent-bit/flb_utils.h, src/flb_utils.c, tests/internal/utils.c
Added flb_utils_size_to_binary_bytes() (KiB/MiB/GiB, 1024-based parsing with overflow guards) and unit tests covering binary-size parsing.
Slist robustness & config map cleanup
src/flb_slist.c, src/flb_config_map.c
Added error handling on slist token additions (propagate failures and free temporary SDS) and ensured slist cleanup on parse_string_map_to_list error.
Tests - multiline truncation
tests/internal/multiline.c
Added buffer_limit_truncation and buffer_limit_disabled tests and adjusted fixtures/flush callback guard.

Sequence Diagram(s)

sequenceDiagram
    participant Config
    participant Tail as TailInput
    participant ML as MultilineCore
    participant Group as StreamGroup
    participant Filter
    participant Metrics

    Config->>ML: init buffer_limit (string -> bytes)
    Tail->>ML: flb_ml_append_text/append_object(data)
    ML->>Group: flb_ml_group_cat(data, len)
    alt appended fully
        Group-->>ML: FLB_MULTILINE_OK
        ML->>Filter: emit/process event
        Filter->>Metrics: inc emitted metric
    else truncated or partially appended
        Group-->>ML: FLB_MULTILINE_TRUNCATED
        ML->>Tail: log warning
        ML->>Metrics: inc truncated metric
        Note right of ML #aqua: mark group->truncated and attach metadata on flush
    end
    ML->>Group: flush -> include "multiline_truncated": true if set
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • leonardo-albertovich
  • edsiper
  • koleini
  • fujimotos

Poem

I nibble bytes beneath the moon,
I stitch long lines and trim too soon.
When buffers brim and rules collide,
I flag the cut and hop aside.
Metrics hum — the pipeline sighs. 🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly and accurately captures the primary change — adding a configurable buffer limit for multiline processing — and names the relevant components (config, in_tail, filter_multiline), so it relates directly to the changes described in the PR summary and objectives; it is specific and understandable to a reviewer scanning history.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch cosmo0920-add-limit-for-multiline-concatenation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

/* Return codes */
#define FLB_MULTILINE_OK 0
#define FLB_MULTILINE_PROCESSED 1 /* Reserved */
#define FLB_MULTILINE_TRUNCATED 2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to status code 2 is needed because status code 1 will be collided for FLB_TRUE status.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/flb_config_map.c (1)

662-666: Bug: list-type range check uses || (always true) — can mis-handle non-list types

These conditions should use && to bound the range; with || they evaluate true for almost all types, risking wrong assignments (e.g., VARIANT) and hard-to-trace bugs.

Apply this diff:

-        else if (m->type >= FLB_CONFIG_MAP_CLIST ||
-                 m->type <= FLB_CONFIG_MAP_SLIST_4) {
+        else if (m->type >= FLB_CONFIG_MAP_CLIST &&
+                 m->type <= FLB_CONFIG_MAP_SLIST_4) {
           m_list = (struct mk_list **) (base + m->offset);
           *m_list = m->value.val.list;
         }
@@
-            else if (m->type >= FLB_CONFIG_MAP_CLIST ||
-                     m->type <= FLB_CONFIG_MAP_SLIST_4) {
+            else if (m->type >= FLB_CONFIG_MAP_CLIST &&
+                     m->type <= FLB_CONFIG_MAP_SLIST_4) {
 
                 list = parse_string_map_to_list(m, kv->val);
@@
-            else if (m->type >= FLB_CONFIG_MAP_CLIST ||
-                     m->type <= FLB_CONFIG_MAP_SLIST_4) {
+            else if (m->type >= FLB_CONFIG_MAP_CLIST &&
+                     m->type <= FLB_CONFIG_MAP_SLIST_4) {
                 list = parse_string_map_to_list(m, kv->val);

Also applies to: 740-742, 814-816

🧹 Nitpick comments (1)
src/flb_config_map.c (1)

662-666: Optional: centralize list-type check to avoid regressions

Create a tiny helper (static inline) like is_list_type(int type) and replace repeated range checks. This reduces future chances of reintroducing ||.

Example helper:

static inline int is_list_type(int type)
{
    return (type >= FLB_CONFIG_MAP_CLIST && type <= FLB_CONFIG_MAP_SLIST_4);
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36dbee5 and 43bfd9d.

📒 Files selected for processing (1)
  • src/flb_config_map.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/flb_config_map.c (1)
src/flb_slist.c (1)
  • flb_slist_destroy (327-339)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (1)
src/flb_config_map.c (1)

98-103: Good catch: free slist entries on error path

Calling flb_slist_destroy(list) before freeing the list head prevents leaks when split functions partially append entries. Looks correct and consistent with destroy_map_val.

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/multiline/flb_ml.c (3)

293-303: Enforce buffer_limit for ENDSWITH/EQ (currently bypassed).

These paths concatenate with flb_sds_cat_safe, ignoring ml->buffer_limit. This defeats the new safety cap and can grow unbounded.

Apply:

@@ /* ENDSWITH concatenation */
-            /* Concatenate value */
-            if (val_content) {
-                flb_sds_cat_safe(&stream_group->buf,
-                                 val_content->via.str.ptr,
-                                 val_content->via.str.size);
-            }
-            else {
-                flb_sds_cat_safe(&stream_group->buf, buf_data, buf_size);
-            }
+            /* Concatenate value with limit enforcement */
+            {
+                int rc;
+                if (val_content) {
+                    rc = flb_ml_group_cat(stream_group,
+                                          val_content->via.str.ptr,
+                                          val_content->via.str.size);
+                }
+                else {
+                    rc = flb_ml_group_cat(stream_group, buf_data, buf_size);
+                }
+                if (rc == FLB_MULTILINE_TRUNCATED) {
+                    truncated = FLB_TRUE;
+                }
+            }
@@ /* EQ concatenation */
-        /* Concatenate value */
-        if (val_content) {
-            flb_sds_cat_safe(&stream_group->buf,
-                             val_content->via.str.ptr,
-                             val_content->via.str.size);
-        }
-        else {
-            flb_sds_cat_safe(&stream_group->buf, buf_data, buf_size);
-        }
+        /* Concatenate value with limit enforcement */
+        {
+            int rc;
+            if (val_content) {
+                rc = flb_ml_group_cat(stream_group,
+                                      val_content->via.str.ptr,
+                                      val_content->via.str.size);
+            }
+            else {
+                rc = flb_ml_group_cat(stream_group, buf_data, buf_size);
+            }
+            if (rc == FLB_MULTILINE_TRUNCATED) {
+                truncated = FLB_TRUE;
+            }
+        }

Also applies to: 327-336


641-647: Initialize ret on stream lookup failure to avoid undefined return.

If mst is NULL, we goto exit; without setting ret. Ensure -1 is returned.

Apply:

-    if (!mst) {
+    if (!mst) {
         flb_error("[multiline] invalid stream_id %" PRIu64 ", could not "
                   "append content to multiline context", stream_id);
-        goto exit;
+        ret = -1;
+        goto exit;
     }
@@
-    return ret;
+    return ret;

Also applies to: 662-662


552-579: Avoid NULL deref on key_content in map pre-parse path.

len = flb_sds_len(parser->key_content); is executed before checking parser->key_content. Compute length only when non-NULL and use a separate key_len.

Apply:

@@
-        /* lookup key_content */
-
-        len = flb_sds_len(parser->key_content);
+        /* lookup key_content */
+        int key_len = 0;
+        if (parser->key_content) {
+            key_len = flb_sds_len(parser->key_content);
+        }
         map_size = map->via.map.size;
         for(i = 0; i < map_size; i++) {
@@
-            if (key.type == MSGPACK_OBJECT_STR &&
-                parser->key_content &&
-                key.via.str.size == len &&
-                strncmp(key.via.str.ptr, parser->key_content, len) == 0) {
+            if (parser->key_content &&
+                key.type == MSGPACK_OBJECT_STR &&
+                key.via.str.size == key_len &&
+                strncmp(key.via.str.ptr, parser->key_content, key_len) == 0) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43bfd9d and 51b3f41.

📒 Files selected for processing (2)
  • src/multiline/flb_ml.c (13 hunks)
  • tests/internal/multiline.c (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
PR: fluent/fluent-bit#10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • src/multiline/flb_ml.c
🧬 Code graph analysis (2)
tests/internal/multiline.c (4)
src/flb_config.c (2)
  • flb_config_init (216-427)
  • flb_config_exit (429-606)
src/multiline/flb_ml.c (3)
  • flb_ml_create (869-921)
  • flb_ml_append_text (665-755)
  • flb_ml_destroy (982-1007)
src/multiline/flb_ml_parser.c (3)
  • flb_ml_parser_params_default (32-44)
  • flb_ml_parser_create_params (47-129)
  • flb_ml_parser_instance_create (261-312)
src/multiline/flb_ml_stream.c (1)
  • flb_ml_stream_create (223-276)
src/multiline/flb_ml.c (3)
src/multiline/flb_ml_stream.c (2)
  • flb_ml_stream_get (278-292)
  • flb_ml_stream_group_get (91-135)
src/multiline/flb_ml_group.c (1)
  • flb_ml_group_cat (89-122)
src/flb_utils.c (1)
  • flb_utils_size_to_binary_bytes (610-694)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-centos-7
🔇 Additional comments (13)
tests/internal/multiline.c (5)

114-115: Confirm expected ordering/merging in container_mix_output.

The added outputs "bbccdd-out\n" and "dd-err\n" change the merge/order semantics. Please confirm this matches the new truncation-aware pipeline behavior and is not a regression in per-stream isolation.


395-397: Good: safe early-return when no expected-result context.

Prevents dereferencing NULL in tests that intentionally pass NULL to the flush callback.


1537-1559: LGTM: disabled buffer limit path is correct.

"false" → 0 and asserting ml->buffer_limit == 0 aligns with unlimited behavior.


1571-1572: LGTM: tests registered.

New tests are added to TEST_LIST.


1463-1535: Fix test to actually cap concatenated 'log' content; remove unused var.

As written, the parser doesn't pre-parse JSON, so you're concatenating raw JSON lines, not the "log" field. The assertion may pass by accident and won't catch regressions. Also, p is declared but unused.

Update to use the docker parser for JSON pre-parse and lower the limit so concatenated log content exceeds it; skip gracefully if docker parser is unavailable.

Apply:

@@
 static void test_buffer_limit_truncation()
 {
     int ret;
     uint64_t stream_id;
     struct flb_config *config;
     struct flb_ml *ml;
     struct flb_ml_parser *mlp;
     struct flb_ml_parser_ins *mlp_i;
-    struct flb_parser *p;
+    struct flb_parser *p;
     struct flb_time tm;
@@
-    char *line1 = "{\"log\": \"12345678901234567890\", \"stream\": \"stdout\"}";
-    char *line2 = "{\"log\": \"abcdefghijklmnopqrstuvwxyz\", \"stream\": \"stdout\"}";
+    char *line1 = "{\"log\": \"12345678901234567890\", \"stream\": \"stdout\"}";
+    char *line2 = "{\"log\": \"abcdefghijklmnopqrstuvwxyz\", \"stream\": \"stdout\"}";
@@
     config = flb_config_init();
-    /* The buffer limit is for the concatenated 'log' content, not the full JSON */
+    /* The buffer limit is for the concatenated 'log' content, not the full JSON */
+    /* Guard: require single-line docker parser availability for JSON pre-parse */
+    p = flb_parser_get("docker", config);
+    TEST_CHECK(p != NULL);
+    if (!p) {
+        flb_config_exit(config);
+        return;
+    }
@@
-    config->multiline_buffer_limit = flb_strdup("80");
+    /* 40 < length("12345678901234567890" + "abcdefghijklmnopqrstuvwxyz") = 46 */
+    config->multiline_buffer_limit = flb_strdup("40");
@@
-    /* --- New params-based initializer --- */
+    /* --- New params-based initializer --- */
     struct flb_ml_parser_params params = flb_ml_parser_params_default("test-concat");
     params.type        = FLB_ML_REGEX;
     params.negate      = FLB_FALSE;
     params.flush_ms    = 1000;
     params.key_content = "log";
-    params.parser_ctx  = NULL;
-    params.parser_name = NULL;
+    params.parser_ctx  = NULL;
+    params.parser_name = "docker";  /* pre-parse JSON and extract key_content */
@@
     mlp = flb_ml_parser_create_params(config, &params);
     TEST_CHECK(mlp != NULL);
+    /* Link params.parser_name -> parser ctx */
+    ret = flb_ml_parsers_init(config);
+    TEST_CHECK(ret == 0);
+    if (ret != 0) {
+        flb_ml_destroy(ml);
+        flb_config_exit(config);
+        return;
+    }
src/multiline/flb_ml.c (8)

27-27: LGTM: needed headers added.

flb_utils.h for size parsing and <stdint.h> for fixed-width types.

Also applies to: 34-34


218-273: Good: propagate truncation status from rule processing.

Capturing FLB_MULTILINE_TRUNCATED and returning it preserves signal to callers.

Also applies to: 348-351


421-424: LGTM: return raw status for text path.

Allows callers to observe truncation vs OK.


671-693: LGTM: status propagation and fallback single-line packaging.

Correctly retains TRUNCATED and flushes pending on non-match before emitting standalone message with limit enforcement.

Also applies to: 697-722, 724-755


766-786: LGTM: object path mirrors text semantics and preserves status.

Validates map type, propagates TRUNCATED, and performs proper flush-before-standalone behavior.

Also applies to: 789-799, 803-827, 830-856


872-894: LGTM: buffer_limit config parsing honors 0=false (unlimited) and -1=default.

Matches the intended semantics from size parser.


1439-1445: LGTM: emits multiline_truncated metadata flag.

Good observability addition; pairs with group->truncated.


1347-1385: Guard against NULL key_content during repack to avoid crash.

flb_sds_len(parser_i->key_content) is called unconditionally; if key_content is unset, this derefs NULL.

Apply:

-        if (flb_sds_len(group->buf) > 0) {
-            /* Take the first line keys and repack */
-            len = flb_sds_len(parser_i->key_content);
+        if (flb_sds_len(group->buf) > 0) {
+            /* Take the first line keys and repack */
+            int key_len = parser_i->key_content ? flb_sds_len(parser_i->key_content) : 0;
             size = map.via.map.size;
             msgpack_pack_map(&mp_pck, size);
@@
-                if (k.type == MSGPACK_OBJECT_STR &&
-                    parser_i->key_content &&
-                    k.via.str.size == len &&
-                    strncmp(k.via.str.ptr, parser_i->key_content, len) == 0) {
+                if (parser_i->key_content &&
+                    k.type == MSGPACK_OBJECT_STR &&
+                    k.via.str.size == key_len &&
+                    strncmp(k.via.str.ptr, parser_i->key_content, key_len) == 0) {

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/multiline/flb_ml.c (3)

291-307: Enforce buffer_limit in ENDSWITH/EQ paths (stop bypassing flb_ml_group_cat).

Direct flb_sds_cat_safe writes skip the new buffer limit and won’t set group->truncated. Route concatenations through flb_ml_group_cat and propagate TRUNCATED status.

Apply:

@@
-            /* Concatenate value */
-            if (val_content) {
-                flb_sds_cat_safe(&stream_group->buf,
-                                 val_content->via.str.ptr,
-                                 val_content->via.str.size);
-            }
-            else {
-                flb_sds_cat_safe(&stream_group->buf, buf_data, buf_size);
-            }
+            /* Concatenate value honoring buffer limit */
+            if (val_content) {
+                ret = flb_ml_group_cat(stream_group,
+                                       val_content->via.str.ptr,
+                                       val_content->via.str.size);
+            }
+            else {
+                ret = flb_ml_group_cat(stream_group, buf_data, buf_size);
+            }
+            if (ret == FLB_MULTILINE_TRUNCATED) {
+                truncated = FLB_TRUE;
+            }
@@
-        /* Concatenate value */
-        if (val_content) {
-            flb_sds_cat_safe(&stream_group->buf,
-                             val_content->via.str.ptr,
-                             val_content->via.str.size);
-        }
-        else {
-            flb_sds_cat_safe(&stream_group->buf, buf_data, buf_size);
-        }
+        /* Concatenate value honoring buffer limit */
+        if (val_content) {
+            ret = flb_ml_group_cat(stream_group,
+                                   val_content->via.str.ptr,
+                                   val_content->via.str.size);
+        }
+        else {
+            ret = flb_ml_group_cat(stream_group, buf_data, buf_size);
+        }
+        if (ret == FLB_MULTILINE_TRUNCATED) {
+            truncated = FLB_TRUE;
+        }

Also applies to: 327-341


191-199: Append newline via flb_ml_group_cat to respect buffer_limit.

This can overflow the cap by one byte today.

Apply:

-    if (stream_group->buf[len - 1] != '\n') {
-        flb_sds_cat_safe(&stream_group->buf, "\n", 1);
-    }
+    if (stream_group->buf[len - 1] != '\n') {
+        (void) flb_ml_group_cat(stream_group, "\n", 1);
+    }

552-579: Avoid NULL deref when key_content is unset in map path.

flb_sds_len(parser->key_content) runs without checking key_content.

Apply:

     if (parser->ml_parser->parser) {
-        /* lookup key_content */
-
-        len = flb_sds_len(parser->key_content);
+        /* lookup key_content */
+        if (parser->key_content == NULL) {
+            /* No content key to parse; fall back to raw map handling */
+            *out_buf = buf;
+            *out_size = size;
+            return 0;
+        }
+
+        len = flb_sds_len(parser->key_content);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 51b3f41 and a185802.

📒 Files selected for processing (1)
  • src/multiline/flb_ml.c (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/multiline/flb_ml.c (3)
src/multiline/flb_ml_stream.c (2)
  • flb_ml_stream_get (278-292)
  • flb_ml_stream_group_get (91-135)
src/multiline/flb_ml_group.c (1)
  • flb_ml_group_cat (89-122)
src/flb_utils.c (1)
  • flb_utils_size_to_binary_bytes (610-694)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (6)
src/multiline/flb_ml.c (6)

886-894: 0 = unlimited semantics implemented correctly.

Treating -1 as invalid/unset and 0 as “no limit” is correct.


609-615: Good: sub-parser failure no longer aborts text processing.

Keeps multiline logic robust; lets rules decide on raw text.


688-717: Good: propagate TRUNCATED and flush-on-no-match in append_text.

Status handling and LRU updates look correct; fallback path is clear.

Also applies to: 724-755


793-825: Good: mirror truncation-aware flow for append_object.

Consistent status propagation, LRU update, and fallback flush.

Also applies to: 830-856


1439-1445: Good: emit multiline_truncated metadata and reset state.

Metadata emission and post-flush resets prevent leakage into next records.

Also applies to: 1485-1487


1347-1379: Guard parser_i->key_content to avoid NULL deref during repack.

len = flb_sds_len(parser_i->key_content) can crash when key_content is unset.

Apply:

-        if (flb_sds_len(group->buf) > 0) {
-            /* Take the first line keys and repack */
-            len = flb_sds_len(parser_i->key_content);
+        if (flb_sds_len(group->buf) > 0) {
+            /* Take the first line keys and repack */
+            if (parser_i->key_content) {
+                len = flb_sds_len(parser_i->key_content);
+            }
+            else {
+                len = 0; /* skip replacement below */
+            }
@@
-                if (k.type == MSGPACK_OBJECT_STR &&
-                    parser_i->key_content &&
+                if (k.type == MSGPACK_OBJECT_STR &&
+                    parser_i->key_content &&

Comment on lines 886 to 894
ml->config = ctx;
limit = flb_utils_size_to_binary_bytes(ml->config->multiline_buffer_limit);
if (limit >= 0) {
ml->buffer_limit = (size_t) limit;
}
else {
ml->buffer_limit = FLB_ML_BUFFER_LIMIT_DEFAULT;
}
ml->last_flush = time_ms_now();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add overflow guard when casting limit to size_t.

Very large values can wrap on 32‑bit or constrained platforms.

Apply:

-    limit = flb_utils_size_to_binary_bytes(ml->config->multiline_buffer_limit);
-    if (limit >= 0) {
-        ml->buffer_limit = (size_t) limit;
-    }
-    else {
-        ml->buffer_limit = FLB_ML_BUFFER_LIMIT_DEFAULT;
-    }
+    limit = flb_utils_size_to_binary_bytes(ml->config->multiline_buffer_limit);
+    if (limit == -1) {
+        ml->buffer_limit = FLB_ML_BUFFER_LIMIT_DEFAULT;
+    }
+    else if (limit > 0 && (uint64_t) limit > (uint64_t) SIZE_MAX) {
+        flb_warn("[multiline] buffer limit too large (%" PRId64 "), capping to %zu",
+                 limit, (size_t) SIZE_MAX);
+        ml->buffer_limit = SIZE_MAX;
+    }
+    else {
+        ml->buffer_limit = (size_t) limit; /* 0 => unlimited */
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ml->config = ctx;
limit = flb_utils_size_to_binary_bytes(ml->config->multiline_buffer_limit);
if (limit >= 0) {
ml->buffer_limit = (size_t) limit;
}
else {
ml->buffer_limit = FLB_ML_BUFFER_LIMIT_DEFAULT;
}
ml->last_flush = time_ms_now();
ml->config = ctx;
limit = flb_utils_size_to_binary_bytes(ml->config->multiline_buffer_limit);
if (limit == -1) {
ml->buffer_limit = FLB_ML_BUFFER_LIMIT_DEFAULT;
}
else if (limit > 0 && (uint64_t) limit > (uint64_t) SIZE_MAX) {
flb_warn("[multiline] buffer limit too large (%" PRId64 "), capping to %zu",
limit, (size_t) SIZE_MAX);
ml->buffer_limit = SIZE_MAX;
}
else {
ml->buffer_limit = (size_t) limit; /* 0 => unlimited */
}
ml->last_flush = time_ms_now();
🤖 Prompt for AI Agents
In src/multiline/flb_ml.c around lines 886 to 894, the code casts a signed long
long limit to size_t without checking for overflow which can wrap on
32-bit/constrained platforms; update it to validate the limit before casting:
after computing limit, check that limit is >= 0 and that (unsigned long
long)limit <= (unsigned long long)SIZE_MAX (or compare against (long
long)SIZE_MAX if appropriate), and only then assign ml->buffer_limit = (size_t)
limit; otherwise set ml->buffer_limit = FLB_ML_BUFFER_LIMIT_DEFAULT; keep the
existing else branch and ensure you include <limits.h> if not already available.

@edsiper edsiper merged commit b75f3bc into master Sep 22, 2025
64 of 66 checks passed
@edsiper edsiper deleted the cosmo0920-add-limit-for-multiline-concatenation branch September 22, 2025 14:59
cosmo0920 added a commit to fluent/fluent-bit-docs that referenced this pull request Oct 7, 2025
This is corresponding to
fluent/fluent-bit#10653.

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
cosmo0920 added a commit to fluent/fluent-bit-docs that referenced this pull request Oct 7, 2025
This is corresponding to
fluent/fluent-bit#10653.

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
cosmo0920 added a commit to fluent/fluent-bit-docs that referenced this pull request Oct 7, 2025
This is corresponding to
fluent/fluent-bit#10653.

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants