From 6613d57e8eb885d65903819a2f83919a164a3680 Mon Sep 17 00:00:00 2001
From: nick evans
Date: Wed, 8 Jan 2025 22:06:47 -0500
Subject: [PATCH] =?UTF-8?q?=F0=9F=94=92=20Limit=20exponential=20memory=20u?=
 =?UTF-8?q?sage=20to=20parse=20uid-set?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The UID sets in UIDPlusData are stored as arrays of UIDs.  In common
scenarios, copying between one and a few hundred emails at a time, this
is barely noticeable.  But the memory use grows exponentially relative
to the length of the uid-set in the server's response.

This should not be an issue for _trusted_ servers, and (I assume)
compromised servers will be more interested in evading detection and
stealing your credentials and your email than in causing a client
denial of service.  Nevertheless, this is a very simple DoS attack
against clients connecting to untrusted servers (for example, a service
that connects to user-specified servers).

For example, assuming a 64-bit architecture, considering only the data
in the two arrays, assuming the arrays' internal capacity is no more
than needed, and ignoring the fixed cost of the response structs:

* 32 bytes expands to ~160KB (about 5000 times more):
  `"* OK [COPYUID 1 1:9999 1:9999]\r\n"`
* 40 bytes expands to ~1.6GB (about 40 million times more):
  `"* OK [COPYUID 1 1:99999999 1:99999999]\r\n"`
* In the worst case (uint32 max), 44 bytes expands to 64GiB in memory,
  using over 1.5 billion times more space to store than to send:
  `"* OK [COPYUID 1 1:4294967295 1:4294967295]\r\n"`

----

The preferred fix is to store `uid-set` as a SequenceSet, not an array.
Unfortunately, that is not fully backwards compatible.  For v0.4 and
v0.5, set `Config#parser_use_deprecated_uidplus_data` to false to use
AppendUIDData and CopyUIDData instead of UIDPlusData.  Unless you are
_using_ UIDPLUS, this is completely safe.  v0.6 will drop UIDPlusData.

----

The simplest _partial_ fix (preserving full backward compatibility) is
to raise an error when the number of UIDs goes over some threshold,
while continuing to use arrays inside UIDPlusData.  For v0.3.x (and in
this commit) the maximum count is hard-coded to 10,000.  This is high
enough that it should almost never be triggered by normal usage, and
low enough to make the problem far less extreme.

For v0.4 and v0.5, the next commit will make the maximum array size
configurable, with a much lower default: 1000 for v0.4 and 100 for
v0.5.  These are low enough that they are _unlikely_ to cause a
problem, and v0.4 and v0.5 can also use the newer AppendUIDData and
CopyUIDData classes.

However, because unhandled responses are stored on the `#responses`
hash, this can still be a problem.  A malicious server could repeatedly
consume ~160KB of client memory by sending only 32 bytes in a loop.  To
fully solve this, a response handler must be added to prune excessive
APPENDUID/COPYUID responses as they are received.  Because unhandled
responses have always been retained, managing unhandled responses is
already documented as necessary for long-lived connections.
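----

As a rough sketch (not part of this patch), a client that talks to
untrusted servers could combine both mitigations.  This assumes a
net-imap version that provides `Config#parser_use_deprecated_uidplus_data`,
`Net::IMAP#add_response_handler`, and `Net::IMAP#clear_responses`
(v0.4/v0.5 per above); the hostname is a placeholder, and whether
pruning inside a response handler is appropriate depends on the client:

    # Prefer the non-deprecated UIDPLUS classes, which keep the uid-set
    # as a SequenceSet instead of expanding every UID into an array.
    Net::IMAP.config.parser_use_deprecated_uidplus_data = false

    imap = Net::IMAP.new("imap.example.com", ssl: true)

    # Discard unhandled COPYUID/APPENDUID response codes as they arrive,
    # so a malicious server can't grow the #responses hash in a loop.
    imap.add_response_handler do |resp|
      code = resp.data.code if resp.data.respond_to?(:code)
      if code && %w[APPENDUID COPYUID].include?(code.name)
        imap.clear_responses(code.name)
      end
    end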
---
 lib/net/imap/response_parser.rb            | 15 ++++++++++++---
 test/net/imap/test_imap_response_parser.rb | 10 ++++++++++
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/lib/net/imap/response_parser.rb b/lib/net/imap/response_parser.rb
index e9775c00..04e2ee2b 100644
--- a/lib/net/imap/response_parser.rb
+++ b/lib/net/imap/response_parser.rb
@@ -8,6 +8,8 @@ class IMAP < Protocol
 
     # Parses an \IMAP server response.
     class ResponseParser
+      MAX_UID_SET_SIZE = 10_000
+
       include ParserUtils
       extend  ParserUtils::Generator
 
@@ -1889,9 +1891,16 @@ def CopyUID(...) DeprecatedUIDPlus(...) || CopyUIDData.new(...) end
       # TODO: remove this code in the v0.6.0 release
       def DeprecatedUIDPlus(validity, src_uids = nil, dst_uids)
         return unless config.parser_use_deprecated_uidplus_data
-        src_uids &&= src_uids.each_ordered_number.to_a
-        dst_uids = dst_uids.each_ordered_number.to_a
-        UIDPlusData.new(validity, src_uids, dst_uids)
+        compact_uid_sets = [src_uids, dst_uids].compact
+        count = compact_uid_sets.map { _1.count_with_duplicates }.max
+        max = MAX_UID_SET_SIZE
+        if count <= max
+          src_uids &&= src_uids.each_ordered_number.to_a
+          dst_uids = dst_uids.each_ordered_number.to_a
+          UIDPlusData.new(validity, src_uids, dst_uids)
+        else
+          parse_error("uid-set is too large: %d > %d", count, max)
+        end
       end
 
       ADDRESS_REGEXP = /\G
diff --git a/test/net/imap/test_imap_response_parser.rb b/test/net/imap/test_imap_response_parser.rb
index e85b0b4e..695f21b9 100644
--- a/test/net/imap/test_imap_response_parser.rb
+++ b/test/net/imap/test_imap_response_parser.rb
@@ -206,6 +206,11 @@ def test_fetch_binary_and_binary_size
     parser = Net::IMAP::ResponseParser.new(config: {
       parser_use_deprecated_uidplus_data: true,
     })
+    assert_raise_with_message Net::IMAP::ResponseParseError, /uid-set is too large/ do
+      parser.parse(
+        "A004 OK [APPENDUID 1 10000:20000,1] Done\r\n"
+      )
+    end
     response = parser.parse("A004 OK [APPENDUID 1 101:200] Done\r\n")
     uidplus = response.data.code.data
     assert_instance_of Net::IMAP::UIDPlusData, uidplus
@@ -254,6 +259,11 @@ def test_fetch_binary_and_binary_size
     parser = Net::IMAP::ResponseParser.new(config: {
       parser_use_deprecated_uidplus_data: true,
     })
+    assert_raise_with_message Net::IMAP::ResponseParseError, /uid-set is too large/ do
+      parser.parse(
+        "A004 OK [copyUID 1 10000:20000,1 1:10001] Done\r\n"
+      )
+    end
     response = parser.parse("A004 OK [copyUID 1 101:200 1:100] Done\r\n")
     uidplus = response.data.code.data
     assert_instance_of Net::IMAP::UIDPlusData, uidplus