
Add numeric value size limits via StreamReadConstraints (fixes sonatype-2022-6438) -- default 1000 chars #827

Merged
merged 17 commits
Nov 29, 2022

Conversation

pjfanning
Member

@pjfanning pjfanning commented Oct 22, 2022

  • Introduces a default maximum length of 1000 characters for numbers - this default can be overridden (e.g. set to a much higher limit)
  • we shouldn't allow super large numbers by default (I can justify this in private exchanges but don't want to spell it out in a public forum)
  • checks the String/char[] length before parsing the text as a number
  • this check comes a bit late, but is better than not checking at all - parsing numbers is expensive and slower than O(n)
  • the problem with doing an earlier check (while building the char[]) is that it is not always clear how the TextBuffer is going to be used. Is it being used to get text, or to buffer up the chars needed to parse a number?
    • if someone can point out how we can know in advance whether the TextBuffer is destined to be used for number parsing, then the number size limit can be checked earlier
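The pre-parse length check described above can be sketched roughly as follows. This is a hypothetical illustration using only the standard library; the class and method names do not match jackson-core's actual internals.

```java
import java.math.BigDecimal;

public class NumberLengthCheck {
    // Default limit matching the value proposed in this PR.
    static final int DEFAULT_MAX_NUM_LEN = 1000;

    // Reject over-long input before parsing: the length check is O(1),
    // while parsing the digits is much more expensive.
    static BigDecimal parseWithLimit(String num, int maxNumLen) {
        if (num.length() > maxNumLen) {
            throw new IllegalStateException("Number length (" + num.length()
                    + ") exceeds the maximum length (" + maxNumLen + ")");
        }
        return new BigDecimal(num);
    }

    public static void main(String[] args) {
        System.out.println(parseWithLimit("123.456", DEFAULT_MAX_NUM_LEN));
        try {
            parseWithLimit("9".repeat(2000), DEFAULT_MAX_NUM_LEN);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```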

To create a JsonFactory that allows larger numbers, you can use this code:

        JsonFactory f = JsonFactory.builder()
                .streamReadConstraints(StreamReadConstraints.builder().withMaxNumberLength(10000).build())
                .build();

EDIT: fixes sonatype-2022-6438

* @param maxNumLen max number of characters that are allowed
* @since 2.14
*/
public void setMaxNumLen(int maxNumLen) {
Member

Quick comment: I would want to avoid setters (and probably getters too) since 3.0 has parsers and generators fully immutable. Also, settings should probably be passed in a separate config object to reduce the number of methods/parameters being passed: something like StreamReadConfig / StreamWriteConfig (maybe), sub-classable on a per-format basis, configured using a builder()-style approach, starting with default settings.

Member Author

It seems like wiring the config together is the most complicated bit of implementing this. Should probably start with master and later backport to 2.x.

With master, if I were to add a StreamReadConfig - is it ok to pass that via the IOContext? This seems like the easiest way to get the StreamReadConfig to the TextBuffer and JsonParser instances (where the number length checking would happen). So the TSFBuilder could have a method for setting the StreamReadConfig, and when the TokenStreamFactory creates the IOContext, it could set the StreamReadConfig on it.

Member

@cowtowncoder cowtowncoder Oct 25, 2022

Hmmh. Ok, first things first: I would suggest going via 2.15, not master, due to package renaming. It'll probably be tricky either way, but I don't think master->2.15 is any easier than the other way around.

But be that as it may, the bigger question is that of passing config objects. My first gut reaction is that making these go through IOContext is not clean. But the point about TextBuffer might be relevant -- I wouldn't worry as much about passing to JsonParser / JsonGenerator (the factories have been changed a few times and, while unfortunate, it's an internal interface).
I wouldn't be totally against passing it via IOContext but would prefer an alternative.

One thing I do not remember, which may be relevant, is whether format backends sub-class IOContext or not; I have a vague recollection they might do, for recycling or such.

EDIT: yes, Csv format module has

public class CsvIOContext extends IOContext { } 

but has pretty minimal functionality; basically just wires in secondary TextBuffer with recycler. So that wouldn't be a problem.

Member Author

I've introduced a StreamReadConfig - and it can only be set via the TSFBuilder.

Internally, I still haven't found a better way to pass around the StreamReadConfig than using the IOContext.

Member

Apologies for not thinking this through. As per my CR comments, I think it should be StreamReadLimits (or perhaps StreamReadConstraints).

But one thing that is also needed is to make sure values are fully immutable so they can be shared by parser instances etc. Changes could be done using withMaxNumberLength() or such, but that may become unmaintainable when adding values. So perhaps it is necessary to start with builder-style construction immediately.

However.... I guess I am ok without doing that right away so we could merge things in sooner.

So it is ok not to add builder as the first thing.

Otherwise I like this; passing limits to/via IOContext makes sense, I think.
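An immutable, builder-constructed constraints object of the kind discussed in this thread could look roughly like the sketch below. The names here are illustrative only; the real class that emerged from this PR is jackson-core's StreamReadConstraints.

```java
// Sketch of an immutable constraints holder built via a builder, so a single
// instance can be shared safely across parser instances.
public final class ReadConstraintsSketch {
    private final int maxNumberLength;

    private ReadConstraintsSketch(Builder b) {
        this.maxNumberLength = b.maxNumberLength;
    }

    public int getMaxNumberLength() { return maxNumberLength; }

    public static Builder builder() { return new Builder(); }

    public static final class Builder {
        // Default taken from this PR's proposed limit.
        private int maxNumberLength = 1000;

        public Builder withMaxNumberLength(int len) {
            this.maxNumberLength = len;
            return this;
        }

        public ReadConstraintsSketch build() {
            return new ReadConstraintsSketch(this);
        }
    }

    public static void main(String[] args) {
        ReadConstraintsSketch c = ReadConstraintsSketch.builder()
                .withMaxNumberLength(10000)
                .build();
        System.out.println(c.getMaxNumberLength());
    }
}
```

Because the built object has only final fields and no setters, it can be handed to many parsers (e.g. via an IOContext) without defensive copying.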

@pjfanning pjfanning changed the base branch from 2.14 to 2.15 November 11, 2022 14:09
@pjfanning pjfanning changed the title from "WIP: Num size limit" to "Num size limit" Nov 11, 2022
@pjfanning pjfanning marked this pull request as ready for review November 11, 2022 14:14
@pjfanning
Member Author

@cowtowncoder would you be able to re-review this when you get a chance?

@cowtowncoder
Member

Yes @pjfanning I will try to get back to this as soon as possible.


private static final int DEFAULT_MAX_NUM_LEN = 1000;

private int _maxNumLen = DEFAULT_MAX_NUM_LEN;
Member

@cowtowncoder cowtowncoder Nov 25, 2022

Need to also consider the fact that as-is this would be specific to textual formats. Binary formats would either:

  1. Interpret this as byte limit instead of char (which is probably fine), or
  2. use different setting

Either way would need to be documented once we figure out the way to go.
Also worth mentioning in the Javadocs that application of limits is up to the format backend implementation (but that can be added later on).

Member Author

@pjfanning pjfanning Nov 25, 2022

Currently, all the existing code takes Strings and parses Strings - everything has been converted to text before this check is applied.

Member

I understand this is the case for default handling, and in 2.x. Just noting that these checks would not be applied by, say, CBOR or Smile parsers as they use binary encoding of numbers and do not convert to Strings in most cases.

But I think it'd make sense to consider length in "units of input", which can then be chars or bytes as appropriate.

@pjfanning
Member Author

@cowtowncoder I think I've addressed most of your issues with some recent commits. I've updated the PR description with how the code is called. The class is now StreamReadConstraints and has a builder().

@cowtowncoder
Member

Ok, a couple of follow-up questions on limits:

  1. Is it reasonable to have the same length limit for integral and floating-point numbers? The latter are (afaik) much more expensive to handle -- and the former might have more legit use cases (for encryption keys, I think?). Or should those be split into 2 settings?
  2. Is 1000 a bit low as a limit? I don't know if there are use cases past that, or perhaps limits other parsers use. I know it's to some degree arbitrary, but since we are adding a new constraint it'd be nice to reduce the chance of breaking existing use cases.

@pjfanning
Member Author

Ok, a couple of follow-up questions on limits:

  1. Is it reasonable to have the same length limit for integral and floating-point numbers? The latter are (afaik) much more expensive to handle -- and the former might have more legit use cases (for encryption keys, I think?). Or should those be split into 2 settings?
  2. Is 1000 a bit low as a limit? I don't know if there are use cases past that, or perhaps limits other parsers use. I know it's to some degree arbitrary, but since we are adding a new constraint it'd be nice to reduce the chance of breaking existing use cases.
  1. I think you'd need some very specialised requirement to want to set a different limit for integers than for BigDecimals. For normal users, it's simpler to have 1 setting. Edge-case users would have to set the single limit to the higher of the limits that they have in their heads.
  2. 1000 is a very long number under almost any user's definition. From my testing, parsing performance of both integers and BigDecimals tails off dramatically after about 1000 digits - exponential loss in performance.
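The parsing-cost claim above can be illustrated with a standalone timing sketch using only java.math. The digit counts here are arbitrary choices, and absolute timings will vary by JDK version and machine; the point is only that cost grows much faster than linearly with digit count.

```java
import java.math.BigDecimal;

public class ParseCostDemo {
    // Parse a decimal string of the given length and report elapsed time.
    // Decimal-to-binary conversion inside BigDecimal/BigInteger is the
    // expensive, super-linear step being measured.
    static long timeParseNanos(int digits) {
        String s = "1".repeat(digits);
        long start = System.nanoTime();
        BigDecimal d = new BigDecimal(s);
        long elapsed = System.nanoTime() - start;
        if (d.precision() != digits) {
            throw new AssertionError("unexpected precision");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int digits : new int[] {1_000, 10_000, 100_000}) {
            System.out.println(digits + " digits: " + timeParseNanos(digits) + " ns");
        }
    }
}
```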

@cowtowncoder
Member

  1. I think you'd need some very specialised requirement to want to set a different limit for integers than for BigDecimals. For normal users, it's simpler to have 1 setting. Edge-case users would have to set the single limit to the higher of the limits that they have in their heads.

  2. 1000 is a very long number under almost any user's definition. From my testing, parsing performance of both integers and BigDecimals tails off dramatically after about 1000 digits - exponential loss in performance.

Ok. Besides, I hope we get some feedback during 2.15 RC phase (I'm incurable optimist).
But maybe I should do some googling to see how others set these limits.

nielserik pushed a commit to folio-org/edge-connexion that referenced this pull request Dec 7, 2023
Upgrade Vert.x from 4.3.8 to the Poppy version 4.4.6.

Upgrade log4j from 2.17.* to 2.20.0 to use a version that is compatible with Vert.x.

Upgrading Vert.x indirectly upgrades jackson-core from 2.14.0 to 2.15.0 fixing Number Parse DoS (RISMA-2023-0067): FasterXML/jackson-core#827

Upgrading Vert.x indirectly upgrades Netty from 4.1.87.Final to 4.1.100.Final fixing HTTP/2 DoS: https://nvd.nist.gov/vuln/detail/CVE-2023-44487
@cristinaconstantinc

Hi! Shouldn't a CVE be created for this?

@cowtowncoder
Member

@cristinaconstantinc There is no real upside from the project's perspective, but if someone wants to create a reproduction of the problem for older version(s) and file a CVE, they can go ahead. If so, we would include information about the version it was fixed in, and so on.

This is based on there not being (to my knowledge) an actual published attack against this; the work was done pro-actively based on concerns.

julianladisch added a commit to folio-org/mod-invoice-storage that referenced this pull request May 5, 2024
https://folio-org.atlassian.net/browse/MODINVOSTO-181

jackson-core package versions before 2.15.0 are vulnerable to Denial of Service (DoS): FasterXML/jackson-core#827

mod-invoice-storage pins the jackson version to 2.13.4. This effectively downgrades the jackson version provided by RMB (domain-models-runtime, domain-models-api-interfaces) from 2.16.1 to 2.13.4.

Fix: Unpin jackson.
julianladisch added a commit to folio-org/mod-audit that referenced this pull request May 5, 2024
jackson-core package versions before 2.15.0 are vulnerable to Denial of Service (DoS):
FasterXML/jackson-core#827

mod-audit pins the jackson version to 2.14.1. This effectively downgrades the jackson version provided by RMB (domain-models-runtime, domain-models-api-interfaces) from 2.16.1 to 2.14.1.

Fix: Unpin jackson.
julianladisch added a commit to folio-org/mod-invoice that referenced this pull request May 5, 2024
https://folio-org.atlassian.net/browse/MODINVOICE-545

jackson-core package versions before 2.15.0 are vulnerable to Denial of Service (DoS):
FasterXML/jackson-core#827

mod-invoice pins the jackson version to 2.13.4. This effectively downgrades the jackson version provided by RMB (domain-models-runtime, domain-models-api-interfaces) from 2.16.1 to 2.13.4.

Fix: Unpin jackson.
Labels
cve Issues related to public CVEs (security vuln reports) processing-limits Issues related to limiting aspects of input/output that can be processed without exception
8 participants