Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBL-5522: Port - N1QL Parser has exponential slowdown for redundant parentheses #1984

Merged
merged 3 commits into from
Apr 8, 2024

Conversation

callumbirks
Copy link
Contributor

No description provided.

@callumbirks callumbirks requested a review from pasin March 19, 2024 10:29
@borrrden
Copy link
Member

Context of why the change fixes something?

@callumbirks
Copy link
Contributor Author

@borrrden For expressions with many redundant layers of parentheses (i.e. (((expr)))) N1QL parser performance decreases exponentially, the more levels of parentheses. An expression that takes 60ms with 1 pair of parentheses, takes several minutes with 5.
The reason is because the parser operates recursively, and previously the only expression which captured parentheses is at the very end of that recursive chain, at _baseExpr. So for every level of parentheses, the parser would go through expr9, 8, 7, etc until reaching _baseExpr, strip one level of parentheses, then start at the top again.

This change strips all but 1 pair of parentheses from each expression, at the very top-level. So all unnecessary recursion due to parentheses is avoided.

@jianminzhao
Copy link
Contributor

Have you compared with this test, "N1QL Performance"/"N1QL benchmark"?

@callumbirks
Copy link
Contributor Author

callumbirks commented Mar 27, 2024

Have you compared with this test, "N1QL Performance"/"N1QL benchmark"?

@jianminzhao No? I don't believe that's relevant?
For the cases of redundant parentheses, I saw an expression like
SELECT * FROM _ WHERE ((((((type == '7')))))) reduced from several minutes to milliseconds.
FYI this fix is already in 3.1.7 also, this is just a port.

@callumbirks
Copy link
Contributor Author

@jianminzhao See previous PR #1983
Jens was also happy with this change.

@cbl-bot
Copy link

cbl-bot commented Mar 27, 2024

Code Coverage Results:

Type Percentage
branches 69.85
functions 80.42
instantiations 36.84
lines 80.37
regions 76.97

@callumbirks
Copy link
Contributor Author

@jianminzhao I ran the test out of curiosity. This issue doesn't really show with the small amount (4) nested parentheses in the test.
I changed it to 6 nested, I get this in master:

Elapsed time/check time = 2.52361/0.5

And in this branch:

Elapsed time/check time = 0.000412333/0.5

@callumbirks
Copy link
Contributor Author

With 4 nested, in master:

Elapsed time/check time = 0.026721/0.5

In this branch:

Elapsed time/check time = 0.000403917/0.5

@jianminzhao
Copy link
Contributor

The purpose of that test is to catch regression in performance. PEG has quirky thing that some rules may affect performance gravely. The degradation can show up in as simple as several level of parenthesis.
How does your new rule affect that test? Will it be run the test just as fast no matter how many levels of parenthesis?

@callumbirks
Copy link
Contributor Author

@jianminzhao Exactly yes. Because it removes all the parentheses at the top level, without recursion, the performance impact of the number of parentheses is now negligible.

@callumbirks callumbirks requested a review from jianminzhao March 28, 2024 11:12
@jianminzhao
Copy link
Contributor

That is not the purpose of that test. The purpose of the test is to discover performance degradation of other grammar rules. Now, this test becomes meaningless.

@borrrden
Copy link
Member

The problem is solved though, so what is the issue? If you want to write more tests to try other potentially performance impacting queries that's fine.

@jianminzhao
Copy link
Contributor

The purpose of this commit is to improve the performance by removing unnecessary parenthesis. The purpose of this particular test is to magnify the performance degradation of additional grammar rule. In short, this test can be removed with this commit.

@callumbirks callumbirks requested a review from jianminzhao April 4, 2024 10:40
@callumbirks callumbirks merged commit 6ea6fbf into master Apr 8, 2024
9 checks passed
@callumbirks callumbirks deleted the CBL-5522 branch April 8, 2024 17:13
jianminzhao added a commit that referenced this pull request May 23, 2024
CBL-5629: Update zlib to 1.3.1 (#2032)
CBL-5627: Update min MacOS version to 12.0 (#2033)
CBL-5539: Add an API to check if a vector index is trained or not (#2035)
CBL-5628: Update mbedtls to 2.28.8 (#2027)
374d485 Support latest vectorsearch (dev branch) and hybrid queries (#1980)
5c3c854 Lazy vector index updating (#1949)
CBL-5522: Port - N1QL Parser has exponential slowdown for redundant parentheses (#1984)
ab19634 Part of CBL 5579 in order to facilitate VS on .NET Android (#1993)
CBL-5507: Fix index-past-end in CookieStore (#1982)
CBL-5591: Binary Decoder to account for the new Logging object path (#1995)
294c3f8 Define _LIBCPP_REMOVE_TRANSITIVE_INCLUDES (#1987)
CBL-5438: DateTime standard format parser (#1977)
CBL-5498: Util changes for ConnectedClient (#1978)
CBL-5450: Remote rev KeepBody flag could be cleared accidentally
f8a8de2 Remove UWP builds from build scripts (#1954)
CBL-5425: Binary Encoder to encode the (Logging) object path (#1986)
CBL-4661: Fix ROUND_EVEN. (#1981)
jianminzhao added a commit that referenced this pull request Aug 26, 2024
CBL-5629: Update zlib to 1.3.1 (#2032)
CBL-5627: Update min MacOS version to 12.0 (#2033)
CBL-5539: Add an API to check if a vector index is trained or not (#2035)
CBL-5628: Update mbedtls to 2.28.8 (#2027)
374d485 Support latest vectorsearch (dev branch) and hybrid queries (#1980)
5c3c854 Lazy vector index updating (#1949)
CBL-5522: Port - N1QL Parser has exponential slowdown for redundant parentheses (#1984)
ab19634 Part of CBL 5579 in order to facilitate VS on .NET Android (#1993)
CBL-5507: Fix index-past-end in CookieStore (#1982)
CBL-5591: Binary Decoder to account for the new Logging object path (#1995)
294c3f8 Define _LIBCPP_REMOVE_TRANSITIVE_INCLUDES (#1987)
CBL-5438: DateTime standard format parser (#1977)
CBL-5498: Util changes for ConnectedClient (#1978)
CBL-5450: Remote rev KeepBody flag could be cleared accidentally
f8a8de2 Remove UWP builds from build scripts (#1954)
CBL-5425: Binary Encoder to encode the (Logging) object path (#1986)
CBL-4661: Fix ROUND_EVEN. (#1981)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants