Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bare string concatenation uses lots o' stack #1795

Closed
enebo opened this issue Nov 9, 2023 · 4 comments
Closed

bare string concatenation uses lots o' stack #1795

enebo opened this issue Nov 9, 2023 · 4 comments

Comments

@enebo
Copy link
Collaborator

enebo commented Nov 9, 2023

When JRuby is loading the serialized blob via Ruby stdlib unicode_normalize/tables.rb it hits large chains of strings built using string syntax concat chains:

  accents = "" \
    "[\u0300-\u034E" \
    "\u0350-\u036F" \
    "\u0483-\u0487" \
    "\u0591-\u05BD" \
    "\u05BF" \
    "\u05C1\u05C2" \
    "\u05C4\u05C5" \
    "\u05C7" \
    "\u0610-\u061A" \
    "\u064B-\u065F" \
    "\u0670" \
    "\u06D6-\u06DC" \
    "\u06DF-\u06E4" \
    "\u06E7\u06E8" \
    "\u06EA-\u06ED" \
    "\u0711" \
    "\u0730-\u074A" \
    "\u07EB-\u07F3" \
    "\u07FD" \
    "\u0816-\u0819" \
    #... a lot more.

This leads to a large amount of a stack trace like:

	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)
	at org.prism.Loader.loadNode(Loader.java:561)

If I increase our stack (which thankfully is possible to do with the JVM) then I can get this file to load. In legacy parsers this is all built up into a single string instead of putting it into a cons chain in the AST. JRuby does not want to grow stack to load this data file so I am wondering if we should rethink putting this into the tree or not?

@kddnewton
Copy link
Collaborator

Yup! That's still on my list. This would be fixed by #1479. If it's alright, I'm going to close this one in favor of that one.

@enebo
Copy link
Collaborator Author

enebo commented Nov 10, 2023

@kddnewton hah. I forgot I opened that or I would have added this as a comment to it. I guess the only value of opening this was to point out a real-world problem with the current implementation.

@duerst
Copy link
Member

duerst commented Nov 10, 2023

We can also change the format of unicode_normalize/tables.rb if that helps.

@kddnewton
Copy link
Collaborator

Thank you @duerst that's very kind. I think we should fix it on our end, however, as I'm sure it's not just that file that has this issue and we want to be as performant as possible regardless of the source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants