Performance problem about reindexChildren with replace parentNode #1281
Comments
Hi - do you have an example code snippet and some HTML, so I can validate and perf test?
Hi, here is some sample code for this.
The sample is nothing special, just HTML data with many rows. The data below has only 100 rows, so it will not be slow.
Thanks. I took a first look, and I think the bulk of the time is spent in the re-index when removing each node from its original parent. It looks like quadratic performance. I think the approach will be, in addChildren, to check whether all the input nodes have the same parent (which is likely the case almost all the time), and if so, do a bulk move and call reindex only once on each of the input and output nodes.
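To make the quadratic cost concrete, here is a minimal sketch that counts sibling-index writes for the two strategies described above. It is a toy model, not jsoup's implementation: `ToyNode`, `reindexFrom`, `moveOneByOne` and `moveAll` are hypothetical names, and the assumption is only that removing a child renumbers every sibling after it.

```java
import java.util.ArrayList;
import java.util.List;

final class ReindexDemo {
    /** Hypothetical stand-in for a DOM node that caches its position among its siblings. */
    static final class ToyNode {
        ToyNode parent;
        int siblingIndex;
        final List<ToyNode> children = new ArrayList<>();
    }

    /** Counts how many times a sibling index is written. */
    static long indexWrites = 0;

    /** Renumbers the children of parent, starting at position start. */
    static void reindexFrom(ToyNode parent, int start) {
        for (int i = start; i < parent.children.size(); i++) {
            parent.children.get(i).siblingIndex = i;
            indexWrites++;
        }
    }

    static ToyNode parentWithChildren(int n) {
        ToyNode parent = new ToyNode();
        for (int i = 0; i < n; i++) {
            ToyNode child = new ToyNode();
            child.parent = parent;
            parent.children.add(child);
        }
        indexWrites = 0;
        reindexFrom(parent, 0);
        return parent;
    }

    /** Slow path: each removal renumbers all remaining siblings of the old parent. */
    static long moveOneByOne(ToyNode from, ToyNode to) {
        indexWrites = 0;
        List<ToyNode> snapshot = new ArrayList<>(from.children);
        for (ToyNode child : snapshot) {
            int idx = child.siblingIndex;
            from.children.remove(idx);   // remove(int index)
            reindexFrom(from, idx);      // O(n) work per moved child
            child.parent = to;
            to.children.add(child);
            child.siblingIndex = to.children.size() - 1;
            indexWrites++;
        }
        return indexWrites;
    }

    /** Fast path: all inputs share one parent, so move in bulk and reindex once per side. */
    static long moveAll(ToyNode from, ToyNode to) {
        indexWrites = 0;
        int start = to.children.size();
        for (ToyNode child : from.children) {
            child.parent = to;
        }
        to.children.addAll(from.children);
        from.children.clear();
        reindexFrom(to, start);  // one pass over the moved children
        reindexFrom(from, 0);    // old parent is now empty
        return indexWrites;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("one by one: " + moveOneByOne(parentWithChildren(n), new ToyNode()));
        System.out.println("bulk move:  " + moveAll(parentWithChildren(n), new ToyNode()));
    }
}
```

In this model, moving n children one at a time from the front costs n(n+1)/2 index writes, while the bulk move costs n, which matches the "about 1/2 * 100,000" reindexed siblings per call reported in the original post.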
Your idea sounds good. (Can you tell me when 1.12.2 will be released?)
Thanks, I have implemented a fast path. Hoping to get 1.12.2 out over the next week. Please feel free to build from HEAD and let me know if you run into any issues.
Hi @whsoul, jsoup 1.12.2 is available now. https://jsoup.org/news/release-1.12.2
@jhy |
Hello,
My application suffers from poor performance when handling HTML data over 10 MB.
In my case,
I take a parsed HTML document from the jsoup parser and replace an element with a wrapper element class (a custom class, RichElement) to add more custom data alongside the element attributes.
This calls the addChildren method in Element.class
with over 100,000 parsed elements,
but it is very slow,
because addChildren calls reparentChild (=> setParentNode) 100,000 times,
and each setParentNode triggers unnecessary reindexing of the sibling children, about 1/2 * 100,000 times in this case.
Do you have any idea
how to replace only the parent and keep the children, without the unnecessary reindexing?
Or
could you add some methods like the ones below?
Element.class
Node.class