Performance issue with new search #1993
Comments
OK, my current workaround is to use the latest release version (JabRef 3.6), which does not exhibit these performance problems.
I cannot confirm this on my ~2200-entry database; there, search returns instant results. However, my database is considerably smaller than yours, and I am not up to date with the recent changes in search. @koppor? @JabRef/developers: what is the largest test file we have for performance testing?
@bartsch-dev Can you have a look?
@lenhard If I send you the database, I would ask you not to share it or make it publicly available, as it may contain sensitive information (preferably, you would delete the file once it is no longer needed).
@AEgit OK, thanks! You can send it to joerg.lenhard@kau.se. Then I will see how search works on my machine and delete your file afterwards. It would be cool if one day we had a huge file for testing JabRef. Maybe we could go for auto-generated entries, but then again, randomized data will never be representative of real data.
Thanks! I played around with the file a little and also noticed a delay, albeit less than a second, which would not annoy me. But that is very person-dependent and, of course, machine-dependent. If performance was better before, we should try to at least keep it at that level. For the record, according to @AEgit's observation, the regression was introduced some time between today and August 31st (the first comment mentions that everything was still fine with commit 4913665).
The performance benchmarks confirm the problem on current master: the search score went down from nearly 200 to 30 (note that all the numbers decreased slightly... not good).
@ Everyone: Thank you for your quick replies. Yes, I would prefer to avoid sending the file - maybe once I have the time to go through the database and delete everything that shouldn't be public, I can make it available (but currently I don't have the time). As pointed out by @lenhard, the changes that led to the performance drop must have happened between the following two builds: JabRef 3.7dev--snapshot--2016-09-16--master--4913665 and JabRef 3.7dev--snapshot--2016-09-16--master--629c3f0.
I made some quick tests: the entry search in the SearchWorker is about 8 times slower since 543c176 than before.
Making the stream a parallelStream makes it significantly better but still worse than before (around 2 times slower). |
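For illustration, a minimal, self-contained sketch of the sequential-vs-parallel change being discussed; a plain string list and a Predicate stand in for JabRef's entry list and SearchQuery matching, so the names here are assumptions, not JabRef's actual API:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelSearchSketch {
    public static void main(String[] args) {
        // Stand-in for the database entries; a Predicate stands in for the search query.
        List<String> entries = IntStream.range(0, 8000)
                .mapToObj(i -> "entry-" + i)
                .collect(Collectors.toList());
        Predicate<String> matches = e -> e.contains("42");

        // Sequential search: every entry is matched on a single thread.
        List<String> sequential = entries.stream()
                .filter(matches)
                .collect(Collectors.toList());

        // Parallel search: matching is spread across the common fork/join pool.
        List<String> parallel = entries.parallelStream()
                .filter(matches)
                .collect(Collectors.toList());

        System.out.println(sequential.size() + " / " + parallel.size());
    }
}
```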
@bartsch-dev You beat me by 17 seconds with suggesting parallelization ;) Well, 2 times slower is better than 8 times slower. Anyway, if lambda expressions cause the delay, we should probably think about re-implementing performance-critical operations in the old style. @bartsch-dev, can you reimplement that fragment in an iterative style and see how it performs?

```java
for (Entry entry : database.getEntries()) {
    if (searchQuery.matches(entry)) {
        myLinkedList.add(entry);
    }
}
```

Care to use a …
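A crude way to run that comparison is a wall-clock timer around each variant; this is a sketch only, and a harness like JMH would give more trustworthy numbers:

```java
public class SearchTimer {
    // Crude wall-clock timing; run each variant several times so the JIT
    // warms up, and compare only the later iterations.
    static long timeMillis(Runnable searchVariant) {
        long start = System.nanoTime();
        searchVariant.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Example with a dummy workload; substitute the stream/loop search variants.
        long ms = timeMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i;
            }
        });
        System.out.println("took " + ms + " ms");
    }
}
```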
It's not really the lambda that causes the problem.
Well, the …
Could it be related to the Unicode stuff?
Just to keep everyone updated: I've tried the most recent snapshot (JabRef 3.7dev--snapshot--2016-09-27--master--b8d1c2e), but it is still clearly slower than previous versions. I will fall back again to the latest release version, JabRef 3.6.
I played around a little bit and got the following benchmark (note the score of the parallel search, which is much higher than that of the search before the performance issue):
The problem is that in the PR where I changed the search bar, I also changed the ContainBasedSearchRule. Refs #518
Hmmm, indeed, your benchmark results indicate that the parallel search is much faster than the initial one. I guess, however, that the change to the ContainBasedSearchRule has such a high impact on performance (indeed, you mention it was 8 times quicker before the change) that the increased performance of the parallel search cannot overcome the overall problem, i.e., the search is still much slower than it was previously. If it is not possible to solve the problem in another way: would it be possible to implement a switch which lets the user activate either the "fast search" (which does not convert the LaTeX commands) or the "slow/correct search" (which does convert the LaTeX commands, with the associated performance drop)? I guess it is just a workaround, but that might be a solution?
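A sketch of what such a switch could look like, with a hypothetical preference flag and a one-mapping placeholder standing in for the LatexToUnicode formatter; the names are illustrative, not JabRef's actual API:

```java
public class SearchModeSketch {
    // Hypothetical preference flag: true = "slow/correct" search that resolves
    // LaTeX commands before matching; false = "fast" search on the raw field value.
    static boolean matches(String fieldValue, String query, boolean resolveLatex) {
        String searchable = resolveLatex ? latexToUnicode(fieldValue) : fieldValue;
        return searchable.toLowerCase().contains(query.toLowerCase());
    }

    // Placeholder for the real LatexToUnicode formatter (one illustrative mapping only).
    static String latexToUnicode(String s) {
        return s.replace("\\\"a", "ä");
    }

    public static void main(String[] args) {
        System.out.println(matches("K\\\"afer", "käfer", true));  // true: LaTeX resolved first
        System.out.println(matches("K\\\"afer", "käfer", false)); // false: raw text misses it
    }
}
```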
First of all, it is a good thing that we have found the component that is causing the trouble - good job @bartsch-dev! It probably makes no difference whether the database contains LaTeX characters or not; I think it is the searching for LaTeX in every field that costs the time. #518 is a possible long-term solution. However, if we compute a LaTeX-free version of every field on every change, load times of databases might go up significantly (which could be a compromise: normally, you load once and might search often). Other features might also be affected. On the other hand, we can of course circumvent increased load times by implementing a cache-based version (computing LaTeX-free field content on demand). Then the first search would be slow, but subsequent ones fast. All in all, I think this solution, including the cache, is worthwhile. @JabRef/developers What do you think the other implications would be? Should we go for implementing #518?
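A minimal sketch of the cache-based variant described above, computing the LaTeX-free content on demand and memoizing it. The names are hypothetical and the convert() body is a one-mapping placeholder for the real formatter:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LatexFreeCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // The first lookup of a distinct field value pays the conversion cost;
    // every later search only pays a map lookup.
    public String latexFree(String fieldValue) {
        return cache.computeIfAbsent(fieldValue, LatexFreeCache::convert);
    }

    // Placeholder for the real LatexToUnicode formatter.
    private static String convert(String latex) {
        return latex.replace("\\\"a", "ä");
    }
}
```

With this shape, the first search after loading pays the full conversion cost, but repeated searches only hit the cache.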
I think we should discuss this one day in a devcall, but in general it looks like this is the only long-term solution we have. However, I do not really understand why people keep LaTeX-encoded entries in their databases, as all modern text-writing programs support UTF-8.
IMHO there are two obvious solutions to speed things up:

1. Store a LaTeX-free version of every field alongside the LaTeX-encoded one (in the direction of #518).
2. Store all fields LaTeX-free internally and only convert back to LaTeX commands when saving the database to disk.
I just tried the first option; the performance gain is enormous (https://github.com/bartsch-dev/jabref/commit/7a9740a758146f3af5016a42af73b5ba93298cda).
Option 1 goes in the direction of "store latex free version" #518.
Thus every entry should be asked about its fields exactly once, so caching shouldn't affect the benchmark. Since I'm not a big fan of caching (it makes development harder in general, since you often have to change two things), I would propose improving the performance of the LaTeX-to-Unicode formatter instead. I'll have a look at it during the weekend if nobody wants to do it before (@bartsch-dev: ?)
The LaTeX-free field gets stored as soon as the field gets set (thus the database needs about double the space it did before), not when it gets accessed. I'm away over the long weekend, so you can try your luck. (I opt for option 2: where do we need the strings with LaTeX commands anyway, except when saving the database to disk, which only matters if the user enters special characters in LaTeX style?)
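For contrast with the cache, a sketch of the eager variant just described, where the LaTeX-free value is computed once when the field is set; again the names are illustrative and the conversion is a placeholder:

```java
import java.util.HashMap;
import java.util.Map;

public class EntrySketch {
    private final Map<String, String> fields = new HashMap<>();
    private final Map<String, String> latexFreeFields = new HashMap<>();

    // The conversion cost is paid once per setField call, never during search,
    // at the price of keeping both representations in memory.
    public void setField(String name, String value) {
        fields.put(name, value);
        latexFreeFields.put(name, convert(value));
    }

    public String getLatexFreeField(String name) {
        return latexFreeFields.get(name);
    }

    // Placeholder for the real LatexToUnicode formatter.
    private static String convert(String latex) {
        return latex.replace("\\\"a", "ä");
    }
}
```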
Regarding caching: there is no point in improving performance if JabRef dies for large databases due to memory usage. @AEgit What is the current memory footprint for your 8000+ entry database? Will it be a problem for your environment if JabRef takes up twice what it currently does?
On Windows 10, JabRef currently uses ~1.4 GB of RAM for a database with nearly 9,000 entries (and several hundred groups). Doubling the memory footprint would be no issue for me, as I'm running this on a machine with 16 GB of RAM.
Yes, I have.
I note that I pointed this out as a probable reason quite some time ago. ;-) @Siedlerchr There are certainly still use cases for non-UTF-8 databases; IEEE is just one.
Plain LaTeX is important for many users, for instance when publishing scientific papers: nearly no publisher supports biblatex, and bibtex8 still seems to be a hack and is 20 years old now. Sure, switching to biblatex is recommended, but it is not practical.
I doubt that caching the UTF-8 string (the "latex-free version") takes twice as much memory, because I assume that 80% of the strings will be the same. The latex-free version can use String.intern() to reduce memory consumption drastically (see also http://stackoverflow.com/q/10578984/873282). We can also add …
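A small self-contained illustration of the interning idea: equal strings produced by separate conversions collapse to a single shared instance in the JVM's string pool:

```java
public class InternSketch {
    public static void main(String[] args) {
        // Two conversions producing equal content yield two distinct objects...
        String a = new StringBuilder("Söder").append(" 2016").toString();
        String b = new StringBuilder("Söder").append(" 2016").toString();
        System.out.println(a == b);                   // false: two copies in memory

        // ...but interning maps both to one shared instance.
        System.out.println(a.intern() == b.intern()); // true: one copy retained
    }
}
```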
@oscargus Indeed, your intuition paved the way :-) Summarizing the discussion, I think we have a consensus that something will be done. We currently have the following options:

1. Store a LaTeX-free version of each field (#518), using String.intern() to keep the memory overhead down.
2. Improve or rewrite the LatexToUnicode converter itself.
3. Parallelize the search.
Let us decide which way to go at the next devcall!
I think rewriting the converters would make sense anyway. I have only just touched them, but there are at least three(?) places where the conversion map is checked, which seems quite redundant. I haven't really studied it in depth, but it may be possible to write a faster converter from scratch with a selectable conversion map (to handle LaTeX to {Unicode, HTML, XML, RTF}), although unification and speed may be contradictory... A first step might be to profile the LatexToUnicode converter to see where the actual bottleneck is.
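A sketch of the "selectable conversion map" shape suggested above, with toy one-entry maps; a real converter would need to handle braces and multi-character commands, and would scan the input once rather than doing one pass per mapping:

```java
import java.util.HashMap;
import java.util.Map;

public class SelectableConverter {
    // One conversion map per target format; the traversal logic is shared.
    static final Map<String, String> LATEX_TO_UNICODE = new HashMap<>();
    static final Map<String, String> LATEX_TO_HTML = new HashMap<>();
    static {
        LATEX_TO_UNICODE.put("\\\"a", "ä");
        LATEX_TO_HTML.put("\\\"a", "&auml;");
    }

    // Naive: one replace() pass per mapping, shown only to illustrate the
    // selectable-map idea; a rewrite would look commands up in a single scan.
    static String convert(String input, Map<String, String> conversionMap) {
        String result = input;
        for (Map.Entry<String, String> e : conversionMap.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(convert("K\\\"afer", LATEX_TO_UNICODE)); // Käfer
        System.out.println(convert("K\\\"afer", LATEX_TO_HTML));    // K&auml;fer
    }
}
```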
It is not my place to say this, as I'm not an active developer: but would it be possible for you to go for the quickest solution among the ones mentioned, so that I can switch to the new development version? Currently I can't, and because of that I also cannot use Google Scholar (see the recent posts in #1886), which is quite essential to my work (much of it has no DOI but is found by Google Scholar). Again, this is just a suggestion, and apologies if I'm overstepping a boundary as a passive user who doesn't help in actively developing this wonderful piece of software.
@AEgit If you desperately want a development version with quicker search, you can get one right now here: http://builds.jabref.org/latex-free-fields/ This should be somewhat faster already. I'll do a few more minor improvements tomorrow; I have not yet tested how much faster this actually is and how much memory it consumes, but it would be interesting to hear from you how search performs anyway. The first search run should still be pretty slow, but subsequent ones should be faster. @oscargus I am doing a few minor optimizations in LatexToUnicode, but a rewrite would probably be the most reasonable option. I have to say that I barely understand what that algorithm does...
JabRef 3.7dev--snapshot--2016-09-29--latex-free-fields--93e665e - @lenhard: Thank you so much!!! The memory footprint seems to be nearly the same: I see a maximum increase of 200 MB (JabRef now uses 1.6 GB instead of 1.4 GB), which is negligible on my machine. I think it also uses more CPU power - at least initially I heard my fans spinning up - but again, no issue for me, as I'm running this on a quite powerful Core i7 laptop. Thank you very much!
OK, small bug spotted in JabRef 3.7dev--snapshot--2016-09-29--latex-free-fields--93e665e: it's no longer possible to assign an article to a group by drag'n'drop. If I try that, nothing happens. Instead, I have to manually enter the group's name in the "Groups" field; then it works. (I didn't know whether I should open a new ticket, as you mentioned that you would do a few minor improvements tomorrow anyway.)
@AEgit That is great to hear :-) I will do some more testing tomorrow, but it sounds like we are getting this issue solved. It is also good that the additional memory needed seems to be acceptable; if so, that is certainly due to the suggestions by @koppor. The groups bug you mention should be unrelated - I am only touching the LaTeX converter tomorrow - so please open a new issue.
Another small bug in JabRef 3.7dev--snapshot--2016-09-29--latex-free-fields--93e665e: the DOI resolver no longer appears in the web search interface.
The DOI resolver is now in BibTeX -> New entry.
@mlep
Would it help to press CTRL+N and then have the input field focused?
Hmmm, yes, I was already using CTRL+N, and focusing the input field would help - but it's not as convenient as it was before. At least I think so; maybe I just need to get used to it. But changing the focus to the input field would definitely help.
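If the focus change were implemented, a Swing sketch could look like the following. This is illustrative only, not JabRef's actual code; the panel, input field, and new-entry action are passed in as stand-ins:

```java
import javax.swing.AbstractAction;
import javax.swing.JComponent;
import javax.swing.KeyStroke;
import java.awt.event.ActionEvent;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;

public class FocusAfterNewEntry {
    // Bind Ctrl+N so that, after the new-entry action runs, the ID input
    // field receives keyboard focus.
    public static void install(JComponent panel, JComponent idInputField, Runnable createNewEntry) {
        panel.getInputMap(JComponent.WHEN_IN_FOCUSED_WINDOW)
             .put(KeyStroke.getKeyStroke(KeyEvent.VK_N, InputEvent.CTRL_DOWN_MASK), "newEntry");
        panel.getActionMap().put("newEntry", new AbstractAction() {
            @Override
            public void actionPerformed(ActionEvent e) {
                createNewEntry.run();                // existing new-entry action
                idInputField.requestFocusInWindow(); // then move focus to the input field
            }
        });
    }
}
```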
@zellerdev
Another small bug I came across in JabRef 3.7dev--snapshot--2016-09-29--latex-free-fields--93e665e. I think a similar issue has been reported before.
I think we should add a side panel for the creation of entries based on DOI etc.
As far as I can tell, the small bug mentioned for JabRef 3.7dev--snapshot--2016-09-29--latex-free-fields--93e665e (entry preview not correctly displayed after performing a search - see the posts above) appears to be fixed in JabRef 3.7dev--snapshot--2016-09-30--latex-free-fields--0ee1d17. UPDATE: No, I was too quick - the bug still appears (but it is a bit more difficult to reproduce; I will open a new ticket).
This is fixed in #2102. Discussion regarding related issues that have been discovered in this context should be carried on in separate threads.
JabRef 3.7dev--snapshot--2016-09-16--master--629c3f0
Windows 10 10.0 amd64
Java 1.8.0_101
With the current master branch (master--629c3f0), there is a significant decrease in performance when using the "search" feature of JabRef compared to a previous master branch (master--4913665).
Before the update, entering a search term led to an instantaneous search and nearly instantaneous results. Now I have to wait up to 3 seconds every time I enter or modify a search term. As I use the search feature quite extensively, this is quite annoying.
The database is quite large (>8000 entries), but this shouldn't be the problem, as it worked before the update.
Unfortunately, I forgot to save the previous installer - could you provide me with the older master--4913665 build?