-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for dictionary synchronization issue #4
Fix for dictionary synchronization issue #4
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start, let's review Guava dependency and some style issues.
@@ -21,6 +21,8 @@ | |||
import java.util.Iterator; | |||
import java.util.List; | |||
|
|||
import com.google.common.base.Supplier; | |||
import com.google.common.base.Suppliers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see Guava in the maven pom file, how is it in scope as a dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find it out either, but it is there as compile time dependency when I run maven :/
import org.apache.lucene.analysis.kr.morph.CompoundEntry; | ||
import org.apache.lucene.analysis.kr.morph.MorphException; | ||
import org.apache.lucene.analysis.kr.morph.WordEntry; | ||
|
||
public class DictionaryUtil { | ||
|
||
private static Trie<String,WordEntry> dictionary; | ||
private static Supplier<Trie<String,WordEntry>> DICTIONARY = Suppliers.memoize(new Supplier<Trie<String, WordEntry>>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand "keep the style as the surrounding file" but not having spaces between the type params in the generic declaration is crazy! ಠ_ಠ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argh, my bad, I tried to fix that when I see them. I will fix it
public Map<String, String> get() { | ||
try { | ||
return readDict(KoreanEnv.FILE_PREFIX); | ||
} catch (MorphException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like all the try/catch layout is repetitive and needs to be factored out into its own method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, good point, I will move that into readDict()
List<String> lines; | ||
|
||
try { | ||
lines = FileUtil.readLines(KoreanEnv.getInstance().getValue(KoreanEnv.FILE_UNCOMPOUNDS),"UTF-8"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formatting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what formatting issue your seeing here?
throw new RuntimeException("Failed to initialize UNCOMPOUNDS dictionary.", e); | ||
} | ||
|
||
for(String compound: lines) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs some commenting, formatting, and cleanup - this block is messy and why do we need a custom StringUtil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just moved that block, so I tried to change as little as possible. If I start changing, I will probably change every line in that code. There is StringBuffer usage everywhere!!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good news is: everything looks fine, and if you look at modern ES plugins (https://github.com/elastic/elasticsearch/blob/master/plugins/repository-hdfs/build.gradle) you'll see some of them actually use Guava, so we shouldn't get any pushback from the community for adding a guava dependency here.
The bad news: I think that before this merges, you need to align on guava versions, and fix the POM so that we understand how guava's getting pulled in. Remember that ES 2.3 will have a Guava version of "guava-18.0.jar". Somehow the ES5 hdfs plugin depends on guava 16 for compiling, so that doesn't seem right. I just reviewed the gradle build file for core ES, and it seems to just use Guava for tests (it's a test compile dependency only).
@@ -312,12 +296,12 @@ public static String combineAndEomiCheck(char s, String eomi) throws MorphExcept | |||
* @param type 1: josa, 2: eomi | |||
* @throws MorphException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no longer true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plugin depends on elasticsearch (2.3), which depends on guava 1.8. That is how guava jar is being added to the classpath at compile time. I don't see any change required here to fix the guava reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait are you replying to the "no longer true" because that's about the javadoc being outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the guava issue: yep, 18.0 but how is ES core depending on Guava 18.0? It's listed in their gradle build file as a test dependency only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- fix the bad javadoc
- merge it shipit
There is one (or two) issue when dictionaries are lazily initialized. The null verification that checks whether a dictionary has been initialized or not is not synchronized. That means that a dictionary may be loaded more than once, because the reference to it is not synced.
The other big issue is that on at least two of those dictionaries, the reference to the new dictionary escapes before it has completely been built. That leads to several problems when a second instance of the same dictionary is initialized while another thread is reading the first instance of that same dictionary.
The fix uses Guava's memoize to make sure that a dict is initialized only once and safely published. I was able to reproduce the bug and verify the fix with the code below.
Note that I did a very minimal effort to make those dictionary immutable, which would be ideal to avoid any future sync issue. I considered it out of the scope of this fix, since it wasn't required to fix the current dict initialization issues.