Commit
corrected issue JabRef#2652
svenjaeger committed Mar 19, 2017
1 parent 63620ff commit 4346451
Showing 2 changed files with 12 additions and 11 deletions.
17 changes: 9 additions & 8 deletions src/main/java/org/jabref/model/entry/AuthorList.java
@@ -30,15 +30,15 @@
* <li> every comma separates tokens, while sequences of other separators are
* equivalent to a single separator; for example: "a - b" consists of 2 tokens
* ("a" and "b"), while "a,-,b" consists of 3 tokens ("a", "", and "b")
- * <li> anything enclosed in braces belonges to a single token; for example:
+ * <li> anything enclosed in braces belongs to a single token; for example:
* "abc x{a,b,-~ c}x" consists of 2 tokens, while "abc xa,b,-~ cx" consists of 4
* tokens ("abc", "xa","b", and "cx");
* <li> a token followed immediately by a dash is "dash-terminated" token, and
* all other tokens are "space-terminated" tokens; for example: in "a-b- c - d"
* tokens "a" and "b" are dash-terminated and "c" and "d" are space-terminated;
* <li> for the purposes of splitting of 'author name' into parts and
* construction of abbreviation of first name, one needs definitions of first
- * latter of a token, case of a token, and abbreviation of a token:
+ * letter of a token, case of a token, and abbreviation of a token:
* <ul>
* <li> 'first letter' of a token is the first letter character (<CODE>Character.isLetter(c)==true</CODE>)
* that does not belong to a sequence of letters that immediately follows "\"
@@ -49,7 +49,7 @@
* "{\noopsort{\"o}}xyz" 'first letter' is "o", in "{\AE}x" 'first letter' is
* "A", in "\aex\ijk\Oe\j" 'first letter' is "j"; if there is no letter
* satisfying the above rule, 'first letter' is undefined;
- * <li> token is "lower-case" token, if its first letter id defined and is
+ * <li> token is "lower-case" token if its first letter is defined and is
* lower-case (<CODE>Character.isLowerCase(c)==true</CODE>), and token is
* "upper-case" token otherwise;
* <li> 'abbreviation' of a token is the shortest prefix of the token that (a)
@@ -63,19 +63,19 @@
* as "{\noopsort{A}}.", while BiBTeX produces "j."; fixing this problem,
* however, requires processing of the preabmle;
* </ul>
- * <li> 'author name's in 'author field' are subsequences of tokens separated by
+ * <li> 'author names' in 'author field' are subsequences of tokens separated by
* token "and" ("and" is case-insensitive); if 'author name' is an empty
* sequence of tokens, it is ignored; for examle, both "John Smith and Peter
* Black" and "and and John Smith and and Peter Black" consists of 2 'author
* name's "Johm Smith" and "Peter Black" (in erroneous situations, this is a bit
* different from BiBTeX behavior);
* <li> 'author name' consists of 'first-part', 'von-part', 'last-part', and
* 'junior-part', each of which is a sequence of tokens; how a sequence of
- * tokens has to be splitted into these parts, depends the number of commas:
+ * tokens has to be split into these parts, depends the number of commas:
* <ul>
* <li> no commas, all tokens are upper-case: 'junior-part' and 'von-part' are
* empty, 'last-part' consist of the last token, 'first-part' consists of all
- * other tokens ('first-part' is empty, if 'author name' consists of a single
+ * other tokens ('first-part' is empty if 'author name' consists of a single
* token); for example, in "John James Smith", 'last-part'="Smith" and
* 'first-part'="John James";
* <li> no commas, there exists lower-case token: 'junior-part' is empty,
@@ -100,7 +100,7 @@
* <li> two or more commas (any comma after the second one is ignored; it merely
* separates tokens): 'junior-part' consists of all tokens between first and
* second commas, 'first-part' consists of all tokens after the second comma,
- * tokens before the first comma are splitted into 'von-part' and 'last-part'
+ * tokens before the first comma are split into 'von-part' and 'last-part'
* similarly to the case of one comma; for example: in "de la Vall{\'e}e
* Poussin, Jr., Charles Louis Xavier Joseph", 'first-part'="Charles Louis
* Xavier Joseph", 'von-part'="de la", 'last-part'="Vall{\'e}e la Poussin", and
@@ -167,7 +167,8 @@ public static AuthorList parse(String authors) {

// Handle case names in order lastname, firstname and separated by ","
// E.g., Ali Babar, M., Dingsøyr, T., Lago, P., van der Vliet, H.
- if (!authors.toUpperCase(Locale.ENGLISH).contains(" AND ") && !authors.contains("{") && !authors.contains(";")) {
+ if (!authors.toUpperCase(Locale.ENGLISH).contains(" AND ") && !authors.contains("{") && !authors.contains(";")
+         && ((authors.length() - authors.replace(",", "").length()) > 2)) {
List<String> arrayNameList = Arrays.asList(authors.split(","));

// Delete spaces for correct case identification
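
The condition added in the hunk above counts commas by comparing the string length before and after removing every comma, and takes the "lastname, firstname" list branch only when there are more than two. A minimal, standalone sketch of that guard follows; the class and method names (CommaGuardSketch, countCommas, looksLikeCommaSeparatedList) are hypothetical and are not part of JabRef.

```java
import java.util.Locale;

// Illustrative only: reproduces the guard from the hunk above outside of JabRef.
class CommaGuardSketch {

    // Number of commas = original length minus length with all commas removed.
    public static int countCommas(String authors) {
        return authors.length() - authors.replace(",", "").length();
    }

    // Hypothetical helper mirroring the four-part condition in the diff:
    // no " and " separator, no braces, no semicolons, and more than two commas.
    public static boolean looksLikeCommaSeparatedList(String authors) {
        return !authors.toUpperCase(Locale.ENGLISH).contains(" AND ")
                && !authors.contains("{")
                && !authors.contains(";")
                && (countCommas(authors) > 2);
    }

    public static void main(String[] args) {
        // Five commas: treated as a comma-separated list of names.
        System.out.println(looksLikeCommaSeparatedList("Ali Babar, M., Lago, P., van der Vliet, H.")); // true
        // Two commas: left to the regular parser.
        System.out.println(looksLikeCommaSeparatedList("Smith, Jr., John")); // false
        // One comma: left to the regular parser.
        System.out.println(looksLikeCommaSeparatedList("Smith, John")); // false
    }
}
```

With this guard, a field containing at most two commas no longer enters the list branch and falls through to the token-based parsing described in the Javadoc.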
6 changes: 3 additions & 3 deletions src/main/java/org/jabref/model/entry/AuthorListParser.java
@@ -309,14 +309,14 @@ private String concatTokens(List<Object> tokens, int start, int end, int offset,
* additional information is given in global variables <CODE>token_start</CODE>,
* <CODE>token_end</CODE>, <CODE>token_abbr</CODE>, <CODE>token_term</CODE>,
* and <CODE>token_case</CODE>; namely: <CODE>orig.substring(token_start,token_end)</CODE>
- * is the thext of the token, <CODE>orig.substring(token_start,token_abbr)</CODE>
+ * is the text of the token, <CODE>orig.substring(token_start,token_abbr)</CODE>
* is the token abbreviation, <CODE>token_term</CODE> contains token
* terminator (space or dash), and <CODE>token_case</CODE> is <CODE>true</CODE>,
* if token is upper-case and <CODE>false</CODE> if token is lower-case.
*
* @return <CODE>TOKEN_EOF</CODE> -- no more tokens, <CODE>TOKEN_COMMA</CODE> --
* token is comma, <CODE>TOKEN_AND</CODE> -- token is the word
- * "and" (or "And", or "aND", etc.) or a colon, <CODE>TOKEN_WORD</CODE> --
+ * "and" (or "And", or "aND", etc.) or a semicolon, <CODE>TOKEN_WORD</CODE> --
* token is a word; additional information is given in global
* variables <CODE>token_start</CODE>, <CODE>token_end</CODE>,
* <CODE>token_abbr</CODE>, <CODE>token_term</CODE>, and
@@ -339,7 +339,7 @@ private int getToken() {
tokenEnd++;
return TOKEN_COMMA;
}
- // Colon is considered to separate names like "and"
+ // Semicolon is considered to separate names like "and"
if (original.charAt(tokenStart) == ';') {
tokenEnd++;
return TOKEN_AND;
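
The corrected comments state that a semicolon separates names exactly like the word "and" (in any letter case), and the AuthorList Javadoc adds that empty author names are ignored. The sketch below illustrates just that separator behavior. It is a simplified stand-in, not the JabRef tokenizer: it splits on whitespace only and ignores braces, commas, and dashes, and the names NameSeparatorSketch and splitAuthorNames are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a whitespace-based approximation of the "and"/";" rule.
class NameSeparatorSketch {

    public static List<String> splitAuthorNames(String field) {
        List<String> names = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Pad semicolons with spaces so each becomes a standalone token.
        for (String token : field.replace(";", " ; ").trim().split("\\s+")) {
            if (token.equalsIgnoreCase("and") || token.equals(";")) {
                // Separator reached: close the current name, if any.
                flush(current, names);
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(token);
            }
        }
        flush(current, names);
        return names;
    }

    // Adds the accumulated name to the list; empty names are dropped.
    private static void flush(StringBuilder current, List<String> names) {
        if (current.length() > 0) {
            names.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(splitAuthorNames("John Smith and Peter Black"));
        System.out.println(splitAuthorNames("and and John Smith and and Peter Black"));
        System.out.println(splitAuthorNames("John Smith; Peter Black"));
    }
}
```

Under these assumptions, all three calls in main yield the same two names, "John Smith" and "Peter Black", matching the example given in the Javadoc.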
