latex_to_unicode produces problematic filename #3920

bdcaf · 2018-04-04T07:20:39Z

I encountered this when retrieving information for doi:10.1088/1752-7155/7/1/017106, which retrieves the authors: Patrik {\v{S}}pan{\v{e}}l and Kseniya Dryahina and David Smith.

Running cleanup/rename PDF gives the following filename: S}pane{l}EtAl/Spanel2013 - A quantitative study.pdf (note that the braces are produced exactly like this), which is not only wrong but also gives an error: Could not save file. Error in field 'file': Braces don't match.

I have the directory pattern set to [authEtAl:latex_to_unicode] and the file format pattern set to [bibtexkey] - [shorttitle:latex_to_unicode]. The BibTeX key pattern is [auth:latex_to_unicode][year]. Also note that the BibTeX key doesn't contain any }.

Siedlerchr · 2018-04-06T19:04:01Z

I tried it locally and can confirm the behaviour. In fact it seems like the file directory pattern is not interpreted correctly.

        String fileNamePattern = "[bibtexkey] - [shorttitle:latex_to_unicode]";
        String directoryPattern = "[authEtAl:latex_to_unicode]";

Rename PDF is correct. Sample file: Toot.pdf -> Toot - A quantitative study.pdf
Move Files cleanup then breaks it -> S}pan{e}lEtAl/Toot - A quantitative study.pdf

Siedlerchr · 2018-04-06T19:34:02Z

Okay, the problem is not latex2unicode itself, but our AuthorList parser.

Siedlerchr · 2018-04-06T20:05:49Z

@JabRef/developers Does any of you know why the first brace is removed?
That is actually the underlying root problem here:

jabref/src/main/java/org/jabref/model/entry/Author.java

Lines 370 to 386 in 0c34fa4

    
    /**
     * Returns the name as "Last, Jr, F." omitting the von-part and removing
     * starting braces.
     *
     * @return "Last, Jr, F." as described above or "" if all these parts
     * are empty.
     */
    public String getNameForAlphabetization() {
        StringBuilder res = new StringBuilder();
        getLast().ifPresent(res::append);
        getJr().ifPresent(jr -> res.append(", ").append(jr));
        getFirstAbbr().ifPresent(firstA -> res.append(", ").append(firstA));
        while ((res.length() > 0) && (res.charAt(0) == '{')) {
            res.deleteCharAt(0);
        }
        return res.toString();
    }
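
To make the effect of that loop concrete, here is a minimal standalone sketch. It copies only the `while` loop from the snippet above; the class `LeadingBraceDemo` and the helper `braceBalance` are hypothetical names for illustration and are not part of JabRef.

```java
final class LeadingBraceDemo {

    /** Mirrors the loop in getNameForAlphabetization: drops every leading '{'. */
    static String stripLeadingBraces(String name) {
        StringBuilder res = new StringBuilder(name);
        while ((res.length() > 0) && (res.charAt(0) == '{')) {
            res.deleteCharAt(0);
        }
        return res.toString();
    }

    /** Number of '{' minus number of '}'; a non-zero value means the counts do not match. */
    static int braceBalance(String s) {
        int balance = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                balance++;
            } else if (c == '}') {
                balance--;
            }
        }
        return balance;
    }

    public static void main(String[] args) {
        // Last name from the reported entry, written as a Java string literal.
        String lastName = "{\\v{S}}pan{\\v{e}}l";
        String stripped = stripLeadingBraces(lastName);
        System.out.println(stripped + " (brace balance: " + braceBalance(stripped) + ")");
    }
}
```

The opening brace is removed while its closing partner stays in the middle of the name, so any later step that checks for matching braces will reject the result.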

tobiasdiez · 2018-04-07T09:17:54Z

Probably to sort {{JabRef}} under J and not {.

stefan-kolb · 2018-04-16T16:24:07Z

This should not be done in the model, but only inside the model for the UI, so we need to move this part of the code, @Siedlerchr.
Or at least the filename generation should not depend on it.

stefan-kolb · 2018-05-25T14:03:42Z

Tests:

    @Test
    public void testAuthEtAlBraces() {
        // Backslashes must be doubled inside Java string literals; "\v" alone is an illegal escape.
        assertEquals("{\\v{S}}pan{\\v{e}}l",
                BibtexKeyGenerator.authEtal("Patrik {\\v{S}}pan{\\v{e}}l and Kseniya Dryahina and David Smith", "", "EtAl"));
        assertEquals("\\v{S}pan\\v{e}lEtAl",
                BibtexKeyGenerator.authEtal("Patrik \\v{S}pan\\v{e}l and Kseniya Dryahina and David Smith", "", "EtAl"));
    }

It is actually problematic to decide what to expect here.

Unbalancing the braces causes problems in all code except sorting!
Keeping the braces produces problematic output for cases such as key generation, IMHO (not sure whether key generation should only produce alphanumeric keys?!)
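
One possible middle ground, sketched here only as an illustration: strip a leading brace solely when it is closed by the very last character, so the result stays balanced. `BalancedBraces` and its methods are hypothetical names, not existing JabRef code, and the sketch deliberately ignores escaped braces such as `\{`.

```java
final class BalancedBraces {

    /** True if the '{' at index 0 is closed by the final character of the string. */
    private static boolean firstBraceClosesAtEnd(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    // The first brace is closed here; it encloses everything only if this is the end.
                    return i == s.length() - 1;
                }
            }
        }
        return false;
    }

    /** Removes enclosing brace pairs, e.g. for sorting, without ever unbalancing the string. */
    static String stripEnclosingBraces(String s) {
        String result = s;
        while (result.length() >= 2 && result.charAt(0) == '{' && firstBraceClosesAtEnd(result)) {
            result = result.substring(1, result.length() - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(stripEnclosingBraces("{{JabRef}}"));
        System.out.println(stripEnclosingBraces("{\\v{S}}pan{\\v{e}}l"));
    }
}
```

With this approach {{JabRef}} would still sort under J, while {\v{S}}pan{\v{e}}l is left untouched because its first brace closes in the middle of the name.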

bdcaf · 2018-05-25T14:22:57Z

FWIW, as a user I would expect the first, or one using the correct Unicode symbols.

koppor · 2018-06-01T09:21:56Z

DevCall:

AuthorClass: Strangest method to get names is taken.
Remove non-ASCII characters to ensure compatibility with pdflatex

Siedlerchr · 2019-04-20T19:15:45Z

Still present in 5.0 dev

k3KAW8Pnf7mkmdSMPHz27 · 2020-07-09T21:44:34Z

I can try to fix this one.

koppor · 2020-07-09T22:11:21Z

This is a hard one, but just go ahead!

k3KAW8Pnf7mkmdSMPHz27 · 2020-07-10T19:41:05Z

@koppor right now directory names are allowed to contain unicode. Unless there have been complaints, shouldn't that remain the case?

I currently believe there are two issues the solution depends on:

1. Is Unicode allowed in the directory path?
2. Is it too much of a performance hit to call the LatexToUnicodeAdapter on all auth... patterns? (The "latex-free" string will not have been cached.)

I prefer to use the LatexToUnicodeAdapter because I think it is a better user experience if both Gödel and G{\"o}del generate the same directory name, and changing the directory structure is an infrequent event. I'd guess that it would be very hard to generate the same directory name for both Gödels without using LatexToUnicodeAdapter.
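
To illustrate the point about both spellings mapping to the same name, here is a rough sketch. It is not JabRef's LatexToUnicodeAdapter: `AccentSketch` is a hypothetical class that handles only a handful of accent commands on single letters, drops all remaining braces, and uses the JDK's `java.text.Normalizer` to compose the result.

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.regex.Pattern;

final class AccentSketch {

    // LaTeX accent command -> Unicode combining mark (small illustrative subset).
    private static final Map<String, String> COMBINING = Map.of(
            "\"", "\u0308",  // diaeresis
            "'", "\u0301",   // acute
            "`", "\u0300",   // grave
            "^", "\u0302",   // circumflex
            "~", "\u0303",   // tilde
            "v", "\u030C");  // caron

    // Braced argument such as \v{S} (groups 1, 2) or symbol accent without braces such as \"o (groups 3, 4).
    private static final Pattern ACCENT = Pattern.compile(
            "\\\\([\"'`^~v])\\{([A-Za-z])\\}|\\\\([\"'`^~])([A-Za-z])");

    static String toUnicode(String latex) {
        // Replace each accent command by its base letter followed by the combining mark.
        String withMarks = ACCENT.matcher(latex).replaceAll(m -> {
            boolean braced = m.group(1) != null;
            String accent = braced ? m.group(1) : m.group(3);
            String letter = braced ? m.group(2) : m.group(4);
            return letter + COMBINING.get(accent);
        });
        // Assumption for this sketch: any braces left over are pure grouping and can be dropped.
        String noBraces = withMarks.replace("{", "").replace("}", "");
        // Compose letter + combining mark into a single code point where one exists.
        return Normalizer.normalize(noBraces, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(toUnicode("G{\\\"o}del"));
        System.out.println(toUnicode("{\\v{S}}pan{\\v{e}}l"));
    }
}
```

Under these assumptions G{\"o}del and an already-Unicode Gödel end up as the same string, which is the property wanted for directory names.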

Siedlerchr · 2020-07-10T20:41:16Z

Unicode path names are perfectly valid.
You can even use emoji on Windows 10.

https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file#file-and-directory-names

Regarding performance: I don't know, but I would also go for the latex2unicode adapter. It seems reasonable to me. Don't you already have the latex-free author in the author list class?
The author patterns are just equivalent to methods for getting the authors.

k3KAW8Pnf7mkmdSMPHz27 · 2020-07-10T20:57:35Z

@Siedlerchr, @koppor earlier pointed out:

> Remove non-ASCII characters to ensure compatibility with pdflatex

But perhaps that is only relevant for the BibTeX key? I am not entirely sure about the use case (except for organizing PDFs). Do people use it to organize plots, etc., that they later import into a .tex file?

> Don't you have the latex free author already in the author list class?

No, the latex-free methods cache full "patterns" (e.g., authorsLastOnly), unfortunately not individual authors or this particular pattern.

k3KAW8Pnf7mkmdSMPHz27 · 2020-07-10T21:04:46Z

Actually, you could use AuthorList#getAsLastNamesLatexFree and split that string. That would remove the performance bottleneck at the cost of having a "hacky" solution.

k3KAW8Pnf7mkmdSMPHz27 · 2020-07-10T21:12:10Z

Never mind. Unless the user has the exact right preferences AuthorList#getAsLastNamesLatexFree would amount to the exact same solution, with an extra split operation at the end.

Author fields are currently not latex-free. I'd consider it an option to change that, and cache latex-free Authors instead of AuthorLists.
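
A rough sketch of what per-author caching could look like. `CachedAuthor` and its `converter` parameter are hypothetical stand-ins; the real Author class and LatexToUnicodeAdapter are not shown in this thread, so nothing here reflects their actual signatures.

```java
import java.util.function.UnaryOperator;

final class CachedAuthor {

    private final String lastNameLatex;
    // Stand-in for whatever performs the LaTeX-to-Unicode conversion.
    private final UnaryOperator<String> converter;
    // Filled on first access so the conversion runs at most once per author.
    private String lastNameLatexFree;

    CachedAuthor(String lastNameLatex, UnaryOperator<String> converter) {
        this.lastNameLatex = lastNameLatex;
        this.converter = converter;
    }

    String getLastName() {
        return lastNameLatex;
    }

    synchronized String getLastNameLatexFree() {
        if (lastNameLatexFree == null) {
            lastNameLatexFree = converter.apply(lastNameLatex);
        }
        return lastNameLatexFree;
    }
}
```

Every auth... pattern could then read the cached value from the individual author, independent of which list-level format the user has configured.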

Siedlerchr added the bug and cleanup-ops labels Apr 6, 2018

stefan-kolb added this to the v4.3 milestone Apr 26, 2018

Siedlerchr mentioned this issue Apr 27, 2018

Incorrect(?) parsing of {\textendash} when renaming files #3990

Closed

1 task

Siedlerchr removed this from the v4.4 milestone Oct 2, 2018

stefanct mentioned this issue Dec 2, 2018

"Rename file" fails silently #2 #4527

Closed

Siedlerchr added the good first issue label Apr 20, 2019

This was referenced Jul 15, 2019

Fix issue #3920 #5131

Closed

[WIP] Fix for issue 3920 and tests #5132

Closed

koppor assigned koppor and calixtus Mar 31, 2020

koppor removed the good first issue label Jul 9, 2020

k3KAW8Pnf7mkmdSMPHz27 mentioned this issue Jul 23, 2020

Readability for citation key patterns #6706

Merged

13 tasks

koppor closed this as completed in #6706 Sep 21, 2020
