HTTP 500 for processCitationList on non-breaking whitespace string #849
Labels: bug
This is a pretty obscure corner case (parsing a mangled string), so it may not be a priority to reproduce and fix. But it did come up in real use of GROBID, and it resulted in an internal error (HTTP 500) instead of a 4xx or 2xx status code.
The source of this citation string is the Crossref DOI reference metadata for the DOI 10.5817/cz.muni.m210-9541-2019. The JSON metadata can be fetched from https://api.crossref.org/v1/works/http://dx.doi.org/10.5817/cz.muni.m210-9541-2019. In the references[] array, the reference with key ref127 has the following JSON structure:

If parsed into Python and printed:

it is easier to see that this is not a simple space but the Unicode character U+00A0 (\u00a0), which is "NO-BREAK SPACE".

When this string is submitted as a citation to parseCitationList, GROBID returns a 500 error. The stack trace looks like:

As a workaround, clients can simply avoid submitting whitespace-only strings like this one for parsing.
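A minimal client-side sketch of that workaround in Python, under the assumption that citations are collected as a list of strings before being sent. The helper names (describe_chars, is_blank_citation, filter_citations) are hypothetical and not part of GROBID or any GROBID client; the code does not call GROBID at all. It shows how to name the odd character and how to drop blank strings, relying on the fact that Python's str.isspace() (and therefore str.strip()) treats U+00A0 as whitespace.

```python
import unicodedata


def describe_chars(s: str) -> list[str]:
    """Hypothetical helper: return the code point and Unicode name of each character."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in s]


def is_blank_citation(s: str) -> bool:
    """Hypothetical helper: True if s is empty or whitespace-only.

    str.strip() removes every character for which str.isspace() is True,
    which includes U+00A0 NO-BREAK SPACE, so a naive check such as
    s == " " would miss this case but strip() does not.
    """
    return not s.strip()


def filter_citations(citations: list[str]) -> list[str]:
    """Hypothetical helper: drop blank strings before a batch is submitted for parsing."""
    return [c for c in citations if not is_blank_citation(c)]


if __name__ == "__main__":
    nbsp = "\u00a0"
    print(repr(nbsp))            # prints '\xa0'
    print(describe_chars(nbsp))  # prints ['U+00A0 NO-BREAK SPACE']
    # "Example citation string" is a placeholder, not real reference data.
    print(filter_citations([nbsp, "Example citation string"]))
    # prints ['Example citation string']
```

This only sidesteps the problem on the client; whether the server should answer such input with a 4xx status instead of a 500 is the question the issue raises.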