Multilingual bibliography #126
485f99e to 4f7bf0a
src/lang/codes.rs
Outdated
lazy_static! {
    static ref LANGUAGE_CODE_MAPPING: HashMap<&'static str, &'static str> =
        HashMap::from([
            ("english", "en"),
            ("german", "ge"),
            ("french", "fr"),
            ("russian", "ru"),
            ("italian", "it"),
            ("chinese", "cn"),
            ("japanese", "jp"),
            ("ukranian", "ua")
        ]);
}

/// This function returns mapping for required language
pub fn get_mapping(s: &str) -> Option<&str> {
    return LANGUAGE_CODE_MAPPING.get(s).copied();
}
This mapping seems incomplete and strangely used. As I see it, it should only be required for BibLaTeX, as CSL defines its language field in terms of ISO 639-1 codes with regions. We interpret this as RFC 5646 language tags.
According to the BibLaTeX manual, the `langid` entry (p. 28) controls the language of a citation. Its value shall be a Babel/Polyglossia language tag, a set that includes RFC 5646 language tags. The normalization to RFC 5646 belongs in the `biblatex` crate, from where we can then create 1:1 `LocaleCode` structs in the `interop` module. You may want to use an external dependency in BibLaTeX for the language names.
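To make the normalization step concrete, here is a minimal sketch of mapping Babel/Polyglossia language names to ISO 639-1 codes. The function name `language_code` is hypothetical, the table is illustrative and far from complete, and a real implementation would probably live in the `biblatex` crate and rely on a maintained language list. Note that the ISO 639-1 codes for German, Chinese, Japanese, and Ukrainian are `de`, `zh`, `ja`, and `uk`.

```rust
/// Map a Babel/Polyglossia language name to an ISO 639-1 language code.
/// Hypothetical helper: the table is illustrative and deliberately incomplete.
pub fn language_code(name: &str) -> Option<&'static str> {
    // Normalize whitespace and case before the lookup.
    match name.trim().to_ascii_lowercase().as_str() {
        "english" => Some("en"),
        "german" | "ngerman" => Some("de"),
        "french" => Some("fr"),
        "russian" => Some("ru"),
        "italian" => Some("it"),
        "chinese" => Some("zh"),
        "japanese" => Some("ja"),
        "ukrainian" => Some("uk"),
        // Unknown names are reported as `None` instead of being guessed.
        _ => None,
    }
}
```

Returning `Option` leaves it to the caller to decide whether an unknown language name should fall back to the request locale or be reported as an error.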
Ok, will change this soon.
Could you please help me understand how and where you convert `LanguageIdentifier` into the `LocaleCode` that is used everywhere?
UPD. I found out that the CLI doesn't make this conversion at all. For now, I've added the following lines to the `reference` command parsing:
for entry in &bibliography {
    let mut item = CitationItem::with_entry(entry);
    item.locale = Some(LocaleCode(String::from(
        entry.language().unwrap().language.as_str(),
    )));
    driver.citation(CitationRequest::new(
        vec![item],
        &style,
        locale.clone(),
        &locales,
        None,
    ))
}
It works, but I don't really understand where this conversion should actually be done. It seems that it should happen somewhere earlier than this.
Hello, anyone?
While reviewing this PR, I have investigated how this fits into the CSL ecosystem.
`citeproc.js` and its use in Zotero is the reference implementation for CSL. When using Zotero to generate a bibliography, the app will ask the user to explicitly choose a language for that bibliography and will ignore the `language` field of each entry.
This corresponds with our current behavior of using the locale in the `BibliographyRequest`. Merging this PR would lead to a divergence between Hayagriva and Zotero/citeproc.js and violate the CSL spec.
There is an extension to CSL called CSL-M which aims, amongst other things, to accommodate multilingual bibliographies. In it, a CSL-M style file may contain multiple locale-specific layout attributes for bibliographies and citations as well as a fallback locale. Consider the following example:
<bibliography>
  <layout suffix="." locale="da de">
    <text macro="bibliography" />
  </layout>
  <layout suffix=".">
    <text macro="bibliography" />
  </layout>
</bibliography>
This file changes how `citeproc.js` retrieves term localizations:
- An item with the language `da-DK` would be rendered with the `locale="da de"` layout and receive Danish terms.
- An item with the language `de-DE` would be rendered with the `locale="da de"` layout but use Danish terms as well. This is because the locale attribute defines a fallback order for term locales for all entries rendered with it.
- An item with the language `ja-JP` does not match any locale-specific layout and would be rendered with the locale-less last layout. Assuming the user specified their default locale as `en-US` in the dropdown, the citation would not use Japanese but American English terms.
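The three cases above can be sketched as a small selection function. This models only the behavior as described in this comment, not the actual `citeproc.js` algorithm; `Layout` and `select_layout` are hypothetical names, and matching is assumed to happen on the primary language subtag only.

```rust
/// One `<layout>` element. An empty `locales` list marks the locale-less fallback.
/// Hypothetical type for illustration only.
pub struct Layout {
    pub locales: Vec<&'static str>,
}

/// Returns the index of the chosen layout and the locale whose terms are used.
pub fn select_layout(
    layouts: &[Layout],
    item_lang: &str,
    default_locale: &str,
) -> (usize, String) {
    // Compare on the primary language subtag only, e.g. "da" for "da-DK".
    let primary = item_lang.split('-').next().unwrap_or(item_lang);
    for (i, layout) in layouts.iter().enumerate() {
        if layout.locales.iter().any(|l| l.eq_ignore_ascii_case(primary)) {
            // The first listed locale heads the term fallback order,
            // so "de-DE" under locale="da de" still gets Danish terms.
            return (i, layout.locales[0].to_string());
        }
    }
    // No locale-specific layout matched: use the locale-less layout
    // together with the user's default locale.
    let fallback = layouts
        .iter()
        .position(|l| l.locales.is_empty())
        .unwrap_or(layouts.len().saturating_sub(1));
    (fallback, default_locale.to_string())
}
```

With the two layouts from the style file above and a default locale of `en-US`, `da-DK` and `de-DE` both select the first layout with Danish terms, while `ja-JP` selects the second layout with `en-US` terms.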
You can try this using the Juris-M fork of Zotero, previously known as Multilingual Zotero (MLZ), which focuses on CSL-M support. This behavior requires the citation processor to supply multiple styles, one of which has multiple layouts (known as polyglot) and one with just one layout. You can see this in the CSL-M style repo, for example with `jm-chicago-fullnote-bibliography-polyglot.csl` and `jm-chicago-fullnote-bibliography.csl`.
I do not personally find the CSL-M implementation the best, since it requires a proliferation of styles as well as manually listing each supported language in a style. However, I also do not think that we should unconditionally choose to violate the spec. A configuration option in `BibliographyRequest` to always prefer the terms from the `language` field of an entry is a possible solution; another solution is to implement CSL-M, which would be a larger effort and require changes in `citationberg`.
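As a rough illustration of the configuration-option route, the sketch below picks the term locale per entry. `TermLocalePolicy` and `term_locale` are hypothetical names and not part of Hayagriva; the default variant keeps the spec-conforming behavior, and the per-entry language is only used when explicitly requested.

```rust
/// Hypothetical policy controlling where term localizations come from.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TermLocalePolicy {
    /// Spec-conforming: always use the locale of the request.
    RequestOnly,
    /// Opt-in: prefer the entry's own language when it has one.
    PreferEntryLanguage,
}

/// Choose the locale whose terms are used for a single entry.
pub fn term_locale<'a>(
    policy: TermLocalePolicy,
    request_locale: &'a str,
    entry_language: Option<&'a str>,
) -> &'a str {
    match (policy, entry_language) {
        // Only deviate from the request locale when the user opted in
        // and the entry actually declares a language.
        (TermLocalePolicy::PreferEntryLanguage, Some(lang)) => lang,
        _ => request_locale,
    }
}
```

Keeping the spec-conforming behavior as the default means that existing output stays identical to Zotero/citeproc.js unless the user opts in.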
What do you think would be the best course of action? Let's discuss it here before you push more code.
On a different note, I noticed that this PR still contains formatting changes to unchanged code, which I cannot reproduce. Please revert them to maintain a clean diff!
@@ -320,8 +320,11 @@ fn main() {

    let mut driver = BibliographyDriver::new();
    for entry in &bibliography {
        let mut item = CitationItem::with_entry(entry);
        let id = entry.language().unwrap_or_default();
        item.locale = Some(LocaleCode(String::from(id.language.as_str())));
Instead of doing this, `CitationItem::with_entry` should use `EntryLike`'s `resolve_standard_variable` feature to set the `CitationItem`'s locale.
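The idea could look roughly like the sketch below. All types here are simplified stand-ins defined only for this example: the real `EntryLike`, `StandardVariable`, `CitationItem`, and `LocaleCode` in Hayagriva and citationberg may have different signatures and should be checked before adapting this. `DemoEntry` is purely hypothetical.

```rust
/// Simplified stand-in for a locale code wrapper.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LocaleCode(pub String);

/// Simplified stand-in for the set of standard CSL variables.
pub enum StandardVariable {
    Language,
    Title,
}

/// Simplified stand-in: anything that can resolve CSL variables.
pub trait EntryLike {
    fn resolve_standard_variable(&self, variable: StandardVariable) -> Option<String>;
}

pub struct CitationItem<'a, T: EntryLike> {
    pub entry: &'a T,
    pub locale: Option<LocaleCode>,
}

impl<'a, T: EntryLike> CitationItem<'a, T> {
    /// Derive the item's locale from the entry itself, so that callers
    /// such as the CLI do not need to set it by hand.
    pub fn with_entry(entry: &'a T) -> Self {
        let locale = entry
            .resolve_standard_variable(StandardVariable::Language)
            .map(LocaleCode);
        Self { entry, locale }
    }
}

/// Hypothetical entry type used only to exercise the sketch.
pub struct DemoEntry {
    pub language: Option<String>,
}

impl EntryLike for DemoEntry {
    fn resolve_standard_variable(&self, variable: StandardVariable) -> Option<String> {
        match variable {
            StandardVariable::Language => self.language.clone(),
            StandardVariable::Title => None,
        }
    }
}
```

Because the lookup goes through the trait, every `EntryLike` implementor gets its locale set the same way, and an entry without a language simply yields `None` instead of panicking on `unwrap()`.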
I'm alive. Could you please explain what the main purpose of `resolve_standard_variable` is and how I can achieve the same result with it? I can't find any useful examples in the codebase.
Thanks for your answer! I will study your investigation and reply a little later. Right now, my domain knowledge is definitely poor.
OK, I got it. I think that the best idea for now is to just add a configuration option. Let me explain why:
I wouldn't say that this situation matches any of these points, because
Also, I like the idea of a configuration option, because personally I didn't care about any CSL specs before today, and as an end user I don't care about them at all.
I'll also review the formatting changes and revert them. I still can't understand why we have different formatting rules ;) I'll also squash the commits when the PR is done.
Sounds good. I'll review the change once it hits this PR.
Currently, there is no support for the `language` entry in CSL or BibLaTeX. This PR adds the ability to parse and use multiple languages in a single bibliography, depending on the value of the `language` parameter of each entry.