number vs issue #726

Closed
moewew opened this issue Mar 6, 2018 · 25 comments

Comments

@moewew
Collaborator

moewew commented Mar 6, 2018

Cf. https://tex.stackexchange.com/q/418590/35864, retorquere/zotero-better-bibtex#925, plk/biblatex-apa#45

Currently the docs are quite strict about number being an integer. For the numbers as they appear in @articles that seems to be a bit too restrictive. There are at least two cases where one would put something other than a plain integer there:

  1. Special issues identified with a letter and integer, e.g. 'S1' (Journal articles in special issues biblatex-apa#45)
  2. Number ranges such as '2-3' (Exporting the reference of a journal's double issues retorquere/zotero-better-bibtex#925)

At the moment the documentation seems to suggest using issue for non-integer input, and this is what Zotero BBT does. I find this unsatisfying since the output in the standard styles is significantly different when using issue as compared to number. It has also been established that for most intents and purposes number is the correct field for the subdivision of a journal volume.

If biblatex and Biber were to accept non-integer values for number the two cases above would easily give the expected output. Cautious style developers could use \ifnumerals, \ifnumeral or \ifinteger if they want to make sure the output does not end up looking stupid if they do anything special to the number field.

The only downside to this that I can see is sorting. If number is treated as a string, sorting pure integer-valued number fields might not give the expected output. But no standard sorting schemes sort by number...
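To make the sorting concern concrete, here is a minimal Python sketch. It is not Biber's sorting code; it only shows the general difference between comparing number values as strings and as integers, using made-up values.

```python
# Illustrative values only; these are not taken from a real .bib file.
numbers = ["10", "9", "2"]

# Compared as strings, "10" sorts before "2" because '1' < '2'.
as_strings = sorted(numbers)

# Compared as integers, the order is the expected numeric one.
as_integers = sorted(numbers, key=int)

print(as_strings)
print(as_integers)
```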

There is even precedent for a non-integer number in biblatex-examples.bib

@patent{almendro,
author = {Almendro, Jos{\'e} L. and Mart{\'i}n, Jacinto and S{\'a}nchez,
Alberto and Nozal, Fernando},
title = {Elektromagnetisches Signalhorn},
number = {EU-29702195U},
date = 1998,
location = {countryfr and countryuk and countryde},
langid = {german},
annotation = {This is a \texttt{patent} entry with a \texttt{location}
field. The number is given in the \texttt{number} field. Note
the format of the \texttt{location} field in the database
file. Compare \texttt{laufenberg}, \texttt{sorace}, and
\texttt{kowalik}},
}

and

@report{chiu,
author = {Chiu, Willy W. and Chow, We Min},
title = {A Hybrid Hierarchical Model of a {Multiple Virtual Storage}
({MVS}) Operating System},
type = {resreport},
institution = {IBM},
date = 1978,
number = {RC-6947},
langid = {english},
langidopts = {variant=american},
sorttitle = {Hybrid Hierarchical Model of a Multiple Virtual Storage (MVS)
Operating System},
indextitle = {Hybrid Hierarchical Model, A},
annotation = {This is a \texttt{report} entry for a research report. Note
the format of the \texttt{type} field in the database file
which uses a localization key. The number of the report is
given in the \texttt{number} field. Also note the
\texttt{sorttitle} and \texttt{indextitle} fields},
}
@report{padhye,
author = {Padhye, Jitendra and Firoiu, Victor and Towsley, Don},
title = {A Stochastic Model of {TCP Reno} Congestion Avoidance and
Control},
type = {techreport},
institution = {University of Massachusetts},
date = 1999,
number = {99-02},
location = {Amherst, Mass.},
langid = {english},
langidopts = {variant=american},
sorttitle = {A Stochastic Model of TCP Reno Congestion Avoidance and
Control},
indextitle = {Stochastic Model of {TCP Reno} Congestion Avoidance and Control,
A},
annotation = {This is a \texttt{report} entry for a technical report. Note
the format of the \texttt{type} field in the database file
which uses a localization key. The number of the report is
given in the \texttt{number} field. Also note the
\texttt{sorttitle} and \texttt{indextitle} fields},
abstract = {The steady state performance of a bulk transfer TCP flow
(i.e. a flow with a large amount of data to send, such as FTP
transfers) may be characterized by three quantities. The first
is the send rate, which is the amount of data sent by the
sender in unit time. The second is the throughput, which is
the amount of data received by the receiver in unit time. Note
that the throughput will always be less than or equal to the
send rate due to losses. Finally, the number of non-duplicate
packets received by the receiver in unit time gives us the
goodput of the connection. The goodput is always less than or
equal to the throughput, since the receiver may receive two
copies of the same packet due to retransmissions by the
sender. In a previous paper, we presented a simple model for
predicting the steady state send rate of a bulk transfer TCP
flow as a function of loss rate and round trip time. In this
paper, we extend that work in two ways. First, we analyze the
performance of bulk transfer TCP flows using more precise,
stochastic analysis. Second, we build upon the previous
analysis to provide both an approximate formula as well as a
more accurate stochastic model for the steady state throughput
of a bulk transfer TCP flow.},
file = {ftp://gaia.cs.umass.edu/pub/Padhey99-markov.ps},
}

Here are three real-life examples that could benefit from number not being an integer

\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{babel}
\usepackage{csquotes}

\usepackage[style=authoryear, backend=biber]{biblatex}

\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@article{finkelstein2013,
  author       = {Amy Finkelstein and Erzo F. P. Luttmer and Matthew J. Notowidigdo},
  title        = {What Good is Wealth Without Health?},
  subtitle     = {The Effect of Health on the Marginal Utility of Consumption},
  journaltitle = {Journal of the European Economic Association},
  volume       = {11},
  number       = {Suppl. 1},% Suppl. 1/Supplement 1 would probably need an adjustment of the output format to look nice
  date         = {2013},
  pages        = {221–258},
  doi          = {10.1111/j.1542-4774.2012.01101.x},
}
@article{keels2013,
  author       = {Keels, Micere},
  title        = {Getting them enrolled is only half the battle},
  subtitle     = {College Success as a Function of Race or Ethnicity, Gender, and Class},
  journaltitle = {American Journal of Orthopsychiatry},
  volume       = {83},
  number       = {2-3},
  date         = {2013},
  pages        = {310-322},
  doi          = {10.1111/ajop.12033},
}
@article{fogliano2011,
  author       = {Fogliano, Vincenzo and Corollaro, Maria Laura and Vitaglione, Paola and Napolitano, Aurora and Ferracane, Rosalia and Travaglia, Fabiano and Arlorio, Marco and Costabile, Adele and Klinder, Annett and Gibson, Glenn},
  title        = {In Vitro Bioaccessibility and Gut Biotransformation of Polyphenols Present in the Water-Insoluble Cocoa Fraction},
  journaltitle = {Molecular Nutrition \& Food Research},
  volume       = {55},
  number       = {S1},
  date         = {2011},
  pages        = {S44-S55},
  doi          = {10.1002/mnfr.201000360},
}
\end{filecontents}

\addbibresource{\jobname.bib}

\begin{document}
\cite{finkelstein2013,keels2013,fogliano2011}
\printbibliography
\end{document}
@retorquere

Given that people are already putting things like ranges in number, and biblatex already accepts non-numbers there, things wouldn't get worse by just making the docs reflect the implemented behavior, correct?

@retorquere

(although I'd then have no way to decide in what field to put the zotero data, but that's tangential to this issue)

@moewew
Collaborator Author

moewew commented Mar 10, 2018

@plk What do you think about this? The only drawback to not requiring number to be an integer that I can see at the moment is sorting.

@plk
Owner

plk commented Mar 10, 2018

In fact, the entire sorting system was overhauled in a major way to use a more efficient and typed algorithm precisely to accommodate integer sorting for these fields. The field type for sorting can be modified by the user in the sorting template but I would like to have a numeric field. I can perhaps make some changes to accommodate ranges (sorting on the first number in the range) but for arbitrary non-numeric parts, I think a new field would be better. Numeric sorting is important and having a free-form field again reverts to alpha sorting which impacts performance and is much less algorithmically clean.
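The idea of sorting ranges on their first number could look roughly like the Python sketch below. This is an assumption about the approach, not Biber's implementation; the helper names `first_integer` and `sort_key` are hypothetical, and placing non-numeric values after all numeric ones is just one possible choice.

```python
import re


def first_integer(value):
    """Return the leading integer of a value such as '2-3', or None."""
    match = re.match(r"\s*(\d+)", value)
    return int(match.group(1)) if match else None


def sort_key(value):
    """Sort numeric values and ranges by their first integer.

    Values without a leading integer (e.g. 'S1') are placed after all
    numeric ones and compared alphabetically among themselves.
    """
    number = first_integer(value)
    if number is not None:
        return (0, number, "")
    return (1, 0, value)


ordered = sorted(["10", "2-3", "S1", "1"], key=sort_key)
print(ordered)
```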

@retorquere

So does that mean the issue field is back in favor?

@moewew
Collaborator Author

moewew commented Mar 10, 2018

Mhhh, so this solitary drawback is quite a massive one. At the moment, however, no sorting scheme sorts by number, so practically this should not be too problematic.

I'm not too keen on a new field, since volume+number has been an established combination for a very long time; squeezing a new one in would need changes in many places and probably hamper adoption.

I definitely do not want people to put ranges in the issue field. And I think it would be great if things like S1 would also be considered OK in the number field.

@plk
Owner

plk commented Mar 10, 2018

I see the point, so it would seem as simple as changing the datatype of number from integer to literal in the default data model? Since nothing sorts on this by default, I see no reason not to change the default and let people who really want number to be numeric restrict it with a custom data model?

@moewew
Collaborator Author

moewew commented Mar 10, 2018

That would be the best option in my book. Of course I would not mind if you looked into integer range sorting, but that would not solve the core problem here (plus I appreciate that you have better things to do ...).

@retorquere

So what is left for the issue field then?

@moewew
Collaborator Author

moewew commented Mar 10, 2018

Good question. I would use issue only for subdivisions of a year, say "summer", "Michaelmas term", ...; in that regard it's probably more like the season part of the date (hence the position), whereas number is a subdivision of a volume. The only useful values for number are integer ranges and a few special things like "supplement"/"special issue", possibly followed by a number. I don't know if there is a good way to determine algorithmically where to put what.

@retorquere

Unfortunately for me, determining algorithmically where to put what is exactly what I'd have to do. I'll probably do something like:

  • consists only of numbers and dashes, numbers optionally preceded by S: number
  • appears in a configurable list of things like "supplement/special issue" with some sensible defaults (any ideas on what constitutes sensible defaults would be much appreciated): number
  • anything else: issue
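The three rules above can be sketched in Python as follows. The function name `target_field`, the default word list and the exact pattern are assumptions for illustration; this is not the actual Better BibTeX code.

```python
import re

# Digits and dashes only, each number optionally preceded by "S".
NUMERIC_RE = re.compile(r"S?\d+(?:-S?\d+)*")

# Hypothetical defaults; the list is meant to be user-configurable.
DEFAULT_NUMBER_WORDS = ("supplement", "special issue")


def target_field(value, number_words=DEFAULT_NUMBER_WORDS):
    """Decide whether a Zotero issue value goes to 'number' or 'issue'."""
    text = value.strip()
    if NUMERIC_RE.fullmatch(text):
        return "number"
    if text.lower() in number_words:
        return "number"
    return "issue"


for example in ("2-3", "S1", "Supplement", "Summer"):
    print(example, "->", target_field(example))
```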

@moewew
Collaborator Author

moewew commented Mar 11, 2018

Mhhhh... Since I really don't like issue I'd go for

  • ((S)?\d+(-(S)?\d+)?)(,\s*(S)?\d+(-(S)?\d+)?)* (excuse the crude RegEx) is number (so allow not only dashed ranges, but also those with commas).
  • The seasons are issue
  • (Optionally supplement/special issue/special + number are number)
  • Everything else is probably number, but you could actually ask the user to reconsider their choice here.

but really your choice is fine as well. There are only a few things that make sense here and everything else will give weird and unpleasant results regardless of what you go for. There may be better choices in specific cases for specific styles, but that is not something you should have to worry about.
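A rough Python sketch of this variant, under stated assumptions: the season list is a hypothetical English-only default, `classify` is an illustrative name, and the second return value merely flags the "ask the user to reconsider" case instead of prompting anyone. The pattern is the RegEx above with the capturing groups simplified.

```python
import re

# One item such as 3, S1, 2-3 or S1-S2; items may be separated by commas.
ITEM = r"S?\d+(?:-S?\d+)?"
NUMBER_RE = re.compile(rf"{ITEM}(?:,\s*{ITEM})*")

# Assumption: a hypothetical, English-only list of season names.
SEASONS = {"spring", "summer", "autumn", "fall", "winter"}


def classify(value):
    """Return (field, confident) for a raw issue designation."""
    text = value.strip()
    if NUMBER_RE.fullmatch(text):
        return ("number", True)
    if text.lower() in SEASONS:
        return ("issue", True)
    # Everything else is probably number, but worth a second look.
    return ("number", False)


for example in ("2-3", "1, 3-4", "Summer", "Suppl. 1"):
    print(example, "->", classify(example))
```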

@retorquere

I can't ask the user -- Zotero preps the export and hands me the references to convert, no user interaction possible. It also doesn't have a separate field for seasons, so if the above collapses to "seasons go to issue, all the rest goes to number", I'm still stuck with detecting seasons.

Another option is to dump everything in number and have the users enter data using the cheater syntax if they want something in the issue field, but that would mean that data would not show up anywhere but in the biblatex export, which would mean double work for the user.

If this discussion goes too far off track for this repo, please do let me know.

@moewew
Collaborator Author

moewew commented Mar 11, 2018

Let's head back over to retorquere/zotero-better-bibtex#925 to discuss this further.

moewew added a commit to moewew/biblatex that referenced this issue Mar 11, 2018
@moewew
Collaborator Author

moewew commented Mar 11, 2018

dev...moewew:numberint has hopefully all the necessary changes to turn number back into a literal.

@moewew
Collaborator Author

moewew commented Mar 18, 2018

#730 / b2d9097 have officially turned number into a literal field. The documentation no longer implies that only integer values are valid for number.

@moewew moewew added this to the v3.12 milestone Oct 21, 2018
@moewew
Collaborator Author

moewew commented Nov 6, 2018

biblatex 3.12 has been released and is available in TeX Live 2018 and MiKTeX now. That means that the changes discussed here have made it to the release version of biblatex.

@ThiloteE

If I may ask for your humble opinion,

Question:

Would it conform to biblatex if JabRef were to somehow fetch the (article-) number, move it to the number field and move the issue-number from the number field into the issue field?

"Short" summary and description of the problem:

Since the issue field is mainly intended for seasons according to the biblatex documentation, but publishers largely provide issue-numbers that are not seasons but integers, they put the issue-number into the number field, which conforms to biblatex, I think. Now I am unable to work out where, ideally, the article-number should be put. I personally have not yet encountered any dataset that puts both the issue-number and the article-number into the number field at the same time.

Some publishers just abstain from providing it in BibTeX-formatted data. Some other publishers have now even started putting the actual article-number into the pages field. For the latter, I can only speculate that this is because the citation style they prefer renders the issue-number and the article-number at the same time only if there is no page range present.

Additional info:

@moewew
Collaborator Author

moewew commented Jan 11, 2022

The whole issue vs number issue is a bit confusing.

Base BibTeX only has number and does not know an issue field. The BibTeX documentation btxdoc explains that

An issue of a journal or magazine is usually identified by its volume and number

and so in BibTeX you unambiguously use volume and number to specify the issue in which an article appeared (BibTeX was developed in the eighties, so this would be a printed issue we're talking about). The BibTeX standard styles print volume and number as

<journal>, <volume>(<number>)

biblatex added issue to the mix. The biblatex documentation had

This field is intended for journals whose individual issues are identified by a designation such as ‘Spring’ or ‘Summer’ rather than the month or a number.

and printed

<journal> <volume>.<number> (<issue> <year>)

Apparently, it was felt that the volume+number scheme alone was insufficiently flexible for all kinds of journal types. I find that the combination "<volume>.<number>" really only looks good when number is a number or a short alphanumeric designator, whereas the BibTeX "<volume>(<number>)" would also look OK-ish with slightly more complex number designators. So maybe that thought played a role. But maybe the new field was simply motivated by journals that traditionally don't use volume designations at all and just go with the publication year (Summer 2021 issue or 3/2021). In almost all real-world examples of @article entries that I have seen so far number was the best choice to represent the issue number.

Roughly speaking number subdivides volume and issue is much closer to subdividing year. I don't think I would want to say that issue is subordinate to number or vice versa. They sort of operate on a similar level.


But this is all from the good old 'we have a printed journal with page numbers' perspective. Once you throw electronic journals - where articles are identified via an article number and not a page range (within a printed issue) - into the mix, things get more interesting, because you get an additional number: the 'article number'.

In my opinion these article numbers should be rendered in pretty much the same position as page numbers, but they obviously should not be prefixed with "p."/"pp." or the like. The biblatex field for article numbers is eid. It is not particularly well known and I cannot guarantee that all contributed styles make sense of it (especially those following style guides, which may make no mention of article numbers). For a long time eid was only supported for @articles, but recently (#847, #1000) eid was added for all entry types for which it makes sense.

Base BibTeX has no corresponding field (see also https://tex.stackexchange.com/q/445888/35864), so I can understand that publishers put this into the pages field. But that may not come out nicely in all situations.


To answer your question: Moving the issue number to issue and article number to number would not be my preference, because the issue number is traditionally number and the article number is eid in biblatex.

@ThiloteE

Thank you so much! This made everything a little bit more clear.

I always wondered what eid is, but I never understood the explanation in the documentation and failed to associate it with article-number. I had rarely seen it included in bibliographic data, and since it is not the only type of ID that can be rendered via biblatex (e.g. DOI, ISSN, eprint, ...), I thought it must be something quite exotic.

Good to know!

@pauloney
Collaborator

pauloney commented Jan 11, 2022 via email

@moewew
Collaborator Author

moewew commented Jan 12, 2022

Comments on 308a69d would be appreciated.

@ThiloteE

ThiloteE commented Jan 12, 2022

With regard to 308a69d

  • A quick search on the net found: Instead of "issue-number" it should probably be "issue number". Sorry, you probably followed my spelling, but I make typos. I am not a native English speaker.

  • "by only enumerating articles or papers and not pages." and "(within a work that is not published in print and\slash or without page numbers)"

    An article having an eid does not necessarily mean that it has not been published in print or does not have page numbers or a page range, no? A publisher could print it, publish it online and also provide all three: number, pages and eid. I think this could be handled more flexibly. Is it not the job of the citation style to choose whether eid or pages is preferred for rendering, or in general which of the fields should be rendered?

    Just my thoughts, correct me if I am wrong.

moewew added a commit that referenced this issue Jan 12, 2022
@moewew
Collaborator Author

moewew commented Jan 12, 2022

Thanks for the comments.

  • I agree that "issue number" is probably better than "issue-number" (but then again, I'm not a native speaker either, so that probably does not mean much). As far as I can see, the documentation uses only "issue number" without the hyphen, so we should be good, right?
  • Agreed. There may be cases where you have an eid and pages, though my feeling would be that this would be uncommon, since once you assign meaningful page numbers, the eid is superfluous. I think we're good in the first passage, because that is just an example "This field may replace ...". This does not say it has to replace pages or that you cannot have both pages and eid. To me this gives the most salient use case of eid. I have rephrased the other passage.

1d01aa1

@ThiloteE

Looks good!

Sorry, I must have misread issue-number.

"This field may replace the \bibfield{pages} field for journals deviating from the classic pagination scheme of printed journals by only enumerating articles or papers and not pages."

The above sentence could be replaced with:

This field may replace---or be used along with---the \bibfield{pages} field for journals deviating from the classic pagination scheme of printed journals.

I think this would be more coherent with your fine lines provided in 1094 and 1831, but as long as people understand. Yours is fine too 😁
