Improve "title" page in docx by using DocProperties instead of simple text #5839
Comments
Sounds like a good idea: please do let us know what changes would be needed in the XML. |
On second thought: one issue may be that titles can contain formatting, but DocProperties cannot. |
You may be right, and we could lose complex formatting on the title if we inserted it as a DocProperty. Personally I think it is more valuable to have the title as a Property than to support complex formatting there (very simple formatting can be achieved with the Title style, which applies to the whole title), but I understand others could need complex formatting in the title text (so this would be a breaking change). Unless we find a sensible solution for both I guess this is currently a no-go 🤔 |
How about inserting the property if it's an unformatted string and the formatted title as-is otherwise? |
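To make the fallback idea concrete, here is a minimal sketch of the decision it implies. The `Inline` type and its `formatted` flag are hypothetical stand-ins for the title's inline content, not pandoc's actual data model; the only point is the rule "use a property reference when the title is plain, keep the formatted text otherwise".

```python
from dataclasses import dataclass


@dataclass
class Inline:
    """Hypothetical inline element: a piece of text plus a formatting flag."""
    text: str
    formatted: bool = False  # True for emphasis, strong, code, links, ...


def is_plain(inlines):
    """A title is plain when none of its inline elements carries formatting."""
    return not any(item.formatted for item in inlines)


def title_strategy(inlines):
    """Return 'field' if the title could be a property reference, else 'text'."""
    return "field" if is_plain(inlines) else "text"


plain_title = [Inline("Annual"), Inline(" "), Inline("Report")]
rich_title = [Inline("Annual "), Inline("Report", formatted=True)]

print(title_strategy(plain_title))
print(title_strategy(rich_title))
```

Note that the output switches from one representation to the other as soon as a single formatted element appears in the title.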
I'm not sure. That might lead to some unpredictability -- people expecting the things to be in sync based on past behavior, and then this breaking when a bit of formatting is added. How does it work, anyway? If we insert the property, does that mean they can no longer manually edit the title? Or does editing the title affect the property? In the latter case, what happens if they do try to add formatting? If editing the title isn't possible, that would be bad I think. |
Quick testing with a Word document (no pandoc involved):
It seems that if we want rich and complex styling in the document text, that text cannot be tied to the document properties. On the other hand, if the two are not tied, the document metadata and the text can diverge unless you always take care to modify both together when editing in Word. |
Given this, I think we should just keep things as they are. |
I would like to reopen discussion of this point with the suggestions below. I would be willing to take a stab at the coding if a pull request would be likely to be accepted.
Extension
How about adding an extension for the docx writer (for example
XML
With the extension enabled, the generated XML would look something like:
Author formatting
Multiple authors will not be on individual lines, but separated by semicolons (as per how pandoc populates the author docproperty). I think this is an acceptable trade-off.
Abstract formatting
I feel that dropping formatting for title, subtitle, author, date is a worthwhile trade-off (and expected for anyone who has use for this extension). For abstract I am not so sure. Generally speaking, since the abstract is included in the docproperties I think it would be nice to be able to link the representation to the property. But perhaps removing formatting in the abstract is too heavy-handed for many cases, in particular since it also removes paragraph separations. Perhaps a second extension
As a side note: paragraphs in the abstract get mashed together without any separator in the docproperties. That is, the example below becomes
Example
Here is a document for testing purposes. Just run it through pandoc to a docx.
In the resulting document the FYI this is what you will get (after updating the fields), with the frontmatter generated by pandoc appearing first followed by the openxml contents: |
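The flattening described in the proposal above (authors joined by semicolons, abstract paragraphs run together with no separator) can be sketched as follows. This is only an illustration of the behaviour as reported in this comment, not pandoc's code. The function names are made up, and the space after each semicolon is an assumption.

```python
def author_property(authors):
    """Join authors into one string, separated by semicolons.

    The "; " separator (with a trailing space) is an assumption.
    """
    return "; ".join(authors)


def abstract_property(paragraphs):
    """Concatenate abstract paragraphs with no separator, as the side note reports."""
    return "".join(paragraphs)


print(author_property(["Jane Doe", "John Roe"]))
print(abstract_property(["First paragraph.", "Second paragraph."]))
```

The second call shows why linking the abstract to its property is the questionable case: the paragraph break disappears.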
@jgm, my comment/proposal seems to have gone unnoticed. Should I open a new issue to get attention? (I guess I will if I don't hear anything within the next few days.) Thanks! |
Sorry, I haven't had a chance to think about this. But we can reopen the issue. |
My thinking was that in such cases one simply wouldn't use |
I'm still unclear about the motivation for the change. The stated motivation is to allow one to change the title, author, or abstract in just one place, rather than having to change it both in the properties and in the document itself. This is an issue that would come up only if you use pandoc to generate the docx and then do further work on the docx itself (rather than regenerating again from a markdown source). I tried inserting the Title property into a document using Word. When I then modified the property, the document didn't update. And modifying the rendered field in the document didn't update the property. I must be missing something. |
What @agusmba said. I struggle to find where to do that on mac, might be one of those things where the “insert title” function is available only on Windows, but the field itself actually works on mac. Anyway, you can use pandoc with the example in #5839 (comment) (just edited to correct indentation in the yaml metadata) to see what it looks like. Here is a screenshot on mac of the editable field you get for title (and authors): |
Yes. Unfortunately, in a professional environment I often generate a docx with pandoc that is then shared and further edited by non-pandoc folks. As a side note: even if fields such as date do not auto-sync with the property, the fact that it is a field will indicate to collaborators that they should update in properties and then update the field (and any date fields in headers/footers) rather than doing “inline changes” of each date. At least with a modicum of instruction. My motivation is that it makes it more likely that people will work with the document in Word in such a way that the document contents and properties stay in sync. |
Thanks for explaining the motivation further. I would prefer not to add an extension. What about a convention like this?
|
Or perhaps:
|
Sure. The first one is nice in that it allows one to be selective about which properties to make into fields, although it will be a bit verbose when desired for all or most fields. I think I could live with that. :) |
Is |
Maybe |
This is a request for enhancement related to how pandoc creates the "title" page in Word.
Currently the metadata values used in docx's title page are inserted as text (title, author, date, subtitle, abstract), and they are also included as document properties.
While this looks good if your workflow stops there, it's not so convenient if you modify the docx later on and want to change any of those properties. Basically you'd need to change them twice (once in the text on the title page and again as a document property).
The request would be to make pandoc write the title page using DocProperty references for these values instead of using simple text, allowing future evolution of the docx by changing only the docx metadata (no need to re-type the title, etc.)
If needed, I could analyze the differences between the new and current approaches at the XML level.
Thanks!
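As a rough illustration of the XML-level difference, the sketch below builds a title paragraph twice: once as plain text and once wrapped in a WordprocessingML simple field (`w:fldSimple`) whose instruction is Word's `TITLE` field code. It is not pandoc's actual output. The helper names are hypothetical, and the use of the `Title` paragraph style and of `TITLE` rather than a `DOCPROPERTY` instruction are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def title_paragraph(text, as_field):
    """Build a title paragraph, as plain text or as a TITLE field (sketch only)."""
    p = ET.Element(w("p"))
    ppr = ET.SubElement(p, w("pPr"))
    ET.SubElement(ppr, w("pStyle"), {w("val"): "Title"})  # assumed style id
    parent = p
    if as_field:
        # The run inside the field holds the cached result that Word displays
        # until the field is updated from the document property.
        parent = ET.SubElement(p, w("fldSimple"), {w("instr"): " TITLE "})
    run = ET.SubElement(parent, w("r"))
    ET.SubElement(run, w("t")).text = text
    return p


plain = ET.tostring(title_paragraph("My Report", as_field=False), encoding="unicode")
field = ET.tostring(title_paragraph("My Report", as_field=True), encoding="unicode")
print(plain)
print(field)
```

In the field variant the visible text is only a cached value, which is why formatting applied to part of the title would not survive a field update.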