No reference to equations in docx output #1

Closed
deb75 opened this issue Dec 24, 2024 · 5 comments
Labels
enhancement New feature or request

Comments


deb75 commented Dec 24, 2024

I use this pandoc filter in the process of translating org (emacs) files to docx.

Here is an example org file:

#+TITLE: Test
#+PANDOC_OPTIONS: number-sections:t
#+PANDOC_OPTIONS: toc:t
#+PANDOC_OPTIONS: citeproc:t
#+PANDOC_OPTIONS: filter:pandoc-tex-numbering
#+PANDOC_METADATA: number-sections:False
#+PANDOC_METADATA: equation-prefix:Eq


* First section

\begin{equation}
\label{eq:1}
\sqrt{2}
\end{equation}


\begin{equation}
\label{eq:2}
a=2
\end{equation}

This is equation \ref{eq:1}

and this is equation \ref{eq:2}

Raw LaTeX equations can be written directly in Emacs org files. The native cross-reference syntax is slightly different, but LaTeX syntax can still be used.

The output docx file is attached:
test.docx

Equations get labeled, but references to them do not appear.

I also tried it on the command line:

printf "\\\begin{equation}\\\label{eq:1}a=b\\\end{equation} this is \\\ref{eq:1} hello" | pandoc --from latex -t native  --filter=pandoc-tex-numbering

and I get the output :

[ Div
    ( "eq:1" , [] , [] )
    [ Para
        [ Math DisplayMath "\\label{eq:1}a=b\\qquad{(0.1)}"
        , Space
        , Str "this"
        , Space
        , Str "is"
        , Space
        , Link
            ( ""
            , []
            , [ ( "reference-type" , "ref" )
              , ( "reference" , "eq:1" )
              ]
            )
            [ Str "0.1" ]
            ( "#eq:1" , "" )
        , Space
        , Str "hello"
        ]
    ]
]

which seems correct, so pandoc-tex-numbering is at least doing its job. The curious thing is that each LaTeX command needs three backslashes (\\\) on the command line.
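The three backslashes are most likely a shell effect rather than anything pandoc does. Assuming a POSIX-style shell such as bash: inside double quotes the shell collapses `\\` to `\`, so printf receives `\\begin`, and printf then treats its first argument as a format string and collapses `\\` once more. With fewer backslashes, printf would read sequences like `\b` or `\r` as control characters. A minimal sketch that avoids the escaping (the variable name `latex_input` is only for illustration):

```shell
# Single quotes stop the shell from touching backslashes, and passing the
# text as an argument to '%s' stops printf from interpreting \b, \r, etc.
latex_input='\begin{equation}\label{eq:1}a=b\end{equation} this is \ref{eq:1} hello'
printf '%s\n' "$latex_input"
# prints: \begin{equation}\label{eq:1}a=b\end{equation} this is \ref{eq:1} hello

# The same text can then be piped to pandoc exactly as in the command above:
# printf '%s\n' "$latex_input" | pandoc --from latex -t native --filter=pandoc-tex-numbering
```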

However, if I change the input format to org while keeping the very same input:

 printf "\\\begin{equation}\\\label{eq:1}a=b\\\end{equation} this is \\\ref{eq:1} hello" | pandoc --from org -t native  --filter=pandoc-tex-numbering

I get this output:

[ Para
    [ Emph [ Str "a" ]
    , Str "\8196"
    , Str "="
    , Str "\8196"
    , Emph [ Str "b" ]
    , Space
    , Str "this"
    , Space
    , Str "is"
    , Space
    , RawInline (Format "latex") "\\ref{eq:1}"
    , Space
    , Str "hello"
    ]
]

In this case, pandoc-tex-numbering does not seem to do anything.

What am I doing wrong? And is it possible to extend pandoc-tex-numbering to org files?

Owner

fncokg commented Dec 24, 2024

Short answer

I fixed this with a Lua filter. Download the org_helper.lua filter and use:

pandoc --lua-filter org_helper.lua --filter pandoc-tex-numbering.py input.org -o output.docx

This works for me on your test file, but I have no other org files or projects to test with. Please try it on your real project and let me know the result.

Long answer

The problem originates in pandoc's org reader rather than in this filter. By default, the org reader does not parse LaTeX code: equations in equation environments and cross-references written with \ref{} come through as RawBlock and RawInline nodes, whereas this filter needs Math and Link nodes respectively. The org_helper.lua filter re-reads these raw nodes with the latex reader, after which the pandoc-tex-numbering.py filter works as expected.

Related discussion can be found in pandoc issue #1764; the code in org_helper.lua is based on comments from @tarleb in that issue.

Owner

fncokg commented Dec 24, 2024

You can also compare the behavior of the org reader with and without the org_helper.lua filter (in PowerShell):

Equations

With:

echo "\begin{equation} a=b \end{equation}" | pandoc -f org -t native

You get:

[ RawBlock
    (Format "latex") "\\begin{equation} a=b \\end{equation}\n"
]

With:

echo "\begin{equation} a=b \end{equation}" | pandoc -f org -t native --lua-filter org_helper.lua

You get:

[ Div ( "" , [] , [] ) [ Para [ Math DisplayMath "a=b" ] ] ]

Reference

With:

echo "\ref{eq:1}" | pandoc -f org -t native

You get:

[ Para [ RawInline (Format "latex") "\\ref{eq:1}" ] ]

With:

echo "\ref{eq:1}" | pandoc -f org -t native --lua-filter org_helper.lua

You get:

[ Para
    [ Link
        ( ""
        , []
        , [ ( "reference-type" , "ref" )
          , ( "reference" , "eq:1" )
          ]
        )
        [ Str "[eq:1]" ]
        ( "#eq:1" , "" )
    ]
]

Author

deb75 commented Dec 24, 2024

Thank you very much for addressing this issue 👍

Another workaround that works is to first convert the org file to tex, then run pandoc with the pandoc-tex-numbering filter from latex to docx. But of course, a direct solution is always better.

One last adjustment that could improve your solution: cross-references in org-mode are usually made with org-ref. A reference to an equation reads as follows:

This is equation eqref:eq:1

This does not work: "eqref:eq:1" is treated as plain text. But I guess another Lua filter could turn any "eqref:xxxx" occurrence into "\ref{xxxx}", something like this:

text = string.gsub(text, "eqref:(.-)", "\\ref{%1}")

Any hint on how to embed this in the appropriate Lua filter function is welcome.

Regards

@deb75 deb75 closed this as completed Dec 24, 2024
@fncokg fncokg added the enhancement New feature or request label Dec 24, 2024
@fncokg fncokg reopened this Dec 24, 2024
Owner

fncokg commented Dec 24, 2024

The eqref issue has been solved in commit 83fd3a8, via the solution you mentioned.

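-- Rewrite org-ref style "eqref:label" words into \ref{label}, then let the
-- latex reader parse the result.
function Str (elem)
  local text, count = string.gsub(elem.text, "eqref:(.*)", "\\ref{%1}")
  if count > 0 then
    -- Return the first inline of the first block produced by the latex reader.
    return pandoc.read(text, "latex").blocks[1].content[1]
  else
    return elem
  end
end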
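To check what this kind of substitution does to a line of text without running pandoc, a rough shell equivalent can help. This is only a sketch with sed, not the code from the commit; the helper name `rewrite_eqref` and the set of characters allowed in a label are assumptions. Note one difference: in the Lua pattern, `(.*)` captures everything up to the end of the string, so a word such as `eqref:eq:1.` would carry its trailing period into the braces, while the sketch below stops at the first character that is not part of a label.

```shell
# Hypothetical helper: rewrite org-ref style "eqref:LABEL" into "\ref{LABEL}".
# Assumption: labels use only letters, digits, ':', '_' and '-'.
rewrite_eqref() {
  sed -E 's/eqref:([A-Za-z0-9:_-]+)/\\ref{\1}/g'
}

printf '%s\n' 'See eqref:eq:1, then (eqref:eq:2).' | rewrite_eqref
# prints: See \ref{eq:1}, then (\ref{eq:2}).
```

The same character-class idea could be carried over to the Lua pattern if trailing punctuation turns out to be a problem in real documents.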

Owner

fncokg commented Dec 24, 2024

I'm not familiar with org-mode, but further features can be added the same way (by replacing them with the equivalent LaTeX macros).

@fncokg fncokg closed this as completed Dec 24, 2024