
Pandoc filters are not working anymore with quarto #5392

Closed
4 of 5 tasks
FabienSe opened this issue May 3, 2023 · 26 comments
Labels
bug Something isn't working triaged-to Issues that were not self-assigned, signals that an issue was assigned to someone.

@FabienSe

FabienSe commented May 3, 2023

Bug description

I was using Quarto with a Pandoc filter. It was working great, but it no longer works.
I am working on a custom filter, developed with panflute (a library for writing pandoc filters in Python), that renders PlantUML diagrams as images.
A similar filter can be found here. I was trying to improve it.
I got the same error with that filter too.

I was using it via the filters option, like this, in an index.qmd file:

    ---
    execute:
      echo: false
    pagination_next: null
    filters:
      - custom-filter
    editor:
      render-on-save: true
    ---
    
    # Welcome for fun
    
    ```plantuml
    @startuml
    actor "User" as user
    @enduml
    ```

This file is in a quarto project to render it in a docusaurus website.

The code block used to be rendered into an image by my custom filter. I assume the filter was passed to pandoc directly, because it works fine with the pandoc CLI's --filter option.

Now I get the following error:
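For context, here is a minimal sketch of what a filter like this does, written against the raw Pandoc JSON AST with only the Python standard library instead of panflute. It is not FabienSe's filter or pandoc-plantuml-filter: the `plantuml -tsvg -pipe` command, the `plantuml-images/<sha1>.svg` naming (modeled on the pandoc log later in the thread), and the helper names are assumptions, and only top-level blocks are visited.

```python
#!/usr/bin/env python3
"""Sketch: replace ```plantuml code blocks with images (stdlib only)."""
import hashlib
import json
import os
import subprocess
import sys
from typing import Callable

# Assumption: directory and naming modeled on pandoc-plantuml-filter's log output.
IMAGE_DIR = "plantuml-images"


def plantuml_to_image(block: dict, render: Callable[[str, str], None]) -> dict:
    """Return a Para holding an Image for a 'plantuml' CodeBlock, else the block."""
    if block.get("t") != "CodeBlock":
        return block
    (ident, classes, _attrs), code = block["c"]
    if "plantuml" not in classes:
        return block
    digest = hashlib.sha1(code.encode("utf-8")).hexdigest()
    path = f"{IMAGE_DIR}/{digest}.svg"
    render(code, path)
    # Image element: [attr, alt-text inlines, [url, title]]
    image = {"t": "Image", "c": [[ident, [], []], [], [path, ""]]}
    return {"t": "Para", "c": [image]}


def run(doc: dict, render: Callable[[str, str], None]) -> dict:
    """Rewrite top-level blocks only; blocks nested in Divs etc. are skipped."""
    doc["blocks"] = [plantuml_to_image(b, render) for b in doc["blocks"]]
    return doc


def render_with_plantuml(code: str, path: str) -> None:
    """Assumption: a `plantuml` command on PATH that accepts -tsvg and -pipe."""
    os.makedirs(IMAGE_DIR, exist_ok=True)
    with open(path, "wb") as out:
        subprocess.run(["plantuml", "-tsvg", "-pipe"],
                       input=code.encode("utf-8"), stdout=out, check=True)


if __name__ == "__main__":
    # JSON filter protocol: AST in on stdin, modified AST out on stdout.
    json.dump(run(json.load(sys.stdin), render_with_plantuml), sys.stdout)
```

Passing `render` in as a parameter keeps the AST rewrite testable without PlantUML installed.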

> quarto preview /home/user/project/template_docs/docs/index.qmd --no-browser --no-watch-inputs
pandoc 
  to: >-
    markdown_strict+raw_html+all_symbols_escapable+backtick_code_blocks+fenced_code_blocks+space_in_atx_header+intraword_underscores+lists_without_preceding_blankline+shortcut_reference_links+autolink_bare_uris+emoji+footnotes+gfm_auto_identifiers+pipe_tables+strikeout+task_lists+tex_math_dollars+pipe_tables+tex_math_dollars+header_attributes+raw_html+all_symbols_escapable+backtick_code_blocks+fenced_code_blocks+space_in_atx_header+intraword_underscores+lists_without_preceding_blankline+shortcut_reference_links
  output-file: index.md
  standalone: true
  default-image-extension: png
  wrap: none
  html-math-method: webtex
  
metadata
  pagination_next: null
  editor:
    render-on-save: true
  
Error running filter /opt/quarto/share/filters/main.lua:
Error running filter /home/user/project/template_docs/docs/custom-filter:
Could not find executable /home/user/project/template_docs/docs/custom-filter
stack traceback:
        /opt/quarto/share/filters/main.lua:4026: in function </opt/quarto/share/filters/main.lua:4005>
        [C]: in ?
        [C]: in method 'walk'
        /opt/quarto/share/filters/main.lua:171: in function 'run_emulated_filter'
        /opt/quarto/share/filters/main.lua:449: in local 'callback'
        /opt/quarto/share/filters/main.lua:454: in upvalue 'run_emulated_filter_chain'
        /opt/quarto/share/filters/main.lua:495: in function </opt/quarto/share/filters/main.lua:476>
stack traceback:
        /opt/quarto/share/filters/main.lua:171: in function 'run_emulated_filter'
        /opt/quarto/share/filters/main.lua:449: in local 'callback'
        /opt/quarto/share/filters/main.lua:454: in upvalue 'run_emulated_filter_chain'
        /opt/quarto/share/filters/main.lua:495: in function </opt/quarto/share/filters/main.lua:476>

Operating system: WSL on Ubuntu 20.04.6
Quarto check:

> quarto check

[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.1: OK
      Dart Sass version 1.55.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.3.340
      Path: /opt/quarto/bin

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.11.3
      Path: /home/user/.pyenv/versions/3.11.3/bin/python3
      Jupyter: (None)

      Jupyter is not available in this Python installation.
      Install with python3 -m pip install jupyter

[✓] Checking R installation...........(None)

      Unable to locate an installed version of R.
      Install R from https://cloud.r-project.org/

Checklist

  • Please include a minimal, fully reproducible example in a single .qmd file. Please provide the whole file rather than the snippet you believe is causing the issue.
  • Please format your issue so it is easier for us to read the bug report.
  • Please document the RStudio IDE version you're running (if applicable) by providing the value displayed in the "About RStudio" main menu dialog.
  • Please document the operating system you're running. If on Linux, please provide the specific distribution.
  • Please provide the output of quarto check so we know which version of quarto and its dependencies you're running.
@FabienSe FabienSe added the bug Something isn't working label May 3, 2023
@cderv cderv added the triaged-to Issues that were not self-assigned, signals that an issue was assigned to someone. label May 3, 2023
@cderv
Collaborator

cderv commented May 3, 2023

Thanks for the report. Is custom-filter a Python script? Did you try adding it as an extension?

I pinged @cscheid who knows more about our filter execution layer.

@FabienSe
Author

FabienSe commented May 3, 2023

custom-filter is a Python package installed on my computer.

You can also test with pandoc-plantuml-filter, on which I based my custom-filter.

> pip list
Package                Version
---------------------- -------
argcomplete            3.0.5
click                  8.1.3
packaging              23.0
pandoc-plantuml-filter 0.1.2
pandocfilters          1.5.0
panflute               2.3.0
pip                    23.0.1
pipx                   1.2.0
PyYAML                 6.0
setuptools             65.5.0
userpath               1.8.0

This file was working with the previous version of Quarto.

    ---
    execute:
      echo: false
    pagination_next: null
    filters:
      - pandoc-plantuml
    editor:
      render-on-save: true
    ---
    
    # Welcome for fun
    
    ```plantuml
    @startuml
    actor "User" as user
    @enduml
    ```

I got the same error with it.

But the following command works:

> pandoc -s docs/index.qmd -o docs/index.html --filter pandoc-plantuml
[WARNING] Could not deduce format from file extension .qmd
  Defaulting to markdown
Created directory plantuml-images
Created image plantuml-images/8e377f6639c577b8eec950fe2b906a90812ee344.svg
[WARNING] This document format requires a nonempty <title> element.
  Defaulting to 'index' as the title.
  To specify a title, use 'title' in metadata or --metadata title="...".

@cscheid
Collaborator

cscheid commented May 3, 2023

Note the error message:

Could not find executable /home/user/project/template_docs/docs/custom-filter

Panflute should work (there are a number of caveats wrt custom AST nodes though); the issue here is that we are not finding the filter where we expect to, and we no longer resolve arbitrary executable files (like pandoc does).

We can reconsider that decision, but the easiest way for you to solve it right now is to add (eg) a python script to the path that then calls the files on your path.
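A minimal sketch of the wrapper suggested above, assuming the real filter is an executable named `custom-filter` installed on PATH (the name is hypothetical here). The wrapper is a `.py` file Quarto can find next to the document; it pipes the JSON AST from stdin through the executable and forwards any command-line arguments (Pandoc passes JSON filters the target format as the first argument). Note that this keeps the render dependent on the surrounding environment, which is what Quarto's restriction is meant to avoid.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: run a JSON filter that lives on PATH."""
import shutil
import subprocess
import sys

# Hypothetical name of the executable filter installed on PATH.
TARGET = "custom-filter"


def forward(executable: str, args: list[str], ast_json: bytes) -> bytes:
    """Pipe the JSON AST through `executable` and return what it prints."""
    path = shutil.which(executable)
    if path is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")
    result = subprocess.run([path, *args], input=ast_json,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout


if __name__ == "__main__":
    # Pass through any arguments (Pandoc gives the target format first).
    sys.stdout.buffer.write(forward(TARGET, sys.argv[1:], sys.stdin.buffer.read()))
```

Saved as, say, `custom-filter.py` beside the `.qmd`, it would be listed as `- custom-filter.py` under `filters:`. Importing the package directly from Python avoids the extra process, at the cost of coupling the wrapper to the package's internals.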

@cderv
Collaborator

cderv commented May 3, 2023

and we no longer resolve arbitrary executable files (like pandoc does).

This was the missing piece for me. I wasn't aware of that change. Thanks for the clarification.

@cscheid
Collaborator

cscheid commented May 3, 2023

This was the missing piece for me. I wasn't aware of that change. Thanks for the clarification.

We have to wrap user filters so they can handle custom AST nodes, and that requires special code. The last time I thought through this I came to the conclusion that it would be hard to do this portably in Lua, but I could be wrong.

I also think that allowing a filter from the user's PATH goes against quarto's philosophy of keeping everything important in the project source, so I'm actually happy with this new restriction. (Otherwise, the result of rendering this file would be non-obviously dependent on the surrounding environment.)

@cderv
Collaborator

cderv commented May 3, 2023

I also think that allowing a filter from the user's PATH goes against quarto's philosophy of keeping everything important in the project source, so I'm actually happy with this new restriction. (Otherwise, the result of rendering this file would be non-obviously dependent on the surrounding environment.)

Yes, I agree. It is just a big difference from how it works with Pandoc itself, so I understand why users question this in a "pandoc wrapper".

Our extension system already supports non-Lua filters, right? That seems to be the case according to our docs.

Extensions seem to be the right way to use (any) filter with Quarto. If that is not already possible, should we adapt our extension system to support it?

@FabienSe did you try using the extension form to use your filter with a Quarto project?

On the Lua filter side, some new organizations follow the _extensions layout to be Quarto-compatible, for example: https://github.com/pandoc-ext/abstract-section

@FabienSe
Author

FabienSe commented May 3, 2023

Thanks @cderv and @cscheid for your answers.

the easiest way for you to solve it right now is to add (eg) a python script to the path that then calls the files on your path.

I succeeded with the following Python script, named custom-filter.py and located at the path Quarto calls:

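# Wrapper Quarto can run directly: it imports the installed package instead
# of relying on the custom-filter executable being resolved from PATH.
from custom_filter.main import main
from panflute import Doc

def test(doc: Doc = None) -> Doc:
    # Delegate to the package's entry point; with doc=None, a panflute-style
    # main() usually reads the JSON AST from stdin and writes it to stdout.
    return main(doc=doc)

if __name__ == "__main__":
    test()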

The document metadata seems to be wrong: the provided document format is
$DocMetadata(doc_format='/opt/quarto/share/filters/customwriter/customwriter.lua'), but that is not a problem for me now because I do not do any format-specific processing.

I also changed the filter definition to custom-filter.py in the qmd file.

did you try using the extension form to use your filter with a Quarto project?

I have not tested it yet.
I have to say I liked panflute as a way to write pandoc filters in Python, because I do not know Lua.
I would like to use a Quarto extension, but I did not see how to use a non-Lua filter in the linked documentation.

@cderv
Collaborator

cderv commented May 3, 2023

I would like to use a Quarto extension, but I did not see how to use a non-Lua filter in the linked documentation.

Yes, we don't document it with an example, but we say:

You can write Pandoc filters using Lua (via Pandoc’s built-in Lua interpreter) or using any other language using a JSON representation of the Pandoc AST piped to/from an external process. We strongly recommend using Lua Filters

So I am assuming we already support other types of filters. @dragonstyle, do we support non-Lua filters in extensions? Do we have an example somewhere?

@cscheid
Collaborator

cscheid commented May 3, 2023

So I am assuming we already support other types of filters. @dragonstyle, do we support non-Lua filters in extensions? Do we have an example somewhere?

Extensions won't know the difference, this distinction happens entirely inside of the Lua filter chain now. (quartodoc used to work like that but I helped those folks convert it to a Lua filter exactly because of these limitations.)

@cderv
Collaborator

cderv commented May 3, 2023

I also think that allowing a filter from the user's PATH goes against quarto's philosophy of keeping everything important in the project source

I was thinking that using an extension would make it easier to find non-Lua filters, as they would be relative to the project rather than on the user's PATH.

I understand now:

  • Quarto does not resolve filters from the PATH as pandoc does, so non-Lua filters need to be loaded a bit differently
  • They can still be used with Quarto through a wrapper script
  • However, providing the filter locally through an extension will not, by itself, help much

I'm not sure it is all crystal clear to me yet. Anyway, as always, your precision helped me understand the changes better.

Thanks!

@cscheid
Collaborator

cscheid commented May 3, 2023

The document metadata seems to be wrong: the provided document format is
$DocMetadata(doc_format='/opt/quarto/share/filters/customwriter/customwriter.lua'), but that is not a problem for me now because I do not do any format-specific processing.

Unfortunately that metadata is not wrong, and the behavior is currently inevitable; this is the output format in quarto (because reasons). That's one of the problems with using non-Lua filters in quarto. We have Lua APIs for checking formats, but we don't yet have language-agnostic APIs in filters. We intend to do that, but not in the near future.

@cscheid cscheid added this to the v1.5 milestone May 3, 2023
@FabienSe
Author

FabienSe commented May 4, 2023

Thanks again for your answers @cderv and @cscheid.

In the short term, I will keep the quick solution: an additional Python script that calls my custom script.

I looked at the discussion in quartodoc regarding the limitations of non-Lua filters.
Later, to improve my filter and turn it into a Quarto extension, I will look at migrating to Lua and learning the language.

Is it possible to do things like create folders or files, or call external command-line programs, in Lua?

@cderv
Collaborator

cderv commented May 4, 2023

Is it possible to do things like create folders or files, or call external command-line programs, in Lua?

Yes, it is possible. Look at our docs at https://quarto.org/docs/extensions/lua-api.html and the other docs linked there:

  • You can use Lua libraries such as the io library for file handling

    The I/O library provides two different styles for file manipulation: one uses implicit file handles and the other explicit handles.

  • You can also access the Pandoc API, which has a pandoc.pipe() function to run external programs

  • Quarto also exposes a specific Lua API for better integration with Quarto features.

In addition to the quartodoc example, here is another: shinylive calls a custom program (from a Python package) from within its Lua filter.

You can do a lot with Lua !

@stefanv

stefanv commented May 4, 2023

I can confirm that this is a backward incompatible change in Quarto that has not been advertised. We had filters installed via a Python package, and that simply had to be on the path. Those filters used to work, but no longer do.

We will also follow the Python wrapper approach for now, but this is perhaps a common enough use case that it warrants a section in the docs?

@matthew-brett

matthew-brett commented May 6, 2023

Just to say - the ability to use Panflute for filters was a big sell for us, as Python programmers using Quarto. It made it easy and pleasant to switch from e.g. Jupyter-book. Forcing the use of Lua would be a significant barrier to uptake, I suspect - because filters are so important to customising output.

@cscheid
Collaborator

cscheid commented May 8, 2023

Let me be clear: you can 100% still use panflute. The only change is that binaries on your path are no longer interpreted as JSON filters. All you need to do is provide a filter.py file that does the processing you want.

@albert-ying

This comment was marked as resolved.

@cscheid

This comment was marked as resolved.

@matthew-brett

matthew-brett commented May 9, 2023

I opened a new issue for another nil error above - maybe related, maybe not:

@ChrisJefferson
Contributor

ChrisJefferson commented Dec 17, 2023

Is there an easy / documented way to wrap an executable pandoc filter?

I want to use the filter pandoc-katex, which is a Rust program. I can run it by adding --filter pandoc-katex to the render command line, but I couldn't figure out how to put it in _extensions, or whether that's possible.

Would it be reasonable to look in _extensions for a raw executable / symbolic link?

@cscheid
Collaborator

cscheid commented Dec 18, 2023

@ChrisJefferson Interesting. Did you try just adding pandoc-katex to your extension's filter declaration (instead of foo.lua)? The following syntax works for sure in single documents:

---
title: Test Python Filter
filters:
  - type: json
    path: behead.py
---

## This is a test

## This should be em

And I think we use the same codepath in extensions. Here, behead.py is a generic +x file that we run through a JSON filter.
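The thread doesn't show `behead.py`; here is a hypothetical stdlib-only version modeled on the behead example in Pandoc's filter documentation, which turns level-2-and-deeper headers into emphasized paragraphs. It reads the JSON AST from stdin and writes it to stdout, so it needs a shebang and the executable bit (`chmod +x behead.py`) to be run as a `type: json` filter.

```python
#!/usr/bin/env python3
"""behead.py (hypothetical): level-2+ headers become emphasized paragraphs."""
import json
import sys


def walk(node, action):
    """Rebuild `node`, letting `action` replace any AST element.

    An element is a dict with a "t" key. `action` returns a replacement or
    None to keep the element; children are then walked in either case.
    """
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            replacement = action(node)
            if replacement is not None:
                node = replacement
        return {key: walk(value, action) for key, value in node.items()}
    return node


def behead(element):
    """Turn a Header of level >= 2 into Para [Emph <header text>]."""
    if element["t"] == "Header":
        level, _attr, inlines = element["c"]
        if level >= 2:
            return {"t": "Para", "c": [{"t": "Emph", "c": inlines}]}
    return None


if __name__ == "__main__":
    # JSON AST arrives on stdin; the transformed AST goes to stdout.
    json.dump(walk(json.load(sys.stdin), behead), sys.stdout)
```

Writing against the raw JSON avoids any third-party dependency; panflute or pandocfilters provide the same walk with friendlier element classes.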

@ChrisJefferson
Contributor

It works if I write path: _extensions/pandoc-katex, or if I put pandoc-katex in the same directory as my source and use pandoc-katex. In fact (and this is what I'll use for now, unless it's horrible), the easiest thing to write is:

- _extensions/pandoc-katex

Then put a symlink to pandoc-katex in my _extensions directory.

I'd prefer not to put filters in the same directory as my .qmd files, but it does work.

@cscheid
Collaborator

cscheid commented Dec 19, 2023

Hm. This looks like a bug on our side.

@jonassmedegaard

Do I understand correctly that the following should currently work for Rust filters?

---
title: Test Rust Filter
filters:
  - _extensions/pandoc-katex
---
Foo

I ask because it is unclear from the conversation above whether that was the exact syntax proposed, and whether it was confirmed to work. Also, when I try that syntax with pandoc-filter-diagram, it fails like this:

FATAL (/opt/quarto/share/filters/main.lua:3369) An error occurred:
Could not run /home/jonas/Projects/RUC/PROJECTS/piller/report/../_extensions/pandoc-filter-diagram as a JSON filter.
Please make sure the file exists and is executable.

Did you intend 'pandoc-filter-diagram' as a Lua filter in an extension?
If so, make sure you've spelled the name of the extension correctly.

The original Pandoc error follows below.
Error running filter /home/jonas/Projects/RUC/PROJECTS/piller/report/../_extensions/pandoc-filter-diagram:
Filter returned error status 101

@ChrisJefferson
Contributor

ChrisJefferson commented Dec 28, 2023

To be clear, the following is not recommended Quarto practice; it might not work and may break in the future, because I'm just someone getting my filter working, not giving official Quarto guidance. Long term, I imagine there may well be better ways of doing this.

Having said that, here's what I did in your situation. I'm assuming you've installed pandoc-filter-diagram globally; you can check that by running which pandoc-filter-diagram. On my computer I get:

> which pandoc-filter-diagram
/home/caj/.cargo/bin/pandoc-filter-diagram

You can put a link to it in the current directory (ln -s makes a symbolic link, and $(...) substitutes the output of that command; you could copy and paste the path instead if you like).

ln -s $(which pandoc-filter-diagram) ./pandoc-filter-diagram

Then you can just refer to it, like this:

---
title: "Example"
filters:
  - pandoc-filter-diagram
---
## Heading

The _extensions part is just for neatness; I did:

mkdir _extensions
ln -s $(which pandoc-filter-diagram) _extensions/pandoc-filter-diagram

That way the pandoc-filter-diagram link lives in a subdirectory rather than being mixed in with my .qmd files.

@cscheid
Collaborator

cscheid commented Jun 11, 2024

I really don't think there's a bug here; if there is one, I'd like to ask folks to open a new issue.

Here's a specific example of using a binary inside a Quarto filter extension:

$ cat example.qmd
---
title: "Issue-5392 Example"
filters:
  - issue-5392
---

## Heading

This filter replaces the entire document with a pre-computed "Foo. This is silly." document.
$ cat _extensions/issue-5392/_extension.yml
title: Issue-5392
author: Carlos Scheidegger
version: 1.0.0
quarto-required: ">=99.9.0"
contributes:
  filters:
    - an_executable_filter
$ cat _extensions/issue-5392/an_executable_filter
#!/usr/bin/env python3

import sys
sys.stdout.write('{"pandoc-api-version":[1,23,1],"meta":{},"blocks":[{"t":"Header","c":[2,["foo",[],[]],[{"t":"Str","c":"Foo"}]]},{"t":"Para","c":[{"t":"Str","c":"This"},{"t":"Space"},{"t":"Str","c":"is"},{"t":"Space"},{"t":"Str","c":"silly."}]}]}')
$ quarto render example.qmd --to md -o -
pandoc -o /var/folders/nm/m64n9_z9307305n0xtzpp54m0000gn/T/quarto-session1a14ffba7d4ce1c0/c58737780b146d7f/281a41bec75ca8b3.md
  to: >-
    markdown_strict+raw_html+all_symbols_escapable+backtick_code_blocks+fenced_code_blocks+space_in_atx_header+intraword_underscores+lists_without_preceding_blankline+shortcut_reference_links
  standalone: true
  default-image-extension: png

metadata
  title: Issue-5392 Example



## Foo

This is silly.

In this example, an_executable_filter has 755 permissions so it can be executed. Quarto is successfully finding that binary inside the extension folder and running it.

@cscheid cscheid closed this as completed Jun 11, 2024
8 participants