Pandoc 2.x renders images' alternative texts in an inaccessible fashion #6491

jmuheim · 2020-06-30T11:54:23Z

As described on Stack Overflow (https://stackoverflow.com/questions/62639927/pandoc-2-x-renders-images-alternative-texts-in-an-inaccessible-fashion?noredirect=1#comment110781365_62639927), Pandoc 2.x renders images' alternative texts in an inaccessible fashion. I was advised there to request a fix here.

Here's the original post:

Since I upgraded from Pandoc v1.19 to 2.9, decorative images are not exported as expected anymore.

First of all, when generating HTML from `![](test.jpg)`, v1.19 wrapped a `<p class="figure">` structure around the image, but now it's only a `<p>`:

```html
<p>
  <img src="test.jpg">
</p>
```

This makes it harder to style in line with other images that have an alternative text.

But what's really a problem here: no `alt=""` attribute is produced anymore! This means that screen readers, for example, will no longer recognise this as a decorative image.

So let's see what happens to an image with an actual alternative text, e.g. when generating HTML from `![Hello](test.jpg)`:

```html
<div class="figure">
  <img src="test.jpg" alt="">
  <p class="caption">Hello</p>
</div>
```

Here we get a `class="figure"` on the surrounding element, but now it's a `<div>` instead of a `<p>` (I don't care too much about this, but again, it makes it harder to style everything consistently).

What is again a big problem, though, is that the `alt` attribute is now empty: this prevents screen readers from perceiving the image at all, which is horribly wrong! I guess Pandoc concludes that having both an alternative text and a caption would be redundant, which is correct, and that the caption below is the right thing to expose, which it is not.

The right structure would look something like this:

```html
<div class="figure">
  <img src="test.jpg" alt="Hello"><!-- Leave the alternative text on the image -->
  <p class="caption" aria-hidden="true">Hello</p><!-- Hide the redundant visual alternative text from screen readers -->
</div>
```

Any reason why this behaviour would make sense? Can it be changed somehow? Otherwise I will have to fiddle around with some post-processing JavaScript...


tarleb · 2020-06-30T12:41:03Z

I started to implement this, but paused because it would cause pandoc to produce invalid XHTML when targeting HTML4. @jmuheim, do you know of a good workaround for HTML4?

On the other hand, we already produce invalid XHTML for any document that includes code blocks, as line numbers carry the `aria-hidden="true"` attribute.

jmuheim · 2020-06-30T12:54:48Z

Interesting. You mean because aria-hidden has a dash in the attribute name, right?

I don't know of a good technical workaround. I could imagine something like this, which would work in some situations:

```html
<figure>
  <img src="..." alt="See below" />
  <figcaption>Bla bla bla</figcaption>
</figure>
```

But this isn't really a general solution.

In my honest opinion, though, it is much more important not to programmatically exclude users (especially users with special needs, who already suffer plenty of inconveniences) than to avoid minor code invalidities. And since, as you say, there is already some `aria-hidden` in code blocks in HTML4, we definitely shouldn't hesitate to add it for alternative texts as well.

mb21 · 2020-06-30T13:02:23Z

Is this issue only about HTML4 output? I think much of the reason we do things the way we do is that in HTML5 (which is the default), we produce a `figure` tag...

> I guess that Pandoc concludes that having alternative text and caption would be redundant,

Yes.

> and that the caption below would be the right thing to show - which it is not

Well... why not? HTML5 output is:

```html
<figure>
  <img src="foo.jpg" alt="" />
  <figcaption>bar</figcaption>
</figure>
```

jmuheim · 2020-06-30T13:10:47Z

> Well... why not? HTML5 output is:
>
> ```html
> <figure>
>   <img src="foo.jpg" alt="" />
>   <figcaption>bar</figcaption>
> </figure>
> ```

As far as I know, screen readers always treat images with an empty `alt` attribute as purely decorative, so the user will never know about them. For instance, screen readers will not include them in a list of images or in any other functionality they offer.

While it may seem counterintuitive to sighted people, blind people also make use of images, e.g. saving them to their hard drive or uploading them to social media. So we should never prevent them from accessing the same elements as everyone else.

tarleb · 2020-06-30T13:16:46Z

Furthermore, here is what MDN says about the `alt` attribute:

> Omitting alt altogether indicates that the image is a key part of the content and no textual equivalent is available. Setting this attribute to an empty string (alt="") indicates that this image is not a key part of the content (it’s decoration or a tracking pixel), and that non-visual browsers may omit it from rendering. Visual browsers will also hide the broken image icon if the alt is empty and the image failed to display.
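The three cases in the MDN quote can be made concrete with a tiny Ruby lookup. It only illustrates the quoted semantics and does not model any particular screen reader; `alt_semantics` and its return symbols are hypothetical labels chosen for this sketch.

```ruby
# Hypothetical helper illustrating the three alt-attribute states from the
# MDN quote. `alt` is the attribute value, or nil when the attribute is absent.
def alt_semantics(alt)
  if alt.nil?
    :key_content_without_text # no alt at all: important image, no text equivalent
  elsif alt.empty?
    :decorative               # alt="": non-visual browsers may skip the image
  else
    :described                # non-empty alt: a text equivalent is available
  end
end

# Pandoc 2.9's HTML5 figure output (alt="") lands in the decorative bucket.
p alt_semantics(nil)     # => :key_content_without_text
p alt_semantics("")      # => :decorative
p alt_semantics("Hello") # => :described
```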

Figures are rarely just decoration, and I think leaving users in the dark about the existence of an image seems not good.

mb21 · 2020-06-30T13:37:41Z

Pretty sure we actually changed this to the current behaviour at the request of a blind person generating EPUB... but I can't find the issue anymore...

tarleb · 2020-06-30T15:25:24Z

Found the issue: #4737

jgm · 2020-06-30T15:55:34Z

I didn't know until now that hyphenated attribute names aren't allowed in XHTML. Interesting.
We do try to create polyglot HTML, and this is especially important because we use the HTML writer in creating EPUBs. EPUB contents are supposed to be XHTML. On the other hand, I haven't heard any reports that the hyphenated aria- attributes have caused problems with any e-readers or with epub validation.

Screen readers read an image's `alt` attribute and the figure caption, both of which come from the same source in pandoc. The figure caption is hidden from screen readers with the `aria-hidden` attribute. This improves accessibility. For HTML4, where `aria-hidden` is not allowed, pandoc still uses an empty `alt` attribute to avoid duplicate contents. Closes: jgm#6491

tarleb · 2020-07-01T13:12:56Z

I tried two EPUB2 validators with current pandoc output, and they fail if the input contains a syntax-highlighted code block. The PR therefore leaves the HTML4/XHTML output as it was and only updates HTML5 output to include the suggested changes.
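The split described in the commit message and in this comment (HTML5 keeps the alt text on the image and hides the caption; HTML4/XHTML keeps the empty alt) can be illustrated with a small Ruby sketch. It is not pandoc's Haskell writer code: `render_figure`, `escape_html` and the `html5:` keyword are hypothetical, the caption is assumed to be plain text, and the markup shapes are taken from the examples earlier in this thread.

```ruby
# Hypothetical helper: minimal escaping for text placed in attributes or elements.
def escape_html(text)
  text.gsub(/[&<>"]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;')
end

# Hypothetical sketch of the figure markup per target; NOT pandoc's implementation.
def render_figure(src, caption, html5: true)
  src_attr = escape_html(src)
  text     = escape_html(caption)
  if html5
    # HTML5: the alt text stays on the image; the visual caption is hidden
    # from screen readers so the same text is not announced twice.
    "<figure>\n" \
      "  <img src=\"#{src_attr}\" alt=\"#{text}\" />\n" \
      "  <figcaption aria-hidden=\"true\">#{text}</figcaption>\n" \
      "</figure>"
  else
    # HTML4/XHTML: no aria-* attributes (EPUB2 validators reportedly reject
    # them), so alt stays empty and only the caption carries the text.
    "<div class=\"figure\">\n" \
      "  <img src=\"#{src_attr}\" alt=\"\" />\n" \
      "  <p class=\"caption\">#{text}</p>\n" \
      "</div>"
  end
end

puts render_figure('foo.jpg', 'bar')
puts render_figure('foo.jpg', 'bar', html5: false)
```

The trade-off is visible in the HTML4 branch: without `aria-hidden`, keeping the alt text would make screen readers read it twice, so the empty alt is the lesser evil there.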

jmuheim · 2020-07-19T07:02:38Z

Any news on this? I will fix the issue on my side with some (ugly) JavaScript that looks for the inaccessible code created by Pandoc and fixes it.

jmuheim · 2020-07-19T09:11:54Z

Just for the record: instead of using JavaScript, I decided to put the fix into my Markdown method in Ruby. This is faster, cleaner, and better suited for automated testing.

If anyone else needs inspiration for something similar:

```ruby
require 'nokogiri'
require 'pandoc-ruby'

module MarkdownHelper
  def markdown(string)
    html = PandocRuby.convert(string).strip

    nokogiri = Nokogiri::HTML::DocumentFragment.parse(html)

    nokogiri = clone_alt_into_img_and_hide_figcaption_from_sr(nokogiri)
    nokogiri = add_empty_alt_to_decorative_img(nokogiri)

    # html_safe (Rails/ActiveSupport) marks the string as safe to render unescaped.
    nokogiri.to_html.html_safe
  end

  # Pandoc removes the content of an image's alt attribute, as the text is also
  # available inside figcaption (to avoid screen reader redundancies). This is
  # terrible though, as it renders the image itself invisible to screen readers.
  # So we clone the alternative text back into the alt attribute and place
  # aria-hidden on the figcaption.
  #
  # See https://github.com/jgm/pandoc/issues/6491
  def clone_alt_into_img_and_hide_figcaption_from_sr(nokogiri)
    nokogiri.css('figure').each do |figure|
      img        = figure.at_css('img')
      figcaption = figure.at_css('figcaption')
      next unless img && figcaption # skip figures lacking an image or a caption

      img['alt'] = figcaption.text
      figcaption['aria-hidden'] = 'true'
    end

    nokogiri
  end

  # Pandoc doesn't add an empty alt attribute if the alternative text is left
  # empty. Because screen readers announce the file name in this situation, we
  # add an empty alt attribute here.
  def add_empty_alt_to_decorative_img(nokogiri)
    nokogiri.css('img:not([alt])').each do |img|
      img['alt'] = ''
    end

    nokogiri
  end
end
```
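For projects that can't pull in Nokogiri, the same two fixes can be sketched with REXML, which ships with Ruby (as a bundled gem since Ruby 3.0, so a Bundler-managed app has to list it in its Gemfile). This is a substitute technique, not the helper above: it assumes pandoc's HTML5 output is well-formed XML (pandoc aims for polyglot HTML, as jgm notes above) and will raise `REXML::ParseException` on anything that isn't, e.g. named HTML entities such as `&nbsp;`. `fix_alt_texts` is a hypothetical name.

```ruby
require 'rexml/document'

# Hypothetical, dependency-free variant of the two Nokogiri fixes above.
# Assumes `html` is a well-formed XML fragment (polyglot pandoc output).
def fix_alt_texts(html)
  # Wrap the fragment so REXML sees a single root element.
  doc = REXML::Document.new("<root>#{html}</root>")

  # Fix 1: copy the caption text back into alt and hide the caption
  # from screen readers.
  REXML::XPath.each(doc, '//figure') do |figure|
    img     = REXML::XPath.first(figure, './/img')
    caption = REXML::XPath.first(figure, './/figcaption')
    next unless img && caption

    # Concatenate all text nodes (unescaped) inside the caption.
    text = REXML::XPath.match(caption, './/text()').map(&:value).join
    img.add_attribute('alt', text) # REXML re-escapes the value on output
    caption.add_attribute('aria-hidden', 'true')
  end

  # Fix 2: give images without any alt attribute an empty one (decorative).
  REXML::XPath.each(doc, '//img[not(@alt)]') do |img|
    img.add_attribute('alt', '')
  end

  # Serialize the children of the wrapper element back into a fragment.
  doc.root.children.map(&:to_s).join
end

puts fix_alt_texts('<figure><img src="foo.jpg" alt="" /><figcaption>bar</figcaption></figure>')
```

REXML serializes attributes with single quotes (e.g. `alt='bar'`), which is still valid HTML.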

tarleb mentioned this issue on Jul 1, 2020 in HTML writer: improve alt-text/caption handling for HTML5 #6495 (merged)

jgm closed this as completed in #6495 Jul 19, 2020

MurakamiShinyu mentioned this issue on Apr 3, 2021 in Screen reader reads the same text twice because figcaption is generated from img alt. vivliostyle/vfm#75 (closed)

PetraOleum mentioned this issue on Oct 16, 2024 in [Bug] aria-hidden elements within post text are affected by css rule intended for UI elements FreshRSS/FreshRSS#6909 (closed)
