Issue parsing Facebook img/emojis #548

Closed
2 of 7 tasks
AliLezamaIgrat opened this issue Jan 11, 2023 · 1 comment

Comments

@AliLezamaIgrat

Hey guys, first of all, excellent work with this lib; it has been very useful for my team for quite some time.

I'm currently facing an issue while trying to parse a simple HTML text:

<div>
<div>This is my test to a Facebook emoji:</div>
<div><img src="https://static.xx.fbcdn.net/images/emoji.php/v9/t71/2/16/1f967.png" alt="text" width="24" height="24"></div>
</div>

Please provide as much information about where the bug is located or what you were using:

  • Parser
  • HtmlRenderer
  • Formatter
  • FlexmarkHtmlParser
  • DocxRenderer
  • PdfConverterExtension
  • extension(s)

To Reproduce

The code I'm actually using is on version 0.40.16, but I also tried updating to 0.50.50 and got the same result. I then took a look at your repo, and it seems the issue would still be there even if I updated to the latest version.

  • Code 0.40.16
    public static String toMarkdown(final String html) {
        // Strip table tags
        Document doc = Jsoup.parse(html, "", Parser.xmlParser());
        doc.select("table, tr, td").unwrap();
        doc.outputSettings().outline(true);
        final String stripped = doc.html();
        return FlexmarkHtmlParser.parse(stripped, 1, new MutableDataSet()
                .set(FlexmarkHtmlParser.BR_AS_EXTRA_BLANK_LINES, false)
                .set(FlexmarkHtmlParser.SKIP_CHAR_ESCAPE, true)
                .set(FlexmarkHtmlParser.EXT_INLINE_INS, ExtensionConversion.TEXT)
                .set(FlexmarkHtmlParser.EXTRACT_AUTO_LINKS, false)
        );
    }
  • Code 0.50.50
    public static String toMarkdown(final String html) {
        // Strip table tags
        Document doc  = Jsoup.parse(html, "", Parser.xmlParser());
        doc.select("table, tr, td").unwrap();
        doc.outputSettings().outline(true);
        final String stripped = doc.html();
        MutableDataSet options = new MutableDataSet();
        options.set(FlexmarkHtmlConverter.BR_AS_EXTRA_BLANK_LINES, false)
                .set(FlexmarkHtmlConverter.SKIP_CHAR_ESCAPE, true)
                .set(FlexmarkHtmlConverter.EXT_INLINE_INS, ExtensionConversion.TEXT)
                .set(FlexmarkHtmlConverter.EXTRACT_AUTO_LINKS, false);
        // Convert the table-stripped HTML (this line originally passed html, leaving stripped unused)
        return FlexmarkHtmlConverter.builder(options).build().convert(stripped);

    }

The error log we're getting is:

stack_trace:java.lang.NullPointerException: null
	at com.vladsch.flexmark.util.html.FormattingAppendableImpl.append(FormattingAppendableImpl.java:668) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processImg(FlexmarkHtmlParser.java:1026) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processElement(FlexmarkHtmlParser.java:519) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:489) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processSpan(FlexmarkHtmlParser.java:1452) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processElement(FlexmarkHtmlParser.java:533) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:489) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:470) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processDiv(FlexmarkHtmlParser.java:1411) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processElement(FlexmarkHtmlParser.java:515) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:489) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:470) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processDiv(FlexmarkHtmlParser.java:1411) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processElement(FlexmarkHtmlParser.java:515) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:489) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:470) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processDiv(FlexmarkHtmlParser.java:1411) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processElement(FlexmarkHtmlParser.java:515) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:489) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:470) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.parse(FlexmarkHtmlParser.java:326) ~[na:na]
	at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.parse(FlexmarkHtmlParser.java:405) ~[na:na]
	at com.en.util.StringUtils.toMarkdown(StringUtils.java:1225) ~[na:na]

The cause seems to be the check made here: it only tests that emoji is not null and then uses its shortcut property, which is null for this particular emoji.

if (emoji != null) {
out.append(':').append(emoji.shortcut).append(':');
} else {

As a workaround, we currently run a check of our own before calling your parser, so that we avoid these NullPointerExceptions:

 if (emoji != null && emoji.shortcut == null) {
      Element emojiDiv = el.parent().appendElement("div");
      final String emojiCode = emoji.unicodeSampleFile.split("[.]")[0];
      emojiDiv.append("&#x" + emojiCode + ";");
      el.remove();
}
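
The workaround relies on the sample file name being the emoji's code point in hex (1f967.png stands for U+1F967). A minimal Java sketch of just that step, independent of flexmark and jsoup: the class and method names are hypothetical, and treating hyphen-separated names as multiple code points is an assumption about how such files are named, not something confirmed here.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of flexmark-java. */
final class EmojiFileNames {
    // One code point written as 1 to 6 hex digits.
    private static final Pattern HEX = Pattern.compile("[0-9a-fA-F]{1,6}");

    private EmojiFileNames() {}

    /** Returns the last path segment of a URL, ignoring any query string. */
    static String fileNameOf(String url) {
        int q = url.indexOf('?');
        String path = q >= 0 ? url.substring(0, q) : url;
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /**
     * Turns "1f967.png" into "&#x1f967;".
     * Returns null when the name is not made of hex code points,
     * so the caller can leave the image untouched.
     */
    static String toCharacterReferences(String fileName) {
        if (fileName == null) return null;
        int dot = fileName.indexOf('.');
        String base = dot >= 0 ? fileName.substring(0, dot) : fileName;
        StringBuilder out = new StringBuilder();
        for (String part : base.split("-")) {
            if (!HEX.matcher(part).matches()) return null;
            int codePoint = Integer.parseInt(part, 16);
            if (!Character.isValidCodePoint(codePoint)) return null;
            out.append("&#x").append(part).append(';');
        }
        return out.length() == 0 ? null : out.toString();
    }
}
```

Returning null for anything that does not look like a code point keeps ordinary images (logo.png and the like) out of the emoji path.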

If you could point us in the right direction, in case something is not configured properly to handle these emojis, or add a specific case to handle these NullPointerExceptions, it would be appreciated.

DamnedElric added a commit to DamnedElric/flexmark-java that referenced this issue Mar 14, 2023
@DamnedElric
Contributor

I noticed that someone else had created the issue while I was fixing the bug. The PR should fix it. The fix simply falls back to regular image rendering when the emoji does not have a corresponding shortcut.
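
The fallback described above can be sketched roughly as follows. This is not the code from the PR: the Emoji class here is a hypothetical stand-in with only the two fields mentioned in this thread, and renderImg is an illustrative name. It only shows the shape of the null check.

```java
/** Illustrative sketch only; flexmark's real types and rendering code differ. */
final class EmojiImgFallback {

    /** Hypothetical stand-in for an emoji entry whose shortcut may be missing. */
    static final class Emoji {
        final String shortcut;           // may be null, as reported in this issue
        final String unicodeSampleFile;  // e.g. "1f967.png"

        Emoji(String shortcut, String unicodeSampleFile) {
            this.shortcut = shortcut;
            this.unicodeSampleFile = unicodeSampleFile;
        }
    }

    private EmojiImgFallback() {}

    /**
     * Emits :shortcut: only when both the emoji and its shortcut are present;
     * otherwise falls back to a plain Markdown image.
     */
    static String renderImg(Emoji emoji, String alt, String src) {
        if (emoji != null && emoji.shortcut != null) {
            return ":" + emoji.shortcut + ":";
        }
        // No usable shortcut: render as a regular image instead of failing.
        return "![" + (alt == null ? "" : alt) + "](" + src + ")";
    }
}
```

With this shape, an emoji entry that has no shortcut goes down the same path as an image that is not an emoji at all.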

@vsch vsch closed this as completed in c31cc03 Apr 30, 2023
vsch added a commit that referenced this issue Apr 30, 2023
Fix #548: Converting html images fails if the image refers to an emoji without a shortcut