This repository has been archived by the owner on Sep 21, 2023. It is now read-only.

Does not escape Reddit formatting characters #19

Open
itsthejoker opened this issue May 13, 2018 · 6 comments

Comments

@itsthejoker
Member

itsthejoker commented May 13, 2018

Example: https://old.reddit.com/r/TranscribersOfReddit/comments/8j2hkw/casualuk_image_we_have_confirmation_that_the/dywfq7b/

Instead of being properly escaped, the first character is rendered as formatting, which defeats the purpose. We have two options for this:

a) simply insert the entire transcription into a code block

b) create a list of all Reddit snudown special characters (like _, `, ~) and escape them any time they appear.

This issue is to discuss available fixes and to then enact them.
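
To make option (b) concrete, here is a minimal sketch in Python. The character set and the name `escape_markdown` are assumptions for illustration only; the list is not an exhaustive inventory of what Reddit's parser treats as formatting, and this is not the bot's actual code.

```python
# Assumed, non-exhaustive set of characters that may trigger formatting.
# The backslash itself is included so existing backslashes survive.
SPECIAL_CHARS = "\\`*_~^#>[]()"

# Translation table mapping each special character to its escaped form.
ESCAPE_TABLE = str.maketrans({ch: "\\" + ch for ch in SPECIAL_CHARS})


def escape_markdown(text: str) -> str:
    """Prefix every character in SPECIAL_CHARS with a backslash."""
    # translate() works in a single pass, so a backslash added for one
    # character is never escaped a second time.
    return text.translate(ESCAPE_TABLE)


print(escape_markdown("_hi_ `x`"))
```

A single-pass translation avoids the ordering problem that chained `str.replace` calls would have with the backslash.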

@perryprog
Contributor

perryprog commented May 13, 2018

Oh, that's fun. I'm all for putting it in a code block, so users without RES don't have to recreate any characters that weren't displayed for whatever reason. (RES has a button that does that for you.)

@perryprog
Contributor

Update: after discussion on Discord, we've decided that escaping will be better due to the way copying and pasting works on mobile.

@codingJWilliams

The above PR demonstrates one way we could solve this by escaping snudown characters.

@perryprog
Contributor

Now I'm not sure escaping is the best option. There are so many different ways to do this, but none of them behaves the same on every platform.

@TheLonelyGhost what do you think?

@TheLonelyGhost
Member

TheLonelyGhost commented May 14, 2018

Honestly, you don't want me to weigh in on if I'm pro-markdown (escaping) or anti-markdown (code block). I don't really like markdown.

Simply put, I haven't seen our OCR bot ever give a transcription that should be interpreted and rendered with a markdown interpreter. Rather, it should be interpreted as plain text. The markup to guarantee it's rendered as such? A code block. Since we don't have a choice with reddit and it will always be interpreted as markdown, we're stuck with code blocks as our only option.
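
As a rough illustration of the code block approach, the sketch below indents every line by four spaces, which Markdown treats as a code block. The function name `to_code_block` is hypothetical, and the sketch assumes the caller puts a blank line before the block when posting.

```python
def to_code_block(text: str) -> str:
    """Indent each line by four spaces so it renders as literal text."""
    # splitlines() drops the line endings; they are restored by the join.
    return "\n".join("    " + line for line in text.splitlines())


print(to_code_block("_a_\n# b"))
```

Nothing inside the block needs escaping, which is the main appeal of this option.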

Rant incoming, feel free to skip.


I merely tolerate markdown as incremental progress. Why?

I'd rather the public center around it instead of 2006-era warez NFO files, full of ASCII art visually differentiating sections and markup in the document in unique and completely different ways compared to the last NFO file you saw... but that leaves us in a lesser-of-two-evils situation, not actually liking one or (god forbid) both options.

Secondly, if we didn't have code block as an option we would have to escape SO MANY CHARACTERS in SO MANY CONTEXTS.

Unescaped markdown (seemingly arbitrarily) interprets certain characters either in a markdown, html, or even LaTeX context, depending on if it uses redcarpet, kramdown, pandoc, or some other markdown interpreter. We would have to account for any...

  • ampersand (&) because HTML interpretation might screw with it
  • xml-like word (<foo>) because it might disappear as invalid HTML, thanks to the browser
  • backslashes (\) because it's the escape character itself
  • ... any number of other edge-cases due to markdown's inherently designed flexibility with nested syntaxes.

Frankly, it's a nightmare. I have to relearn the rules of how multiple syntaxes (html + markdown) are allowed to intermingle depending on if it's snudown, github-flavored markdown, straight-up commonmark, pandoc, or some other variant. If I have to re-evaluate that every time, it's not ready for automation.

I hate markdown.


/rant

@TimJentzsch
Contributor

I argue that the text has to be escaped twice - once for the bot post to be displayed correctly and once for the pasted text to be displayed correctly in the transcription. So I think it should first be escaped with backslashes and then put in a code block.

Out of experience, I can say that the most important things to escape would be lists and headings. E.g.

    - Item 1
    - Item 2
    
    #hashtag

should become

    \- Item 1
    \- Item 2
    
    \#hashtag

which should be simple enough to achieve with regular expressions.
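
A possible sketch of that two-step idea in Python follows. The pattern and the names `escape_line_starts` and `escape_then_block` are assumptions for illustration; the pattern only covers bullet markers followed by a space and leading `#` characters, not numbered lists or other constructs.

```python
import re

# Assumed pattern: optional indentation, then either a bullet marker
# (-, *, +) followed by a space or tab, or a '#' starting a heading.
LINE_START_RE = re.compile(r"^([ \t]*)([-*+](?=[ \t])|#)", re.MULTILINE)


def escape_line_starts(text: str) -> str:
    """Backslash-escape list and heading markers at the start of lines."""
    return LINE_START_RE.sub(
        lambda m: m.group(1) + "\\" + m.group(2), text
    )


def escape_then_block(text: str) -> str:
    """Escape line starts, then indent by four spaces (a code block)."""
    escaped = escape_line_starts(text)
    return "\n".join("    " + line for line in escaped.splitlines())


print(escape_then_block("- Item 1\n- Item 2\n\n#hashtag"))
```

Hyphens in the middle of a line, or ones not followed by whitespace (such as `-5`), are left alone because they would not start a list.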

Also, is working correctly on mobile an important requirement? I doubt that there are many who transcribe on mobile and also use the OCR.


5 participants