Does not escape Reddit formatting characters #19
Comments
Oh that's fun. I'm all for putting it in a code block, so users without RES don't have to recreate any characters that weren't displayed for whatever reason. (RES has a button that does that for you.)
Update: after discussion on Discord, we've decided that escaping will be better due to the way copying and pasting works on mobile.
Above PR demonstrates one way we could solve this by escaping snoodown characters.
Now I'm not sure escaping is the best approach. There are so many different ways to do this, and none of them behaves the same on every platform. @TheLonelyGhost what do you think?
Honestly, you don't want me to weigh in on whether I'm pro-markdown (escaping) or anti-markdown (code block). I don't really like markdown. Simply put, I haven't seen our OCR bot ever give a transcription that should be interpreted and rendered with a markdown interpreter. Rather, it should be interpreted as plain text. The markup to guarantee it's rendered as such? A code block. Since we don't have a choice with reddit and it will always be interpreted as markdown, we're stuck with code blocks as our only option.

Rant incoming, feel free to skip.

I merely tolerate markdown as incremental progress. Why? I'd rather the public center around it instead of 2006-era warez NFO files, full of ASCII art visually differentiating sections and markup in the document in unique and completely different ways compared to the last NFO file you saw... but that leaves us in a lesser-of-two-evils situation, not actually liking one or (god forbid) both options.

Secondly, if we didn't have code blocks as an option we would have to escape SO MANY CHARACTERS in SO MANY CONTEXTS. Unescaped markdown (seemingly arbitrarily) interprets certain characters in a markdown, html, or even LaTeX context, depending on whether it uses redcarpet, kramdown, pandoc, or some other markdown interpreter. We would have to account for any...

Frankly, it's a nightmare. I have to relearn the rules of how multiple syntaxes (html + markdown) are allowed to intermingle depending on whether it's snoodown, github-flavored markdown, straight-up commonmark, pandoc, or some other variant. If I have to re-evaluate that every time, it's not ready for automation. I hate markdown.

/rant
I argue that the text has to be escaped twice: once for the bot post to be displayed correctly and once for the pasted text to be displayed correctly in the transcription. So I think it should first be escaped with backslashes and then put in a code block. From experience, I can say that the most important things to escape would be lists and headings. E.g.
should become
which should be simple enough to achieve with regular expressions. Also, is working correctly on mobile an important requirement? I doubt that there are many who transcribe on mobile and also use the OCR.
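As a rough sketch of the escape-then-code-block idea in the comment above, something like the following could work. The function names and the exact patterns are my own assumptions for illustration, not what the bot actually does, and the set of line-leading markers that needs escaping on reddit may be larger than the three handled here.

```python
import re

# Assumed patterns: only headings, bullet lists and numbered lists are handled.
HEADING = re.compile(r"^([ \t]*)(#+)", re.MULTILINE)
BULLET = re.compile(r"^([ \t]*)([*+-])([ \t])", re.MULTILINE)
NUMBERED = re.compile(r"^([ \t]*)(\d+)\.([ \t])", re.MULTILINE)


def escape_block_markers(text: str) -> str:
    """Backslash-escape markers that start a heading or list item."""
    text = HEADING.sub(r"\1\\\2", text)        # "# Title" -> "\# Title"
    text = BULLET.sub(r"\1\\\2\3", text)       # "- item"  -> "\- item"
    text = NUMBERED.sub(r"\1\2\\.\3", text)    # "1. item" -> "1\. item"
    return text


def to_code_block(text: str) -> str:
    """Indent every line by four spaces so it is shown as a code block."""
    return "\n".join("    " + line for line in text.splitlines())


def prepare_for_bot_post(ocr_text: str) -> str:
    """Escape first (for the pasted transcription), then wrap (for the bot post)."""
    return to_code_block(escape_block_markers(ocr_text))
```

The order matters here: the backslashes are added before indenting, so they show up literally inside the code block and travel with the text when a transcriber copies it.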
Example: https://old.reddit.com/r/TranscribersOfReddit/comments/8j2hkw/casualuk_image_we_have_confirmation_that_the/dywfq7b/
Instead of properly escaping the first character, it's rendered and defeats the purpose. We have two options for this:
a) simply insert the entire transcription into a code block
b) create a list of all reddit snoodown specialty characters (like _, `, ~) and escape them anytime they appear.

This issue is to discuss available fixes and to then enact them.
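To make the two options concrete, here is a minimal sketch of each. Both function names are hypothetical, and the character list for option b) is an assumption: it is just a plausible starting set, not a verified list of everything reddit's markdown treats specially.

```python
# Assumed set of formatting characters; the real list may differ.
SPECIAL_CHARS = "\\`*_~^#>[]()"


def as_code_block(text: str) -> str:
    """Option a): indent every line by four spaces to form a code block."""
    return "\n".join("    " + line for line in text.splitlines())


def escape_specials(text: str) -> str:
    """Option b): put a backslash in front of every listed character."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)
```

Option a) leaves the transcription untouched but relies on the code block being rendered and copied the same way on every client. Option b) keeps normal rendering but is only as good as the character list.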