Replies: 2 comments
-
Decided to give it a go with simple regexes:

    function purge(string $markdownText): string
    {
        $purgedText = $markdownText;
        // Strip emphasis (* and _), strong emphasis (** and __), and double-backtick inline code (``)
        $purgedText = preg_replace('/(\*\*|__|\*|_|\`\`)(.*?)\1/', '$2', $purgedText);
        // Strip header markers (#, ##, ###, etc.), keeping the header text
        $purgedText = preg_replace('/^#{1,6}\s*(.*)$/m', '$1', $purgedText);
        // Rewrite links and images, ![alt or text](url), as "text (url)"
        $purgedText = preg_replace('/!?\[([^\[\]]*?)\]\((.*?)\)/', '$1 ($2)', $purgedText);
        // Strip code block fences (```, ~~~), keeping the code inside
        $purgedText = preg_replace('/(```|~~~)\R*(.*?)(\R*\1)/s', '$2', $purgedText);
        $purgedText = trim($purgedText);
        return $purgedText;
    }
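For anyone who wants to try the regex approach quickly, here is a minimal sketch of the same four rules written as an ordered pattern list, plus a few sample calls. The name `purgeWithRuleList` is hypothetical (it is not part of any library), the null fallback is my addition, and the code-fence pattern is spelled with `{3}` quantifiers, which should match the same fences as the original.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical variant of purge(): the same four regex rules, kept in one ordered list.
 */
function purgeWithRuleList(string $markdownText): string
{
    // pattern => replacement. When given arrays, preg_replace() applies the
    // patterns one after another, each on the result of the previous one.
    $rules = [
        // Emphasis, strong emphasis, double-backtick inline code
        '/(\*\*|__|\*|_|\`\`)(.*?)\1/' => '$2',
        // Header markers at the start of a line
        '/^#{1,6}\s*(.*)$/m' => '$1',
        // Links and images become "text (url)"
        '/!?\[([^\[\]]*?)\]\((.*?)\)/' => '$1 ($2)',
        // Code fences (three backticks or three tildes); the code itself is kept
        '/(`{3}|~{3})\R*(.*?)(\R*\1)/s' => '$2',
    ];

    $result = preg_replace(array_keys($rules), array_values($rules), $markdownText);

    // Assumption: on a PCRE error (null result), fall back to the unmodified input.
    return trim($result ?? $markdownText);
}

echo purgeWithRuleList('# Hello World!'), PHP_EOL;              // Hello World!
echo purgeWithRuleList('**Hello World!**'), PHP_EOL;            // Hello World!
echo purgeWithRuleList('[docs](https://example.com)'), PHP_EOL; // docs (https://example.com)
```

One caveat with this kind of regex stripping: the emphasis rule treats any pair of underscores on a line as markers, so an identifier like `snake_case_name` would lose its underscores. That is probably where an AST-based approach becomes the safer choice.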
-
You can accomplish this by parsing the Markdown into an AST and then calling
-
Hi. Does this library support purging the markdown?
I have text containing Markdown, which I render to HTML to show in the browser. That accounts for at least 90% of the cases, I suppose. However, I also output the same text to an Excel file, where the Markdown syntax is unnecessary. It needs to be purged so that only the actual content is written to the Excel file.
For example:
'# Hello World!'
->'Hello World!'
'**Hello World!**'
->'Hello World!'
If it doesn't support purging out of the box, can it still be done? The library has an AST, so I believe it can. Could someone familiar with the library and ASTs give an example as a starting point?