Re-implement abbreviations in Markdown reader.
Note that with this new implementation, you can defeat the
abbreviation handling by using two spaces after the period.

This commit removes support for moving abbreviations
after soft breaks (#4635), so the abbreviation support won't
work for abbreviations occurring at the end of a line.
It seems better not to mess with the user's soft breaks,
especially now that we have `--wrap=preserve`.
jgm committed Oct 3, 2021
1 parent 912c7aa commit d59f120
Showing 3 changed files with 33 additions and 62 deletions.
11 changes: 7 additions & 4 deletions MANUAL.txt
@@ -691,10 +691,13 @@ header when requesting a document from a URL:
 directory or fall back on a system default. To see the
 system default, use
 `pandoc --print-default-data-file=abbreviations`. The only
-use pandoc makes of this list is in the Markdown reader.
-Strings found in this list will be followed by a nonbreaking
-space, and the period will not produce sentence-ending space
-in formats like LaTeX. The strings may not contain spaces.
+use pandoc makes of this list is in the Markdown reader,
+and only if the `smart` extension is enabled.
+A single space following a string on this list will be
+transformed into a nonbreaking space (so, to defeat this
+feature you can use two spaces after a period).
+As a result, the period will not produce sentence-ending space
+in formats like LaTeX. The abbreviations may not contain spaces.
 
 [`pandocfilters`]: https://github.com/jgm/pandocfilters
 [PHP]: https://github.com/vinai/pandocfilters-php
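The rule in the manual text above can be illustrated with a small pure function over `Data.Text`. This is a hypothetical sketch, not pandoc's implementation. `nbspAfterAbbrevs` and `CharClass` are invented names, the `smart` extension check is left out, and an abbreviation only matches when a whole run of non-whitespace characters equals a list entry. The real reader builds its chunks from a narrower set of characters.

```haskell
import qualified Data.Set as Set
import qualified Data.Text as T
import Data.Text (Text)

-- Hypothetical character classes used to split the input into tokens:
-- spaces and tabs form one class, newlines another, everything else a third.
data CharClass = Blank | Newline | Other
  deriving Eq

charClass :: Char -> CharClass
charClass c
  | c == ' ' || c == '\t' = Blank
  | c == '\n'             = Newline
  | otherwise             = Other

-- | Replace a single space that follows a token found in @abbrevs@ with a
-- nonbreaking space (U+00A0). Two or more spaces, a tab, or a line break
-- after the abbreviation leave the text unchanged.
nbspAfterAbbrevs :: Set.Set Text -> Text -> Text
nbspAfterAbbrevs abbrevs = T.concat . go . T.groupBy sameClass
  where
    sameClass a b = charClass a == charClass b

    go (w : s : rest)
      | w `Set.member` abbrevs && s == T.pack " " =
          w : T.pack "\160" : go rest
    -- If the guard above fails, fall through and keep the token as is.
    go (t : rest) = t : go rest
    go [] = []
```

With `Mr.` in the set, `Mr. Smith` gets a nonbreaking space, while `Mr.  Smith` (two spaces), `Mr.<TAB>Smith`, and `Mr.` at the end of a line are left as written, which is the behavior the commit message describes for the new reader.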
53 changes: 26 additions & 27 deletions src/Text/Pandoc/Readers/Markdown.hs
@@ -2,6 +2,7 @@
 {-# LANGUAGE TupleSections #-}
 {-# LANGUAGE OverloadedStrings #-}
 {-# LANGUAGE ViewPatterns #-}
+{-# LANGUAGE FlexibleContexts #-}
 {- |
 Module : Text.Pandoc.Readers.Markdown
 Copyright : Copyright (C) 2006-2021 John MacFarlane
@@ -1731,35 +1732,33 @@ nonEndline = satisfy (/='\n')
 
 str :: PandocMonad m => MarkdownParser m (F Inlines)
 str = do
-  result <- mconcat <$> many1
-             ( T.pack <$> (many1 (satisfy (\c -> isAlphaNum c ||
+  abbrevs <- getOption readerAbbreviations
+  isSmart <- extensionEnabled Ext_smart <$> getOption readerExtensions
+  let tryAbbrev t =
+        if t `Set.member` abbrevs
+           then try (do char ' ' <* notFollowedBy (char ' ')
+                        return $ t <> "\160")
+                -- <|>
+                -- try (do lookAhead newline
+                --         guardDisabled Ext_hard_line_breaks
+                --         guardDisabled Ext_ignore_line_breaks
+                --         endline
+                --         -- move soft break before abbrev (#4635)
+                --         return $ "\n" <> t <> "\160")
+                <|> return t
+           else return t
+  let nonSpaceChunk = do
+        t <- T.pack <$> many1 (satisfy (\c -> isAlphaNum c ||
                                               c == ',' || c == '?' ||
                                               c == '(' || c == ')' ||
-                                              c == '/' )) <*
-                         updateLastStrPos)
-             <|> try (T.pack <$> many1 spaceChar <* notFollowedBy newline)
-             <|> try (T.singleton <$> char '.' <*
-                       notFollowedBy (char '.') <* updateLastStrPos) )
-  -- TODO: handle abbreviations as an AST transformation?
-  -- Then they could work on other formats, too.
-  -- (do guardEnabled Ext_smart
-  --     abbrevs <- getOption readerAbbreviations
-  --     if result `Set.member` abbrevs
-  --        then try (do ils <- whitespace
-  --                     notFollowedBy (() <$ cite <|> () <$ note)
-  --                     -- ?? lookAhead alphaNum
-  --                     -- replace space after with nonbreaking space
-  --                     -- if softbreak, move before abbrev if possible (#4635)
-  --                     return $ do
-  --                       ils' <- ils
-  --                       case B.toList ils' of
-  --                            [Space] ->
-  --                                return (B.str result <> B.str "\160")
-  --                            _ -> return (B.str result <> ils'))
-  --             <|> return (return (B.str result))
-  --        else return (return (B.str result)))
-  -- <|>
-  return (return (B.str result))
+                                              c == '/' )
+                              <|> try (char '.' <* notFollowedBy (char '.')))
+        updateLastStrPos
+        if isSmart
+           then tryAbbrev t
+           else return t
+  let spaceChunk = T.pack <$> (try (many1 spaceChar <* notFollowedBy newline))
+  return . B.text . mconcat <$> many1 (nonSpaceChunk <|> spaceChunk)
 
 -- an endline character that can be treated as a space, not a structural break
 endline :: PandocMonad m => MarkdownParser m (F Inlines)
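The structure of the new `str` can be reproduced with plain Parsec, outside `PandocMonad`, to see how `tryAbbrev`, `nonSpaceChunk`, and `spaceChunk` interact. This is a hedged sketch, not pandoc's code. `strChunks` is a hypothetical name, the `isSmart` flag and the abbreviation set are passed in as arguments instead of being read with `getOption`, pandoc's `spaceChar` is approximated as `oneOf " \t"`, the result is plain `Text` rather than `F Inlines`, and `updateLastStrPos` is omitted because it depends on pandoc's parser state.

```haskell
import Data.Char (isAlphaNum)
import qualified Data.Set as Set
import qualified Data.Text as T
import Data.Text (Text)
import Text.Parsec
import Text.Parsec.Text (Parser)

-- | Hypothetical stand-in for the reader's 'str': parse a run of word and
-- space chunks, turning the single space after a listed abbreviation into
-- U+00A0 when @isSmart@ is True.
strChunks :: Bool -> Set.Set Text -> Parser Text
strChunks isSmart abbrevs = mconcat <$> many1 (nonSpaceChunk <|> spaceChunk)
  where
    -- Mirrors tryAbbrev: after a listed abbreviation, consume exactly one
    -- space that is not followed by another space and emit U+00A0 instead.
    -- The 'try' makes the parser backtrack when a second space follows.
    tryAbbrev :: Text -> Parser Text
    tryAbbrev t
      | t `Set.member` abbrevs =
          try (char ' ' <* notFollowedBy (char ' ') >> return (t <> T.pack "\160"))
            <|> return t
      | otherwise = return t

    -- Mirrors nonSpaceChunk: letters, digits, a few punctuation marks,
    -- and periods that are not followed by another period.
    nonSpaceChunk :: Parser Text
    nonSpaceChunk = do
      t <- T.pack <$> many1 (satisfy (\c -> isAlphaNum c || c `elem` ",?()/")
                               <|> try (char '.' <* notFollowedBy (char '.')))
      if isSmart then tryAbbrev t else return t

    -- Mirrors spaceChunk: spaces or tabs not immediately followed by a newline.
    spaceChunk :: Parser Text
    spaceChunk = T.pack <$> try (many1 (oneOf " \t") <* notFollowedBy newline)
```

With `Mr.` in the set, `parse (strChunks True abbrevs) "" (T.pack "Mr. Smith")` should yield `Mr.` followed by U+00A0 and `Smith`. With two spaces, the `try` around `notFollowedBy (char ' ')` backtracks and both spaces come back through `spaceChunk`. When `Mr.` ends a line, the parser stops after `Mr.` and leaves the newline unconsumed, which is the end-of-line limitation the commit message describes.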
31 changes: 0 additions & 31 deletions test/command/4635.md

This file was deleted.
