learn_real_world_haskell/Ch13 Data structures.tex

\documentclass[./main.tex]{subfiles}

\begin{document}

\subsection*{关联列表}

通常我们需要处理由键索引的无序数据。例如 Unix 管理员会有一系列的数值 UIDs 以及其关联的用户名。该列表的价值在于能够查找给定 UID 的用户名，而不是按照数据的顺序。
换言之，UID 是数据库的键。

在 Haskell 中，有若干方法可以处理类似这样数据的结构。两个最常见的就是关联列表以及由\acode{Data.Map}模块提供的\acode{Map}类型。关联列表很方便，因为它们很简单。
它们就是 Haskell 的列表，因此所有熟悉的列表函数同样可用于关联列表。然而对于较大的数据集，\acode{Map}则比关联列表具有更大的性能优势。

关联列表就是一个包含了（键，值）元组的普通列表。对于 UID 映射至用户名的类型就是\acode{[(Integer, String)]}。

Haskell 拥有一个称为\acode{Data.List.lookup}的内建函数用于关联列表的查询。其类型为\acode{Eq a => a -> [(a, b)] -> Maybe b}。

\begin{lstlisting}[language=Haskell]
  ghci> let al = [(1,"one"),(2,"two"),(3,"three"),(4,"four")]
  ghci> lookup 1 al
  Just "one"
  ghci> lookup 5 al
  Nothing
\end{lstlisting}

\acode{lookup}函数实际很简单，我们可以自己编写一个：

\begin{lstlisting}[language=Haskell]
  myLookup :: (Eq a) => a -> [(a, b)] -> Maybe b
  myLookup _ [] = Nothing
  myLookup key ((thisKey, thisVal) : rest) =
    if key == thisKey
      then Just thisVal
      else myLookup key rest
\end{lstlisting}

让我们来看一个更复杂的关联列表。在 Unix/Linux 机器中存在一个名为\acode{/etc/passwd}的文件用于存储用户名，UIDs，home 路径，已经其它数据。我们将编写一个程序用于
解析该文件，创建一个关联列表，令用户通过给定的 UID 来查找用户名。

\begin{lstlisting}[language=Haskell]
  import Control.Monad (when)
  import Data.List
  import System.Environment (getArgs)
  import System.Exit
  import System.IO

  main = do
    -- Load the command-line arguments
    args <- getArgs

    -- If we don't have the right amount of args, give an error and abort
    when (length args /= 2) $ do
      putStrLn "Syntax: passwd-al filename uid"
      exitFailure

    -- Read the file lazily
    content <- readFile $ head args

    -- Compute the username in pure code
    let username = findByUID content $ read $ args !! 1

    -- Display the result
    case username of
      Just x -> putStrLn x
      Nothing -> putStrLn "Could not find that UID"

    putStrLn "whatever"

  -- Given the entire input and a UID, see if we can find a username.
  findByUID :: String -> Integer -> Maybe String
  findByUID content uid =
    let al = map parseLine . filter (('#' /=) . head) . lines $ content
     in lookup uid al

  -- Convert a colon-separated line into fields
  parseLine :: String -> (Integer, String)
  parseLine input =
    let fields = split ':' input
     in (read (fields !! 2), head fields)

  -- Takes a delimiter and a list. Break up the list based on the delimiter.
  split :: (Eq a) => a -> [a] -> [[a]]
  -- If the input is empty, the result is a list of empty lists.
  split _ [] = [[]]
  split delim str =
    -- Find the part of the list before delim and put it in "before".
    -- The rest of the list, including the leading delim, goes in "remainder".
    let (before, remainder) = span (/= delim) str
     in before : case remainder of
          [] -> []
          -- If there is more data to precess,
          -- call split recursively to process it
          x ->
            split delim $ tail x
\end{lstlisting}

\textbf{注}：原文的\acode{findByUID}函数中的\acode{let al = map parseLine . lines \$ content}需要修改成
\acode{let al = map parseLine . filter (('#' /=) . head) . lines \$ content}才能对\acode{/etc/passwd}文件生效。否则会抛出异常
\acode{Prelude.!!: index too large}。

测试：

\begin{lstlisting}
  % runhaskell passwd-al.hs /etc/passwd 0
  root
  % runhaskell passwd-al.hs /etc/passwd 3
  Could not find that UID
\end{lstlisting}

\subsection*{映射}

\acode{Data.Map}模块提供了一个与关联列表行为相似的\acode{Map}类型，不过有更优异的性能。

映射提供了其它语言中与哈希表一样功能。其内部的实现为一个平衡二叉树。相较于哈希表，在不可变数据上的表现而言，映射更加的高效。这也是纯函数式编程如何深刻影响我们编写代码的
最明显的例子：我们选择的数据结构和算法可以清晰的表达并且高效的执行，但是我们对特定任务的选择通常与命令式语言中的对应选项不同。

\acode{Data.Map}中有些函数与 Prelude 同名，因此需要\acode{import qualified Data.Map as Map}并使用\acode{Map.name}来引用模块中对应的名称。

\begin{lstlisting}[language=Haskell]
  import Data.Map qualified as Map

  -- Functions to generate a Map that represents an association list as a map

  al = [(1, "one"), (2, "two"), (3, "three"), (4, "four")]

  {-
  Create a map representation of 'al' by converting the association list using Map.fromList
  -}
  mapFromAL = Map.fromList al

  {-
  Create a map representation of 'al' by doing a fold
  -}
  mapFold = foldl (\map (k, v) -> Map.insert k v map) Map.empty al

  {-
  Manually create a map with the elements of 'al' in it
  -}
  mapManual =
    Map.insert 2 "two"
      . Map.insert 4 "four"
      . Map.insert 1 "one"
      . Map.insert 3 "three"
      $ Map.empty
\end{lstlisting}

类似\acode{Map.insert}的函数通常在 Haskell 中这么工作：它们返回输入数据的拷贝，并应用所需的修改。这对于映射而言很轻松。这意味着可以使用\acode{foldl}来构建一个如
\acode{mapFold}案例中那样的映射。或者是调用\acode{Map.insert}串联在一起如\acode{mapManual}案例中那样。使用\textbf{ghci}来查看是否如同预期：

\begin{lstlisting}[language=Haskell]
  ghci> :l buildmap.hs
  [1 of 2] Compiling Main             ( buildmap.hs, interpreted )
  Ok, one module loaded.
  ghci> mapFromAL
  fromList [(1,"one"),(2,"two"),(3,"three"),(4,"four")]
  ghci> mapFold
  fromList [(1,"one"),(2,"two"),(3,"three"),(4,"four")]
  ghci> mapManual
  fromList [(1,"one"),(2,"two"),(3,"three"),(4,"four")]
\end{lstlisting}

注意\acode{mapManual}的输出与之前使用列表来构建映射的输出是不一样的。映射并不确保原始的顺序。

\subsection*{函数也是数据}

Haskell 一部分的能力就是创建于操作函数非常的容易。下面是一个将函数存储为 record 字段的例子：

\begin{lstlisting}[language=Haskell]
  data CustomColor = CustomColor {red :: Int, green :: Int, blue :: Int}
    deriving (Eq, Show, Read)

  data FuncRec = FuncRec {name :: String, colorCalc :: Int -> (CustomColor, Int)}

  plus5func color x = (color, x + 5)

  purple = CustomColor 255 0 255

  plus5 = FuncRec {name = "plus5", colorCalc = plus5func purple}

  always0 = FuncRec {name = "always0", colorCalc = const (purple, 0)}
\end{lstlisting}

注意\acode{colorCalc}的类型：它是一个函数，接受一个\acode{Int}并返回\acode{(CustomColor, Int)}的元组。我们创建了两个\acode{FuncRec}的 records：
\acode{plus5}与\acode{always0}。注意两者的\acode{colorCalc}总是返回紫色。\acode{FuncRec}其本身并没有字段用于存储颜色，而是以某种方式使值成为了函数本身。
这就是\textit{闭包 closure}。

\begin{lstlisting}[language=Haskell]
  ghci> :l funcrecs.hs
  [1 of 2] Compiling Main             ( funcrecs.hs, interpreted )
  Ok, one module loaded.
  ghci> :t plus5
  plus5 :: FuncRec
  ghci> name plus5
  "plus5"
  ghci> :t colorCalc plus5
  colorCalc plus5 :: Int -> (CustomColor, Int)
  ghci> (colorCalc plus5) 7
  (CustomColor {red = 255, green = 0, blue = 255},12)
  ghci> :t colorCalc always0
  colorCalc always0 :: Int -> (CustomColor, Int)
  ghci> (colorCalc always0) 7
  (CustomColor {red = 255, green = 0, blue = 255},0)
\end{lstlisting}

更高级的方式，例如将数据用于若干地方，使用类型构造函数则会大有帮助：

\begin{lstlisting}[language=Haskell]
  data FuncRec = FuncRec
    { name :: String,
      calc :: Int -> Int,
      namedCalc :: Int -> (String, Int)
    }

  mkFuncRec :: String -> (Int -> Int) -> FuncRec
  mkFuncRec name calcfunc =
    FuncRec
      { name = name,
        calc = calcfunc,
        namedCalc = \x -> (name, calcfunc x)
      }

  plus5 = mkFuncRec "plus5" (+ 5)

  always0 = mkFuncRec "always0" (const 0)
\end{lstlisting}

这里有一个名为\acode{mkFuncRec}的函数，其接受一个\acode{String}以及另一个函数作为参数，返回一个新的\acode{FuncRec} record。注意\acode{mkFuncRec}的两个参数
在若干地方都被用到了。

\begin{lstlisting}[language=Haskell]
  ghci> :l funcrecs2.hs
  [1 of 2] Compiling Main             ( funcrecs2.hs, interpreted )
  Ok, one module loaded.
  ghci> :t plus5
  plus5 :: FuncRec
  ghci> name plus5
  "plus5"
  ghci> (calc plus5) 5
  10
  ghci> (namedCalc plus5) 5
  ("plus5",10)
  ghci> let plus5a = plus5 {name = "PLUS5A"}
  ghci> name plus5a
  "PLUS5A"
  ghci> (namedCalc plus5a) 5
  ("plus5",10)
\end{lstlisting}

注意\acode{plus5a}的创建。我们修改了\acode{name}字段而不是\acode{namedCalc}字段。这是为什么\acode{name}拥有一个新的名称，而\acode{namedCalc}仍然返回的是
传入\acode{mkFuncRec}的名称；它不会改变，除非我们显式的修改它。

\subsection*{拓展案列：/etc/passwd}

以下案例从\acode{/etc/passwd}文件中解析并存储数据。

\begin{lstlisting}[language=Haskell]
  import Control.Monad (when)
  import Data.List
  import Data.Map qualified as Map
  import System.Environment (getArgs)
  import System.Exit
  import System.IO
  import Text.Printf (printf)

  {-
  The primary piece of data this program will store.
  It represents the fields in a POSIX /etc/passwd file.
  -}
  data PasswdEntry = PasswdEntry
    { userName :: String,
      password :: String,
      uid :: Integer,
      gid :: Integer,
      gecos :: String,
      homeDir :: String,
      shell :: String
    }
    deriving (Eq, Ord)

  {-
  Define how we get data to a 'PasswdEntry'.
  -}
  instance Show PasswdEntry where
    show pe =
      printf
        "%s:%s:%d:%d:%s:%s:%s"
        (userName pe)
        (password pe)
        (uid pe)
        (gid pe)
        (gecos pe)
        (homeDir pe)
        (shell pe)

  {-
  Converting data back out of a 'PasswdEntry'.
  -}
  instance Read PasswdEntry where
    readsPrec _ value =
      case split ':' value of
        [f1, f2, f3, f4, f5, f6, f7] ->
          -- Generate a 'PasswdEntry' the shorthand way:
          -- using the positional fields. We use 'read' to
          -- convert the numeric fields to Integers.
          [(PasswdEntry f1 f2 (read f3) (read f4) f5 f6 f7, [])]
        x -> error $ "Invalid number of fields in input: " ++ show x
      where
        -- Takes a delimiter and a list.
        -- Break up the list based on the delimiter.
        split :: (Eq a) => a -> [a] -> [[a]]
        -- If the input is empty, the result is a list of empty lists.
        split _ [] = []
        split delim str =
          -- Find the part of the list before delim and put it in "before".
          -- The rest of the list, including the leading delim,
          -- goes in "remainder".
          let (before, remainder) = span (/= delim) str
           in before : case remainder of
                [] -> []
                x ->
                  -- If there is more data to process,
                  -- call split recursively to process it
                  split delim (tail x)

  -- Convenience aliases; we'll have two maps: one from UID to entries
  -- and the other from username to entries
  type UIDMap = Map.Map Integer PasswdEntry

  type UserMap = Map.Map String PasswdEntry

  {-
  Converts input data to maps. Returns UID and User maps.
  -}
  inputToMaps :: String -> (UIDMap, UserMap)
  inputToMaps inp = (uidmap, usermap)
    where
      -- fromList converts a [(key, value)] list into a Map
      uidmap = Map.fromList . map (\pe -> (uid pe, pe)) $ entries
      usermap = Map.fromList . map (\pe -> (userName pe, pe)) $ entries
      -- Convert the input String to [PasswdEntry]
      entries = map read $ lines inp

  mainMenu maps@(uidmap, usermap) = do
    putStr optionText
    hFlush stdout
    sel <- getLine
    -- See what they want to do. For every option except 4,
    -- return them to the main menu afterwards by calling
    -- mainMenu recursively
    case sel of
      "1" -> lookupUserName >> mainMenu maps
      "2" -> lookupUID >> mainMenu maps
      "3" -> displayFile >> mainMenu maps
      "4" -> return ()
      _ -> putStrLn "Invalid selection" >> mainMenu maps
    where
      lookupUserName = do
        putStrLn "Username: "
        username <- getLine
        case Map.lookup username usermap of
          Nothing -> putStrLn "Not found."
          Just x -> print x
      lookupUID = do
        putStrLn "UID: "
        uidstring <- getLine
        case Map.lookup (read uidstring) uidmap of
          Nothing -> putStrLn "Not found."
          Just x -> print x
      displayFile =
        putStr . unlines . map (show . snd) . Map.toList $ uidmap
      optionText =
        "\npasswdmap options:\n\
        \\n\
        \1    Look up a user name\n\
        \2    Look up a UID\n\
        \3    Display entire file\n\
        \4    Quit\n\n\
        \Your selection: "

  main = do
    -- Load the command-line arguments
    args <- getArgs

    -- If we don't have the right number of args,
    -- give an error and abort

    when (length args /= 1) $ do
      putStrLn "Syntax: passwdmap filename"
      exitFailure

    -- Read the file lazily
    content <- readFile $ head args
    let maps = inputToMaps content
    mainMenu maps
\end{lstlisting}

该案例维护了两个 maps：username 与\acode{PasswdEntry}，以及 UID 与\acode{PasswdEntry}。可以将此视为数据库的两个不同的数据索引，用于加速搜索不同的字段。

\subsection*{拓展案列：数值类型}

略

\subsubsection*{第一步}

如果希望为加号操作符实现一些自定义行为，则必须定义一个 newtype，并使其成为 Num 的一个实例。该类型需要以符号方式存储表达式。为此我们需要存储操作本身，其左侧以及右侧。
其中，左右侧也可以是表达式。

\begin{lstlisting}[language=Haskell]
  -- The "operators" that we're going to support
  data Op = Plus | Minus | Mul | Div | Pow
    deriving (Eq, Show)

  {- The core symbolic manipulation type -}
  data SymbolicManip a
    = Number a -- Simple number, such as 5
    | Arith Op (SymbolicManip a) (SymbolicManip a)
    deriving (Eq, Show)

  {-
    SymbolicManip will be an instance of Num.
    Define how the Num operations are handled over a SymbolicManip.
    This will implement things like (+) for SymbolicManip.
  -}
  instance (Num a) => Num (SymbolicManip a) where
    a + b = Arith Plus a b
    a - b = Arith Minus a b
    a * b = Arith Mul a b
    negate = Arith Mul (Number (-1))
    abs a = error "abs is unimplemented"
    signum _ = error "signum is unimplemented"
    fromInteger i = Number (fromInteger i)
\end{lstlisting}

这里定义了名为\acode{Op}的类型。该类型代表了某些的操作符。接着是\acode{SymbolicManip a}的定义，由于有\acode{Num a}约束，任何\acode{Num}都可用作于\acode{a}。
一个完整类型会像\acode{SymbolicManip Int}这样。

一个\acode{SymbolicManip}类型可以是一个简单的数字，或者是某些计算操作。\acode{Arith}构造函数的类型是递归的，这在 Haskell 中很合理。\acode{Arith}用一个
\acode{Op}和另外两个\acode{SymbolicManip}创建了一个新的\acode{SymbolicManip}。

\begin{lstlisting}[language=Haskell]
  ghci> :l numsimple.hs
  [1 of 2] Compiling Main             ( numsimple.hs, interpreted )
  Ok, one module loaded.
  ghci> Number 5
  Number 5
  ghci> :t Number 5
  Number 5 :: Num a => SymbolicManip a
  ghci> :t Number (5::Int)
  Number (5::Int) :: SymbolicManip Int
  ghci> Number 5 * Number 10
  Arith Mul (Number 5) (Number 10)
  ghci> (5 * 10)::SymbolicManip Int
  Arith Mul (Number 5) (Number 10)
  ghci> (5 * 10 + 2)::SymbolicManip Int
  Arith Plus (Arith Mul (Number 5) (Number 10)) (Number 2)
\end{lstlisting}

可以看到非常基础的表达式是如何工作的。注意 Haskell 是如何“转换”\acode{5 * 10 + 2}成一个\acode{SymbolicManip}的，以及其中的处理顺序。这并非一个真实的转换；
\acode{SymbolicManip}如今是一个 first-class 数值。整数数值的字面量会被视为包裹在\acode{fromInteger}内，因此\acode{5}等同于\acode{SymbolicManip Int}
等同于\acode{Int}。

至此我们的任务就简单了：拓展\acode{SymbolicManip}类型来表达所有需求的操作，为其实现其他数值 typeclasses 的实例，并实现自身的\acode{Show}。

\subsubsection*{完整代码}

以下是完整的\acode{num.hs}代码：

\begin{lstlisting}[language=Haskell]
  import Data.List

  -- Symbolic/units manipulation
  data Op
    = Plus
    | Minus
    | Mul
    | Div
    | Pow
    deriving (Eq, Show)

  {-
    The core symbolic manipulation type.
    It can be a simple number, a symbol, a binary arithmetic operation (such as +),
    or a unary arithmetic operation (such as cos)

    Notice the types of BinaryArith and UnaryArith: it's a recursive type.
    So, we could represent a (+) over two SymbolicManips.
  -}
  data SymbolicManips a
    = Number a -- Simple number, such as 5
    | Symbol String -- A symbol, such as x
    | BinaryArith Op (SymbolicManips a) (SymbolicManips a)
    | UnaryArith String (SymbolicManips a)
    deriving (Eq)
\end{lstlisting}

上述代码定义了之前使用过的\acode{Op}，以及\acode{SymbolicManips}，与之前类似。这个版本支持了单元算法符（即仅接受一个参数）例如\acode{abs}以及\acode{cos}。
接下来定义\acode{Num}。

\begin{lstlisting}[language=Haskell]
  {-
    SymbolicManips will be an instance of Num.
    Define how the Num operations are handled over a SymbolicManips.
    This will implement things like (+) for SymbolicManips.
  -}
  instance (Num a) => Num (SymbolicManips a) where
    a + b = BinaryArith Plus a b
    a - b = BinaryArith Minus a b
    a * b = BinaryArith Mul a b
    negate = BinaryArith Mul (Number (-1))
    abs = UnaryArith "abs"
    signum _ = error "signum is unimplemented"
    fromInteger = Number . fromInteger
\end{lstlisting}

以上代码简单直接。注意之前的版本并不支持\acode{abs}，不过现在有了\acode{UnaryArith}构造函数就能支持了。接下来是更多的实例：

\begin{lstlisting}[language=Haskell]
  {-
    Make SymbolicManips an instance of Fractional
  -}
  instance (Fractional a) => Fractional (SymbolicManips a) where
    a / b = BinaryArith Div a b
    recip = BinaryArith Div $ Number 1
    fromRational = Number . fromRational

  {-
    Make SymbolicManips an instance of Floating
  -}
  instance (Floating a) => Floating (SymbolicManips a) where
    pi = Symbol "pi"
    exp = UnaryArith "exp"
    log = UnaryArith "log"
    sqrt = UnaryArith "sqrt"
    a ** b = BinaryArith Pow a b
    sin = UnaryArith "sin"
    cos = UnaryArith "cos"
    tan = UnaryArith "tan"
    asin = UnaryArith "asin"
    acos = UnaryArith "acos"
    atan = UnaryArith "atan"
    sinh = UnaryArith "sinh"
    cosh = UnaryArith "cosh"
    tanh = UnaryArith "tanh"
    asinh = UnaryArith "asinh"
    acosh = UnaryArith "acosh"
    atanh = UnaryArith "atanh"
\end{lstlisting}

以上代码实现的是\acode{Fractional}与\acode{Floating}。接下来是将表达式以字符串形式展示的实现：

\begin{lstlisting}[language=Haskell]
  {-
    Show a SymbolicManips as a String, using conventional algebraic notation
  -}
  prettyShow :: (Show a, Num a) => SymbolicManips a -> String
  -- Show a number or symbol as a bare number or serial
  prettyShow (Number x) = show x
  prettyShow (Symbol x) = x
  prettyShow (BinaryArith op a b) =
    let pa = simpleParen a
        pb = simpleParen b
        pop = op2str op
    in pa ++ pop ++ pb
  prettyShow (UnaryArith opstr a) =
    opstr ++ "(" ++ prettyShow a ++ ")"

  op2str :: Op -> String
  op2str Plus = "+"
  op2str Minus = "-"
  op2str Mul = "*"
  op2str Div = "/"
  op2str Pow = "**"

  {-
    Add parenthesis where needed. This function is fairly conservative and will
    add parenthesis when not needed in some cases.

    Haskell will have already figured out precedence for us while building up
    the SymbolicManips.
  -}
  simpleParen :: (Show a, Num a) => SymbolicManips a -> String
  simpleParen (Number x) = prettyShow (Number x)
  simpleParen (Symbol x) = prettyShow (Symbol x)
  simpleParen x@(BinaryArith _ _ _) = "(" ++ prettyShow x ++ ")"
  simpleParen x@(UnaryArith _ _) = prettyShow x

  {-
    Showing a SymbolicManips calls the prettyShow function on it
  -}
  instance (Show a, Num a) => Show (SymbolicManips a) where
    show = prettyShow
\end{lstlisting}

首先是定义\acode{prettyShow}函数，它用于渲染表达式。接下来实现将一个表达式转换为 RPN 格式字符串的算法：

\begin{lstlisting}[language=Haskell]
  {-
    Show a SymbolicManip using RPN. HP calculator users may find this familiar.
  -}
  rnpShow :: (Show a, Num a) => SymbolicManips a -> String
  rnpShow i =
    let toList (Number x) = [show x]
        toList (Symbol x) = [x]
        toList (BinaryArith op a b) = toList a ++ toList b ++ [op2str op]
        toList (UnaryArith op a) = toList a ++ [op]
        join :: [a] -> [[a]] -> [a]
        join = intercalate
    in join " " (toList i)
\end{lstlisting}

接下来实现一个函数来对表达式进行一些基本的简化：

\begin{lstlisting}[language=Haskell]
  {-
    Perform some basic algebraic simplifications on a SymbolicManip
  -}
  simplify :: (Num a, Eq a) => SymbolicManips a -> SymbolicManips a
  simplify (BinaryArith op ia ib) =
    let sa = simplify ia
        sb = simplify ib
    in case (op, sa, sb) of
          (Mul, Number 1, b) -> b
          (Mul, a, Number 1) -> a
          (Mul, Number 0, _) -> Number 0
          (Mul, _, Number 0) -> Number 0
          (Div, a, Number 1) -> a
          (Plus, a, Number 0) -> a
          (Plus, Number 0, b) -> b
          (Minus, a, Number 0) -> a
          _ -> BinaryArith op sa sb
  simplify (UnaryArith op a) = UnaryArith op (simplify a)
  simplify x = x
\end{lstlisting}

现在需要为已构建好的库来增加单位测量。这将允许我们表达例如“5 米”这样的数量。首先是定义一个类型：

\begin{lstlisting}[language=Haskell]
  {-
    New data type: Units.
    A Units type contains a number and a SymbolicManip, which represents the units of measure.
    A simple label would be something like (Symbol "m")
  -}
  data Units a = Units a (SymbolicManips a) deriving (Eq)
\end{lstlisting}

一个\acode{Units}包含一个数值与一个标签。标签本身是一个\acode{SymbolicManip}。接下来就是\acode{Units}的\acode{Num}实例了：

\begin{lstlisting}[language=Haskell]
  {-
    Implement Units for Num.
    We don't know how to convert between arbitrary units, so we generate an error if we try to add
    numbers with different units. For multiplication, generate the appropriate new units.
  -}
  instance (Num a, Eq a) => Num (Units a) where
    (Units xa ua) + (Units xb ub)
      | ua == ub = Units (xa + xb) ua
      | otherwise = error "Mis-matched units in add or subtract"
    (Units xa ua) - (Units xb ub) = Units xa ua + Units (xb * (-1)) ub
    (Units xa ua) * (Units xb ub) = Units (xa * xb) (ua * ub)
    negate (Units xa ua) = Units (negate xa) ua
    abs (Units xa ua) = Units (abs xa) ua
    signum (Units xa _) = Units (signum xa) (Number 1)
    fromInteger i = Units (fromInteger i) (Number 1)
\end{lstlisting}

这里就能明白为什么使用\acode{SymbolicManip}而不是\acode{String}来存储测量单位了。例如在乘法计算出现时，测量的单位也同样发生了变化。例如 5 米乘以 2 米，那么得到
的是 10 平方米。我们强制加法时的单位匹配，并由加法来实现减法。下面是更多的 typeclass：

\begin{lstlisting}[language=Haskell]
  {-
    Make Units an instance of Fractional
  -}
  instance (Fractional a, Eq a) => Fractional (Units a) where
    (Units xa ua) / (Units xb ub) = Units (xa / xb) (ua / ub)
    recip a = 1 / a
    fromRational r = Units (fromRational r) (Number 1)

  {-
    Floating implementation for Units.
    Use some intelligence for angle calculations: support deg and rad
  -}
  instance (Floating a, Eq a) => Floating (Units a) where
    pi = Units pi $ Number 1
    exp = error "exp not yet implemented in Units"
    log = error "log not yet implemented in Units"
    sqrt (Units xa ua) = Units (sqrt xa) (sqrt ua)
    (Units xa ua) ** (Units xb ub)
      | ub == Number 1 = Units (xa ** xb) (ua ** Number xb)
      | otherwise = error "units for RHS of ** not supported"
    logBase _ = error "logBase not yet implemented in Units"
    sin (Units xa ua)
      | ua == Symbol "rad" = Units (sin xa) (Number 1)
      | ua == Symbol "deg" = Units (sin (deg2rad xa)) (Number 1)
      | otherwise = error "Units for sin must be empty"
    cos (Units xa ua)
      | ua == Symbol "rad" = Units (cos xa) (Number 1)
      | ua == Symbol "deg" = Units (cos (deg2rad xa)) (Number 1)
      | otherwise = error "Units for cos must be empty"
    tan (Units xa ua)
      | ua == Symbol "rad" = Units (tan xa) (Number 1)
      | ua == Symbol "deg" = Units (tan (deg2rad xa)) (Number 1)
      | otherwise = error "Units for tan must be empty"
    asin (Units xa ua)
      | ua == Number 1 = Units (rad2deg $ asin xa) (Symbol "deg")
      | otherwise = error "Units for asin must be empty"
    acos (Units xa ua)
      | ua == Number 1 = Units (rad2deg $ acos xa) (Symbol "deg")
      | otherwise = error "Units for acos must be empty"
    atan (Units xa ua)
      | ua == Number 1 = Units (rad2deg $ atan xa) (Symbol "deg")
      | otherwise = error "Units for atan must be empty"
    sinh = error "sinh not yet implemented in Units"
    cosh = error "cosh not yet implemented in Units"
    tanh = error "tanh not yet implemented in Units"
    asinh = error "asinh not yet implemented in Units"
    acosh = error "acosh not yet implemented in Units"
    atanh = error "atanh not yet implemented in Units"
\end{lstlisting}

接下来是一些处理单位的公用函数：

\begin{lstlisting}[language=Haskell]
  {-
    A simple function that takes a number and a String and returns an appropriate Units type to
    represent the number and its unit of measure
  -}
  units :: (Num z) => z -> String -> Units z
  units a b = Units a $ Symbol b

  {-
    Extract the number only out of a Units type
  -}
  dropUnits :: (Num z) => Units z -> z
  dropUnits (Units x _) = x

  {-
    Utilities for the Unit implementation
  -}
  deg2rad x = 2 * pi * x / 360

  rad2deg x = 360 * x / (2 * pi)
\end{lstlisting}

这里首先是\acode{units}用于简化了表达式的构建。接着是\acode{dropUnits}用于去掉单位返回裸值\acode{Num}。最后是定义了之前所需的角度与弧度之间的转换函数。接下来是
\acode{Units}的\acode{Show}实例：

\begin{lstlisting}[language=Haskell]
  {-
    Showing units: we show the numeric component, an underscore, then the prettyShow version of the simplified units
  -}
  instance (Show a, Num a, Eq a) => Show (Units a) where
    show (Units xa ua) = show xa ++ "_" ++ prettyShow (simplify ua)
\end{lstlisting}

最后是测试：

\begin{lstlisting}[language=Haskell]
  test :: (Num a) => a
  test = 2 * 5 + 3
\end{lstlisting}

\subsection*{函数作为数据的优点}

在命令式语言中，两个列表的追加廉价且容易。这里是一个简单的 C 结构，其中维护了两个列表的头尾指针：

\begin{lstlisting}[language=C]
  struct list {
    struct node *head, *tail;
  };
\end{lstlisting}

当我们想将一个列表附加到另一个列表时，需要修改前一个列表的最后一个节点并指向后一个列表的\acode{head}节点，接着更新尾指针指向尾结点。

显然在 Haskell 中，如果想要保持纯粹性，这种方法是不行的。因为数据是不可变的，所以不能随意修改列表。Haskell 用来附加列表的\acode{(++)}操作符：

\begin{lstlisting}[language=Haskell]
  (++) :: [a] -> [a] -> [a]
  (x : xs) ++ ys = x : xs ++ ys
  _ ++ ys = ys
\end{lstlisting}

从代码中可知，创建一个新的列表取决于初始列表的长度。

如果是不断的附加，那么这个代价是巨大的。\acode{(++)}操作符是右关联的：

\begin{lstlisting}[language=Haskell]
  ghci> :info (++)
  (++) :: [a] -> [a] -> [a] 	-- Defined in GHC.Base
  infixr 5 ++
\end{lstlisting}

这就意味着 Haskell 将表达式\acode{"a" ++ "b" ++ "c"}转换为\acode{"a" ++ ("b" ++ "c")}。这样可以使性能达到最优，因为它令左侧的操作尽可能的简短。

当我们重复的对一个列表进行附加时，我们便破坏了这种结合性。

与此同时，命令式程序员却在呵呵笑，因为他们重复追加的成本只取决于他们执行追加的次数。它们的性能是线性的，而我们则是平方函数。

像这样重复对列表进行附加的常见操作会造成性能损失时，就需要从另一个角度来看待这个问题了。

表达式\acode{"a"++}是一个偏应用函数。它的类型是什么？

\begin{lstlisting}[language=Haskell]
  ghci> :type ("a" ++)
  ("a" ++) :: [Char] -> [Char]
\end{lstlisting}

由于这是一个函数，我们可以使用\acode{(.)}操作符与其它操作进行组合，例如\acode{("b"++)}。

\begin{lstlisting}[language=Haskell]
  ghci> :type ("a" ++) . ("b" ++)
  ("a" ++) . ("b" ++) :: [Char] -> [Char]
\end{lstlisting}

新的函数拥有同样的类型。那么当停止组合函数时，并将一个\acode{String}提供给函数会得到什么呢？

\begin{lstlisting}[language=Haskell]
  ghci> let f = ("a" ++) . ("b" ++)
  ghci> f []
  "ab"
\end{lstlisting}

我们附加了字符串！我们用这些偏应用函数来存储数据，通过一个空列表来获取它。每个偏应用的\acode{(++)}与\acode{(.)}\textit{代表}着一个附加，但是又不实际
\textit{执行}附加。

关于这个方法有两点非常有趣的地方。首先偏应用的成本是恒定的，因此很多偏应用的成本是线性的。其次当我们最终提供\acode{[]}值来获取偏应用中的最终列表时，应用的顺序是
从右至左的。这使得\acode{(++)}的左侧操作最小，从而让附加操作的整体成本是线性而不是二次方程的。

\subsubsection*{将差异列表变为一个合理的库}

第一步是使用\acode{newtype}声明对用户隐藏基础类型。我们将创建一个名为\acode{DList}的新类型。

\begin{lstlisting}[language=Haskell]
  newtype DList a = DL
  { unDL :: [a] -> [a]
  }
\end{lstlisting}

\acode{unDL}函数即解构函数，即移除\acode{DL}构造函数。

\begin{lstlisting}[language=Haskell]
  append :: DList a -> DList a -> DList a
  append xs ys = DL (unDL xs . unDL ys)
\end{lstlisting}

\acode{append}函数看起来有点复杂，但它实际上只是之前展示的\acode{(.)}操作符的相同用法。对函数进行组合，首先需要从\acode{DL}构造函数中解包，因此使用了
\acode{unDL}。接着通过\acode{DL}构造函数重新将结果包装起来。

下面是另一种写法，这里使用了模式匹配来解包\acode{xs}与\acode{ys}：

\begin{lstlisting}[language=Haskell]
  append' :: DList a -> DList a -> DList a
  append' (DL xs) (DL ys) = DL (xs . ys)
\end{lstlisting}

\acode{DList}类型如果不能与正常列表互相转换的话，那么用处也不会很大：

\begin{lstlisting}[language=Haskell]
  fromList :: [a] -> DList a
  fromList xs = DL (xs ++)

  toList :: DList a -> [a]
  toList (DL xs) = xs []
\end{lstlisting}

如果我们还想要\acode{DList}变得跟普通列表一样好用，那么还需要提供一些列表的常规操作。

\begin{lstlisting}[language=Haskell]
  empty :: DList a
  empty = DL id

  -- equivalent of the list type's (:) operator
  cons :: a -> DList a -> DList a
  cons x (DL xs) = DL $ (x :) . xs

  dfoldr :: (a -> b -> b) -> b -> DList a -> b
  dfoldr f z xs = foldr f z $ toList xs
\end{lstlisting}

尽管\acode{DList}令列表附加的成本下降，但并不是所有的类列表操作都是可用的。\acode{head}函数的成本对于普通列表是常数，而\acode{DList}需要将整个\acode{DList}
转换为正常列表才能进行，因此这使成本变得额外高昂。

\begin{lstlisting}[language=Haskell]
  safeHead :: DList a -> Maybe a
  safeHead xs = case toList xs of
    (y : _) -> Just y
    _ -> Nothing
\end{lstlisting}

对于\acode{map}而言，我们需要将\acode{DList}类型成为一个函子。

\begin{lstlisting}[language=Haskell]
  dmap :: (a -> b) -> DList a -> DList b
  dmap f = dfoldr go empty
    where
      go x = cons (f x)

  instance Functor DList where
    fmap = dmap
\end{lstlisting}

最终可在源文件头部添加模块头进行导出：

\begin{lstlisting}[language=Haskell]
  module DList
    ( DList,
      fromList,
      toList,
      empty,
      append,
      cons,
      dfoldr,
    )
  where
\end{lstlisting}

\subsubsection*{列表，差异列表，以及幺半群}

在抽象代数中，存在一个名为\textit{幺半群 monoid}的简单抽象结构。很多数学对象就是幺半群，它们必须有两个属性：

\begin{itemize}
  \item 一个二维操作符。名为\acode{(*)}：表达式\acode{a * (b * c)}的结果必须等同于\acode{(a * b) * c}的结果。
  \item 一个 id 值。如果是\acode{e}，它必须遵循两个规则：\acode{a * e == a}以及\acode{e * a == a}。
\end{itemize}

具体实现：

\begin{lstlisting}[language=Haskell]
  instance Semigroup (DList a) where
    (<>) = append

  instance Monoid (DList a) where
    mempty = empty
\end{lstlisting}

\textbf{注}：与原文不同，\acode{Monoid}现在需要\acode{Semigroup}作为约束。

\subsection*{通用序列}

Haskell 内建的列表类型和我们所构建的\acode{DList}类型在某些特定场景下的性能非常的差。而\acode{Data.Sequence}模块中定义的\acode{Seq}容器类型则是在大量的操作下
具有优异的性能。

\begin{lstlisting}[language=Haskell]
  import Data.Sequence qualified as Seq

  import Data.Sequence ((<|), (><), (|>))

  import Data.Foldable qualified as Foldable
\end{lstlisting}

示例：

\begin{lstlisting}[language=Haskell]
  ghci> :l DataSequence.hs
  [1 of 2] Compiling Main             ( DataSequence.hs, interpreted )
  Ok, one module loaded.
  ghci> Seq.empty
  fromList []
  ghci> Seq.singleton 1
  fromList [1]
  ghci> let a = Seq.fromList [1,2,3]
  ghci> Seq.singleton 1 |> 2
  fromList [1,2]
  ghci> let left = Seq.fromList [1,3,3]
  ghci> let right = Seq.fromList [7,1]
  ghci> left >< right
  fromList [1,3,3,7,1]
  ghci> Foldable.toList (Seq.fromList [1,2,3])
  [1,2,3]
  ghci> Foldable.foldl' (+) 0 (Seq.fromList [1,2,3])
  6
\end{lstlisting}

\end{document}