-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCh13 Data structures.tex
972 lines (796 loc) · 34.6 KB
/
Ch13 Data structures.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
\documentclass[./main.tex]{subfiles}
\begin{document}
\subsection*{关联列表}
通常我们需要处理由键索引的无序数据。例如 Unix 管理员会有一系列的数值 UIDs 以及其关联的用户名。该列表的价值在于能够查找给定 UID 的用户名,而不是按照数据的顺序。
换言之,UID 是数据库的键。
在 Haskell 中,有若干方法可以处理类似这样数据的结构。两个最常见的就是关联列表以及由\acode{Data.Map}模块提供的\acode{Map}类型。关联列表很方便,因为它们很简单。
它们就是 Haskell 的列表,因此所有熟悉的列表函数同样可用于关联列表。然而对于较大的数据集,\acode{Map}则比关联列表具有更大的性能优势。
关联列表就是一个包含了(键,值)元组的普通列表。对于 UID 映射至用户名的类型就是\acode{[(Integer, String)]}。
Haskell 拥有一个称为\acode{Data.List.lookup}的内建函数用于关联列表的查询。其类型为\acode{Eq a => a -> [(a, b)] -> Maybe b}。
\begin{lstlisting}[language=Haskell]
ghci> let al = [(1,"one"),(2,"two"),(3,"three"),(4,"four")]
ghci> lookup 1 al
Just "one"
ghci> lookup 5 al
Nothing
\end{lstlisting}
\acode{lookup}函数实际很简单,我们可以自己编写一个:
\begin{lstlisting}[language=Haskell]
myLookup :: (Eq a) => a -> [(a, b)] -> Maybe b
myLookup _ [] = Nothing
myLookup key ((thisKey, thisVal) : rest) =
if key == thisKey
then Just thisVal
else myLookup key rest
\end{lstlisting}
让我们来看一个更复杂的关联列表。在 Unix/Linux 机器中存在一个名为\acode{/etc/passwd}的文件用于存储用户名,UIDs,home 路径,已经其它数据。我们将编写一个程序用于
解析该文件,创建一个关联列表,令用户通过给定的 UID 来查找用户名。
\begin{lstlisting}[language=Haskell]
import Control.Monad (when)
import Data.List
import System.Environment (getArgs)
import System.Exit
import System.IO
main = do
-- Load the command-line arguments
args <- getArgs
-- If we don't have the right amount of args, give an error and abort
when (length args /= 2) $ do
putStrLn "Syntax: passwd-al filename uid"
exitFailure
-- Read the file lazily
content <- readFile $ head args
-- Compute the username in pure code
let username = findByUID content $ read $ args !! 1
-- Display the result
case username of
Just x -> putStrLn x
Nothing -> putStrLn "Could not find that UID"
putStrLn "whatever"
-- Given the entire input and a UID, see if we can find a username.
findByUID :: String -> Integer -> Maybe String
findByUID content uid =
let al = map parseLine . filter (('#' /=) . head) . lines $ content
in lookup uid al
-- Convert a colon-separated line into fields
parseLine :: String -> (Integer, String)
parseLine input =
let fields = split ':' input
in (read (fields !! 2), head fields)
-- Takes a delimiter and a list. Break up the list based on the delimiter.
split :: (Eq a) => a -> [a] -> [[a]]
-- If the input is empty, the result is a list of empty lists.
split _ [] = [[]]
split delim str =
-- Find the part of the list before delim and put it in "before".
-- The rest of the list, including the leading delim, goes in "remainder".
let (before, remainder) = span (/= delim) str
in before : case remainder of
[] -> []
-- If there is more data to precess,
-- call split recursively to process it
x ->
split delim $ tail x
\end{lstlisting}
\textbf{注}:原文的\acode{findByUID}函数中的\acode{let al = map parseLine . lines \$ content}需要修改成
\acode{let al = map parseLine . filter (('#' /=) . head) . lines \$ content}才能对\acode{/etc/passwd}文件生效。否则会抛出异常
\acode{Prelude.!!: index too large}。
测试:
\begin{lstlisting}
% runhaskell passwd-al.hs /etc/passwd 0
root
% runhaskell passwd-al.hs /etc/passwd 3
Could not find that UID
\end{lstlisting}
\subsection*{映射}
\acode{Data.Map}模块提供了一个与关联列表行为相似的\acode{Map}类型,不过有更优异的性能。
映射提供了其它语言中与哈希表一样功能。其内部的实现为一个平衡二叉树。相较于哈希表,在不可变数据上的表现而言,映射更加的高效。这也是纯函数式编程如何深刻影响我们编写代码的
最明显的例子:我们选择的数据结构和算法可以清晰的表达并且高效的执行,但是我们对特定任务的选择通常与命令式语言中的对应选项不同。
\acode{Data.Map}中有些函数与 Prelude 同名,因此需要\acode{import qualified Data.Map as Map}并使用\acode{Map.name}来引用模块中对应的名称。
\begin{lstlisting}[language=Haskell]
import Data.Map qualified as Map
-- Functions to generate a Map that represents an association list as a map
al = [(1, "one"), (2, "two"), (3, "three"), (4, "four")]
{-
Create a map representation of 'al' by converting the association list using Map.fromList
-}
mapFromAL = Map.fromList al
{-
Create a map representation of 'al' by doing a fold
-}
mapFold = foldl (\map (k, v) -> Map.insert k v map) Map.empty al
{-
Manually create a map with the elements of 'al' in it
-}
mapManual =
Map.insert 2 "two"
. Map.insert 4 "four"
. Map.insert 1 "one"
. Map.insert 3 "three"
$ Map.empty
\end{lstlisting}
类似\acode{Map.insert}的函数通常在 Haskell 中这么工作:它们返回输入数据的拷贝,并应用所需的修改。这对于映射而言很轻松。这意味着可以使用\acode{foldl}来构建一个如
\acode{mapFold}案例中那样的映射。或者是调用\acode{Map.insert}串联在一起如\acode{mapManual}案例中那样。使用\textbf{ghci}来查看是否如同预期:
\begin{lstlisting}[language=Haskell]
ghci> :l buildmap.hs
[1 of 2] Compiling Main ( buildmap.hs, interpreted )
Ok, one module loaded.
ghci> mapFromAL
fromList [(1,"one"),(2,"two"),(3,"three"),(4,"four")]
ghci> mapFold
fromList [(1,"one"),(2,"two"),(3,"three"),(4,"four")]
ghci> mapManual
fromList [(1,"one"),(2,"two"),(3,"three"),(4,"four")]
\end{lstlisting}
注意\acode{mapManual}的输出与之前使用列表来构建映射的输出是不一样的。映射并不确保原始的顺序。
\subsection*{函数也是数据}
Haskell 一部分的能力就是创建于操作函数非常的容易。下面是一个将函数存储为 record 字段的例子:
\begin{lstlisting}[language=Haskell]
data CustomColor = CustomColor {red :: Int, green :: Int, blue :: Int}
deriving (Eq, Show, Read)
data FuncRec = FuncRec {name :: String, colorCalc :: Int -> (CustomColor, Int)}
plus5func color x = (color, x + 5)
purple = CustomColor 255 0 255
plus5 = FuncRec {name = "plus5", colorCalc = plus5func purple}
always0 = FuncRec {name = "always0", colorCalc = const (purple, 0)}
\end{lstlisting}
注意\acode{colorCalc}的类型:它是一个函数,接受一个\acode{Int}并返回\acode{(CustomColor, Int)}的元组。我们创建了两个\acode{FuncRec}的 records:
\acode{plus5}与\acode{always0}。注意两者的\acode{colorCalc}总是返回紫色。\acode{FuncRec}其本身并没有字段用于存储颜色,而是以某种方式使值成为了函数本身。
这就是\textit{闭包 closure}。
\begin{lstlisting}[language=Haskell]
ghci> :l funcrecs.hs
[1 of 2] Compiling Main ( funcrecs.hs, interpreted )
Ok, one module loaded.
ghci> :t plus5
plus5 :: FuncRec
ghci> name plus5
"plus5"
ghci> :t colorCalc plus5
colorCalc plus5 :: Int -> (CustomColor, Int)
ghci> (colorCalc plus5) 7
(CustomColor {red = 255, green = 0, blue = 255},12)
ghci> :t colorCalc always0
colorCalc always0 :: Int -> (CustomColor, Int)
ghci> (colorCalc always0) 7
(CustomColor {red = 255, green = 0, blue = 255},0)
\end{lstlisting}
更高级的方式,例如将数据用于若干地方,使用类型构造函数则会大有帮助:
\begin{lstlisting}[language=Haskell]
data FuncRec = FuncRec
{ name :: String,
calc :: Int -> Int,
namedCalc :: Int -> (String, Int)
}
mkFuncRec :: String -> (Int -> Int) -> FuncRec
mkFuncRec name calcfunc =
FuncRec
{ name = name,
calc = calcfunc,
namedCalc = \x -> (name, calcfunc x)
}
plus5 = mkFuncRec "plus5" (+ 5)
always0 = mkFuncRec "always0" (const 0)
\end{lstlisting}
这里有一个名为\acode{mkFuncRec}的函数,其接受一个\acode{String}以及另一个函数作为参数,返回一个新的\acode{FuncRec} record。注意\acode{mkFuncRec}的两个参数
在若干地方都被用到了。
\begin{lstlisting}[language=Haskell]
ghci> :l funcrecs2.hs
[1 of 2] Compiling Main ( funcrecs2.hs, interpreted )
Ok, one module loaded.
ghci> :t plus5
plus5 :: FuncRec
ghci> name plus5
"plus5"
ghci> (calc plus5) 5
10
ghci> (namedCalc plus5) 5
("plus5",10)
ghci> let plus5a = plus5 {name = "PLUS5A"}
ghci> name plus5a
"PLUS5A"
ghci> (namedCalc plus5a) 5
("plus5",10)
\end{lstlisting}
注意\acode{plus5a}的创建。我们修改了\acode{name}字段而不是\acode{namedCalc}字段。这是为什么\acode{name}拥有一个新的名称,而\acode{namedCalc}仍然返回的是
传入\acode{mkFuncRec}的名称;它不会改变,除非我们显式的修改它。
\subsection*{拓展案列:/etc/passwd}
以下案例从\acode{/etc/passwd}文件中解析并存储数据。
\begin{lstlisting}[language=Haskell]
import Control.Monad (when)
import Data.List
import Data.Map qualified as Map
import System.Environment (getArgs)
import System.Exit
import System.IO
import Text.Printf (printf)
{-
The primary piece of data this program will store.
It represents the fields in a POSIX /etc/passwd file.
-}
data PasswdEntry = PasswdEntry
{ userName :: String,
password :: String,
uid :: Integer,
gid :: Integer,
gecos :: String,
homeDir :: String,
shell :: String
}
deriving (Eq, Ord)
{-
Define how we get data to a 'PasswdEntry'.
-}
instance Show PasswdEntry where
show pe =
printf
"%s:%s:%d:%d:%s:%s:%s"
(userName pe)
(password pe)
(uid pe)
(gid pe)
(gecos pe)
(homeDir pe)
(shell pe)
{-
Converting data back out of a 'PasswdEntry'.
-}
instance Read PasswdEntry where
readsPrec _ value =
case split ':' value of
[f1, f2, f3, f4, f5, f6, f7] ->
-- Generate a 'PasswdEntry' the shorthand way:
-- using the positional fields. We use 'read' to
-- convert the numeric fields to Integers.
[(PasswdEntry f1 f2 (read f3) (read f4) f5 f6 f7, [])]
x -> error $ "Invalid number of fields in input: " ++ show x
where
-- Takes a delimiter and a list.
-- Break up the list based on the delimiter.
split :: (Eq a) => a -> [a] -> [[a]]
-- If the input is empty, the result is a list of empty lists.
split _ [] = []
split delim str =
-- Find the part of the list before delim and put it in "before".
-- The rest of the list, including the leading delim,
-- goes in "remainder".
let (before, remainder) = span (/= delim) str
in before : case remainder of
[] -> []
x ->
-- If there is more data to process,
-- call split recursively to process it
split delim (tail x)
-- Convenience aliases; we'll have two maps: one from UID to entries
-- and the other from username to entries
type UIDMap = Map.Map Integer PasswdEntry
type UserMap = Map.Map String PasswdEntry
{-
Converts input data to maps. Returns UID and User maps.
-}
inputToMaps :: String -> (UIDMap, UserMap)
inputToMaps inp = (uidmap, usermap)
where
-- fromList converts a [(key, value)] list into a Map
uidmap = Map.fromList . map (\pe -> (uid pe, pe)) $ entries
usermap = Map.fromList . map (\pe -> (userName pe, pe)) $ entries
-- Convert the input String to [PasswdEntry]
entries = map read $ lines inp
mainMenu maps@(uidmap, usermap) = do
putStr optionText
hFlush stdout
sel <- getLine
-- See what they want to do. For every option except 4,
-- return them to the main menu afterwards by calling
-- mainMenu recursively
case sel of
"1" -> lookupUserName >> mainMenu maps
"2" -> lookupUID >> mainMenu maps
"3" -> displayFile >> mainMenu maps
"4" -> return ()
_ -> putStrLn "Invalid selection" >> mainMenu maps
where
lookupUserName = do
putStrLn "Username: "
username <- getLine
case Map.lookup username usermap of
Nothing -> putStrLn "Not found."
Just x -> print x
lookupUID = do
putStrLn "UID: "
uidstring <- getLine
case Map.lookup (read uidstring) uidmap of
Nothing -> putStrLn "Not found."
Just x -> print x
displayFile =
putStr . unlines . map (show . snd) . Map.toList $ uidmap
optionText =
"\npasswdmap options:\n\
\\n\
\1 Look up a user name\n\
\2 Look up a UID\n\
\3 Display entire file\n\
\4 Quit\n\n\
\Your selection: "
main = do
-- Load the command-line arguments
args <- getArgs
-- If we don't have the right number of args,
-- give an error and abort
when (length args /= 1) $ do
putStrLn "Syntax: passwdmap filename"
exitFailure
-- Read the file lazily
content <- readFile $ head args
let maps = inputToMaps content
mainMenu maps
\end{lstlisting}
该案例维护了两个 maps:username 与\acode{PasswdEntry},以及 UID 与\acode{PasswdEntry}。可以将此视为数据库的两个不同的数据索引,用于加速搜索不同的字段。
\subsection*{拓展案列:数值类型}
略
\subsubsection*{第一步}
如果希望为加号操作符实现一些自定义行为,则必须定义一个 newtype,并使其成为 Num 的一个实例。该类型需要以符号方式存储表达式。为此我们需要存储操作本身,其左侧以及右侧。
其中,左右侧也可以是表达式。
\begin{lstlisting}[language=Haskell]
-- The "operators" that we're going to support
data Op = Plus | Minus | Mul | Div | Pow
deriving (Eq, Show)
{- The core symbolic manipulation type -}
data SymbolicManip a
= Number a -- Simple number, such as 5
| Arith Op (SymbolicManip a) (SymbolicManip a)
deriving (Eq, Show)
{-
SymbolicManip will be an instance of Num.
Define how the Num operations are handled over a SymbolicManip.
This will implement things like (+) for SymbolicManip.
-}
instance (Num a) => Num (SymbolicManip a) where
a + b = Arith Plus a b
a - b = Arith Minus a b
a * b = Arith Mul a b
negate = Arith Mul (Number (-1))
abs a = error "abs is unimplemented"
signum _ = error "signum is unimplemented"
fromInteger i = Number (fromInteger i)
\end{lstlisting}
这里定义了名为\acode{Op}的类型。该类型代表了某些的操作符。接着是\acode{SymbolicManip a}的定义,由于有\acode{Num a}约束,任何\acode{Num}都可用作于\acode{a}。
一个完整类型会像\acode{SymbolicManip Int}这样。
一个\acode{SymbolicManip}类型可以是一个简单的数字,或者是某些计算操作。\acode{Arith}构造函数的类型是递归的,这在 Haskell 中很合理。\acode{Arith}用一个
\acode{Op}和另外两个\acode{SymbolicManip}创建了一个新的\acode{SymbolicManip}。
\begin{lstlisting}[language=Haskell]
ghci> :l numsimple.hs
[1 of 2] Compiling Main ( numsimple.hs, interpreted )
Ok, one module loaded.
ghci> Number 5
Number 5
ghci> :t Number 5
Number 5 :: Num a => SymbolicManip a
ghci> :t Number (5::Int)
Number (5::Int) :: SymbolicManip Int
ghci> Number 5 * Number 10
Arith Mul (Number 5) (Number 10)
ghci> (5 * 10)::SymbolicManip Int
Arith Mul (Number 5) (Number 10)
ghci> (5 * 10 + 2)::SymbolicManip Int
Arith Plus (Arith Mul (Number 5) (Number 10)) (Number 2)
\end{lstlisting}
可以看到非常基础的表达式是如何工作的。注意 Haskell 是如何“转换”\acode{5 * 10 + 2}成一个\acode{SymbolicManip}的,以及其中的处理顺序。这并非一个真实的转换;
\acode{SymbolicManip}如今是一个 first-class 数值。整数数值的字面量会被视为包裹在\acode{fromInteger}内,因此\acode{5}等同于\acode{SymbolicManip Int}
等同于\acode{Int}。
至此我们的任务就简单了:拓展\acode{SymbolicManip}类型来表达所有需求的操作,为其实现其他数值 typeclasses 的实例,并实现自身的\acode{Show}。
\subsubsection*{完整代码}
以下是完整的\acode{num.hs}代码:
\begin{lstlisting}[language=Haskell]
import Data.List
-- Symbolic/units manipulation
data Op
= Plus
| Minus
| Mul
| Div
| Pow
deriving (Eq, Show)
{-
The core symbolic manipulation type.
It can be a simple number, a symbol, a binary arithmetic operation (such as +),
or a unary arithmetic operation (such as cos)
Notice the types of BinaryArith and UnaryArith: it's a recursive type.
So, we could represent a (+) over two SymbolicManips.
-}
data SymbolicManips a
= Number a -- Simple number, such as 5
| Symbol String -- A symbol, such as x
| BinaryArith Op (SymbolicManips a) (SymbolicManips a)
| UnaryArith String (SymbolicManips a)
deriving (Eq)
\end{lstlisting}
上述代码定义了之前使用过的\acode{Op},以及\acode{SymbolicManips},与之前类似。这个版本支持了单元算法符(即仅接受一个参数)例如\acode{abs}以及\acode{cos}。
接下来定义\acode{Num}。
\begin{lstlisting}[language=Haskell]
{-
SymbolicManips will be an instance of Num.
Define how the Num operations are handled over a SymbolicManips.
This will implement things like (+) for SymbolicManips.
-}
instance (Num a) => Num (SymbolicManips a) where
a + b = BinaryArith Plus a b
a - b = BinaryArith Minus a b
a * b = BinaryArith Mul a b
negate = BinaryArith Mul (Number (-1))
abs = UnaryArith "abs"
signum _ = error "signum is unimplemented"
fromInteger = Number . fromInteger
\end{lstlisting}
以上代码简单直接。注意之前的版本并不支持\acode{abs},不过现在有了\acode{UnaryArith}构造函数就能支持了。接下来是更多的实例:
\begin{lstlisting}[language=Haskell]
{-
Make SymbolicManips an instance of Fractional
-}
instance (Fractional a) => Fractional (SymbolicManips a) where
a / b = BinaryArith Div a b
recip = BinaryArith Div $ Number 1
fromRational = Number . fromRational
{-
Make SymbolicManips an instance of Floating
-}
instance (Floating a) => Floating (SymbolicManips a) where
pi = Symbol "pi"
exp = UnaryArith "exp"
log = UnaryArith "log"
sqrt = UnaryArith "sqrt"
a ** b = BinaryArith Pow a b
sin = UnaryArith "sin"
cos = UnaryArith "cos"
tan = UnaryArith "tan"
asin = UnaryArith "asin"
acos = UnaryArith "acos"
atan = UnaryArith "atan"
sinh = UnaryArith "sinh"
cosh = UnaryArith "cosh"
tanh = UnaryArith "tanh"
asinh = UnaryArith "asinh"
acosh = UnaryArith "acosh"
atanh = UnaryArith "atanh"
\end{lstlisting}
以上代码实现的是\acode{Fractional}与\acode{Floating}。接下来是将表达式以字符串形式展示的实现:
\begin{lstlisting}[language=Haskell]
{-
Show a SymbolicManips as a String, using conventional algebraic notation
-}
prettyShow :: (Show a, Num a) => SymbolicManips a -> String
-- Show a number or symbol as a bare number or serial
prettyShow (Number x) = show x
prettyShow (Symbol x) = x
prettyShow (BinaryArith op a b) =
let pa = simpleParen a
pb = simpleParen b
pop = op2str op
in pa ++ pop ++ pb
prettyShow (UnaryArith opstr a) =
opstr ++ "(" ++ prettyShow a ++ ")"
op2str :: Op -> String
op2str Plus = "+"
op2str Minus = "-"
op2str Mul = "*"
op2str Div = "/"
op2str Pow = "**"
{-
Add parenthesis where needed. This function is fairly conservative and will
add parenthesis when not needed in some cases.
Haskell will have already figured out precedence for us while building up
the SymbolicManips.
-}
simpleParen :: (Show a, Num a) => SymbolicManips a -> String
simpleParen (Number x) = prettyShow (Number x)
simpleParen (Symbol x) = prettyShow (Symbol x)
simpleParen x@(BinaryArith _ _ _) = "(" ++ prettyShow x ++ ")"
simpleParen x@(UnaryArith _ _) = prettyShow x
{-
Showing a SymbolicManips calls the prettyShow function on it
-}
instance (Show a, Num a) => Show (SymbolicManips a) where
show = prettyShow
\end{lstlisting}
首先是定义\acode{prettyShow}函数,它用于渲染表达式。接下来实现将一个表达式转换为 RPN 格式字符串的算法:
\begin{lstlisting}[language=Haskell]
{-
Show a SymbolicManip using RPN. HP calculator users may find this familiar.
-}
rnpShow :: (Show a, Num a) => SymbolicManips a -> String
rnpShow i =
let toList (Number x) = [show x]
toList (Symbol x) = [x]
toList (BinaryArith op a b) = toList a ++ toList b ++ [op2str op]
toList (UnaryArith op a) = toList a ++ [op]
join :: [a] -> [[a]] -> [a]
join = intercalate
in join " " (toList i)
\end{lstlisting}
接下来实现一个函数来对表达式进行一些基本的简化:
\begin{lstlisting}[language=Haskell]
{-
Perform some basic algebraic simplifications on a SymbolicManip
-}
simplify :: (Num a, Eq a) => SymbolicManips a -> SymbolicManips a
simplify (BinaryArith op ia ib) =
let sa = simplify ia
sb = simplify ib
in case (op, sa, sb) of
(Mul, Number 1, b) -> b
(Mul, a, Number 1) -> a
(Mul, Number 0, _) -> Number 0
(Mul, _, Number 0) -> Number 0
(Div, a, Number 1) -> a
(Plus, a, Number 0) -> a
(Plus, Number 0, b) -> b
(Minus, a, Number 0) -> a
_ -> BinaryArith op sa sb
simplify (UnaryArith op a) = UnaryArith op (simplify a)
simplify x = x
\end{lstlisting}
现在需要为已构建好的库来增加单位测量。这将允许我们表达例如“5 米”这样的数量。首先是定义一个类型:
\begin{lstlisting}[language=Haskell]
{-
New data type: Units.
A Units type contains a number and a SymbolicManip, which represents the units of measure.
A simple label would be something like (Symbol "m")
-}
data Units a = Units a (SymbolicManips a) deriving (Eq)
\end{lstlisting}
一个\acode{Units}包含一个数值与一个标签。标签本身是一个\acode{SymbolicManip}。接下来就是\acode{Units}的\acode{Num}实例了:
\begin{lstlisting}[language=Haskell]
{-
Implement Units for Num.
We don't know how to convert between arbitrary units, so we generate an error if we try to add
numbers with different units. For multiplication, generate the appropriate new units.
-}
instance (Num a, Eq a) => Num (Units a) where
(Units xa ua) + (Units xb ub)
| ua == ub = Units (xa + xb) ua
| otherwise = error "Mis-matched units in add or subtract"
(Units xa ua) - (Units xb ub) = Units xa ua + Units (xb * (-1)) ub
(Units xa ua) * (Units xb ub) = Units (xa * xb) (ua * ub)
negate (Units xa ua) = Units (negate xa) ua
abs (Units xa ua) = Units (abs xa) ua
signum (Units xa _) = Units (signum xa) (Number 1)
fromInteger i = Units (fromInteger i) (Number 1)
\end{lstlisting}
这里就能明白为什么使用\acode{SymbolicManip}而不是\acode{String}来存储测量单位了。例如在乘法计算出现时,测量的单位也同样发生了变化。例如 5 米乘以 2 米,那么得到
的是 10 平方米。我们强制加法时的单位匹配,并由加法来实现减法。下面是更多的 typeclass:
\begin{lstlisting}[language=Haskell]
{-
Make Units an instance of Fractional
-}
instance (Fractional a, Eq a) => Fractional (Units a) where
(Units xa ua) / (Units xb ub) = Units (xa / xb) (ua / ub)
recip a = 1 / a
fromRational r = Units (fromRational r) (Number 1)
{-
Floating implementation for Units.
Use some intelligence for angle calculations: support deg and rad
-}
instance (Floating a, Eq a) => Floating (Units a) where
pi = Units pi $ Number 1
exp = error "exp not yet implemented in Units"
log = error "log not yet implemented in Units"
sqrt (Units xa ua) = Units (sqrt xa) (sqrt ua)
(Units xa ua) ** (Units xb ub)
| ub == Number 1 = Units (xa ** xb) (ua ** Number xb)
| otherwise = error "units for RHS of ** not supported"
logBase _ = error "logBase not yet implemented in Units"
sin (Units xa ua)
| ua == Symbol "rad" = Units (sin xa) (Number 1)
| ua == Symbol "deg" = Units (sin (deg2rad xa)) (Number 1)
| otherwise = error "Units for sin must be empty"
cos (Units xa ua)
| ua == Symbol "rad" = Units (cos xa) (Number 1)
| ua == Symbol "deg" = Units (cos (deg2rad xa)) (Number 1)
| otherwise = error "Units for cos must be empty"
tan (Units xa ua)
| ua == Symbol "rad" = Units (tan xa) (Number 1)
| ua == Symbol "deg" = Units (tan (deg2rad xa)) (Number 1)
| otherwise = error "Units for tan must be empty"
asin (Units xa ua)
| ua == Number 1 = Units (rad2deg $ asin xa) (Symbol "deg")
| otherwise = error "Units for asin must be empty"
acos (Units xa ua)
| ua == Number 1 = Units (rad2deg $ acos xa) (Symbol "deg")
| otherwise = error "Units for acos must be empty"
atan (Units xa ua)
| ua == Number 1 = Units (rad2deg $ atan xa) (Symbol "deg")
| otherwise = error "Units for atan must be empty"
sinh = error "sinh not yet implemented in Units"
cosh = error "cosh not yet implemented in Units"
tanh = error "tanh not yet implemented in Units"
asinh = error "asinh not yet implemented in Units"
acosh = error "acosh not yet implemented in Units"
atanh = error "atanh not yet implemented in Units"
\end{lstlisting}
接下来是一些处理单位的公用函数:
\begin{lstlisting}[language=Haskell]
{-
A simple function that takes a number and a String and returns an appropriate Units type to
represent the number and its unit of measure
-}
units :: (Num z) => z -> String -> Units z
units a b = Units a $ Symbol b
{-
Extract the number only out of a Units type
-}
dropUnits :: (Num z) => Units z -> z
dropUnits (Units x _) = x
{-
Utilities for the Unit implementation
-}
deg2rad x = 2 * pi * x / 360
rad2deg x = 360 * x / (2 * pi)
\end{lstlisting}
这里首先是\acode{units}用于简化了表达式的构建。接着是\acode{dropUnits}用于去掉单位返回裸值\acode{Num}。最后是定义了之前所需的角度与弧度之间的转换函数。接下来是
\acode{Units}的\acode{Show}实例:
\begin{lstlisting}[language=Haskell]
{-
Showing units: we show the numeric component, an underscore, then the prettyShow version of the simplified units
-}
instance (Show a, Num a, Eq a) => Show (Units a) where
show (Units xa ua) = show xa ++ "_" ++ prettyShow (simplify ua)
\end{lstlisting}
最后是测试:
\begin{lstlisting}[language=Haskell]
test :: (Num a) => a
test = 2 * 5 + 3
\end{lstlisting}
\subsection*{函数作为数据的优点}
在命令式语言中,两个列表的追加廉价且容易。这里是一个简单的 C 结构,其中维护了两个列表的头尾指针:
\begin{lstlisting}[language=C]
struct list {
struct node *head, *tail;
};
\end{lstlisting}
当我们想将一个列表附加到另一个列表时,需要修改前一个列表的最后一个节点并指向后一个列表的\acode{head}节点,接着更新尾指针指向尾结点。
显然在 Haskell 中,如果想要保持纯粹性,这种方法是不行的。因为数据是不可变的,所以不能随意修改列表。Haskell 用来附加列表的\acode{(++)}操作符:
\begin{lstlisting}[language=Haskell]
(++) :: [a] -> [a] -> [a]
(x : xs) ++ ys = x : xs ++ ys
_ ++ ys = ys
\end{lstlisting}
从代码中可知,创建一个新的列表取决于初始列表的长度。
如果是不断的附加,那么这个代价是巨大的。\acode{(++)}操作符是右关联的:
\begin{lstlisting}[language=Haskell]
ghci> :info (++)
(++) :: [a] -> [a] -> [a] -- Defined in GHC.Base
infixr 5 ++
\end{lstlisting}
这就意味着 Haskell 将表达式\acode{"a" ++ "b" ++ "c"}转换为\acode{"a" ++ ("b" ++ "c")}。这样可以使性能达到最优,因为它令左侧的操作尽可能的简短。
当我们重复的对一个列表进行附加时,我们便破坏了这种结合性。
与此同时,命令式程序员却在呵呵笑,因为他们重复追加的成本只取决于他们执行追加的次数。它们的性能是线性的,而我们则是平方函数。
像这样重复对列表进行附加的常见操作会造成性能损失时,就需要从另一个角度来看待这个问题了。
表达式\acode{"a"++}是一个偏应用函数。它的类型是什么?
\begin{lstlisting}[language=Haskell]
ghci> :type ("a" ++)
("a" ++) :: [Char] -> [Char]
\end{lstlisting}
由于这是一个函数,我们可以使用\acode{(.)}操作符与其它操作进行组合,例如\acode{("b"++)}。
\begin{lstlisting}[language=Haskell]
ghci> :type ("a" ++) . ("b" ++)
("a" ++) . ("b" ++) :: [Char] -> [Char]
\end{lstlisting}
新的函数拥有同样的类型。那么当停止组合函数时,并将一个\acode{String}提供给函数会得到什么呢?
\begin{lstlisting}[language=Haskell]
ghci> let f = ("a" ++) . ("b" ++)
ghci> f []
"ab"
\end{lstlisting}
我们附加了字符串!我们用这些偏应用函数来存储数据,通过一个空列表来获取它。每个偏应用的\acode{(++)}与\acode{(.)}\textit{代表}着一个附加,但是又不实际
\textit{执行}附加。
关于这个方法有两点非常有趣的地方。首先偏应用的成本是恒定的,因此很多偏应用的成本是线性的。其次当我们最终提供\acode{[]}值来获取偏应用中的最终列表时,应用的顺序是
从右至左的。这使得\acode{(++)}的左侧操作最小,从而让附加操作的整体成本是线性而不是二次方程的。
\subsubsection*{将差异列表变为一个合理的库}
第一步是使用\acode{newtype}声明对用户隐藏基础类型。我们将创建一个名为\acode{DList}的新类型。
\begin{lstlisting}[language=Haskell]
newtype DList a = DL
{ unDL :: [a] -> [a]
}
\end{lstlisting}
\acode{unDL}函数即解构函数,即移除\acode{DL}构造函数。
\begin{lstlisting}[language=Haskell]
append :: DList a -> DList a -> DList a
append xs ys = DL (unDL xs . unDL ys)
\end{lstlisting}
\acode{append}函数看起来有点复杂,但它实际上只是之前展示的\acode{(.)}操作符的相同用法。对函数进行组合,首先需要从\acode{DL}构造函数中解包,因此使用了
\acode{unDL}。接着通过\acode{DL}构造函数重新将结果包装起来。
下面是另一种写法,这里使用了模式匹配来解包\acode{xs}与\acode{ys}:
\begin{lstlisting}[language=Haskell]
append' :: DList a -> DList a -> DList a
append' (DL xs) (DL ys) = DL (xs . ys)
\end{lstlisting}
\acode{DList}类型如果不能与正常列表互相转换的话,那么用处也不会很大:
\begin{lstlisting}[language=Haskell]
fromList :: [a] -> DList a
fromList xs = DL (xs ++)
toList :: DList a -> [a]
toList (DL xs) = xs []
\end{lstlisting}
如果我们还想要\acode{DList}变得跟普通列表一样好用,那么还需要提供一些列表的常规操作。
\begin{lstlisting}[language=Haskell]
empty :: DList a
empty = DL id
-- equivalent of the list type's (:) operator
cons :: a -> DList a -> DList a
cons x (DL xs) = DL $ (x :) . xs
dfoldr :: (a -> b -> b) -> b -> DList a -> b
dfoldr f z xs = foldr f z $ toList xs
\end{lstlisting}
尽管\acode{DList}令列表附加的成本下降,但并不是所有的类列表操作都是可用的。\acode{head}函数的成本对于普通列表是常数,而\acode{DList}需要将整个\acode{DList}
转换为正常列表才能进行,因此这使成本变得额外高昂。
\begin{lstlisting}[language=Haskell]
safeHead :: DList a -> Maybe a
safeHead xs = case toList xs of
(y : _) -> Just y
_ -> Nothing
\end{lstlisting}
对于\acode{map}而言,我们需要将\acode{DList}类型成为一个函子。
\begin{lstlisting}[language=Haskell]
dmap :: (a -> b) -> DList a -> DList b
dmap f = dfoldr go empty
where
go x = cons (f x)
instance Functor DList where
fmap = dmap
\end{lstlisting}
最终可在源文件头部添加模块头进行导出:
\begin{lstlisting}[language=Haskell]
module DList
( DList,
fromList,
toList,
empty,
append,
cons,
dfoldr,
)
where
\end{lstlisting}
\subsubsection*{列表,差异列表,以及幺半群}
在抽象代数中,存在一个名为\textit{幺半群 monoid}的简单抽象结构。很多数学对象就是幺半群,它们必须有两个属性:
\begin{itemize}
\item 一个二维操作符。名为\acode{(*)}:表达式\acode{a * (b * c)}的结果必须等同于\acode{(a * b) * c}的结果。
\item 一个 id 值。如果是\acode{e},它必须遵循两个规则:\acode{a * e == a}以及\acode{e * a == a}。
\end{itemize}
具体实现:
\begin{lstlisting}[language=Haskell]
instance Semigroup (DList a) where
(<>) = append
instance Monoid (DList a) where
mempty = empty
\end{lstlisting}
\textbf{注}:与原文不同,\acode{Monoid}现在需要\acode{Semigroup}作为约束。
\subsection*{通用序列}
Haskell 内建的列表类型和我们所构建的\acode{DList}类型在某些特定场景下的性能非常的差。而\acode{Data.Sequence}模块中定义的\acode{Seq}容器类型则是在大量的操作下
具有优异的性能。
\begin{lstlisting}[language=Haskell]
import Data.Sequence qualified as Seq
import Data.Sequence ((<|), (><), (|>))
import Data.Foldable qualified as Foldable
\end{lstlisting}
示例:
\begin{lstlisting}[language=Haskell]
ghci> :l DataSequence.hs
[1 of 2] Compiling Main ( DataSequence.hs, interpreted )
Ok, one module loaded.
ghci> Seq.empty
fromList []
ghci> Seq.singleton 1
fromList [1]
ghci> let a = Seq.fromList [1,2,3]
ghci> Seq.singleton 1 |> 2
fromList [1,2]
ghci> let left = Seq.fromList [1,3,3]
ghci> let right = Seq.fromList [7,1]
ghci> left >< right
fromList [1,3,3,7,1]
ghci> Foldable.toList (Seq.fromList [1,2,3])
[1,2,3]
ghci> Foldable.foldl' (+) 0 (Seq.fromList [1,2,3])
6
\end{lstlisting}
\end{document}