When I started learning about functional programming after years of writing object oriented code, one difference that stood out for me is how, unlike many functional languages, object oriented languages have this built-in notion of identity for objects separate from object equality.
Let's compare these two (contrived) examples for defining persons. First in Python and then in Haskell.
class Person:
def __init__(self, name, children):
self.name = name
self.children = children
bob = Person("Bob", [])
data Person = Person
{ name :: String
, children :: Person
}
bob = Person "Bob" []
Now in Python, if we were to define another person that is also called "Bob", we could determine that we are dealing with a different Bob using the is
operator.
bob = Person("Bob", [])
bob2 = Person("Bob", [])
assert bob is not bob2
In Haskell, though, if we define
bob = Person "Bob" []
bob2 = Person "Bob" []
we don't have any built in way to determine whether bob
is the same person as bob2
and actually, since the data is immutable, some compiler might, in theory, very well use common subexpression elimination to optimize the above to:
bob = Person "Bob" []
bob2 = bob
Of course, when dealing with immutable data it shouldn't matter at all whether two variables actually point to the exact same data in memory or not, but there are some cases where we want to know whether two data structures actually share data or if they merely have an equal value. For example
mary = Person "Mary" []
bob = Person "Bob" [mary]
alice = Person "Alice" [mary]
The way we current have defined Person
we have no way of knowing wether Mary is the daughter of both Bob and Alice or whether both just happen to have a child with the same name. In Haskell, if we care about identity, we have to provide it manually.
import Data.Unique
import Data.Function (on)
data UniquePerson = UniquePerson
{ name :: String
, children :: [UniquePerson]
, identity :: Unique
}
is :: UniquePerson -> UniquePerson -> Bool
is = (==) `on` identity
However, this comes with the limitation that we can only "instantiate" new persons in the IO monad, which makes sense as a unique identity is actually an observable side-effect.
import Control.Applicative ((<$>))
main = do
mary <- UniquePerson "Mary" [] <$> newUnique
bob <- UniquePerson "Bob" [mary] <$> newUnique
alice <- UniquePerson "Alice" [mary] <$> newUnique
print $ (children bob !! 0) `is` (children alice !! 0)
There's one problem with having to initialize person data in the IO monad, namely that we are now dependent on the order of initialization. This will cause us trouble if change the data type from a parent-child hierarchy to something that can have circular links.
data UniquePerson = UniquePerson
{ name :: String
, friends :: [UniquePerson]
, identity :: Unique
}
main = do
bob <- UniquePerson "Bob" [] <$> newUnique
fred <- UniquePerson "Fred" [bob] <$> newUnique
Now Fred can be a friend of Bob or Bob can be the friend of Fred, but there's no immediately obvious way to model the relationship both ways. We could change the data type to only refer to the friend using his ID, as in:
data UniquePerson = UniquePerson
{ name :: String
, friends :: [Unique]
, identity :: Unique
}
main = do
bobId <- newUnique
fredId <- newUnique
bob = UniquePerson "Bob" [fredId] bobId
fred = UniquePerson "Fred" [bobId] fredId
But then we'd have to keep a lookup map of ID->Person mappings and dereference the id's through that whenever we want to operate on the data and where's the fun in that?
It's worth noting, that for non-unique persons we don't have the same problem.
data Person = Person
{ name :: String
, friends :: [Person]
}
bob = Person "Bob" [fred]
fred = Person "Fred" [bob]
As pure expressions aren't necessary evaluated in order, the order of declarations doesn't matter and we can refer to a variable above its actual declaration.
Wouldn't it be nice if we could somehow use the pure data type to declare relationships between the persons, but still attach each person with a unique identity? This is where the data-reify
package comes in handy.
data-reify
is based on the research paper Type-Safe Observable Sharing in Haskell and it uses GHC-specific tricks to identify shared nodes in a data structure, essentially giving us the uniqueness property for free.
To use data-reify
you need to have a recursive data type (such as our Person
type) and a mirror type where the points of recursion are replaced by abstract references.
data Person = Person String [Person]
data Person' ref = Person' String [ref]
Now we can derive a MuRef
instance that maps from Person
to Person'
.
{-# LANGUAGE TypeFamilies #-}
import Data.Reify
import Data.Traversable (traverse)
instance MuRef Person where
type DeRef Person = Person'
mapDeRef f (Person name friends) =
Person' name <$> traverse f friends
The type of mapDeRef
is a bit cryptic, but the gist of it is that it gets as a parameter a function f
and a value of the original data type (Person
) and the implementation has to return a value of the mirror type (Person'
) where each recursive field is mapped through f
.
Now that we have a MuRef
instance, we can use the reifyGraph
function to build a list of (ID, Person')
pairs and then initialize a UniquePerson
based on that.
makeUnique :: Person -> IO UniquePerson
makeUnique p = do
Graph nodes rootId <- reifyGraph p
let deref identity = UniquePerson name uniqueFriends identity where
uniqueFriends = map deref friends
Person' name friends = fromJust $ lookup identity nodes
return $ deref rootId
Note that this uses the Unique
type from Data.Reify.Graph
which is different from Data.Unique
, but with small changes to makeUnique
we can, again, use Data.Unique.Unique
to ensure the uniqueness of the identities throughout our program.
makeUnique :: Person -> IO UniquePerson
makeUnique p = do
Graph nodes rootId <- reifyGraph p
let makeIdentity (uid,_) = do
uniq <- newUnique
return (uid, uniq)
identities <- mapM makeIdentity nodes
let deref uid = UniquePerson name uniqueFriends identity where
uniqueFriends = map deref friends
identity = fromJust $ lookup uid identities
Person' name friends = fromJust $ lookup uid nodes
return $ deref rootId
Now we can test that the identities are mapped correctly:
main = do
let bob = Person "Bob" [fred, alice]
fred = Person "Fred" [bob]
alice = Person "Alice" [fred]
bob' <- makeUnique bob
let [fred', alice'] = friends bob'
-- Test that Bob and Alice know the exact same Fred
print $ fred' `is` (friends alice' !! 0)
Finally, it's worth noting that reifyGraph
doesn't in any way cache values from previous calls, so calling makeUnique bob
and makeUnique fred
will produce two separate UniquePerson
graphs with no shared identities.
Sami Hangaslammi <[sami.hangaslammi@leonidasoy.fi](mailto://sami.hangaslammi@leonidasoy.fi)>
Leonidas Oy <http://leonidasoy.fi>