This is a post about week 2 of the CIS 194: Introduction to Haskell course by Brent Yorgey, from the Penn University of Pennsilvania.
This week is all about Algebraic Data Types (ADT), not to be confused with Abstract Data Types (also ADT) which are another topic.
Haskell has enumeration types (like Java, but still less verbose and more intuitive). An example:
data Food = Pizza
We have just declared a new (algebraic data) type called
Food, with three data constructors which are the only values of the type
We can now define new functions on the new data type using pattern matching:
isTempting:: Food -> Boolean
isTempting Salad = False
isTempting _ = True
But in Haskell enumeration types are only a special case of Algebraic Data Types. One common class of ADT is called sum type (a.k.a. tagged union). A simple example of ADT which is not an enumeration is this:
data OperationResult = OK
| Error Integer
Here we see that
Error is a data constructor that takes an argument of type
Integer. We can construct new
OperationResult values using the
Error data constructor:
success = OK
failure = Error 404
OK is a value of type
OperationResult (since it's a data constructor with zero arguments), but
Error by itself it's not. We have to pass an
Integer value to it to build an
OperationResult with it.
We've just introduced polymorphic data types. Specifically, we can have type signatures with variables just as we can have function implementations with variables. The difference here is that while in actual code variables are symbols bound to values, in type variables are bound to types of values. In other words, types are actually values in type signatures. We're reasoning on a higher and more abstract level. Take a moment to contemplate this fact.
Formally, in Haskell an ADT is a type with one or more data constructors, each one of them can have zero or more arguments.
A general example that shows how to build values is:
isSafeDiv:: Double -> Double -> OperationResult
isSafeDiv _ 0 = Error 1000
isSafeDiv _ _ = OK
We can also use pattern matching to make decisions based on the structure of the
OperationResult value and bind variables to the arguments:
isSuccessful:: OperationResult -> Boolean
isSuccessful OK = True
isSuccessful (Error n) = False
It's idiomatic in Haskell when you have an algebraic data type with a single data constructor, to name it like the data type itself. Example:
data Person = Person String String Int
This can be done since types and data constructors live in separate namespaces.
In general, pattern matching is a way to know what data constructor has been used to create a value of a certain ADT, and to take apart its arguments. Effectively, in Haskell this is the only way to make decisions.
pat ::= _
| var @ (pat)
| (Constructor pat1 pat2 ... patn)
_ is a wildcard.
- We can pattern match against literal values (for example:
- We can pattern match against a pattern, and still bind the entire value to a variable.
- We can pattern match against a data constructor (even recursively).
It's worth noting that types like
Int can be viewed like an ADT:
data Int = 0 | 1 | 2 | ...
Indeed, we can pattern match on its values. But, perhaps obviously, they are not implemented like that in the compiler.
A way (the only one actually) to do pattern matching is by using case expressions:
case exp of
pat1 -> exp1
pat2 -> exp2
For example, we could reimplement the
isSuccessful function from earlier using a case expression:
isSuccessful:: OperationResult -> Boolean
isSuccessful op = case op of
OK -> True
(Error n) -> False
However it's more elegant to use the first version. Indeed, the syntax for doing pattern matching in a function definition is just syntactic sugar on case expressions.
Recursive algebraic data types
It's interesting to note that ADTs can be recursive. For example, let's define a list of integers:
data IntList = Empty
| Cons Int IntList
This definition can be read as: "an
IntList is either an
Empty one or an
Int value followed by an
IntList". This kind of definition is quite clear and elegant (see Church encoding). For example:
-- [1,2,3] can be represented as an IntList:
l = Cons 1 (Cons 2 (Cons 3 Empty))
A recursive data ADT naturally leads to recursive functions. For example, to calculate the sum of all the values in an
calcSum:: IntList -> Int
calcSum Empty = 0
calcSum (Cons n ns) = n + calcSum ns
So, we've seen so far that type signatures can have variables, and can be recursive. Sounds like we could have a Turing-complete type system... indeed, we have one. Someone even implemented a LISP interpreter that completely runs on the Haskell type system!
That's all for this week. Remember: do the exercises!