CIS 194, week 2: Algebraic Data Types

Updated on

Tags:


This is a post about week 2 of the CIS 194: Introduction to Haskell course by Brent Yorgey, from the Penn University of Pennsilvania.

This week is all about Algebraic Data Types (ADT), not to be confused with Abstract Data Types (also ADT) which are another topic.

Haskell has enumeration types (like Java, but still less verbose and more intuitive). An example:

data Food = Pizza
	      | Bacon
		  | Salad
	deriving Show

We have just declared a new (algebraic data) type called Food, with three data constructors which are the only values of the type Food.

We can now define new functions on the new data type using pattern matching:

isTempting:: Food -> Boolean
isTempting Salad = False
isTempting _ = True

But in Haskell enumeration types are only a special case of Algebraic Data Types. One common class of ADT is called sum type (a.k.a. tagged union). A simple example of ADT which is not an enumeration is this:

data OperationResult = OK
                     | Error Integer
	deriving Show

Here we see that Error is a data constructor that takes an argument of type Integer. We can construct new OperationResult values using the Error data constructor:

success:: OperationResult
success = OK

failure:: OperationResult
failure = Error 404

OK is a value of type OperationResult (since it's a data constructor with zero arguments), but Error by itself it's not. We have to pass an Integer value to it to build an OperationResult with it.

We've just introduced polymorphic data types. Specifically, we can have type signatures with variables just as we can have function implementations with variables. The difference here is that while in actual code variables are symbols bound to values, in type variables are bound to types of values. In other words, types are actually values in type signatures. We're reasoning on a higher and more abstract level. Take a moment to contemplate this fact.

Formally, in Haskell an ADT is a type with one or more data constructors, each one of them can have zero or more arguments.

A general example that shows how to build values is:

isSafeDiv:: Double -> Double -> OperationResult
isSafeDiv _ 0 = Error 1000
isSafeDiv _ _ = OK

We can also use pattern matching to make decisions based on the structure of the OperationResult value and bind variables to the arguments:

isSuccessful:: OperationResult -> Boolean
isSuccessful OK = True
isSuccessful (Error n) = False

It's idiomatic in Haskell when you have an algebraic data type with a single data constructor, to name it like the data type itself. Example:

data Person = Person String String Int
	deriving Show

This can be done since types and data constructors live in separate namespaces.

Pattern Matching

In general, pattern matching is a way to know what data constructor has been used to create a value of a certain ADT, and to take apart its arguments. Effectively, in Haskell this is the only way to make decisions.

pat ::= _
	| var
	| var @ (pat)
	| (Constructor pat1 pat2 ... patn)	
In order:
  • _ is a wildcard.
  • We can pattern match against literal values (for example: OK).
  • We can pattern match against a pattern, and still bind the entire value to a variable.
  • We can pattern match against a data constructor (even recursively).

It's worth noting that types like Int can be viewed like an ADT:

data Int = 0 | 1 | 2 | ...

Indeed, we can pattern match on its values. But, perhaps obviously, they are not implemented like that in the compiler.

Case expressions

A way (the only one actually) to do pattern matching is by using case expressions:

case exp of
	pat1 -> exp1
	pat2 -> exp2
	...

For example, we could reimplement the isSuccessful function from earlier using a case expression:

isSuccessful:: OperationResult -> Boolean
isSuccessful op = case op of
	                OK -> True
					(Error n) -> False

However it's more elegant to use the first version. Indeed, the syntax for doing pattern matching in a function definition is just syntactic sugar on case expressions.

Recursive algebraic data types

It's interesting to note that ADTs can be recursive. For example, let's define a list of integers:

data IntList = Empty
	         | Cons Int IntList

This definition can be read as: "an IntList is either an Empty one or an Int value followed by an IntList". This kind of definition is quite clear and elegant (see Church encoding). For example:

-- [1,2,3] can be represented as an IntList:
l:: IntList
l = Cons 1 (Cons 2 (Cons 3 Empty))

A recursive data ADT naturally leads to recursive functions. For example, to calculate the sum of all the values in an IntList:

calcSum:: IntList -> Int
calcSum Empty = 0
calcSum (Cons n l) = n + calcSum l

So, we've seen so far that type signatures can have variables, and can be recursive. Sounds like we could have a Turing-complete type system... indeed, we have one. Someone even implemented a LISP interpreter that completely runs on the Haskell type system!

That's all for this week. Remember: do the exercises!

Horizontal divider

Introduction to Haskell, week 1

Updated on

Tags:


A while ago I've finally started to study Haskell, in particular following the CIS 194: Introduction to Haskell course by Brent Yorgey from the Penn University of Pennsilvania. This is the first resource of the curriculum I plan to follow to learn Haskell (thanks a lot to Chris Allen for laying out a path to FP enlightenment :)

I've worked through the first 4 weeks now, but I decided at this point to switch gears and go back to recap what I've learned so far.

Without further ado, let's recap CIS 192: week 1.

What is Haskell?

Haskell (named after Haskell Curry, for his work on combinatory logic and for the Curry-Howard Correspondence), is a lazy, statically typed, pure functional programming language created in the 1980's by a committee of academics. It's very well alive today, as it's one of the most advanced (statically typed) languages out there.

Haskell is:

Functional

  • Functions are first-class values. That is, they can be passed to other functions or returned by them like any other values.
  • The computation model is based around evaluating expressions, not executing instructions. In other words, it's not based on the Von Neumann architecture (instructions that operate on a shared memory), but on the lambda calculus (you can think about it in terms of composing functions to transform streams of immutable data).

Pure

Every expression is referentially transparent. This means that:

  • Everything is immutable. Every "mutation" is modeled as a transformation, a function that doesn't change the original value but creates a new one.
  • There are no side effects (well, there are: modeled with monads to retain purity, but you don't need to know what a monad is to use them!).

As a consequence of the previous points, calling the same function with the same arguments results in the same output, always.

This approach has a number of very nice benefits, that once you wrap your head around this paradigm you won't give away too easily:

  • Equational reasoning: thinking about the code becomes much more easier. Refactoring becomes a breeze.
  • Parallelism: using multiple cores is much easier when you know that functions are guaranteed not to interfere with each another. There is no shared state!

In general, with static types and pure functions programs become much more easy to maintain, refactor, debug and reasoning about.

Lazy

In Haskell values are computed only when needed (call-by-need evaluation strategy).

Advantages:

  • It's easy to define new control structures just by defining a new function. Contrast this with languages like Clojure where you need a macros to achieve that, or languages like Java where it's basically impossible.
  • It's easy to work with infinite data structures, since values are only computed when needed. You can achieve the same in idiomatic Clojure by using the seq abstraction and lazy-seq, in Java 8 using Streams, etc.
  • It enables a compositional style (we will see it down the road with wholemeal programming, currying and point-free style).

Downside: it becomes harder to reason about the time and space characteristics of programs.

Themes

The course revolves around thee key areas:

  • Types
  • Abstractions
  • Wholemeal programing

Types

Static type systems can be annoying, and some of them really are. But this isn't because type systems are inherently annoying, that's because some of them are insufficiently expressive (for example, Java and C++ ones).

A type system (especially Haskell one):

  • Gives you clarity of thinking and helps you to design and reason about programs. Types become an organizing principle, a precise and powerful tool to think about and express abstractions. Using them, you are able to reason at a higher level, in a systematic way.
  • Is a form of documentation, always in sync with the actual code.
  • Turns a lot of runtime errors to compile time ones. Computers can do complex, repetitive and clearly specified things efficiently: why not delegate to them some of the burden that a human has to carry on while writing software?

Once you start to get the hang of it, a (good) type system becomes an invaluable ally. It feels liberating, since it can help you long before you write the first line of code: it helps you in the design of the system.

Abstraction

In some way, designing and maintaining software is a battle against repetition: you frequently need to take similar things and factor out their commonality (a process known as abstraction).

Haskell gives you a lot of abstraction power: parametric polymorphism, higher-order functions, type classes, etc. Its type systems is also a powerful, methodic and sound tool to think mathematically about them.

Wholemeal programming

Quoting Ralf Hinze:

“Functional languages excel at wholemeal programming, a term coined by Geraint Jones. Wholemeal programming means to think big: work with an entire list, rather than a sequence of elements; develop a solution space, rather than an individual solution; imagine a graph, rather than a single path. The wholemeal approach often offers new insights or provides new perspectives on a given problem. It is nicely complemented by the idea of projective programming: first solve a more general problem, then extract the interesting bits and pieces by transforming the general program into more specialised ones.”

In short, it's about working with abstractions rather than concrete instances of the problem/solution space, with group/types of things instead of with single instances.

Next, Brent Yorgey goes on showing the basic types (scalars and lists), how to define and combine functions, etc. Lots of good stuff.

Horizontal divider

Data types as graphical shapes

Updated on

Tags:


A very interesting idea that I've just found in a post by Aaron Spiegel: to explain basic FP constructs like map, filter and fold/reduce graphically (nothing new, I've already done that several times), but with an interesting twist: by representing types as shapes. That's a clever trick to better explain that HOFs (Higher-Order Functions).

However, visual languages rapidly become inadequate to express more high level concepts. For example you can sort-of encode algebraic data types using different colors for example. Then maybe you can express options using full or empty shapes. But, as you probably can see, there are limits in this medium and we can reach them pretty fast.

In some sense it's like understanding certain mathematical concepts using geometry (as a visual learner, that's how I really understood integrals), but one of the reasons mathematicians use textual notation is that it's much more concise and expressive.

Horizontal divider

Summary: *-Driven* do not change anything

Updated on

Tags:


This is a short summary of my takeaways of the article *-Driven* do not change anything by Michał Bartyzel.

  • Mental frameworks (like *-Driven*) need to be intepreted in the light of the appropriate context, and require experience to be applied to unknown ones.
  • These frameworks are usually formed over many years of experience by induction, and adapted to other contexts by deduction.
  • Instead of focusing on context/experience-dependent mental frameworks, invest in developing the foundamentals:
    • Responsibility
    • Encapsulation
    • Composition
Horizontal divider

Interesting links

Updated on

Tags:


Deliver Working Software Frequently

Before you can run, you need to learn how to walk. This is a good primer on agility. Focus first on delivery.

Until you can deliver, work on delivery. Work on nothing else until then. The rest will come in due time.

I reckon your message broker might be a bad idea.

Message brokers can be a bad idea if treated like "infallible gods", because they aren't. Think about three good design principles for realiable systems:

  • Fail fast
  • Process supervision
  • End-to-End principle

In the end, it isn’t so much about message brokers, but treating them as infallible gods. By using acknowledgements, back-pressure and other techniques you can move responsibility out of the brokers and into the edges. What was once a central point of failure becomes an effectively stateless server, no-longer hiding failures from your components. Your system works because it knows that it can fail.

PostgreSQL 9.4 - Looking Up (With JSONB and Logical Decoding)

PostgreSQL 9.4 is going to be exciting with JSONB support. As David Lesches says:

JSONB has been committed to Postgres 9.4, making Postgres the first RDBMS with rock-solid support for schemaless data

Horizontal divider

On "Enlightenment"

Updated on

Tags:


The more I think about it, the more I see it. They say that you should expose yourself to different kind of programming languages, and more specifically to different kind of paradigms. The key of that advice is to get exposure (and hopefully proficiency) with different ways of thinking. The more high-level thinking tools you have at your disposal, the more effective and efficient you are at identifying and solving problems.

You become able to really see the problem and build a general solution. You start to deconstruct what you know, the constraints and the problem at hand, and you become able to build general solutions by composing simple things. Systematically. When you start to think in terms of data transformations via map, filter, reduce, juxt, etc. you realize that you are operating on a higher level. You start to really see what the single resposibility principle is all about. And then, you start to see all the accidental complexity that used to slow you down and hide the essence of the solution behind reams of low level details (syntactical or otherwise). When your mind is not distracted by accidental complexity, you can step back and think clearly about what you have and what is really needed.

Unfortunately, this kind of "enlightenment" has some downsides:

  • You become painfully aware that you could do much better and much faster than when you develop with less advanced languages (for example: Java). Here starts the frustration.
  • Your "yet-to-reach-enlightenment" colleagues don't see the way you do. They don't understand (yet). This is the blub paradox at work.

Combine both factors, and you have a recipe for misery. It's depressing seeing friends (and yourself) wasting time with unnecessary complexity.

Square wheels

However, even if it requires work, practice and dedication to reach a master level, I think that it's worth it. I'm certainly not a master, and yet I'm seeing benefits of this learning journey even when I'm not writing LISP code. Eric Raymond was right:

LISP is worth learning for a different reason — the profound enlightenment experience you will have when you finally get it. That experience will make you a better programmer for the rest of your days, even if you never actually use LISP itself a lot.

So go ahead and learn some Clojure, Haskell and Factor (for example). You don't need to spend a lot of time to get a feel of what's out there. You will be a better programmer anyway, and I think this is a worthwhile goal.

Horizontal divider

On frameworks

Updated on

Tags:


Frameworks remove your ability to solve your specific problems from first principles. They opt you out of innovation, simplicity & elegance — Steve Purcell

Horizontal divider

How to programmatically manage a Light Table-aware nREPL server

Updated on

Tags:


Starting an nREPL server for a Clojure application is trivial if you use Leiningen:

lein repl

# or:

lein repl :headless

But what if you have an application deployed as a JAR (or WAR in an application server) and you want to have remote access to an nREPL server? Fortunately, it's easy to start one programmatically:

(require '[clojure.tools.nrepl.server :refer (start-server stop-server)])

; Starting an nREPL server is trivial:
(defonce nrepl-server (start-server :port 12345))

; So is stopping it:
(stop-server nrepl-server)

It works out of the box if you use Emacs with Cider to connect to it, but if you use Light Table, you need to add an additional nREPL middleware to support some functionality that LT needs to work properly. Again, this is easy if you use Leiningen: you just need to declare the dependency to the middleware and add some :repl-options:

(defproject some-awesome-thing "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [lein-light-nrepl "0.0.13"]] 
  :repl-options {:nrepl-middleware [lighttable.nrepl.handler/lighttable-ops]})

Using the lighttable-ops nREPL middleware programmatically is also quite easy, but you have to know how to compose nREPL middlewares. First, you need to add the dependency to the LT middleware in your project.clj as before:

(defproject some-awesome-thing "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [lein-light-nrepl "0.0.13"]])

Then you can add it to the middlewares chain in your code:

(require '[clojure.tools.nrepl.server :refer (start-server stop-server)])
(require '[lighttable.nrepl.handler :refer [lighttable-ops]])

; Starting a server with custom middlewares is trivial:
(defonce nrepl-server 
         (start-server :port 12345
                       :handler (default-handler #'lighttable-ops)))

; So is stopping it:
(stop-server nrepl-server)

Notice that we passed lighttable-ops to the default-handler as a var (in fact, #'lighttable-ops is a reader macro that expands to (var lighttable-ops)).

Now you can connect to your project remotely using Light Table and inspect, modify and control it live. Enjoy :)

Horizontal divider

Terrifying

Updated on

Tags:


This is one of the most terrifying technical books I've ever read (not an affiliated link):

Release It!

I haven't finished it yet, and I already think it's a foundamental read for every professional software developer. Along with, for example:

Horizontal divider

JDBC Lint

Tags:


Recently I'm working quite heavily with straight JDBC and I'm learning some things from best practices and some others the hard way. One tool that I've found useful is JDBC Lint: it's a tool that acts as a dynamic proxy for Connection objects (so that it can be used transparently) and reports errors at runtime on the standard error channel.

For a primer on some JDBC best practices, see this post.

Horizontal divider