On June 15, 2007 in Uncategorized
B&D not S&M
by Philippa Cowderoy for The Monad.Reader Issue Four
Welcome back to Impure Thoughts. Today I’m going to talk about bringing a little discipline to our coding courtesy of type systems.
Languages with strong type systems are often described as “bondage & discipline” languages - supposedly this is something to avoid, and which no true hacker would willingly use. B&D is bad, mmmkay? Except there’re a lot of less-vanilla folks out there who’ll look on in amusement and ask “what’s so bad?” (or “wmmf mmf bmf?” as the case may be). A good language and a good library writer can care for you, help you avoid making silly mistakes and give you the freedom to do things you couldn’t accomplish safely otherwise. Much like a good dom, in fact.
Of course, the type system doesn’t know everything about the things you don’t want to happen and the things you do - communication is key, you tell it your desires and it sets about making them come true. The fun part’s in finding new and exciting things to achieve together - so let’s look at some examples!
Complex Typing for Monads
Monads’re a true gift to the programming restraint artist - we get the ability to play with weird and wonderful semantics (or even just plain state), and at the same time whenever we do so it’s visible to the type system. But we can take it beyond merely typing a single monad with the help of monad transformers and type classes.
Let’s pick a reasonably concrete example. The Network module in the hierarchical libraries has a function called withSocketsDo - it takes an IO action as a parameter and executes it ‘between’ initialisation and shutdown of the native networking library (it’s not strictly necessary on *nix platforms, but portability is generally a good idea if you’ve no pressing reasons to avoid it). Actions that use the networking operations outside withSocketsDo could be considered to be in error - so, is there a way to avoid this at the type level?
One simple option would be to build a Network monad for the networking operations, with a function that lifts IO actions into it. Then withSocketsDo takes a Network action rather than an IO action. This is flawed though:
*Lifting is tedious, and makes good factoring of code difficult
*It doesn’t scale - if we have a second library that needs a similar kind of initialisation (an OpenGL binding, say), we end up with two separate incompatible monads
To attack the scalability issue, we can place the network operations into a Network typeclass and instead of a new monad we can provide a monad transformer we can apply to the IO monad to yield a new monad which would be an instance of the Network class (the transformer’d effectively be a renamed identity transformer). Then we can provide an instance of Network for other monad transformers which applies if the underlying monad belongs to Network - effectively lifting all the network operations for us automatically. The only hiccup here is ordinary IO - though fortunately there’s a typeclass in the hierarchical libraries called MonadIO which provides a suitable mechanism for lifting IO operations back.
Neat though this solution is, we’re still left lifting ordinary IO operations. If rewriting the Prelude and some of the standard libraries is acceptable, we could try putting these into typeclasses as well - we can even cover operations imported via the FFI with some clever instancing and appropriate lifting mechanisms, setting things up so that anything that supports IO supports the new operations defined by lifting from the plain IO monad all the way up to the monad in actual use.
Policing the State
Much as Haskell is a pure language, sooner or later most of us are guilty of using state for something. Involved uses tend to require a proper polymorphic heap and references in the vein of those provided by the ST monad and STRefs - and useful as they are, they leave us open to all the same mistakes we can make in an imperative language. In fact, sometimes the situation’s worse - we can’t take advantage of a stack to ensure mutable values are limited to a specific scope (though we can’t shoot ourselves in the foot by having attempted to ensure this and left ourselves a dangling pointer either, so it could be worse). So, what can we do?
Well, as it happens I’ve told a slight lie in stating the problem - we already can and do limit references by scope, as the ST monad itself demonstrates (see code below). Computations and references are all parameterised on a type variable, and the only way to run ST computations that can create references is via functions that cause that variable to be instantiated to a fresh variable - making it impossible for the computation to return one of the references as the reference’s type would contain a variable that’d gone out of scope.
data STRef s a newSTRef :: a -> ST s (STRef s a) readSTRef :: STRef s a -> ST s a writeSTRef :: STRef s a -> a -> ST s () modifySTRef :: STRef s a -> (a -> a) -> ST s () data ST s a runST :: (forall s . ST s a) -> a
There’s a notable deficiency with this compared to a proper stack discipline however: we only get the one scope. Sure, we can use runST inside an ST computation, but then we can’t access the outermost scope’s references. Now, I had a bash at a solution myself and spent a good couple of days trying to construct a useful relationship between scopes (or at least the variables associated with them) using multiparameter typeclasses before eventually realising the type inference system didn’t appear to be propagating type information in the directions I needed (ouch!) - the idea being that if I can show one scope’s inside another I can grant the usual access operations. Much to my relief, it turns out that Oleg Kiselyov had already written code that solves the problem! In Oleg’s code, each computation carries a type-level list of regions it can access.
- Oleg’s code can be found here: Oleg’s monadic regions code.
- He also recommended that I link to this message posted to the haskell mailing list: http://firstname.lastname@example.org/msg16167.html
- Finally, you might want a look at HList: http://homepages.cwi.nl/~ralf/HList/
Using Type Classes to Aid Refactoring
An oft-heard comment when discussing type systems with users of dynamically-typed languages is “I don’t care what type it is, I just want to know that it accepts the method”. This has lead to the notion of Duck Typing (”it walks like a duck, quacks like a duck - who cares if it’s an orangutang doing a good impression?” - this description may be somewhat unkind…).
The dynamic typing folks do have a point here - duck typing makes refactoring much, much easier because it doesn’t matter so much when changes to the types in a program happen. However, we have a couple of tricks up our own sleeve: the first is plain ol’ type inference, which will minimise the number of changes required to propagate a new type throughout a program. The second is type classes.
Ordinary single-parameter type classes tell us, short and sweet, that any type that is an instance of them supports a particular set of operations. Effectively they represent the things we reckon a duck does. And we can parameterise function parameters and results where previously we had just the one type, constraining the type variable to have an instance of the appropriate type class.
As a first effort, this ain’t bad - but there’s a problem in the form of parameterised datatypes such as lists. Sad as it is, currently we can’t have a list of things that belong to a typeclass. What we can do, using a common extension, is build a type with a single constructor that accepts anything with an instance of a typeclass, like so:
If we can track down everywhere we introduce a value belonging to the typeclass, we can replace it with a `BelongsToFoo` and either supply functions that unbox the value and call the typeclass operations or just supply an instance for `BelongsToFoo` (warning: if you’re not careful this may lead to you repeatedly boxing `BelongsToFoos` in layers of `BelongsToFoos` - you probably don’t want to do that).
Tracking down introductions is surprisingly easy to do, because whenever we introduce a value we end up using it - which means that the type system can tell us everywhere things don’t fit! Because the type system is covering our backs, we can push changes through a bit at a time and be sure the system still works, until at some point we’ll have the option of removing the typeclass entirely.
One potential problem does arise however: sometimes we use a type for multiple purposes, and this makes it a little harder to track down when we really mean to replace it. The all-time classic offender here has to be the String type - is it someone’s name? An address? A URI, a filepath, a document? This suggests that perhaps we should declare type synonyms for new usages when they first arise, so that we have less trouble if the type behind the name changes. As ever, the more information we give the type system the more it can help us.
What if it Goes Wrong?
Once in a while, we want to do something the type system refuses to let us. Most of the time, it’s for our own good - if we understood the consequences (often plain ol’ “undefined behaviour”), we’d do otherwise. Occasionally, we know better but just can’t find a way to convince it.
When people engage in power play, it’s common to agree on a “safeword” - something that can be said to indicate to the dom that something is genuinely up and can we please stop now before somebody’s harmed. Here the analogy turns out to be backwards, because by default a computer’ll cheerfully do anything we tell it to - we only get to take advantage of a type system because we request it by coding in Haskell in the first place. However, there’s very definitely a place for what I’ve begun to think of as “unsafewords”, operations that allow us to break the rules entirely if (and hopefully if) we know what we’re doing and we really really mean it.
Such operations are sometimes a bit controversial - certainly it’s a bit worrying if a language forces their use in order to get anything done. On the other hand, it’s rather difficult to implement standard library features without them - making them a common compiler implementation ‘hack’. On occasion they’re useful for implementing new monads and the like for similar reasons - you don’t break the language rules necessarily, just those the monad usually enforces.
Haskell 98 itself doesn’t have any unsafewords, whereas many Haskell implementations do. GHC’s take the form “unsafeSomething” - if you’re interested, read the documentation. But please make sure you know what you’re doing with them before using them in code anybody relies on, or receiving too hard a spanking will be the least of your worries!
There’s much more to try than I can cover in this article alone - while I’ve made use of them, I’ve barely discussed the capabilities offered by many of the extensions seen in GHC and Hugs. Hopefully I’ve touched on enough to raise some interest however. Personally I’d love to be able to apply some of these techniques in a slightly lower-level language, or at least a strict language with an IO monad offering low-level capabilities - I can’t help but wonder if we’re reaching a point where our type systems can help address the challenges inherent in real time programming.
For now I must sign off however. My thanks to Oleg - to everybody else, have fun and play safe!