Impure Thoughts

Functional programming, down and dirty

Thtatic Compilathionth (without a lisp)

On June 13, 2007 in Uncategorized

Impure Thoughts

Thtatic Compilathionth (without a lisp)

by Philippa Cowderoy for The Monad.Reader IssueTwo

Hi, and welcome to Impure Thoughts, an irregular column slavishly dedicated to writing dirty, perverted and sometimes outright impure code in Haskell. Often I’ll be discussing a slightly larger piece of code than makes sense to include entirely within the article - when this happens, as it has this time, the code (along with usage notes and the like) can be found at the bottom of the article. This article’s code sadly only works with GHC6.2.2 - hopefully an updated archive that works with newer versions will be posted at some point.

Today, we’re going to use Template Haskell to write ourselves a little compiler:

module Main where
import Compiler
$(compile FILENAME)

Looks like a joke, doesn’t it? But at the same time, there’s a lot of truth in the idea that a compiler’s just a function from source code to object code - all we’re doing here is calling that function and splicing that object code into the program using Template Haskell, at which point the Haskell compiler can turn it into machine code. So, what does compile look like?

module Compiler where
import Language.Haskell.THSyntax
import AST
import Parser
import Text.ParserCombinators.Parsec

compile :: FilePath -> Q [Dec]
compile f = do text <- qIO $ readFile f
               case (parse prog f text) of
                Left err -> do qIO $ print err
                               return []
                Right ast -> astToDecs ast

astToDecs :: AST -> Q [Dec]

Pretty simple, but there’s a slight shocker in here - we’re doing IO at compile time! That aside, we’re just reading the file in, parsing it and converting the resulting AST to a list of Haskell declarations. It’s not going to make much sense showing you the rest of the Compiler module without describing the language being compiled, mind. So:

module AST where

type AST = [Statement]

data Statement = Assign Identifier Expression |
                 Print Expression
               deriving Read

data Expression = IntLit Integer |
                  Var Identifier |
                  Add Expression Expression |
                  Sub Expression Expression |
                  Mul Expression Expression |
                  Div Expression Expression
                deriving Read

type Identifier = String

This is the abstract syntax tree type for our language - we see that programs are lists of statements, where a statement either assigns the result of an expression to an identifier or prints the result of an expression. Expressions are four-operator arithmetic on integers, with variables (the identifiers Assign statements assign to). The concrete syntax is also fairly simple, as this example ”(test.a in the code accompanying this document)” shows:

foo = 42; >1+2; >foo

In this program, foo is set to 42, then the result of 1+2 is printed followed by the value of foo. A full definition of the concrete syntax is given by the Parsec parser in Parser.hs.

So, having got this far we can finally translate an AST into a list of Haskell declarations:

astToDecs ast = [d|main = $(doE (map statementToStmt ast)

statementToStmt :: Statement -> Q Stmt
statementToStmt (Assign i e) = do ce <- [|return $(expressionToExpr e) |]
                                  return (BindS (VarP $ ideToStr i)
statementToStmt (Print e) = do pexp <- [|print $(expressionToExpr e) |]
                               return (NoBindS pexp)

expressionToExpr :: Expression -> Q Exp
expressionToExpr (IntLit i) = litE $ IntegerL i
expressionToExpr (Var i) = varE (ideToStr i)
expressionToExpr (Add l r) =
  [| $(expressionToExpr l) + $(expressionToExpr r) |]
expressionToExpr (Sub l r) =
  [| $(expressionToExpr l) - $(expressionToExpr r) |]
expressionToExpr (Mul l r) =
  [| $(expressionToExpr l) * $(expressionToExpr r) |]
expressionToExpr (Div l r) =
  [| $(expressionToExpr l) `div` $(expressionToExpr r) |]

– prefix all source identifiers so they don’t collide
– with any Haskell identifiers
ideToStr i = “foo” ++ i

The translated program consists of a single function (main, of course), which consists of a do statement. The source Statements are translated to commands in the IO monad, and Expressions are translated into simple Haskell arithmetic expressions. For example, the program in test.a becomes this:

main = do {foo <- return 42; print (1+2); print (foo)}

In fact, aside from the quasiquoting and splicing, expressionToExpr looks suspiciously like an ordinary evaluation function - the only difference is in making sure the evaluation occurs at run-time rather than compile-time. This suggests that a compiler looks much like a staged interpreter. Better yet, it means that we can make use of monad transformers to structure the semantics of the language being compiled - and thus languages far more complex than the toy being discussed here can be implemented with relative ease.


I feel a quick usage note is in order. First of all, GHC’s recompilation checking mechanism doesn’t appear to interact well with Template Haskell code that does IO - it can’t explore the consequences of IO done by TH code and thus assumes it has no effect, which means that changes to the program being compiled won’t be picked up. In the short run, turning off the recompilation checker will do the trick - but then the TH-based compiler gets recompiled in its entirety on each run. Perhaps the packaging mechanism can be used to fix this? Secondly, we need a way to pass command-line parameters into the TH program. I’ve cheated slightly - I use the FILENAME in the Main module as a macro for the C pre-processor. A typical GHC invocation looks like this:

ghc -no-recomp -fth -cpp -DFILENAME="""test.a""" --make compiler.hs

The number of quotes required is a little ugly - unfortunately, we need FILENAME to be replaced with `”test.a”` rather than `test.a`.