CodeWars - Three-pass compiler for arithmetic expressions

Question

I'm looking for feedback on my solution to the Tiny Three-Pass Compiler kata on CodeWars. The kata is to implement a compiler for an arithmetic language in three passes - parsing to an AST, constant folding, then generating code in a small assembly language.

I'm looking for general stylistic review, as well as reducing some of the duplication in the pass2 and pass3 functions; I have a feeling that the Add/Sub/Mul/Div cases could probably be condensed somehow.

Without further ado, the code:

module TinyThreePassCompiler where

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Maybe (fromJust)
import Text.Parsec
import qualified Text.Parsec.Token as Tok

-- count number of arguments declared, how identifiers map to argument numbers
type CompilerState = (Int, Map String Int) 
type Parser a = Parsec String CompilerState a

langDef :: Tok.LanguageDef CompilerState
langDef = Tok.LanguageDef
  { Tok.commentStart    = ""
  , Tok.commentEnd      = ""
  , Tok.commentLine     = ""
  , Tok.nestedComments  = False
  , Tok.identStart      = letter
  , Tok.identLetter     = letter
  , Tok.opStart         = oneOf "+-*/"
  , Tok.opLetter        = oneOf "+-*/"
  , Tok.reservedNames   = []
  , Tok.reservedOpNames = []
  , Tok.caseSensitive   = True
  }

lexer :: Tok.TokenParser CompilerState
lexer = Tok.makeTokenParser langDef

parens :: Parser a -> Parser a
parens = Tok.parens lexer

brackets :: Parser a -> Parser a
brackets = Tok.brackets lexer

identifier :: Parser String
identifier = Tok.identifier lexer

reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

integer :: Parser Integer
integer = Tok.integer lexer

data AST = Imm Int
         | Arg Int
         | Add AST AST
         | Sub AST AST
         | Mul AST AST
         | Div AST AST
         deriving (Eq, Show)

function :: Parser AST
function = do
  brackets argList
  expression

argList :: Parser ()
argList = do
  many variableDec
  return ()

-- for each variable declaration, we add a mapping from the argument name to the argument number
-- as well as incrementing the Int in CompilerState, which tracks what argument number we're on
variableDec :: Parser ()
variableDec = do
  varName <- identifier
  (varNum, _) <- getState
  modifyState (\(num, args) -> (num + 1, Map.insert varName varNum args))
  return ()

expression :: Parser AST
expression = term `chainl1` addSubOp

addSubOp :: Parser (AST -> AST -> AST)
addSubOp =  (reservedOp "+" >> return Add)
        <|> (reservedOp "-" >> return Sub)

term :: Parser AST
term = factor `chainl1` multDivOp

multDivOp :: Parser (AST -> AST -> AST)
multDivOp =  (reservedOp "*" >> return Mul)
         <|> (reservedOp "/" >> return Div)

factor :: Parser AST
factor =  number
      <|> variableUse
      <|> parens expression

number :: Parser AST
number = do
  num <- integer
  return $ Imm $ fromIntegral num

-- using fromJust because we don't care about error handling
-- per problem, all programs will be valid
variableUse :: Parser AST
variableUse = do
  varName <- identifier
  (_, varMap) <- getState
  return $ Arg $ fromJust $ Map.lookup varName varMap

compile :: String -> [String]
compile = pass3 . pass2 . pass1

-- parsing pass
pass1 :: String -> AST
pass1 str = case (runParser function (0, Map.empty) "" str) of
  (Left _) -> error "Parse error"
  (Right ast) -> ast

-- constant folding pass
-- do a postorder traversal of the AST, checking for optimization opportunities at each node
pass2 :: AST -> AST
pass2 ast = case ast of
  Add left right -> case (pass2 left, pass2 right) of
    ((Imm m), (Imm n)) -> Imm (m + n)
    (l, r) -> Add l r
  Sub left right -> case (pass2 left, pass2 right) of
    ((Imm m), (Imm n)) -> Imm (m - n)
    (l, r) -> Sub l r
  Mul left right -> case (pass2 left, pass2 right) of
    ((Imm m), (Imm n)) -> Imm (m * n)
    (l, r) -> Mul l r
  Div left right -> case (pass2 left, pass2 right) of
    ((Imm m), (Imm n)) -> Imm (m `div` n)
    (l, r) -> Div l r
  _ -> ast

-- code generation pass
-- postorder traversal of the AST, generating code at each node
pass3 :: AST -> [String]
pass3 ast = case ast of
  Imm n -> ["IM " ++ show n, "PU"]
  Arg a -> ["AR " ++ show a, "PU"]
  Add l r -> pass3 l ++ pass3 r ++ ["PO", "SW", "PO", "AD", "PU"]
  Sub l r -> pass3 l ++ pass3 r ++ ["PO", "SW", "PO", "SU", "PU"]
  Mul l r -> pass3 l ++ pass3 r ++ ["PO", "SW", "PO", "MU", "PU"]
  Div l r -> pass3 l ++ pass3 r ++ ["PO", "SW", "PO", "DI", "PU"]

A few notes:

the AST data type and types for compile, pass1, pass2, and pass3 are set by the kata. I'm stuck with using Int as the underlying type for the Imm and Arg constructors.
Everything is one module for running on CodeWars. I'd probably split this into multiple modules if it weren't for that restriction.
Parsec is available on CodeWars; Megaparsec isn't, else I'd consider using that.
Per the kata's description, all test programs will be valid; thus, I don't put any effort into error handling, and I'm ok with variableUse throwing an error if an undeclared argument is used.

Zeta · Accepted Answer · 2018-03-01 07:51:25Z

Huh. I never got to solve that kata, so congratulations first.

Working with a given AST

As you said yourself, the AST type isn't optimal. We could use

data AST = Imm Int
         | Arg Int
         | BinOp BinOp AST AST
         deriving (Eq, Show)

data BinOp = Add | Mul | Sub | Div deriving (Eq, Show)

instead and simplify both pass2 and pass3 a lot. The operator parsers would return a BinOp instead of an AST -> AST -> AST. But we're stuck with the current AST.

Reducing code duplication (DRY)

If we have a look at pass2 we see that all cases are very similar. They all follow the

 Constructor left right -> case (pass2 left, pass2 right) of
                             (Imm m, Imm n) -> Imm (m <op> n)

pattern. We can just use that one, too:

pass2 :: AST -> AST
pass2 ast = case ast of
  Add left right -> simplify Add (+) left right
  Sub left right -> simplify Sub (-) left right
  Mul left right -> simplify Mul (*) left right
  Div left right -> simplify Div div left right
  _ -> ast
 where
  simplify c op l r = case (pass2 l, pass2 r) of 
     (Imm m, Imm n) -> Imm (m `op` n)
     _              -> c l r

This variant is easier to maintain, as we only have to change simplify or add the new constructor to the outer case, e.g.

pass2 :: AST -> AST
pass2 ast = case ast of
  Add left right -> simplify Add (+) left right
  Sub left right -> simplify Sub (-) left right
  Mul left right -> simplify Mul (*) left right
  Div left right -> simplify Div div left right
  Pow left right -> simplify Pow (^) left right
  _ -> ast
 where
  simplify c op l r = case (pass2 l, pass2 r) of 
     (Imm m, Imm n) -> Imm (m `op` n)
     _              -> c l r

It's similar for pass3:

pass3 :: AST -> [String]
pass3 ast = case ast of
  Imm n -> ["IM " ++ show n, "PU"]
  Arg a -> ["AR " ++ show a, "PU"]
  Add l r -> generate "AD" l r
  Sub l r -> generate "SU" l r
  Mul l r -> generate "MU" l r
  Div l r -> generate "DI" l r
 where
  generate oc l r = pass3 l ++ pass3 r ++ ["PO", "SW", "PO", oc, "PU"]

This keeps the repetition in our code a little bit down.

An intermediate representation

But we can do better if we introduce additional functions and types:

data BinOp = BAdd | BSub | BMul | BDiv deriving (Eq, Show)

data BinOpExpr = BinOpExpr BinOp AST AST

toBinOpExpr :: AST -> Maybe BinOpExpr
toBinOpExpr ast = case ast of
  Add l r -> gen BAdd l r
  Sub l r -> gen BSub l r
  Mul l r -> gen BMul l r
  Div l r -> gen BDiv l r
  _       -> Nothing
 where
  gen c l r = Just $ BinOpExpr c l r

If we now have toOpCode :: BinOp -> String, it's really easy to write pass3:

pass3 :: AST -> [String]
pass3 ast = case ast of
  Imm n -> ["IM " ++ show n, "PU"]
  Arg a -> ["AR " ++ show a, "PU"]
  o     -> case toBinOpExpr o of 
    Just (BinOpExpr op l r) -> pass3 l ++ pass3 r ++ ["PO", "SW", "PO", toOpCode op, "PU"]
    Nothing -> error "Unreachable"

Similar for pass2 if we have toFunc :: BinOp -> (Int -> Int -> Int):

pass2 :: AST -> AST
pass2 ast = case toBinOpExpr ast of
  Just (BinOpExpr op l r) -> simplify op l r ast
  _                       -> ast 
 where
  simplify op l r ast = case (pass2 l, pass2 r) of 
     (Imm m, Imm n) -> Imm (toFunc op m n)
     _              -> ast

Other remarks

Your Parser are well done, but we can get rid of some superfluous code. For example

argList :: Parser ()
argList = do
  many variableDec
  return ()

can be written as

argList = skipMany variableDec

skipMany still applies the parser (and therefore changes the state), whereas many applies the parser and collects a list of [()].

Similarly, we can get rid of return () in

variableDec :: Parser ()
variableDec = do
  varName <- identifier
  (varNum, _) <- getState
  modifyState (\(num, args) -> (num + 1, Map.insert varName varNum args))
  return ()

as modifyState f >> return () is the same as modifyState f due to the monad laws:

For number, we can use Parser's Functor instance to make it shorter:

number :: Parser AST
number = Imm . fromIntegral <$> integer

The other functions could get written in a point-free style, but that would be harder to read in my opinion.

Thanks! These are some interesting suggestions; I'll definitely keep them in mind for future work in this vein. — DylanSp
– DylanSp, Commented Feb 28, 2018 at 19:46
@DylanSp Finally got to comment the Parser section. Have fun on CW! — Zeta
– Zeta, Commented Mar 1, 2018 at 7:51

Gurkenglas · Accepted Answer · 2018-02-27 13:57:05Z

Control.Lens.Plated specializes in passes over recursive data structures.

pass2 = rewriteOf uniplate $ fmap Imm . \case
  Add (Imm left) (Imm right) -> Just $ left + right
  Sub (Imm left) (Imm right) -> Just $ left - right
  Mul (Imm left) (Imm right) -> Just $ left * right
  Div (Imm left) (Imm right) -> Just $ left / right
  _ -> Nothing

Control.Lens.Prism specializes in abstracting over constructors.

pass2 = rewriteOf uniplate $ \ast -> listToMaybe
  [ Imm $ f left right
  | (p, f) <- [(_Add, (+)), (_Sub, (-)), (_Mul, (*)), (_Div, (/))]
  , Just (Imm left, Imm right) <- pure $ preview p ast
  ]

Stack Exchange Network

CodeWars - Three-pass compiler for arithmetic expressions

2 Answers 2

Working with a given AST

Reducing code duplication (DRY)

An intermediate representation

Other remarks

You must log in to answer this question.

Hot Network Questions

CodeWars - Three-pass compiler for arithmetic expressions

2 Answers 2

Working with a given AST

Reducing code duplication (DRY)

An intermediate representation

Other remarks

You must log in to answer this question.

Related

Hot Network Questions