DobadoBots: Implementing the Parser

Lately, I have been working on a video-game called DobadoBots. This game is about programming a robot’s artificial intelligence to solve mazes.

The robot is materialized by a white triangle. The goal is to reach the objective (orange square) while avoiding several obstacles.

Instead of using a standard embeddable script language such as LUA, I went the custom way and wrote my own language.

In a previous post, we detailed the AST specification and structure. In this post, we go through the implementation of the parser.

As a quick reminder, the language we are attempting to parse looks like this

IF laserDistance > 20
  moveForward
ELSE
  IF laserScan = objective
    moveForward
  ELSE
   IF laserScan = obstacle
       turnLeft
   ELSE faceObjectives

TL;DR, show me the code instead: here you go.

Parsec

As a new Haskell developer, I went through the various available parsing libraries. I finally decided to use the Parsec library. I have been amazed by it so far. This is probably the best parsing library I ever used.

The base concept of Parsec is to combine multiple Parser monads in order to create a monad that will either parse your AST, either return a detailed error message.

Parsec is providing basic Parser monads for the most common builtin datatypes (e.g. space, newline, character, integer, double, etc.). You can then build more useful parsers by assembling those basic ones using some combinators.

I find this design very elegant, however, after using this library a bit, you can very quickly mess things up. Therefore, I would recommend writing a lot of unit tests before implementing your parser.

Let’s Parse Dobadolang

There are several ways to use parsec. You can use the monadic approach and take advantage of the do notation or you can use the applicative approach and use the applicative style to combine the various parsers.

I decided to use the applicative approach for this project as it was a more compact notation.

We will build our compiler using the bottom up approach, let’s start by specifying the AST’s leaves: the action token.

Parsing Action Tokens

Each token of our language is defined by a specific string. Parsec provides a string parser which can parse a simple string case-sensitively. We can, for example, parse the move forward token like this:

string "moveForward"

Alright, but if we look at the type of string, we see we done something wrong.

Stream s m Char => String -> ParsecT s u m String

Yup, the actual returned value will be the string itself. We do not want that, we want to return the AST element instead.

We will use the <$ version of fmap. This inflix function basically discards the return value contained in the parsec parser to instead return the value specified at its left.

MoveForward <$ string "moveForward"

We use a similar technique to parse the other action tokens.

actionParser :: CharParser () ActionToken
actionParser = MoveForward     <$ string "moveForward"
          <|>  try (TurnLeft   <$ string "turnLeft")
          <|>  try (TurnRight  <$ string "turnRight")
          <|>  FaceObjective   <$ string "faceObjective"
          <|>  ChangeObjective <$ string "changeObjective"

There are two new tricks here:

  1. the try function prevents parsers from consuming too many inputs. You can check a detailed explanation of this function on hackage.
  2. the <|> combinator here materializes a choice. We use it to try parsing the various tokens.

Parsing AST’s Nodes

The whole DobadoBots language is built around the conditional structure. Naturally, we want to parse that structure.

conditionParser :: CharParser () Cond
conditionParser = Cond <$>
                    ((spaces *> string "IF") *> logicExprParser <* newline)
                  <*>
                    (spaces *> parseNested <* newline)
                  <*>
                    ((spaces *> string "ELSE") *> (newline *> (spaces *> parseNested)))
                  where
                    parseNested = Token <$> actionParser
                              <|> conditionParser

Writing this function was my “aha!” moment, this is when I started to fancy parsec. Here, we just had to combine the previously implemented logic expression and action token parsers.

This is basically why I love to use parsec with the applicative notation. The resulting code seems to be written using a DSL while keeping Haskell’s strong type checking. It is both highly readable and easy to reuse.

Now we have defined the various parsec Parser instances to generate DobadoBots AST, time to wrap-up the parser.

Wrapping up Everything

Once the Parser monad is defined, we just need to call the parse function.

parseScript :: Text -> Either ParseError Cond
parseScript script = parse scriptFile "Error while parsing script: " $ unpack script .

As shown in this function definition, the parse function returns either a parsing error, which contains both the error location and a short message describing what should be there in order to have a valid input.

Testing the Implementation

As I mentioned in the introduction, it is quite easy to screw things up when parsing. In order to mitigate that, it seems like a good idea to write some test cases up front.

I used the HSpec unit test task runner together with stack for running the tests. It is a good idea to inline the fixtures in the test file to avoid any filesystem side effect while writing purely functional tests. I used the embed file library to achieve that.

You can find the complete test case on the DobadoBots GitHub repository.

Each test case looks basically like this:

it "should parse simple cond =" $
  parseScript (decodeUtf8 $(embedFile "test/fixtures/script/simplecond.script")) `shouldBe` Right (Cond (CmpLogicInt $ Eq LaserDistance 1)(Token TurnLeft) (Token MoveForward))

We are inlining a fixture to check the parse result with an expected AST.

Another testing approach we could implement would be the round trip property check.

The goal of this test is to ensure that parse(print(parse(script))) == parse(script) is always true. Basically, it checks if your parser/pretty printer couple is correct. I did not figure out how to setup stack to both test the library using HSpec (unit tests) and Quickcheck (property-based tests). I will probably figure out how to do that and write a follow-up article in the near future.

I do not see anything more to add, as you can see, writing a parser using parsec is not that big of a deal. In the next article, we will cover up Dobadobot’s text editor.