Brainfuck Interpreter - Ben Wendt's blog

Every programmer has some level of interest in knowing how to write a language. I’ve always wanted to tinker with this level of software development, but even after years of development experience you never really get exposed to this stuff because what’s already there generally just works. On top of that, it’s a somewhat complex process involving several steps with intimidating terminology.

But the best defense against not knowing how to do something that you want to do is always jumping in and doing it. So, with help from an excellent article by Ben Johnson about writing your own parser in go, I dived in. This article has easily digestable definitions and example of the different steps of the parsing and lexing process.

lexing

Lexing is the process of converting a steam of structured text, like the programming language code you want to interpret, into a list of pre-defined symbols. So you could take a string like "while" and convert it into something your compiler / interpreter undestands as a while symbol.

parsing

Parsing is taking a bunch of symbols generated by the lexer and turning them into a structure that your compiler / interpreter will be able to work with. So it takes something like [IF, CONDITION(true), START_BLOCK, PRINT, STRING(hello), END_BLOCK] and changes it into something more like:

if condition(true)
  print 'hello'

choosing brainfuck

brainfuck is an esoteric programming language, meaning no one uses it for any serious work. I chose to write an interpreter for it because it has only eight commands, and it is very conceptually simple. Most compilers and interpreters will parse lexed tokens into something called an “abstract syntax tree”. Since brainfuck is so simple, I was able to skip this step and make a struct with methods that act on the structs state in place. Brainfuck also has lots of example code online that was very useful for writing my tests (more on that later). Brainfuck is also a fairly minimal language that is still Turing-complete, which is a desirable quality to have in a language for which you are writing an interpreter.

Brainfuck is also really nice because it doesn’t have a context aware grammar, meaning that you don’t have to keep track of when you’re in string. There are no variables or functions, so you don’t need a symbol table. In all, it’s a simple featureless language perfect for this kind of task.

What I came up with probably wouldn’t get a great grade if it was what I handed in for what I assume might be the first assignment in a compilers course, but I learned a lot that was worthwhile, so I’ll discuss those findings here.

implementation

Implementation was actually a bit easier than I expected. With a bit of help writing the lexer from the aforementioned blog post, I was able to create a Machine struct with a series of operations. Each of these operates on the state of the struct, which contains operations, current operation, a sequence of bytes that can be manipulated, a reference to the current byte being processed, input, and output:

type Machine struct {
    Position int
    Operation int
    State []byte
    Output []byte
    Input *bytes.Reader
    Operations []Token
}

The operation array is filled with a collection of tokens, and as the machine moves through the operations, each token is mapped to a method that operates on the object. Most of these alter the State slice or position in some way, except for the control flow methods [ and ], which read the current position and jump to the corresponding bracket if the value at the current position is zero or non-zero respectively.

Because there is only one control flow operation in brainfuck, and with it being so simple, I saw this as an opportunity to “get my feet wet” writing an interpreter that doesn’t use an abstract syntax tree. It’s a barrier to learning about this technology that I didn’t have to hurdle to get this working. My implementation just does the control flow manually in place on the Machine struct by moving the Position value.

tests

The last time I was playing with brainfuck, I wrote a string to brainfuck “compiler” that gave me some simple test cases for outputting text. It’s also pretty easy to test the input case: just pre-populate your input, then output it, and assert equality.

I was able to find many “Hello world” examples for brainfuck. The first time I put one in it dropped my interpreter into an infinite loop. Resolving this was definitely that hardest part of this process. I ended up just mentally thinking through what would be wrong until I figured it out: it was an off by one error. Once that worked, some of my “hello world” tests almost worked! I was very excited. But they had an unexpected character with byte code 10 at the end. I was pulling my hair out. A quick check of my ascii table answered the question. Whoever coded these put a new line at the end of the output.

(Thinking back I remember seeing CHAR(10) or similar in an old SQLServer database eons ago, but I must have erased that experience from memory.)

After that I put in a program on esolangs.org listed as “often triggers interpreter bugs.” It worked on the first go and I was happy.

conclusion

It’s an interpreter no one will ever use for a language that no one ever uses, but I’m happy with it. I learned enough and had a pleasant experience, with only one ruined sleep. I think esolangs offer a great opportunity for writing interpreters because the languages generally don’t have many commands. Maybe I will try writing one for Piet next.

You can check out my brainfuck interpreter on github.