Every programmer has some level of interest in knowing how to write a language. I’ve always wanted to tinker with this level of software development, but even after years of development experience you never really get exposed to this stuff because what’s already there generally just works. On top of that, it’s a somewhat complex process involving several steps with intimidating terminology.
But the best defense against not knowing how to do something that you want to do is always jumping in and doing it. So, with help from an excellent article by Ben Johnson about writing your own parser in go, I dived in. This article has easily digestable definitions and example of the different steps of the parsing and lexing process.
lexing
Lexing is the process of converting a steam of structured text, like the programming language
code you want to interpret, into a list of pre-defined symbols. So you could take a string like
"while"
and convert it into something your compiler / interpreter undestands as a while
symbol.
parsing
Parsing is taking a bunch of symbols generated by the lexer and turning them into a structure
that your compiler / interpreter will be able to work with. So it takes something like
[IF, CONDITION(true), START_BLOCK, PRINT, STRING(hello), END_BLOCK]
and changes it into
something more like:
if condition(true)
print 'hello'
choosing brainfuck
brainfuck is an esoteric programming language,
meaning no one uses it for any serious work. I chose to write an interpreter for it because
it has only eight commands, and it is very conceptually simple. Most compilers and
interpreters will parse lexed tokens into something called an “abstract syntax tree”. Since
brainfuck is so simple, I was able to skip this step and make a struct
with methods that
act on the struct
s state in place. Brainfuck also has lots of example code online that was
very useful for writing my tests (more on that later). Brainfuck is also a fairly minimal
language that is still Turing-complete, which is a desirable quality to have in a language
for which you are writing an interpreter.
Brainfuck is also really nice because it doesn’t have a context aware grammar, meaning that you don’t have to keep track of when you’re in string. There are no variables or functions, so you don’t need a symbol table. In all, it’s a simple featureless language perfect for this kind of task.
What I came up with probably wouldn’t get a great grade if it was what I handed in for what I assume might be the first assignment in a compilers course, but I learned a lot that was worthwhile, so I’ll discuss those findings here.
implementation
Implementation was actually a bit easier than I expected. With a bit of help writing the lexer
from the aforementioned blog post, I was able to create a Machine
struct with a series of
operations. Each of these operates on the state of the struct, which contains operations,
current operation, a sequence of bytes that can be manipulated, a reference to the current
byte being processed, input, and output:
type Machine struct {
Position int
Operation int
State []byte
Output []byte
Input *bytes.Reader
Operations []Token
}
The operation array is filled with a collection of tokens, and as the machine moves through the
operations, each token is mapped to a method that operates on the object. Most of these alter the
State
slice or position in some way, except for the control flow methods [
and ]
, which
read the current position and jump to the corresponding bracket if the value at the current position
is zero or non-zero respectively.
Because there is only one control flow operation in brainfuck, and with it being so simple, I saw this
as an opportunity to “get my feet wet” writing an interpreter that doesn’t use an abstract syntax tree.
It’s a barrier to learning about this technology that I didn’t have to hurdle to get this working. My
implementation just does the control flow manually in place on the Machine
struct by moving the
Position
value.
tests
The last time I was playing with brainfuck, I wrote a string to brainfuck “compiler” that gave me some simple test cases for outputting text. It’s also pretty easy to test the input case: just pre-populate your input, then output it, and assert equality.
I was able to find many “Hello world” examples for brainfuck. The first time I put one in it dropped my interpreter into an infinite loop. Resolving this was definitely that hardest part of this process. I ended up just mentally thinking through what would be wrong until I figured it out: it was an off by one error. Once that worked, some of my “hello world” tests almost worked! I was very excited. But they had an unexpected character with byte code 10 at the end. I was pulling my hair out. A quick check of my ascii table answered the question. Whoever coded these put a new line at the end of the output.
(Thinking back I remember seeing CHAR(10)
or similar in an old SQLServer database eons ago, but I must
have erased that experience from memory.)
After that I put in a program on esolangs.org listed as “often triggers interpreter bugs.” It worked on the first go and I was happy.
conclusion
It’s an interpreter no one will ever use for a language that no one ever uses, but I’m happy with it. I learned enough and had a pleasant experience, with only one ruined sleep. I think esolangs offer a great opportunity for writing interpreters because the languages generally don’t have many commands. Maybe I will try writing one for Piet next.
You can check out my brainfuck interpreter on github.