James's Blog

Cookery, Hackery, and Hopefully Not Quackery

Lambda Calculus Interpreter and Benefits of Ugliness


For the past week, I’ve been working on a lambda calculus interpreter (Github) to better understand the underlying concepts. It’s been a great experience, especially since I’ve been writing it in Python. Normally, if I were writing the interpreter in a language like SML or Haskell, I would rely heavily on pattern-matching.

Pattern-matching is a natural fit for writing an interpreter, given the operations involved in lexing, parsing, and reduction/evaluation: each piece of input is handled according to the token, keyword, or structure encountered. An example of this idea in a Haskell lexer is below:

lexer :: String -> [String]
lexer [] = []
lexer (x:xs) =
    case x of
      '(' -> "(" : lexer xs
      ')' -> ")" : lexer xs
      '.' -> "." : lexer xs
      ' ' -> lexer xs   -- whitespace separates tokens but is dropped
      _   -> let -- grab characters until the next delimiter
                 chunkWord [] = []
                 chunkWord (y:ys) =
                     case y of
                       ' ' -> []
                       '(' -> []
                       ')' -> []
                       '.' -> []
                       _   -> y : chunkWord ys
                 word = chunkWord (x:xs)
             in word : (lexer $ drop (length word) (x:xs))

The lexer takes a string and iterates through it character by character, and each character encountered triggers a specified action (even if the actions are simple in this case). A '(' character produces the action of consing the string "(" onto the result of lexer applied to the rest of the string. An analogous action happens for ')', '.', and ' '. Anything else (which then has to be a keyword or variable in lambda calculus) gets chunked together into a word and consed onto the result of lexing the rest of the string (sans the chunked word). It’s not bad, though I imagine I could write this even more concisely if I knew Haskell better.
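For instance, here’s the lexer run in GHCi (with λ written as a backslash, which is a guess at the concrete syntax my interpreter will settle on):

ghci> lexer "(\\x.x y)"
["(","\\x",".","x","y",")"]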

If I were to translate this code literally to Python, I would enter elif hell, since Python has neither pattern-matching nor even case expressions. Although beauty is in the eye of the beholder, a massive block of elifs is not what I consider beautiful code. For such a small codeblock it isn’t too bad, but even so, it looks verbose in Python:

class Lexer(object):
    @classmethod
    def tokenize(cls, string):
        result = []
        first_letter = True  # True when the next word char starts a new token

        for char in string:
            if char == '(':
                result.append(char)
                first_letter = True
            elif char == ')':
                result.append(char)
                first_letter = True
            elif char == '.':
                result.append(char)
                first_letter = True
            elif char == ' ':
                first_letter = True
            else:
                if first_letter:
                    result.append(char)
                    first_letter = False
                else:
                    # still inside a word, so extend the previous token
                    result[-1] = result[-1] + char

        return result

Again, it’s not that bad, given how few cases there are here (I’ll have to find a longer interpreter to give a properly horrifying example). In fact, taking into account the chunkWord gymnastics I had to do in Haskell to satisfy the type checker (though that might just be due to my ignorance of Haskell), it might even look nicer to some (… aside from the first_letter gymnastics required here because you can’t easily skip ahead characters in a Python for loop… a topic for another post). However, even here the specific case pattern looks annoyingly verbose and repetitive, at least to my eyes. An elif takes up at least two lines, and with many, many elif statements they start to blur together. Alternatively, if you opt for a dictionary to handle each case, the elif blocks turn into spaghetti code, with the instructions for each case separated from the dispatch itself, especially if you have to define the operations as functions separate from the dictionary.
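Here’s a hypothetical sketch of that dictionary approach (not code from my interpreter): the handlers end up as separate functions sharing mutable state, defined far away from the loop that actually dispatches on them.

# A sketch of dictionary dispatch, not code from my interpreter:
# each special character maps to a handler function, so the per-case
# logic lives far from the loop that dispatches on it.
def _handle_bracket(char, state):
    state['result'].append(char)
    state['first_letter'] = True

def _handle_dot(char, state):
    state['result'].append(char)
    state['first_letter'] = True

def _handle_space(char, state):
    state['first_letter'] = True

HANDLERS = {
    '(': _handle_bracket,
    ')': _handle_bracket,
    '.': _handle_dot,
    ' ': _handle_space,
}

def tokenize(string):
    state = {'result': [], 'first_letter': True}
    for char in string:
        handler = HANDLERS.get(char)
        if handler is not None:
            handler(char, state)
        elif state['first_letter']:
            state['result'].append(char)
            state['first_letter'] = False
        else:
            # still inside a word, so extend the previous token
            state['result'][-1] += char
    return state['result']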

However, this type of ugliness does come with an advantage when trying to learn a concept. The Haskell version actually has a lot of code repetition, which pattern-matching makes tolerable and even pleasant. Being forced to think more cleverly about how to express one’s ideas succinctly, to counter that ugliness, can give one deeper insight into what’s actually going on. For example, the code below acknowledges the repetition more explicitly and handles the for-loop skipping-ahead challenge in a cleaner way (to my eyes), by accumulating characters while the current character is not one of the special delimiters and flushing the accumulated word when it reaches one:

class Lexer(object):
    @classmethod
    def tokenize(cls, string):
        result = []
        temp = ''  # accumulates the characters of the current word

        for char in string:
            if char in ['(', ')', '.', ' ']:
                # hitting a delimiter flushes any word in progress
                if temp != '':
                    result.append(temp)
                    temp = ''
                # spaces are dropped; the other delimiters are tokens
                if char != ' ':
                    result.append(char)
            else:
                temp = temp + char

        # flush a word left over at the end of the string
        if temp != '':
            result.append(temp)

        return result
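As a sanity check, this version produces the same tokens as the Haskell lexer on the same input (again writing λ as a backslash):

print(Lexer.tokenize('(\\x.x y)'))
# ['(', '\\x', '.', 'x', 'y', ')']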

Even though this can probably also be improved upon, being forced to think harder and attack the problem with multiple paradigms was helpful. The fact that Python is not great at handling many cases in a neat, clean way was actually beneficial to my understanding here. Given this experience, I think it might be helpful in the future to solve hard programming problems in languages that don’t have good support for the paradigms those problems are most naturally expressed in.
