Write a Compiler

Upcoming Course Dates:

Course Hours:

Cost:

  • $1500

Registration (no payment required now):

Instructor: David Beazley

Other Courses | FAQ


Overview

Shatter your brain by writing a compiler for a new programming language! Why? Because it's cool and you'll learn a lot.

Compilers is often considered a capstone course for computer science majors. There is a good reason for that--compilers are tools that programmers use daily regardless of the application domain. Moreover, a compiler touches nearly every topic of computer science ranging from the theoretical to the practical. Learning more about how they work makes you a better programmer. The process of writing a compiler is an exercise in managing software complexity. Compilers have a lot of moving parts, involve unusual tools, are difficult to test, and are challenging to debug. You'll learn a lot by writing a compiler. Plus, you'll be able to brag about it later.

Target Audience

This class is for more experienced programmers who'd like to take on a good challenge. Not many programmers actually get the opportunity to write a compiler unless they happen to take such a course as a CS undergraduate or they enroll in graduate school. Thus, if you're self-taught and curious about what writing a compiler is all about, this is the course for you.


"If you've seen any of David Beazley's fun and mind blowing talks, you can begin to imagine his week-long, intensive courses, held in his own Chicago office -- or "lair", as he calls it. It's even better than you think: starting with breakfast every day, David is a gracious host, a daredevil coder, and an inspiring teacher. I took his Compiler Course and I hope to attend more of his unique, hands-on workshops."

Luciano Ramalho, author of Fluent Python

Instruction Format

This course is entirely project focused. The goal of the course is not just to learn how to write a compiler, but also how to approach the problem of writing a compiler. Part of the course involves group discussion about problem decomposition, coding techniques, design tradeoffs, testing, and other related topics. The rest of the time is spent working on individual coding.

Coding examples and the instructor solution are written in Python. However, the project involves no third-party libraries, esoteric Python-specific features, or Python-dependent tooling. As such, you're free to implement the project in any programming language that you wish. Writing a compiler can also be a challenging way to learn a new programming language should you be inclined to tackle the course on "hard mode."

Prerequisites

You might not think that you're ready to write a compiler, but if you've been coding for awhile and know the basics of data structures, it's something that you could probably tackle. No prior background in compilers is required although awareness of common programming language concepts (e.g., types, functions, classes, scoping rules, etc.) is strongly advised. Some knowledge of text manipulation, computer architecture (machine instructions, memory, etc.), and prior use of a compiled language is also recommended. Robert Nystrom's Crafting Interpreters book is also an excellent source of background information.

Syllabus

The course is structured around the goal of creating a small programming language called Wabbit. Wabbit is a small, statically typed, imperative language. You'll write a compiler that can take Wabbit code and compile it to a native executable program via LLVM. Recent versions of the course have also optionally targeted WebAssembly. The code produced by your compiler will have performance comparable to programs written in C.

The project involves the following core problems:

  1. Data model. You need to figure out some way to represent a computer program, not as text, but as a proper data structure. Sometimes this is called an Abstract Syntax Tree (AST), but it's not necessarily directly tied to parsing.

  2. Parsing. You need to parse programs by converting them from text to the data model. This involves tokenizing text and understanding grammars. We'll look a bit at how parsing algorithms work and you will write a recursive descent parser from scratch.

  3. Program Transformation. Compilers often perform steps that transform a program into an equivalent program. For example, many advanced programming language features can be implemented using simpler features that already exist. Similarly, many compiler optimization techniques are based on program transforms.

  4. Type checking. You'll write a simple static program analyzer that checks the source code for type errors and other semantic problems. Time permitting, we may also discuss a few advanced topics such as Algebraic Type Systems.
  5. Code generation. You'll have your compiler generate code for LLVM and/or WebAssembly. Once you have this, you'll have programs that execute at native speed comparable to C programs. Time permitting, we'll discuss other possible code generation targets such as byte-code interpreters and virtual machines.

  6. Native code generation. Tools such as LLVM still hide a lot of low-level details. We'll conclude by talking about some lower-level issues such as register allocation, activation frames, function calls, linkers, and other matters.

It's important to note that a major goal of the course is to build a stronger intuition for how all of the parts of a compiler actually work. Although there are a lot of existing frameworks and tools that can be used to assist in the creation of a "production grade" compiler, you will be creating a compiler from scratch from first principles.

Practical Takeaways

Although you are unlikely to write a compiler in your day-to-day work, this course touches on a wide variety of practical topics that are applicable elsewhere. These include:

Are You Nuts?

Writing a compiler in only 5 days? Is it even possible? To be sure, compilers is often regarded as one of the most difficult CS courses that one can take. If you take it at a University, you might get a professor who will drag you through the infamous Dragon Book, spend a lot of time doing mathematical proofs (e.g., deriving the LALR(1) parsing algorithm), and make the focus of the course on preparing graduate students for future research in programming languages. That can be a great class. I have taught that class. This is NOT that class.

Instead, this is a compilers course aimed at practioners. As such, the main focus is on coding and software development. Yes, you will learn about some important core concepts that underpin compilers. However, instead of doing mathematical proofs involving parsing theory, we'll focus on how you would actually go about implementing a parser and doing things such as writing unit tests for it.

To be sure, you will write a lot of code in this course. The course runs for more than 40 hours over five days. The final completed project consists of approximately 2000-3000 lines of code and is every bit as involved as the project you typically find in a college-level compilers course for computer science majors. Your final solution will probably have just as many bugs.

About the Instructor

This class is led by David Beazley. Although most known for his work in the Python community, Dave was formerly a tenure-track assistant professor in the Department of Computer Science at the University of Chicago where he taught a Compilers course along with a variety of other topics in systems and programming languages. In 2023 and 2024, he taught the Programming Language Design and Implementation course with Shriram Krishnamurthi at Brown University. Dave is also the creator of the PLY and SLY parsing tools for Python. He recently gave a PyCon talk about these tools. In a previous decade, he also created Swig, a C/C++ compiler for building scripting language extension modules. Mostly though, Dave just thinks that writing a compiler is cool.


Copyright (C) 2005-2024, David Beazley