👨‍🔬Alpha #7: HIR, High-level Intermediate Representation

Well, I have decided to go without continuations. 😅 And after three weeks of struggling, I’m finally done with the transition!

Why? The following slide from the “Compiling with Continuations or without? Whatever.” presentation clicked for me.

It says that a direct-style IR is better suited for the early stages of compilation, and CPS for the later ones. I then remembered that even in the “Compiling with Continuations” book, the CPS is constructed from an already-simplified version of ML, with types resolved and operations lowered.

And that’s precisely why I started adding an IR: I needed a place to expand syntax sugar and to lower types and methods.

Swift Intermediate Language (SIL)

I also watched the Swift Intermediate Language (SIL) presentation. I liked it. SIL is similar to LLVM IR but is simpler and preserves Swift-specific information. This allows performing Swift-specific analyses on SIL itself. (Clang generates a side CFG that mirrors LLVM IR in many ways and implements its own analysis. As you might imagine, that’s a lot of duplication.)

I took a couple of lessons from SIL:

  • Rich type system. SIL preserves most of Swift’s type system, which allows performing more optimizations, resolving some dynamic methods at compile time, etc. I’ll need this to implement method specialization and static method dispatch.

  • Requiring every expression to be assigned to a variable. This provides a place to attach information: source location, types. SIL goes even further than LLVM IR here, requiring constants to be assigned to variables as well. (This removes the Value/Constant divide from LLVM IR.)

  • It’s more heterogeneous than paper IRs. IRs in papers are only concerned with expressions—the whole compilation unit is an expression. SIL (like LLVM IR) has a notion of a module that has a name, type declarations, global variables and functions, interfaces, and whatnot. I have previously tried to cram Alpha into the neat uniform world of expressions—it’s possible, but having a Module type and top-level declarations is easier.

  • Basic blocks with parameters. Instead of using phi nodes, a SIL basic block can accept parameters, and the branch instruction becomes “jump with arguments.” This is easier to understand and reason about. (And basic blocks with parameters are continuations!) That said, basic blocks are a feature I won’t use in Alpha because I don’t use a CFG (control flow graph) representation.
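
To make the “jump with arguments” idea concrete, here is a minimal sketch in Rust. It is neither SIL nor Cranelift code: the `Expr`, `Jump`, `Terminator`, and `Block` types and the `run` interpreter are all hypothetical names made up for illustration. Each block declares how many parameters it takes, and every jump supplies the values for them, so no phi nodes are needed.

```rust
use std::collections::HashMap;

/// Expressions may only refer to constants and to the parameters of the
/// enclosing block (by position). Hypothetical toy IR, not SIL.
enum Expr {
    Const(i64),
    Param(usize),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
}

/// A jump target together with the arguments passed to its parameters.
struct Jump {
    target: &'static str,
    args: Vec<Expr>,
}

enum Terminator {
    Jump(Jump),
    /// Go to `zero` if `cond` evaluates to 0, otherwise to `nonzero`.
    BranchIfZero { cond: Expr, zero: Jump, nonzero: Jump },
    Return(Expr),
}

struct Block {
    param_count: usize,
    terminator: Terminator,
}

fn eval(expr: &Expr, params: &[i64]) -> i64 {
    match expr {
        Expr::Const(c) => *c,
        Expr::Param(i) => params[*i],
        Expr::Add(a, b) => eval(a, params) + eval(b, params),
        Expr::Sub(a, b) => eval(a, params) - eval(b, params),
    }
}

/// Runs the blocks starting at `entry`, passing `args` as its parameters.
fn run(blocks: &HashMap<&'static str, Block>, entry: &'static str, args: Vec<i64>) -> i64 {
    let mut current = entry;
    let mut params = args;
    loop {
        let block = &blocks[current];
        assert_eq!(block.param_count, params.len(), "arity mismatch in block {}", current);
        let jump = match &block.terminator {
            Terminator::Return(e) => return eval(e, &params),
            Terminator::Jump(j) => j,
            Terminator::BranchIfZero { cond, zero, nonzero } => {
                if eval(cond, &params) == 0 { zero } else { nonzero }
            }
        };
        // Arguments are evaluated in the current block, then become the
        // parameters of the target block.
        let next: Vec<i64> = jump.args.iter().map(|e| eval(e, &params)).collect();
        current = jump.target;
        params = next;
    }
}

/// Builds a program that sums the integers from 1 to n.
fn sum_to_program() -> HashMap<&'static str, Block> {
    use Expr::*;
    let mut blocks = HashMap::new();
    // entry(n): jump loop(n, 0)
    blocks.insert("entry", Block {
        param_count: 1,
        terminator: Terminator::Jump(Jump { target: "loop", args: vec![Param(0), Const(0)] }),
    });
    // loop(i, acc): if i == 0 jump done(acc) else jump loop(i - 1, acc + i)
    blocks.insert("loop", Block {
        param_count: 2,
        terminator: Terminator::BranchIfZero {
            cond: Param(0),
            zero: Jump { target: "done", args: vec![Param(1)] },
            nonzero: Jump {
                target: "loop",
                args: vec![
                    Sub(Box::new(Param(0)), Box::new(Const(1))),
                    Add(Box::new(Param(1)), Box::new(Param(0))),
                ],
            },
        },
    });
    // done(result): return result
    blocks.insert("done", Block { param_count: 1, terminator: Terminator::Return(Param(0)) });
    blocks
}
```

Calling `run(&sum_to_program(), "entry", vec![4])` walks `entry → loop → … → done`. With phi nodes, `loop` would instead need one phi per loop variable listing the value coming from each predecessor; here the same information lives at the jump sites.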


Another interesting project I found is Cranelift. It’s another compiler framework (like LLVM) but designed with JIT compilation in mind—it is biased towards faster compilation times but does less optimization. This is usually a good trade-off for a JIT’ted language.

Cranelift is written in Rust, which is a nice bonus given that Alpha is written in Rust as well. Cranelift IR is similar to LLVM IR, although simpler. (And it also uses basic blocks with parameters!)

It might be a good target to consider in the future, either as a second target alongside LLVM or as a replacement. But not now.

Current status

The new IR is called HIR (High-level Intermediate Representation). It is defined in src/hir/hir.rs.

The old compiler has been replaced with AST→HIR and HIR→LLVM IR translations. The code is much simpler now, and there is less duplication. I like it. You can check it out in rasendubi/alpha#1 (“refactor: use HIR”).
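
To give a feel for what an AST→HIR step in this style can look like, here is a minimal sketch. It is not the code from src/hir/hir.rs: the `Ast`, `VarId`, `Rhs`, `Let`, `Function`, `Module`, and `Lowerer` names are hypothetical. It only illustrates two of the SIL lessons above: every intermediate value (constants included) is bound to a variable, and the result lives in a module with top-level declarations rather than in one big expression.

```rust
/// A tiny surface AST (hypothetical; not Alpha's actual AST).
enum Ast {
    Int(i64),
    Var(String),
    Call { func: String, args: Vec<Ast> },
}

/// Name of a user variable or a compiler-generated temporary.
#[derive(Debug, Clone, PartialEq)]
struct VarId(String);

/// Right-hand sides are flat: call operands are always variables.
#[derive(Debug, PartialEq)]
enum Rhs {
    Const(i64),
    Call { func: String, args: Vec<VarId> },
}

/// One binding. In a fuller IR this is the place to attach a source
/// location and a type.
#[derive(Debug, PartialEq)]
struct Let {
    var: VarId,
    value: Rhs,
}

struct Function {
    name: String,
    body: Vec<Let>,
    result: VarId,
}

/// Top-level container instead of "the whole program is an expression".
struct Module {
    name: String,
    functions: Vec<Function>,
}

struct Lowerer {
    next_tmp: usize,
    lets: Vec<Let>,
}

impl Lowerer {
    fn fresh(&mut self) -> VarId {
        let var = VarId(format!("t{}", self.next_tmp));
        self.next_tmp += 1;
        var
    }

    /// Lowers `ast`, appending bindings, and returns the variable that
    /// holds its value.
    fn lower(&mut self, ast: &Ast) -> VarId {
        match ast {
            // A variable already has a name, so no new binding is needed.
            Ast::Var(name) => VarId(name.clone()),
            Ast::Int(n) => {
                let var = self.fresh();
                self.lets.push(Let { var: var.clone(), value: Rhs::Const(*n) });
                var
            }
            Ast::Call { func, args } => {
                // Operands are lowered first, left to right.
                let args: Vec<VarId> = args.iter().map(|a| self.lower(a)).collect();
                let var = self.fresh();
                self.lets.push(Let {
                    var: var.clone(),
                    value: Rhs::Call { func: func.clone(), args },
                });
                var
            }
        }
    }
}

fn lower_function(name: &str, body: &Ast) -> Function {
    let mut lowerer = Lowerer { next_tmp: 0, lets: Vec::new() };
    let result = lowerer.lower(body);
    Function { name: name.to_string(), body: lowerer.lets, result }
}

fn lower_module(name: &str, defs: &[(&str, Ast)]) -> Module {
    Module {
        name: name.to_string(),
        functions: defs.iter().map(|(n, body)| lower_function(n, body)).collect(),
    }
}
```

With this sketch, `x * (y + 1)` lowers to three bindings: `t0 = 1`, `t1 = +(y, t0)`, `t2 = *(x, t1)`, with `t2` as the function result.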

Everything works except built-in functions (print, multiplication, etc.)—that’s why CI is failing.

Previously, ExecutionSession manipulated Alpha objects to define datatypes and attach methods. This required quite a few actions for every built-in function added—and that’s why Alpha had very few of them (print, *, and type_of were the only built-in functions).

Now I’m thinking of adding Alpha syntax to reference Rust functions. This way, I can write more of the standard library in Alpha itself. I’ll focus on that this week.
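
One possible shape for the Rust side of this, purely as a sketch: a table that maps a name to a Rust function pointer, so that a declaration written in Alpha could refer to a Rust function by name. Nothing here is decided or implemented; the `Builtin` signature, the `Builtins` struct, and the use of plain `i64` values are assumptions made for illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical signature shared by all Rust functions exposed to Alpha.
/// Real Alpha values are not plain integers; `i64` keeps the sketch small.
type Builtin = fn(&[i64]) -> Result<i64, String>;

fn builtin_mul(args: &[i64]) -> Result<i64, String> {
    match args {
        [a, b] => Ok(a * b),
        _ => Err(format!("* expects 2 arguments, got {}", args.len())),
    }
}

/// Registry the compiler could consult when Alpha code names a Rust function.
struct Builtins {
    table: HashMap<&'static str, Builtin>,
}

impl Builtins {
    fn with_defaults() -> Self {
        let mut builtins = Builtins { table: HashMap::new() };
        builtins.register("*", builtin_mul);
        builtins
    }

    /// Adding a built-in is a single call: a name plus a function pointer.
    fn register(&mut self, name: &'static str, f: Builtin) {
        self.table.insert(name, f);
    }

    fn call(&self, name: &str, args: &[i64]) -> Result<i64, String> {
        match self.table.get(name) {
            Some(f) => f(args),
            None => Err(format!("unknown built-in: {}", name)),
        }
    }
}
```

The appeal of a table like this is that each new built-in costs one `register` call on the Rust side, while the datatype and method definitions that use it could be written in Alpha.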