👨‍🔬Alpha #8: Let bindings and recovery from HIR refactoring

This week Alpha got let bindings. I also fixed things that broke after the HIR refactoring: built-in functions, custom constructors, and one extra bug.

Let bindings

Alpha finally got let bindings, a long-procrastinated feature. I showed the syntax in the first post but never implemented it because it seemed too trivial. Now I have.

The syntax is as follows:

# global bindings
let x = 42

  # local bindings
  let world = "world"
  print("hello, ")

Note that let bindings are constant and cannot be mutated, although they can be shadowed:

let x = 42
let x = x + 8
print(x) # 50

I’ll add mutation later with atoms.
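To make the difference between shadowing and mutation concrete, here is a minimal Rust sketch of a lexical environment. It is a hypothetical model, not Alpha's actual implementation: `Env`, `bind`, `lookup`, and `shadowing_demo` are names made up for illustration. Each `let` pushes a new binding and never overwrites an old one, and lookup picks the newest match.

```rust
/// A minimal lexical environment: a stack of (name, value) bindings.
/// Hypothetical model for illustration, not Alpha's real data structure.
#[derive(Default)]
struct Env {
    bindings: Vec<(String, i64)>,
}

impl Env {
    /// Models `let name = value`: pushes a new binding, never overwrites.
    fn bind(&mut self, name: &str, value: i64) {
        self.bindings.push((name.to_string(), value));
    }

    /// Searches from the newest binding to the oldest,
    /// so a later `let` shadows an earlier one with the same name.
    fn lookup(&self, name: &str) -> Option<i64> {
        self.bindings
            .iter()
            .rev()
            .find(|(n, _)| n.as_str() == name)
            .map(|(_, v)| *v)
    }
}

/// Replays `let x = 42` followed by `let x = x + 8` and returns the final `x`.
fn shadowing_demo() -> i64 {
    let mut env = Env::default();
    env.bind("x", 42);
    let old_x = env.lookup("x").expect("x is bound");
    env.bind("x", old_x + 8);
    env.lookup("x").expect("x is bound")
}
```

In this model the first binding of `x` is still stored after the second `let`; it is just no longer reachable by name.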

Built-in functions

Alpha got syntax for annotations, although currently they can only be used for one purpose: to refer to built-in functions.

It looks as follows (this is Alpha’s entire standard library 😅):

fn type_of(x: Any): DataType

fn *(x: f64, y: f64): f64
fn *(x: i64, y: i64): i64

fn print(x: Any): Void
fn print(x: DataType): Void
fn print(x: Void): Void
fn print(x: i64): Void
fn print(x: f64): Void
fn print(x: String): Void

fn println(x) = { print(x); print("\n") }

Custom constructors

I fixed a regression in user-defined constructors.

How methods work (a quick recap):

  • A Method is a signature + implementation:

    pub struct Method {
        /// Values can be either types (type=DataType) or a Type{T} (type=Type).
        /// If parameter specifier is a type, any subtype is accepted.
        /// If parameter specifier is a Type{T}, only type value T is accepted.
        pub signature: *const SVec,
        // compiled instance of the method
        pub instance: GenericFn,
    }
  • Methods are attached to DataType

    pub struct DataType {
        /// SVec<Method>
        pub methods: *const SVec,
        // ... (other fields omitted)
    }
  • Every function is compiled into a new DataType + an instance of that type. The method is attached to that DataType

    fn hello() = "hello"
    # compiles to:
    type hello_t = {}
    let hello = hello_t()
    add_method(hello_t, Method(
      # signature
      # instance
      (this: Any) => "hello"
  • Constructors are simply methods attached to DataType (because type_of(*any-type*) == DataType).

  • For method dispatch to find the appropriate method, constructors use a special Type{T} parameter specifier that only matches the value T.

  • Default constructors are defined automatically by type declarations.

    type Rect = { width, height }
    # Rect(Any, Any) is default constructor
    fn Rect(width: Any, height: Any) = Rect{ width, height }
    # compiles to:
        # signature
        [Type{Rect}, Any, Any],
        # instance
        (this: DataType, width: Any, height: Any) => Rect{ width, height }
  • Custom constructors are defined as follows:

    fn Rect(width) = Rect(width, width)

Previously, the compiler did not detect that the user’s Rect had to be defined as a constructor, so it used Any as the parameter specifier. This led to the following bug:

type X = {x}
type Y = {y}

fn X() = X(0)
fn Y() = Y(0)

X() # this called Y()

This called Y(0) instead of X(0) because both X() and Y() compiled to the same [Any] signature and both were attached to the DataType type (type_of(X) == type_of(Y) == DataType).

Now the compiler detects this and uses [Type{X}] and [Type{Y}] signatures.
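The bug and the fix can be reproduced in a small Rust model of the method table. This is a sketch under assumptions, not Alpha's real dispatcher: `Spec`, `MethodTable`, `before_fix`, and `after_fix` are hypothetical names, the compiled instance is reduced to a string label, and I assume that adding a method with an identical signature replaces the existing one.

```rust
/// A parameter specifier, as described in the recap above.
#[derive(Clone, Debug, PartialEq)]
enum Spec {
    /// Accepts any value.
    Any,
    /// Stands in for Type{T}: accepts only the type value named T.
    TypeValue(&'static str),
}

/// A method: a signature plus a label standing in for the compiled instance.
struct Method {
    signature: Vec<Spec>,
    instance: &'static str,
}

/// The method list attached to the DataType type, where constructors live.
#[derive(Default)]
struct MethodTable {
    methods: Vec<Method>,
}

impl MethodTable {
    /// Assumption: a method with an identical signature replaces the old one.
    fn add_method(&mut self, method: Method) {
        match self.methods.iter().position(|m| m.signature == method.signature) {
            Some(i) => self.methods[i] = method,
            None => self.methods.push(method),
        }
    }

    /// Returns the first method whose one-element signature accepts the type
    /// value passed as `this` (the only argument of a zero-argument constructor).
    fn dispatch(&self, this: &str) -> Option<&'static str> {
        self.methods
            .iter()
            .find(|m| match m.signature.as_slice() {
                [Spec::Any] => true,
                [Spec::TypeValue(t)] => *t == this,
                _ => false,
            })
            .map(|m| m.instance)
    }
}

/// Registers `fn X() = X(0)` and `fn Y() = Y(0)` using the given specifier rule.
fn constructors(spec_for: fn(&'static str) -> Spec) -> MethodTable {
    let mut table = MethodTable::default();
    table.add_method(Method { signature: vec![spec_for("X")], instance: "X(0)" });
    table.add_method(Method { signature: vec![spec_for("Y")], instance: "Y(0)" });
    table
}

/// Old behaviour: every constructor gets the [Any] signature.
fn before_fix() -> MethodTable {
    constructors(|_| Spec::Any)
}

/// New behaviour: each constructor gets [Type{T}] for its own type.
fn after_fix() -> MethodTable {
    constructors(Spec::TypeValue)
}
```

With `before_fix()`, the second registration collides with the first, so `dispatch("X")` yields the `Y(0)` instance. With `after_fix()`, the two signatures differ and each type value finds its own constructor.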

An extra bug

I also found the following bug:

type Point = { x, y }

Point(2, 3).x # this worked

Point(2, 3).y # and this crashed

Accessing any field other than the first crashed the program, whatever the type. It was trying to read from protected memory, which is often a sign of a GC-related bug.

I quickly pinpointed that the bug was caused by irgen incorrectly constructing the DataType for new types. DataType contains a list of pointer offsets inside the structure so that the GC can find and update the pointers. For the Point type above, the offsets should be [0, 8] (pointing to the x and y fields), but irgen set them to [0, 16].
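To see why [0, 16] is fatal for a two-field object, here is a rough Rust model of a GC walking the pointer fields of one object. It is only a sketch: `visit_pointers` and `WORD` are hypothetical names, the object is modelled as a slice of 64-bit words, and an out-of-range offset returns `None` where the real program read invalid memory.

```rust
/// Size of one field in bytes (assumes every field is a 64-bit pointer).
const WORD: usize = 8;

/// Reads the pointer fields of `object` at the given byte offsets, the way a
/// GC would walk them. Returns None when an offset falls outside the object,
/// which stands in for the invalid read in the real program.
fn visit_pointers(object: &[u64], offsets: &[usize]) -> Option<Vec<u64>> {
    offsets
        .iter()
        .map(|&offset| {
            if offset % WORD != 0 {
                return None; // not aligned to a field boundary
            }
            object.get(offset / WORD).copied()
        })
        .collect()
}
```

For a `Point` stored as two words, offsets `[0, 8]` visit both fields, while `[0, 16]` points one word past the end of the object.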

I narrowed this down further to the following line of LLVM IR:

store i64 ptrtoint ({ %Any*, %Any* }* getelementptr ({ %Any*, %Any* }, { %Any*, %Any* }* null, i64 1) to i64), i64* %g80.in.pointers.1, align 4

This line was supposed to calculate the offset of the second field in the struct, but instead it calculates the size of the struct. This happens because the first getelementptr index treats the pointer operand as if it pointed into an array, so index 1 lands one whole struct past the start.
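The arithmetic behind this can be sketched with a tiny Rust model of getelementptr on a pointer to a struct. It assumes a 64-bit target and a struct made only of pointer-sized fields (so there is no padding); `gep_offset` and `PTR_SIZE` are hypothetical names, and only one or two indices are modelled.

```rust
/// Byte size of one pointer-sized field (assumes a 64-bit target).
const PTR_SIZE: u64 = 8;

/// Byte offset produced by a GEP on a pointer to a struct that has
/// `field_count` pointer-sized fields.
/// The first index steps over whole structs; the second selects a field.
fn gep_offset(field_count: u64, indices: &[u64]) -> u64 {
    let struct_size = field_count * PTR_SIZE;
    match indices {
        [element] => *element * struct_size,
        [element, field] => {
            assert!(*field < field_count, "field index out of range");
            *element * struct_size + *field * PTR_SIZE
        }
        _ => panic!("only one or two indices are modelled in this sketch"),
    }
}
```

For the two-field struct above, `gep_offset(2, &[1])` gives 16 (the struct size), while `gep_offset(2, &[0, 1])` gives 8 (the offset of the second field).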

The following Rust statement was to blame:

let ptr_offset = llvm_ty
            self.context.int_type(64).const_int(i as u64, false),

I quickly added an extra index:

        self.context.int_type(64).const_int(0, false), // this line added
        self.context.int_type(64).const_int(i as u64, false),

…which should have produced the following LLVM IR:

store i64 ptrtoint (%Any** getelementptr ({ %Any*, %Any* }, { %Any*, %Any* }* null, i64 0, i64 1) to i64), i64* %g80.in.pointers.1, align 4

…aaand, it crashed with a segfault deep inside LLVM.

This was when I noticed that I was using a deprecated function, LLVMConstGEP. I tried to switch to LLVMConstGEP2 and got a link error: LLVMConstGEP2 couldn’t be found.

After 20 minutes of playing with link flags and inspecting LLVM libraries, I found that LLVMConstGEP2 just isn’t implemented. Yep, the new function was declared, the old one deprecated, but the new one was never defined. Even after two years. (And this is one of the reasons I prefer any modern language to C++.)

Chasing the deprecation was a dead end. I reverted to LLVMConstGEP, and while browsing GEP’s FAQ page (getelementptr is the only instruction that has its own FAQ page), I noticed “Why do struct member indices always use i32?”

That was it! You can only index into structures with i32 constants, and I was using i64, which LLVM gladly pointed out by segfaulting in my face 😋

I changed 64 to 32 and it worked:

self.context.int_type(32).const_int(i as u64, false),
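A defensive check before building the constant expression could turn this kind of segfault into a readable error. The sketch below is purely hypothetical (the function name and the error text are made up, and it does not call LLVM at all); it only encodes the rule that a struct member index must be an i32.

```rust
/// Checks the bit width of an index that selects a struct member in a GEP.
/// LLVM requires struct member indices to be i32 constants; indices into
/// pointers and arrays may have any integer width and are not checked here.
fn check_field_index_width(bits: u32) -> Result<(), String> {
    if bits == 32 {
        Ok(())
    } else {
        Err(format!("struct member index must be i32, got i{}", bits))
    }
}
```

Called with 64 it reports the mismatch; called with 32 it passes.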

Next steps

I want to take a couple of weeks off from the project to rest, play with my other projects, and re-evaluate my focus. I’ll post the next update in three to four weeks. See ya 😊