👨🔬Alpha #8: Let bindings and recovery from HIR refactoring
This week Alpha got let bindings. I also fixed things that broke after the HIR refactoring: built-in functions, custom constructors, and one extra bug.
Let bindings
Alpha finally got let bindings, a long-procrastinated feature. I showed the syntax in the first post but didn’t implement it because it seemed too trivial. Now I have.
The syntax is as follows:
# global bindings
let x = 42
print(x)
{
# local bindings
let world = "world"
print("hello, ")
print(world)
print("!\n")
}
Note that let-bindings are constant and cannot be mutated, although they can be shadowed:
let x = 42
let x = x + 8
print(x) # 50
I’ll add mutation later with atoms.
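To make the shadowing rule concrete, here is a minimal Rust sketch of an environment where `let` always creates a binding in the innermost scope and lookups walk from the innermost scope outwards. This is not how Alpha implements it (Alpha compiles through HIR to LLVM); `Env`, `bind`, `lookup`, and `shadowing_demo` are hypothetical names made up for this illustration.

```rust
use std::collections::HashMap;

/// Hypothetical environment: a stack of scopes, innermost scope last.
struct Env {
    scopes: Vec<HashMap<String, i64>>,
}

impl Env {
    fn new() -> Self {
        // Start with a single global scope.
        Env { scopes: vec![HashMap::new()] }
    }

    fn push_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    fn pop_scope(&mut self) {
        // Never pop the global scope.
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    /// `let name = value`: binds the name in the innermost scope.
    /// A previous binding of the same name in that scope is replaced;
    /// bindings in enclosing scopes are left untouched.
    fn bind(&mut self, name: &str, value: i64) {
        self.scopes
            .last_mut()
            .expect("the global scope always exists")
            .insert(name.to_string(), value);
    }

    /// Looks the name up from the innermost scope outwards.
    fn lookup(&self, name: &str) -> Option<i64> {
        self.scopes.iter().rev().find_map(|s| s.get(name).copied())
    }
}

/// Mirrors `let x = 42; let x = x + 8`, then shadows `x` in a nested block.
/// Returns (x after rebinding, x inside the block, x after the block).
fn shadowing_demo() -> (i64, i64, i64) {
    let mut env = Env::new();
    env.bind("x", 42);
    let old = env.lookup("x").unwrap();
    env.bind("x", old + 8);
    let outer = env.lookup("x").unwrap();

    env.push_scope();
    env.bind("x", 1);
    let inner = env.lookup("x").unwrap();
    env.pop_scope();

    let after = env.lookup("x").unwrap();
    (outer, inner, after)
}
```

There is no operation that assigns to an existing binding, only `bind`, which is the point: a shadowed name in an enclosing scope becomes visible again once the inner scope ends.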
Built-in functions
Alpha got syntax for annotations, although currently they can only be used for one purpose: to refer to built-in functions.
It looks as follows (this is Alpha’s entire standard library 😅):
@intrinsic("type_of")
fn type_of(x: Any): DataType
@intrinsic("f64_mul")
fn *(x: f64, y: f64): f64
@intrinsic("i64_mul")
fn *(x: i64, y: i64): i64
@intrinsic("print_any")
fn print(x: Any): Void
@intrinsic("print_datatype")
fn print(x: DataType): Void
@intrinsic("print_void")
fn print(x: Void): Void
@intrinsic("print_i64")
fn print(x: i64): Void
@intrinsic("print_f64")
fn print(x: f64): Void
@intrinsic("print_string")
fn print(x: String): Void
fn println(x) = { print(x); print("\n") }
Custom constructors
I fixed a regression in user-defined constructors.
How methods work (a quick recap):
A Method is a signature + implementation:
#[repr(C)]
pub struct Method {
    /// Values can be either types (type=DataType) or a Type{T} (type=Type).
    ///
    /// If parameter specifier is a type, any subtype is accepted.
    ///
    /// If parameter specifier is a Type{T}, only type value T is accepted.
    pub signature: *const SVec,
    // compiled instance of the method
    pub instance: GenericFn,
}
Methods are attached to DataType:
#[repr(C)]
pub struct DataType {
    ...
    /// SVec<Method>
    pub methods: *const SVec,
    ...
}
Every function is compiled into a new DataType plus an instance of that type. The method is attached to that DataType:
fn hello() = "hello"
# compiles to:
type hello_t = {}
let hello = hello_t()
add_method(hello_t, Method(
    # signature
    [Any],
    # instance
    (this: Any) => "hello"
));
Constructors are simply methods attached to DataType (because type_of(*any-type*) == DataType).
For method dispatch to find the appropriate method, constructors use a special Type{T} parameter specifier that only matches the value T.
Default constructors are defined automatically by type declarations:
type Rect = { width, height }
# Rect(Any, Any) is default constructor
fn Rect(width: Any, height: Any) = Rect{ width, height }
# compiles to:
add_method(
    DataType,
    Method(
        # signature
        [Type{Rect}, Any, Any],
        # instance
        (this: DataType, width: Any, height: Any) => Rect{ width, height }
    )
)
Custom constructors are defined as follows:
fn Rect(width) = Rect(width, width)
Previously, the compiler did not detect that the user’s Rect must be defined as a constructor and used Any as the parameter specifier. This led to the following bug:
type X = {x}
type Y = {y}
fn X() = X(0)
fn Y() = Y(0)
X() # this called Y()
This called Y(0) instead of X(0) because both X() and Y() compiled to the same [Any] signature and both were attached to the DataType type (type_of(X) == type_of(Y) == DataType).
Now the compiler detects this and uses [Type{X}] and [Type{Y}] signatures.
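The difference between the two kinds of parameter specifier can be sketched as a toy dispatcher in Rust. This is a simplified model, not Alpha's actual code: `Ty`, `Spec`, `Value`, and `dispatch` are hypothetical, the "instance" is reduced to a string, and the rule that the most recently added matching method wins is an assumption chosen to reproduce the bug described above.

```rust
/// Hypothetical stand-in for Alpha's type values.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty { Any, DataType, I64, X, Y }

/// A parameter specifier in a method signature.
enum Spec {
    /// A plain type: accepts any value whose type is a subtype of it.
    Subtype(Ty),
    /// Type{T}: accepts only the type value T itself.
    Exact(Ty),
}

/// Runtime values: an integer, or a type used as a value (e.g. `X` in `X()`).
enum Value { Int(i64), Type(Ty) }

fn type_of(v: &Value) -> Ty {
    match v {
        Value::Int(_) => Ty::I64,
        Value::Type(_) => Ty::DataType, // type_of(*any-type*) == DataType
    }
}

/// Toy subtyping: everything is a subtype of Any, otherwise types must match.
fn is_subtype(t: Ty, of: Ty) -> bool {
    of == Ty::Any || t == of
}

struct Method {
    signature: Vec<Spec>,
    body: &'static str, // stands in for the compiled instance
}

fn accepts(spec: &Spec, arg: &Value) -> bool {
    match (spec, arg) {
        (Spec::Exact(t), Value::Type(v)) => t == v,
        (Spec::Exact(_), _) => false,
        (Spec::Subtype(t), _) => is_subtype(type_of(arg), *t),
    }
}

/// Assumption: the most recently added matching method wins.
fn dispatch(methods: &[Method], args: &[Value]) -> Option<&'static str> {
    methods
        .iter()
        .rev()
        .find(|m| {
            m.signature.len() == args.len()
                && m.signature.iter().zip(args).all(|(s, a)| accepts(s, a))
        })
        .map(|m| m.body)
}

/// Methods on DataType as the buggy compiler produced them: both use [Any].
fn buggy_methods() -> Vec<Method> {
    vec![
        Method { signature: vec![Spec::Subtype(Ty::Any)], body: "X(0)" },
        Method { signature: vec![Spec::Subtype(Ty::Any)], body: "Y(0)" },
    ]
}

/// The same methods with [Type{X}] and [Type{Y}] signatures.
fn fixed_methods() -> Vec<Method> {
    vec![
        Method { signature: vec![Spec::Exact(Ty::X)], body: "X(0)" },
        Method { signature: vec![Spec::Exact(Ty::Y)], body: "Y(0)" },
    ]
}
```

Calling `dispatch(&buggy_methods(), &[Value::Type(Ty::X)])` models `X()`: the type value X is passed as `this`, both [Any] signatures accept it, and the Y method is picked. With `fixed_methods()` only the Type{X} signature accepts the value X.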
An extra bug
I also found the following bug:
type Point = { x, y }
Point(2, 3).x # this worked
Point(2, 3).y # and this crashed
Accessing the non-first field of any type crashed the program. It was trying to read from protected memory, which is a sign of a GC-related bug.
I quickly pinpointed that the bug was caused by irgen incorrectly constructing the DataType for new types. DataType contains a list of offsets of the pointers inside the structure so that the GC can find and update them. For the Point type above, the offsets should be [0, 8] (pointing to the x and y fields), but irgen set them to [0, 16].
I narrowed this down further to the following line in LLVM IR:
store i64 ptrtoint ({ %Any*, %Any* }* getelementptr ({ %Any*, %Any* }, { %Any*, %Any* }* null, i64 1) to i64), i64* %g80.in.pointers.1, align 4
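This line was supposed to calculate the offset of the second field in the struct, but instead it calculates the size of the struct. This happens because the first index of getelementptr indexes through the pointer as if it pointed into an array, so a single index of 1 steps over one whole struct instead of selecting its second field.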
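The same distinction can be reproduced in plain Rust pointer arithmetic, without LLVM. The sketch below is only an analogy: `Pair` is a hypothetical stand-in for the `{ %Any*, %Any* }` struct, stepping a `*const Pair` by one element plays the role of a single GEP index of 1, and taking the address of the `y` field plays the role of the index pair 0, 1.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of;

/// Hypothetical stand-in for `{ %Any*, %Any* }`: two pointer-sized fields.
#[repr(C)]
struct Pair {
    x: *const u8,
    y: *const u8,
}

/// Analogue of a single index of 1: treat the pointer as an array of Pair
/// and step to the next element. The distance is the size of the struct.
fn offset_with_single_index() -> usize {
    let storage = MaybeUninit::<[Pair; 2]>::uninit();
    let first = storage.as_ptr() as *const Pair;
    // Stays inside the two-element array, so the pointer arithmetic is valid.
    let second = unsafe { first.add(1) };
    second as usize - first as usize
}

/// Analogue of the index pair 0, 1: stay on the first struct and take the
/// address of its second field. The distance is the field offset.
fn offset_of_second_field() -> usize {
    let storage = MaybeUninit::<Pair>::uninit();
    let base = storage.as_ptr();
    // addr_of! computes the field address without reading uninitialized memory.
    let field = unsafe { addr_of!((*base).y) };
    field as usize - base as usize
}
```

On a target with 8-byte pointers the first function gives 16 and the second gives 8, which matches the wrong [0, 16] and the expected [0, 8] pointer offsets.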
The following Rust statement was to blame:
let ptr_offset = llvm_ty
.pointer_type(AddressSpace::Generic)
.const_null()
.const_gep(
llvm_ty,
&[
self.context.int_type(64).const_int(i as u64, false),
],
)
.const_ptr_to_int(self.context.int_type(64));
I quickly added an extra index:
.const_gep(
llvm_ty,
&[
self.context.int_type(64).const_int(0, false), // this line added
self.context.int_type(64).const_int(i as u64, false),
],
)
…which should have produced the following LLVM IR:
store i64 ptrtoint (%Any** getelementptr ({ %Any*, %Any* }, { %Any*, %Any* }* null, i64 0, i64 1) to i64), i64* %g80.in.pointers.1, align 4
…aaand, it crashed with a segfault deep inside LLVM.
This was when I noticed that I was using a deprecated function, LLVMConstGEP. I tried to switch to LLVMConstGEP2 and got a link error: LLVMConstGEP2 couldn’t be found.
After 20 minutes of playing with link flags and inspecting LLVM libraries, I found that LLVMConstGEP2 just isn’t implemented. Yep, the new function was declared, the old one deprecated, but the new one was never defined. Even after two years. (And this is one of the reasons I prefer any modern language to C++.)
The deprecation was a dead end. I reverted to LLVMConstGEP, and while browsing GEP’s FAQ page (getelementptr is the only instruction that has its own FAQ page), I noticed “Why do struct member indices always use i32?”
That was it! You can only index into structures with i32 integers, and I was using i64, which LLVM gladly pointed out by segfaulting in my face 😋
I changed 64 to 32 and it worked:
self.context.int_type(32).const_int(i as u64, false),
Next steps
I want to take a couple of weeks off from the project to rest, play with my other projects, and re-evaluate my focus. I’ll post the next update in three to four weeks. See ya 😊