👨‍🔬Alpha #2: multi-methods, type hierarchy, and dot desugaring

November 14, 2021

This week was a productive one—Alpha got type hierarchy, multi-methods, and dot desugaring. I suppose it is Turing-complete now, though this doesn’t matter much.

Type hierarchy

Type hierarchy establishes the subtyping relationship between types. Mechanics of type hierarchy in Alpha are similar to Julia (as in “I copied the design exactly”).

There are abstract and data types. You cannot create values of abstract types, but they serve as a backbone for the type hierarchy. On top of the hierarchy is Any—every type is a subtype of Any.

type Any = abstract

type Number: Any = abstract

Concrete types are either primitives (integer or float) or structure types:

type i64: Number = integer(64)
type f64: Number = float(64)

type Point = { x, y }

Because types are first-class citizens, they are also represented in the hierarchy:

type Type: Any = abstract
type AbstractType: Type = {
  name: String,
  supertype: AbstractType,
  …
}
type DataType: Type = {
  name: String,
  supertype: AbstractType,
  size: i64,
  fields: …
}

Internally, every type declaration allocates a type value (either AbstractType or DataType) and arranges a global binding.

type Point = { x, y }
#=> compiles to
let Point = DataType {
  name: "Point",
  supertype: Any,
  size: 16,
  fields: ["x", "y"],
  …
}

Most traditional object-oriented languages implement single dispatch: if concrete types of arguments are not known at compile-time, the function implementation is selected dynamically based on the type of this. For example, in C++:

class Shape {
public:
  virtual void compute_area() const = 0;
};

class Rectangle: public Shape {
public:
  virtual void compute_area() const {
    std::count << "Rectangle::compute_area()\n";
  }
}

class Ellipse: public Shape {
public:
  virtual void compute_area() const {
    std::count << "Ellipse::compute_area()\n";
  }
}

int main(int argc, const char** argv) {
  // Cast to Shape* erases the specific type.
  Shape *r = new Rectangle();
  Shape *e = new Ellipse();

  r->compute_area();    // invokes Rectangle::compute_area
  e->compute_area();    // invokes Ellipse::compute_area
}

However, if you need to select implementation based on two or more arguments, this is not possible in C++:

void intersect(const Rectange *r, const Ellipse *e) {
  std::cout << "intersect(Rectangle, Ellipse)\n";
}

void intersect(const Rectange *r1, const Rectangle *r2) {
  std::cout << "intersect(Rectangle, Rectangle)\n";
}

void intersect(const Shape *s1, const Shape *s2) {
  std::cout << "intersect(Shape, Shape)\n";
}

int main(int argc, const char** argv) {
  Shape *r = new Rectangle();
  Shape *e = new Ellipse();

  // This always invokes intersect(const Shape *, const Shape *)
  intersect(r, e);
}

Multi-methods (aka multiple dispatch) is a strictly more powerful polymorphism technique that selects implementation based on multiple arguments.

All functions in Alpha are generic and use multiple dispatch:

type Shape = abstract
type Rectangle: Shape = {}
type Ellipse: Shape = {}

fn intersect(s1: Shape, s2: Shape) = 1
fn intersect(r: Rectangle, e: Ellipse) = 2
fn intersect(r: Rectangle, r: Rectangle) = 3

# Erase concrete types to show that dispatch is dynamic
fn eraser(s1: Shape, s2: Shape) = intersect(s1, s2)

print(eraser(Rectangle(), Ellipse()))
# this correctly invokes intersect(Rectangle, Ellipse) and prints 2

Multi-methods are harder to implement efficiently than single dispatch, though there are known techniques to make dispatch constant-time, and Julia goes to great lengths specializing function and resolving most dynamic dispatches at compile-time (possible because of JIT).

The current implementation in Alpha is primitive and slow, but it works. I can focus on other features now and will rewrite it later.

Dot desugaring

Because multi-methods treat all parameters equally, it doesn’t make sense to assign special meaning to the first one: intersect(Rectangle, Ellipse) does not belong to Rectangle any more than to Ellipse. Therefore, languages with multiple dispatch usually force you to define all functions outside of class definitions. This also allows extending functions from other modules—if the user of the library decides to add Triangle type, they can define intersect(Rectangle, Triangle) without having to modify Rectangle’s code.

The downside is that you can’t use the dot-call notation: r.intersect(e). While intersect(r, e) is equally powerful as r.intersect(e), the dot-call notation has better DX.

Given r. as input, an IDE can analyze the type of r and offer better suggestions, filtering out functions that are not applicable.

It also allows piping style of programming, whereas values flow left-to-right:

xs.filter((x) -> x != 0).map((x) -> x * x).print()

When you build up result interactively, this style is nice as you don’t have to jump back and forth to wrap arguments in parentheses and add function calls before your current value. Compare the previous code to:

print(map((x) -> x * x, filter((x) -> x != 0, xs))

This code is cumbersome to write and read.

Haskell uses dot for function composition which also allows for piping, although values flow right to left:

show . map (\x -> x * x)
  . filter (\x -> x /= 0) $ xs

Alpha has two desugaring rules for dot:

x.f(arg1, …) → f(x, arg1, …)
x.field → field(x)

Note that these rules work for any function:

type Rectangle = { width, height }

fn area(r: Rectangle) = r.width * r.height
# desugars to:
# fn area(r: Rectangle) = *(width(r), height(r))

Rectangle(10, 20).area #=> 200
# desugars to:
# area(Rectangle(10, 20))

Besides enabling piping, this adheres to the Uniform access principle: it doesn’t matter whether area is implemented through storage (field) or computation (function)—the access looks the same. You can refactor area to be a field without changing the client code.

Name clashes

One downside of that approach is that it promotes all functions and fields to the same scope, leading to name clashes.

For example, the following code in Haskell does not compile:

data Human = Human { name :: String }
data Dog = Dog { name :: String }

This happens because Haskell tries to define two functions:

name :: Human -> String
name :: Dog -> String

And you cannot do that in Haskell because the name function is not polymorphic.

But that works fine in Alpha because of multi-methods—name becomes a generic function with two methods:

fn name(_: Human): String
fn name(_: Dog): String

Next steps

Currently, Alpha has i64/f64 and structure types only. The next week I want to focus on adding variable-length types: Strings and Arrays/Vectors. This will allow me to expose more internal types (DataType, Method) to the Alpha environment (they use Rust’s String and Vec now, so they cannot be accessed within Alpha).

Another area for research is GC: Alpha has a primitive bump memory allocator, but it never collects memory. LLVM has support for Garbage Collection but I need to research how to integrate it properly.

Next in series: Alpha #3: dynamically-sized types and almost-finished garbage collection

Backlinks

👨‍🔬 Alpha
👨‍🔬 Alpha #3: dynamically-sized types and almost-finished garbage collection
👨‍🔬 Alpha #1: a newborn programming language for extensible systems