Skip to main content
  1. Posts/

Vig

·2973 words·14 mins

A Fresh Mix of Functional and Object Ideas #

If you have read my previous posts or looked at my Github projects, you know I love language design and compiler work. I have built interpreters, written compilers and even started a book on creating your first compiler.

Yet none of these projects felt like a truly amazing language. I was looking into a more mathematical approach of representing types and talking with friends gave me an idea: what if I blend some object concepts into a purely functional design? That thought sent me down a new direction. Now I am building a language called Vig.

In this post I will share its key ideas and the feeling I want it to have. In future posts I will explore specific implementations and design challenges.

A simple example: sum even numbers #

Rust offers powerful types, but also has it’s limits. Suppose we want to sum a list of even numbers. In Rust you might write:

use num::Zero;
use std::ops::Add;

fn sum_all<T>(xs: &[T]) -> T
where
    T: Add<Output = T> + Zero + Copy,
{
    xs.iter().copied().fold(T::zero(), |acc, x| acc + x)
}

// A newtype restricting to even u32s
struct Evens(Vec<u32>);

impl Evens {
    fn new(xs: Vec<u32>) -> Self {
        // This check has to happen at runtime
        assert!(xs.iter().all(|x| x % 2 == 0));
        Evens(xs)
    }
}

fn main() {
    let evens = Evens::new(vec![2, 4, 6]);
    let total = sum_all(&evens);    // expected reference `&[_]` 
                                    // found reference `&Evens`
    println!("Sum: {}", total);
}

This won’t compile as sum_all expects a slice and not our Evens newtype. Rust insists on either unwrapping the Evens type, losing the type-level Evens guarantee. Or add more boilerplate, like implementing the Deref trait. In addition, Rust doesn’t allow us to do a compile time check for the contents of Evens, this check has to happen at runtime.

In Vig, susbet polymorphism handles all this at compile time:

// Define a refinement type Evens
type Evens :: { u32 x | x % 2 == 0 }

// A generic sum over any list of a numeric set S
// Numeric is a built in typeclass
func SumAll<S: Numeric> :: List<S> => S
SumAll = List<S> xs -> xs.fold(S(0), (acc, x) -> acc + x)

List<Evens> l = [2, 4, 6]
result = SumAll(l) |> print

The syntax may seem a little strange but more on that later. We define Evens as the set of all even u32 values. Then we write SumAll for any numeric list. In Vig, types are their own identity functions, S(0) constructs the zero of any numeric type S. The compiler sees that List<Evens> fits List<S> when S is Evens. No wrappers or extra traits needed. And the even check can be proven at compile time.

The Type System #

I started my programming journey with JavaScript. A language that doesn’t really have a well defined type system, dynamic types are just chaotic and prone to bugs. Then I moved to Python and PHP. Both improved little on that. Java introduced me to strong typing and I quickly fell in love. TypeScript brought me back to JavaScript but with safety. Rust then showed me how expressive types can catch errors without runtime cost. My aim is a rich type system that fins mistakes up front and never slows down your code.

What you can define #

Almost every type in Vig is a mathematical set. You can use:

  • Structural and nominal types
  • (Tagged) Unions / Sum types
  • Intersection types
  • Dependent types

Built-in sets include i32, u64, f32, bool, string and more. But more on these later. You can combine, refine and extend these to match your domain.

Crafting your own types #

Your own types can be defined as follows:

type SomeNumbers :: { 1, 2, 3 }
type OtherNumbers :: { 4, 5, 6 }
type MoreNumbers :: SomeNumbers + OtherNumbers

One way to define a type is to list all possible values the type contains. Any expression is valid. Taking the sum of two sets + is set union. In this case MoreNumbers contains {1,2,3,4,5,6}.

A more powerful way of defining a type is by using a refinement set. It’s very close to mathematics and looks as follows:

type Evens :: { u32 x | x % 2 == 0 }
type EvensNotZero :: Evens \ { 0 }

The Evens type contains all even unsigned 32 bit integers. The EvensNotZero type is equivalent to type EvensNotZero :: { u32 x | x % 2 == 0 && x != 0 }. In this case you can even write type EvensNotZero :: Evens \ 0, which is syntactic sugar for the case above. The | symbol is an operator and has the lowest precedence. It makes the LHS defintive, by ensuring any variable on the LHS is defined in the RHS expression. The RHS must be an expression that returns a bool for every possible value of the variable(s) defined on the LHS. The expression u32 x | x % 2 == 0 is non-deterministic, it doesn’t have a single value; rather it assumes a number of different values. A set construction unwinds suchs non-determinism and constructs a set (type) containing all of those values.

This is an intentional set definition, each element may be non-deterministic. The set may be of infinite size.

Another way of defining a finite set is by using the range operator .., it accepts two operands and returns a non-deterministic value constrained to the range with both operands exclusive. An including operator (..=) also exists:

type Digits :: { 0..=9 }
type SmallLetters :: { 'a'..='z' }
type X :: { 0..1920 }
type Y :: { 0..1080 }

Built in Types #

Vig uses a few core base sets to build everything else. These include:

  • string: set of all Unicode strings
  • char: set of all Unicode characters (char \(\subseteq\) char)
  • i32, i64 and i128: set of all signed 32, 64 and 128-bit integers
  • u32, u64 and u128: set of all unsigned 32, 64 and 128-bit integers
  • f32: set of all single precision floating point numbers
  • f64: set of all double precision floating point numbers
  • byte: set of all unsigned 8 bit integers
  • bool: { true, false }

Using these basesets in combination with other types and certain constructs you have a very expressive and powerful typesystem.

These base sets let the compiler track subset relations. That is how u32 is a subset of u64 and so on.

Advanced Type Constructs #

Vig goes beyond basic sets to give you precise control.

Tuples #

A tuple is an ordered list of values.

my_box = (10, 20, 35)

Is a tuple instance. Tuple types can be built in many ways.

type Dim :: { f32 x, f32 y | (x, y) }
type Dim :: { (f32 _, f32 _) }
type Dim :: f32 * f32
type Dim :: f32 ** 2

The first two cases are intentional set definitions, the second case is syntactic sugar for the first case. If no | operator is present the compiler will use the identity function to construct the type. The latter two cases use operators on the basesets.

Records #

A record is as follows:

type PersonRecord :: record { Name: string, Age: i32 }
Person = record { Name="John", Age=42 }

Nominal Types #

By default, sets are inclusive: they contain all values that satisfy the set condition. In that sense such sets represent structural types. A member of a given set does not have to be constructed explicitely as a member or inhabitant; if it fits the criteria, it is a member.

In order to discriminate between certain types we introduce Nominal types.

Classes #

A class constructs a unique class from the candidate set

type BoundsX :: { f32 x | x in -90..90 }
type BoundsY :: { f32 x | x in -189..180 }

type Latitude :: class { degrees: BoundsX }
type Longitude :: class (BoundsY)

To be a member of a class, a value must be constructed as a member of that class explicitely.

lat = Latitude { degrees: 40.712 }
lon = Longitude(-74.005)

All members of a class are still members of the candidate set. This means it is possible to perform arithmatic on Latitudes and Longitudes. However unless the functions/operators performing the arithmetic have been overloaded to support returning members of Latitude and Longitde, they will return members of the candidate set and will need to be converted back to class members.

Classes are themselves sets. New inclusive sets or classes can be constructed based on classes.

type Coords :: Latitude * Longitude

Later on typeclasses will also be added, which can be implemented for any type. The syntax of this is still to be determined.

Tagged Unions #

A simple tagged union, or enum is defined as follows:

type Colors :: enum { Red, Green, Blue }

Colors is a set of 3 symbolic values, denoted by Colors.Red, Colors.Green and Colors.Blue.

Values can be associated with symbolic values:

type Colors :: enum {
    Red = (255, 0, 0),
    Green = (0, 255, 0),
    Blue = (0, 0, 255),
}

type Shapes :: enum {
    Circle: f32, // radius
    Rectangle: f32 * f32, // 2 sides
    Triangle: f32 ** 3, // 3 sides
}

Two enums are stolen from Rust:

type Option<T> {
    Some<T>,
    None,
}
type Result<T,E> {
    Ok<T>,
    Err<E>,
}

These mean that there is no null in Vig. And errors are treated as values.

Functions #

Functions first need to be declared, before they are allowed to be defined.

func Fib :: u32 => u32

This declares the function Fib to take in a single u32 value and returns a single u32 output. The => operator takes a lefthand domain and a righthand codomain. Underlying each function is a set of ordered pairs. After this declaration we are allowed to define the function by specifying it’s ordered pairs explicitely.

Fib = u32 x -> {|
    0 -> 1,
    1 -> 1,
    u32 n ?> 1 -> self(n-1) + self(n-2)
|}

This defines the function Fib to be set of ordered pairs where an input of 0 and 1 are mapped to 1 and an input of any u32 which is larger than 1 returns Fib(n-1) + Fib(n-2). A function always defines a name self to refer to… itself.

The _ pattern means “match all” and can be used to match any other case that isn’t covered by the arms before. It’s important that a function maps every single element in the domain to at least one element in the codomain.

A function can also be defined with a single lambda expression

func Double :: u32 => Evens
Double = u32 x -> 2 * x

The tight type system allows the compiler to be really explicit about the values for which a function is defined. The domain of / is f32 \ 0. The compiler will if possible infer the domains of the functions itself. Such that you can write:

func f :: f32 => f32
f = f32 x -> 1 - 1/(1-x)

If the compiler is not able to infer backwards, the compiler will throw a compile time error and you will have to narrow the type yourself:

func f :: f32 => f32
f = f32 x ?!= 1 -> 1 - 1/(1-x)

A function which returns anything but a Result or Option enum is a total function. This means the function is defined for the entirety of it’s domain. All function examples above are total functions.

If a function can fail for a given input, or return an error, it uses Result or Option. Those are partial functions.

Consider a function which given a filename returns the contents of the file. If the file does not exist at runtime, the function is undefined for the given filename. The compiler has no way of knowing this at compile time. Thus, such a function is marked as partial because while it is defined for all filenames, only a subset will return file contents.

Any function that uses a partial function, is a partial function itself.

Functions can be composed using the |> operator. This allows us to chain functions as follows:

func f :: i32 => u32
f = u32 x -> if x < 0 then { -x } else { x }
func g :: u32 => Evens
g = u32 x -> 2 * x
func h :: i32 => Evens
h = f |> g
// gets desugared to
h = i32 x -> g(f(x))

We can also chain function calls as follows:

func f :: i32 => u32
func g :: u32 => Evens
h = f(-3) |> g // h is of type Evens
// gets desugared to
h = g(f(-3))

It is important that the codomain of the LHS is a subset of the domain of the RHS. The compiler will result in a compile time error if this is not the case. If needed a transformation function can be defined and chained inbetween to transform the codomain of the LHS to the RHS, you can chain as many functions as you want f |> t |> g.

The logical or operator || can be used on functions. They are useless on total functions, but very useful in combination with partial functions. f || g returns a function which given an argument x returns f(x) if f returns Ok or Some for x, otherwise it returns g(x).

For example:

func ReadText :: string => string 
ReadText = StdIO.Read || (string _ -> "")

Both the LHS and RHS must have equal domain and codomain. Later this constraint might be lowered to either a subset of the other and then the new signature will take the largest domain and codomain of each.

Axioms #

All of these type constraints have to be verified by the compiler. The following code is valid code:

type Evens :: { u32 x | x % 2 == 0 }
func Fib :: u64 => u64
Fib = u64 x -> {|
    0 -> 1,
    1 -> 1,
    u64 n ?> 1 -> self(n-1) + self(n-2)
|}
Evens x = 4   // If we don't specify Evens here, the compiler will likely infer u64 as we use it in Fib
y = Fib(x)

The compiler must figure out of all elements of Evens are also found in u64. It does this by determining if Evens is a subset of u64. It does this through a number of steps, but in this case it’ll reach the last step, a SMT solver. The compiler will in this case try to determine if \( x \mid x \equiv 0 \mod 2 \subseteq x \text{ for } x \in \mathbb{Z} \)`.

To do this it will try to prove \( \forall x . x \equiv 0 \mod 2 \implies x \) using an SMT solver. Now for very complex types the compiler will have a hard time evaluating this. That’s where axioms come into play. If you have very specific domain knowledge you can turn this into axioms which the compiler may use, if it deems it useful. For example:

type Large :: { u64 x | x > 1000 }
Large a = Fib(5)
Large b = Fib(12)

The compiler only knows that Fib returns a u64, and \( \text{u64} \not \subseteq \text{Large} \). The compiler will thus throw an error that Fib(5) can not be of type Large. It will suggest you to convert it Large by chaining it into a transformer, for example:

func MakeLarge :: u64 => Large
MakeLarge = u64 x -> if x <= 1000 then { 1000 } else { x }
Large a = Fib(5) |> MakeLarge
Large b = Fib(12) |> MakeLarge

This will compile perfectly fine, but this might not be what you want. Therefore you can also introduce an axiom. We know a lower bound on the Fibonnaci numbers, \(\forall n . Fib(n) >= 2^{n-1}\). If we define this axiom at the top of our module, the compiler will be able to use it to prove that Fib(5) is not an element of Large but Fib(12) is an element of Large.

axiom fib_lower_bound :: u32 n . Fib(n) >= 2**(n-1)

The . operator is also definitive, making sure any variable on the LHS is defined on the RHS. The RHS must always be a boolean expression. The compiler will now typecheck and find \(Fib(n)>=2^{n-1}>1000\) to determine if Fib(n) is an element of Large. The compiler will then attempt to prove \(n = 5 \land Fib(n)>=2^{n-1}>1000\), which is not satisfiable as \(2^5=32 \not > 1000\), so the compiler will throw an error. After removing the Large a = Fib(5) line, the compiler will then proof \(n = 12 \land Fib(n)>=2^{n-1}>1000\), which is satisfiable as \(2^{12}=4096 > 1000\), the compiler is now happy.

Now SMT solving can be really slow for complex clauses. That’s why there will most likely be a compiler setting to let the compiler fall back to a conservative “unknown” rather than hang or require an axiom. In that case you will likely have to introduce some sort of transformer function. I will also be adding derive traits, like in Rust, which will allow you to add a #[opaque] to a type, to skip deep proof obligations when performance matters. The syntax of this is still to be determined.

Conclusion #

I know this was a lot. There is more to Vig than I covered here. A prototype compiler exists, but it still needs work. Soon I will open source the project so you can try it yourself. I hope you find the ideas clear and engaging. Let me know what you think and what examples you would like to see or what topics about the design process you’d like to hear more about.