Vig
Table of Contents
A Fresh Mix of Functional and Object Ideas #
If you have read my previous posts or looked at my Github projects, you know I love language design and compiler work. I have built interpreters, written compilers and even started a book on creating your first compiler.
Yet none of these projects felt like a truly amazing language. I was looking into a more mathematical approach of representing types and talking with friends gave me an idea: what if I blend some object concepts into a purely functional design? That thought sent me down a new direction. Now I am building a language called Vig.
In this post I will share its key ideas and the feeling I want it to have. In future posts I will explore specific implementations and design challenges.
A simple example: sum even numbers #
Rust offers powerful types, but also has it’s limits. Suppose we want to sum a list of even numbers. In Rust you might write:
use num::Zero;
use std::ops::Add;
fn sum_all<T>(xs: &[T]) -> T
where
T: Add<Output = T> + Zero + Copy,
{
xs.iter().copied().fold(T::zero(), |acc, x| acc + x)
}
// A newtype restricting to even u32s
struct Evens(Vec<u32>);
impl Evens {
fn new(xs: Vec<u32>) -> Self {
// This check has to happen at runtime
assert!(xs.iter().all(|x| x % 2 == 0));
Evens(xs)
}
}
fn main() {
let evens = Evens::new(vec![2, 4, 6]);
let total = sum_all(&evens); // expected reference `&[_]`
// found reference `&Evens`
println!("Sum: {}", total);
}
This won’t compile as sum_all
expects a slice and not our Evens
newtype. Rust insists on either unwrapping the Evens
type, losing the type-level Evens
guarantee. Or add more boilerplate, like implementing the Deref
trait. In addition, Rust doesn’t allow us to do a compile time check for the contents of Evens
, this check has to happen at runtime.
In Vig, susbet polymorphism handles all this at compile time:
// Define a refinement type Evens
type Evens :: { u32 x | x % 2 == 0 }
// A generic sum over any list of a numeric set S
// Numeric is a built in typeclass
func SumAll<S: Numeric> :: List<S> => S
SumAll = List<S> xs -> xs.fold(S(0), (acc, x) -> acc + x)
List<Evens> l = [2, 4, 6]
result = SumAll(l) |> print
The syntax may seem a little strange but more on that later. We define Evens
as the set of all even u32
values. Then we write SumAll
for any numeric list. In Vig, types are their own identity functions, S(0)
constructs the zero of any numeric type S
. The compiler sees that List<Evens>
fits List<S>
when S
is Evens
. No wrappers or extra traits needed. And the even check can be proven at compile time.
The Type System #
I started my programming journey with JavaScript. A language that doesn’t really have a well defined type system, dynamic types are just chaotic and prone to bugs. Then I moved to Python and PHP. Both improved little on that. Java introduced me to strong typing and I quickly fell in love. TypeScript brought me back to JavaScript but with safety. Rust then showed me how expressive types can catch errors without runtime cost. My aim is a rich type system that fins mistakes up front and never slows down your code.
What you can define #
Almost every type in Vig is a mathematical set. You can use:
- Structural and nominal types
- (Tagged) Unions / Sum types
- Intersection types
- Dependent types
Built-in sets include i32
, u64
, f32
, bool
, string
and more. But more on these later. You can combine, refine and extend these to match your domain.
Crafting your own types #
Your own types can be defined as follows:
type SomeNumbers :: { 1, 2, 3 }
type OtherNumbers :: { 4, 5, 6 }
type MoreNumbers :: SomeNumbers + OtherNumbers
One way to define a type is to list all possible values the type contains. Any expression is valid. Taking the sum of two sets +
is set union. In this case MoreNumbers
contains {1,2,3,4,5,6}
.
A more powerful way of defining a type is by using a refinement set. It’s very close to mathematics and looks as follows:
type Evens :: { u32 x | x % 2 == 0 }
type EvensNotZero :: Evens \ { 0 }
The Evens
type contains all even unsigned 32 bit integers. The EvensNotZero
type is equivalent to type EvensNotZero :: { u32 x | x % 2 == 0 && x != 0 }
. In this case you can even write type EvensNotZero :: Evens \ 0
, which is syntactic sugar for the case above. The |
symbol is an operator and has the lowest precedence. It makes the LHS defintive, by ensuring any variable on the LHS is defined in the RHS expression. The RHS must be an expression that returns a bool
for every possible value of the variable(s) defined on the LHS. The expression u32 x | x % 2 == 0
is non-deterministic, it doesn’t have a single value; rather it assumes a number of different values. A set construction unwinds suchs non-determinism and constructs a set (type) containing all of those values.
This is an intentional set definition, each element may be non-deterministic. The set may be of infinite size.
Another way of defining a finite set is by using the range operator ..
, it accepts two operands and returns a non-deterministic value constrained to the range with both operands exclusive. An including operator (..=
) also exists:
type Digits :: { 0..=9 }
type SmallLetters :: { 'a'..='z' }
type X :: { 0..1920 }
type Y :: { 0..1080 }
Built in Types #
Vig uses a few core base sets to build everything else. These include:
string
: set of all Unicode stringschar
: set of all Unicode characters (char \(\subseteq\) char)i32
,i64
andi128
: set of all signed 32, 64 and 128-bit integersu32
,u64
andu128
: set of all unsigned 32, 64 and 128-bit integersf32
: set of all single precision floating point numbersf64
: set of all double precision floating point numbersbyte
: set of all unsigned 8 bit integersbool
:{ true, false }
Using these basesets in combination with other types and certain constructs you have a very expressive and powerful typesystem.
These base sets let the compiler track subset relations. That is how u32
is a subset of u64
and so on.
Advanced Type Constructs #
Vig goes beyond basic sets to give you precise control.
Tuples #
A tuple is an ordered list of values.
my_box = (10, 20, 35)
Is a tuple instance. Tuple types can be built in many ways.
type Dim :: { f32 x, f32 y | (x, y) }
type Dim :: { (f32 _, f32 _) }
type Dim :: f32 * f32
type Dim :: f32 ** 2
The first two cases are intentional set definitions, the second case is syntactic sugar for the first case. If no |
operator is present the compiler will use the identity function to construct the type. The latter two cases use operators on the basesets.
Records #
A record is as follows:
type PersonRecord :: record { Name: string, Age: i32 }
Person = record { Name="John", Age=42 }
Nominal Types #
By default, sets are inclusive: they contain all values that satisfy the set condition. In that sense such sets represent structural types. A member of a given set does not have to be constructed explicitely as a member or inhabitant; if it fits the criteria, it is a member.
In order to discriminate between certain types we introduce Nominal types.
Classes #
A class constructs a unique class from the candidate set
type BoundsX :: { f32 x | x in -90..90 }
type BoundsY :: { f32 x | x in -189..180 }
type Latitude :: class { degrees: BoundsX }
type Longitude :: class (BoundsY)
To be a member of a class, a value must be constructed as a member of that class explicitely.
lat = Latitude { degrees: 40.712 }
lon = Longitude(-74.005)
All members of a class are still members of the candidate set. This means it is possible to perform arithmatic on Latitudes and Longitudes. However unless the functions/operators performing the arithmetic have been overloaded to support returning members of Latitude and Longitde, they will return members of the candidate set and will need to be converted back to class members.
Classes are themselves sets. New inclusive sets or classes can be constructed based on classes.
type Coords :: Latitude * Longitude
Later on typeclasses will also be added, which can be implemented for any type. The syntax of this is still to be determined.
Tagged Unions #
A simple tagged union, or enum is defined as follows:
type Colors :: enum { Red, Green, Blue }
Colors is a set of 3 symbolic values, denoted by Colors.Red
, Colors.Green
and Colors.Blue
.
Values can be associated with symbolic values:
type Colors :: enum {
Red = (255, 0, 0),
Green = (0, 255, 0),
Blue = (0, 0, 255),
}
type Shapes :: enum {
Circle: f32, // radius
Rectangle: f32 * f32, // 2 sides
Triangle: f32 ** 3, // 3 sides
}
Two enums are stolen from Rust:
type Option<T> {
Some<T>,
None,
}
type Result<T,E> {
Ok<T>,
Err<E>,
}
These mean that there is no null
in Vig. And errors are treated as values.
Functions #
Functions first need to be declared, before they are allowed to be defined.
func Fib :: u32 => u32
This declares the function Fib
to take in a single u32
value and returns a single u32
output. The =>
operator takes a lefthand domain and a righthand codomain. Underlying each function is a set of ordered pairs. After this declaration we are allowed to define the function by specifying it’s ordered pairs explicitely.
Fib = u32 x -> {|
0 -> 1,
1 -> 1,
u32 n ?> 1 -> self(n-1) + self(n-2)
|}
This defines the function Fib
to be set of ordered pairs where an input of 0
and 1
are mapped to 1
and an input of any u32
which is larger than 1
returns Fib(n-1) + Fib(n-2)
. A function always defines a name self
to refer to… itself.
The _
pattern means “match all” and can be used to match any other case that isn’t covered by the arms before. It’s important that a function maps every single element in the domain to at least one element in the codomain.
A function can also be defined with a single lambda expression
func Double :: u32 => Evens
Double = u32 x -> 2 * x
The tight type system allows the compiler to be really explicit about the values for which a function is defined. The domain of /
is f32 \ 0
. The compiler will if possible infer the domains of the functions itself. Such that you can write:
func f :: f32 => f32
f = f32 x -> 1 - 1/(1-x)
If the compiler is not able to infer backwards, the compiler will throw a compile time error and you will have to narrow the type yourself:
func f :: f32 => f32
f = f32 x ?!= 1 -> 1 - 1/(1-x)
A function which returns anything but a Result or Option enum is a total function. This means the function is defined for the entirety of it’s domain. All function examples above are total functions.
If a function can fail for a given input, or return an error, it uses Result
or Option
. Those are partial functions.
Consider a function which given a filename returns the contents of the file. If the file does not exist at runtime, the function is undefined for the given filename. The compiler has no way of knowing this at compile time. Thus, such a function is marked as partial because while it is defined for all filenames, only a subset will return file contents.
Any function that uses a partial function, is a partial function itself.
Functions can be composed using the |>
operator. This allows us to chain functions as follows:
func f :: i32 => u32
f = u32 x -> if x < 0 then { -x } else { x }
func g :: u32 => Evens
g = u32 x -> 2 * x
func h :: i32 => Evens
h = f |> g
// gets desugared to
h = i32 x -> g(f(x))
We can also chain function calls as follows:
func f :: i32 => u32
func g :: u32 => Evens
h = f(-3) |> g // h is of type Evens
// gets desugared to
h = g(f(-3))
It is important that the codomain of the LHS is a subset of the domain of the RHS. The compiler will result in a compile time error if this is not the case. If needed a transformation function can be defined and chained inbetween to transform the codomain of the LHS to the RHS, you can chain as many functions as you want f |> t |> g
.
The logical or operator ||
can be used on functions. They are useless on total functions, but very useful in combination with partial functions. f || g
returns a function which given an argument x
returns f(x)
if f
returns Ok
or Some
for x
, otherwise it returns g(x)
.
For example:
func ReadText :: string => string
ReadText = StdIO.Read || (string _ -> "")
Both the LHS and RHS must have equal domain and codomain. Later this constraint might be lowered to either a subset of the other and then the new signature will take the largest domain and codomain of each.
Axioms #
All of these type constraints have to be verified by the compiler. The following code is valid code:
type Evens :: { u32 x | x % 2 == 0 }
func Fib :: u64 => u64
Fib = u64 x -> {|
0 -> 1,
1 -> 1,
u64 n ?> 1 -> self(n-1) + self(n-2)
|}
Evens x = 4 // If we don't specify Evens here, the compiler will likely infer u64 as we use it in Fib
y = Fib(x)
The compiler must figure out of all elements of Evens
are also found in u64
. It does this by determining if Evens
is a subset of u64
. It does this through a number of steps, but in this case it’ll reach the last step, a SMT solver. The compiler will in this case try to determine if \( x \mid x \equiv 0 \mod 2 \subseteq x \text{ for } x \in \mathbb{Z} \)`.
To do this it will try to prove \( \forall x . x \equiv 0 \mod 2 \implies x \) using an SMT solver. Now for very complex types the compiler will have a hard time evaluating this. That’s where axioms
come into play. If you have very specific domain knowledge you can turn this into axioms
which the compiler may use, if it deems it useful. For example:
type Large :: { u64 x | x > 1000 }
Large a = Fib(5)
Large b = Fib(12)
The compiler only knows that Fib
returns a u64
, and \( \text{u64} \not \subseteq \text{Large} \). The compiler will thus throw an error that Fib(5)
can not be of type Large
. It will suggest you to convert it Large
by chaining it into a transformer, for example:
func MakeLarge :: u64 => Large
MakeLarge = u64 x -> if x <= 1000 then { 1000 } else { x }
Large a = Fib(5) |> MakeLarge
Large b = Fib(12) |> MakeLarge
This will compile perfectly fine, but this might not be what you want. Therefore you can also introduce an axiom
. We know a lower bound on the Fibonnaci numbers, \(\forall n . Fib(n) >= 2^{n-1}\). If we define this axiom at the top of our module, the compiler will be able to use it to prove that Fib(5)
is not an element of Large
but Fib(12)
is an element of Large
.
axiom fib_lower_bound :: u32 n . Fib(n) >= 2**(n-1)
The .
operator is also definitive, making sure any variable on the LHS is defined on the RHS. The RHS must always be a boolean expression. The compiler will now typecheck and find \(Fib(n)>=2^{n-1}>1000\) to determine if Fib(n)
is an element of Large
. The compiler will then attempt to prove \(n = 5 \land Fib(n)>=2^{n-1}>1000\), which is not satisfiable as \(2^5=32 \not > 1000\), so the compiler will throw an error. After removing the Large a = Fib(5)
line, the compiler will then proof \(n = 12 \land Fib(n)>=2^{n-1}>1000\), which is satisfiable as \(2^{12}=4096 > 1000\), the compiler is now happy.
Now SMT solving can be really slow for complex clauses. That’s why there will most likely be a compiler setting to let the compiler fall back to a conservative “unknown” rather than hang or require an axiom. In that case you will likely have to introduce some sort of transformer function. I will also be adding derive traits, like in Rust, which will allow you to add a #[opaque]
to a type, to skip deep proof obligations when performance matters. The syntax of this is still to be determined.
Conclusion #
I know this was a lot. There is more to Vig than I covered here. A prototype compiler exists, but it still needs work. Soon I will open source the project so you can try it yourself. I hope you find the ideas clear and engaging. Let me know what you think and what examples you would like to see or what topics about the design process you’d like to hear more about.