Types

Types are a fundamental feature of computer programming that are found in some form in every programming language. They are a way for a programming language to describe the meaning of data stored in memory, describe the rules of how that data can be manipulated, enforce those rules and give the programmer tools to create their own systems of rules for the manipulation of data.

The terminology varies quite a lot between the worlds of mathematics and computer science, and even between different programming languages. The description below gives a largely Julia-oriented explanation of types, as the terminology used in discussing Julia's type system is very close to the generic computer science definitions of the concepts, so it makes a good foundation for learning about types. Despite the variation in terminology and in the features offered by different languages' type systems, a lot of the concepts transfer to any language.

Concrete Types

A concrete type can be thought of as a way of interpreting a sequence of bits. For example, the bitstring 10111000 is 184 when viewed as an unsigned integer (UInt8 in Julia), or -72 when viewed as a signed integer (Int8). The pattern of ones and zeros in memory does not change; only the rules for deciding the meaning of the data change.

julia> bitstring(UInt8(184))
"10111000"

julia> bitstring(Int8(-72))
"10111000"

julia> parse(Int, "10111000", base=2)
184

Data can change from being one type to another in several different ways. The most common way is by explicit type conversion, i.e. creating a new value of a type by passing the old value to the new type's constructor.

The Char 'A' can be converted to an Int by passing it to the Int constructor.

julia> Int('A')
65

Likewise, Ints can be converted back to Chars with the Char constructor.

julia> Char(65)
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

In strongly typed languages, this is the only way to convert data from one type to another. In weakly typed languages, such as C, data can also be converted by casting it from one type to another: the compiler is told to treat the value as the new type, in some cases simply reinterpreting the existing bits in memory as if they belonged to the new type.

#include <stdbool.h>

int i = (int)true;  // i is now 1
i = (int)false;     // i is now 0

This can backfire in situations where there are unexpected differences in the underlying hardware. For example, different systems can use different endianness. When a byte is written out, the bits conventionally go from the most significant bit (MSB) to the least significant bit (LSB). However, in types larger than a single byte, the order of the bytes in memory can differ from system to system, and even between types.
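In Julia we can inspect this directly: reinterpret can view the raw bytes of an Int32, and which end the 0x41 byte lands at depends on the machine. This is a sketch; the printed order below assumes a little-endian machine, which most desktops are.

```julia
# View the four raw bytes of an Int32 holding 65.
bytes = reinterpret(UInt8, Int32[65])

# On a little-endian machine this prints the 0x41 byte first:
# UInt8[0x41, 0x00, 0x00, 0x00]
println(collect(bytes))

length(bytes)  # 4: an Int32 is four bytes
sum(bytes)     # 65, however the bytes happen to be ordered
```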

julia> bitstring('A')
"01000001000000000000000000000000"

julia> bitstring(Int32(65))
"00000000000000000000000001000001"

The Int32 stores 65 as a plain binary number, so the 01000001 byte sits at the least significant end, while Julia's Char packs the character's UTF-8 bytes in from the most significant end, so the same 01000001 sits at the opposite end of the 32 bits. The effect is much like the difference between a big-endian and a little-endian representation.

julia> Int32('A') == 65
true

Even though they both represent the number 65, the underlying bit pattern in memory is different. A C style casting of the Char 'A' to an Int32 results in a number that is very much not 65.

julia> reinterpret(Int32, 'A')
1090519040

Other languages, such as JavaScript, allow a loose comparison of values across types. There are two different equality operators, == and === (officially abstract equality and strict equality, but I call them twoquals and threequals, so feel free to use those), which differ on whether the type information is used when determining equality.

> "0" == 0
true

> "0" === 0
false

> false == 0
true

> false === 0
false
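As an aside, Julia spells these same two operators == and ===, but with stricter semantics: == compares values (with well-defined rules across numeric types), while === checks that two values are truly indistinguishable, and neither ever coerces a string to a number.

```julia
0 == 0.0     # true: numeric equality works across Int and Float64
0 === 0.0    # false: an Int64 and a Float64 are different objects
"0" == 0     # false: no string-to-number coercion, just not equal
```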

This is due to another form of type conversion, called implicit type conversion or type coercion. This is when the type of a value is helpfully changed to whatever the interpreter thinks your code probably means, based on the types passed to it.

In JavaScript, this is most famously known from the confusion caused by implicit conversion between Numbers and Strings. In particular, the + operator means addition for Numbers and concatenation for Strings. The other arithmetic operators, such as -, * and /, are only defined for numbers, so if the strings can be read as numbers, both operands are implicitly converted to numbers before the operation is applied.

> 1 + 1
2

> "1" + 1
"11"

> "1" - "1"
0

> "1" + "1" - "1"
10

This is either perfectly sensible or jaw-clenchingly infuriating, depending on how you feel about this sort of thing. I don't mind JavaScript's semantics, but I can see why other people hate it. Let's compare with Julia.

julia> '1' + 1
'2': ASCII/Unicode U+0032 (category Nd: Number, decimal digit)

julia> "1" + 1
ERROR: MethodError: no method matching +(::String, ::Int64)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:538
  ...

Julia lets you increment the value of a Char, but adding an Int to a String is not allowed. Or rather, not implemented by default. Due to the magic of operator overloading and sadomasochism, it's possible to add JavaScript's + semantics to Julia.

julia> Base.:+(a::String, b::String) = a * b;

julia> Base.:+(a::String, b::Number) = a + string(b);

julia> Base.:-(a::String, b::String) = parse(Int, a) - parse(Int, b);

julia> "1" + 1
"11"

julia> "1" + "1" - "1"
10

Mmmmmm, web scale and full stack. 👌😎🔥🔥🔥🔥🔥🔥

While ridiculous and nauseating to many, this example helps demonstrate why anyone cares about types in the first place. Annotating the types of function parameters on the function definition tells the compiler which types that function definition (or method) should handle. In the example above, we defined a new method on the + function, so that + can now handle being passed String arguments.

Choosing which method of a function to call based on the types of all of its arguments is called multiple dispatch, and is very handy. It lets us add our super neato JS style string addition and subtraction without overwriting the boring normal way of adding and subtracting numbers. The + methods can live happily side by side.

julia> 1 + 1
2

julia> "1" + "1"
"11"

Abstract Types

As mentioned before, the types we've been referring to have all been concrete types, that is, types which have a particular representation in memory. Due to the practical limitation of only having a finite amount of memory available on any given computer, it's necessary to use only a certain number of bits to represent any given value. This means that any concrete type has a limited range of values it can represent.

Int8 can only represent values in the range -128:127, because it only has eight bits available, so the maximum number of values it can represent is 2^8 = 256. Int64 can represent a considerably larger range of values.

julia> typemin(Int8):typemax(Int8)
-128:127

julia> typemin(Int64):typemax(Int64)
-9223372036854775808:9223372036854775807
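A visible consequence of the finite range: Julia's fixed-size integers wrap around on overflow rather than raising an error, so adding 1 to typemax(Int8) lands back at typemin(Int8).

```julia
# Int8 arithmetic wraps modulo 2^8 (two's complement overflow).
typemax(Int8) + Int8(1)                    # -128
typemax(Int8) + Int8(1) == typemin(Int8)   # true
```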

Even though these are technically different types, they represent the same concept, so it's useful to be able to define operations that work across all of them, like the + operator in the example above.

Now what if some people don't like our awesome new +? Well, we could make our own addclassic function that doesn't have the string methods, just like the + used to. This could be done using our old pal multiple dispatch again, something like:

addclassic(a::Int8, b::Int8) = a + b
addclassic(a::Int16, b::Int16) = a + b
addclassic(a::Int32, b::Int32) = a + b
addclassic(a::Int64, b::Int64) = a + b

This would work fine, but it would get tedious having to do this for every function that we wanted to work across multiple types, and this doesn't even cover what happens if we wanted to add an Int64 to an Int32, or an Int8 to an Int16, or any of the other combinations. Luckily there's an easier way.
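For the mixed-width cases specifically, Julia's built-in arithmetic already solves this with promotion: before adding, the arguments are converted to a common type, which is why an Int8 plus an Int64 just works and returns an Int64.

```julia
# promote converts its arguments to a common type...
a, b = promote(Int8(1), Int64(2))
typeof(a), typeof(b)   # (Int64, Int64)

# ...which is what + does under the hood for mixed integer widths.
typeof(Int8(1) + Int64(1))   # Int64
```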

“If it walks like a duck, quacks like a duck, and subtypes AbstractDuck, it's a duck.” — Dijkstra, probably.

Abstract types are a way of arranging types in a hierarchy, so that methods can be written for a type further up the hierarchy (called a supertype), and that function can be called by any of the types that are below it in the hierarchy (subtypes).
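We can build a little hierarchy of our own to see this in action (the duck types below are invented for illustration):

```julia
# An abstract type has no representation in memory of its own;
# it exists purely as a node in the type hierarchy.
abstract type AbstractDuck end

struct Mallard <: AbstractDuck end
struct RubberDuck <: AbstractDuck end

# One method written against the supertype...
quack(::AbstractDuck) = "Quack!"

# ...is callable with any of its subtypes.
quack(Mallard())      # "Quack!"
quack(RubberDuck())   # "Quack!"
```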


When each of those types was defined, it was assigned the supertype Signed, the supertype representing all signed integer types (the 8 below is the type's width in bits):

primitive type Int8 <: Signed 8 end

Since each of the types we wrote addclassic methods for shares a supertype, we could instead write the following code and have the same functionality.

addclassic(a::Signed, b::Signed) = a + b

Signed itself also has further supertypes, so let's have a look at those.

julia> supertype(Signed)
Integer

julia> supertype(Integer)
Real

julia> supertype(Real)
Number

julia> supertype(Number)
Any

julia> supertype(Any)
Any

As you can see, the type hierarchy goes quite high, all the way up to Any, the supertype of all supertypes that's even its own supertype. All the types form a big tree with Any sitting at the top (in the mathematical sense of a tree, it's not my fault mathematicians don't know which way up trees go). Any is the default type, and variables and parameters which don't have a type annotation default to Any.
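That default is easy to see: a function parameter with no annotation accepts a value of absolutely any type.

```julia
# No annotation on x means x::Any.
whatami(x) = typeof(x)

whatami(1)        # Int64 (on a 64-bit system)
whatami("one")    # String
whatami(whatami)  # typeof(whatami): even functions are values with types
```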

Our addclassic function could target Any, with either of the following equivalent definitions.

addclassic(a::Any, b::Any) = a + b
addclassic(a, b) = a + b

This leaves us back with our JS style string concatenation that we were defining our special strict add function to get away from. We want to add numbers and nothing else, so let's target Number.

julia> addclassic(a::Number, b::Number) = a + b
addclassic (generic function with 1 method)

julia> addclassic(1, 1)
2

julia> addclassic(Int8(1), Int64(1))
2

julia> addclassic("1", "1")
ERROR: MethodError: no method matching addclassic(::String, ::String)

Success! addclassic works just like + used to before we upgraded it. Now nobody can complain.
