Types
Types are a fundamental feature of computer programming, found in some form in every programming language. They give a language a way to describe the meaning of data stored in memory, define and enforce the rules for how that data can be manipulated, and give the programmer tools to create their own systems of rules for manipulating data.
The terminology varies quite a lot between the worlds of mathematics and computer science, and even between different programming languages. The description below gives a largely Julia-oriented explanation of types; the terminology used in discussing Julia's type system is very close to the generic CS definitions of the concepts, so it makes a good foundation for knowledge of types. Despite the variation in terminology and in the features offered by different languages' type systems, a lot of the concepts transfer to any language.
Concrete Types
A concrete type can be thought of as a way of interpreting a sequence of bits. For example, the bitstring 10111000 is 184 when viewed as an unsigned integer (UInt8 in Julia), or -72 when viewed as a signed integer (Int8). The pattern of ones and zeros in memory does not change; only the rules for deciding the meaning of the data change.
julia> bitstring(UInt8(184))
"10111000"

julia> bitstring(Int8(-72))
"10111000"

julia> parse(Int, bitstring(Int8(-72)), base=2)
184
Data can change from being one type to another in several different ways. The most common way is by explicit type conversion, i.e. creating a new instance of a type by passing the value to that type's constructor.
The Char 'A' can be converted to an integer by passing it to an integer constructor, such as Int8.
julia> Int8('A')
65
Likewise, Ints can be converted back to Chars with the Char constructor.
julia> Char(65)
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)
In strongly typed languages, this is the only way to convert data from one type to another. In weakly typed languages, such as C, types can also be changed by casting from one type to another. This is where the bits in memory are treated as a different type: the previous type is disregarded and the bitstring is interpreted as if it were the new type.
int i = (int)true; // i is now 1
i = (int)false; // i is now 0
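Julia can do the same kind of bit-level reinterpretation, but it has to be asked for explicitly with reinterpret, which keeps the bits of a value and changes only the type (the sizes must match). A quick sketch, reusing the bitstring from earlier:

```julia
# Same bits, different type: 0b10111000 is 184 as a UInt8 and -72 as an Int8
reinterpret(Int8, UInt8(184))   # -72
reinterpret(UInt8, Int8(-72))   # 0xb8, i.e. 184
```

Unlike a conversion, reinterpret never touches the bits at all; it only changes the rules for reading them.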
This can backfire in situations where there are unexpected differences in the underlying hardware. For example, different systems can use different endianness. In general, the order of bits in a byte goes from most significant bit (MSB) to least significant bit (LSB). However, in types larger than a single byte, the order of bytes can differ from system to system and even between types.
julia> bitstring(Char(65))
"01000001000000000000000000000000"

julia> bitstring(Int32(65))
"00000000000000000000000001000001"
The bitstring of the Int32 is represented most significant byte first, or big-endian, and the Char is represented least significant byte first, or little-endian. (Strictly speaking, Julia's Char stores the character's UTF-8 bytes left-aligned in its 32 bits, which is what produces the reversed byte order seen here.)
julia> Char(Int32(65)) == 'A'
true
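The next two examples use eachbyte, which is not a function from Julia's Base; here is a minimal sketch of such a helper (my own definition, not part of the language), which simply splits a value's bitstring into its constituent bytes:

```julia
# Hypothetical helper: split a value's bitstring into 8-bit chunks,
# in the order the bytes appear in the bitstring
eachbyte(x) = [bitstring(x)[i:i+7] for i in 1:8:length(bitstring(x))]

eachbyte(Int32(65))  # ["00000000", "00000000", "00000000", "01000001"]
```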
julia> eachbyte(Int32(65))
"00000000"
"00000000"
"00000000"
"01000001"

julia> eachbyte('A')
"01000001"
"00000000"
"00000000"
"00000000"
Even though they both represent the number 65, the underlying bit pattern in memory is different. A C-style cast of the Char 'A' to an Int32 results in a number that is very much not 65.
julia> parse(Int32, bitstring('A'), base=2)
1090519040
Other languages, such as JavaScript, allow a loose comparison of values across types. There are two different equality operators, == and === (officially loose equality and strict equality, but I call them twoquals and threequals, so feel free to use those), which differ on whether the type information is used when determining equality.
> "0" == 0
true
> "0" === 0
false
> false == 0
true
> false === 0
false
This is due to another form of type conversion, called implicit type conversion or type coercion. This is when the type of a value is helpfully changed to whatever the interpreter thinks your code probably means, based on the types passed to it.
In JavaScript, this is most famously known from the confusion caused by implicit conversion between Numbers and Strings. In particular, the + operator means addition for Numbers and concatenation for Strings. However, for operators that are only defined for numbers, such as -, * and /, strings that can also represent numbers will be implicitly converted to numbers.
> 1 + 1
2
> "1" + 1
"11"
> "1" - "1"
0
> "1" + "1" - "1"
10
This is either perfectly sensible or jaw-clenchingly infuriating, depending on how you feel about this sort of thing. I don't mind JavaScript's semantics, but I can see why other people hate it. Let's compare with Julia.
julia> '1' + 1
'2': ASCII/Unicode U+0032 (category Nd: Number, decimal digit)
julia> "1" + 1
ERROR: MethodError: no method matching +(::String, ::Int64)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:538
  ...
Julia lets you increment the value of a Char, but adding an Int to a String is not allowed. Or rather, not implemented by default. Due to the magic of operator overloading and sadomasochism, it's possible to add JavaScript's + semantics to Julia.
julia> let +(a::String, b::Int64) = a * string(b)
           "1" + 1
       end
"11"
julia> begin
           import Base: +, -
           +(a::String, b::String) = a * b
           -(a::String, b::String) = parse(Int, a) - parse(Int, b)
           "1" + "1" - "1"
       end
10
Mmmmmm, web scale and full stack. 👌😎🔥🔥🔥🔥🔥🔥
While ridiculous and nauseating to many, this example helps demonstrate why anyone cares about types in the first place. Annotating the types of function parameters on the function definition tells the compiler which types that function definition (or method) should handle. In the example above, we defined a new method on the + function, so that + can now handle being passed String arguments.
Calling a different method on a function based on the types of the arguments is called multiple dispatch, and is very handy. It lets us add our super neato JS style string addition and subtraction without overwriting the boring normal way of adding and subtracting numbers. The + functions can live happily side by side.
julia> 1 + 1 # your grampa's addition
2

julia> "1" + "1" # cool refreshing NEW Addition™
"11"
Abstract Types
As mentioned before, the types we've been referring to have all been concrete types, that is, types which have a particular representation in memory. Due to the practical limitation of only having a finite amount of memory available on any given computer, it's necessary to use only a certain number of bits to represent any given value. This means that any concrete type has a limited range of values it can represent.
Int8 can only represent values in the range -128:127, because it only has eight bits available, so the maximum number of values it can represent is 2^8 = 256. Int64 can represent a considerably larger range of values.
julia> typemin(Int8):typemax(Int8)
-128:127

julia> typemin(Int64):typemax(Int64)
-9223372036854775808:9223372036854775807
Even though these are technically different types, they represent the same concept, so it's useful to be able to refer to operations common to all of them, like with the + operator in the example above.
Now what if some people don't like our awesome new +? Well, we could make our own addclassic function that doesn't have the string methods, just like + used to be. This could be done using our old pal multiple dispatch again, something like:
addclassic(a::Int8, b::Int8) = a + b
addclassic(a::Int16, b::Int16) = a + b
addclassic(a::Int32, b::Int32) = a + b
addclassic(a::Int64, b::Int64) = a + b
This would work fine, but it would get tedious having to do this for every function that we wanted to work across multiple types, and this doesn't even cover what happens if we wanted to add an Int64 to an Int32, or an Int8 to an Int16, or any of the other combinations. Luckily there's an easier way.
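To see the problem concretely, here's a sketch of what happens when only the per-type methods exist and the argument types don't match any of them (assuming a 64-bit system, where integer literals are Int64):

```julia
addclassic(a::Int64, b::Int64) = a + b
addclassic(a::Int32, b::Int32) = a + b

addclassic(1, 1)         # 2 — both arguments are Int64
addclassic(1, Int32(1))  # throws MethodError: no method matching addclassic(::Int64, ::Int32)
```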
“If it walks like a duck, quacks like a duck, and subtypes AbstractDuck, it's a duck.” — Dijkstra, probably.
Abstract types are a way of arranging types in a hierarchy, so that methods can be written for a type further up the hierarchy (called a supertype), and that function can be called by any of the types that are below it in the hierarchy (subtypes).
julia> supertype.((Int8, Int16, Int32, Int64))
(Signed, Signed, Signed, Signed)
When each of those types was defined, it was assigned the supertype Signed, which represents all signed integer types.
primitive type Int8 <: Signed 8 end
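User-defined hierarchies are declared the same way. A toy sketch, with all the names invented here in honour of the quote above:

```julia
abstract type AbstractDuck end       # abstract types have no fields and can't be instantiated
struct Mallard <: AbstractDuck end   # a concrete subtype

# One method written against the supertype covers every subtype
quack(::AbstractDuck) = "Quack!"

quack(Mallard())  # "Quack!"
```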
Since each of the types we wrote addclassic methods for shares a supertype, we could instead write the following code and have the same functionality.
addclassic(a::Signed, b::Signed) = a + b
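Because the method targets Signed, any combination of signed integer types now dispatches to it, and Julia's promotion rules handle mixing the widths. A sketch using a fresh name (addsigned, invented here so it doesn't clash with the methods above):

```julia
addsigned(a::Signed, b::Signed) = a + b

addsigned(Int8(1), Int64(2))   # 3
addsigned(Int16(5), Int32(7))  # 12
```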
Signed itself also has further supertypes, so let's have a look at those.
julia> supertype(Signed)
Integer

julia> supertype(Integer)
Real

julia> supertype(Real)
Number

julia> supertype(Number)
Any

julia> supertype(Any)
Any
As you can see, the type hierarchy goes quite high, all the way up to Any, the supertype of all supertypes that's even its own supertype. All the types form a big tree with Any sitting at the top (in the mathematical sense of a tree; it's not my fault mathematicians don't know which way up trees go). Any is the default type, and variables and parameters which don't have a type annotation default to Any.
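Since the subtype operator <: chains like a comparison, the whole path from Int8 up to Any can be checked in one expression:

```julia
Int8 <: Signed <: Integer <: Real <: Number <: Any  # true
```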
Our addclassic function could target Any, with either of the following equivalent definitions.
addclassic(a::Any, b::Any) = a + b
addclassic(a, b) = a + b
This leaves us back with our JS style string concatenation that we were defining our special strict addclassic function to get away from. We want to add numbers and nothing else, so let's target Number.
julia> addclassic(a::Number, b::Number) = a + b
addclassic (generic function with 1 method)

julia> addclassic(1, 1)
2

julia> addclassic(Int8(1), Int8(1))
2

julia> addclassic("1", "1")
ERROR: MethodError: no method matching addclassic(::String, ::String)
Success! addclassic works just like + used to before we upgraded it. Now nobody can complain.