2019-06-15 - Abstraction is Magic

There's a saying in scientific circles about standing on the shoulders of giants. The idea is that what people work on today builds on the knowledge gained by those who came before them. If everyone started from scratch, we'd never make any progress. Programming is the same way, but our currency of progress is abstraction.

Abstraction is the idea of hiding details so that it's easier to focus on the bigger picture. When programming first started, everything was done in machine code, with the programmer telling the processor the exact actions to perform. This allowed absolute control and the possibility of extremely efficient programs, but it also required the programmer to be very aware of the intricacies of the processor, and it took a lot of work to develop a complete application. The invention of compilers allowed the details of the processor to be abstracted away so that programmers could focus on the specifics of the application they wanted to create. The developers of the compiler still needed to know about the processor, but their work allowed others to focus on bigger issues. Successive generations of programming languages and advances in operating systems and frameworks have allowed even more abstraction.

But abstraction is a double-edged sword. It helps you focus on the bigger picture and ignore the small details, until there's a problem with those small details. It's really nice that the operating system has a mechanism for creating a dialog box, until there's an issue creating that dialog box and the OS won't tell you what it is or how to fix it. When you are writing machine code there's never a situation where the processor does something you didn't explicitly tell it to do. The more abstractions you have, the more things are going on behind the scenes that you are not aware of. You also have less control over exactly how things work. When you are doing everything yourself it's easy to optimize operations to be very efficient for your specific case. Abstractions need to be general enough to meet a variety of needs, so they may end up doing things that aren't required for your specific scenario.

I think the important thing here is that abstractions are required for programming to advance, but we can't lose sight of what those abstractions are doing. You need to understand your abstractions to some degree if you are going to be successful at using them. This is the main thing that drives me to learn about assembly language, intermediate code, and compilers. I will likely never do anything with those concepts professionally, but knowing them helps me work with the abstractions built on top of them.

2019-04-27 - In IL: Summing Arrays

Today we are going to see some of the instructions we looked at last time in action. Let's start by looking at a simple program that sums the values in an array.

Program.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace CsArray1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a 1-dimensional array and fill it with 0..4.
            int[] array = new int[5];
            for (int i = 0; i < array.Length; i++)
            {
                array[i] = i;
            }

            // Sum the values.
            int sum = 0;
            for (int i = 0; i < array.Length; i++)
            {
                sum += array[i];
            }

            Console.WriteLine(sum);
        }
    }
}

This program creates a 1-dimensional array, fills that array with values, and then sums up those values. Now let's look at the compiled version.

Main

.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  // Code size 54 (0x36)
  .maxstack 3
  .locals init ([0] int32[] 'array',
                [1] int32 sum,
                [2] int32 i,
                [3] int32 V_3)
  IL_0000: ldc.i4.5
  IL_0001: newarr [mscorlib]System.Int32
  IL_0006: stloc.0
  IL_0007: ldc.i4.0
  IL_0008: stloc.2
  IL_0009: br.s IL_0013
  IL_000b: ldloc.0
  IL_000c: ldloc.2
  IL_000d: ldloc.2
  IL_000e: stelem.i4
  IL_000f: ldloc.2
  IL_0010: ldc.i4.1
  IL_0011: add
  IL_0012: stloc.2
  IL_0013: ldloc.2
  IL_0014: ldloc.0
  IL_0015: ldlen
  IL_0016: conv.i4
  IL_0017: blt.s IL_000b
  IL_0019: ldc.i4.0
  IL_001a: stloc.1
  IL_001b: ldc.i4.0
  IL_001c: stloc.3
  IL_001d: br.s IL_0029
  IL_001f: ldloc.1
  IL_0020: ldloc.0
  IL_0021: ldloc.3
  IL_0022: ldelem.i4
  IL_0023: add
  IL_0024: stloc.1
  IL_0025: ldloc.3
  IL_0026: ldc.i4.1
  IL_0027: add
  IL_0028: stloc.3
  IL_0029: ldloc.3
  IL_002a: ldloc.0
  IL_002b: ldlen
  IL_002c: conv.i4
  IL_002d: blt.s IL_001f
  IL_002f: ldloc.1
  IL_0030: call void [mscorlib]System.Console::WriteLine(int32)
  IL_0035: ret
} // end of method Program::Main

The looping sequence should be very familiar to you by now. You can see it initialize the looping variable, test the variable, perform the loop operations, and increment the variable. You also see some of the instructions we talked about last time such as newarr, stelem, ldlen, and ldelem.
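If the branch-heavy layout is hard to follow, it can help to write the same control flow back in C# using goto. This is only a rough sketch, not something the compiler produced: the class name LoopShape, the method SumWithGotos, and the label names are made up for illustration, and the comments show which IL instructions I believe each line lines up with.

```csharp
using System;

static class LoopShape
{
    // Fills an array with 0..length-1 and sums it, using the same shape as the
    // IL listing: initialize, jump to the test, body, increment, test, branch back.
    public static int SumWithGotos(int length)
    {
        int[] array = new int[length];   // newarr

        int i = 0;                       // ldc.i4.0, stloc
        goto FillTest;                   // br.s (jump straight to the test)
    FillBody:
        array[i] = i;                    // stelem.i4
        i = i + 1;                       // ldc.i4.1, add, stloc
    FillTest:
        if (i < array.Length)            // ldlen, conv.i4, blt.s
            goto FillBody;

        int sum = 0;
        int k = 0;
        goto SumTest;
    SumBody:
        sum = sum + array[k];            // ldelem.i4, add, stloc
        k = k + 1;
    SumTest:
        if (k < array.Length)
            goto SumBody;

        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumWithGotos(5));
    }
}
```

SumWithGotos(5) returns 10, the same value the original program prints. Notice that the test sits at the bottom of each loop, which is why the IL starts every loop with an unconditional branch.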

Now let's look at another example.

Program.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace CsArray2
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a 2-dimensional array and fill each cell with i * j.
            int[,] array = new int[5,10];
            for (int i = 0; i < array.GetLength(0); i++)
            {
                for (int j = 0; j < array.GetLength(1); j++)
                {
                    array[i,j] = i * j;
                }
            }

            // Sum the values.
            int sum = 0;
            for (int i = 0; i < array.GetLength(0); i++)
            {
                for (int j = 0; j < array.GetLength(1); j++)
                {
                    sum += array[i,j];
                }
            }

            Console.WriteLine(sum);
        }
    }
}

This time we are doing basically the same thing except with a 2-dimensional array. This means that we have nested loops for each part, elements are accessed using two indexes, and we have to use the GetLength() method so that we can indicate which dimension we want the length of. Now let's look at the compiled version of this.

Main

.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  // Code size 122 (0x7a)
  .maxstack 5
  .locals init ([0] int32[0...,0...] 'array',
                [1] int32 sum,
                [2] int32 i,
                [3] int32 j,
                [4] int32 V_4,
                [5] int32 V_5)
  IL_0000: ldc.i4.5
  IL_0001: ldc.i4.s 10
  IL_0003: newobj instance void int32[0...,0...]::.ctor(int32, int32)
  IL_0008: stloc.0
  IL_0009: ldc.i4.0
  IL_000a: stloc.2
  IL_000b: br.s IL_002e
  IL_000d: ldc.i4.0
  IL_000e: stloc.3
  IL_000f: br.s IL_0020
  IL_0011: ldloc.0
  IL_0012: ldloc.2
  IL_0013: ldloc.3
  IL_0014: ldloc.2
  IL_0015: ldloc.3
  IL_0016: mul
  IL_0017: call instance void int32[0...,0...]::Set(int32, int32, int32)
  IL_001c: ldloc.3
  IL_001d: ldc.i4.1
  IL_001e: add
  IL_001f: stloc.3
  IL_0020: ldloc.3
  IL_0021: ldloc.0
  IL_0022: ldc.i4.1
  IL_0023: callvirt instance int32 [mscorlib]System.Array::GetLength(int32)
  IL_0028: blt.s IL_0011
  IL_002a: ldloc.2
  IL_002b: ldc.i4.1
  IL_002c: add
  IL_002d: stloc.2
  IL_002e: ldloc.2
  IL_002f: ldloc.0
  IL_0030: ldc.i4.0
  IL_0031: callvirt instance int32 [mscorlib]System.Array::GetLength(int32)
  IL_0036: blt.s IL_000d
  IL_0038: ldc.i4.0
  IL_0039: stloc.1
  IL_003a: ldc.i4.0
  IL_003b: stloc.s V_4
  IL_003d: br.s IL_0068
  IL_003f: ldc.i4.0
  IL_0040: stloc.s V_5
  IL_0042: br.s IL_0057
  IL_0044: ldloc.1
  IL_0045: ldloc.0
  IL_0046: ldloc.s V_4
  IL_0048: ldloc.s V_5
  IL_004a: call instance int32 int32[0...,0...]::Get(int32, int32)
  IL_004f: add
  IL_0050: stloc.1
  IL_0051: ldloc.s V_5
  IL_0053: ldc.i4.1
  IL_0054: add
  IL_0055: stloc.s V_5
  IL_0057: ldloc.s V_5
  IL_0059: ldloc.0
  IL_005a: ldc.i4.1
  IL_005b: callvirt instance int32 [mscorlib]System.Array::GetLength(int32)
  IL_0060: blt.s IL_0044
  IL_0062: ldloc.s V_4
  IL_0064: ldc.i4.1
  IL_0065: add
  IL_0066: stloc.s V_4
  IL_0068: ldloc.s V_4
  IL_006a: ldloc.0
  IL_006b: ldc.i4.0
  IL_006c: callvirt instance int32 [mscorlib]System.Array::GetLength(int32)
  IL_0071: blt.s IL_003f
  IL_0073: ldloc.1
  IL_0074: call void [mscorlib]System.Console::WriteLine(int32)
  IL_0079: ret
} // end of method Program::Main

We have the same looping sequence as before, except this time there are multiple sequences nested inside of each other. The big difference here is that we don't see any of the array instructions we talked about last time. Instead we see a constructor call (newobj) to create the array and method calls (Get, Set, and GetLength) to work with it. This is because we have a 2-dimensional array. As mentioned last time, IL special-cases 1-dimensional arrays that start at 0, and the array instructions we looked at last time are only used for those special arrays. When we move to 2 dimensions we lose the instructions and have to revert to method calls.
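For comparison, here is a sketch of the same calculation using a jagged array (an array of arrays) instead of a rectangular one. Each row of a jagged array is an ordinary 1-dimensional, zero-based array, so I would expect the compiler to go back to the array instructions (newarr, ldelem.ref, ldelem.i4, stelem.i4, ldlen) here. I haven't included the disassembly, so treat that as an assumption to check with ildasm. The class name JaggedSum and the method Sum are made up for this example.

```csharp
using System;

static class JaggedSum
{
    // Same calculation as CsArray2, but with int[][] instead of int[,].
    public static int Sum(int rows, int cols)
    {
        int[][] array = new int[rows][];
        for (int i = 0; i < array.Length; i++)
        {
            // Every row is its own 1-dimensional array.
            array[i] = new int[cols];
            for (int j = 0; j < array[i].Length; j++)
            {
                array[i][j] = i * j;
            }
        }

        int sum = 0;
        for (int i = 0; i < array.Length; i++)
        {
            for (int j = 0; j < array[i].Length; j++)
            {
                sum += array[i][j];
            }
        }
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(Sum(5, 10));
    }
}
```

Sum(5, 10) returns 450, which is (0+1+2+3+4) * (0+1+...+9). The trade-off is that a jagged array is several separate arrays rather than one block, so the rows don't have to be the same length and each one has to be allocated on its own.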

Speaking of instructions, next time we're going to look at some more basic instructions which either haven't come up yet or were missed.

2019-04-06 - Chicken or the Egg

People have long debated which came first, the chicken or the egg? Since genetic variation and mutations arise from the fertilization process the egg must have come first. Two proto-chickens got together and they produced an egg from which a chicken hatched. The hatching process doesn't change the animal inside of the egg and so a chicken must come from a chicken egg which was laid by proto-chicken parents.

That being said it does raise a nomenclature question. Is a chicken egg a chicken egg because it contains a chicken or because it was laid by a chicken? An unfertilized chicken egg is still considered to be a chicken egg even though it doesn't contain a chicken. This means that the egg from which the first chicken hatched was not a chicken egg because it was laid by a proto-chicken and so the chicken came first. Later that chicken laid chicken eggs.

At the same time, changes in animal populations occur over long periods of time and are usually the result of environmental changes or some other external effect. There was probably never a specific first chicken. Something happened that changed the factors influencing the survival of proto-chickens, which made different characteristics more ideal and eventually led to a population different enough from past generations to be considered a different species. So neither the chicken nor the egg came first. They appeared at the same time.

It's all just a matter of perspective.

2019-02-03 - In IL: Array Instructions

An array is a series of elements laid out contiguously in memory, with each element being accessible using its index value. The two key properties of an array are its bounds and its dimensions. Bounds indicate the lowest and highest possible index values, while dimensions indicate how many index values are required. For example, a two-dimensional array could be used as a table with one index representing the rows and the other index representing the columns. The bounds would indicate how many rows and columns the table has. IL allows arrays with multiple dimensions and various bounds but treats one-dimensional arrays with a lower bound of 0 as special. These arrays are referred to as vectors, and there are special IL instructions for working with them.
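To make bounds and dimensions a little more concrete, here is a small sketch that asks the runtime about a few arrays using System.Array members (Rank, GetLowerBound, GetUpperBound) and builds a one-dimensional array that is not a vector with Array.CreateInstance. The class name ArrayShape and the Describe helper are made up for this example.

```csharp
using System;

static class ArrayShape
{
    // Returns a description like "rank 2: [0..4], [0..9]".
    public static string Describe(Array array)
    {
        string[] parts = new string[array.Rank];
        for (int d = 0; d < array.Rank; d++)
        {
            parts[d] = "[" + array.GetLowerBound(d) + ".." + array.GetUpperBound(d) + "]";
        }
        return "rank " + array.Rank + ": " + string.Join(", ", parts);
    }

    static void Main()
    {
        int[] vector = new int[5];
        int[,] table = new int[5, 10];
        // One dimension and five elements, but the indexes run from 1 to 5.
        Array oneBased = Array.CreateInstance(typeof(int), new[] { 5 }, new[] { 1 });

        Console.WriteLine(Describe(vector));    // rank 1: [0..4]
        Console.WriteLine(Describe(table));     // rank 2: [0..4], [0..9]
        Console.WriteLine(Describe(oneBased));  // rank 1: [1..5]
    }
}
```

Only the first of these three is a vector in the IL sense. The last one has a single dimension, but its lower bound is 1, so it falls outside the special case.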

newarr (NEW ARRay)

Pops an integer off of the stack, creates a new array able to contain that many elements, and pushes a reference to the new array onto the stack.

Instruction	Description	Binary Format
newarr <type>	Create new array	0x8D <T>

ldlen (LoaD LENgth)

Pops an array off of the stack and pushes the length of the array onto the stack as a native unsigned integer.

Instruction	Description	Binary Format
ldlen	Length of array	0x8E

ldelema (LoaD ELEMent Address)

Pops an index value and an array off of the stack and pushes the address of the element of the array at the specified index onto the stack.

Instruction	Description	Binary Format
ldelema <type>	load element address of specified type	0x8F <T>
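One place I would expect to see this instruction from C# is a ref local that points at an array element (C# 7.0 or later), since the compiler needs the element's address rather than its value. That is my assumption and not something taken from a disassembly, so it is worth confirming with ildasm. The class name ElementAddress and the method SetThroughRef are made up for this sketch.

```csharp
using System;

static class ElementAddress
{
    // Takes a reference to one element and writes through it.
    public static int[] SetThroughRef(int[] array, int index, int value)
    {
        ref int slot = ref array[index];   // a reference to the element itself
        slot = value;                      // writes into the array, not a copy
        return array;
    }

    static void Main()
    {
        int[] array = { 1, 2, 3 };
        SetThroughRef(array, 1, 42);
        Console.WriteLine(string.Join(",", array));   // 1,42,3
    }
}
```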

ldelem (LoaD ELEMent)

Pops an index value and an array off of the stack and pushes the element of the array at the specified index onto the stack.

Instruction	Description	Binary Format
ldelem.i1	load 8 bit integer element	0x90
ldelem.u1	load 8 bit unsigned integer element	0x91
ldelem.i2	load 16 bit integer element	0x92
ldelem.u2	load 16 bit unsigned integer element	0x93
ldelem.i4	load 32 bit integer element	0x94
ldelem.u4	load 32 bit unsigned integer element	0x95
ldelem.i8	load 64 bit integer element	0x96
ldelem.u8	load 64 bit unsigned integer element	0x96
ldelem.i	load native integer element	0x97
ldelem.r4	load 32 bit floating point element	0x98
ldelem.r8	load 64 bit floating point element	0x99
ldelem.ref	load object element	0x9A
ldelem <type>	load element of specified type	0xA3 <T>

stelem (STore ELEMent)

Pops a value, an index value, and an array off of the stack and sets the element of the array at the specified index to the value.

Instruction	Description	Binary Format
stelem.i	set native integer element	0x9B
stelem.i1	set 8 bit integer element	0x9C
stelem.i2	set 16 bit integer element	0x9D
stelem.i4	set 32 bit integer element	0x9E
stelem.i8	set 64 bit integer element	0x9F
stelem.r4	set 32 bit floating point element	0xA0
stelem.r8	set 64 bit floating point element	0xA1
stelem.ref	set object element	0xA2
stelem <type>	set element of specified type	0xA4 <T>
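If you want to try these instructions without writing a whole assembly by hand, one option is to emit them at runtime with System.Reflection.Emit. The sketch below builds a small method that creates an int array, stores each index into it, and adds the elements up, using newarr, stelem.i4, ldelem.i4, and ldlen. The class name EmitSum and the method Build are made up for this example, and to keep it short I do the fill and the sum in a single loop.

```csharp
using System;
using System.Reflection.Emit;

static class EmitSum
{
    // Builds a delegate equivalent to: n => (0 + 1 + ... + (n - 1)), via an int[n].
    public static Func<int, int> Build()
    {
        var method = new DynamicMethod("Sum", typeof(int), new[] { typeof(int) });
        ILGenerator il = method.GetILGenerator();

        LocalBuilder array = il.DeclareLocal(typeof(int[]));
        LocalBuilder sum = il.DeclareLocal(typeof(int));
        LocalBuilder i = il.DeclareLocal(typeof(int));
        Label test = il.DefineLabel();
        Label body = il.DefineLabel();

        // array = new int[n]
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Newarr, typeof(int));
        il.Emit(OpCodes.Stloc, array);

        // sum = 0; i = 0; jump to the loop test
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Stloc, sum);
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Stloc, i);
        il.Emit(OpCodes.Br, test);

        il.MarkLabel(body);
        // array[i] = i  (stelem pops array, index, value)
        il.Emit(OpCodes.Ldloc, array);
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Stelem_I4);
        // sum = sum + array[i]  (ldelem pops array, index and pushes the element)
        il.Emit(OpCodes.Ldloc, sum);
        il.Emit(OpCodes.Ldloc, array);
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldelem_I4);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, sum);
        // i = i + 1
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, i);

        il.MarkLabel(test);
        // if (i < array.Length) goto body
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldloc, array);
        il.Emit(OpCodes.Ldlen);
        il.Emit(OpCodes.Conv_I4);
        il.Emit(OpCodes.Blt, body);

        il.Emit(OpCodes.Ldloc, sum);
        il.Emit(OpCodes.Ret);

        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }

    static void Main()
    {
        Func<int, int> sum = Build();
        Console.WriteLine(sum(5));   // 10
    }
}
```

The order of the pushes matters: the array goes on the stack first, then the index, then (for stelem) the value, which matches the pop descriptions above read in reverse.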

Next time we'll look at some programs which use arrays.
