2016-04-09 - Why Comment

Commenting code is one of those things that seems simple at first but turns out to be rather complicated. Write too few comments and it's hard for people to understand the code. Write too many and it's just as hard, because the code gets lost among the comments.

I think the answer lies in determining what information the code doesn't already provide. Variable names tell you what information is being worked with. Operations and function names tell you how those variables are going to change. Because the code is meant to tell the computer what to do, it's also good at telling the reader what it's doing. The only thing missing is why. Why are you calling this function with these arguments? Why did you implement that algorithm a certain way? Why aren't you using this feature of the language? As much as possible you want your code to tell the reader everything they need to know, but comments can fill in the gaps. They explain to future readers why you did things a certain way when it isn't obvious.
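A small illustration in Python (the function and the requirement behind it are made up for the example): the first comment only restates what the code already says, while the second records a reason the code cannot express on its own.

```python
def average(values: list[float]) -> float:
    # "What" comment, redundant: return 0.0 if the list is empty.
    # "Why" comment, useful: callers (hypothetically) display an average
    # before any samples arrive and expect 0.0 there instead of a
    # ZeroDivisionError, so the empty case is handled on purpose.
    if not values:
        return 0.0
    # No comment needed here: the names and operators say what happens.
    return sum(values) / len(values)
```

Only the "why" comment would be worth keeping; delete the requirement it describes and the comment becomes the first thing a reader checks.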

The same goes for documenting your own classes and functions. The class name, methods, and fields should tell you a lot about what the class is for. Similarly, a function's name, parameters, and return type can tell you what it will do. Again, the only thing missing is why. Why would I want to use this class instead of another? Why would I want to call this function? That information can go in comments on the class or function. Then people thinking about using that class or function can understand the scenario it's meant for and decide whether it meets their needs.
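The same idea applied to documentation, as a Python sketch with a hypothetical `unique_in_order` function: the signature and body already say what goes in and what comes out, so the docstring spends its words on when to choose this function over the obvious alternative.

```python
def unique_in_order(items):
    """Return the items with duplicates removed, keeping first occurrences.

    Why use this instead of set(items): a set does not preserve the
    original order, so prefer this function when order matters, for
    example when the result is shown to a person. Items must be hashable.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```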

2016-03-29 - In IL: Instructions and the Stack

So now that we’ve learned where local variables are stored, it’s time to learn what we can do with them. In most high-level languages you write a series of symbols, and each symbol implies the action it represents. In IL you write a series of instructions and their arguments, and the name of each instruction implies the action it performs. Before we get to instructions, though, we need to understand stacks.

A Stack is a data structure that can be imagined as a pile of things. You can “push” things onto the top of the pile or “pop” things off of the top of the pile. Depending on the implementation you may also be able to look at things on the top of the pile without removing them. In IL every method has an Evaluation Stack. Most instructions pop things from this stack, push things onto this stack, or both. The stack is used to store temporary values similar to how registers work in some computer processors.
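To make push, pop, and peek concrete, here is a minimal stack sketch in Python. It is only an analogy for the evaluation stack; the runtime does not literally use a Python list, and the class and method names are made up for the example.

```python
class Stack:
    """A minimal last-in, first-out stack backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        # Put the item on top of the pile.
        self._items.append(item)

    def pop(self):
        # Remove and return the top item; raises IndexError if empty.
        return self._items.pop()

    def peek(self):
        # Look at the top item without removing it.
        return self._items[-1]

    def __len__(self):
        return len(self._items)


stack = Stack()
stack.push(1)
stack.push(2)
top = stack.peek()    # 2, and it stays on the stack
first = stack.pop()   # 2: the last item pushed is the first popped
second = stack.pop()  # 1
```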

There are a bunch of instructions in IL, so I will start by summarizing some of the common ones.

nop (No OPeration)

Used to fill space for various purposes, such as leaving room for instructions to be patched in later.

| Instruction | Description | Binary Format |
|---|---|---|
| nop | Does nothing | 0x00 |

ldc (LoaD Constant)

Pushes a value, determined by the instruction or its argument, onto the stack.

| Instruction | Description | Binary Format |
|---|---|---|
| ldc.i4.m1 | Loads -1 onto the stack as a 4-byte integer | 0x15 |
| ldc.i4.X | Loads X onto the stack as a 4-byte integer, where X is 0-8 | 0x16 - 0x1E |
| ldc.i4.s <num> | Loads “short” 1-byte integer <num> onto the stack as a 4-byte integer | 0x1F <int8> |
| ldc.i4 <num> | Loads 4-byte integer <num> onto the stack | 0x20 <int32> |
| ldc.i8 <num> | Loads 8-byte integer <num> onto the stack | 0x21 <int64> |
| ldc.r4 <num> | Loads 4-byte floating-point value <num> onto the stack | 0x22 <float32> |
| ldc.r8 <num> | Loads 8-byte floating-point value <num> onto the stack | 0x23 <float64> |
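Because several forms can load the same number, a compiler has to pick one. The Python sketch below shows one plausible rule, choosing the shortest encoding from the table. It matches the choices visible in the IL listing in the Visual Basic post (ldc.i4.1 for True, ldc.i4.s 99 for 'c', ldc.i4 0x7fff for Short.MaxValue), but `encode_ldc_i4` is a hypothetical helper, not the compiler's actual code. Multi-byte operands are written little-endian, as in the CLI file format.

```python
import struct


def encode_ldc_i4(value: int) -> bytes:
    """Pick the shortest ldc.i4 form for a 4-byte integer constant (sketch)."""
    if not -2**31 <= value < 2**31:
        raise ValueError("ldc.i4 only takes 4-byte integers; use ldc.i8")
    if value == -1:
        return bytes([0x15])                             # ldc.i4.m1
    if 0 <= value <= 8:
        return bytes([0x16 + value])                     # ldc.i4.0 .. ldc.i4.8
    if -128 <= value <= 127:
        return bytes([0x1F]) + struct.pack("<b", value)  # ldc.i4.s <int8>
    return bytes([0x20]) + struct.pack("<i", value)      # ldc.i4 <int32>
```

For example, `encode_ldc_i4(99)` gives the two bytes `1F 63`, while `encode_ldc_i4(0x7fff)` needs all five bytes `20 FF 7F 00 00`.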

ldstr (LoaD STRing)

Pushes a reference to a string onto the stack.

| Instruction | Description | Binary Format |
|---|---|---|
| ldstr <string> | Loads a reference to <string> onto the stack | 0x72 <T> |

<T> represents a metadata token. These are 4-byte values that indicate a location where the actual data is stored in the file containing the code.
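As a rough sketch of how such a token is laid out: the high byte identifies the metadata table (or, for ldstr, the value 0x70, meaning the user-string heap), and the low three bytes give the row number or heap offset. The table numbers below come from ECMA-335; the helper names and the token values used in the tests are hypothetical.

```python
import struct

# Table numbers for a few common token kinds. 0x70 is not a table: it marks
# an offset into the user-string heap, which is what ldstr refers to.
TOKEN_KINDS = {
    0x01: "TypeRef",
    0x02: "TypeDef",
    0x06: "MethodDef",
    0x0A: "MemberRef",
    0x70: "UserString",
}


def decode_token(token: int) -> tuple[str, int]:
    """Split a 4-byte metadata token into its kind and its row/offset."""
    kind = token >> 24          # high byte: which table or heap
    index = token & 0x00FFFFFF  # low 3 bytes: row number or heap offset
    return TOKEN_KINDS.get(kind, hex(kind)), index


def encode_ldstr(token: int) -> bytes:
    # ldstr is opcode 0x72 followed by the 4-byte token, little-endian.
    return bytes([0x72]) + struct.pack("<I", token)
```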

ldloc (LoaD LOCal variable)

Pushes the value of a local variable onto the stack.

| Instruction | Description | Binary Format |
|---|---|---|
| ldloc.X | Loads the value of local variable X onto the stack, where X is 0-3 | 0x06 - 0x09 |
| ldloc.s <index> | Loads the value of the local variable with “short” index <index> onto the stack | 0x11 <uint8> |
| ldloc <index> | Loads the value of the local variable with index <index> onto the stack | 0xFE 0x0C <uint16> |

conv (CONVersion)

Pops a value off of the stack, converts it to the type named by the instruction, and pushes the result onto the stack.

| Instruction | Description | Binary Format |
|---|---|---|
| conv.i1 | Converts the value on the stack to a 1-byte integer | 0x67 |
| conv.i2 | Converts the value on the stack to a 2-byte integer | 0x68 |
| conv.i4 | Converts the value on the stack to a 4-byte integer | 0x69 |
| conv.i8 | Converts the value on the stack to an 8-byte integer | 0x6A |
| conv.r4 | Converts the value on the stack to a 4-byte floating-point value | 0x6B |
| conv.r8 | Converts the value on the stack to an 8-byte floating-point value | 0x6C |
| conv.u4 | Converts the value on the stack to a 4-byte unsigned integer | 0x6D |
| conv.u8 | Converts the value on the stack to an 8-byte unsigned integer | 0x6E |
| conv.u2 | Converts the value on the stack to a 2-byte unsigned integer | 0xD1 |
| conv.u1 | Converts the value on the stack to a 1-byte unsigned integer | 0xD2 |
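A Python sketch of what the integer conversions do to a 4-byte integer, ignoring overflow checking and the floating-point forms. Narrowing keeps only the low bits; widening to 8 bytes sign-extends for conv.i8 but, per ECMA-335, zero-extends for conv.u8. That difference is why the IL listing in the Visual Basic post loads ULong.MaxValue with ldc.i4.m1 followed by conv.i8: sign-extending -1 sets all 64 bits. The function name is made up, and it returns the value the target type would hold, which simplifies the real stack behavior (small results are widened back to 4 bytes on the stack).

```python
def conv_from_int32(value: int, opcode: str) -> int:
    """Model an integer conv.* applied to a 4-byte integer (no overflow checks)."""
    if not -2**31 <= value < 2**31:
        raise ValueError("expected a 4-byte integer")
    if opcode == "conv.i8":
        return value                  # sign-extend: -1 stays -1 (all bits set)
    if opcode == "conv.u8":
        return value & 0xFFFFFFFF     # zero-extend the 32-bit pattern
    bits = {"conv.i1": 8, "conv.u1": 8, "conv.i2": 16, "conv.u2": 16,
            "conv.i4": 32, "conv.u4": 32}.get(opcode)
    if bits is None:
        raise ValueError(f"unsupported conversion: {opcode}")
    low = value & ((1 << bits) - 1)   # keep only the low bits
    if opcode.startswith("conv.i") and low >= 1 << (bits - 1):
        low -= 1 << bits              # reinterpret the top bit as a sign
    return low
```

For example, converting 200 with conv.i1 gives -56, because 0xC8 has its top bit set.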

box

Pops a value-type value off of the stack, boxes it (copies it into a newly allocated object), and pushes a reference to that object onto the stack.

| Instruction | Description | Binary Format |
|---|---|---|
| box <type> | Boxes the value on the stack, where <type> is the value's type | 0x8C <T> |

stloc (STore LOCal variable)

Pops a value off of the stack and stores it in a local variable.

| Instruction | Description | Binary Format |
|---|---|---|
| stloc.X | Stores a value from the stack in local variable X, where X is 0-3 | 0x0A - 0x0D |
| stloc.s <index> | Stores a value from the stack in the local variable with “short” index <index> | 0x13 <uint8> |
| stloc <index> | Stores a value from the stack in the local variable with index <index> | 0xFE 0x0E <uint16> |
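The ldloc and stloc tables follow the same pattern, so a single hypothetical helper can choose the shortest form for either. This Python sketch picks a one-byte opcode for locals 0-3, the .s form up to index 255, and the two-byte 0xFE form beyond that. This agrees with the IL listing in the Visual Basic post (stloc.0 for b, stloc.s sb for local 4).

```python
import struct


def _encode_local(index: int, base: int, s_opcode: int, long_opcode: int) -> bytes:
    # Shared rule: .0-.3 forms first, then the .s form, then the 0xFE form.
    if not 0 <= index <= 0xFFFF:
        raise ValueError("local index must fit in an unsigned 2-byte integer")
    if index <= 3:
        return bytes([base + index])                    # e.g. ldloc.0, stloc.3
    if index <= 0xFF:
        return bytes([s_opcode, index])                 # .s form, uint8 index
    return bytes([0xFE, long_opcode]) + struct.pack("<H", index)  # uint16 index


def encode_ldloc(index: int) -> bytes:
    return _encode_local(index, 0x06, 0x11, 0x0C)


def encode_stloc(index: int) -> bytes:
    return _encode_local(index, 0x0A, 0x13, 0x0E)
```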

add (ADDition), sub (SUBtraction), mul (MULtiplication), div (DIVision)

Pops two values off of the stack and pushes the result of the specified action onto the stack. How the action is performed and the type returned depends on the types of the values.

| Instruction | Description | Binary Format |
|---|---|---|
| add | Adds the first value popped off of the stack to the second value popped off of the stack and pushes the result onto the stack | 0x58 |
| sub | Subtracts the first value popped off of the stack from the second value popped off of the stack and pushes the result onto the stack | 0x59 |
| mul | Multiplies the second value popped off of the stack by the first value popped off of the stack and pushes the result onto the stack | 0x5A |
| div | Divides the second value popped off of the stack by the first value popped off of the stack and pushes the result onto the stack | 0x5B |
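The operand order matters for sub and div. This Python sketch (the `binary_op` name is hypothetical, and it only handles integers without modeling overflow) treats the first value popped as the right-hand operand, so pushing 10 and then 3 before sub leaves 7. Integer div truncates toward zero, which differs from Python's floor division for negative numbers.

```python
def binary_op(stack: list, op: str) -> None:
    """Pop two integers, apply add/sub/mul/div, and push the result."""
    right = stack.pop()  # first value popped: the one pushed last
    left = stack.pop()   # second value popped: the one pushed first
    if op == "add":
        result = left + right
    elif op == "sub":
        result = left - right
    elif op == "mul":
        result = left * right
    elif op == "div":
        # Truncate toward zero; dividing by zero raises ZeroDivisionError.
        quotient = abs(left) // abs(right)
        result = quotient if (left < 0) == (right < 0) else -quotient
    else:
        raise ValueError(f"unsupported op: {op}")
    stack.append(result)
```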

call

Calls the method identified by the instruction's argument. The call pops the method's arguments off of the stack, if it has any (plus the this value for instance methods), and pushes its return value onto the stack, if it has one.

| Instruction | Description | Binary Format |
|---|---|---|
| call <method> | Calls the <method> method | 0x28 <T> |

ret (RETurn)

Returns from a method. Uses the value on the stack as the return value. The stack should be empty except for this value. If the method doesn't return a value then the stack should be completely empty.

| Instruction | Description | Binary Format |
|---|---|---|
| ret | Returns from a method | 0x2A |
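Putting a few of these rules together, here is a toy Python interpreter for a tiny, made-up subset of the instructions above (ldc.i4, ldloc, stloc, add, ret). It is only a sketch of the evaluation-stack model, not how the runtime actually executes IL, and it enforces the ret rule by refusing to return while more than one value is on the stack.

```python
def run(instructions, local_count=0):
    """Execute (name, *args) tuples against an evaluation stack (toy model)."""
    stack = []
    locals_ = [0] * local_count
    for name, *args in instructions:
        if name == "ldc.i4":
            stack.append(args[0])               # push a constant
        elif name == "ldloc":
            stack.append(locals_[args[0]])      # push a local's value
        elif name == "stloc":
            locals_[args[0]] = stack.pop()      # pop into a local
        elif name == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif name == "ret":
            # Only the return value (or nothing) may remain on the stack.
            if len(stack) > 1:
                raise RuntimeError("stack must be empty except for the return value")
            return stack[0] if stack else None
        else:
            raise ValueError(f"unsupported instruction: {name}")
    raise RuntimeError("method ended without ret")
```

For example, `run([("ldc.i4", 2), ("ldc.i4", 3), ("add",), ("stloc", 0), ("ldloc", 0), ("ret",)], local_count=1)` returns 5.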

Next time we will see some of these instructions in action.

2016-03-05 - Wire Entropy

One of the most annoying things about dealing with computers is all the wires. They seem to just wrap themselves around each other until you end up with a tangled mess. Well, that phenomenon can be explained by entropy. Entropy is a complicated thermodynamic quantity, but it can be loosely described as a measure of how disordered a system is.

When you first arrange wires they are very neat and tidy. This is a low entropy or ordered arrangement because the specific arrangement of the parts is very important to the overall state. There are a limited number of arrangements that can be considered “neat” and “tidy”. Wires being tangled up is a high entropy or chaotic arrangement because the arrangement of the parts isn’t very important to the overall state. There are a large number of arrangements that can be considered “tangled”.

The Second Law of Thermodynamics states that the total entropy of an isolated system never decreases. For something like a pile of wires, that means its entropy only goes down when work is done on it, so wires won’t straighten themselves out unless someone or something forces them to. The problem is there’s no law saying entropy can’t increase. As you bump your desk, as you pull on the wires, as the world turns, they are being jostled about. These small forces tend to be rather random, so they can’t really be thought of as directed work. So, according to the laws of physics, these small forces are overwhelmingly more likely to tangle the wires than to straighten them out.

And that’s the universe for you.

2016-02-13 - In IL: Variables in Visual Basic .NET

Last time we looked at how variables were defined in the IL generated by compiling a C# program. Now we will do the same with a Visual Basic .NET (VB) program and see what changes. Let’s start with a VB program that does the same thing as the C# program.

Module1.vb
' Visual Studio VB projects import these namespaces by default.
Imports System
Imports System.Collections

Module Module1
    Sub Main()
        ' One local variable of each of several built-in types.
        Dim b As Boolean = True
        Dim c As Char = "c"c
        Dim f As Single = Single.MaxValue
        Dim d As Double = Double.MaxValue
        Dim sb As SByte = SByte.MaxValue
        Dim sh As Short = Short.MaxValue
        Dim i As Integer = Integer.MaxValue
        Dim l As Long = Long.MaxValue
        Dim ub As Byte = Byte.MaxValue
        Dim ush As UShort = UShort.MaxValue
        Dim ui As UInteger = UInteger.MaxValue
        Dim ul As ULong = ULong.MaxValue
        Dim dl As Decimal = Decimal.MaxValue
        Dim o As Object = New Object()
        Dim s As String = "s"
        Dim al As ArrayList = New ArrayList()

        ' Print the string representation of each variable.
        Console.WriteLine(b)
        Console.WriteLine(c)
        Console.WriteLine(o)
        Console.WriteLine(s)
        Console.WriteLine(f)
        Console.WriteLine(d)
        Console.WriteLine(sb)
        Console.WriteLine(sh)
        Console.WriteLine(i)
        Console.WriteLine(l)
        Console.WriteLine(ub)
        Console.WriteLine(ush)
        Console.WriteLine(ui)
        Console.WriteLine(ul)
        Console.WriteLine(dl)
        Console.WriteLine(al)
    End Sub
End Module

This program is functionally the same as the C# version: it declares a bunch of variables and then prints their string representations to the screen. There are a few syntax differences, though, such as how variables are declared, the lack of semicolons and curly braces, and the c suffix in "c"c that makes the literal a Char instead of a String. So let’s see what the compiled IL looks like.

Main
.method public static void Main() cil managed
{
  .entrypoint
  .custom instance void [mscorlib]System.STAThreadAttribute::.ctor() = ( 01 00 00 00 )
  // Code size 240 (0xf0)
  .maxstack 6
  .locals init ([0] bool b,
                [1] char c,
                [2] float32 f,
                [3] float64 d,
                [4] int8 sb,
                [5] int16 sh,
                [6] int32 i,
                [7] int64 l,
                [8] uint8 ub,
                [9] uint16 ush,
                [10] uint32 ui,
                [11] uint64 ul,
                [12] valuetype [mscorlib]System.Decimal dl,
                [13] object o,
                [14] string s,
                [15] class [mscorlib]System.Collections.ArrayList al)
  IL_0000: nop
  IL_0001: ldc.i4.1
  IL_0002: stloc.0
  IL_0003: ldc.i4.s 99
  IL_0005: stloc.1
  IL_0006: ldc.r4 3.4028235e+038
  IL_000b: stloc.2
  IL_000c: ldc.r8 1.7976931348623157e+308
  IL_0015: stloc.3
  IL_0016: ldc.i4.s 127
  IL_0018: stloc.s sb
  IL_001a: ldc.i4 0x7fff
  IL_001f: stloc.s sh
  IL_0021: ldc.i4 0x7fffffff
  IL_0026: stloc.s i
  IL_0028: ldc.i8 0x7fffffffffffffff
  IL_0031: stloc.s l
  IL_0033: ldc.i4 0xff
  IL_0038: stloc.s ub
  IL_003a: ldc.i4 0xffff
  IL_003f: stloc.s ush
  IL_0041: ldc.i4.m1
  IL_0042: stloc.s ui
  IL_0044: ldc.i4.m1
  IL_0045: conv.i8
  IL_0046: stloc.s ul
  IL_0048: ldloca.s dl
  IL_004a: ldc.i4.m1
  IL_004b: ldc.i4.m1
  IL_004c: ldc.i4.m1
  IL_004d: ldc.i4.0
  IL_004e: ldc.i4.0
  IL_004f: call instance void [mscorlib]System.Decimal::.ctor(int32,
                                                              int32,
                                                              int32,
                                                              bool,
                                                              uint8)
  IL_0054: newobj instance void [mscorlib]System.Object::.ctor()
  IL_0059: call object [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::GetObjectValue(object)
  IL_005e: stloc.s o
  IL_0060: ldstr "s"
  IL_0065: stloc.s s
  IL_0067: newobj instance void [mscorlib]System.Collections.ArrayList::.ctor()
  IL_006c: stloc.s al
  IL_006e: ldloc.0
  IL_006f: call void [mscorlib]System.Console::WriteLine(bool)
  IL_0074: nop
  IL_0075: ldloc.1
  IL_0076: call void [mscorlib]System.Console::WriteLine(char)
  IL_007b: nop
  IL_007c: ldloc.s o
  IL_007e: call object [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::GetObjectValue(object)
  IL_0083: call void [mscorlib]System.Console::WriteLine(object)
  IL_0088: nop
  IL_0089: ldloc.s s
  IL_008b: call void [mscorlib]System.Console::WriteLine(string)
  IL_0090: nop
  IL_0091: ldloc.2
  IL_0092: call void [mscorlib]System.Console::WriteLine(float32)
  IL_0097: nop
  IL_0098: ldloc.3
  IL_0099: call void [mscorlib]System.Console::WriteLine(float64)
  IL_009e: nop
  IL_009f: ldloc.s sb
  IL_00a1: call void [mscorlib]System.Console::WriteLine(int32)
  IL_00a6: nop
  IL_00a7: ldloc.s sh
  IL_00a9: call void [mscorlib]System.Console::WriteLine(int32)
  IL_00ae: nop
  IL_00af: ldloc.s i
  IL_00b1: call void [mscorlib]System.Console::WriteLine(int32)
  IL_00b6: nop
  IL_00b7: ldloc.s l
  IL_00b9: call void [mscorlib]System.Console::WriteLine(int64)
  IL_00be: nop
  IL_00bf: ldloc.s ub
  IL_00c1: call void [mscorlib]System.Console::WriteLine(int32)
  IL_00c6: nop
  IL_00c7: ldloc.s ush
  IL_00c9: call void [mscorlib]System.Console::WriteLine(int32)
  IL_00ce: nop
  IL_00cf: ldloc.s ui
  IL_00d1: call void [mscorlib]System.Console::WriteLine(uint32)
  IL_00d6: nop
  IL_00d7: ldloc.s ul
  IL_00d9: call void [mscorlib]System.Console::WriteLine(uint64)
  IL_00de: nop
  IL_00df: ldloc.s dl
  IL_00e1: call void [mscorlib]System.Console::WriteLine(valuetype [mscorlib]System.Decimal)
  IL_00e6: nop
  IL_00e7: ldloc.s al
  IL_00e9: call void [mscorlib]System.Console::WriteLine(object)
  IL_00ee: nop
  IL_00ef: ret
} // end of method Module1::Main

That looks almost identical to the compiled C# program from last time, apart from the added STAThreadAttribute line and the calls to RuntimeHelpers.GetObjectValue, which VB inserts when it copies an Object value. But how can that be, when these are two completely different programs in two completely different languages? Well, it’s because they aren’t completely different programs. The actual operations being performed are identical, so it makes sense that the IL generated would be nearly the same.

IL captures the semantics of the code used to generate it, not the syntax. The code to perform a specific operation could look completely different in two languages, but if both are meant to do the same thing, the IL generated will be similar. Conversely, features in different languages could look the same but work very differently, and would then generate different IL. The set of IL features is typically larger than what any single language requires, so that a wide variety of languages can target the CLI.

Next time we will learn about stacks and operations.