
2015-08-09 - Unity Coroutines

I’ve been looking into the Unity game engine lately. While doing so I came across the concept of a “Coroutine”. In Unity most things are done in an Update function in a class attached to an object, and that function gets called every frame. This is good for a lot of things but not for actions that should happen over a period of time. Coroutines let you insert delays between operations so that you can better control when updates occur. The Unity manual says that a coroutine is “a function declared with a return type of IEnumerator and with the yield return statement included somewhere in the body. […] To set a coroutine running, you need to use the StartCoroutine function:” The example they give is something similar to this.

Unity Coroutine Function
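using System.Collections;
using UnityEngine;
public class Fader : MonoBehaviour {
    // Fades this object's material from opaque towards transparent in 0.1 steps
    IEnumerator Fade() {
        for (float f = 1f; f >= 0; f -= 0.1f) {
            Color c = GetComponent<Renderer>().material.color;
            c.a = f;
            GetComponent<Renderer>().material.color = c;
            // Pause here; Unity resumes the loop after 0.1 seconds
            yield return new WaitForSeconds(.1f);
        }
    }
}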

And the coroutine would be started like this.

Start Coroutine Function
StartCoroutine(Fade());

Now I find the requirement that a coroutine has to be a function with yield returns very interesting. For one thing, the caller of a function doesn’t generally know what the function does internally; for another, it is the function’s return value that gets passed to StartCoroutine, not the function itself. If the documentation says it then it must be correct, though. Let’s see if the compiled version of the function gives us any clues as to how StartCoroutine detects that the called function contains yield returns.

Compiled Coroutine Function
.method private hidebysig
instance class [mscorlib]System.Collections.IEnumerator Fade () cil managed
{
// Method begins at RVA 0x1b89c
// Code size 20 (0x14)
.maxstack 2
.locals init (
[0] class Fader/'<Fade>d__0',
[1] class [mscorlib]System.Collections.IEnumerator
)
IL_0000: ldc.i4.0
IL_0001: newobj instance void Fader/'<Fade>d__0'::.ctor(int32)
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: ldarg.0
IL_0009: stfld class Fader Fader/'<Fade>d__0'::'<>4__this'
IL_000e: ldloc.0
IL_000f: stloc.1
IL_0010: br.s IL_0012
IL_0012: ldloc.1
IL_0013: ret
} // end of method Fader::Fade

This is Common Intermediate Language (CIL) code, which is what C# usually gets compiled into. It’s basically an assembly language for a virtual machine. When the program runs, the runtime compiles the IL into actual native code, typically just in time (JIT) as each method is first called. IL is useful for us because it is somewhat readable and gives us clues as to what is actually going to happen when the code is run.

Looking at the function, we notice some strange things. The compiled function doesn’t have any yield returns. It doesn’t even look like the code to change the colour is there. All it does is create a Fader/'<Fade>d__0' object (passing 0 to its constructor), store a reference to the Fader component (this) in the object’s '<>4__this' field, and return it. Let’s go look at that class.

Compiled Coroutine Class
.class nested private auto ansi sealed beforefieldinit '<Fade>d__0'
extends [mscorlib]System.Object
implements class [mscorlib]System.Collections.Generic.IEnumerator`1<object>,
[mscorlib]System.Collections.IEnumerator,
[mscorlib]System.IDisposable
{
.custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = (
01 00 00 00
)
// Fields
.field private object '<>2__current'
.field private int32 '<>1__state'
.field public class Fader '<>4__this'
.field public float32 '<f>5__1'
.field public valuetype [UnityEngine]UnityEngine.Color '<c>5__2'
// Methods
.method private final hidebysig newslot virtual
instance bool MoveNext () cil managed
{
.override method instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
// Method begins at RVA 0x1b770
// Code size 212 (0xd4)
.maxstack 3
.locals init (
[0] bool CS$1$0000,
[1] int32 CS$4$0001,
[2] bool CS$4$0002
)
IL_0000: ldarg.0
IL_0001: ldfld int32 Fader/'<Fade>d__0'::'<>1__state'
IL_0006: stloc.1
//int loc1 = <>1__state;
IL_0007: ldloc.1
IL_0008: switch (IL_001c, IL_0017)
//switch(loc1)
IL_0015: br.s IL_001e
IL_0017: br IL_009c
IL_001c: br.s IL_0023
IL_001e: br IL_00ce
IL_0023: ldarg.0
IL_0024: ldc.i4.m1
IL_0025: stfld int32 Fader/'<Fade>d__0'::'<>1__state'
IL_002a: nop
IL_002b: ldarg.0
IL_002c: ldc.r4 1
IL_0031: stfld float32 Fader/'<Fade>d__0'::'<f>5__1'
IL_0036: br.s IL_00b6
IL_0038: nop
IL_0039: ldarg.0
IL_003a: ldarg.0
IL_003b: ldfld class Fader Fader/'<Fade>d__0'::'<>4__this'
IL_0040: call instance !!0 [UnityEngine]UnityEngine.Component::GetComponent<class [UnityEngine]UnityEngine.Renderer>()
IL_0045: callvirt instance class [UnityEngine]UnityEngine.Material [UnityEngine]UnityEngine.Renderer::get_material()
IL_004a: callvirt instance valuetype [UnityEngine]UnityEngine.Color [UnityEngine]UnityEngine.Material::get_color()
IL_004f: stfld valuetype [UnityEngine]UnityEngine.Color Fader/'<Fade>d__0'::'<c>5__2'
IL_0054: ldarg.0
IL_0055: ldflda valuetype [UnityEngine]UnityEngine.Color Fader/'<Fade>d__0'::'<c>5__2'
IL_005a: ldarg.0
IL_005b: ldfld float32 Fader/'<Fade>d__0'::'<f>5__1'
IL_0060: stfld float32 [UnityEngine]UnityEngine.Color::a
IL_0065: ldarg.0
IL_0066: ldfld class Fader Fader/'<Fade>d__0'::'<>4__this'
IL_006b: call instance !!0 [UnityEngine]UnityEngine.Component::GetComponent<class [UnityEngine]UnityEngine.Renderer>()
IL_0070: callvirt instance class [UnityEngine]UnityEngine.Material [UnityEngine]UnityEngine.Renderer::get_material()
IL_0075: ldarg.0
IL_0076: ldfld valuetype [UnityEngine]UnityEngine.Color Fader/'<Fade>d__0'::'<c>5__2'
IL_007b: callvirt instance void [UnityEngine]UnityEngine.Material::set_color(valuetype [UnityEngine]UnityEngine.Color)
IL_0080: nop
IL_0081: ldarg.0
IL_0082: ldc.r4 0.1
IL_0087: newobj instance void [UnityEngine]UnityEngine.WaitForSeconds::.ctor(float32)
IL_008c: stfld object Fader/'<Fade>d__0'::'<>2__current'
IL_0091: ldarg.0
IL_0092: ldc.i4.1
IL_0093: stfld int32 Fader/'<Fade>d__0'::'<>1__state'
IL_0098: ldc.i4.1
IL_0099: stloc.0
IL_009a: br.s IL_00d2
IL_009c: ldarg.0
IL_009d: ldc.i4.m1
IL_009e: stfld int32 Fader/'<Fade>d__0'::'<>1__state'
IL_00a3: nop
IL_00a4: ldarg.0
IL_00a5: dup
IL_00a6: ldfld float32 Fader/'<Fade>d__0'::'<f>5__1'
IL_00ab: ldc.r4 0.1
IL_00b0: sub
IL_00b1: stfld float32 Fader/'<Fade>d__0'::'<f>5__1'
IL_00b6: ldarg.0
IL_00b7: ldfld float32 Fader/'<Fade>d__0'::'<f>5__1'
IL_00bc: ldc.r4 0.0
IL_00c1: clt.un
IL_00c3: ldc.i4.0
IL_00c4: ceq
IL_00c6: stloc.2
IL_00c7: ldloc.2
IL_00c8: brtrue IL_0038
IL_00cd: nop
IL_00ce: ldc.i4.0
IL_00cf: stloc.0
IL_00d0: br.s IL_00d2
IL_00d2: ldloc.0
IL_00d3: ret
} // end of method '<Fade>d__0'::MoveNext
.method private final hidebysig specialname newslot virtual
instance object 'System.Collections.Generic.IEnumerator<System.Object>.get_Current' () cil managed
{
.custom instance void [mscorlib]System.Diagnostics.DebuggerHiddenAttribute::.ctor() = (
01 00 00 00
)
.override method instance !0 class [mscorlib]System.Collections.Generic.IEnumerator`1<object>::get_Current()
// Method begins at RVA 0x1b850
// Code size 11 (0xb)
.maxstack 1
.locals init (
[0] object
)
IL_0000: ldarg.0
IL_0001: ldfld object Fader/'<Fade>d__0'::'<>2__current'
IL_0006: stloc.0
IL_0007: br.s IL_0009
IL_0009: ldloc.0
IL_000a: ret
//return <>2__current;
} // end of method '<Fade>d__0'::'System.Collections.Generic.IEnumerator<System.Object>.get_Current'
.method private final hidebysig newslot virtual
instance void System.Collections.IEnumerator.Reset () cil managed
{
.custom instance void [mscorlib]System.Diagnostics.DebuggerHiddenAttribute::.ctor() = (
01 00 00 00
)
.override method instance void [mscorlib]System.Collections.IEnumerator::Reset()
// Method begins at RVA 0x1b867
// Code size 6 (0x6)
.maxstack 8
IL_0000: newobj instance void [mscorlib]System.NotSupportedException::.ctor()
IL_0005: throw
//throw new NotSupportedException();
} // end of method '<Fade>d__0'::System.Collections.IEnumerator.Reset
.method private final hidebysig newslot virtual
instance void System.IDisposable.Dispose () cil managed
{
.override method instance void [mscorlib]System.IDisposable::Dispose()
// Method begins at RVA 0x1b86e
// Code size 2 (0x2)
.maxstack 8
IL_0000: nop
IL_0001: ret
} // end of method '<Fade>d__0'::System.IDisposable.Dispose
.method private final hidebysig specialname newslot virtual
instance object System.Collections.IEnumerator.get_Current () cil managed
{
.custom instance void [mscorlib]System.Diagnostics.DebuggerHiddenAttribute::.ctor() = (
01 00 00 00
)
.override method instance object [mscorlib]System.Collections.IEnumerator::get_Current()
// Method begins at RVA 0x1b874
// Code size 11 (0xb)
.maxstack 1
.locals init (
[0] object
)
IL_0000: ldarg.0
IL_0001: ldfld object Fader/'<Fade>d__0'::'<>2__current'
IL_0006: stloc.0
IL_0007: br.s IL_0009
IL_0009: ldloc.0
IL_000a: ret
} // end of method '<Fade>d__0'::System.Collections.IEnumerator.get_Current
.method public hidebysig specialname rtspecialname
instance void .ctor (
int32 '<>1__state'
) cil managed
{
.custom instance void [mscorlib]System.Diagnostics.DebuggerHiddenAttribute::.ctor() = (
01 00 00 00
)
// Method begins at RVA 0x1b88b
// Code size 14 (0xe)
.maxstack 8
IL_0000: ldarg.0
IL_0001: call instance void [mscorlib]System.Object::.ctor()
IL_0006: ldarg.0
IL_0007: ldarg.1
IL_0008: stfld int32 Fader/'<Fade>d__0'::'<>1__state'
IL_000d: ret
} // end of method '<Fade>d__0'::.ctor
// Properties
.property instance object 'System.Collections.Generic.IEnumerator<System.Object>.Current'()
{
.get instance object Fader/'<Fade>d__0'::'System.Collections.Generic.IEnumerator<System.Object>.get_Current'()
}
.property instance object System.Collections.IEnumerator.Current()
{
.get instance object Fader/'<Fade>d__0'::System.Collections.IEnumerator.get_Current()
}
} // end of class <Fade>d__0

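Wow, that’s a lot of stuff. It’s a class implementing the IEnumerator interface (along with IEnumerator<object> and IDisposable), with the colour change logic in its MoveNext() function. So where did all this code come from? The compiler put it there. Because the original function returns IEnumerator and contains yield returns, the compiler generated a class that implements that interface and moved the body of the function into it. The body got transformed into a state machine: the local variables became fields, each yield return became code that sets the current value, records the state to resume from in '<>1__state', and returns true from MoveNext(), and the switch at the top of MoveNext() uses that state to jump back to where it left off. So if the yield return code gets compiled into an IEnumerator class, how can StartCoroutine require a function that contains yield returns? Clearly it can’t; all it ever receives is an IEnumerator object. If you took that compiled class and rewrote it in C#, it would look something like this.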

Unity Coroutine Class
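using System;
using System.Collections;
using UnityEngine;
// Hand-written equivalent of the compiler-generated <Fade>d__0 class
public class FaderEnumerator : IEnumerator {
    private object _current;   // <>2__current: the value handed to Unity
    private int _state;        // <>1__state: where to resume on the next MoveNext
    public Fader _this;        // <>4__this: the component that owns the coroutine
    public float _f;           // <f>5__1: the loop variable, kept between calls
    public Color _c;           // <c>5__2: the colour local, kept between calls
    public object Current {
        get {
            return _current;
        }
    }
    public FaderEnumerator(int state)
    {
        _state = state;
    }
    public bool MoveNext()
    {
        switch (_state) {
            case 0:            // first call: run the loop initialiser
                _state = -1;
                _f = 1;
                break;
            case 1:            // resuming after a yield: run the loop step
                _state = -1;
                _f -= 0.1f;
                break;
            default:           // running or finished: nothing left to do
                return false;
        }
        if (_f >= 0)
        {
            _c = _this.GetComponent<Renderer>().material.color;
            _c.a = _f;
            _this.GetComponent<Renderer>().material.color = _c;
            _current = new WaitForSeconds(.1f);
            _state = 1;        // resume at case 1 next time
            return true;       // the equivalent of yield return
        }
        return false;          // loop ended, so the coroutine is finished
    }
    public void Reset()
    {
        throw new NotSupportedException();
    }
}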

To start this coroutine you would do this.

Start Coroutine Class
var faderEnumerator = new FaderEnumerator(0);
faderEnumerator._this = this;
StartCoroutine(faderEnumerator);

These two snippets are roughly equivalent to the original two.
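StartCoroutine only needs something it can keep calling MoveNext() on. Java has no yield return, so it is a handy place to sketch that idea with the state machine written by hand. Everything here is hypothetical: the Enumerator interface stands in for IEnumerator, WaitForSeconds for Unity's class of the same name, and runToCompletion for StartCoroutine. Unity's real scheduler lives inside the engine and resumes coroutines across frames rather than looping like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for C#'s IEnumerator: MoveNext() plus Current
interface Enumerator {
    boolean moveNext();
    Object current();
}

// Hypothetical stand-in for Unity's WaitForSeconds yield instruction
final class WaitForSeconds {
    final float seconds;
    WaitForSeconds(float seconds) { this.seconds = seconds; }
}

// Hand-written state machine shaped like FaderEnumerator. Instead of changing
// a Renderer's colour it records each alpha value it would have set.
final class FadeEnumerator implements Enumerator {
    final List<Float> alphas = new ArrayList<>();
    private int state = 0;    // 0 = not started, 1 = paused at the yield, -1 = running or done
    private float f;          // the loop variable, kept in a field between calls
    private Object current;

    public boolean moveNext() {
        switch (state) {
            case 0:  state = -1; f = 1f;    break;  // loop initialiser
            case 1:  state = -1; f -= 0.1f; break;  // loop step after resuming
            default: return false;                  // already finished
        }
        if (f >= 0) {
            alphas.add(f);                          // stands in for setting the colour
            current = new WaitForSeconds(0.1f);     // what yield return would hand back
            state = 1;                              // resume at case 1 next time
            return true;
        }
        return false;
    }

    public Object current() { return current; }
}

final class CoroutineSketch {
    // Hypothetical driver standing in for StartCoroutine. It only ever calls
    // moveNext() and current(). A real engine would wait the requested time and
    // resume on a later frame; this loop just counts the waits it was asked for.
    static int runToCompletion(Enumerator e) {
        int waits = 0;
        while (e.moveNext()) {
            if (e.current() instanceof WaitForSeconds) {
                waits++;
            }
        }
        return waits;
    }

    public static void main(String[] args) {
        FadeEnumerator fade = new FadeEnumerator();
        int waits = runToCompletion(fade);
        System.out.println(waits + " waits, alphas: " + fade.alphas);
    }
}
```

The driver never looks inside the enumerator, so it has no way to tell whether it came from a yield return function or from a hand-written class like this one.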

So is the documentation wrong? Well, no. Coroutines are a general programming concept that predates Unity, but Unity is free to define what it means by a coroutine within its own engine, and it chose to describe one as a function with yield returns. At the same time, it isn’t telling the whole story. Clearly you can have a coroutine that isn’t a function containing yield returns, so that can’t really be said to be a requirement.

There are a few reasons I bring this up. Firstly, I believe it’s important to understand what abstractions are doing when you use them: the more you understand about what your code is doing, the easier it is to debug. Second, the hand-written IEnumerator class is probably more reusable. Because you are working with a whole class, there are more options for controlling how it operates; you could create a generic class that applies similar logic to many different types of things.
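Here is a rough sketch of that kind of reusable class, written in Java, which has no yield return, so the enumerator is written by hand just like FaderEnumerator. The Enumerator interface (a stand-in for IEnumerator) and the Tween class are hypothetical names made up for this example. The setter passed in decides what actually gets changed, so one class could fade an alpha value, a volume, a scale and so on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleConsumer;

// Hypothetical stand-in for C#'s IEnumerator: MoveNext() plus Current
interface Enumerator {
    boolean moveNext();
    Object current();
}

// Hypothetical reusable enumerator: moves a value from `from` to `to` in `steps`
// equal steps, asking to wait `delay` seconds after each one. The setter decides
// what actually changes, so one class can fade alpha, volume, scale, and so on.
final class Tween implements Enumerator {
    private final double from, to, delay;
    private final int steps;
    private final DoubleConsumer setter;
    private int step = 0;     // next step to run; plays the role of the state field
    private Object current;

    Tween(double from, double to, int steps, double delay, DoubleConsumer setter) {
        if (steps < 1) throw new IllegalArgumentException("steps must be at least 1");
        this.from = from;
        this.to = to;
        this.steps = steps;
        this.delay = delay;
        this.setter = setter;
    }

    public boolean moveNext() {
        if (step > steps) return false;                   // finished
        // Each value comes from the whole-number step count, so rounding error
        // does not build up; the last step sets the target value exactly.
        double value = (step == steps) ? to : from + (to - from) * step / steps;
        setter.accept(value);
        current = delay;                                   // hypothetical "wait this long" value
        step++;
        return true;
    }

    public Object current() { return current; }

    public static void main(String[] args) {
        List<Double> alphas = new ArrayList<>();
        Tween fade = new Tween(1.0, 0.0, 10, 0.1, v -> alphas.add(v));
        while (fade.moveNext()) {
            // a real scheduler would wait fade.current() seconds here
        }
        System.out.println(alphas);
    }
}
```

Computing each value from an integer step count, rather than repeatedly subtracting 0.1, is a design choice of this sketch: it keeps floating-point error from accumulating over the loop.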

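Finally, because the compiler is generating code, you need to be careful what you do with yield returns. The compiler is smart and it’s going to do its best to do what you are telling it to do, but it’s not perfect. There are some scenarios where the compiler will refuse to compile the function: for example, C# does not allow a yield return inside a try block that has a catch clause, and iterator functions cannot have ref or out parameters. Worse, some code will compile but not do exactly what you expect. For example, none of the function body runs when you call the function; as the IL above shows, the call only creates the enumerator, and the body starts running on the first MoveNext() call. Because you can’t see the code the compiler generates without reading the IL, you won’t be able to see exactly what it’s doing.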

Also I find it fascinating to look at IL.

2015-07-25 - The Correct Date Format

Occasionally you will hear the British and the Americans fighting over what is the correct date format. The Americans think it’s MM/DD/YYYY and the British think it’s DD/MM/YYYY, but they are both wrong because the correct format is YYYY-MM-DD, the format used by the ISO 8601 standard.

One reason is ambiguity. Without some additional context it’s hard to tell if 12/01/2012 is December 1st, 2012 or the 12th of January, 2012. This ambiguity doesn’t exist for YYYY-MM-DD because there’s no YYYY-DD-MM in common usage. This means that 2012-12-01 is always December 1st, 2012 and there’s no chance of reading the date wrong.
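Java's date parser makes the ambiguity easy to see: the same string gives two different dates depending on which pattern you assume. This is a small sketch using the standard java.time classes; the DateAmbiguity class and its US and UK constants are names made up for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateAmbiguity {
    static final DateTimeFormatter US = DateTimeFormatter.ofPattern("MM/dd/yyyy");
    static final DateTimeFormatter UK = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    public static void main(String[] args) {
        String text = "12/01/2012";
        LocalDate asUs = LocalDate.parse(text, US);   // read as month first
        LocalDate asUk = LocalDate.parse(text, UK);   // read as day first
        // LocalDate prints itself as YYYY-MM-DD, which has only one reading
        System.out.println(asUs + " or " + asUk);     // 2012-12-01 or 2012-01-12
    }
}
```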

The second reason is sorting. When dates are sorted as text, they are compared character by character from the left, so they end up ordered by the first part of the date first. This means that MM/DD/YYYY sorts by month, DD/MM/YYYY sorts by day, and YYYY-MM-DD sorts by year. Ideally you want dates sorted chronologically, which means it’s better to have the largest part first. YYYY-MM-DD does this (as long as months and days are always written with two digits), while the other formats will mix dates up.

Consider the dates April 2nd 2012, April 3rd 2012, June 15th 2012, April 17th 2013, and June 2nd 2013. The following table shows these dates sorted according to the various date formats.

YYYY-MM-DD MM/DD/YYYY DD/MM/YYYY
2012-04-02 04/02/2012 02/04/2012
2012-04-03 04/03/2012 02/06/2013
2012-06-15 04/17/2013 03/04/2012
2013-04-17 06/02/2013 15/06/2012
2013-06-02 06/15/2012 17/04/2013

With YYYY-MM-DD format everything is in order: all the 2012 dates come before the 2013 dates, and all the April dates come before the June dates within the same year. With MM/DD/YYYY we have 2012 dates both before and after the 2013 dates. With DD/MM/YYYY we have 2013 dates before 2012 dates and June dates before April dates.
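The table can be reproduced by formatting the same five dates with each pattern and sorting the resulting strings as plain text. This sketch uses Java's java.time classes and Arrays.sort; the DateSorting class, its EXAMPLE array and the sortedAsText method are names made up for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;

final class DateSorting {
    // The five example dates, deliberately out of order
    static final LocalDate[] EXAMPLE = {
        LocalDate.of(2013, 6, 2), LocalDate.of(2012, 4, 2), LocalDate.of(2013, 4, 17),
        LocalDate.of(2012, 6, 15), LocalDate.of(2012, 4, 3)
    };

    // Formats every date with the pattern, then sorts the strings as plain text
    static String[] sortedAsText(LocalDate[] dates, String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
        String[] text = new String[dates.length];
        for (int i = 0; i < dates.length; i++) {
            text[i] = dates[i].format(formatter);
        }
        Arrays.sort(text);   // character-by-character comparison, like any text sort
        return text;
    }

    public static void main(String[] args) {
        for (String pattern : new String[] { "yyyy-MM-dd", "MM/dd/yyyy", "dd/MM/yyyy" }) {
            System.out.println(pattern + ": " + String.join(", ", sortedAsText(EXAMPLE, pattern)));
        }
    }
}
```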

Also I personally think the dashes look better.

2015-07-04 - Hello

I’ve recently finished rewriting the code viewer to be a command line application. The main reason for doing this was to make it easier to debug and improve. While making the command line version I changed some things so now I have to port those changes back to the web. Then I can test issues in the command line version and improve both in unison.

A side benefit of the command line version is that I can easily generate formatted code snippets. That means if I wanted to show how to write “Hello” to the console in C# I could just ask the program to format that snippet and then include it in the page like this.

Console.WriteLine("Hello");

Or if I wanted to do it in C.

printf("Hello\n");

Or Java.

System.out.println("Hello");

Or even MS-DOS x86 assembly.

mov ax, 0200h ; DOS function 02h: write the character in DL to standard output
mov dl, 48h ; H
int 021h
mov dl, 65h ; e
int 021h
mov dl, 6Ch ; l
int 021h
mov dl, 6Ch ; l
int 021h
mov dl, 6Fh ; o
int 021h
mov dl, 0Dh ; \r
int 021h
mov dl, 0Ah ; \n
int 021h

I’ve started a page to collect some small “Hello” programs. The plan is to add to it as I learn new languages.

2015-06-20 - Character Encoding: UTF-8/UTF-16/UTF-32

As Unicode expanded there was a counter movement to limit the amount of data required per character. This resulted in several Unicode Transformation Formats (UTF) that aimed to transform the fixed width Unicode characters into a more complex format where only the least commonly used characters required the full 4 bytes.

UTF-8 encodes characters as a series of 8-bit blocks. It was developed for compatibility with ASCII. The first 128 characters (U+0000 to U+007F) are directly encoded as a single byte whose most significant bit is 0. Because the first 128 Unicode characters match the original 7-bit ASCII encoding, all ASCII text is automatically valid UTF-8 text. Characters above U+007F are encoded as a series of blocks, with the most significant bits of each byte used to encode the sequencing. The first block will have two or more 1s followed by a 0, with the number of 1s indicating the number of bytes in the sequence. Subsequent blocks will have 10 as their most significant bits. The bits of the character are stored in the remaining bits.

Character  Code Point  First Block  Second Block  Third Block  Fourth Block
A          U+0041      0x41         NA            NA           NA
Σ          U+03A3      0xCE         0xA3          NA           NA
😊         U+1F60A     0xF0         0x9F          0x98         0x8A

The bytes for U+1F60A are calculated by first determining the number of bits required to represent the character. 0x1F60A is 0b11111011000001010, which has 17 bits. 3 bytes provide only 16 character bits, so 4 bytes are required. The value is padded with 0s to 21 bits (0b000011111011000001010) and then slotted into the pattern 0b11110xxx 0b10xxxxxx 0b10xxxxxx 0b10xxxxxx, giving 0b11110000 0b10011111 0b10011000 0b10001010, or 0xF0 0x9F 0x98 0x8A.
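The same bit shuffling fits in a short function. This is a sketch in Java, chosen because it can check itself against the JDK's built-in UTF-8 encoder; the Utf8Sketch class and its methods are made up for this example, and it skips validation of surrogate code points and values above U+10FFFF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8Sketch {
    // Encodes one code point using the UTF-8 bit patterns.
    // Sketch only: does not reject surrogates (U+D800 to U+DFFF) or values above U+10FFFF.
    static byte[] encode(int cp) {
        if (cp < 0x80) {                  // up to 7 bits: 0xxxxxxx
            return new byte[] { (byte) cp };
        } else if (cp < 0x800) {          // up to 11 bits: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else if (cp < 0x10000) {        // up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else {                          // up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("0x%02X ", b & 0xFF));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        for (int cp : new int[] { 0x41, 0x3A3, 0x1F60A }) {
            byte[] mine = encode(cp);
            // Compare with the JDK's own UTF-8 encoder
            byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.println(String.format("U+%04X", cp) + " -> " + hex(mine)
                + (Arrays.equals(mine, jdk) ? " (matches JDK)" : " (MISMATCH)"));
        }
    }
}
```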

UTF-16 represents Unicode characters as one or two 16-bit blocks. It was developed for compatibility with existing UCS-2 implementations. All UCS-2 characters are valid UTF-16 characters and require only 2 bytes. Additional characters are encoded using surrogate pairs. If a 16-bit block has a value in the range 0xD800 to 0xDBFF, it is a leading or high surrogate and should be followed by a trailing or low surrogate in the range 0xDC00 to 0xDFFF; together the two form a surrogate pair. The character value is determined by subtracting the base surrogate value from each block, 0xD800 and 0xDC00 respectively, then combining the resulting values as two 10-bit chunks (high chunk first) and adding 0x010000.

Character  Code Point  First Block  Second Block
A          U+0041      0x0041       NA
Σ          U+03A3      0x03A3       NA
😊         U+1F60A     0xD83D       0xDE0A

The surrogates for U+1F60A are determined by first subtracting 0x010000 from the value to get 0xF60A, which, extended to 20 bits, is 0b00001111011000001010. Adding 0xDC00 to the least significant 10 bits (0b1000001010, or 0x20A) gives 0xDE0A, which is the low surrogate. Adding 0xD800 to the next 10 bits (0b0000111101, or 0x3D) gives 0xD83D, which is the high surrogate.
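Both directions of the surrogate calculation are only a few lines each. This is a sketch in Java; the Utf16Sketch class and its method names are made up for this example, and it assumes the input to toSurrogates is already in the range U+10000 to U+10FFFF. The JDK has its own versions (Character.highSurrogate, Character.lowSurrogate and Character.toCodePoint), which make a convenient cross-check.

```java
final class Utf16Sketch {
    // Splits a supplementary code point (U+10000 to U+10FFFF) into high and low surrogates
    static char[] toSurrogates(int cp) {
        int v = cp - 0x10000;                        // 20-bit value
        char high = (char) (0xD800 + (v >> 10));     // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));    // bottom 10 bits
        return new char[] { high, low };
    }

    // Reverses it: remove the bases, join the two 10-bit chunks, add 0x10000
    static int fromSurrogates(char high, char low) {
        return (((high - 0xD800) << 10) | (low - 0xDC00)) + 0x10000;
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x1F60A);
        System.out.printf("0x%04X 0x%04X%n", (int) pair[0], (int) pair[1]);   // 0xD83D 0xDE0A
        System.out.printf("U+%X%n", fromSurrogates(pair[0], pair[1]));       // U+1F60A
    }
}
```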

UTF-32 represents each code point as a single 32-bit block, which is enough to directly represent all current Unicode characters. UTF-32 is essentially UCS-4 restricted to the Unicode range (up to U+10FFFF), named using the transformation format pattern to match the other UTF encoding schemes.

UTF-8 and UTF-16 are more space efficient than UTF-32 for most text, since the most commonly used characters need fewer than 4 bytes: 1 to 3 bytes in UTF-8 and 2 bytes in UTF-16. They are also never less efficient, as a character can use at most 4 bytes. This space saving comes at the cost of complexity. With variable-width characters, it’s no longer possible to find the number of characters in a string, or the Nth character, without reading through the string. Since computers have become more powerful and transmitting data has become more common, this trade-off is considered acceptable, and it avoids limiting the number of characters that can be represented by a single encoding.
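The trade-off shows up directly in Java, whose strings are stored as UTF-16 internally. This sketch measures the three example characters in each encoding and finds the Nth character by walking the string; the EncodingSizes class and its methods are made up for the example, and the UTF-32 size is computed as 4 bytes per code point rather than by calling an encoder.

```java
import java.nio.charset.StandardCharsets;

final class EncodingSizes {
    // Byte counts for the same text in each encoding. UTF_16BE is used so no
    // byte order mark is added; UTF-32 is always 4 bytes per code point.
    static int[] byteCounts(String s) {
        int utf8 = s.getBytes(StandardCharsets.UTF_8).length;
        int utf16 = s.getBytes(StandardCharsets.UTF_16BE).length;
        int utf32 = 4 * s.codePointCount(0, s.length());
        return new int[] { utf8, utf16, utf32 };
    }

    // Returns the Nth character (code point). With a variable-width encoding this
    // has to walk the string from the start, which offsetByCodePoints does.
    static int nthCharacter(String s, int n) {
        return s.codePointAt(s.offsetByCodePoints(0, n));
    }

    public static void main(String[] args) {
        String s = "A\u03A3\uD83D\uDE0A";   // the three example characters
        int[] counts = byteCounts(s);
        // UTF-8: 7, UTF-16: 8, UTF-32: 12
        System.out.println("UTF-8: " + counts[0] + ", UTF-16: " + counts[1] + ", UTF-32: " + counts[2]);
        // length() counts 16-bit blocks, not characters: 4 blocks, 3 characters
        System.out.println(s.length() + " blocks, " + s.codePointCount(0, s.length()) + " characters");
        System.out.printf("third character: U+%X%n", nthCharacter(s, 2));   // U+1F60A
    }
}
```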