2015-08-09 - Unity Coroutines
I’ve been looking into the Unity game engine lately. While doing so I came across the concept of a “Coroutine”. In Unity most things are done in an update function in a class attached to an object, which gets called every frame. This works well for a lot of things but not for actions that should occur over a period of time. Coroutines give you the ability to insert delays between operations so that you can better control when updates occur. The Unity manual says that a coroutine is “a function declared with a return type of IEnumerator and with the yield return statement included somewhere in the body. […] To set a coroutine running, you need to use the StartCoroutine function:” The example they give is something similar to this.
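The snippet itself did not survive in this copy; the Unity manual’s example is a fade coroutine, so it was presumably something like this (the exact body is my reconstruction):

```csharp
// A coroutine: return type IEnumerator, with a yield return in the body.
// Fades the attached object's material out over a number of frames.
IEnumerator Fade()
{
    Renderer renderer = GetComponent<Renderer>();
    for (float f = 1f; f >= 0f; f -= 0.1f)
    {
        Color c = renderer.material.color;
        c.a = f;
        renderer.material.color = c;
        yield return null; // pause here and resume on the next frame
    }
}
```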
And the coroutine would be started like this.
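Presumably something like:

```csharp
// Note that Fade() is called here; its return value (an IEnumerator)
// is what actually gets passed to StartCoroutine.
StartCoroutine(Fade());
```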
Now I find the requirement that a coroutine has to be a function with yield returns very interesting. For one thing, the caller of a function doesn’t generally know what that function is doing internally, and for another, the coroutine function is being called and its return value passed to StartCoroutine, not the function itself. If the documentation says it then it must be correct though. Let’s see if the compiled version of the function gives us any clues as to how StartCoroutine is detecting that the called function contains yield returns.
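The IL listing from the original post is not preserved here; the compiled Fade method would look roughly like this (a reconstruction based on how C# compilers lower iterator methods; exact field names vary by compiler):

```
.method private hidebysig instance class [mscorlib]System.Collections.IEnumerator
        Fade() cil managed
{
    .maxstack 2
    ldc.i4.0
    newobj instance void Fader/'<Fade>d__0'::.ctor(int32)
    dup
    ldarg.0
    stfld class Fader Fader/'<Fade>d__0'::'<>4__this'
    ret
}
```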
This is Common Intermediate Language (CIL) code, which is what C# usually gets compiled into. It’s basically an assembly language for a virtual machine. When the program is running it starts up an instance of the virtual machine, which interprets the IL and generates actual native code. IL is useful for us because it is somewhat readable and gives us clues as to what is actually going to happen when the code is run.
Looking at the function we notice some strange things. The compiled function doesn’t have any yield returns. It doesn’t even look like the code to change colour is there. All it does is create and return a Fader/'<Fade>d__0' object. Well, let’s go look at that class.
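That class listing did not survive in this copy either, but decompiled compiler-generated iterator classes have a recognizable shape, roughly like this (member names are mangled in the real output, e.g. <>1__state; this skeleton is a reconstruction):

```csharp
// Skeleton of the compiler-generated class, decompiled back to C#.
private sealed class Fade_d0 : IEnumerator, IDisposable
{
    private int state;        // which yield return to resume after
    private object current;   // value of the most recent yield return
    public Fader self;        // the captured 'this' reference
    private float f;          // the loop variable, hoisted into a field

    public object Current { get { return current; } }
    public bool MoveNext() { /* the fade logic, rewritten as a state machine */ return false; }
    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}
```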
Wow, that’s a lot of stuff. It appears to be a class implementing the IEnumerator interface, with the colour change logic in its MoveNext() function. So where did all this code come from? The compiler put it there. It took the original function and generated a class based on the return type. The body of the function was transformed into a state-driven enumerator where the Current property gets set instead of having yield returns. So if the yield return code gets compiled into an IEnumerator class, then how can StartCoroutine require a function that contains yield returns? Well, clearly it can’t. If you took that compiled class and rewrote it in C# it would look something like this.
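The rewritten version from the original post is not preserved; it presumably looked something like this (names and details are my reconstruction):

```csharp
// The hand-written equivalent: an IEnumerator class
// with no yield returns anywhere.
class Fade : IEnumerator
{
    private Renderer renderer;
    private float f = 1f;
    private bool started = false;

    public Fade(Renderer renderer) { this.renderer = renderer; }

    public object Current { get { return null; } }

    public bool MoveNext()
    {
        if (started) f -= 0.1f;   // the loop increment
        started = true;
        if (f < 0f) return false; // the loop condition; ends the coroutine
        Color c = renderer.material.color;
        c.a = f;
        renderer.material.color = c;
        return true;              // Unity will call MoveNext again next frame
    }

    public void Reset() { f = 1f; started = false; }
}
```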
To start this coroutine you would do this.
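Presumably:

```csharp
// StartCoroutine accepts any IEnumerator, not just the result of
// calling a function that contains yield returns.
StartCoroutine(new Fade(GetComponent<Renderer>()));
```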
These two snippets are roughly equivalent to the original two.
So is the documentation wrong? Well no; the reason the documentation says a coroutine is a function with yield returns is that they made up the concept of coroutines and can define them however they wish. At the same time it isn’t being entirely honest. Clearly you can have a coroutine that isn’t a function containing yield returns, so that can’t really be said to be a requirement.
There are a few reasons I bring this up. Firstly, I believe it’s important to understand what abstractions are doing when you use them. The more you understand about what your code is doing, the easier it is to debug. The second reason is that the hand-written IEnumerator class is probably more reusable. Since you are working with a whole class there are more options for controlling how it operates. You could create a generic class that applies similar logic to many different types of things.
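As a hypothetical sketch of that idea (none of these names come from the original post), a single enumerator class could interpolate any float-valued property, with fading just one use of it:

```csharp
// A reusable coroutine that steps a value from 'from' down to zero,
// applying it through a caller-supplied delegate each frame.
class Interpolate : IEnumerator
{
    private Action<float> apply;
    private float value, step;

    public Interpolate(Action<float> apply, float from, float step)
    {
        this.apply = apply;
        this.value = from;
        this.step = step;
    }

    public object Current { get { return null; } }

    public bool MoveNext()
    {
        if (value < 0f) return false;
        apply(value);
        value -= step;
        return true;
    }

    public void Reset() { }
}

// Fading an object's alpha becomes one call site among many:
// StartCoroutine(new Interpolate(a => {
//     var c = GetComponent<Renderer>().material.color;
//     c.a = a;
//     GetComponent<Renderer>().material.color = c;
// }, 1f, 0.1f));
```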
Finally, because the compiler is generating code you need to be careful what you do with yield returns. The compiler is smart and it’s going to do its best to do what you are telling it to do, but it’s not perfect. There are probably scenarios where the compiler will refuse to compile the function, or worse, where it will compile but not do exactly what you want. Because you can’t see the code it generates you won’t be able to see exactly what it’s doing.
Also I find it fascinating to look at IL.
2015-07-25 - The Correct Date Format
Occasionally you will hear the British and the Americans fighting over what is the correct date format. The Americans think it’s MM/DD/YYYY and the British think it’s DD/MM/YYYY, but they are both wrong because the correct format is YYYY-MM-DD.
One reason is ambiguity. Without some additional context it’s hard to tell if 12/01/2012 is December 1st, 2012 or the 12th of January, 2012. This ambiguity doesn’t exist for YYYY-MM-DD because there’s no YYYY-DD-MM in common usage. This means that 2012-12-01 is always December 1st, 2012 and there’s no chance of reading the date wrong.
The second reason is sorting. When dates are sorted as text they are ordered by the first part of the date first. This means that MM/DD/YYYY sorts by month, DD/MM/YYYY sorts by day, and YYYY-MM-DD sorts by year. Ideally you want dates sorted chronologically which means it’s better to have the largest part first. YYYY-MM-DD does this while the other formats will mix dates up.
Consider the dates April 2nd 2012, April 3rd 2012, June 15th 2012, April 17th 2013, and June 2nd 2013. The following table shows these dates sorted according to the various date formats.
| YYYY-MM-DD | MM/DD/YYYY | DD/MM/YYYY |
|---|---|---|
| 2012-04-02 | 04/02/2012 | 02/04/2012 |
| 2012-04-03 | 04/03/2012 | 02/06/2013 |
| 2012-06-15 | 04/17/2013 | 03/04/2012 |
| 2013-04-17 | 06/02/2013 | 15/06/2012 |
| 2013-06-02 | 06/15/2012 | 17/04/2013 |
With the YYYY-MM-DD format everything is in order: all the 2012 dates come before the 2013 dates, and within the same year all the April dates come before the June dates. With MM/DD/YYYY we have 2012 dates both before and after the 2013 dates. With DD/MM/YYYY we have 2013 dates before 2012 dates and June dates before April dates.
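This is easy to check directly. A quick sketch (Python, sorting the formatted strings as plain text):

```python
# Sorting each format as plain text; only YYYY-MM-DD comes out chronological.
dates = [
    (2012, 4, 2), (2012, 4, 3), (2012, 6, 15),
    (2013, 4, 17), (2013, 6, 2),
]

iso = sorted(f"{y:04d}-{m:02d}-{d:02d}" for y, m, d in dates)
mdy = sorted(f"{m:02d}/{d:02d}/{y:04d}" for y, m, d in dates)
dmy = sorted(f"{d:02d}/{m:02d}/{y:04d}" for y, m, d in dates)

print(iso)  # chronological order
print(mdy)  # 06/15/2012 sorts last, after both 2013 dates
print(dmy)  # 02/06/2013 sorts second, ahead of three 2012 dates
```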
Also I personally think the dashes look better.
2015-07-04 - Hello
I’ve recently finished rewriting the code viewer to be a command line application. The main reason for doing this was to make it easier to debug and improve. While making the command line version I changed some things so now I have to port those changes back to the web. Then I can test issues in the command line version and improve both in unison.
A side benefit of the command line version is that I can easily generate formatted code snippets. That means if I wanted to show how to write “Hello” to the console in C# I could just ask the program to format that snippet and then include it in the page like this.
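The formatted snippet itself did not survive in this copy; in C# it was presumably something like:

```csharp
using System;

class Program
{
    static void Main()
    {
        Console.WriteLine("Hello");
    }
}
```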
Or if I wanted to do it in C.
Or Java.
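Presumably:

```java
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello");
    }
}
```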
Or even MS-DOS x86 assembly.
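The original listing is not preserved; a typical MS-DOS .COM version (NASM syntax, my reconstruction) would be:

```
org 0x100            ; .COM programs load at offset 0x100

mov ah, 0x09         ; DOS function 09h: print '$'-terminated string
mov dx, msg          ; DS:DX points at the string
int 0x21             ; call DOS
mov ah, 0x4C         ; DOS function 4Ch: terminate program
int 0x21

msg: db "Hello$"     ; '$' marks the end of the string for function 09h
```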
I’ve started a page to collect some small “Hello” programs. The plan is to add to it as I learn new languages.
2015-06-20 - Character Encoding: UTF-8/UTF-16/UTF-32
As Unicode expanded there was a counter movement to limit the amount of data required per character. This resulted in several Unicode Transformation Formats (UTF) that aimed to transform the fixed-width Unicode code points into a more complex format where only the least commonly used characters required the full 4 bytes.
UTF-8 encodes characters as a series of 8 bit blocks. It was developed for compatibility with ASCII. The first 128 characters (U+0000 to U+007F) are directly encoded as a single byte. Because these first 128 Unicode characters match the original 7-bit ASCII encoding, all ASCII text is automatically valid UTF-8 text. Characters beyond that range are encoded as a series of blocks with the most significant bits of each byte used to encode the sequencing. The first block has two or more 1s followed by a 0, with the number of 1s indicating the number of bytes in the sequence. Subsequent blocks have 10 as their most significant bits. The bits of the character are encoded in the remaining bits.
| Character | First Block | Second Block | Third Block | Fourth Block |
|---|---|---|---|---|
| A U+0041 | 0x41 | NA | NA | NA |
| Σ U+03A3 | 0xCE | 0xA3 | NA | NA |
| 😊 U+1F60A | 0xF0 | 0x9F | 0x98 | 0x8A |
The bytes for U+1F60A are calculated by first determining the number of bits required to represent the character. 0x1F60A is 0b11111011000001010, which has 17 bits. 3 bytes provide only 16 character bits, so 4 bytes are required. The value is padded with 0s to 21 bits and then slotted into the pattern 0b11110xxx 0b10xxxxxx 0b10xxxxxx 0b10xxxxxx.
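A quick sketch of that calculation (Python; the bit operations mirror the description above):

```python
# Manual UTF-8 encoding of U+1F60A, following the steps described above.
cp = 0x1F60A  # needs 17 bits, so a 4-byte sequence is required

# Pad to 21 bits and slot into 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
b1 = 0b11110000 | (cp >> 18)           # leading block: 4-byte marker + top 3 bits
b2 = 0b10000000 | ((cp >> 12) & 0x3F)  # continuation block, next 6 bits
b3 = 0b10000000 | ((cp >> 6) & 0x3F)   # continuation block, next 6 bits
b4 = 0b10000000 | (cp & 0x3F)          # continuation block, low 6 bits

encoded = bytes([b1, b2, b3, b4])
print(encoded.hex())                       # f09f988a
print(encoded == chr(cp).encode("utf-8"))  # True: matches Python's own encoder
```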
UTF-16 represents Unicode characters as one or two 16 bit blocks. It was developed for compatibility with existing UCS-2 implementations. All UCS-2 characters are valid UTF-16 characters and require only 2 bytes. Additional characters are encoded using surrogate pairs. If a 16 bit block has a value in the range 0xD800 to 0xDBFF it is a leading, or high, surrogate and should be followed by a trailing, or low, surrogate in the range 0xDC00 to 0xDFFF. The character value is determined by subtracting the base value from each surrogate, 0xD800 and 0xDC00 respectively, then combining the resulting values as two 10 bit chunks and adding 0x010000.
| Character | First Block | Second Block |
|---|---|---|
| A U+0041 | 0x0041 | NA |
| Σ U+03A3 | 0x03A3 | NA |
| 😊 U+1F60A | 0xD83D | 0xDE0A |
The surrogates for U+1F60A are determined by first subtracting 0x010000 from the value to get 0xF60A which, extended to 20 bits, is 0b00001111011000001010. Adding 0xDC00 to the least significant 10 bits gives 0xDE0A, the low surrogate. Adding 0xD800 to the next 10 bits gives 0xD83D, the high surrogate.
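The same calculation as a quick sketch (Python):

```python
# Manual UTF-16 surrogate-pair calculation for U+1F60A, as described above.
cp = 0x1F60A
v = cp - 0x10000            # 0xF60A, treated as a 20-bit value

high = 0xD800 + (v >> 10)   # high surrogate from the top 10 bits
low = 0xDC00 + (v & 0x3FF)  # low surrogate from the bottom 10 bits

print(hex(high), hex(low))                # 0xd83d 0xde0a
print(chr(cp).encode("utf-16-be").hex())  # d83dde0a
```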
UTF-32 represents all code points as a series of 32 bit blocks, which is enough to directly represent all current Unicode characters. UTF-32 is identical to UCS-4 but named using the transform pattern to match the other UTF encoding schemes.
UTF-8 and UTF-16 are more space efficient than UTF-32 since most characters only require 1 or 2 bytes. They are also never less efficient, as a character uses at most 4 bytes. This space saving comes at the cost of complexity: with variable-width characters it’s no longer possible to find the number of characters in a string, or the Nth character, without reading through the string. As computers have become more powerful and the transmission of data more common, this trade-off has become acceptable, and it avoids limiting the number of characters that can be represented by a single encoding.