2015-11-08 - In IL: Introduction
The Common Intermediate Language (CIL), Microsoft Intermediate Language (MSIL) or usually just Intermediate Language (IL) is the language of the virtual machine component of the Common Language Infrastructure (CLI) standard. The .NET Framework and Mono are implantations of this standard.
The virtual machine is a lot like a computer but implemented in software. The virtual machine exectures IL programs by compiling it into native code. That native code is the language of the computer that the program is running on and represents the operations that it understands. The virtual machine also handles a lot of tasks that would normally need to be done by the program itself such as allocating and freeing memory. This simplifies programming and also provides isolation between the program and the actual computer.
IL provides an intermediate step between the higher level languages and the native code of the computer. It provides classes and functions which are common in high level languages while at the same time having a syntax and structure more like native code. This makes it familiar to people approaching it from the higher level language point of view and from the native code point of view allowing both sides to easily work with it. It also separates responsibilities because the language designers only need to focus on the IL compiler while the designers of the virtual machine implementation focus on translating the IL into native code. Any changes to the IL compiler benefit all implementations it supports. Any improvements to the virtual machine benefit all the languages that target it.
In this series of posts I want to investigate how languages that target the .NET Framework (C#, Visual Basic .NET) compile into IL code. The idea is to learn a little bit about IL, how the compiler works, and how .NET programs actually run. This information should be useful when designing or debugging programs and may even lead to some ideas about how to build a compiler.
Should be fun.
2015-10-18 - Character Encoding: Conclusion
I actually don’t spend a lot of time dealing with character encodings. It comes up a little bit when dealing with files but even then it’s just a matter of selecting the right encoding. So then why did I spend the time to investigate and write about character encodings?
Partially it’s because it’s a good thing to be aware of. It’s useful to understand what selecting the encoding is doing. Also the problem of how to store non-numeric data comes up a lot in programming and character encodings serve as a good example of that problem.
That being said the main thing is how it shows that the best solution to a problem can change over time. The designers of ASCII made the decision that data savings was more important that number of characters so they limited themselves to 7 bits. As time went on this situation changed; space became cheaper while the desire for more characters grew. This lead to Extended-ASCII and Unicode with their 8 bit, 16 bit, and 32 bit characters. Then things partially flipped around again. With the internet and data being sent all over the world space savings became important again but people didn’t want to lose their extra characters. This lead to the UTF formats that went for space savings under common circumstances at the cost of added complexity.
ASCII and Extended ASCII don’t fit current needs because the limited character set and need for code pages complicates sharing information around the world. Similarly UTF-8 wouldn’t have worked for early computers and Teletype machines because the variable width characters would have made it excessively complicated to implement on the hardware of the time.
I find problems that don’t have singular answers to be the most interesting. Problems with the specifics of the situation impact the requirements and multiple solutions can work simultaneously in different situations. All of these encodings are in use today. Some more than others but the introduction of newer encodings hasn’t destroyed those that came before it. One of the goals of UTF-8 was to be compatible with ASCII for this reason.
I’d also like to point out that this is not an exhaustive list. There are a bunch of other character encodings. Even within Unicode their are other transformations and variations of transformations. These are just the most common ones that I know of and have encountered.
2015-10-04 - Seriously Fun TV
I really like TV. I watch a fair bit of it, and from all that watching I’ve found that there’s a recipe to the shows that I enjoy. The best TV shows are those that start with a large scoop of drama and then add in a few spoonfuls of comedy on top of it.
I like stories and characters. Those are the things I find interesting about TV shows. So I tend to prefer dramas because they usually have better stories and characters. The problem is that shows that are pure drama tend to be very dark and depressing. They are just about bad things happening to the main characters and their trying to deal with things before the next bad things happen. I personally don’t like shows that make me depressed.
That’s where the comedy comes in. A little bit of silliness helps to break up the drama in the story and make the characters more likable. The comedy keeps things from being too depressing. The show can’t just be a pure comedy though because then you lose the stories and characters. They just become a series of jokes and as funny as they may be they aren’t really interesting.
So my favourite shows are those that have good stories and characters but aren’t too serious.
2015-08-09 - Unity Coroutines
I’ve been looking into the Unity game engine lately. While doing so I cam across the concept of a “Coroutine”. In Unity most things are done in an update function in a class attached to an object which gets called every frame. This is good for a lot of things but not for actions that should occur over a period of time. Coroutines give the ability to insert delays between operations so that you can better control when updates occur. The Unity manual says that a coroutine is “a function declared with a return type of IEnumerator and with the yield return statement included somewhere in the body. […] To set a coroutine running, you need to use the StartCoroutine function:” The example they give is something similar to this.
And the coroutine would be started like this.
Now I find the requirement that a coroutine has to be a function with yield returns to be very interesting. For one thing the caller of a function doesn’t generally know what the function is doing and for another the coroutine is being called and the return value passed to StartCoroutine, not the function itself. If the documentation says it then it must be correct though. Let’s see if the compiled version of the function gives us any clues as to how StartCoroutine is detecting that the called function contain yield returns.
This is Common Intermediate Language (CIL) code. It is what C# usually gets compiled into. It’s basically an assembly language for a virtual machine. When the program is running it starts up an instance of the virtual machine which interprets the IL and generates actual native code. IL is useful for us because it is somewhat readable and gives us clues as to what is actually going to happen when the code is ran.
Looking at the function we notice some strange things. The compiled function doesn’t have any yield returns. It doesn’t even look like the code to change colour is there. All it does is create and return a Fader/'<Fade>d__0' object. Well let’s go look at that class.
Wow, that’s a lot of stuff. It appears to be a class implementing the IEnumerator interface with the colour change logic in its MoveNext() function. So where did all this code come from? The compiler put it in there. It took the original function and generated a class based on the return type. The body of the function got transformed into a state driven enumerator with the current property being set instead of having yield returns. So if the yield return code gets compiled into an IEnumerator class then how can StartCoroutine require a function that contains yield returns? Well clearly it can’t. If you took that compiled class and rewrote it in C# it would look something like this.
To start this coroutine you would do this.
These two snippets are roughly equivalent to the original two.
So is the documentation wrong? Well no, the reason the documentation says a coroutine is a function with yield returns is because they made up the concept of couroutines and can define them however they wish. At the same time it isn't being entirely honest though. Clearly you can have a coroutine that isn't a function containing yield returns so it can't really be said that's a requirement.
There are a few reasons I bring this up. Firstly I believe it’s important to understand what abstractions are doing when you use them. The more you understand about what your code is doing the easier it is to debug. The second reason is because the IEnumerator class case is probably more reusable. Since you are working with a whole class there are more options for controlling how it operates. You could create a generic class that could apply similar logic to many different types of things
Finally because the compiler is generating code you need to be careful what you do with yield returns. The compiler is smart and it’s going to do it’s best to do what you are telling it to do but it’s not perfect. There are probably some scenarios where the compiler will refuse to compile the function or worse it will compile but not do exactly what you want it to do. Because you can’t see the code it generates you won’t be able to see exactly what it’s doing.
Also I find it fascinating to look at IL.