2015-10-18 - Character Encoding: Conclusion
I actually don’t spend a lot of time dealing with character encodings. It comes up a little bit when dealing with files but even then it’s just a matter of selecting the right encoding. So then why did I spend the time to investigate and write about character encodings?
Partially it’s because it’s a good thing to be aware of. It’s useful to understand what selecting the encoding is doing. Also the problem of how to store non-numeric data comes up a lot in programming and character encodings serve as a good example of that problem.
That being said, the main thing is how it shows that the best solution to a problem can change over time. The designers of ASCII decided that saving space was more important than the number of characters, so they limited themselves to 7 bits. As time went on the situation changed; space became cheaper while the desire for more characters grew. This led to Extended ASCII and Unicode with their 8-bit, 16-bit, and 32-bit characters. Then things partially flipped around again. With the internet and data being sent all over the world, space savings became important again, but people didn’t want to lose their extra characters. This led to the UTF formats, which went for space savings under common circumstances at the cost of added complexity.
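To make the space trade-off concrete, here is a minimal Python sketch that counts how many bytes a few strings need in different Unicode transformation formats. The sample strings and the helper name encoded_sizes are my own illustrative choices; the codec names are the ones Python's standard library uses.

```python
# Illustrative samples, written with escapes so the code points are unambiguous.
SAMPLES = [
    "hello",                  # plain ASCII letters
    "h\u00e9llo",             # "hello" with an accented e (U+00E9)
    "\u65e5\u672c\u8a9e",     # three CJK characters
]
CODECS = ["utf-8", "utf-16-le", "utf-32-le"]


def encoded_sizes(text):
    """Return the number of bytes needed to store text in each codec."""
    return {codec: len(text.encode(codec)) for codec in CODECS}


if __name__ == "__main__":
    for text in SAMPLES:
        print(repr(text), encoded_sizes(text))
```

For ASCII-only text UTF-8 needs one byte per character, while the fixed-width 32-bit form always needs four. For the CJK sample UTF-8 is actually larger than UTF-16, which is the "under common circumstances" caveat in practice.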
ASCII and Extended ASCII don’t fit current needs because the limited character set and the need for code pages complicate sharing information around the world. Similarly, UTF-8 wouldn’t have worked for early computers and Teletype machines because the variable-width characters would have made it excessively complicated to implement on the hardware of the time.
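The code page problem can be sketched in a few lines of Python: the same byte decodes to different characters depending on which code page the reader assumes. The byte value and the two code pages (cp1252 and cp437) are just examples I picked, and decode_with is a hypothetical helper name.

```python
RAW = bytes([0xE9])  # a single byte above the 7-bit ASCII range


def decode_with(code_page):
    """Interpret the same raw byte using the given code page."""
    return RAW.decode(code_page)


if __name__ == "__main__":
    for code_page in ("cp1252", "cp437"):
        print(code_page, repr(decode_with(code_page)))
```

Unless the sender and receiver agree on the code page out of band, the receiver has no way to know which of the two characters was meant.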
I find problems that don’t have singular answers to be the most interesting: problems where the specifics of the situation shape the requirements, and where multiple solutions can work simultaneously in different situations. All of these encodings are in use today. Some are used more than others, but the introduction of newer encodings hasn’t destroyed those that came before them. One of the goals of UTF-8 was to be compatible with ASCII for this reason.
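The ASCII compatibility of UTF-8 is easy to check. This small Python sketch (the function name is hypothetical) compares the bytes produced by the two encodings for text that only uses ASCII characters, then reads ASCII bytes back as UTF-8.

```python
def same_bytes_in_ascii_and_utf8(text):
    """True when ASCII and UTF-8 encode the text to identical bytes.

    Raises UnicodeEncodeError if the text contains non-ASCII characters.
    """
    return text.encode("ascii") == text.encode("utf-8")


if __name__ == "__main__":
    print(same_bytes_in_ascii_and_utf8("Hello, world!"))
    # Bytes written by an ASCII-only program can be read back as UTF-8.
    print(b"Hello, world!".decode("utf-8"))
```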
I’d also like to point out that this is not an exhaustive list. There are a bunch of other character encodings. Even within Unicode there are other transformations and variations of transformations. These are just the most common ones that I know of and have encountered.
2015-10-04 - Seriously Fun TV
I really like TV. I watch a fair bit of it, and from all that watching I’ve found that there’s a recipe to the shows that I enjoy. The best TV shows are those that start with a large scoop of drama and then add in a few spoonfuls of comedy on top of it.
I like stories and characters. Those are the things I find interesting about TV shows. So I tend to prefer dramas because they usually have better stories and characters. The problem is that shows that are pure drama tend to be very dark and depressing. They are just about bad things happening to the main characters and their trying to deal with things before the next bad things happen. I personally don’t like shows that make me depressed.
That’s where the comedy comes in. A little bit of silliness helps to break up the drama in the story and make the characters more likable. The comedy keeps things from being too depressing. The show can’t just be a pure comedy though because then you lose the stories and characters. They just become a series of jokes and as funny as they may be they aren’t really interesting.
So my favourite shows are those that have good stories and characters but aren’t too serious.
2015-08-09 - Unity Coroutines
I’ve been looking into the Unity game engine lately. While doing so I came across the concept of a “Coroutine”. In Unity most things are done in an update function, in a class attached to an object, that gets called every frame. This is good for a lot of things but not for actions that should occur over a period of time. Coroutines give you the ability to insert delays between operations so that you can better control when updates occur. The Unity manual says that a coroutine is “a function declared with a return type of IEnumerator and with the yield return statement included somewhere in the body. […] To set a coroutine running, you need to use the StartCoroutine function:” The example they give is something similar to this.
And the coroutine would be started like this.
Now I find the requirement that a coroutine has to be a function with yield returns to be very interesting. For one thing the caller of a function doesn’t generally know what the function is doing, and for another the coroutine is being called and the return value passed to StartCoroutine, not the function itself. If the documentation says it then it must be correct though. Let’s see if the compiled version of the function gives us any clues as to how StartCoroutine is detecting that the called function contains yield returns.
This is Common Intermediate Language (CIL) code. It is what C# usually gets compiled into. It’s basically an assembly language for a virtual machine. When the program is running it starts up an instance of the virtual machine which interprets the IL and generates actual native code. IL is useful for us because it is somewhat readable and gives us clues as to what is actually going to happen when the code is run.
Looking at the function we notice some strange things. The compiled function doesn’t have any yield returns. It doesn’t even look like the code to change colour is there. All it does is create and return a Fader/'<Fade>d__0' object. Well let’s go look at that class.
Wow, that’s a lot of stuff. It appears to be a class implementing the IEnumerator interface with the colour change logic in its MoveNext() function. So where did all this code come from? The compiler put it in there. It took the original function and generated a class based on the return type. The body of the function got transformed into a state-driven enumerator, with the Current property being set instead of having yield returns. So if the yield return code gets compiled into an IEnumerator class then how can StartCoroutine require a function that contains yield returns? Well clearly it can’t. If you took that compiled class and rewrote it in C# it would look something like this.
To start this coroutine you would do this.
These two snippets are roughly equivalent to the original two.
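The same equivalence can be sketched outside Unity. The following is a rough Python analogy, not Unity or C# code: a generator function (Python's counterpart to a function with yield returns) and a hand-written iterator class produce the same sequence, and the driver consuming them cannot tell which one it was given. All of the names here (fade_generator, FadeIterator, run_coroutine) are hypothetical, and run_coroutine is only a stand-in for a scheduler that advances the routine one step per frame.

```python
def fade_generator(steps):
    """Generator version: the language builds the iterator object for us."""
    for level in range(steps - 1, -1, -1):
        yield level / steps  # hand control back until the next "frame"


class FadeIterator:
    """Hand-written version: explicit state instead of yield."""

    def __init__(self, steps):
        self._steps = steps
        self._level = steps  # the state the generator keeps implicitly

    def __iter__(self):
        return self

    def __next__(self):
        if self._level == 0:
            raise StopIteration  # nothing left to do
        self._level -= 1
        return self._level / self._steps


def run_coroutine(routine):
    """Hypothetical scheduler stand-in: one step per frame, values recorded."""
    frames = []
    for alpha in routine:
        frames.append(alpha)
    return frames


if __name__ == "__main__":
    print(run_coroutine(fade_generator(4)))
    print(run_coroutine(FadeIterator(4)))
```

The driver only relies on the iterator protocol, which is the point being made about StartCoroutine and IEnumerator: what it receives is an object, not the function that produced it.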
So is the documentation wrong? Well no, the reason the documentation says a coroutine is a function with yield returns is that they made up the concept of coroutines and can define them however they wish. At the same time it isn't being entirely honest though. Clearly you can have a coroutine that isn't a function containing yield returns, so it can't really be said that's a requirement.
There are a few reasons I bring this up. Firstly, I believe it’s important to understand what abstractions are doing when you use them. The more you understand about what your code is doing the easier it is to debug. The second reason is that the IEnumerator class case is probably more reusable. Since you are working with a whole class there are more options for controlling how it operates. You could create a generic class that could apply similar logic to many different types of things.
Finally, because the compiler is generating code you need to be careful what you do with yield returns. The compiler is smart and it’s going to do its best to do what you are telling it to do, but it’s not perfect. There are probably some scenarios where the compiler will refuse to compile the function or, worse, it will compile but not do exactly what you want it to do. Because you can’t see the code it generates you won’t be able to see exactly what it’s doing.
Also I find it fascinating to look at IL.
2015-07-25 - The Correct Date Format
Occasionally you will hear the British and the Americans fighting over what is the correct date format. The Americans think it’s MM/DD/YYYY and the British think it’s DD/MM/YYYY, but they are both wrong because the correct format is YYYY-MM-DD.
One reason is ambiguity. Without some additional context it’s hard to tell if 12/01/2012 is December 1st, 2012 or the 12th of January, 2012. This ambiguity doesn’t exist for YYYY-MM-DD because there’s no YYYY-DD-MM in common usage. This means that 2012-12-01 is always December 1st, 2012 and there’s no chance of reading the date wrong.
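The ambiguity is easy to demonstrate with a date parser. In this small Python sketch the same string is parsed under both conventions; the helper name parse_both_ways is hypothetical, while strptime and its format codes come from the standard library.

```python
from datetime import datetime

AMBIGUOUS = "12/01/2012"


def parse_both_ways(text):
    """Parse the same string as MM/DD/YYYY and as DD/MM/YYYY."""
    american = datetime.strptime(text, "%m/%d/%Y").date()
    british = datetime.strptime(text, "%d/%m/%Y").date()
    return american, british


if __name__ == "__main__":
    american, british = parse_both_ways(AMBIGUOUS)
    # isoformat() prints each result as YYYY-MM-DD, which has one reading.
    print(american.isoformat(), british.isoformat())
```

Both parses succeed, so nothing in the string itself tells a program (or a reader) which date was intended.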
The second reason is sorting. When dates are sorted as text they are ordered by the first part of the date first. This means that MM/DD/YYYY sorts by month, DD/MM/YYYY sorts by day, and YYYY-MM-DD sorts by year. Ideally you want dates sorted chronologically which means it’s better to have the largest part first. YYYY-MM-DD does this while the other formats will mix dates up.
Consider the dates April 2nd 2012, April 3rd 2012, June 15th 2012, April 17th 2013, and June 2nd 2013. The following table shows these dates sorted according to the various date formats.
YYYY-MM-DD | MM/DD/YYYY | DD/MM/YYYY |
---|---|---|
2012-04-02 | 04/02/2012 | 02/04/2012 |
2012-04-03 | 04/03/2012 | 02/06/2013 |
2012-06-15 | 04/17/2013 | 03/04/2012 |
2013-04-17 | 06/02/2013 | 15/06/2012 |
2013-06-02 | 06/15/2012 | 17/04/2013 |
With YYYY-MM-DD format everything is in order. All the 2012 dates come before the 2013 dates, and all the April dates come before the June dates within the same year. With MM/DD/YYYY we have 2012 dates before and after the 2013 dates. With DD/MM/YYYY we have 2013 dates before 2012 dates and June dates before April dates.
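The table can be reproduced with a short Python sketch that formats the five dates in each style and sorts the resulting strings as plain text. The helper names sort_as_text and is_chronological are hypothetical; the format codes are the standard strftime ones.

```python
from datetime import date, datetime

DATES = [
    date(2012, 4, 2),
    date(2012, 4, 3),
    date(2012, 6, 15),
    date(2013, 4, 17),
    date(2013, 6, 2),
]
FORMATS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "MM/DD/YYYY": "%m/%d/%Y",
    "DD/MM/YYYY": "%d/%m/%Y",
}


def sort_as_text(dates, fmt):
    """Format each date, then sort the strings character by character."""
    return sorted(d.strftime(fmt) for d in dates)


def is_chronological(strings, fmt):
    """Check whether a list of formatted dates is in calendar order."""
    parsed = [datetime.strptime(s, fmt).date() for s in strings]
    return parsed == sorted(parsed)


if __name__ == "__main__":
    for name, fmt in FORMATS.items():
        ordered = sort_as_text(DATES, fmt)
        print(name, ordered, is_chronological(ordered, fmt))
```

Only the year-first format keeps its text order and its calendar order in agreement for these dates.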
Also I personally think the dashes look better.