Author Archives: Zeckul

How to avoid boxing value types in F# equality comparisons

This week I realized, and promptly reported to the Visual F# compiler team, that F#’s (=) operator boxes user-defined value types. That’s right: any value type you might want to use from a third-party library, or any value type you define for performance reasons, well, don’t compare them using (=) in F#, currently. Unless performance is of no concern to you, but then why would you use value types in the first place?

[<Struct>]
type AnyType =
    val Whatever : int // any struct will do

AnyType() = AnyType() // allocates on the GC heap!?!

I don’t want to get into how the problem happens, as that would just be guesswork; I don’t work on the F# compiler. I just know it’s there, and based on the feedback so far, it’s not going to be a quick fix.

So what can we do if we want to use value types in F#? We can’t use Equals: we’d need to cast to IEquatable<T>, which is in itself a boxing operation, and Object.Equals boxes and casts both operands. One working alternative (thanks latkin) is to define op_Equality and use the new F# 4.0 NonStructuralComparison module, but that disables structural comparisons altogether. A workaround for that problem is to alias the equality operators in NonStructuralComparison themselves, but then one has to remember to use the aliased operators, and I don’t trust anyone to remember anything, starting with myself.

Based on a later suggestion I received (thanks manofstick), this is what I would personally do:

#nowarn "86"

[<AutoOpen>]
module FastEquals =
    let inline eq<'a when 'a :> System.IEquatable<'a>> (x:'a) (y:'a) = x.Equals y    
    let inline (=) x y = eq x y
    let inline (<>) x y = not (eq x y)
    
    let inline (=@) x y = Microsoft.FSharp.Core.Operators.(=) x y
    let inline (<>@) x y = Microsoft.FSharp.Core.Operators.(<>) x y

First, we are redefining the (=) operator in terms of IEquatable<T>.Equals. This does the correct thing, not just for value types, but for classes, records and discriminated unions. There’s no overhead: the method compiles to a single x86 instruction for all basic numeric types, and yet we still get structural equality for most F# types. F# complains with warning #86, hence the #nowarn. Shut up F#, we’re fixing you up.

But of course not all types implement IEquatable<T>, and the beauty of this is that it won’t compile in that case! So you can confidently use (=) everywhere and trust the compiler not to let you misuse it. Idiomatic, correct and fast. That’s what we want. I feel good now.
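As a quick sanity check, here is a sketch of the redefined operator in action (DateTime is just a convenient BCL type that implements IEquatable<DateTime>; any such type works):

```fsharp
open System

// Assuming the FastEquals module above is open:
// this compiles because DateTime implements IEquatable<DateTime>,
// and resolves to a direct, non-boxing Equals call.
let same = DateTime(2015, 1, 1) = DateTime(2015, 1, 1) // true
```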

I’m not entirely sure it’s a good idea to expose the default F# operator, but just in case, I did it under the name “=@”, which should make its meaning obvious. The main use case for this operator would be structural collection comparison, but be aware that, again, this boxes all elements if they’re user-defined value types! Again, not behavior we’d like to accidentally run into. I’m considering getting rid of the default (=) altogether and just redefining structural equality manually for these collections as needed, perhaps using overloaded static members.

Unfortunately, you won’t be able to prevent third-party F# code from using the default operator, starting with everything in FSharp.Core: List, Array, etc. It then becomes a matter of avoiding the functions that do box value types. For example:

List.contains value // boxes value types
List.exists (fun v -> v = value) // fast if (=) is redefined as suggested above

Of course, as the compiler warning suggests and as I’ve been advised on the F# compiler thread, this might not be a good thing to do. I’m unsure as to why, but since I don’t use F# in anything large-scale yet, I may be missing some insight into unintended consequences. You might want to use another operator for non-boxing comparisons, but I really value being able to use (=) idiomatically without worrying about performance pitfalls.

So far, I do think redefining the operator as I’ve suggested is actually more consistent and less surprising than what the default does (which suffers from other bugs and inconsistencies). What do I mean by that?

More consistent: By default, (=) gives you structural equality on everything except classes, where the behavior is reference equality. This is inconsistent and surprising in F#. With (=) redefined as suggested, code where the default would silently use reference equality instead fails to compile, and that’s probably a good thing. If you really intend reference equality (which is too often a bogus and lazy way to compare), you can write Object.ReferenceEquals and the intent becomes obvious.
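To make the inconsistency concrete, here’s a minimal sketch (the class C is hypothetical) showing the default (=) silently falling back to reference equality on a plain class:

```fsharp
// With the default F# (=): a plain class without custom equality
// compares by reference, not by contents.
type C(v: int) =
    member this.V = v

let a, b = C(1), C(1)
printfn "%b" (a = b)                      // false: equal contents, different references
printfn "%b" (obj.ReferenceEquals(a, a))  // true, and at least the intent is explicit
```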

I’m only now starting to use this approach and I’ll update this post with my findings as I go along. I’m open to suggestions, refutations, let me know what you think!

Using higher-order functions to bypass lack of structural typing

Coming from a C++ background, a pet peeve of mine in C# is that the language makes it impossible to write generic functions that use non-interface instance methods; for example, the following is not legal C#:

T Foo<T>(T a, T b)
{
    return a.IsALie() ? b.Grrr(a) : a;
}

The compiler checks that this is valid for any T, and of course, that’s false because lots of Ts don’t have an IsALie() or a Grrr(T other) method. As cool and awesome as these methods sound, we just can’t assume every type implements them. Oh well. C# does support interface constraints though, so, assuming we have interfaces like these:

interface IIsALie { bool IsALie(); }
interface IGrrr<T> { T Grrr(T other); }

… and assuming all the concrete types we’ll ever want to use with this function do implement these interfaces, then the following works:

T Foo<T>(T a, T b) where T:IIsALie, IGrrr<T>
{
    return a.IsALie() ? b.Grrr(a) : a;
}

Is this enough? Well, the problem is that the second assumption is probably not true. What about type XYZ developed by ClosedSoftwareCorporation, which does have IsALie() and Grrr(T) methods, but doesn’t implement IIsALie and IGrrr? There’s no way for you to tell the compiler “Yeah it doesn’t explicitly implement these interfaces, but look, it has all the compatible methods with the right signatures, so just pretend it does, all right?”

The root of the problem is this: C# has a nominal type system, which doesn’t care what operations a type supports, only what specific interfaces it implements. In other words, while a type may be structurally compatible (i.e. it has all the right function signatures), C# doesn’t care, the type has to be nominally compatible (i.e. it has to say that it has the right function signatures – by implementing an interface).

And again, it’s not enough to implement any compatible interface (i.e. any interface that has an IsALie or Grrr method), you have to implement the one and only, specific interface that was used in the generic function. This, combined with lack of traits, induces lameness such as the above example.

C++ is an interesting case: although its type system is also nominal, templates are a blind macro system that completely ignores types, so they actually enable static duck typing. This is what makes the STL awesome, for instance.

Also interestingly, F# provides the same feature as C++ through the inline keyword. This is more restrictive than C++ templates (abusing inlining causes code bloat), but the effect is the same.
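For the curious, F#’s version of this works through statically resolved type parameters on inline functions. A minimal sketch (Quack, Duck and Robot are made-up names for illustration):

```fsharp
// 'inline' plus a member constraint: this works for ANY type that has
// a compatible Quack member, no shared interface required.
let inline quack (x: ^a) =
    (^a : (member Quack : unit -> string) x)

type Duck() =
    member this.Quack() = "quack"

type Robot() =
    member this.Quack() = "beep"

printfn "%s %s" (quack (Duck())) (quack (Robot())) // quack beep
```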

But what can we do in the general case? What if you’re stuck with C#, or F#’s inline isn’t practical in your scenario?

It recently occurred to me, and this is the reason for this blog post, that higher-order functions enable us to work around this problem. So we can turn the above function into a fully legal and generic one by:

  • Replacing methods with functions (“static methods” in OOP-lingo)
  • Having the function take the functions it uses as parameters, i.e. making it higher-order.

We end up with:

T Foo<T>(Func<T, bool> IsALie, Func<T, T, T> Grrr, T a, T b)
{
    return IsALie(a) ? Grrr(b, a) : a;
}

Now we’ve actually got a much more generic function that can work with any definition of IsALie and Grrr. However, it does have two drawbacks: it’s verbose, and it replaces two direct function calls with two delegate calls, which could be problematic in performance-critical code. That may be an acceptable price to pay in many cases though.

The good news is that, while C#’s limited support for functional programming won’t help us much there, F# has you covered.

First, F# can rid us of most of the extra verbosity through partial application. Let’s say I have “standard” implementations of “isALie” and “grrr” in a module “std” (yes, this is getting silly):

// Hypothetical std implementations, for illustration only
module std =
    let isALie a = a < 0
    let grrr b a = b + a

let foo isALie grrr a b =
    if (isALie a) then (grrr b a) else a

// Specialize foo with the std implementations by partial application
let fooStd = foo std.isALie std.grrr

// I can then call
fooStd (-3) 5 // evaluates to 2

This isn’t exactly as cool as C++ templates because I still have to declare each version of foo I want, but I can do so in just one very simple line of code. At this point the syntactic noise has been all but eliminated. (Partial application is also possible in C#, if you have a serious case of angle bracket addiction.)

Second, F# can rid us of the performance cost, because (in Release mode at least) it’ll generate specialized code for these partially applied functions that calls the relevant functions directly rather than through an indirection. This is what LINQPad gives me as optimized IL for fooStd:

fooStd@11.Invoke:
IL_0000:  nop         
IL_0001:  ldarg.1     
IL_0002:  call        Query_hiqinu+Std.isALie // look ma, direct calls!
IL_0007:  brfalse.s   IL_0011
IL_0009:  ldarg.2     
IL_000A:  ldarg.1     
IL_000B:  call        Query_hiqinu+Std.grrr
IL_0010:  ret         
IL_0011:  ldarg.1     
IL_0012:  ret 

Mind you, I’d prefer to have structural typing. Or at least traits. But, I think that with both “inline” and first-class support for higher-order functions and partial applications, we can work around the deficiencies of a strictly nominal type system in not too ugly ways most of the time. In F#. And in C#, well, gotta love those angle brackets.

Ways to get NullReferenceException in F#

Although F# largely eliminates null, there are still several ways to get a NullReferenceException using F#, some of which may be surprising. But let’s start with the obvious ones:

  • Unchecked.defaultof. This has an intentionally ugly name with “Unchecked” in it and its whole purpose is to create default values, i.e. null for reference types.
    Advice: Use “Unchecked.defaultof” as you would “unsafe” in C#: only in specific interop or performance scenarios and when you really know what you’re doing.
  • Array.zeroCreate. This creates an array where every value is Unchecked.defaultof of the specified type, akin to doing “new T[x]” in C#.
    Advice: Write the code in such a way that you can create and initialize the array at the same time, using Array.create, Array.init or expressions like array comprehensions. Only use zeroCreate when you absolutely must defer member initialization and cannot afford to fill the array with some temporary values, e.g. in performance-critical code.
  • Declaring explicit fields with val. Fields declared this way can be temporarily uninitialized and therefore null.
    Advice: As MSDN says: “Explicit fields are not intended for routine use. In general, when possible you should use a let binding in a class instead of an explicit field. Explicit fields are useful in certain interoperability scenarios, such as when you need to define a structure that will be used in a platform invoke call to a native API, or in COM interop scenarios.”
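A quick illustration of the Array.zeroCreate point (string[] chosen arbitrarily as a reference-typed example):

```fsharp
// Array.zeroCreate fills reference-typed slots with null:
let names : string[] = Array.zeroCreate 3
// names.[0].Length would throw NullReferenceException here

// Array.init forces you to produce a real value for every slot:
let safeNames = Array.init 3 (fun i -> sprintf "name%d" i)
printfn "%d" safeNames.[0].Length // 5
```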

So far, all these are language features designed for interop or high-performance scenarios where you explicitly opt in to allowing nulls. But you should also be aware of nulls that can creep into more ordinary F# code:

  • Option.Value. “Get the value of a Some option. A NullReferenceException is raised if the option is None.”
    Advice: Don’t use it! New F# programmers, particularly those coming from C#, may want to use Option as they would Nullable in C#: testing for and then accessing the value through the associated property. I’m not entirely sure why the member is visible from F#, as it should only be used by the compiler and perhaps other languages. F# gives you pattern matching, use it.
// Poor style in F#, 
// throws NullReferenceException if the conditional logic is wrong
if myOption <> None then
    doSomething(myOption.Value)

// Fool-proof
match myOption with
| Some(value) -> doSomething(value)
| None -> ()
  •  Using instance methods ToString() or GetHashCode() in generic code. A subtle point of F# is that both the Unit type and Option.None are represented as null in compiled code. It is therefore unsafe to use instance methods on generic method arguments that could be either Unit or Option.None.
    Advice: Use string and hash instead of ToString() and GetHashCode(). To illustrate:
// Throws NullReferenceException if value is () or None
let printHashCode value =
    printf "%d" (value.GetHashCode())

// Safe in all cases
let printHashCode value =
    printf "%d" (hash value)

printHashCode ();;
printHashCode None;;

3 reasons why the C# type system is broken, and how F# improves the situation

1) null

The most basic thing a type system does is ensure that any operation I do with an identifier is valid for that particular identifier. For example, if I write:

a.Length

the type system ensures that “a” has a type and that “Length” is an accessible member of that type. “a” could be a String, or an Array, for instance. If “a” was an integer, “a.Length” would simply not compile. The type system protects me from writing nonsense and that blowing up in my face at runtime (*cough* dynamic languages *cough*).

That seems simple enough, except C# (and Java, and most other OOP languages) doesn’t even get it right. C# cannot actually guarantee that any member access is valid because any variable can be null at any time, making any member access potentially invalid. “null” doesn’t have a “Length” member or, indeed, any member; it doesn’t support any operation. Attempting to do something with null at least has well-defined behavior (NullReferenceException) but still is a run-time error, i.e. something that made it past the compiler.

How does F# improve the situation? F# largely gets rid of this problem. F# types cannot be null (unless you ask for it really hard, i.e. Unchecked.defaultof<'T>). If you want to represent nullability, F# makes you encode that notion in the type system through Option types; using an Option as if it were the wrapped type is a compile-time error, and exhaustive pattern matching ensures you have to deal with the possibility that the value is absent. The flip side is that everything else is guaranteed non-null, so you get nullability where you actually need it and don’t need to worry about it in general – unless of course you’re interacting with C# APIs.
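A small sketch of how this looks in practice (formatName and middleName are invented for illustration):

```fsharp
// Nullability is opt-in and visible in the type:
let formatName (first: string) (middleName: string option) =
    match middleName with
    | Some m -> sprintf "%s %s" first m // the value is guaranteed to exist here
    | None -> first                     // the compiler forces us to handle absence

// formatName "Ada" None          => "Ada"
// formatName "Ada" (Some "King") => "Ada King"
```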

2) void

In C#, all methods return something, except for those that are void, which must be treated completely differently. As Eric Lippert explains:

The effect of a call to a void method is fundamentally different than the effect of a call to a non-void method; a non-void method always puts something on the stack, which might need to be popped off. A void method never puts something on the stack. (…) A void returning method can never, ever be treated polymorphically with a non-void-returning method because doing so corrupts the stack!

This is why, for example, it was impossible in .NET 3.5 to have a single generic delegate for all return types; Func of void would blow up the runtime! Hence the silly Action delegate, and resulting duplication of code everywhere.
In addition, void methods are generally unusable with LINQ extension methods (Select, Aggregate, etc.) because you can’t have IEnumerable of void (what the hell is a sequence of absolutely nothing anyway?). They also screw up async/await in major ways: an async void method is indistinguishable, from the caller’s perspective, from a non-async method; it cannot be awaited and cannot be composed. This has become one of C#’s major gotchas.

How does F# improve the situation? In F#, void doesn’t exist whatsoever! All void methods are turned by the compiler into methods returning the singleton unit, a type that carries no useful information but is a real, ordinary type requiring no special treatment, unlike void. So in F# you have unified delegates and no async void nonsense. This makes life simpler and eliminates much code duplication and potential mistakes.
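To see how unit removes the Action/Func split, here’s a sketch: one generic higher-order function covers both value-returning and “void” callables (applyTwice is a made-up name):

```fsharp
// A single generic function; no Action/Func duplication needed.
let applyTwice (f: 'a -> 'b) (x: 'a) : 'b =
    f x |> ignore
    f x

let n = applyTwice (fun x -> x + 1) 41           // 'b = int
let u = applyTwice (printfn "side effect %d") 41 // 'b = unit: just another type
```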

3) Exceptions

In C#, the type system says “this method evaluates to type X”; what it fails to say (aside from the fact that X might be null) is that this method may also not evaluate to anything and throw an exception instead. Exceptions got popular in OOP languages in the 1990s when the only alternative was checking for return codes, C-style. Of course the latter is maintenance and code clutter hell, and exceptions do allow us to write just the happy path in general and centralize error handling. It’s just something the type system totally ignores; the fact that your code compiles says nothing of how it accounts for exceptions; unless you are extremely disciplined, it probably blows up in various situations! Java is one of the few languages to have tried integrating exceptions in the type system: it’s a complete failure and it’s unclear how it could be improved. The designers of C# decided not to bother and just let you write nonsense.

Exceptions also assume a single-threaded, single-stack model; they just don’t mesh well with asynchronous code. Just think for yourself, what is supposed to happen if an exception is thrown by asynchronous code executing in some arbitrary thread? How is the caller notified? (Bonus points: what happens if the asynchronous method is void?) C# 5 does some intricate magic under the hood to allow you to still write something like:

try {
    await AsyncMethodThatMightThrow();
}
catch (InvalidOperationException e) {
}

and have that do what you’d expect; in fact the exception has to be handled by the execution context, stored temporarily, and re-thrown when execution resumes in the caller’s context. If the exception was contained in the return value of the method instead, you could simply inspect it and decide what to do, and you wouldn’t need cryptic (hopefully compiler-generated) boilerplate code to re-route the exception across stacks.

How does F# improve the situation? F# supports and embraces exceptions; one might argue that they’re still the default error mechanism; it’s hard to do otherwise when every .NET API throws exceptions. However, F# also has a functional background, which prefers encoding everything in return values. Indeed, it’s possible to write elegant exception-free code in F#, using the technique detailed by Scott Wlaschin in his article on railway-oriented programming. I suggest reading the whole thing, but the basic idea is this: instead of returning T and perhaps throwing an exception, functions should return something like an Option type representing Success or Failure; Success contains the useful data (the T), and Failure contains some information regarding the error (like an exception does).

Doesn’t this throw us back to C-style return value hell? Not at all, because F# makes it easy to write very general adapter functions (i.e. monads) that simply pipeline errors through regular non-error-checking functions, until they hit whichever function actually wants to deal with the errors. This gives us both the happy-path convenience of exceptions and type safety, at the expense of minimal syntactic noise. This is technically also possible in C#, but it would look unimaginably horrible due to lack of discriminated unions, pattern matching, custom operators, aforementioned void and lousy higher-order function syntax.
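A minimal, self-contained sketch of the idea (the names Success/Failure and bind follow the style of Wlaschin’s article, but the implementations here are my own, and tryDivide is an invented example):

```fsharp
type Result<'a, 'e> =
    | Success of 'a
    | Failure of 'e

// The adapter: pass a Success on through, short-circuit a Failure.
let bind f result =
    match result with
    | Success x -> f x
    | Failure e -> Failure e

let tryDivide x y =
    if y = 0 then Failure "division by zero"
    else Success (x / y)

// The happy path reads linearly; errors flow past untouched.
let compute a b c =
    tryDivide a b |> bind (fun q -> tryDivide q c)

// compute 100 5 2 => Success 10
// compute 100 0 2 => Failure "division by zero"
```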

In conclusion, when F# code compiles, it is 100% void-free, generally null-free, and as exception-free as you can make it; this means code that is consistent, does what it says and reserves fewer surprises at run-time. While C# 6 and particularly C# 7 may include more functional constructs like pattern matching or something more or less like discriminated unions, the 3 issues discussed here are fundamental to the design of the language and no amount of tacked-on features will fix them.

Sorting a linked list using only recursion

Sorting a linked list using only recursion is a common interview question and something any CS graduate should be able to do.

One good approach is to start from a known sort and clearly recursive algorithm such as QuickSort. Here’s a description of QuickSort in English: “Given a list of elements, choose one element to be the pivot; partition the list in two, i.e. the lists of elements lesser and greater than the pivot respectively; perform QuickSort on these two lists and concatenate the results.”

Here’s a description in pseudo-code:

function QuickSort elements (list -> list) :
    if elements is empty then
        return elements
    let pivot = elements.[0]
    let lesser, greater = Partition(elements, fun e -> e <= pivot)
    return Concat(QuickSort(lesser), [pivot], QuickSort(greater))

Note that the pivot can be chosen randomly, but since a linked list only gives us access to the first element directly, we’ll select the first element as the pivot.
Also note that we’re using a hypothetical Partition function that returns two lists (the elements for which the predicate returns true, and those for which it returns false), and a hypothetical Concat function that concatenates three lists together.

From this the F# translation is actually shorter than the pseudocode:

let rec sort ls =
    match ls with
    | [] -> ls
    | hd::tl ->
        let lesser, greater = List.partition ((>=) hd) tl
        List.append (sort lesser) (hd::(sort greater))

Test run in FSI:

val sort : ls:'a list -> 'a list when 'a : comparison

> sort [-1;4;2;7;3];;
val it : int list = [-1; 2; 3; 4; 7]

It’s a bit of cheating to be using pre-defined List module functions that do most of the grunt work for us, so let’s implement partition and append (without looking at the F# source code).

partition is a function that takes a list and iterates through it while building two lists that it finally returns. Since it doesn’t take these two lists as parameters, we’ll have to use an inner loop, which in recursive style makes our function look like this:

let partition ls =
    let rec loop ls ls1 ls2 =
        // build ls1 and ls2
    loop ls [] []

Here in the inner loop, “ls” hides the outer “ls” and corresponds to the list that “loop” is iterating through; “ls1” and “ls2” are the two lists it is building. So now all we have to do is implement the body of the loop, which should take the first element of the list, add it to either ls1 or ls2, and call itself recursively on the rest of the list until it is empty. Very straightforward:

let partition fn ls =
    let rec loop ls ls1 ls2 =
        match ls with
        | hd::tl when fn hd -> loop tl (hd::ls1) ls2 
        | hd::tl -> loop tl ls1 (hd::ls2)
        | [] -> ls1, ls2
    loop ls [] []

Test run in FSI, partitioning a list of integers into even and odd lists:

val partition : fn:('a -> bool) -> ls:'a list -> 'a list * 'a list

> partition (fun e -> e % 2 = 0) [1;5;3;7;2;6;4;9];;
val it : int list * int list = ([4; 6; 2], [9; 7; 3; 5; 1])

It works, however it reverses the order of elements. This happens because of the recursive nature of the algorithm: as we advance through the list, we add each element at the beginning of the new list, so the last element added becomes the first, and the list comes out reversed. Does this matter to our sorting algorithm? No, because all that matters about the lists returned by partition is that they contain the elements for which the predicate returns true and false respectively, not that these elements are in any particular order. We can now swap List.partition for our version and verify that it still works:

> let rec sort ls =
    match ls with
    | [] -> ls
    | hd::tl ->
        let lesser, greater = partition ((>=) hd) tl
        List.append (sort lesser) (hd::(sort greater));;

val sort : ls:'a list -> 'a list when 'a : comparison

> sort [3;2;55;4;7;6];;
val it : int list = [2; 3; 4; 6; 7; 55]

Ok, now to write append. This is a function that takes two lists and returns one list containing the elements of the first list followed by the elements of the second list, preserving their order. It does essentially the opposite of partition: whereas the former iterated one list to build two lists, append iterates two lists to build one; but apart from that, the principle is similar. So the function will have a similar form:

let append ls1 ls2 =
    let rec loop ls ls1 ls2 =
        // build ls by iterating through ls1 and ls2
    loop [] ls1 ls2

At this point, implementing it is quite straightforward:

let append ls1 ls2 =
    let rec loop ls ls1 ls2 =
        match ls1, ls2 with
        | hd::tl, _ -> loop (hd::ls) tl ls2
        | _, hd::tl -> loop (hd::ls) ls1 tl
        | _ -> ls
    loop [] ls1 ls2

The idea is this: if there are elements in ls1 (the pattern hd::tl, _ matches), add the first element of ls1 to the list and keep iterating with the rest of ls1 (tl) and ls2. If there are no elements left in ls1 but there still are in ls2 (the first pattern doesn’t match but the second, _, hd::tl, does), add the first element of ls2 to the list and keep iterating with the rest of ls1 (which should be empty) and ls2 (tl). If neither pattern matches, we’re done and can return the constructed list.

As we test this, however, we notice a problem:

append [1;2;3] [5;6;7];;
val it : int list = [7; 6; 5; 3; 2; 1]

It does what it should, except the order of elements is reversed. Again, this is due to the recursive nature of the algorithm: as we advance through both lists, we add the elements we encounter to the beginning of the list, so the last element we encounter is the last to be added and hence ends up first in the constructed list.

At this point it becomes apparent we need a reverse function. F# provides one, List.rev, but of course we won’t use it and will implement it ourselves. It turns out that implementing reverse was the trickiest part for me to understand and the reason I’m writing this post. It would be possible to write reverse by fetching the nth element of the list for n = [length – 1 to 0], but that would run in O(n^2), which is completely unacceptable. So I started writing some code without really thinking and came up with this:

let rec badReverse ls =
    match ls with
    | hd::tl -> hd::(badReverse tl)
    | [] -> ls

But that just returns the original list! At this point I was very confused. In our implementations of partition and append, simply building a list by recursively appending each element to the beginning effectively reversed the list, because as we went through the list, each element was added to the beginning of the list, so the last element encountered became the new beginning of the list and so on. Aren’t we doing the same thing here?

And the answer is yes and no: yes, it’s the same iteration mechanism, but instead of adding the element to the argument we pass to the next iteration, we’re adding it to the return value, so effectively we’ve reversed the order of operations. In the previous examples, we started with an empty list and iterated through the original list, adding each element to that new list recursively.

So it appears that’s what we need to do here: start with an empty list and build it by prepending the first element of the original list onto the accumulator passed to the next loop iteration.

let reverse ls =
    let rec loop ls acc =
        match ls with
        | hd::tl -> loop tl (hd::acc)
        | [] -> acc
    loop ls []

And this works. If we have access to the very general List.fold, we could also just do:

> List.fold (fun list x -> x::list) [] [1;2;3];;
val it : int list = [3; 2; 1]

Putting it all together, our append function can now return elements in the correct order by reversing its result:

let append ls1 ls2 =
    let rec loop ls ls1 ls2 =
        match ls1, ls2 with
        | hd::tl, _ -> loop (hd::ls) tl ls2
        | _, hd::tl -> loop (hd::ls) ls1 tl
        | _ -> ls
    loop [] ls1 ls2 |> reverse

And now we could go and replace List.append with our own, and have our fully hand-crafted recursion-only algorithm working! But something still feels very wrong about that append function. We’re appending by adding elements in reverse order and then reversing the result. Couldn’t we just, you know, add the elements in the right order in the first place?

Hey, maybe that dumb badReverse code might be useful! It preserves order, does it not? All we need to do is have it iterate two lists instead of one and call it “append” instead:

let rec append ls1 ls2 =
    match ls1, ls2 with
    | hd::tl, _ -> hd::(append tl ls2)
    | _, hd::tl -> hd::(append ls1 tl)
    | _ -> []

… aaaand this works:

val append : ls1:'a list -> ls2:'a list -> 'a list

> append [1;2;3] [4;5;6];;
val it : int list = [1; 2; 3; 4; 5; 6]

Aaaah much better. More efficient, and we can get rid of the reverse function. So our full listing becomes:

let partition fn ls =
    let rec loop ls ls1 ls2 =
        match ls with
        | hd::tl when fn hd -> loop tl (hd::ls1) ls2 
        | hd::tl -> loop tl ls1 (hd::ls2)
        | [] -> ls1, ls2
    loop ls [] []

let rec append ls1 ls2 =
    match ls1, ls2 with
    | hd::tl, _ -> hd::(append tl ls2)
    | _, hd::tl -> hd::(append ls1 tl)
    | _ -> []

let rec sort ls =
    match ls with
    | [] -> ls
    | hd::tl ->
        let lesser, greater = partition ((>) hd) tl
        append (sort lesser) (hd::(sort greater))

Not a single library call in there, doesn’t this feel great? At this point we could even fix partition so that it preserves ordering, but I’ll leave that as an exercise to the reader.

In “fixing” our implementation to avoid calling reverse on a reversed list, however, we’ve lost one very useful property: tail recursion. Indeed, whereas our original append uses an inner tail-recursive loop, our new append performs an operation on the return value (prepending an element) and so cannot be tail-recursive; it will therefore blow the stack on large enough lists. Is there a way we could get both the simplicity and efficiency of our order-preserving append while keeping it tail-recursive?
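One possible answer, not from the original post: rewrite the inner loop in continuation-passing style. Every recursive call and every continuation invocation is in tail position, yet the elements come out in order with no final reverse:

```fsharp
// append via continuation-passing style: tail-recursive AND order-preserving.
let appendCps ls1 ls2 =
    let rec loop ls cont =
        match ls with
        | hd::tl -> loop tl (fun acc -> cont (hd::acc)) // defer the cons
        | [] -> cont ls2                                // unwind: conses run in order
    loop ls1 id

// appendCps [1;2;3] [4;5;6] => [1; 2; 3; 4; 5; 6]
```

The stack frames of the naive version are traded for closures on the heap, which is the usual price of this transformation.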

C#: Why use public readonly fields instead of properties?

The common advice in OOP is to never expose fields publicly, for the simple reason that it allows external code to manipulate the implementation of your class without your class ever knowing about it. You cannot make assumptions about data you do not control.

However, what about readonly fields? They can’t be manipulated and their value never changes, so while they reveal an implementation detail, if that implementation is unlikely to change, or if the type is not visible to much code, then you’re not actually giving up anything important. I find myself writing many small “record”-type classes these days, and I favor the following form:

class MyRecord {
    public readonly T1 Field1;
    public readonly T2 Field2;
}

Then I ask ReSharper to generate the constructor for me and that’s that. Automatic properties cannot be truly readonly in C# 5, and writing full-blown properties with readonly backing fields is a lot of noise for little benefit. Although C# 6 is set to fix that particular issue, fields have another advantage over properties: they’re always visible in the debugger. For the debugger to show properties, it has to evaluate them, i.e. run code; and for whatever reason it might not be able to. This is when you get the dreaded “Function evaluation timed out” instead of useful information. I get this all the time in the project I’m currently working on.

Finally, while the JIT is reportedly very good at inlining properties when it optimizes, the vast majority of the time, as a developer, you’re running a debug build, and more often than not have a debugger attached to it. This means that using properties everywhere increases the performance delta between your debug and release builds.

Simple maze generation in F#

Here’s an F# implementation of the depth-first search maze generation algorithm:

open System
open System.Windows
open System.Windows.Controls
open System.Windows.Media
open System.Windows.Shapes
open System.Windows.Threading

let width, height = 40, 20
let random = Random()

let tiles = Array.init height (fun _ -> Array.zeroCreate<bool> width)

type Tree =
    | Leaf of int * int
    | Node of int * int * Tree list

let rec buildMaze (x, y) =
    let shuffleArray array =
        let n = Array.length array
        for x = 1 to n do
            let i = n - x
            let j = random.Next(i + 1) // inclusive upper bound; Next(i) would bias the shuffle
            let tmp = array.[i]
            array.[i] <- array.[j]
            array.[j] <- tmp
        array

    let neighbors = 
        [| x - 1, y; 
           x + 1, y; 
           x, y - 1; 
           x, y + 1 |]
        |> shuffleArray
        |> Array.filter(fun (x, y) -> x >= 0 && x < width && 
                                      y >= 0 && y < height)
    let visited = [ for x, y in neighbors do
                        if not tiles.[y].[x] then
                            tiles.[y].[x] <- true
                            yield buildMaze (x, y) ]
    if List.length visited > 0 then 
        Node(x, y, visited) 
    else 
        Leaf(x, y)

let maze = buildMaze (0, 0)

let window = Window()
window.Title <- "Maze generation"
let grid = Grid()
grid.Background <- SolidColorBrush(Colors.Black)
window.Content <- grid

for i = 0 to width * 2  do
    let column = ColumnDefinition()
    column.Width <- GridLength(9.6, GridUnitType.Pixel)
    grid.ColumnDefinitions.Add(column)
for i = 0 to height * 2 do
    let row = RowDefinition()
    row.Height <- GridLength(9.6, GridUnitType.Pixel)
    grid.RowDefinitions.Add(row)

let rec drawMaze prevx prevy node =
    let addTile (x, y) =
        let tile = Rectangle()
        tile.Width <- 9.6
        tile.Height <- 9.6
        tile.Fill <- SolidColorBrush(Colors.White)
        grid.Children.Add(tile) |> ignore
        Grid.SetColumn(tile, x)
        Grid.SetRow(tile, y)

    let colorGrid x y =
        let finalX, finalY = x * 2, y * 2
        let interX = finalX + prevx - x
        let interY = finalY + prevy - y
        addTile (finalX, finalY)
        if (interX <> finalX || interY <> finalY) then
            addTile (interX, interY)

    match node with
    | Node(x, y, children) ->
        colorGrid x y
        List.iter (drawMaze x y) children
    | Leaf(x, y) -> 
        colorGrid x y

drawMaze 0 0 maze


[<STAThread()>]
do
    let app = Application()
    app.Run(window) |> ignore

Less than 100 lines of code for a working WPF application with no XAML and some non-trivial logic is pretty sweet! F# has become my default language for coding on .NET; it’s a far simpler, more concise and more cohesive language than C#, and it leads to better code with less potential for error.

This implementation is subject to StackOverflowExceptions with larger mazes. While implementing continuation-passing style manually to make the buildMaze function tail-recursive should be feasible, it’s not the kind of thing I actually like to do; I find it very hard to think in those terms. At this point I’m not sure how to avoid potential stack overflows without falling back to an iterative rather than recursive implementation; perhaps one of the functions in the List or Seq modules might do the trick?
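To make the continuation-passing idea concrete, here is a hypothetical sketch of mine (not code from the post) applying CPS to the Tree type, repeated here so the sketch is self-contained. It merely counts nodes rather than building the maze, but the recursion pattern over a list of children is the same one buildMaze would need.

```fsharp
type Tree =
    | Leaf of int * int
    | Node of int * int * Tree list

// Hypothetical CPS sketch: recursion over the children is threaded through
// continuations, so every call is a tail call and stack depth stays constant
// (the continuations are allocated on the heap instead).
let countNodes tree =
    let rec count node k =
        match node with
        | Leaf _ -> k 1
        | Node (_, _, children) -> countList children 0 k
    and countList nodes acc k =
        match nodes with
        | [] -> k (acc + 1) // +1 for the Node that owned this child list
        | n :: rest -> count n (fun c -> countList rest (acc + c) k)
    count tree id
```

Adapting this shape to buildMaze is exactly the part the post finds hard to think in, so take it as a starting point only.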

Drawing the maze is a bit tricky: I use a grid with twice the size of the actual graph so that there are black gaps between paths, otherwise the drawing would be entirely white. I just need to keep track of which square is connected to which to add the proper intermediary squares (see colorGrid function).

Like any WPF application, this code requires WindowsBase.dll, PresentationCore.dll, PresentationFramework.dll and System.Xaml.dll to build.

The fastest sum in F#

In this post I will explore the question of how to compute the sum of a collection in F# as fast as possible; that is, as close to optimised C++ as possible. This should give us some insight into how the F# compiler and the JIT optimize code.

So first let’s look at C++ in Visual Studio 2012. I’ve enabled optimizations, but not vector instructions (SSE) for the sake of comparison with the CLR JIT, which never vectorizes anything. That is to say, C++ can generate much more sophisticated assembly than this, but this is a good starting point from our perspective.

#include <array>
#include <iostream>

using namespace std;

template<size_t size>
int sum(array<int, size>& arr) {
	int total = 0;
	for (size_t i = 0; i < arr.size(); ++i) {
		total += arr[i];
	}
	return total;
}

int main() {
	array<int, 1000> arr = {};
	auto total = sum(arr);
	cout << total;
}

Of course the compiler inlines the function, but that doesn’t change the loop. It is compiled to the following assembly (comments mine):

001A129E  xor ecx,ecx                     // sum = 0
001A12A0  xor eax,eax                     // i = 0
001A12A2  xor edx,edx                     // sum2 = 0

001A12A4  add ecx,dword ptr [esp+eax*4]   // sum += arr[i]
001A12A7  add edx,dword ptr [esp+eax*4+4] // sum2 += arr[i + 1]
001A12AB  add eax,2                       // i += 2

001A12AE  cmp eax,3E8h                    // if i < 1000
001A12B3  jb main+34h (01A12A4h)          // goto 001A12A4  
 
001A12B5  add ecx,edx                      // sum = sum + sum2

Here the C++ compiler has decided to unroll the loop once, maintaining two running totals, and adding the two at the end. This is an interesting optimisation but it’s not very important; a strictly minimal loop could be reduced to 4 instructions:

add ecx,dword ptr [esp+eax*4] // sum += arr[i]
inc eax                      // i++
cmp eax,3E8h                 // if i < 1000
jb 1                         // goto 1
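Incidentally, the two-accumulator unrolling the C++ compiler applied can be reproduced by hand. Here is a hypothetical F# sketch of mine (sumUnrolled is my own name, and I make no claim about what assembly the JIT emits for it):

```fsharp
// Manual unroll-by-two with two running totals, mirroring the C++ codegen above.
// The two accumulators break the dependency chain between consecutive adds.
let sumUnrolled (arr: int[]) =
    let mutable s1 = 0
    let mutable s2 = 0
    let mutable i = 0
    while i + 1 < arr.Length do
        s1 <- s1 + arr.[i]
        s2 <- s2 + arr.[i + 1]
        i <- i + 2
    if i < arr.Length then s1 <- s1 + arr.[i] // leftover element for odd lengths
    s1 + s2
```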

Since the JIT compiler doesn’t do much in terms of advanced optimisations, this minimal loop is what we’d expect if it simply did its job correctly. Let’s see what we can get out of F#. We’ll compare four approaches: “functional” using Array.fold, “succinct” using Array.sum, “imperative” using a for loop like we did in C++, and finally the “for .. in” loop (foreach in C#).

Note that in order to see the optimized JIT assembly, you need to compile in Release mode and uncheck “Suppress JIT optimization on module load” and “Enable Just My Code” in the debugger options. This is in 32-bit mode as well.

[<EntryPoint>]
let main argv = 

    let arr = Array.zeroCreate 100

    let sum1 = Array.fold(fun acc elem -> acc + elem) 0 arr

    let sum2 = Array.sum arr

    let mutable sum3 = 0
    for i in 0 .. arr.Length - 1 do
        sum3 <- sum3 + arr.[i]
    
    let mutable sum4 = 0
    for elem in arr do
        sum4 <- sum4 + elem
    
    printfn "%d" sum1
    printfn "%d" sum2
    printfn "%d" sum3
    printfn "%d" sum4
    0

One might point out that since the array contains only zeroes, the compiler could optimize all sums away and simply print out zeroes, but fortunately for our study, it’s not smart enough to do that.

Let’s start with Array.fold. I won’t post all the assembly code because there’s quite a long prologue and epilogue which don’t really matter. The inner function is compiled to this:

00000000  mov         eax,edx 
00000002  mov         edx,dword ptr [esp+4] 
00000006  add         eax,edx 
00000008  ret         4 

This is quite good; it simply loads the next element of the array, adds it to the running total, and returns it.

The inner loop of the “fold” function is compiled to this:

0000002c  cmp         esi,ebx 
0000002e  jae         00000081 
00000030  push        dword ptr [edi+esi*4+8] 
00000034  mov         ecx,dword ptr [ebp-14h] 
00000037  mov         eax,dword ptr [ecx] 
00000039  mov         eax,dword ptr [eax+28h] 
0000003c  call        dword ptr [eax+14h] 
0000003f  mov         edx,eax 
00000041  inc         esi 
00000042  mov         eax,dword ptr [ebp-10h] 
00000045  inc         eax 
00000046  cmp         esi,eax 
00000048  jne         0000002C

I’m not going over this in detail; it basically just loads the arguments to the function into registers, calls the function, increments its counter, advances the position in the array, and loops until it hits the end of the array. This is not bad assembly code at all; it’s about as terse and efficient as you could expect a straightforward implementation of fold to be. That said, it’s definitely a lot of code compared to what we actually need, and it’s somewhat disappointing that such a short lambda function isn’t inlined, with further optimisations applied afterwards.

Let’s look at Array.sum now:

00000040  add         ebx,dword ptr [edi+esi*4+8] 
00000044  jo          000002DB 
0000004a  inc         esi 
0000004b  cmp         edx,esi 
0000004d  jg          00000040 

Whoa, now we’re talking. This is essentially our “minimal” 4-instruction loop, with an additional “jo” instruction after the “add”, which checks for overflow. If we force a jump to this 000002DB address, an OverflowException is thrown by the runtime. This makes a lot of sense; a sum of an arbitrary number of integers could easily overflow, and wrapping around silently is probably not what you want in general.

The imperative loop is compiled to this:

0000005a  add         ebx,dword ptr [edi+esi*4+8] 
0000005e  inc         esi 
0000005f  cmp         edx,esi 
00000061  jg          0000005A 

Nice! This is as efficient as we’d hope the JIT to be. No overflow check is performed, so on overflow this will silently give incorrect results. And if you’re really concerned about performance, your array is probably large; if your array is large, overflow is quite likely. So in general I think Array.sum makes more sense than trying to bypass the overflow check to save one instruction, but it’s nice to see that you can avoid it if you really want to.
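The behavioural difference is easy to demonstrate. A small sketch of mine, following directly from the checked add in Array.sum’s codegen versus the unchecked add in the imperative loop:

```fsharp
let arr = [| System.Int32.MaxValue; 1 |]

// The unchecked imperative loop wraps around silently:
let mutable total = 0
for i in 0 .. arr.Length - 1 do
    total <- total + arr.[i]
// total is now Int32.MinValue: a silently wrong result.

// Array.sum arr, by contrast, raises System.OverflowException here,
// as the "jo" instruction in its disassembly implies.
```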

Finally let’s take a look at the for .. in loop:

0000006b  mov         eax,dword ptr [edi+ecx*4+8] 
0000006f  add         esi,eax 
00000071  inc         ecx 
00000072  cmp         edx,ecx 
00000074  jg          0000006B 

This is essentially the same thing, except that the “fetch array element and add to running total” operation is done in two steps instead of one, adding an unnecessary instruction. This is a bit disappointing but it certainly doesn’t change much in terms of performance.

In conclusion, we have seen that the fastest sum in F# is simply the for [index] in 0 .. [length - 1] loop, but that Array.sum gives us overflow checking and thus makes a lot more sense in general. The for .. in loop is ever so slightly less efficient, and Array.fold incurs substantial overhead and should be reserved for more complex computations. We’ve also seen that the C++ compiler can do interesting optimisations beyond what the JIT is capable of, but that the JIT can nonetheless generate quite good, if unsophisticated, assembly.

Re: 8 Most common mistakes C# developers make

A recent blog article by Pawel Bejger lists some common mistakes made by C# developers. As noted by several in the comments, some of this information can be misleading or incorrect. Let’s review the problematic points:

String concatenation instead of StringBuilder

The author argues that using string concatenation is inefficient compared to using StringBuilder. While this is true in the general case, it must be taken with a grain of salt. If you just want to concatenate a list of strings, as in the example presented, the easiest and fastest way is String.Concat(), or String.Join() if you need to insert a separator between each pair of strings.

In addition, compile-time concatenations are automatically translated by the compiler into the appropriate calls to String.Concat(). Therefore, the use of StringBuilder should be reserved to building complex strings at runtime; don’t go blindly replacing every string concatenation code with StringBuilders.
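For example (F# syntax here, but these are the same BCL calls available from C#):

```fsharp
open System

let parts = [| "a"; "b"; "c" |]

// Glue strings together with nothing in between:
let glued = String.Concat(parts)       // "abc"

// Or with a separator between each pair:
let joined = String.Join(", ", parts)  // "a, b, c"
```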

Casting with “(T)” instead of “as (T)”

The author argues that if there’s the slightest probability a cast could fail, “as (T)” should be used instead of the regular cast. This is a very common myth and is misleading advice. First, let’s review what each cast does:

  • “(T)” works with both value and reference types. It casts the object to T, and throws an InvalidCastException if the cast isn’t valid.
  • “as (T)” works only with reference types. It returns the object casted to T if it succeeds, and null if the cast isn’t valid.

Each cast expresses a different intent. “(T)” means that you fully expect this cast to succeed, and that if it doesn’t, this is an error in the code. This is the simple and general case. “as (T)” means that you fully expect this cast NOT to succeed at least some of the time, that this is a normal occurrence, and that you will take care of manually handling it via a null check after.

The real mistake I often see is “as (T)” not followed by a null check. The developer fully expects the cast to succeed so doesn’t bother to write the null check, but the day something goes awry, no exception is thrown on the invalid cast, no null check is performed, and a hard to track bug has found its way into the code base. So my advice is to always use the regular cast “(T)” unless you intend to check yourself for the invalid cast via “as (T)” and a null check.
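As an aside, F# makes the “expected not to succeed” case hard to forget: a type-test pattern forces you to write the non-matching branch, or the compiler warns about incomplete matches. A small illustration of my own, not from the article:

```fsharp
// The :? pattern is the F# analogue of C#'s "as" cast followed by a null
// check, with the fallback case made explicit in the match.
let describe (o: obj) =
    match o with
    | :? string as s -> sprintf "a string of length %d" s.Length
    | _ -> "not a string"
```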

Using “foreach” instead of “for” for anything else than collections

The author claims that when iterating over anything that is not a collection (e.g. an array), the “for” loop is much more efficient than the “foreach” loop.

As with any claims regarding performance, this is easily verified. In this case, the verification proves the author wrong.

As the page he links to claims (a page which is, incidentally, very outdated), the generated IL for a “foreach” loop is slightly larger than that for a “for” loop. This is no indication of its performance, however: the speed of code doesn’t correlate with its size, and IL isn’t executed directly but compiled just-in-time to assembly code, where further optimizations may happen. A very simple test computing the sum of an array of ints shows that the machine code generated for a foreach loop is in fact shorter, and the execution time also shorter, than for the for loop. Results on my machine (x86 release mode with optimisations, no debugger attached):

SumForEach: 482 ms

SumFor: 503 ms

As you can see, the difference is in fact very small, so I would advise that in general you use the loop that is most idiomatic or readable, and not worry too much about the performance difference. But if you’re writing performance-sensitive code, don’t rely on someone else’s benchmark: test things out yourself. You may find that code that is visually more compact or more C-like doesn’t necessarily perform any better.