Tag Archives: F#

Ways to get NullReferenceException in F#

Although F# largely eliminates null, there are still several ways to get a NullReferenceException using F#, some of which may be surprising. But let’s start with the obvious ones:

  • Unchecked.defaultof. This has an intentionally ugly name with “Unchecked” in it and its whole purpose is to create default values, i.e. null for reference types.
    Advice: Use “Unchecked.defaultof” as you would “unsafe” in C#: only in specific interop or performance scenarios and when you really know what you’re doing.
  • Array.zeroCreate. This creates an array where every value is Unchecked.defaultof of the specified type, akin to doing “new T[x]” in C#.
    Advice: Write the code in such a way that you can create and initialize the array at the same time, using Array.create, Array.init or expressions like array comprehensions. Only use zeroCreate when you absolutely must defer member initialization and cannot afford to fill the array with some temporary values, e.g. in performance-critical code.
  • Declaring explicit fields with val. Fields declared in this can be temporarily uninitialized and therefore null.
    Advice: As MSDN says: “Explicit fields are not intended for routine use. In general, when possible you should use a let binding in a class instead of an explicit field. Explicit fields are useful in certain interoperability scenarios, such as when you need to define a structure that will be used in a platform invoke call to a native API, or in COM interop scenarios.”

So far all these are language features designed for interop or high-performance scenarios where you explicitely opt-in to allowing nulls. But you should also be aware of nulls that can creep up in more ordinary F# code:

  • Option.Value. “Get the value of a Some option. A NullReferenceException is raised if the option is None.”
    Advice: Don’t use it! New F# programmers, particularly coming C#, may want to use Option as they would Nullable in C#: testing for and then accessing the value using the associated property. I’m not entirely sure why the member is visible from F#, as it should only be used by the compiler and perhaps other languages. F# gives you pattern matching, use it.
// Poor style in F#, 
// throws NullReferenceException if the conditional logic is wrong
if myOption <> None then
    doSomething(myOption.Value)

// Fool-proof
match myOption with
| Some(value) -> doSomething(value)
| None -> ()
  •  Using instance methods ToString() or GetHashCode() in generic code. A subtle point of F# is that both the Unit type and Option.None are represented as null in compiled code. It is therefore unsafe to use instance methods on generic method arguments that could be either Unit or Option.None.
    Advice: Use string and hash instead of ToString() and GetHashCode(). To illustrate:
// Throws NullReferenceException if value is () or None
let printHashCode value =
    printf "%d" (value.GetHashCode())

// Safe in all cases
let printHashCode value =
    printf "%d" (hash value)

printHashCode ();;
printHashCode None;;
Advertisements

3 reasons why the C# type system is broken, and how F# improves the situation

1) null

The most basic thing a type system does is ensure that any operation I do with an identifier is valid for that particular identifier. For example, if I write:

a.Length

the type system ensures that “a” has a type and that “Length” is an accessible member of that type. “a” could be a String, or an Array, for instance. If “a” was an integer, “a.Length” would simply not compile. The type system protects me from writing nonsense and that blowing up in my face at runtime (*cough* dynamic languages *cough*).

That seems simple enough, except C# (and Java, and most other OOP languages) doesn’t even get it right. C# cannot actually guarantee that any member access is valid because any variable can be null at any time, making any member access potentially invalid. “null” doesn’t have a “Length” member or, indeed, any member; it doesn’t support any operation. Attempting to do something with null at least has well-defined behavior (NullReferenceException) but still is a run-time error, i.e. something that made it past the compiler.

How does F# improve the situation? F# largely gets rid of this problem. F# types cannot be null (unless you ask for it really hard, i.e. Unchecked.defaultof<‘T>). If you want to represent nullability, F# makes you encode that notion in the type system through Option types; using an Option as if it was the wrapped type is a compile-time error, and exhaustive pattern matching ensures you have to deal with the possibility that the value is absent. The fun counterpart is that everything else is guaranteed non-null, so you get nullability where you actually need it and don’t need to worry about it in general – unless of course you’re interacting with C# APIs.

2) void

In C#, all methods return something, except for those that are void, which must be treated completely differently. As Eric Lippert explains:

The effect of a call to a void method is fundamentally different than the effect of a call to a non-void method; a non-void method always puts something on the stack, which might need to be popped off. A void method never puts something on the stack. (…) A void returning method can never, ever be treated polymorphically with a non-void-returning method because doing so corrupts the stack!

This is why, for example, it was impossible in .NET 3.5 to have a single generic delegate for all return types; Func of void would blow up the runtime! Hence the silly Action delegate, and resulting duplication of code everywhere.
In addition, void methods are generally unusable with LINQ extension methods (Select, Aggregate, etc.) because you can’t have IEnumerable of void (what the hell is a sequence of absolutely nothing anyway?). They also screw up async/await in major ways; an async void method is indisguishable from the caller’s perspective from a non-async method, cannot be awaited and cannot be composed. This has become one of C#’s major gotchas.

How does F# improve the situation? In F#, void doesn’t exist whatsoever! All void methods are turned by the compiler into methods returning the singleton unit, a type that carries no useful information but is a real, ordinary type requiring no special treatment, unlike void. So in F# you have unified delegates and no async void nonsense. This makes life simpler and eliminates much code duplication and potential mistakes.

3) Exceptions

In C#, the type system says “this method evaluates to type X”; what it fails to say (aside from the fact that X might be null) is that this method may also not evaluate to anything and throw an exception instead. Exceptions got popular in OOP languages in the 1990s when the only alternative was checking for return codes, C-style. Of course the latter is maintenance and code clutter hell, and exceptions do allow us to write just the happy path in general and centralize error handling. It’s just something the type system totally ignores; the fact that your code compiles says nothing of how it accounts for exceptions; unless you are extremely disciplined, it probably blows up in various situations! Java is one of the few languages to have tried integrating exceptions in the type system: it’s a complete failure and it’s unclear how it could be improved. The designers of C# decided not to bother and just let you write nonsense.

Exceptions also assume a single-threaded, single-stack model; they just don’t mesh well with asynchronous code. Just think for yourself, what is supposed to happen if an exception is thrown by asynchronous code executing in some arbitrary thread? How is the caller notified? (Bonus points: what happens if the asynchronous method is void?) C# 5 does some intricate magic under the hood to allow you to still write something like:

try {
    await AsyncMethodThatMightThrow();
}
catch (InvalidOperationException e) {
}

and have that do what you’d expect; in fact the exception has to be handled by the execution context, stored temporarily, and re-thrown when execution resumes in the caller’s context. If the exception was contained in the return value of the method instead, you could simply inspect it and decide what to do, and you wouldn’t need cryptic (hopefully compiler-generated) boilerplate code to re-route the exception across stacks.

How does F# improve the situation? F# supports and embraces exceptions; one might argue that they’re still the default error mechanism; it’s hard to do otherwise when every .NET API throws exceptions. However, F# also has a functional background, which prefers encoding everything in return values. Indeed, it’s possible to write elegant exception-free code in F#, using the technique detailed by Scott Wlaschin in his article railway-oriented programming. I suggest reading the whole thing, but the basic idea is this: instead of returning T and perhaps throwing an exception, functions should return something like an Option type representing Success or Failure; Success contains the useful data (the T), and Failure contains some information regarding the error (like an exception does).

Doesn’t this throw us back to C-style return value hell? Not at all, because F# makes it easy to write very general adapter functions (i.e. monads) that simply pipeline errors through regular non-error-checking functions, until they hit whichever function actually wants to deal with the errors. This gives us both the happy-path convenience of exceptions and type safety, at the expense of minimal syntactic noise. This is technically also possible in C#, but it would look unimaginably horrible due to lack of discriminated unions, pattern matching, custom operators, aforementioned void and lousy higher-order function syntax.

In conclusion, when F# code compiles, it is 100% void-free, generally null-free, and as exception-free as you can make it; this means code that is consistent, does what it says and reserves fewer surprises at run-time. While C# 6 and particularly C# 7 may include more functional constructs like pattern matching or something more or less like discriminated unions, the 3 issues discussed here are fundamental to the design of the language and no amount of tacked-on features will fix them.

Sorting a linked list using only recursion

Sorting a linked list using only recursion is a common interview question and something any CS graduate should be able to do.

One good approach is to start from a known sort and clearly recursive algorithm such as QuickSort. Here’s a description of QuickSort in English: “Given a list of elements, choose one element to be the pivot; partition the list in two, i.e. the lists of elements lesser and greater than the pivot respectively; perform QuickSort on these two lists and concatenate the results.”

Here’s a description in pseudo-code:

function QuickSort elements (list -> list) :
    if elements is empty then
        return elements
    let pivot = elements.[0]
    let lesser, greater = Partition(elements, fun e -> e <= pivot)
    return Concat(QuickSort(lesser), [pivot], QuickSort(greater))

Note that the pivot can be chosen randomly, but since a linked list only gives us access to the first element directly, we’ll select the first element as the pivot.
Also note that we’re using a hypothetical Partition function that returns two list, one containing the elements for which the predicate applied to each element returns true and false respectively, and a hypothetical Concat function that concatenates 3 lists together.

From this the F# translation is actually shorter than the pseudocode:

let rec sort ls =
    match ls with
    | [] -> ls
    | hd::tl ->
        let lesser, greater = List.partition ((>=) hd) tl
        List.append (sort lesser) (hd::(sort greater))

Test run in FSI:

val sort : ls:'a list -> 'a list when 'a : comparison

> sort [-1;4;2;7;3];;
val it : int list = [-1; 2; 3; 4; 7]

It’s a bit of cheating to be using pre-defined List module functions that do most of the grunt work for us, so let’s implement partition and append (without looking at the F# source code).

partition is a function that takes a list, iterates through it while building two lists that it finally returns. Since it doesn’t take these two lists in parameter, we’ll have to use an inner loop, which in recursive style will make our function look like this:

let partition ls =
    let rec loop ls ls1 ls2 =
        // build ls1 and ls2
    loop ls [] []

Here in the inner loop, “ls” hides the outer “ls” and corresponds to the list that “loop” is iterating through; “ls1” and “ls2” are the two lists it is building. So now all we have to do is implement the body of the loop, which should take the first element of the list, add it to either ls1 and ls2 and call itself recursively on the rest of the list until it is empty. Very straightforward:

let partition fn ls =
    let rec loop ls ls1 ls2 =
        match ls with
        | hd::tl when fn hd -> loop tl (hd::ls1) ls2 
        | hd::tl -> loop tl ls1 (hd::ls2)
        | [] -> ls1, ls2
    loop ls [] []

Test run in FSI, partitioning a list of integers into even and odd lists:

val partition : fn:('a -> bool) -> ls:'a list -> 'a list * 'a list

> partition (fun e -> e % 2 = 0) [1;5;3;7;2;6;4;9];;
val it : int list * int list = ([4; 6; 2], [9; 7; 3; 5; 1])

It works, however it reverses the order of elements. This happens because of the recursive nature of the algorithm: as we advance through the list we are adding each element at the beginning of the list. Therefore the last element to be added becomes the first element, and the list is reversed. Does this matter to our sorting algorithm? No, because all that matters about the lists returned by partition is that they contain they elements for which the predicate returns true and false respectively, not that these elements are in any particular order. We can now swap List.partition for our version and verify that it still works:

> let rec sort ls =
    match ls with
    | [] -> ls
    | hd::tl ->
        let lesser, greater = partition ((>=) hd) tl
        List.append (sort lesser) (hd::(sort greater));;

val sort : ls:'a list -> 'a list when 'a : comparison

> sort [3;2;55;4;7;6];;
val it : int list = [2; 3; 4; 6; 7; 55]

Ok, now to write append. This is a function that takes two list and returns one list containing the elements of the first list followed by the elements of the second list, preserving their order. It does essentially the opposite of partition; whereas the former iterated one list to build two lists, append iterates two lists to build one list; but apart from that, the principle is similar. So the function will have a similar form:

let append ls1 ls2 =
    let rec loop ls ls1 ls2 =
        // build ls by iterating through ls1 and ls2
    loop [] ls1 ls2

At this point, implementing it is quite straightforward:

let append ls1 ls2 =
    let rec loop ls ls1 ls2 =
        match ls1, ls2 with
        | hd::tl, _ -> loop (hd::ls) tl ls2
        | _, hd::tl -> loop (hd::ls) ls1 tl
        | _ -> ls
    loop [] ls1 ls2

The idea is this: if there are elements in ls1 (the pattern hd::tl, _ matches), add the first element of ls1 to the list and keep iterating with the rest of ls1 (tl) and ls2. If there are no elements left in ls1 but there still are in ls2 (the first pattern doesn’t match but the second, _, hd::tl, matches), add the first element of ls2 to the list and keep iterating with the rest of ls1 (which should be empty) and ls2 (tl). If none of these pattern matches, then we’re done and we can return the constructed list.

As we test this, however, we notice a problem:

append [1;2;3] [5;6;7];;
val it : int list = [7; 6; 5; 3; 2; 1]

It does what it should except the order of elements is reversed; and again, this is due to the recursive nature of the algorithm; as we advance through both lists we are adding the elements we encounter to the beginning of the list; so the last element we encounter is the last to be added, and hence ends up being the first in the constructed list.

At this point it becomes apparent we need a reverse function. F# provides one, List.reverse, but of course we won’t use it and implement it ourselves. It turns out that implementing reverse was the most tricky thing for me to understand and the reason I’m writing this post. It would be possible to write reverse by fetching the nth element of the list for n = [length – 1 to 0], but that would run in O(n^2) which is completely unacceptable. So I started writing some code without really thinking and came up with this:

let rec badReverse ls =
    match ls with
    | hd::tl -> hd::(badReverse tl)
    | [] -> ls

But that just returns the original list! At this point I was very confused. In our implementations of partition and append, simply building a list by recursively appending each element to the beginning effectively reversed the list, because as we went through the list, each element was added to the beginning of the list, so the last element encountered became the new beginning of the list and so on. Aren’t we doing the same thing here?

And the answer is yes and no; yes it’s the same iteration mechanism, but instead of appending the element on the argument we pass to the next iteration, we’re appending it on the return value, so effectively we’ve reversed the order of operations. In the previous examples, we were starting with an empty list, and iterating through the original list, adding each element to that new list recursively.

So it appears that’s what we need to do here; start with an empty list and build it by appending the first element of the original list in the argument of the next loop iteration.

let reverse ls =
    let rec loop ls acc =
        match ls with
        | hd::tl -> loop tl (hd::acc)
        | [] -> acc
    loop ls []

And this works. If we have access to the very general List.fold, we also could just do:

> List.fold (fun list x -> x::list) [] [1;2;3];;
val it : int list = [3; 2; 1]

Putting it all together, our append function can now return elements in the correct order by reversing its result:

let append ls1 ls2 =
    let rec loop ls ls1 ls2 =
        match ls1, ls2 with
        | hd::tl, _ -> loop (hd::ls) tl ls2
        | _, hd::tl -> loop (hd::ls) ls1 tl
        | _ -> ls
    loop [] ls1 ls2 |> reverse

And now we could go and replace List.append with our own, and have our fully hand-crafted recursion-only algorithm working! But something still feels very wrong about that append function. We’re appending by adding elements in reverse order and then reversing the result. Couldn’t we just, you know, add the elements in the right order in the first place?

Hey, maybe that dumb badReverse code might be useful! It preserves order, does it not? All we need to do is have it iterate two lists instead of one and call it “append” instead:

let rec append ls1 ls2 =
    match ls1, ls2 with
    | hd::tl, _ -> hd::(append tl ls2)
    | _, hd::tl -> hd::(append ls1 tl)
    | _ -> []

… aaaand this works:

val append : ls1:'a list -> ls2:'a list -> 'a list

> append [1;2;3] [4;5;6];;
val it : int list = [1; 2; 3; 4; 5; 6]

Aaaah much better. More efficient, and we can get rid of the reverse function. So our full listing becomes:

let partition fn ls =
    let rec loop ls ls1 ls2 =
        match ls with
        | hd::tl when fn hd -> loop tl (hd::ls1) ls2 
        | hd::tl -> loop tl ls1 (hd::ls2)
        | [] -> ls1, ls2
    loop ls [] []

let rec append ls1 ls2 =
    match ls1, ls2 with
    | hd::tl, _ -> hd::(append tl ls2)
    | _, hd::tl -> hd::(append ls1 tl)
    | _ -> []

let rec sort ls =
    match ls with
    | [] -> ls
    | hd::tl ->
        let lesser, greater = partition ((>) hd) tl
        append (sort lesser) (hd::(sort greater))

Not a single library call in there, doesn’t this feel great? At this point we could even fix partition so that it preserves ordering, but I’ll leave that as an exercise to the reader.

In “fixing” our implementation to avoid calling reverse on a reversed list, however, we’ve lost one very useful property: tail-recursion. Indeed whereas our original append uses an inner tail-recursive loop, our new append performs an operation on the return value (appending an element) and so cannot be tail-recursive; therefore it will blow up the stack with large enough lists. Is there a way we could get both the simplicity and efficiency of our order-preserving append while keeping it tail-recursive?