Friday, September 01, 2006

.NET Generics : Numerical limitations

Generics were added to the .NET framework in version 2.0, and I for one gave a big cheer. I am a massive fan of templates in C++ and use them extensively, especially as part of the STL library. Unfortunately .NET generics have some serious limitations when used for numerical programming that are worth highlighting.

The Problem

Consider the following STL algorithm "accumulate" that simply adds a list of numbers:

template<typename T>
T accumulate(T* begin, T* end, T initalValue)
{
    T sum = initalValue;
    for(T* i = begin; i != end; ++i)
    {
        sum = sum + *i;
    }
    return sum;
}

I know this is not what the STL algorithm looks like, but I have simplified things (such as * instead of 'iterator') for those readers not as well acquainted with the beautiful language.

Now consider the .NET generic equivalent (in C++/CLI for easy comparison):

generic<typename T>
T accumulate(IEnumerable<T>^ items, T initalValue)
{
    T sum = initialValue;
    for each(T i in items)
    {
        sum = sum + i;
    }
    return sum;
}

Great....except this won't compile because sum + i is not supported by Object and that is how generics work.

The Reason

For templates, all wildcard parameters (like T in our example) are resolved at compile time. They are like smart (and safe) macros. For generics, this resolution is done at runtime, therefore any method calls or functions unique to the wildcard parameter (like +) could fail at runtime if the actual parameter type is not supported.

To avoid lots of runtime exceptions, the compiler tries to enforce some measure of constraint on the method calls a generic function makes through the where keyword. This can apply an interface constraint ("must support method..."), an inheritance constraint ("must be a..."), and various other constraints. Crucially though for our purposes: it cannot be applied to the numeric operators.

There is a good reason for delaying generic type resolution until runtime: it allows you to use generic libraries with parameters the library designer never expected. In C++ templates, this can only be done by distributing the header files, which is inconvenient from a source management point of view, and makes it difficult to control commercial IP.

Some people argue that runtime resolution reduces code bloat. It does, but who cares? Check your hard drive: it is not full of binary code, it is full of data.

The Solutions

The plural nature of this sections heading should warn you that no perfect fix is obvious. I present the solutions to date and leave the reader to make up their own mind:

  • Eric Gunnerson acknowledges the problem and suggests reflection could be used, or that we should wait for a language update.
  • Andrew Clymer has some good solutions involving delegate functions, perhaps stored in a factory. He also suggests how reflection might be used.
  • Juan Wajnerman shows us what the generic algorithms look like when compiled to IL. He points out that this is not a problem of the framework, just a restriction in the language syntax.
  • Frédéric Didier has written a port of STL in C# that overcomes the problem with delegates (albeit at some syntactic cost).
Personally I would like more expressive syntax in the where command, but it will be hard even then to capture all eventualities. Perhaps a where * clause that permits all function calls and tells the compiler: "if this goes wrong at runtime, I will take the rap".

After conferring with Steve, we think the constraints could be inferred from the functions you actually use. This is basically type inference for function prototypes. The "contract" is then automatically added to the generic functions prototype, and violations will still be detected at compile time by the function's clients. No need for runtime exceptions, and no need for the where command at all.

Maybe STL/CLR (neé STL.NET) will solve all these problems (it must address them somehow I guess). Any comments welcome.

1 comment:

Anonymous said...

There is a new C# STL library to consider: CSTL.

http://sourceforge.net/projects/cstl

It is not ready for primetime yet though.