Tuesday, November 1, 2011

Fast (approximate) Sqrt method in C#

As part of the video codec that Alanta uses, we need to calculate the variation between blocks in one frame and the equivalent blocks in the next frame. I’ve been using an algorithm proposed by Thiadmer Riemersma that looks something like this:

public static double GetColorDistance(byte r1, byte g1, byte b1, byte r2, byte g2, byte b2)
{
    int rmean = (r1 + r2) / 2;
    int r = r1 - r2;
    int g = g1 - g2;
    int b = b1 - b2;
    int weightR = 2 + rmean / 256;
    const int weightG = 4;
    int weightB = 2 + (255 - rmean) / 256;
    return Math.Sqrt(weightR * r * r + weightG * g * g + weightB * b * b);
}

It works pretty well, but that last step depends on a square root calculation, which is relatively slow; and when this is something you need to run on every pixel in a frame, you want it to be as fast as possible. Consequently, I’ve been looking at ways to optimize it.

The important thing to note is that for my purposes, close is good enough: I don’t need IEEE precision. It turns out that there’s a pretty good approximation that’s available in languages like C or C++ which let you do unsafe casts back and forth between ints and floats:

float sqrt_approx(float z)
{
    union
    {
        int tmp;
        float f;
    } u;
    u.f = z;
    u.tmp -= 1 << 23; /* Subtract 2^m. */
    u.tmp >>= 1; /* Divide by 2. */
    u.tmp += 1 << 29; /* Add ((b + 1) / 2) * 2^m. */
    return u.f;
}
The problem with this approach is that C# doesn’t normally let you play this kind of magic.
The key word being “normally”, of course.
Turns out there’s one trick you can use to make C# treat the same memory address as either an int or a float, and that’s to create a struct with a [StructLayout(LayoutKind.Explicit)] attribute. (And surprisingly enough, the trick works in Silverlight as well.) The resulting class looks like this:
public class Approximate
{
    public static float Sqrt(float z)
    {
        if (z == 0) return 0;
        FloatIntUnion u;
        u.tmp = 0;
        u.f = z;
        u.tmp -= 1 << 23; /* Subtract 2^m. */
        u.tmp >>= 1; /* Divide by 2. */
        u.tmp += 1 << 29; /* Add ((b + 1) / 2) * 2^m. */
        return u.f;
    }

    [StructLayout(LayoutKind.Explicit)]
    private struct FloatIntUnion
    {
        [FieldOffset(0)]
        public float f;

        [FieldOffset(0)]
        public int tmp;
    }
}
The results are pretty good: it’s more than twice as fast, and the results tend to be within 2% of the “real” answer:

MathSqrtTest: averageCompletionTime = 2424.000
ApproxSqrtTest: averageCompletionTime = 1058.000
Average variation: 0.0253587081881525

If you want a bit more accuracy at the cost of an additional CPU cycle or two, you can use this one, based on the famous inverse square root method in Quake 3:

public static float Sqrt2(float z)
{
    if (z == 0) return 0;
    FloatIntUnion u;
    u.tmp = 0;
    float xhalf = 0.5f * z;
    u.f = z;
    u.tmp = 0x5f375a86 - (u.tmp >> 1);
    u.f = u.f * (1.5f - xhalf * u.f * u.f);
    return u.f * z;
}
It’s a tad slower than the first (though still nearly 2x as fast as Math.Sqrt()), but much more accurate:

MathSqrtTest: averageCompletionTime = 2428.400
ApproxSqrt2Test: averageCompletionTime = 1361.000
Average variation for method2: 0.000928551700594071