UDTs and UDAggs – Serialization and Construction

I was talking with Dan Sullivan this week about UDTs in SQL Server 2005 and he's made an interesting discovery. About a week ago, I'd happened to mention to Dan that UDTs and UDAggs (user-defined aggregates) are supposed to have a pattern of "deserialize-do action-serialize". He thought I'd misunderstood. Then he went off to write an improved UDT/UDAgg for his talk at Windev. Dan has an engineering background and is excited about using UDTs for data types like Linear Dimension (he wrote the LDim example in our book).

What Dan discovered in attempting to implement some initialization inside the constructor is that, inside SQL Server operations, the constructor is never called! His UDT used Format.UserDefined, which means he implements the IBinarySerialize interface to Read and Write the UDT state. The engine just allocates some bare memory for an instance with the IL “newobj” instruction, then calls IBinarySerialize.Read to fill it out. What this means is that any internal fields that you would initialize inside a constructor must also be initialized when IBinarySerialize.Read is called as well. Dan found this out the hard way, reference types he'd expected to be initialized in the constructor were null when Read is called.

He's made a common implementation pattern in his UDTs: always have an Initialize or Init method and call it both in the constructor (for "ordinary" or client usage) and in IBinarySerialize.Read (for use inside SQL Server 2005). A good thing to keep in mind if you're writing a UDT.

When I asked if this wasn't against the "rules", his response was that C# always calls the constructor after newobj, but it's not a .NET requirement. Niels Berglund mentioned that this behavior is also true in Format.Native too, but all its fields are value types, so they are all initialized to zero by the runtime.

Niels also mentioned an interesting pattern he observed by tracing through a UDAgg he'd written with Format.UserDefined.UDAggs implement an Init() method instead of a constructor so that instances can be reused, for example in GROUP BY statements. The pattern Niels observed was:

1. Init
2. Write (but Accumulate hasn't been called yet)
3. Read-Accumulate-Write (for each row processed)
4. Read-Terminate

This pattern had puzzled him at the time, but alongside Dan's UDT observations this makes sense. I'd originally thought that the Read-Accumulate-Write pattern was for parallel processing, but it always happens even on a single processor. At this point we're all thinking it might a hedge based on how thread scheduling works in SQL Server.

So don't forget Dan's "Initialize in Read or constructor" pattern when you write Format.UserDefined UDTs. I think I have the good fortune to work with some pretty observant folks.

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.