T-SQL: About the Len() and DataLength() functions

In this technical blog post, Peter de Heer explains what the Len() and DataLength() functions do and how they are used. With a small demonstration we hope to give you some insight into these two functions, which seem very similar but each have their own side effects.

Let's see the difference

Both T-SQL functions are used to measure the length of a string or binary value, but each has its own performance characteristics and lesser-known side effects!

The table below shows what SQL Server returns. Here, Len gave a different answer than DataLength for both the varchar and the varbinary data types!

What happened here?

The Len function ignores trailing spaces: it counts only the characters that remain after right-trimming its input. It does this for binary input as well. The hexadecimal value 20 in the varbinary assignment is 32 in decimal, which is the ASCII code for a space.
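The behaviour described above can be reproduced with a small sketch like the following. The variable names and values are illustrative assumptions, not the original demonstration code, and the result noted for the varbinary case follows the description given in this post.

```sql
-- Illustrative values only; @text and @bytes are hypothetical names.
DECLARE @text  varchar(10)   = 'abc  ';       -- three letters, two trailing spaces
DECLARE @bytes varbinary(10) = 0x6162632020;  -- the same bytes: 61 62 63 20 20

SELECT
    LEN(@text)         AS LenText,          -- 3: trailing spaces are not counted
    DATALENGTH(@text)  AS DataLengthText,   -- 5: every stored byte is counted
    LEN(@bytes)        AS LenBytes,         -- as described above, trailing 0x20 bytes are ignored here too
    DATALENGTH(@bytes) AS DataLengthBytes;  -- 5: every stored byte is counted
```

Leading and embedded spaces are still counted by Len; only the trailing ones are ignored.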

The typical programming languages that developers are familiar with do not behave this way, so this will likely catch many by surprise. Even if existing code is not functionally affected, the performance of the query likely is. The unaltered length of a variable-length character or binary value is readily accessible metadata, but Len has to fetch the actual data and process it in order to provide its answer!

So what about DataLength?

DataLength uses the metadata directly and is very fast. However, it does not report the number of characters in the input; it reports the number of bytes!
This requires the writer of the SQL to know exactly which data type the code is operating on. That data type must also remain unchanged afterwards, or the code that was already written will change meaning.
Executing this code will demonstrate the DataLength side effect. Again, the table below shows what SQL Server returns.

The Unicode characters are counted as two bytes each (characters outside the Basic Multilingual Plane are stored as surrogate pairs and take four). To get the length we need, the output of DataLength has to be divided by two when dealing with Unicode character input. The best-performing function will be DataLength, even with the extra division for Unicode characters. This is especially true when executing over large, unfiltered data sets. Operating on metadata alone is simply faster. However, this ties the code to the data type.
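A minimal sketch of this byte-versus-character difference is shown below. The variable names are hypothetical, and the example assumes plain characters that each take two bytes in nvarchar.

```sql
-- Hypothetical variables holding the same text in two data types.
DECLARE @ansi    varchar(10)  = 'abc';
DECLARE @unicode nvarchar(10) = N'abc';

SELECT
    DATALENGTH(@ansi)        AS AnsiBytes,     -- 3: one byte per character
    DATALENGTH(@unicode)     AS UnicodeBytes,  -- 6: two bytes per character
    DATALENGTH(@unicode) / 2 AS UnicodeChars,  -- 3: bytes divided by two
    LEN(@unicode)            AS UnicodeLen;    -- 3: counts characters, not bytes
```

If the column behind such an expression were later changed from varchar to nvarchar, an undivided DataLength call would silently start returning twice the expected value.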

A very common use case for determining string length is to check whether the length is zero. DataLength will give an instant answer, but Len will have to fetch the data, right-trim it and count the characters.
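The zero-length check can be sketched as follows, using a hypothetical table variable with a few sample values. Note that the two functions do not always agree: a value consisting only of spaces has a Len of zero but a non-zero DataLength, so swapping one for the other is only safe when that case does not matter.

```sql
-- Hypothetical sample data for comparing the two emptiness checks.
DECLARE @samples TABLE (Label varchar(20), Value varchar(10) NULL);

INSERT INTO @samples (Label, Value)
VALUES ('empty', ''), ('spaces only', '   '), ('text', 'abc'), ('null', NULL);

SELECT
    Label,
    CASE WHEN DATALENGTH(Value) = 0 THEN 1 ELSE 0 END AS IsEmptyByDataLength,
    CASE WHEN LEN(Value) = 0 THEN 1 ELSE 0 END        AS IsEmptyByLen
FROM @samples;
-- 'empty'       : both checks report 1
-- 'spaces only' : DataLength check reports 0, Len check reports 1
-- 'text'        : both checks report 0
-- 'null'        : both functions return NULL, so both checks fall through to 0
```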