As a frequent reviewer of queries, and a regular contributor to our answers site and dba.stackexchange.com, I come across a lot of code that seems to have been written, shall we say, quickly. There are a lot of transgressions of course, but one that I notice frequently is inconsistency in case. People will have a table like this:
CREATE TABLE dbo.MyTable ( PersonID INT, FirstName SYSNAME );
And then a query like this:
SELECT personId, FIRSTName FROM dbo.myTable ORDER BY firstname;
The code compiled and the query returned results, so party time, right?
This can cause problems.
For one, being inconsistent about case can lead to situations like this, where another user executes the exact same logical query, but uses different case:
SELECT PersonId, Firstname FROM dbo.Mytable ORDER BY firstName;
There are a few subtle differences you might spot right away, and if you look at the plan cache, you'll see that you actually stored two copies of the plan (since the query_hash is case- and whitespace-sensitive). I talked about this a bit more thoroughly and showed an example late last year:
More importantly, though: neither query uses identifiers that exactly match the base table or its columns. And while the query may work on your machine, it will break if it gets deployed to a case-sensitive database or to a server using a case-sensitive collation.
I set up two SQL Server instances, one with the default SQL_Latin1_General_CP1_CI_AS collation, and one with its case-sensitive cousin, SQL_Latin1_General_CP1_CS_AS. On each instance, I set up two databases, one using each collation. What I wanted to do was simulate what would happen if I wrote code – sloppily, on purpose – in the best case scenario (case insensitive collation on a case insensitive instance), then tried to deploy it and run it in the other environments.
So, on that initial CI/CI instance, I created the following objects – a table and a matching TVP, along with a stored procedure that intentionally uses the wrong case for the table and column names, and even uses the wrong case for its own input parameter:
CREATE TABLE dbo.Customers ( CustomerID INT PRIMARY KEY, FirstName NVARCHAR(32) NOT NULL, LastName NVARCHAR(32) NOT NULL ); GO CREATE TYPE dbo.tvpCustomer AS TABLE ( CustomerID INT PRIMARY KEY, FirstName NVARCHAR(32) NOT NULL, LastName NVARCHAR(32) ); GO CREATE PROCEDURE dbo.AddCustomers @TVP dbo.tvpcustomer READONLY AS BEGIN SET NOCOUNT ON; INSERT dbo.customers(customerid,firstname,lastname) SELECT customerid,firstname,lastname FROM @tvp; END GO
First, I tried deploying the same objects to the case sensitive database on the case insensitive server. I can't get very far, because as soon as I try to create the procedure, I get all kinds of problems:
Column, parameter, or variable #1: Cannot find data type dbo.tvpcustomer.
Parameter or variable '@TVP' has an invalid data type.
Msg 1087, Level 16, State 1, Procedure AddCustomers
Must declare the table variable "@TVP".
I next tried the case insensitive database on the case sensitive server. There was only one error this time; interesting to note, though, that the error message uses the lower-case version of the table variable name (which comes from the procedure body), unlike above where it used the upper-case version (which comes from the parameter list):
Must declare the table variable "@tvp".
The case sensitive database on the case sensitive server returned to the more verbose error messages above. So then I tried partially correcting the procedure in that environment, making the parameter casing match and the type name correct:
CREATE PROCEDURE dbo.AddCustomers @TVP dbo.tvpCustomer READONLY --------------^ fixed this AS BEGIN SET NOCOUNT ON; INSERT dbo.customers(customerid,firstname,lastname) SELECT customerid,firstname,lastname FROM @TVP; ----------------------------- and this ----^^^ END GO
I was expecting this to actually get created, perhaps with that warning about deferred name resolution, because it shouldn't have been able to find
dbo.customers in the system metadata. But no, it matched the table regardless of case, but noticed that the column names didn't match:
Invalid column name 'customerid'.
Msg 207, Level 16, State 1, Procedure AddCustomers
Invalid column name 'firstname'.
Msg 207, Level 16, State 1, Procedure AddCustomers
Invalid column name 'lastname'.
So, this made me investigate another avenue: fixing the column names, but not the table name. Surely then the procedure wouldn't get created?
CREATE PROCEDURE dbo.AddCustomers @TVP dbo.tvpCustomer READONLY AS BEGIN SET NOCOUNT ON; INSERT dbo.customers(CustomerID,FirstName,LastName) --- fixed these -----^-------^^-^----^----^---^ SELECT CustomerID,FirstName,LastName FROM @TVP; -- and --^-------^^-^----^----^---^ END GO
Lo and behold:
Okay, so what if we call it (with code that doesn't get any of the case wrong, of course)?
DECLARE @tt dbo.tvpCustomer; INSERT @tt VALUES(1, N'Bob', N'Frank'); EXEC dbo.AddCustomers @TVP = @tt;
The engine must look in a different place now, because this yields:
Invalid object name 'dbo.customers'.
This is an even bigger problem, of course, because the code was deployed successfully, and now the error might not happen until an end user tries to add (or become) a customer…
A LITTLE SIMPLER.
Here's an even simpler set of code that I wanted to try on each database/instance combination:
-- wrong case for variable DECLARE @foo INT; SELECT @FOO; GO CREATE TABLE dbo.foo(bar INT); GO -- wrong case for column name SELECT BAR FROM dbo.foo; GO -- wrong case for object name SELECT bar FROM dbo.FOO; GO -- wrong case for database name USE Master; GO
Here's how they worked out:
|Database / Instance||variable||column||object||database|
|CI / CI||OK||OK||OK||OK|
|CS / CI||OK||Invalid column||Invalid object||OK|
|CI / CS||Must declare variable||OK||OK||Database does not exist|
|CS / CS||Must declare variable||Invalid column||Invalid object||Database does not exist|
The reasons for these patterns quickly become fairly evident: variables use instance collation, object and column names use the collation of
sys.objects etc. (which use database collation), and database names use
sys.databases (which uses instance collation). If we look at those in each case:
SELECT collation_name FROM sys.all_columns WHERE name = N'name' AND [object_id] = OBJECT_ID(N'sys.objects'); SELECT collation_name FROM master.sys.all_columns WHERE name = N'name' AND [object_id] = OBJECT_ID(N'sys.databases');
This is what they look like:
|Database / Instance||sys.all_objects||sys.all_columns|
|CI / CI||SQL_Latin1_General_CP1_CI_AS||SQL_Latin1_General_CP1_CI_AS|
|CS / CI||SQL_Latin1_General_CP1_CS_AS||SQL_Latin1_General_CP1_CI_AS|
|CI / CS||SQL_Latin1_General_CP1_CI_AS||SQL_Latin1_General_CP1_CS_AS|
|CS / CS||SQL_Latin1_General_CP1_CS_AS||SQL_Latin1_General_CP1_CS_AS|
There are some peculiarities, though. For example, with system objects, you can get the case of a column name wrong no problem, even in a case sensitive database on a case sensitive instance. This works no problem:
SELECT NAME FROM sys.objects;
However if you get the case wrong for the view itself instead of the column:
SELECT name FROM sys.Objects;
Still an error message:
Invalid object name 'sys.Objects'.
So it looks like different rules are used to determine whether case sensitivity matters for system objects, depending on whether you are looking at the object itself or its columns. Even more reason for you to just strive to have code that uses correct, matching case from the start, in all cases.
Long story short, a lot of things can go wrong when you develop in a case insensitive collation but code might ultimately get deployed to a case sensitive database or instance (or both). In most cases the errors will be caught at deployment time, but in a few, the objects could be created successfully anyway, and problems would not be detected until testing/QA time (or, worse, even later). This is one of those things that is fairly easy to get right from the start, and a lot more cumbersome to fix later.
There are some code analysis tools that can check for things like this; when I pinged George Mastros (@gmmastros), he agreed that it was a good idea, and is going to try to put it in an upcoming release of SQLCop.
But you can prevent these problems far earlier than that, if you're willing to do it. I have long been a proponent of developing – at least for code that does not cross databases or servers – on a case sensitive collation. This makes sure these errors are caught up front, and not six months from now when a database is migrated or consolidated. "But won't that slow me down?" Initially, maybe, a little. But in all honesty, IntelliSense (or tools like Redgate SQL Prompt) should prevent you from ever having to type this stuff correctly on your own anyway.