C# strings silently kill your SQL Server indexes in Dapper

(consultwithgriff.com)

126 points · PretzelFisch · 19d ago · 126 comments

wvenable19d ago

This really doesn't have anything to do with C#. This is your classic nvarchar vs varchar issue (or unicode vs ASCII). The same thing happens if you mix collations.

I'm not sure why anyone would choose varchar for a column in 2026 unless you have some sort of ancient backwards compatibility situation.

dspillett19d ago

> I'm not sure why anyone would choose varchar for a column in 2026

The same string takes roughly half the storage space, meaning more rows per page and therefore a smaller working set needed in memory for the same queries and less IO. Also, any indexes on those columns will also be similarly smaller. So if you are storing things that you know won't break out of the standard ASCII set⁰, stick with [VAR]CHARs¹, otherwise use N[VAR]CHARs.

Of course if you can guarantee that your stuff will be used on recent enough SQL Server versions that are configured to support UTF8 collations, then default to that instead unless you expect data in a character set where that might increase the data size over UTF16. You'll get the same size benefit for pure ASCII without losing wider character set support.

Furthermore, if you are using row or page compression it doesn't really matter: your wide-character strings will effectively be UTF8 encoded anyway. But be aware that there is a CPU hit for processing compressed rows and pages every access because they remain compressed in memory as well as on-disk.

--------

[0] Codes with fixed ranges, etc.

[1] Some would put that the other way around, as “use NVARCHAR if you think there might be any non-ASCII characters”, but defaulting to NVARCHAR and moving to VARCHAR only if you are confident is the safer approach IMO.

gfody19d ago

utf16 is more efficient if you have non-english text, utf8 wastes space with long escape sequences. but the real reason to always use nvarchar is that it remains sargable when varchar parameters are implicitly cast to nvarchar.
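
To put rough numbers on the encoding trade-off, here is a minimal Python sketch that counts bytes for the same text in UTF-8 and UTF-16. The sample strings are made up for illustration, and this only measures raw encoded length, not how SQL Server lays out rows or pages.

```python
# Byte counts for the same text in UTF-8 and UTF-16 (little-endian, no BOM).
# The sample strings are illustrative assumptions, not data from the thread.
SAMPLES = {
    "ascii": "order-12345",
    "cyrillic": "привет",
    "japanese": "こんにちは",
    "emoji": "😀",
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in SAMPLES.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{name:9} chars={len(text):2} utf8={utf8:2} utf16={utf16:2}")
```

ASCII doubles in UTF-16, Cyrillic is a tie, Japanese is 3 bytes per character in UTF-8 against 2 in UTF-16, and characters outside the Basic Multilingual Plane cost 4 bytes either way.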

4 more replies

beart19d ago

I agree with your first point. I've seen this same issue crop up in several other ORMs.

As to your second point: VARCHAR uses N + 2 bytes, whereas NVARCHAR uses N*2 + 2 bytes for storage (at least on SQL Server). The vast majority of character fields in databases I've worked with do not need to store unicode values.
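
A quick sketch of that arithmetic, in Python. The function names are hypothetical, and it assumes a single-byte code page for VARCHAR, no supplementary characters in NVARCHAR, and no row or page compression.

```python
def varchar_bytes(n_chars):
    """VARCHAR: one byte per character plus 2 bytes of length overhead."""
    return n_chars + 2


def nvarchar_bytes(n_chars):
    """NVARCHAR: two bytes per character plus 2 bytes of length overhead."""
    return 2 * n_chars + 2


for n in (10, 50, 200):
    v, nv = varchar_bytes(n), nvarchar_bytes(n)
    print(f"{n:3} chars: varchar={v:3} bytes, nvarchar={nv:3} bytes, ratio={nv / v:.2f}")
```

The ratio approaches 2 as strings get longer, because the fixed 2-byte overhead matters less.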

wvenable19d ago

> The vast majority of character fields in databases I've worked with do not need to store unicode values.

This has not been my experience at all. Exactly the opposite, in fact. ASCII is dead.

1 more reply

SigmundA19d ago

To complicate matters SQL Server can do Nvarchar compression, but they should have just done UTF-8 long ago:

https://learn.microsoft.com/en-us/sql/relational-databases/d...

Also UTF-8 is actually just a varchar collation so you don't use nvarchar with that, lol?

_3u1019d ago

Generally if it stores user input it needs to support Unicode. That said UTF-8 is probably a way better choice than UTF-16/UCS-2

2 more replies

croes19d ago

Since MS SQL Server 2019 varchar supports unicode so now it’s the opposite, you use nvarchar instead of varchar for backwards compatibility reasons.

jklowden17d ago

I’m not sure why the top-rated reply begins by presuming anything about the problem domain. Many domains have a specified language and implied if not explicit collation. Rejecting characters outside that domain is part of the job. There are no emojis listed on the NASDAQ.

paulsutter19d ago

Utf8 solved this completely. It works with any length unicode and on average takes up almost as little storage as ascii.

Utf16 is brain dead and an embarrassment

wvenable19d ago

Blame the Unicode consortium for not coming up with UTF-8 first (or, really, at all). And for assuming that 65536 code points would be enough for everyone.

So many problems could be solved with a time machine.

1 more reply

Dwedit19d ago

It gets worse for UTF-16, Windows will let you name files using unpaired surrogates, now you have a filename that exists on your disk that cannot be represented in UTF-8 (nor compliant UTF-16 for that matter). Because of that, there's yet another encoding called WTF-8 that can represent the arbitrary invalid 16-bit values.
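
The unpaired-surrogate problem is easy to reproduce in Python, whose strings can also hold lone surrogates. A minimal sketch: strict UTF-8 refuses the value, while the `surrogatepass` error handler emits a generalized 3-byte sequence, which as far as I know is the same byte pattern WTF-8 uses for this case. The `try_encode` helper is just for illustration.

```python
# A high surrogate with no low surrogate following it: not a valid Unicode scalar value.
LONE_SURROGATE = "\ud800"


def try_encode(text, encoding, errors="strict"):
    """Return the encoded bytes, or None if the codec rejects the text."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError:
        return None


# Strict UTF-8 cannot represent an unpaired surrogate.
print(try_encode(LONE_SURROGATE, "utf-8"))

# surrogatepass encodes it anyway as a generalized 3-byte sequence.
passed = try_encode(LONE_SURROGATE, "utf-8", "surrogatepass")
print(passed)

# The bytes round-trip only if the decoder is equally permissive.
print(passed.decode("utf-8", "surrogatepass") == LONE_SURROGATE)
```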

applfanboysbgon19d ago

I think this is a rather pertinent showcase of the danger of outsourcing your thinking to LLMs. This article strongly indicates to me that it is LLM-written, and it's likely the LLM diagnosed the issue as being a C# issue. When you don't understand the systems you're building with, all you can do is take the plausible-sounding generated text about what went wrong for granted, and then I suppose regurgitate it on your LLM-generated portfolio website in an ostensible show of your profound architectural knowledge.

ziml7719d ago

This is not at all just an LLM thing. I've been working with C# and MS SQL Server for many years and never even considered this could be happening when I use Dapper. There's likely code I have deployed running suboptimally because of this.

And it's not like I don't care about performance. If I see a small query taking more than a fraction of a second when testing in SSMS, or if I see a larger query taking more than a few seconds, I will dig into the query plan and try to make changes to improve it. For code that I took from testing in SSMS and moved into a Dapper query, I wouldn't have noticed performance issues from that move if the slowdown was never particularly large.

cosmez19d ago

This is a common issue, and most developers I worked with are not aware of it until they see the performance issues.

Most people are not aware of how Dapper maps types under the hood; once you know, you start being careful about it.

Nothing to do with LLMs, just plain old learning through mistakes.

keithnz19d ago

actually, LLMs do way better, with dapper the LLM generates code to specify types for strings

SigmundA19d ago

Yes I have run into this regardless of client language and I consider it a defect in the optimizer.

wvenable19d ago

I wouldn't consider it a defect in the optimizer; it's doing exactly what it's told to do. It cannot convert an nvarchar to varchar -- that's a narrowing conversion. All it can do is convert the other way and lose the ability to use the index. If you think that there is no danger converting an nvarchar that contains only ASCII to varchar then I have about 70+ different collations that say otherwise.

2 more replies

briHass19d ago

I've found and fixed this bug before. There are 2 other ways to handle it:

Dapper has a static configuration for things like TypeMappers, and you can change the default mapping for string to use varchar with: Dapper.SqlMapper.AddTypeMap(typeof(string),System.Data.DbType.AnsiString). I typically set that in the app startup, because I avoid NVARCHAR almost entirely (to save the extra byte per character, since I rarely need anything outside of ANSI.)

Or, one could use stored procedures. Assuming you take in a parameter that is the correct type for your indexed predicate, the conversion happens once when the SPROC is called, not done by the optimizer in the query.

I still have mixed feelings about overuse of SQL stored procedures, but this is a classic example of where one of their benefits is revealed: they are a defined interface for the database, where DB-specific types can be handled instead of polluting your code with specifics about your DB.

(This is also a problem for other type mismatches like DateTime/Date, numeric types, etc.)

ziml7719d ago

Sprocs are how I handle complex queries rather than embedding them in our server applications. It's definitely saved me from running into problems like this. And it comes with another advantage of giving DBAs more control to manage performance (DBAs do not like hearing that they can't take care of a performance issue that's cropped up because the query is compiled into an application)

bonesss19d ago

As a general issue of hygiene I tend to wrap any ORM and access it through an internal interface.

1) The joy of writing and saying DapperWrapper can’t be overstated.

2) in conjunction with meaningful domain types it lets you handle these issues across the app at a single point of control, and capture more domain semantics for testing.

diath19d ago

It's weird that the article does not show any benchmarks but crappy descriptions like "milliseconds to microseconds" and "tens of thousands to single digits". This is the kind of vague performance description LLMs like to give when you ask them about performance differences between solutions and don't explicitly ask for a benchmark suite.

pllbnk19d ago

I disagree. I think it's a nice discovery many might be unaware of and later spend a lot of time on tracking down the performance issue independently. I also disagree that a rigorous benchmark is needed for every single performance-related blog post because good benchmarks are difficult to write, you have to account for multiple variables. Here, the author just said - "trust me, it's much faster" and I trust them because they explained the reasoning behind the degradation.

nmeofthestate19d ago

The writing style certainly screams LLM.

_vertigo19d ago

> No schema changes. No new indexes. No query rewrites. Just telling Dapper the correct parameter type.

pllbnk19d ago

Are we automatically discarding everything that might or might not have been written or assisted by an LLM? I get it when the articles are the type of meaningless self improvement or similar kind of word soup. However, if hypothetically an author uses LLM assistance to improve their styling to their liking, I see nothing wrong with that as long as the core message stands out.

1 more reply

downsplat19d ago

Did this post come out of a freezer from 1998? Who on earth creates databases in Latin1 in 2026?

Nevermind, looks like Sql Server didn't add utf8 collations until 2019 (!) and for decades people had to choose column by column between the 16-bit overhead of "nvarchar" and latin1... And still do if they want a bit of backwards compatibility. Amazing.

rmunn19d ago

"Just use Postgres" (which defaults to UTF-8 encoding unless specifically configured to use something else) is looking like better and better advice every day.

Doesn't help those tied to legacy systems that would require a huge, expensive effort to upgrade, though. Sorry, folks. There's a better system, you know it's a better system, and you can't use it because switching is too expensive? I've been there (not databases, in my case) and it truly sucks.

elmigranto19d ago

Third party dependencies are very easy: you just have to intimately know how it is implemented in addition to knowing your own code and stack, and then you are golden!

Nothing to learn, just focus on making your app, it’s all taken care of by This One Simple Package ;)

These things are nowhere near as free as our tooling presents them with “just nuget it or whatever”.

DeathMetal300019d ago

I’m sure writing their own ORM would have given them instantaneous insight into this issue and introduced no other challenges. Open source developers hate this one weird trick!

elmigranto19d ago

Especially for things used directly, you need to understand both your own and the third-party code, roughly to the same level. With your own code, you only care about your own use case; with third-party code, you have to kind of get everyone else's.

Depending on what you do and the dependency's scope, either way can make sense.

maciekkmrk19d ago

Interesting problem, but the AI prose makes me not want to read to the end.

jiggawatts19d ago

This feels like a bug in the SQL query optimizer rather than Dapper.

It ought to be smart enough to convert a constant parameter to the target column type in a predicate constraint and then check for the availability of a covering index.

valiant5519d ago

There's a data type precedence that it uses to determine which value should be cast[0]. Nvarchar is higher precedence, therefore the varchar value is "lifted" to an nvarchar value first. This wouldn't be an issue if the types were reversed.

0: https://learn.microsoft.com/en-us/sql/t-sql/data-types/data-...
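
A toy model of that rule in Python, for anyone who wants to see which side ends up converted. The list is only a small excerpt of the precedence order (lowest to highest) and the function name is made up. It models the direction of the implicit conversion only; whether an index seek survives a column-side conversion depends on the types and collation involved.

```python
# Excerpt of SQL Server's data type precedence, lowest to highest.
# Only a handful of types are listed here; the real list is much longer.
PRECEDENCE = ["char", "varchar", "nchar", "nvarchar", "int", "bigint"]
RANK = {name: i for i, name in enumerate(PRECEDENCE)}


def converted_side(column_type, param_type):
    """Return which side of `column = @param` is implicitly converted."""
    if column_type == param_type:
        return "neither"
    # The lower-precedence side is converted up to the higher-precedence type.
    return "column" if RANK[column_type] < RANK[param_type] else "parameter"


print(converted_side("varchar", "nvarchar"))  # the case from the article
print(converted_side("nvarchar", "varchar"))  # types reversed
print(converted_side("int", "bigint"))
```

With a varchar column and an nvarchar parameter the column side is converted; reverse the types and only the parameter is.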

Sankozi18d ago

So such issues can appear in more products and more datatypes (int and bigint have the same problem).

This is a really bad rule for SQL's "equality" operator.

Still, the optimizer should be able to handle it - if the result is the same, the optimizer should take the faster path.

wvenable19d ago

It's the optimizer caching the query plan as a parameterized query. It's not re-planning the index lookup on every execution.

SigmundA19d ago

The parameter type is part of the cache identity, nvarchar and varchar would have two cache entries with possibly different plans.

1 more reply

beart19d ago

How do you safely convert a 2 byte character to a 1 byte character?

jiggawatts19d ago

Easily! If it doesn't convert successfully because it includes characters outside of the range of the target codepage then the equality condition is necessarily false, and the engine should short-circuit and return an empty set.

jklowden17d ago

> SQL Server has to convert every single value in the column to nvarchar before it can compare.

This of course is not true. It is a defect in Microsoft’s query planner. And the proof lies in the remedy.

The recommended solution is to convert the search argument type to match that of the index. The user is forced to discover the problem and adjust manually. SQL Server could just as well have done that automatically.

No information is lost converting nvarchar to varchar if the index is varchar. If the search argument is ‘’, no conversion from varchar will match it (unless the index data is UTF8, which the server should know).

This is a longstanding bug in SQL Server, and not the only one. Instead of patting ourselves on the back for avoiding what SQL Server “has to do”, we should be insisting it not do it. Anymore.

smithkl4219d ago

Been bit by that before: it's not just an issue with Dapper, it can also hit you with Entity Framework.

mvdtnz19d ago

This is a really interesting blog post - the kind of old school stuff the web used to be riddled with. I must say - would it have been that hard to just write this by hand? The AI adds nothing here but the same annoying old AI-isms that distract from the piece.

wronex18d ago

This makes me sad. We have these type safe languages, then a DB comes along and breaks the type barrier. What are we to do? Property attributes and an ORM? Is there a Linq to SQL thing?

andrelaszlo19d ago

I thought, having just read the title, that maybe it's time to upgrade if you're still on Ubuntu 6.06.

pjmlp19d ago

I never had this issue with Dapper; as others point out, it's a holding-it-wrong problem.

adzm19d ago

even better is Entity Framework and how it handles null strings by creating some strange predicates in SQL that end up being unable to seek into string indexes

enord19d ago

This is due to utf-16, an unforgivable abomination.

bunbun6919d ago

AI slop article

Also no meaningful benchmarking was done

ltbarcly319d ago

Life is too short to use SQL Server. I know people that use it will swear it's "not bad anymore" but yes it is.

bni19d ago

yes it is
