Fast Case Insensitive String Comparisons

Merk

What are some alternatives to using .ToUpper() to perform case-insensitive
string comparisons?

The reason I'm asking is that I'm comparing strings in a long loop, looking
for equality, and I want this loop to run as fast as possible. So I'm
looking for a method that would be faster than .ToUpper().

Thanks!
 
Marc Gravell

In 2.0, IIRC, from tests (now deleted) I believe that
string.Equals(lhs,rhs, ComparerOptions.OrdinalIgnoreCase) is the
fastest.

You can also use StringComparer.OrdinalIgnoreCase.Equals(...) but I
believe that this is a little slower.

Your best bet is to time each option in a tight loop;
you could try:

lhs.ToUpper() == rhs.ToUpper()

lhs.Equals(rhs, StringComparison.OrdinalIgnoreCase); // or InvariantCultureIgnoreCase

string.Equals(lhs, rhs, StringComparison.OrdinalIgnoreCase); // or InvariantCultureIgnoreCase

StringComparer.OrdinalIgnoreCase.Equals(lhs, rhs); // or StringComparer.InvariantCultureIgnoreCase

etc.
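A minimal Stopwatch-based harness for timing the candidates in that list, offered as a sketch rather than a rigorous benchmark. The sample strings, the pass count, and the names `CompareBenchmark`, `Candidates` and `CountMatches` are all made up for illustration. Each candidate is wrapped in a delegate, so every option pays the same call overhead and only the relative timings mean anything; absolute numbers will vary with runtime and hardware, and a dedicated tool such as BenchmarkDotNet gives more trustworthy results.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class CompareBenchmark
{
    // The candidates from the list above; the keys are only display labels.
    public static readonly Dictionary<string, Func<string, string, bool>> Candidates =
        new Dictionary<string, Func<string, string, bool>>
        {
            ["ToUpper() =="] = (a, b) => a.ToUpper() == b.ToUpper(),
            ["a.Equals(b, OrdinalIgnoreCase)"] = (a, b) => a.Equals(b, StringComparison.OrdinalIgnoreCase),
            ["string.Equals(a, b, OrdinalIgnoreCase)"] = (a, b) => string.Equals(a, b, StringComparison.OrdinalIgnoreCase),
            ["StringComparer.OrdinalIgnoreCase"] = (a, b) => StringComparer.OrdinalIgnoreCase.Equals(a, b),
            ["string.Equals(a, b, InvariantCultureIgnoreCase)"] = (a, b) => string.Equals(a, b, StringComparison.InvariantCultureIgnoreCase),
        };

    // One pass over the paired arrays; returns how many pairs compared equal.
    public static int CountMatches(Func<string, string, bool> equals, string[] left, string[] right)
    {
        int matches = 0;
        for (int i = 0; i < left.Length; i++)
        {
            if (equals(left[i], right[i])) matches++;
        }
        return matches;
    }

    static void Main()
    {
        // Hypothetical sample data; substitute strings from the real loop.
        string[] left = { "Alpha", "beta", "GAMMA", "delta", "Omega" };
        string[] right = { "ALPHA", "Beta", "gamma", "epsilon", "omega" };
        const int passes = 1_000_000;

        foreach (KeyValuePair<string, Func<string, string, bool>> candidate in Candidates)
        {
            CountMatches(candidate.Value, left, right); // warm-up, so JIT time is not measured

            Stopwatch sw = Stopwatch.StartNew();
            int matches = 0;
            for (int n = 0; n < passes; n++)
            {
                matches = CountMatches(candidate.Value, left, right);
            }
            sw.Stop();

            Console.WriteLine($"{candidate.Key,-48} {sw.ElapsedMilliseconds,6} ms ({matches} matches per pass)");
        }
    }
}
```

Checking that every candidate reports the same match count is a cheap sanity check that the options being timed really answer the same question for the data at hand.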

Marc
 
Marc Gravell

ComparerOptions.OrdinalIgnoreCase
I meant StringComparison.OrdinalIgnoreCase, but IntelliSense would have
told you that...

Marc
 
bob.abrahamian

Merk said:
What are some alternatives to using .ToUpper() to perform case-insensitive
string comparisons?

The reason I'm asking is that I'm comparing strings in a long loop, looking
for equality, and I want this loop to run as fast as possible. So I'm
looking for a method that would be faster than .ToUpper().

Thanks!

System.String.Compare(string a, string b, bool ignoreCase);

or

System.Collections.CaseInsensitiveComparer.DefaultInvariant.Compare(object a, object b);
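Both of these are ordering comparisons: they return an int (negative, zero or positive) rather than a bool, so an equality test has to check for zero. A small sketch of that, with hypothetical helper names (`EqualViaCompare`, `EqualViaCaseInsensitiveComparer`); note that the `String.Compare` overload with `ignoreCase` uses the current culture, while `CaseInsensitiveComparer.DefaultInvariant` uses the invariant culture.

```csharp
using System;
using System.Collections;

static class CompareReturnsInt
{
    // String.Compare(a, b, ignoreCase: true) orders the strings using the current culture;
    // a result of 0 means "equal, ignoring case".
    public static bool EqualViaCompare(string a, string b) =>
        string.Compare(a, b, true) == 0;

    // CaseInsensitiveComparer takes object arguments; DefaultInvariant uses the invariant culture.
    public static bool EqualViaCaseInsensitiveComparer(string a, string b) =>
        CaseInsensitiveComparer.DefaultInvariant.Compare(a, b) == 0;

    static void Main()
    {
        Console.WriteLine(string.Compare("apple", "APPLE", true));              // 0
        Console.WriteLine(EqualViaCompare("apple", "APPLE"));                   // True
        Console.WriteLine(EqualViaCaseInsensitiveComparer("Banana", "BANANA")); // True
        Console.WriteLine(string.Compare("apple", "banana", true) < 0);         // True: "apple" sorts first
    }
}
```

When only equality matters, the `string.Equals(a, b, StringComparison.…)` overloads answer that question directly with a bool.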

Anyone have any ideas about my multiple browser problem?
 
Bruce Wood

Merk said:
What are some alternatives to using .ToUpper() to perform case-insensitive
string comparisons?

The reason I'm asking is that I'm comparing strings in a long loop, looking
for equality, and I want this loop to run as fast as possible. So I'm
looking for a method that would be faster than .ToUpper().

Are you comparing the same string over and over again? For example, are
you sorting an array? If you are, then store the strings in both their
uppercase and mixed case versions, and compare only the uppercase
versions. You'll incur the cost of uppercasing them only once, and then
get the payback on the comparisons. Trade memory for more speed.

If you test each string only once then, of course, this won't help.

Usually you gain efficiencies when you step back and look at the
overall problem, and how you can avoid doing the same work over and
over again, rather than trying to figure out how to do that work faster.
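A minimal sketch of the "normalize once, compare many times" idea, assuming the same stored strings are checked against many queries. `NormalizedString`, `PrecomputeDemo` and `CountMatches` are hypothetical names. It uses `ToUpperInvariant()` so the stored keys do not depend on the thread's culture; upper-casing is a simplification and is not full Unicode case folding.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper pairing each string with a key normalized once, up front.
sealed class NormalizedString
{
    public string Original { get; }
    public string Key { get; } // upper-cased once, using the invariant culture

    public NormalizedString(string original)
    {
        Original = original;
        Key = original.ToUpperInvariant();
    }
}

static class PrecomputeDemo
{
    // Counts stored entries equal to `query`, ignoring case.
    // The query is normalized once; each comparison is then a plain ordinal check.
    public static int CountMatches(IReadOnlyList<NormalizedString> items, string query)
    {
        string key = query.ToUpperInvariant();
        int count = 0;
        foreach (NormalizedString item in items)
        {
            if (string.Equals(item.Key, key, StringComparison.Ordinal)) count++;
        }
        return count;
    }

    static void Main()
    {
        List<NormalizedString> items = new[] { "Red", "GREEN", "blue", "red" }
            .Select(s => new NormalizedString(s))
            .ToList();

        // Repeated lookups reuse the precomputed keys instead of re-uppercasing the data.
        Console.WriteLine(CountMatches(items, "RED"));   // 2
        Console.WriteLine(CountMatches(items, "Green")); // 1
        Console.WriteLine(CountMatches(items, "pink"));  // 0
    }
}
```

The trade is the one described above: an extra string per entry in exchange for skipping the per-comparison case conversion.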
 
Jon Skeet [C# MVP]

Marc Gravell said:
In 2.0, IIRC, from tests (now deleted) I believe that
string.Equals(lhs,rhs, ComparerOptions.OrdinalIgnoreCase) is the
fastest.

You can also use StringComparer.OrdinalIgnoreCase.Equals(...) but I
believe that this is a little slower.

Your best bet is to try every option in a tight loop to test;
you could try:

lhs.ToUpper() == rhs.ToUpper()

Note that this test is not a culture-safe one. For instance, in Turkey,
I believe (if I remember the bug I had to fix in a system a while ago
:) that "mail".ToUpper() != "MAIL".

Using a StringComparer is a much better way, IMO.
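One way a StringComparer helps with the original looping problem is that collections can take it directly, so lookups ignore case without any ToUpper() calls. A short sketch, assuming the loop is testing membership in a known set of strings; `ComparerDemo` and `BuildLookup` are hypothetical names. OrdinalIgnoreCase suits identifiers and other non-linguistic text; a culture-aware comparer such as StringComparer.InvariantCultureIgnoreCase is the alternative when linguistic rules matter.

```csharp
using System;
using System.Collections.Generic;

static class ComparerDemo
{
    // A set whose lookups ignore case via the supplied comparer.
    public static HashSet<string> BuildLookup(IEnumerable<string> words) =>
        new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);

    static void Main()
    {
        HashSet<string> known = BuildLookup(new[] { "Apple", "banana", "CHERRY" });

        Console.WriteLine(known.Contains("APPLE"));  // True
        Console.WriteLine(known.Contains("Cherry")); // True
        Console.WriteLine(known.Contains("grape"));  // False

        // The same comparer object also works for one-off equality checks.
        Console.WriteLine(StringComparer.OrdinalIgnoreCase.Equals("Banana", "BANANA")); // True
    }
}
```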
 
Mark Wilden

Jon Skeet said:
Note that this test is not a culture-safe one. For instance, in Turkey,
I believe (if I remember the bug I had to fix in a system a while ago
:) that "mail".ToUpper() != "MAIL".

Just out of curiosity, did "mail".ToUpper() == "MAIL".ToUpper()?

///ark
 
Jon Skeet [C# MVP]

Mark Wilden said:
Just out of curiosity, did "mail".ToUpper() == "MAIL".ToUpper()?

Nope :)

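using System;
using System.Globalization;
using System.Threading;

class Test
{
    static void Main()
    {
        // Switch the current thread to the Turkish (Turkey) culture.
        CultureInfo info = CultureInfo.CreateSpecificCulture("tr-TR");
        Thread.CurrentThread.CurrentCulture = info;

        // In tr-TR, "mail".ToUpper() yields "MAİL" (dotted capital I),
        // while "MAIL".ToUpper() stays "MAIL", so both lines print False.
        Console.WriteLine("mail".ToUpper() == "MAIL");
        Console.WriteLine("mail".ToUpper() == "MAIL".ToUpper());
    }
}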

ToLower() doesn't work either.

Isn't i18n fun? :)
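For contrast, a minimal sketch of the culture-independent alternatives under the same Turkish thread culture. It assumes the runtime has Turkish culture data available; `TurkishSafeCompare` and `SameIgnoringCase` are hypothetical names. With that data present, the first line should print False and the other three True, because ordinal and invariant comparisons do not consult the thread's culture.

```csharp
using System;
using System.Globalization;
using System.Threading;

static class TurkishSafeCompare
{
    // Culture-independent, case-insensitive equality: 'i' and 'I' always match.
    public static bool SameIgnoringCase(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    static void Main()
    {
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("tr-TR");

        // Culture-sensitive upper-casing, as in the example above.
        Console.WriteLine("mail".ToUpper() == "MAIL");

        // Ordinal and invariant operations ignore the thread's Turkish culture.
        Console.WriteLine(SameIgnoringCase("mail", "MAIL"));
        Console.WriteLine(string.Equals("mail", "MAIL", StringComparison.InvariantCultureIgnoreCase));
        Console.WriteLine("mail".ToUpperInvariant() == "MAIL");
    }
}
```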
 
Marc Gravell

Good to know; cheers for the input Jon.

For ref, I only mentioned ToUpper() as a performance baseline (since the
OP explicitly mentioned it) to compare against StringComparer and
string.Equals() [with a stated comparison], but thanks for the heads-up
and the "proof positive" example.

Marc
 
Lucian Wischik

Jon Skeet said:

Funny!

The issue was that lowercase "i" gets capitalised to U+0130, "Latin
Capital Letter I With Dot Above", instead of the more normal U+0049,
"Latin Capital Letter I".
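To see the code points involved, a small sketch that upper-cases 'i' and lower-cases 'I' with the Turkish and invariant TextInfo objects. It assumes the runtime has Turkish culture data; `TurkishDottedI` and `Describe` are hypothetical names. With that data, the Turkish lines should show U+0130 and U+0131 ("Latin Small Letter Dotless I"), and the invariant lines U+0049 and U+0069.

```csharp
using System;
using System.Globalization;

static class TurkishDottedI
{
    // Formats a char as a Unicode code point, e.g. 'I' -> "U+0049".
    public static string Describe(char c) => $"U+{(int)c:X4}";

    static void Main()
    {
        TextInfo turkish = CultureInfo.GetCultureInfo("tr-TR").TextInfo;
        TextInfo invariant = CultureInfo.InvariantCulture.TextInfo;

        // Upper-casing the ASCII letter 'i' (U+0069) under each culture.
        Console.WriteLine(Describe(turkish.ToUpper('i')));   // Turkish: dotted capital I
        Console.WriteLine(Describe(invariant.ToUpper('i'))); // invariant: plain capital I

        // The reverse direction: lower-casing 'I' (U+0049).
        Console.WriteLine(Describe(turkish.ToLower('I')));   // Turkish: dotless small i
        Console.WriteLine(Describe(invariant.ToLower('I'))); // invariant: plain small i
    }
}
```

The lowercase direction is why Jon's note that ToLower() fails too holds: in Turkish, "MAIL".ToLower() produces a dotless i rather than "mail".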


I'm curious! Are there any Turks here who can explain Turkish
capitalisation?
 
Jon Skeet [C# MVP]

Mark Wilden said:
Oh well - I guess it's nobody's business but the Turks'.

Are you suggesting a history-insensitive comparison?

StringComparer.IgnoreHistory.Equals("Istanbul", "Constantinople")

Next up: a "man" comparison: Man.Triangle > Man.Particle etc?
 
