std::string and char[] comparisons

webretard

I never used C much and have grown accustomed to C++ std::string. This
is a weakness because I don't understand why comparisons ( == ) are
unreliable when dealing only with char[] types. It seems std::string
is always comparable and that std::string and char[] are always
comparable, but char[] to char[] comparison works sometimes, but not
others depending on the content of the char[]. Here is a small snippet
I wrote to demonstrate:

#include <iostream>
#include <string>

int main()
{

std::string one = "1qaz2wsx";
std::string two = "1qaz2wsx";

// Always matches
if ( one == two )
{
std::cout << "std::string Match" << std::endl;
std::cout << one << std::endl;
std::cout << two << std::endl;
}
else
{
std::cout << "std::string No Match" << std::endl;
std::cout << one << std::endl;
std::cout << two << std::endl;
}

// ------------------------------------------------------

char three[9] = "1qaz2wsx";
char four[9] = "1qaz2wsx";

// Matches sometimes, not others
if ( three == four )
{
std::cout << "char Match" << std::endl;
std::cout << three << std::endl;
std::cout << four << std::endl;
}
else
{
std::cout << "char No Match" << std::endl;
std::cout << three << std::endl;
std::cout << four << std::endl;
}

// ------------------------------------------------------

// Always matches
if ( one == three )
{
std::cout << "std::string and char Match" << std::endl;
std::cout << three << std::endl;
std::cout << four << std::endl;
}
else
{
std::cout << "std::string and char No Match" << std::endl;
std::cout << three << std::endl;
std::cout << four << std::endl;
}

return 0;
}

My goal is to compare char[] just as reliably as I can compare
std::string objects.

Terry
 
Francesco S. Carta

[...]
My goal is to compare char[] just as reliably as I can compare
std::string objects.

You can't. When you compare char[], you're comparing char*, in other
words, you're comparing addresses and not the data they point to.

Read the manual about C string handling and you'll find the functions
needed to compare them. Converting them to std::string and comparing
them is just fine too.
 
Victor Bazarov

[...]
char three[9] = "1qaz2wsx";
char four[9] = "1qaz2wsx";

// Matches sometimes, not others

If the compiler optimizes your strings to share the memory location
(since it can see that you never change the contents of the arrays), you
can get equality. Otherwise, they will always be different.
[...]

My goal is to compare char[] just as reliably as I can compare
std::string objects.

You can't. Comparing pointers just doesn't work in C++. The array name
used in an expression decays into a pointer to the first element of the
array. Two pointers compare equal when they point to the same object.
In general two arrays occupy different memory (and if your compiler
somehow optimizes the memory allocation to make the arrays share the
memory, it can actually be a BAD THING(tm)).

V
 
James Kanze

On 7/7/2010 11:09 AM, (e-mail address removed) wrote:
My goal is to compare char[] just as reliably as I can
compare std::string objects.
You can't.

Without using a function.
Comparing pointers just doesn't work in C++.

Comparing pointers works very well: == returns true if the
pointers point to the same memory, and false if they don't. The
problem here is that the poster doesn't want to compare
pointers.
The array name used in an expression decays into a pointer to
the first element of the array.

The initial problem is that == isn't defined on arrays, and it
is on pointers. So the automatic conversion kicks in.
Two pointers compare equal when they point to the same object.
In general two arrays occupy different memory (and if your
compiler somehow optimizes the memory allocation to make the
arrays share the memory, it can actually be a BAD THING(tm)).

In his actual code, the language guarantees that the two arrays
have different addresses. (Either his comment "matches
sometimes, not others" is mistaken, or his compiler has an
error.) The compiler *is* allowed to place different string
literals at the same address, however; if you do:

char const* f1 = "abc";
char const* f2 = "abc";

, it's unspecified whether f1 == f2. (I'm not sure, but I think
even the following is allowed:

char const* f1 = "0abc";
char const* f2 = "abc";
if (f1 + 1 == f2) ...

..)
 
James Kanze

Stuart said:
(e-mail address removed) wrote:
int main()
{ [...]
char three[9] = "1qaz2wsx";
char four[9] = "1qaz2wsx";
// Matches sometimes, not others
if ( three == four )
{
std::cout << "char Match" << std::endl;
std::cout << three << std::endl;
std::cout << four << std::endl;
}
else
{
std::cout << "char No Match" << std::endl;
std::cout << three << std::endl;
std::cout << four << std::endl;
}

[...]
It is *not* the same. Special rules apply to string literals.
...although perhaps I should if the compiler can prove neither
array is later changed, judging by Victor's reply.

No. According to the standard, three and four are distinct
objects, and must have distinct addresses. A compiler can merge
them (or do anything else it wants), *if* *and* *only* *if* it
can prove that this has no effect on the "observable behavior"
of the program. In this case, it quite clearly does have an
effect on the observable behavior.
 
Terry

Thank you all. I understand now and have found that converting to
std::string works well for me:

char one[9] = "1qaz2wsx";
char two[9] = "1qaz2wsx";

std::string my_one( one, 8 );

// Always matches
if ( my_one == two )
...

Thanks to all. This list is a helpful place.

Terry
 
Francesco S. Carta

[...]
Thanks to all. This list is a helpful place.

You're welcome.

Now that you're there, consider avoiding char arrays altogether and
using const char* (and only the const form) in the places where you
really need it, if any.

For backward compatibility with C, the language lets you assign the
address of a string literal to a plain char*, like this:

char* pc = "test";

That's why I stressed "const char*" in the passage above.

Reading the C++ FAQ will give a strong boost to your understanding of
C++ (and of how its approach differs from C's); you won't regret it.
Good luck.
 
James Kanze

On Jul 7, 11:51 pm, Stuart Golodetz

[...]
Ok, makes sense - the observable effect in this case being that you
always expect the comparison in the array case to return false, and the
compiler merging them would cause it to return true. I'm puzzled by the
OP's "Matches sometimes, not others" comment in that case (and I was
when I wrote my original reply, hence the "But if you're using arrays
then I wouldn't expect this to happen").

I am too. For the code he posted, the standard requires the
strings to compare different. Either some compiler's optimizer
is a bit too eager (or he was compiling in a non-standard
compliant mode), or the differences he saw were from a slightly
different variant of code than what he posted.
To clarify what I think is the case now then:
* In the string literal case, the compiler *may* (but doesn't have to)
optimize and put them in the same place, so the comparison may
potentially return either true or false (depending on whether the
optimization happens).
* In the array case, the comparison is guaranteed to return false,
unless your compiler is broken.
Is that correct?

Exactly.
 
Michael Doubez

[snip]

My goal is to compare char[] just as reliably as I can compare
std::string objects.

And what would the comparison be? Should they have the same memory
representation (equal size && memcmp() == 0) or the same string
representation (strcmp() == 0)?

In the first case, it is useless for string comparison because:
char a[10]="012345";
char b[] ="012345";
would lead to a!=b (they don't have the same size)

In the second case, it would have no meaning for other types.

In conclusion, for strings you don't need the char[]/char*
distinction.

If you need some safety check, you can use a custom structure:
#include <cstring>
#include <iostream>

// Wraps a pointer to a char array together with the array's size N.
template<int N>
struct StrArrayType
{
const char* data;
enum{ size = N } ;

StrArrayType( char const * v):data(v){}
};

// Deduces N from the array argument, so Str(a) remembers sizeof(a).
template<int N>
StrArrayType<N> Str( char const (&data)[N] )
{
return data;
}

// Compares at most N characters (the size of the wrapped array).
template<int N>
bool operator==( StrArrayType<N> const & lhs , char const *rhs )
{
return std::strncmp(lhs.data,rhs,N) == 0;
}

// Compares at most min(N, M) characters.
template<int N, int M>
bool operator==( StrArrayType<N> const & lhs , StrArrayType<M> const & rhs )
{
return std::strncmp(lhs.data,rhs.data,N>M?M:N) == 0;
}

int main()
{
char a[]="Test1";
char b[10]="Test1";

std::cout<<a<<(a==b?"==":"!=")<<b<<std::endl; // compares addresses
std::cout<<a<<(Str(a)==b?"==":"!=")<<b<<std::endl;
std::cout<<a<<(Str(a)==Str(b)?"==":"!=")<<b<<std::endl;
}

Output:
Test1!=Test1
Test1==Test1
Test1==Test1

But there is always the risk of forgetting Str() somewhere, so you
should perhaps stick with std::string.
 
Gennaro Prota

On 08/07/2010 0.51, Stuart Golodetz wrote:
[...]
To clarify what I think is the case now then:

* In the string literal case, the compiler *may* (but doesn't have to)
optimize and put them in the same place, so the comparison may
potentially return either true or false (depending on whether the
optimization happens).

* In the array case, the comparison is guaranteed to return false,
unless your compiler is broken.

Just to clarify: it's not a "string literal vs. array" issue but
"pointer to some char vs. copy into a new array" issue.

Both

a) char const * p = "something" ;
b) char const arr[] = "something" ;

involve string literals. A string literal is a purely
compile-time entity (a token) but also has an associated array
with static duration (for the literary eternity of this post
let's call it the "associated array"; some confusion arises
exactly when *this* array is called the "string literal").

So both have a string literal and both "have" an array; the
difference is that in (a) you point into that array directly
(because the language says so), while in (b) you "make a copy".

It is *unspecified* whether two string-literals result in
distinct arrays so two pointers formed "the (a) way" may, in
general, compare equal.

As an aside, I always use form (b) but it would be quite long to
explain why... After all, there's a reason why the specification
of __func__ is given in terms of (b), too :)

(I hope the post just clarified a possible source of confusion.
Summarizing language rules this way is often tricky or outright
impossible; and if one really wants to, there's a lot of
nitpicking that can be done, say on the difference between
"string literal" and "string-literal", between expression and
token or on the fact that string literals can be concatenated in
various ways, or even by going to the C standard and finding
phrases along the lines of "attempt to modify a string-literal"
or "an implementation may make string literals modifiable" (from
memory), which would "prove" some of my distinctions wrong. But
neither the C standard nor the C++ one is written with enough
precision and coherence to withstand this sort of exercise;
basically one has to understand the intent, or just read
Russell :))
 
Gennaro Prota

Gennaro said:
[...]

Thanks Gennaro :) That definitely helps clarify matters.

Cheers,
Stu

p.s. I'm going to regret asking this, but what *is* the difference
between "string literal" and "string-literal"? :)

_Usually_ (but it's really just an observation on my part and
probably you could find exceptions or border-line cases) the C++
standard uses the hyphenated spelling, "string-literal", in
grammar productions and the other one elsewhere.

<NOTE>
I haven't respected this distinction, and my post above
interchanges the two forms more or less at random, including in
the "from-memory" hypothetical quotes from the C standard; sorry
for that; I should have done a better job at copy-editing.
</NOTE>

But I was just trying to prevent a useless "language-lawyering
subthread": someone might have taken my sentence "a string
literal is a purely compile-time entity (a token)" and objected
that actually a string literal is an expression, thus it also
"has a life" in runtime (because that's when it is "evaluated").

From there, we would have probably had posts highlighting that
"string-literals" are tokens while "string literals" are
expressions, and things like that. Then probably someone would
have found a sentence in the standard that didn't fit the scheme
and either used it to confuse the whole discussion (a calamity,
IMHO; I find it important that people reading the newsgroup
archives can really find answers, rather than getting lost into
endless threads) or to file an official defect report. I've seen
all this, a lot of times, although probably it's more frequent
on the moderated counterpart.
 
James Kanze

On 08/07/2010 20.02, Stuart Golodetz wrote:

[...]
_Usually_ (but it's really just an observation on my part and
probably you could find exceptions or border-line cases) the
C++ standard uses the hyphenated spelling, "string-literal",
in grammar productions and the other one elsewhere.

I think it's pretty general.

There really is a distinction in this case: given something
like:

char c[] = "some text";

, the string in the code is a string-literal, but there are no
string literals (which cannot be modified, and can be shared).
 
