evaluating strings

temp34k45k

I need to evaluate two strings for their order.

The strings contain letters (A through Z, upper case only), digits
(0-9), and the decimal point (.).

I need an order like the following list (it's just an example of the
order):

0
1
2
3
A.0
A.1
A.2
B
C
Z
AA.1
AA.4
AA
AB
AZ

Currently I am testing two strings:

if (string1 < string2)

It seemed to work OK until I encountered (AB < Z) evaluating as TRUE. It
came out backwards, with Z greater than AB. I also found it true that:

(AB > A)

but

(AB < B)
(AB < C)

Are the single-character strings being evaluated as "A " with a space
after the letter? That's the only explanation that seems logical. If
so, how do I make it evaluate correctly?

Thanks in advance.

JC
 
Kai-Uwe Bux

a) Post code that shows the problem. Without code, we can only guess.

b) Here is my guess: did you use char* to represent the strings in your
program? Use std::string instead. If you use operator< on char*, it compares
the pointer values, not the characters, so it will not give you the
lexicographic ordering; for pointers into unrelated arrays the result is
unspecified.
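
To illustrate the guess above, here is a minimal sketch of comparing contents rather than addresses. The helper names c_string_less and std_string_less are made up for this example; std::strcmp and std::string's operator< are the standard facilities.

```cpp
#include <cstring>
#include <string>

// Compares the characters of two C strings. Applying operator< to the
// char* values themselves would compare addresses instead.
bool c_string_less(const char* a, const char* b) {
    return std::strcmp(a, b) < 0;
}

// std::string's operator< compares the contents lexicographically.
bool std_string_less(const std::string& a, const std::string& b) {
    return a < b;
}
```

With either helper, "AB" sorts before "Z" and "A" sorts before "AB", which is the lexicographic behaviour, not the order asked for in the question.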


Best

Kai-Uwe Bux
 
mlimber

Strings are compared lexicographically (see
http://www.sgi.com/tech/stl/lexicographical_compare.html). If that is
not what you want, implement your own comparison function.
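
As a rough illustration of what lexicographic ordering does to keys like the ones in the question, the sketch below sorts a vector with std::lexicographical_compare as the comparison. The function name sorted_lexicographically is made up for this example, and the stated result assumes an ASCII-compatible character set, where '.' sorts before the digits and the digits before the letters.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns the keys sorted the same way std::string's operator< orders them.
std::vector<std::string> sorted_lexicographically(std::vector<std::string> keys) {
    std::sort(keys.begin(), keys.end(),
              [](const std::string& a, const std::string& b) {
                  return std::lexicographical_compare(a.begin(), a.end(),
                                                      b.begin(), b.end());
              });
    return keys;
}
```

Given Z, AB, AA, B, A.1, 3, AA.1 this yields 3, A.1, AA, AA.1, AB, B, Z: every key starting with 'A' lands before "B", which is why the default comparison cannot produce the order requested.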

Cheers! --M
 
Gary Wessle

temp34k45k said:
(AB > A)

but

(AB < B)
(AB < C)

( ((int) "A" + (int) "B") < (int) "B" )
 
Kai-Uwe Bux

Gary said:
( ((int) "A" + (int) "B") < (int) "B" )

Do you realize that this code does the following:

a) convert a pointer to some preallocated string literal "A" to an int.
(this int now contains an implementation-defined value that may be
related to the address of the char A in the string literal.)
b) add to this an int obtained in the same way from the address of the B
in the second string literal.
c) compare this to an int obtained from the address of the B in some other
string literal (and in fact, the compiler is free to use a third
literal but also free to reuse the second).

The result is devoid of any meaning.
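
For contrast, here is a small sketch of working with character values instead of addresses. first_char_code is a hypothetical helper, not something from the thread; the point is only that indexing a string literal yields a char, whereas casting the literal itself converts a pointer.

```cpp
// A string literal such as "A" is an array of const char: 'A' followed by
// the terminating '\0'. Casting it to int converts its address, not 'A'.
// Indexing it yields the character, whose integer code can be compared.
int first_char_code(const char* s) {
    return static_cast<unsigned char>(s[0]);
}
```

Even with real character codes, adding the code of 'A' to the code of 'B' and comparing the sum with the code of 'B' says nothing about how "AB" orders against "B"; string comparison does not add character values.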


Best

Kai-Uwe Bux
 
Howard

temp34k45k said:
(AB > A)

Right. The first characters are identical ('A'), but the string "AB" has
more characters after that, and "A" does not, so "A" must come first. Like
"admit" and "admittance" in the dictionary, where "admit" comes first.

but

(AB < B)
(AB < C)

Right, because in the first case 'A' is less than 'B', and in the second
case because 'A' is less than 'C'. No need to look further.

are the single characters in the strings being evaluated as "A "
with a space after the letter? That's the only thing that's
logical.

No, that's not what happens. Looking at the strings "Z" and "AB", the first
character is compared first. The value for the char 'A' is less than for
'Z', so we know already that the string "AB" must come first. No need to
compare further. In the dictionary, the word "aardvark" comes before the
word "up", right? Just like that, "AB" comes before "Z".

The length of the string is only an issue when comparing two strings that
are otherwise identical up to the point where one of them ends. The shorter
one comes first. But ONLY if they're identical up to the end of the shorter
one. That's why "AA" comes before "AAAA". They're identical up to the
second 'A', but then "AA" ends, while "AAAA" has two more characters to go,
so it comes later in the order (normally).
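
The two rules just described (the first differing character decides; otherwise the shorter string comes first) can be sketched as a hand-written loop. lexicographic_less is a made-up name, and this is only an illustration of the idea, not the library's actual implementation.

```cpp
#include <cstddef>
#include <string>

// Hand-written lexicographic "less than". For the characters used in this
// thread (A-Z, 0-9, '.') it agrees with std::string's operator<.
bool lexicographic_less(const std::string& a, const std::string& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return a[i] < b[i];  // the first difference decides everything
        }
    }
    // One string is a prefix of the other (or they are equal):
    // the shorter one comes first.
    return a.size() < b.size();
}
```

Calling lexicographic_less("AB", "Z") returns true after looking at only the first character, while lexicographic_less("AA", "AAAA") falls through the loop and is decided by length.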

If so, how do I make it evaluate correctly?

It is doing it "correctly". If you want it to do things another way, then
you have to write a comparison function yourself, with its own set of rules.

But those rules look pretty odd to me. You're putting "AA.1" before
"AA", whereas (I'm assuming) "AAB" would come AFTER "AA". This makes the
'.' character special, and makes your job of parsing more difficult, because
you need to handle different characters with different code, and not just
compare character-by-character until there's a difference or until one or
both strings end, as is usually done.

-Howard
 
temp34k45k

I wrote my own rules and it works very simply. Shorter strings come
first, because the only time a . is encountered is when both strings to
compare have a . in them. So if I compare a single-char string with a
multiple-char string, the single will be less than the multiple. I didn't
know that it was only using the first A of the AB when comparing to Z. I
thought it was using the ASCII value of B appended to A (065066) to
compare to Z (090), not 65 < 90. Thanks for the input on string
comparisons.
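
The poster's actual code is not shown, so the following is only a guess at what "shorter strings come first" might look like. The name shorter_first is made up, and the sketch assumes, as stated above, that a key containing a '.' is never compared with one that lacks it.

```cpp
#include <cstddef>
#include <string>
#include <tuple>

// Order by length first, then character by character for equal lengths.
bool shorter_first(const std::string& a, const std::string& b) {
    const std::size_t la = a.size();
    const std::size_t lb = b.size();
    // std::tie builds tuples of references; tuples compare element by
    // element, so the length decides first and the contents break ties.
    return std::tie(la, a) < std::tie(lb, b);
}
```

With this rule "Z" is less than "AB", and "AA" is less than "AB". It would not place "AA.1" before "AA", which is fine only as long as that comparison never occurs.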

Grant
 
