String comparison. Why does Ruby consider this true?

Abder-rahman Ali · Jun 18, 2010

When I try for example to compare the following strings in Ruby, I get
"true".

puts 'Xeo' < 'ball'

When I make 'Xeo' start with a lowercase letter, I get 'false'.

puts 'xeo' < 'ball'

The second statement is clear, but why do I get true when I capitalize
'Xeo'?

Thanks.

Abder-rahman Ali · Jun 18, 2010

The "Learn to Program" book by Chris Pine mentions that computers order
capital letters as coming before lowercase letters. So can this result be
explained by that?

Thanks.

Jonathan Nielsen · Jun 18, 2010

Because the '<' is doing a character-by-character compare on the strings.
As it turns out, 'X' < 'b' is true, while 'x' < 'b' is false. This is
because in the basic character set, the uppercase letters are lower-valued
than lowercase letters. See http://www.asciitable.com/

-Jonathan Nielsen

Kirk Haines · Jun 18, 2010

Uppercase letters come before lowercase letters.

You can look at the implementation in the source (start at
rb_str_cmp()), but if you dig deeply enough, it comes down to the way
the standard C library function memcmp() works. It compares bytes. And
an ASCII 'X' is represented by a smaller value (88) than an ASCII 'b'
(98). So 'Xeo' is less than 'ball'.
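Those byte values can be checked from Ruby itself. This is only a minimal sketch, assuming Ruby 1.9 or later (where String#ord exists); the variable names are illustrative.

```ruby
# Byte values of the first characters being compared.
upper_x = 'X'.ord   # 88
lower_x = 'x'.ord   # 120
lower_b = 'b'.ord   # 98

# The first differing byte decides the result of the comparison.
xeo_capitalized = 'Xeo' < 'ball'   # true, because 88 < 98
xeo_lowercase   = 'xeo' < 'ball'   # false, because 120 > 98

puts upper_x, lower_x, lower_b
puts xeo_capitalized, xeo_lowercase
```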

Kirk Haines
Developer
Engine Yard

Josh Cheek · Jun 18, 2010

Well, this used to be easy to show, but apparently ASCII has been
abandoned and I don't know Unicode, so I have to resort to hacky things like
this to explain it.

$chars = (1..128).inject(Hash.new) { |chars,num| chars[num.chr] = num ; chars }

def to_number_array(str)
  str.split(//).map { |char| $chars[char] }
end

to_number_array 'Xeo' # => [88, 101, 111]
to_number_array 'xeo' # => [120, 101, 111]
to_number_array 'ball' # => [98, 97, 108, 108]
to_number_array 'ABC' # => [65, 66, 67]
to_number_array 'abc' # => [97, 98, 99]

In this case, $chars is a hash that takes a one-character string and
returns its ASCII value. So the method receives a String and returns an
array in which each element is the ASCII value of the corresponding character.

Then to understand why one would be less than or greater than the other, go
through index by index, comparing the number in that index. If the two
strings (or in this case, their array representations that I made) have
different numbers, then whichever has the smaller number is considered less
than the other. If you run out of indexes on one of them, then that one
comes before the other. If you run out of indexes on them both
simultaneously, then they are equal.
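The index-by-index procedure described above can be sketched as follows. Note that compare_by_bytes is a hypothetical helper written for illustration, not something Ruby provides, and it uses unpack('C*') so that it looks at raw bytes.

```ruby
# Hypothetical helper (not part of Ruby): compares two strings by walking
# their byte arrays index by index, as described above.
# Returns -1, 0 or 1, like String#<=>.
def compare_by_bytes(left, right)
  a = left.unpack('C*')
  b = right.unpack('C*')
  shortest = [a.length, b.length].min

  shortest.times do |i|
    return -1 if a[i] < b[i]
    return 1 if a[i] > b[i]
  end

  # Every shared index matched, so the string that ran out first is smaller;
  # if both ran out together, they are equal.
  a.length <=> b.length
end

pairs = [%w[Xeo ball], %w[xeo ball], %w[ball ball], %w[ball balloon]]
pairs.each do |left, right|
  puts "#{left.inspect} vs #{right.inspect}: #{compare_by_bytes(left, right)}"
end
```

For these ASCII examples the helper should agree with what `'Xeo' <=> 'ball'` and friends return.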

Michael Fellinger · Jun 18, 2010

%w[Xeo xeo ball ABC abc].sort.each{|word| p word => word.codepoints.to_a }
{"ABC"=>[65, 66, 67]}
{"Xeo"=>[88, 101, 111]}
{"abc"=>[97, 98, 99]}
{"ball"=>[98, 97, 108, 108]}
{"xeo"=>[120, 101, 111]}
=> ["ABC", "Xeo", "abc", "ball", "xeo"]

--
Michael Fellinger
CTO, The Rubyists, LLC

Xeno Campanoli / Eskimo North and Gmail · Jun 18, 2010

That's an artifact of the old ASCII encoding. Uppercase letters came first,
so they have a lower integer value than lowercase letters.

Josh Cheek · Jun 18, 2010

Thanks, but it doesn't seem to work on 1.8:

RUBY_VERSION # => "1.8.7"

%w[Xeo xeo ball ABC abc].sort.each{|word| p word => word.codepoints.to_a } # =>
# ~> -:3: undefined method `codepoints' for "ABC":String (NoMethodError)
# ~> from -:3:in `each'
# ~> from -:3

And the 1.8 ways to get it don't work on 1.9 (i.e. "a"[0]).

Xeno Campanoli / Eskimo North and Gmail · Jun 18, 2010

I thought Unicode started with ASCII anyway, so I don't think that solves it...

Yes, here:

http://www.tamasoft.co.jp/en/general-info/unicode.html

Michael Fellinger · Jun 19, 2010

%w[Xeo xeo ball ABC abc].sort.each{|word| p word => word.unpack('C*') }
{"ABC"=>[65, 66, 67]}
{"Xeo"=>[88, 101, 111]}
{"abc"=>[97, 98, 99]}
{"ball"=>[98, 97, 108, 108]}
{"xeo"=>[120, 101, 111]}
=> ["ABC", "Xeo", "abc", "ball", "xeo"]

There is always a way to make things work on both, it's just that I
don't care much about 1.8 anymore.

--
Michael Fellinger
CTO, The Rubyists, LLC

Josh Cheek · Jun 19, 2010

Well, a lot of systems still ship with it. Snow Leopard, for example, ships
with 1.8.7. So while this is a legitimate personal decision, I think it
is good to be aware of one's audience. Since Abder-rahman is
having difficulty understanding String comparison, it is probably fair
to assume he isn't experienced enough to understand why the example that is
supposed to help him ends up breaking (if he is on 1.8). That
could be very discouraging for someone new who comes to the ML to get a better
understanding, only to find that the answers given by the people who know
what they are doing won't even run.

Anyway, I really do like your solution ^_^ It is elegant and uniform, thank
you for providing it.

Brian Candler · Jun 21, 2010

Josh said:
Well, this used to be easy to show, but apparently since ascii has been
abandoned, and I don't know unicode, I have to resort to hacky things like
this to explain it.

$chars = (1..128).inject(Hash.new) { |chars,num| chars[num.chr] = num ; chars }

def to_number_array(str)
  str.split(//).map { |char| $chars[char] }
end

to_number_array 'Xeo' # => [88, 101, 111]
to_number_array 'xeo' # => [120, 101, 111]
to_number_array 'ball' # => [98, 97, 108, 108]
to_number_array 'ABC' # => [65, 66, 67]
to_number_array 'abc' # => [97, 98, 99]

Except that this is irrelevant, because even ruby 1.9 does not compare
strings by codepoints. It compares them byte-by-byte using memcmp. See
rb_str_cmp_m() and rb_str_cmp() in string.c

It's a designed-in side-effect of UTF-8 encoding that higher codepoints
sort after lower ones. There is a table at
http://en.wikipedia.org/wiki/UTF-8 under "Description" which illustrates
this.
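A small check of that property, as a sketch only: the characters below are arbitrary picks spanning 1-, 2-, 3- and 4-byte UTF-8 sequences, and Ruby 1.9 or later is assumed for the \u escapes.

```ruby
# Characters listed in increasing codepoint order, chosen only for
# illustration: U+0041, U+007A, U+00DF, U+0100, U+20AC, U+1F600.
chars = ["A", "z", "\u00DF", "\u0100", "\u20AC", "\u{1F600}"]

# Plain sort uses String#<=>, which compares the UTF-8 bytes.
by_bytes = chars.shuffle.sort

# Sorting by the numeric codepoint of each one-character string.
by_codepoints = chars.shuffle.sort_by { |c| c.ord }

puts by_bytes == by_codepoints
```

For UTF-8 strings the two orderings come out the same, which is the designed-in behaviour mentioned above.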

However this does not work for other encodings. Try this for size:
=> true
=> false

Yes: that's the same two Unicode codepoints, but sorting in a different
order. For encodings like UTF-16LE, where the least-significant byte
comes before the most-significant byte, you get an almost arbitrary
ordering.
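The code behind the two results above did not survive in this copy of the post. The following is only a sketch of one pair of codepoints that shows the same true/false flip, not necessarily the pair originally used; it assumes Ruby 1.9 or later for String#encode.

```ruby
low  = "\u00FF"   # U+00FF, UTF-8 bytes C3 BF
high = "\u0100"   # U+0100, UTF-8 bytes C4 80

# In UTF-8 the byte order follows the codepoint order.
utf8_result = low < high

# In UTF-16LE the low byte comes first: FF 00 versus 00 01,
# so the byte-wise comparison goes the other way.
low16  = low.encode("UTF-16LE")
high16 = high.encode("UTF-16LE")
utf16_result = low16 < high16

p utf8_result    # true
p utf16_result   # false
p low16.unpack('C*'), high16.unpack('C*')
```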

Proviso: I tested this with
ruby 1.9.2dev (2009-07-18 trunk 24186) [i686-linux]

ruby 1.9.x string encoding rules are (a) undocumented, and (b) subject
to arbitrary changes between patchlevels, hence YMMV.

Brian Candler · Jun 21, 2010

Michael said:
%w[Xeo xeo ball ABC abc].sort.each{|word| p word => word.unpack('C*') }
{"ABC"=>[65, 66, 67]}
{"Xeo"=>[88, 101, 111]}
{"abc"=>[97, 98, 99]}
{"ball"=>[98, 97, 108, 108]}
{"xeo"=>[120, 101, 111]}
=> ["ABC", "Xeo", "abc", "ball", "xeo"]

There is always a way to make things work on both, it's just that I
don't care much about 1.8 anymore.

That does work the same on both, but it doesn't give codepoints.

$ irb --simple-prompt
>> "groß".unpack("C*")
=> [103, 114, 111, 195, 159]
>> RUBY_VERSION
=> "1.8.6"

$ irb19 --simple-prompt
>> "groß".unpack('C*')
=> [103, 114, 111, 195, 159]
>> "groß".codepoints.to_a
=> [103, 114, 111, 223]
>> RUBY_DESCRIPTION
=> "ruby 1.9.2dev (2009-07-18 trunk 24186) [i686-linux]"
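The same difference can be reproduced in a script. This is a minimal sketch assuming Ruby 1.9 or later; the word is written with a \u escape so the result does not depend on the source file's encoding.

```ruby
word = "gro\u00DF"   # "groß" as a UTF-8 string

# Raw bytes: the final character takes two bytes in UTF-8.
byte_values = word.unpack('C*')

# Codepoints: the final character is the single codepoint U+00DF (223).
codepoint_values = word.codepoints.to_a

p byte_values        # [103, 114, 111, 195, 159]
p codepoint_values   # [103, 114, 111, 223]
```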
