String comparison. Why does Ruby consider this true?

Abder-rahman Ali

When I compare the following strings in Ruby, for example, I get
"true".

puts 'Xeo' < 'ball'

When I make 'Xeo' start with a lowercase letter, I get "false".

puts 'xeo' < 'ball'

The second statement is clear, but why do I get true when I capitalize
'Xeo'?

Thanks.
 
Abder-rahman Ali

The "Learn to Program" book by Chris Pine mentions that computers order
capital letters before lowercase letters. So can the result be explained
by that?

Thanks.
 
Jonathan Nielsen

Because '<' does a character-by-character comparison of the strings.
As it turns out, 'X' < 'b' is true, while 'x' < 'b' is false. This is
because in ASCII the uppercase letters have lower values than the
lowercase letters. See http://www.asciitable.com/
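
A minimal sketch of how to check this yourself, assuming Ruby 1.9 or later (String#ord is used here and is not available on older versions). The variable names are only for illustration.

```ruby
# The numeric values of the first characters decide both comparisons.
x_upper = 'X'.ord   # 88
x_lower = 'x'.ord   # 120
b_lower = 'b'.ord   # 98

first_result  = 'Xeo' < 'ball'   # true, because 88 < 98
second_result = 'xeo' < 'ball'   # false, because 120 > 98

puts "X=#{x_upper} x=#{x_lower} b=#{b_lower}"
puts first_result
puts second_result
```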

-Jonathan Nielsen
 
Kirk Haines

Uppercase letters come before lowercase letters.

You can look at the implementation in the source (start at
rb_str_cmp()), but if you dig deeply enough, it comes down to the way
the standard C library function memcmp() works. It compares bytes. And
an ASCII 'X' is represented by a smaller value (88) than an ASCII 'b'
(98). So 'Xeo' is less than 'ball'.
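
To see the byte values that the comparison works on, something like the following should do (a sketch, assuming a Ruby version that has String#bytes). Array#<=> also compares element by element, so for these inputs it gives the same answer as the string comparison.

```ruby
# The raw bytes, as memcmp() would see them.
xeo_bytes  = 'Xeo'.bytes.to_a    # [88, 101, 111]
ball_bytes = 'ball'.bytes.to_a   # [98, 97, 108, 108]

# Both comparisons are decided by the first element: 88 < 98.
byte_order   = xeo_bytes <=> ball_bytes   # -1
string_order = 'Xeo' <=> 'ball'           # -1

p xeo_bytes, ball_bytes
p byte_order == string_order
```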


Kirk Haines
Developer
Engine Yard
 
Josh Cheek

Well, this used to be easy to show, but since ASCII has apparently been
abandoned and I don't know Unicode, I have to resort to hacky things like
this to explain it.


$chars = (1..128).inject(Hash.new) { |chars, num| chars[num.chr] = num; chars }

def to_number_array(str)
  str.split(//).map { |char| $chars[char] }
end

to_number_array 'Xeo' # => [88, 101, 111]
to_number_array 'xeo' # => [120, 101, 111]
to_number_array 'ball' # => [98, 97, 108, 108]
to_number_array 'ABC' # => [65, 66, 67]
to_number_array 'abc' # => [97, 98, 99]



In this case, $chars is a hash that takes a one-character string and
returns its ASCII value. So the method receives a String and returns an
array in which each element is the ASCII value of the corresponding
character.

Then to understand why one would be less than or greater than the other, go
through index by index, comparing the number in that index. If the two
strings (or in this case, their array representations that I made) have
different numbers, then whichever has the smaller number is considered less
than the other. If you run out of indexes on one of them, then that one
comes before the other. If you run out of indexes on them both
simultaneously, then they are equal.
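
The rules above can be sketched as a small method. compare_by_bytes is a hypothetical helper written for this explanation, not something Ruby provides; it uses unpack('C*') so that it should behave the same on 1.8 and 1.9.

```ruby
# Hypothetical helper: compare two strings position by position,
# returning -1, 0 or 1 in the same way String#<=> does.
def compare_by_bytes(left, right)
  a = left.unpack('C*')
  b = right.unpack('C*')
  shared = [a.length, b.length].min

  shared.times do |i|
    next if a[i] == b[i]
    return a[i] < b[i] ? -1 : 1   # the first differing position decides
  end

  # Every shared position matched: whichever ran out first comes first.
  a.length <=> b.length
end

p compare_by_bytes('Xeo', 'ball')     # => -1
p compare_by_bytes('xeo', 'ball')     # => 1
p compare_by_bytes('ball', 'ball')    # => 0
p compare_by_bytes('ball', 'ballet')  # => -1
```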
 
Michael Fellinger

%w[Xeo xeo ball ABC abc].sort.each{|word| p word => word.codepoints.to_a }
{"ABC"=>[65, 66, 67]}
{"Xeo"=>[88, 101, 111]}
{"abc"=>[97, 98, 99]}
{"ball"=>[98, 97, 108, 108]}
{"xeo"=>[120, 101, 111]}
=> ["ABC", "Xeo", "abc", "ball", "xeo"]


--
Michael Fellinger
CTO, The Rubyists, LLC
 
Xeno Campanoli / Eskimo North and Gmail

That's an artifact of the old ASCII encoding. Uppercase letters come first,
so they have lower integer values than lowercase letters.
 
Josh Cheek

Thanks, but it doesn't seem to work on 1.8


RUBY_VERSION # => "1.8.7"

%w[Xeo xeo ball ABC abc].sort.each{|word| p word => word.codepoints.to_a } # =>
# ~> -:3: undefined method `codepoints' for "ABC":String (NoMethodError)
# ~> from -:3:in `each'
# ~> from -:3




And the 1.8 ways to get it don't work on 1.9 (e.g. "a"[0], which returns
97 on 1.8 but "a" on 1.9).
 
Xeno Campanoli / Eskimo North and Gmail

I thought Unicode started with ASCII anyway, so I don't think that solves it...

Yes, here:

http://www.tamasoft.co.jp/en/general-info/unicode.html
 
Michael Fellinger

%w[Xeo xeo ball ABC abc].sort.each{|word| p word => word.unpack('C*') }
{"ABC"=>[65, 66, 67]}
{"Xeo"=>[88, 101, 111]}
{"abc"=>[97, 98, 99]}
{"ball"=>[98, 97, 108, 108]}
{"xeo"=>[120, 101, 111]}
=> ["ABC", "Xeo", "abc", "ball", "xeo"]

There is always a way to make things work on both, it's just that I
don't care much about 1.8 anymore.

--
Michael Fellinger
CTO, The Rubyists, LLC
 
Josh Cheek

Well, a lot of systems still ship with it. Snow Leopard, for example, ships
with 1.8.7, so while this is a legitimate personal decision, I think it
is good to be aware of one's audience. Since Abder-rahman is having
difficulty understanding String comparison, it is probably fair to assume
he isn't initiated enough to understand why the example that is supposed
to help him ends up breaking (if he is on 1.8). That could be very
discouraging for someone new: they come to the ML to get a better
understanding, and the answers given by the people who know what they are
doing won't even run.

Anyway, I really do like your solution ^_^ It is elegant and uniform, thank
you for providing it.
 
Brian Candler

Josh said:
Well, this used to be easy to show, but apparently since ascii has been
abandoned, and I don't know unicode, I have to resort to hacky things like
this to explain it.


$chars = (1..128).inject(Hash.new) { |chars, num| chars[num.chr] = num; chars }

def to_number_array(str)
  str.split(//).map { |char| $chars[char] }
end

to_number_array 'Xeo' # => [88, 101, 111]
to_number_array 'xeo' # => [120, 101, 111]
to_number_array 'ball' # => [98, 97, 108, 108]
to_number_array 'ABC' # => [65, 66, 67]
to_number_array 'abc' # => [97, 98, 99]

Except that this is irrelevant, because even ruby 1.9 does not compare
strings by codepoints. It compares them byte-by-byte using memcmp. See
rb_str_cmp_m() and rb_str_cmp() in string.c

It's a designed-in side-effect of UTF-8 encoding that higher codepoints
sort after lower ones. There is a table at
http://en.wikipedia.org/wiki/UTF-8 under "Description" which illustrates
this.
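
A small sketch of that property, assuming Ruby 1.9 or later, where "\u" escapes produce UTF-8 strings. The three characters are arbitrary picks that need one, two and three bytes in UTF-8.

```ruby
# Deliberately out of order: U+20AC (euro sign), U+007A (z), U+00DF (sharp s).
chars = ["\u20AC", "z", "\u00DF"]

codepoints = chars.map { |c| c.codepoints.to_a.first }   # [8364, 122, 223]
utf8_bytes = chars.map { |c| c.bytes.to_a }
# utf8_bytes is [[226, 130, 172], [122], [195, 159]]

sorted_by_codepoint = chars.sort_by { |c| c.codepoints.to_a.first }
sorted_by_bytes     = chars.sort   # String#<=> compares the bytes

# For UTF-8 the two orderings agree.
same_order = (sorted_by_codepoint == sorted_by_bytes)
p same_order
```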

However this does not work for other encodings. Try this for size:
=> true
=> false

Yes: that's the same two unicode codepoints, but sorting in different
order. For encodings like UTF-16LE, where the least-significant byte
comes before the most-significant byte, you get an almost arbitrary
ordering.
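
The effect can be reproduced with a pair such as U+00FF and U+0100. This is an illustrative sketch with codepoints chosen for this note, not necessarily the input that produced the output above, and it assumes Ruby 1.9 or later for String#encode.

```ruby
low  = "\u00FF"   # UTF-8 bytes: C3 BF    UTF-16LE bytes: FF 00
high = "\u0100"   # UTF-8 bytes: C4 80    UTF-16LE bytes: 00 01

# In UTF-8 the first bytes are C3 and C4, so low sorts first.
utf8_result = low < high

# In UTF-16LE the first bytes are FF and 00, so the order flips.
utf16le_result = low.encode('UTF-16LE') < high.encode('UTF-16LE')

p utf8_result      # true
p utf16le_result   # false
```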

Proviso: I tested this with
ruby 1.9.2dev (2009-07-18 trunk 24186) [i686-linux]

ruby 1.9.x string encoding rules are (a) undocumented, and (b) subject
to arbitrary changes between patchlevels, hence YMMV.
 
Brian Candler

Michael said:
%w[Xeo xeo ball ABC abc].sort.each{|word| p word => word.unpack('C*') }
{"ABC"=>[65, 66, 67]}
{"Xeo"=>[88, 101, 111]}
{"abc"=>[97, 98, 99]}
{"ball"=>[98, 97, 108, 108]}
{"xeo"=>[120, 101, 111]}
=> ["ABC", "Xeo", "abc", "ball", "xeo"]

There is always a way to make things work on both, it's just that I
don't care much about 1.8 anymore.

That does work the same on both, but it doesn't give codepoints.

$ irb --simple-prompt
>> "groß".unpack("C*")
=> [103, 114, 111, 195, 159]
>> RUBY_VERSION
=> "1.8.6"

$ irb19 --simple-prompt
>> "groß".unpack('C*')
=> [103, 114, 111, 195, 159]
>> "groß".codepoints.to_a
=> [103, 114, 111, 223]
>> RUBY_DESCRIPTION
=> "ruby 1.9.2dev (2009-07-18 trunk 24186) [i686-linux]"
 
