What is String#ord?

Xavier Noria

Ruby 1.9 docs for String#ord say:

Return the Integer ordinal of a one-character string.

What does that mean? Consider, for example:

"×".ord # => 215
"×".bytes.to_a # => [195, 151]

-- fxn
 
Xavier Noria

Trial and error suggests it is the code of the character in the
encoding of the string:

euro = "\u20AC"

euro.ord.to_s(16) # => "20ac"
euro.encode("iso-8859-15").ord.to_s(16) # => "a4"
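A small script along the same lines prints the code #ord reports next to the raw bytes, for the same character in three encodings. It is a sketch assuming Ruby 1.9 or later; the Windows-1252 row is an extra case not in the original post.

```ruby
# The same character (EURO SIGN, U+20AC) tagged with three different encodings.
# #ord reports the character's code in whichever encoding the receiver carries;
# #bytes shows the raw bytes used to store it. (Windows-1252 is an added case.)
euro = "\u20AC"

{
  "UTF-8"        => euro,
  "ISO-8859-15"  => euro.encode("ISO-8859-15"),
  "Windows-1252" => euro.encode("Windows-1252")
}.each do |name, str|
  printf("%-12s ord=0x%X bytes=%p\n", name, str.ord, str.bytes.to_a)
end
```

On MRI this should report 0x20AC, 0xA4 and 0x80 respectively, while the byte lists differ in length: the three-byte UTF-8 sequence [226, 130, 172] against a single byte in each of the two single-byte encodings.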

The source code suggests the same thing:

VALUE
rb_str_ord(VALUE s)
{
    unsigned int c;

    /* decode the first character of s using s's own encoding */
    c = rb_enc_codepoint(RSTRING_PTR(s), RSTRING_END(s), STR_ENC_GET(s));
    return UINT2NUM(c);
}
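rb_str_ord hands the start and end of the string to rb_enc_codepoint, which decodes a single character. The consequences can be checked from Ruby with a quick sketch; try_ord is a hypothetical helper that only rescues the error so every case prints.

```ruby
# try_ord is a hypothetical helper for this demo, not part of Ruby:
# it returns str.ord, or a description of the ArgumentError if one is raised.
def try_ord(str)
  str.ord
rescue ArgumentError => e
  "ArgumentError: #{e.message}"
end

p try_ord("abc")        # only the first character is decoded
p try_ord("\u20ACuro")  # first character is multibyte in UTF-8
p try_ord("")           # nothing to decode
# a lone 0xFF byte is not valid UTF-8
p try_ord([0xFF].pack("C").force_encoding(Encoding::UTF_8))
```

The first two calls should give 97 and 8364 (0x20AC); the last two should each report an ArgumentError instead of returning a number.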
 
Benoit Daloze

p "×".ord # => 215
p "×".bytes.to_a # => [195, 151]
p "×".encoding # => #<Encoding:UTF-8>
p "×".codepoints.to_a # => [215]

In UTF-8 (and Unicode in general), one byte is not necessarily one
character.
A codepoint represents a character ;)

So you can think of ord as codepoints[0], and that number of course depends
on the String's encoding.
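To see where 215 comes from, the two UTF-8 bytes can be decoded by hand. This minimal sketch covers only the two-byte form (110xxxxx 10yyyyyy); decode_two_byte_utf8 is a hypothetical helper, not how MRI implements #ord. It uses each_codepoint.first rather than codepoints[0] so it runs the same whether #codepoints returns an Enumerator (1.9) or an Array (2.0 and later).

```ruby
# Hypothetical helper: decodes one two-byte UTF-8 sequence 110xxxxx 10yyyyyy
# into its codepoint xxxxxyyyyyy. Illustration only, not MRI's implementation.
def decode_two_byte_utf8(b1, b2)
  raise ArgumentError, "not a two-byte lead byte" unless (b1 & 0xE0) == 0xC0
  raise ArgumentError, "not a continuation byte" unless (b2 & 0xC0) == 0x80
  ((b1 & 0x1F) << 6) | (b2 & 0x3F)  # 5 payload bits + 6 payload bits
end

times = "\u00D7"                    # MULTIPLICATION SIGN
b1, b2 = times.bytes.to_a           # => [195, 151]
p decode_two_byte_utf8(b1, b2)      # => 215
p times.ord                         # => 215
p times.each_codepoint.first        # => 215
```

Concretely, 195 is 0b11000011, whose low five bits are 3, and 151 is 0b10010111, whose low six bits are 23; (3 << 6) | 23 is 215.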

Regards,
B.D.
 
Xavier Noria

Yes, of course; a posteriori that's the only thing that makes sense. I
was in a different context and the doc was not clear enough for me.

Perhaps I'll send a patch to define #ord in terms of the code/codepoint
in the string's character encoding, instead of that bare "ordinal".
 
