How can I get the first letter of this string?

duc nguyen

Hello, I'm a newbie. How can I get the first
letter of this string: "á is the first letter"?
My solution:
string = "á is the first letter"
str = string[0..1] # [0..1] because á is 2 bytes

But in many cases I don't know the byte size of the first letter.
Any help would be much appreciated.

--
Posted via http://www.ruby-forum.com/.
 
7stud --

You can use the /u flag on a regex to signal UTF-8 matching...but you
didn't mention what encoding your strings are in. The regex would look
like this:

"some string" =~ /^(.)/u
puts $1

You also didn't mention what version of Ruby you are using...Ruby 1.8 or
1.9.
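
As a rough sketch of the regex approach on a newer Ruby (1.9 or later, where each string carries its own encoding), the match can capture both the first character and the remainder. The method name `split_first_char` is made up for this illustration, and the input is assumed to be valid UTF-8.

```ruby
# Sketch only: split_first_char is a hypothetical helper, not part of Ruby.
# Assumes str is valid UTF-8 and Ruby 2.0+ (UTF-8 source literals by default).
def split_first_char(str)
  # /u marks the pattern as UTF-8; /m lets "." also match a newline.
  m = /\A(.)(.*)\z/mu.match(str)
  m ? [m[1], m[2]] : [nil, ""]
end

first, rest = split_first_char("á is the first letter")
puts first  # the first character, however many bytes it occupies
puts rest   # everything after it
```

An empty string has no first character, so the helper returns `nil` and an empty remainder in that case.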
 
Thiago Massa

irb(main):023:0> bacon="chunky"
=> "chunky"
irb(main):024:0> bacon[0]
=> 99
irb(main):025:0> bacon[0].chr
=> "c"

99 is the ASCII code for the letter 'c'.

I think that explains what you want.
 
Michael Edgar

That won't work on Ruby 1.9, as bacon[0] will return the first actual
character. This technique works on both:
bacon = "chunky"
=> "chunky"
bacon[0,1]
=> "c"
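
For comparison, here is a small sketch of calls that do not depend on what `bacon[0]` returns. It assumes Ruby 1.9 or later, where `String#getbyte` and `String#ord` are available.

```ruby
bacon = "chunky"

first_char = bacon[0, 1]       # start index 0, length 1: always a String
first_byte = bacon.getbyte(0)  # the first byte as an Integer
code_point = bacon.ord         # code point of the first character

puts first_char      # c
puts first_byte      # 99
puts code_point      # 99
puts first_byte.chr  # c
```

For pure ASCII text like this, the first byte and the first character coincide; they only diverge for multi-byte characters.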

Michael Edgar
http://carboni.ca/
 
duc nguyen

Thanks for your help!
My strings are in UTF-8 encoding. Ruby version 1.8.7.

I followed your instructions and got the first letter of that string.
But I still can't get the remainder of the string after getting the first letter.

My problem now is converting the UTF-8 string to an ANSI string.
Example: change "âêơíũ" to "aeoiu".
I'm really confused and don't know how to do that.

Regards,

 
pp

Hi, there might be no easy way to work around this. Take a look at the
ASCII table and the Unicode table, and try to write a map function yourself.
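
For the "âêơíũ" to "aeoiu" conversion, a hand-written lookup table is one way to go. The sketch below is only an illustration: `ACCENT_MAP` and `strip_accents` are made-up names, the table covers just a handful of letters, and the code assumes Ruby 1.9 or later with UTF-8 strings (it will not run as-is on 1.8.7).

```ruby
# Illustrative table only; extend it with every accented letter you expect.
ACCENT_MAP = {
  "â" => "a", "ê" => "e", "ơ" => "o", "í" => "i", "ũ" => "u",
  "á" => "a", "đ" => "d"
}.freeze

# Hypothetical helper: replaces each non-ASCII character found in the table
# and leaves any character it does not know about unchanged.
def strip_accents(str)
  str.gsub(/[^[:ascii:]]/) { |ch| ACCENT_MAP.fetch(ch, ch) }
end

puts strip_accents("âêơíũ")  # aeoiu
```

Leaving unknown characters untouched, rather than dropping them, makes gaps in the table easy to spot in the output.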
 
Adam Prescott

The bacon[0,1] technique will not work for this string on Ruby 1.8, because
the character involved is a two-byte character, and before 1.9 there's
nothing but bytes.

s = "á"
s.inspect #=> "\303\241"
s[0] #=> 195
s[0, 1] #=> "\303"
s[0].chr #=> "\303"

Since you're on 1.8.7, you could use

$KCODE = "U" # UTF-8
s.chars.first #=> "á"

Note that if you do not set $KCODE, you will just get "\303".
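
On Ruby 1.9 and later the same byte/character distinction is visible without `$KCODE`, because every string carries its own encoding. A short sketch, assuming Ruby 2.0 or later and a UTF-8 source file:

```ruby
s = "á is the first letter"

first_two_bytes = s.bytes.first(2)  # the raw bytes behind "á"
first_char      = s[0]              # indexing is character-based on 1.9+
rest            = s[1..-1]          # everything after the first character

p first_two_bytes  # [195, 161], i.e. "\303\241" in octal
puts first_char
puts rest
puts s.encoding    # UTF-8
```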
 
duc nguyen

Thanks, all!
I tried all the instructions above and they work fine when I run them from the
command line. But when I put them in a .rb file in my project and
debug, I get a wrong result, and I don't know why.

"á is the first letter" =~ /^(.)(.*)/u
puts "-->#{$1}<--"
puts "-->#{$2}<--"

Output in the command prompt window:
-->á<--
--> is the first letter<--

But in the log file of my project:
I 03/04/2011 09:07:39:169 b0353000 APP| -->√<--
I 03/04/2011 09:07:39:169 b0353000 APP| -->° is the first letter<--

Then I changed the code:

"á is the first letter" =~ /^(..)(.*)/u
puts "-->#{$1}<--"
puts "-->#{$2}<--"

And now it works:
I 03/04/2011 09:11:03:414 b03d5000 APP| -->á<--
I 03/04/2011 09:11:03:414 b03d5000 APP| --> is the first letter<--

My project is a Rhodes project. Rhodes version 2.2.6, Ruby 1.8.7.

 
7stud --

Wow...that's strange: when I run my example, no log file is created. I
wonder why that is???

In other words, you are running some extra code that writes to a file,
and you want us to troubleshoot that code without seeing it? O...kay.
Your statement that writes to the file takes one of the two bytes
making up the á and writes that single byte to the file, which is then
interpreted as gibberish by whatever application you are using to view
the file.
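
One way to reproduce what the log shows, offered as a sketch rather than a diagnosis of the Rhodes code (which is not shown in the thread): match byte by byte, so that `.` captures only the first of the two bytes of "á", then decode each piece with a single-byte encoding. The choice of MacRoman for the viewer is purely an assumption, picked because it maps the bytes 0xC3 and 0xA1 to "√" and "°". Assumes Ruby 2.0 or later.

```ruby
s = "á is the first letter"

# Treat the string as raw bytes and match with a byte-oriented (/n) regex,
# so "." consumes a single byte instead of a whole UTF-8 character.
m = s.b.match(/\A(.)(.*)\z/mn)
first_byte, remainder = m[1], m[2]

# Assumption: the log viewer decodes the bytes as MacRoman.
as_viewed_first = first_byte.dup.force_encoding("macRoman").encode("UTF-8")
as_viewed_rest  = remainder.dup.force_encoding("macRoman").encode("UTF-8")

puts "-->#{as_viewed_first}<--"
puts "-->#{as_viewed_rest}<--"
```

If this is what is happening, it would also explain why `(..)` appeared to fix the problem: two single-byte matches together cover the whole character.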
 
