utf8 string with reverse question

Mitch Tishmack

Hi everyone, long time lurker here. Even longer Ruby user.

I have a minor problem with a utf8 string.

In short I see this behavior:

"Stuhlu".sub(/u/,'ü')
=> "Stühlu"
"Stuhlu".reverse.sub(/u/,'ü').reverse
=> "Stuhl\274\303"
"Stuhlu".reverse.sub(/u/,'ü').split(//).reverse.join
=> "Stuhlü"

The general goal is to sub the final "u" in that word with an
umlauted version and not the first. I started irb with -Ku so that I
get UTF-8 support in all things Ruby. But the behavior of reverse on
the substituted string is really baffling me.

Does anyone know the reason for the weirdness of reverse after the
sub? The last version was a hack to get things to just work. Am I
missing a Regexp option that would make the final match work? I don't
normally look for a final match to substitute on, and reverse seemed
the most logical choice for a solution.

Any help would be appreciated!

Thanks,
Mitch
 
Pit Capitain

Mitch said:
In short I see this behavior:

"Stuhlu".sub(/u/,'ü')
=> "Stühlu"
"Stuhlu".reverse.sub(/u/,'ü').reverse
=> "Stuhl\274\303"
"Stuhlu".reverse.sub(/u/,'ü').split(//).reverse.join
=> "Stuhlü"

The general goal is to sub the final "u" in that word with an umlauted
version and not the first.
...
Am I missing a Regexp option that would make the final match work?

Can't help with the reverse behaviour, but if you want to substitute
single letters only, the following regexp should work:

"Stuhlu".sub(/u(?=[^u]*$)/,'ü')

Regards,
Pit
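
For readers following along, the lookahead regexp suggested above can be sketched as a small helper. This is only an illustration: the method name `umlaut_last_u` is hypothetical and not from the thread, and it uses `\z` instead of `$` on the assumption that the input is a single word rather than multi-line text.

```ruby
# Hypothetical helper (name is an assumption, not from the thread):
# replace only the last "u" in a word with "ü".
def umlaut_last_u(word)
  # Match a "u" that is followed only by non-"u" characters up to the
  # end of the string. \z anchors at the very end of the string,
  # whereas $ in Ruby also matches at the end of each line.
  word.sub(/u(?=[^u]*\z)/, "ü")
end

puts umlaut_last_u("Stuhlu")
puts umlaut_last_u("Kuhlstuhl")
```

A word without any "u" is returned unchanged, since `sub` finds no match.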
 
Brian Schröder

Mitch said:
Hi everyone, long time lurker here. Even longer Ruby user.

I have a minor problem with a utf8 string.

In short I see this behavior:

"Stuhlu".sub(/u/,'ü')
=> "Stühlu"
"Stuhlu".reverse.sub(/u/,'ü').reverse
=> "Stuhl\274\303"

It seems like reverse is acting on the string as a byte-array. That
means you are reversing the two-byte character ü = '\303\274' into the
non-character '\274\303' when reversing the string 'ühlutS'.
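
A minimal sketch of that byte-level effect, for illustration only. It assumes a current Ruby (1.9 or later), where `String#reverse` is already character-aware, so the byte reversal from the thread is simulated explicitly; the variable names are made up for this example.

```ruby
# Sketch for Ruby 1.9+; the byte-wise reversal is simulated by hand
# because String#reverse no longer behaves this way on UTF-8 strings.
word = "Stuhlü"

# The UTF-8 bytes of the word: "ü" is the two-byte sequence
# 0xC3 0xBC, i.e. octal \303 \274.
utf8_bytes = word.bytes

# Reversing byte by byte swaps the two halves of "ü", leaving a
# sequence that is no longer valid UTF-8.
byte_reversed = utf8_bytes.reverse.pack("C*").force_encoding("UTF-8")

# Reversing character by character keeps "ü" intact.
char_reversed = word.chars.reverse.join

puts utf8_bytes.inspect
puts byte_reversed.valid_encoding?
puts char_reversed
```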

Are you trying to build a German -> pig-türkisch translator ;)

regards,

Brian


--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/
 
Mitch

Brian said:
It seems like reverse is acting on the string as a byte-array. That
means you are reversing the two-byte character ü = '\303\274' into the
non-character '\274\303' when reversing the string 'ühlutS'

That makes sense, but seems like incorrect behavior for this instance.

Brian said:
Are you trying to build a German -> pig-türkisch translator ;)

Not quite :), I just picked a random German word and appended another u
to it for testing. I suppose my example could have been Kuhlstuhl. I
am actually working on a German Noun/Verb helper; all it will do is
conjugate the verb/noun according to proper grammatical rules, i.e. der
Fisch -> die Fische etc... der Stuhl -> die Stühle

I was just worrying about compound nouns where the final noun is what
is conjugated.

Yes I DO have too much time on my hands right now.

Cheers,
Mitch
 
Mitch

Aha, I knew there was something I was missing in Regexp. Thanks for
confirming. :)

I will get back to my program after work today.

Thanks,
Mitch

 
Levin Alexander

Mitch Tishmack said:
I have a minor problem with a utf8 string.

In short I see this behavior:

"Stuhlu".sub(/u/,'ü')
=> "Stühlu"
"Stuhlu".reverse.sub(/u/,'ü').reverse
=> "Stuhl\274\303"
"Stuhlu".reverse.sub(/u/,'ü').split(//).reverse.join
=> "Stuhlü"

You could do this:

$KCODE = 'u'
class String
  def reverse; self.scan(/./).reverse.join end
end

"Stuhlü".reverse #=> "ülhutS"

Found on
<http://redhanded.hobix.com/inspect/closingInOnUnicodeWithJcode.html>

-Levin
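
As a rough sketch of the same scan-and-join idea without reopening String: the helper name `reverse_chars` is hypothetical, and the code assumes Ruby 1.9 or later, where `$KCODE` is no longer needed and `String#reverse` itself already reverses UTF-8 strings by character. The `/m` flag is an addition so that newlines are not silently dropped by `scan`.

```ruby
# Hypothetical helper (name is an assumption); leaves String untouched.
def reverse_chars(str)
  # /./m lets "." match newlines as well, so every character of the
  # input is collected before the list is reversed and rejoined.
  str.scan(/./m).reverse.join
end

puts reverse_chars("Stuhlü")
```

Keeping this as a standalone method avoids changing the behavior of `reverse` for every other string in the program.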
 
