utf8 string with reverse question

Discussion in 'Ruby' started by Mitch Tishmack, Aug 15, 2005.

  1. Hi everyone, long time lurker here. Even longer Ruby user.

    I have a minor problem with a utf8 string.

    In short I see this behavior:

    "Stuhlu".sub(/u/,'ü')
    => "Stühlu"
    "Stuhlu".reverse.sub(/u/,'ü').reverse
    => "Stuhl\274\303"
    "Stuhlu".reverse.sub(/u/,'ü').split(//).reverse.join
    => "Stuhlü"

    The general goal is to sub the final "u" in that word with an
    umlauted version and not the first. I started irb with -Ku so that I
    get utf8 support in all things ruby. But the behavior of reverse on
    the substituted string is really baffling me.

    Does anyone know the reason for the weirdness of reverse after the
    sub? The last version was a hack to get things to just work. Am I
    missing a Regexp option that would make the final match work? I don't
    normally look for a final match to substitute on. And reverse seemed
    the most logical choice for a solution.

    Any help would be appreciated!

    Thanks,
    Mitch
     
    Mitch Tishmack, Aug 15, 2005
    #1

  2. Mitch Tishmack

    Pit Capitain Guest

    Mitch Tishmack wrote:
    > In short I see this behavior:
    >
    > "Stuhlu".sub(/u/,'ü')
    > => "Stühlu"
    > "Stuhlu".reverse.sub(/u/,'ü').reverse
    > => "Stuhl\274\303"
    > "Stuhlu".reverse.sub(/u/,'ü').split(//).reverse.join
    > => "Stuhlü"
    >
    > The general goal is to sub the final "u" in that word with an umlauted
    > version and not the first.
    > ...
    > Am I missing a
    > Regexp option that would make the final match work?


    Can't help with the reverse behaviour, but if you want to substitute
    single letters only, the following regexp should work:

    "Stuhlu".sub(/u(?=[^u]*$)/,'ü')

    Regards,
    Pit
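
    A minimal sketch of two ways to replace only the last occurrence,
    assuming Ruby 1.9 or later, where strings carry their encoding. The
    names umlaut_last_u and replace_last are hypothetical helpers, not
    something from this thread. The first variant is the lookahead regexp
    above with \z in place of $, since $ in Ruby also matches at the end of
    each line inside a multi-line string.

```ruby
# encoding: utf-8

# Variant 1: lookahead. Match a "u" only if no other "u" follows it
# before the end of the string (\z), so only the last one is replaced.
def umlaut_last_u(word)
  word.sub(/u(?=[^u]*\z)/, "ü")
end

# Variant 2 (hypothetical helper): locate the last occurrence with rindex
# and splice the replacement in. Also works for multi-character targets.
def replace_last(str, target, replacement)
  idx = str.rindex(target)
  return str.dup if idx.nil?          # nothing to replace
  str[0...idx] + replacement + str[(idx + target.length)..-1]
end

puts umlaut_last_u("Stuhlu")              # => Stuhlü
puts replace_last("Kuhlstuhl", "u", "ü")  # => Kuhlstühl
```

    The rindex variant avoids the "single letters only" restriction, at the
    cost of a few more lines.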
     
    Pit Capitain, Aug 15, 2005
    #2

  3. On 15/08/05, Mitch Tishmack <> wrote:
    > Hi everyone, long time lurker here. Even longer Ruby user.
    >
    > I have a minor problem with a utf8 string.
    >
    > In short I see this behavior:
    >
    > "Stuhlu".sub(/u/,'ü')
    > => "Stühlu"
    > "Stuhlu".reverse.sub(/u/,'ü').reverse
    > => "Stuhl\274\303"


    It seems like reverse is acting on the string as a byte-array. That
    means you are reversing the two byte character ü = '\303\274' into the
    non-character '\274\303' when reversing the string 'ülhutS'
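
    To see this at the byte level, here is a small sketch. It assumes a
    current Ruby (1.9 or later), where String#reverse already works per
    character, so the byte-wise reversal seen in this thread is imitated
    with bytes and pack. The helper names octal_bytes and bytewise_reverse
    are made up for illustration.

```ruby
# encoding: utf-8

# The bytes of a string, written in octal like the "\303\274" above.
def octal_bytes(str)
  str.bytes.map { |b| b.to_s(8) }
end

# Imitates a reverse that treats the string as a plain byte array.
# The result is a binary (ASCII-8BIT) string.
def bytewise_reverse(str)
  str.bytes.reverse.pack("C*")
end

u_umlaut = "\u00FC"                          # "ü"
p octal_bytes(u_umlaut)                      # => ["303", "274"]

substituted   = "ulhutS".sub(/u/, u_umlaut)  # "ülhutS"
byte_reversed = bytewise_reverse(substituted)
p octal_bytes(byte_reversed).last(2)         # => ["274", "303"]

# The swapped pair is no longer valid UTF-8.
p byte_reversed.dup.force_encoding("UTF-8").valid_encoding?  # => false

# Reversing character by character keeps the two bytes of "ü" together.
p substituted.each_char.to_a.reverse.join == "Stuhl\u00FC"   # => true
```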

    Are you trying to build a german -> pig-türkisch translator ;)

    regards,

    Brian

    > "Stuhlu".reverse.sub(/u/,'ü').split(//).reverse.join
    > => "Stuhlü"
    >
    > The general goal is to sub the final "u" in that word with an
    > umlauted version and not the first. I started irb with -Ku so that I
    > get utf8 support in all things ruby. But the behavior of reverse on
    > the substituted string is really baffling me.
    >
    > Does anyone know the reason for the weirdness of reverse after the
    > sub? The last version was a hack to get things to just work. Am I
    > missing a Regexp option that would make the final match work? I don't
    > normally look for a final match to substitute on. And reverse seemed
    > the most logical choice for a solution.
    >
    > Any help would be appreciated!
    >
    > Thanks,
    > Mitch
    >



    --
    http://ruby.brian-schroeder.de/

    Stringed instrument chords: http://chordlist.brian-schroeder.de/
     
    Brian Schröder, Aug 15, 2005
    #3
  4. Mitch Tishmack

    Mitch Guest

    On 8/15/05, Brian Schröder <> wrote:
    > It seems like reverse is acting on the string as a byte-array. That
    > means you are reversing the two byte character ü = '\303\274' into the
    > non-character '\274\303' when reversing the string 'ülhutS'


    That makes sense, but seems like incorrect behavior for this instance.

    >
    > Are you trying to build a german -> pig-türkisch translator ;)


    Not quite :), I just picked a random German word and appended another u
    to it for testing. I suppose my example could have been Kuhlstuhl. I
    am actually working on a German noun/verb helper; all it will do is
    inflect the verb or noun according to the proper grammatical rules, i.e.
    der Fisch -> die Fische etc... der Stuhl -> die Stühle

    I was just worrying about compound nouns where the final noun is what
    is inflected.
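
    A rough sketch of that compound-noun case, assuming only the simplest
    pattern (umlaut the last a, o, u or au in the word, then append -e).
    The names UMLAUTS and umlaut_plural are hypothetical, and real German
    plurals need a per-noun lexicon rather than a single rule; capitalised
    initial vowels are not handled here either.

```ruby
# encoding: utf-8

# Hypothetical mapping for the vowels that can take an umlaut.
UMLAUTS = { "au" => "äu", "a" => "ä", "o" => "ö", "u" => "ü" }.freeze

# Naive rule: umlaut the last a/o/u/au in the word and append "e".
# Only the last such vowel is touched, so in a compound noun it is
# the final noun that changes. Words without a/o/u just get the "e".
def umlaut_plural(noun)
  noun.sub(/(?:au|[aou])(?=[^aou]*\z)/, UMLAUTS) + "e"
end

puts umlaut_plural("Stuhl")      # => Stühle
puts umlaut_plural("Kuhlstuhl")  # => Kuhlstühle
puts umlaut_plural("Fisch")      # => Fische
```

    Passing a Hash as the second argument to sub looks up the matched text
    in the hash, which keeps the au and single-vowel cases in one call.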

    Yes I DO have too much time on my hands right now.

    Cheers,
    Mitch
     
    Mitch, Aug 15, 2005
    #4
  5. Mitch Tishmack

    Mitch Guest

    Aha, I knew there was something I was missing in Regexp. Thanks for
    confirming. :)

    I will get back to my program after work today.

    Thanks,
    Mitch

    On 8/15/05, Pit Capitain <> wrote:
    > Mitch Tishmack wrote:
    > > In short I see this behavior:
    > >
    > > "Stuhlu".sub(/u/,'ü')
    > > => "Stühlu"
    > > "Stuhlu".reverse.sub(/u/,'ü').reverse
    > > => "Stuhl\274\303"
    > > "Stuhlu".reverse.sub(/u/,'ü').split(//).reverse.join
    > > => "Stuhlü"
    > >
    > > The general goal is to sub the final "u" in that word with an umlauted
    > > version and not the first.
    > > ...
    > > Am I missing a
    > > Regexp option that would make the final match work?
    >
    > Can't help with the reverse behaviour, but if you want to substitute
    > single letters only the following regexp should work:
    >
    > "Stuhlu".sub(/u(?=[^u]*$)/,'ü')
    >
    > Regards,
    > Pit
    >
     
    Mitch, Aug 15, 2005
    #5
  6. Mitch Tishmack <> wrote:

    > I have a minor problem with a utf8 string.
    >
    > In short I see this behavior:
    >
    > "Stuhlu".sub(/u/,'ü')
    > => "Stühlu"
    > "Stuhlu".reverse.sub(/u/,'ü').reverse
    > => "Stuhl\274\303"
    > "Stuhlu".reverse.sub(/u/,'ü').split(//).reverse.join
    > => "Stuhlü"


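    You could do this:

    $KCODE = 'u'  # Ruby 1.8: treat strings and regexps as UTF-8
    class String
      # Reverse character by character instead of byte by byte.
      # /m lets "." match newlines too, so they are not dropped.
      def reverse; self.scan(/./m).reverse.join end
    end

    "Stuhlü".reverse #=> "ülhutS"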

    found on
    <http://redhanded.hobix.com/inspect/closingInOnUnicodeWithJcode.html>

    -Levin
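
    For anyone reading this on a newer Ruby, a hedged sketch of the same
    idea without reopening String: Ruby 1.9 and later ignore $KCODE, and
    the built-in String#reverse already works per character there. The
    helper names reverse_chars and reverse_graphemes are made up for
    illustration; the second one assumes Ruby 2.5 or later for
    String#grapheme_clusters.

```ruby
# encoding: utf-8

# Character-wise reverse, same technique as the scan-based version above.
def reverse_chars(str)
  str.scan(/./m).reverse.join
end

# Variant that keeps a base letter and its combining marks together,
# e.g. "u" followed by U+0308 (combining diaeresis).
def reverse_graphemes(str)
  str.grapheme_clusters.reverse.join
end

puts reverse_chars("Stuhl\u00FC")                     # => ülhutS
puts "Stuhl\u00FC".reverse == reverse_chars("Stuhl\u00FC")  # => true

decomposed = "Stuhlu\u0308"   # "ü" written as two code points
puts reverse_graphemes(decomposed) == "u\u0308lhutS"  # => true
```

    The grapheme variant matters only when the input may contain decomposed
    characters; for precomposed "ü" both helpers give the same result.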
     
    Levin Alexander, Aug 15, 2005
    #6