How can I get a character, given its Unicode index?

Discussion in 'Perl Misc' started by Raymundo, Aug 30, 2009.

  1. Raymundo

    Raymundo Guest

    Hello,

    First of all, I'm sorry that my English is not good.


    To represent a Unicode character in a string or in a regexp, I can use
    "\x{hex}" notation.

    my $char = "\x{AC00}";
    # $char = "가" -- a Korean character, pronounced "GA"
    (I'm not sure you can see this Korean character in your browser.
    Please tell me if you can't)


    However, it seems that this representation works only when it is hard-
    coded. That means, I can't use a variable for the hex value:

    my $index = "AC00";
    my $char = "\x{$index}"; # This doesn't work.
    print length($char),"\n";
    print "[$char]\n";

    > ./test.pl

    1 -- $char has one character but...
    [] -- that character is not "가" (GA). It isn't even a printable
    character.
    (In fact, $char seems to be the null character "\0". I found this by
    redirecting the output into a file and viewing the file in a hex editor.)



    Anyway, I tried several variations, including double quotes, single
    quotes, the s/// operator, etc.

    Finally I found the code that works:

    (code)
    #!/usr/bin/perl

    my $index = "AC00";
    my $char = eval( "\"\\x{$index}\"" );
    print length($char),"\n";
    print "[$char]\n";

    (output)
    > ./test.pl

    1
    Wide character in print at ./test.pl line 6.
    [가]
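The "Wide character in print" warning above appears because STDOUT has no encoding layer, so Perl does not know how to write a character above 255. A minimal sketch of one way to avoid it, assuming the terminal expects UTF-8 (that assumption is not stated in the post):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the terminal expects UTF-8. With this layer in place, print
# encodes characters above 255 instead of warning about them.
binmode(STDOUT, ':encoding(UTF-8)');

my $char = "\x{AC00}";        # literal code point, fixed when the code is compiled
print length($char), "\n";    # length() counts characters, not bytes
print "[$char]\n";
```

The string itself is unchanged by the layer; only the bytes written to the terminal differ.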

    I had to make a string that consists of
    double quote " (it must be quoted with backslash)
    backslash \ (quoted)
    x
    brace {
    Unicode index
    brace }
    double quote " (quoted)
    Then I have to eval that string... I think this is too complicated.


    I think there may be a better way to do this. I found that the
    Unicode::Char module provides a u() method:

    my $u = Unicode::Char->new();
    my $char = $u->u('AC00'); # u() returns the character at Unicode index AC00
    ( http://search.cpan.org/~dankogai/Unicode-Char-0.02/lib/Unicode/Char.pm )


    But I still wonder if there is a Perl built-in function or standard
    module that does the same thing. I want to know what the most popular
    way is.


    Thanks.
    G.Y.Park from South Korea.
     
    Raymundo, Aug 30, 2009
    #1

  2. Klaus

    Klaus Guest

    On 30 Aug, 18:47, Raymundo <> wrote:
    > To represent a Unicode character in a string or in a regexp, I can use
    > "\x{hex}" notation.
    >
    > my $char = "\x{AC00}";
    > # $char = "가" -- a Korean character, pronounced "GA"


    [...]

    > However, it seems that this representation works only when it is hard-
    > coded. That means, I can't use a variable for the hex value:
    >
    > my $index = "AC00";
    > my $char = "\x{$index}";   # This doesn't work.


    [...]

    > Finally I found the code that works:


    [...]

    > my $index = "AC00";
    > my $char = eval( "\"\\x{$index}\"" );


    [...]

    > Then I have to eval that string... This is, I think, so complicated.
    >
    > I think there may be a better way to do this. I found that
    > Unicode::Char module provides u() subroutine:
    >
    > my $u = Unicode::Char->new();
    > my $char = $u->u('AC00');     # u() returns a character of Unicode index AC00
    > (http://search.cpan.org/~dankogai/Unicode-Char-0.02/lib/Unicode/Char.pm)
    >
    > But I still wonder if there is a Perl built-in function or standard
    > module that does the same thing.


    perldoc -f chr
    perldoc -f oct

    the easiest would be:

    my $index = "AC00";
    my $char = chr(oct("0x$index"));
    print length($char),"\n";
    print "[$char]\n";
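If the hex string comes from user input or a file, it may be worth validating it before conversion. A rough sketch built on the chr/oct approach above; `char_from_hex` is a hypothetical helper name, not something from the thread or from any module, and the UTF-8 output layer is an assumption about the terminal:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hex string such as "AC00" into the
# corresponding character, rejecting input that is not plain hexadecimal
# or that lies beyond the Unicode range.
sub char_from_hex {
    my $hex = shift // '';
    die "not a hex code point: '$hex'\n"
        unless $hex =~ /\A[0-9A-Fa-f]{1,6}\z/;
    my $code = oct("0x$hex");     # oct() understands the "0x" prefix
    die "beyond U+10FFFF: '$hex'\n" if $code > 0x10FFFF;
    return chr($code);
}

binmode(STDOUT, ':encoding(UTF-8)');   # assumes a UTF-8 terminal
my $ga = char_from_hex('AC00');
print "[$ga]\n";
```

Unlike the eval approach, nothing from the input is ever run as code here.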

    --
    Klaus
     
    Klaus, Aug 30, 2009
    #2

  3. Raymundo <> wrote:
    >To represent a Unicode character in a string or in a regexp, I can use
    >"\x{hex}" notation.
    >
    >my $char = "\x{AC00}";
    ># $char = "?" -- a Korean character, pronounced "GA"
    >(I'm not sure you can see this Korean character in your browser.
    >Please tell me if you can't)


    Obviously you need a Korean font to view Korean characters. As I don't
    have a Korean font installed, obviously I can't see it.

    >However, it seems that this representation works only when it is hard-
    >coded. That means, I can't use a variable for the hex value:
    >
    >my $index = "AC00";
    >my $char = "\x{$index}"; # This doesn't work.


    Right. And it's not "hardcoded", but think of it as a notation for a
    character.

    If you do
    $wh = 'wh';
    {$wh}ile (someCondition) {...}
    then you don't get a while loop, either.
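To make the point about notation concrete, here is a small sketch comparing the escape with the function call. The escape is read when the string literal is parsed, while chr() and hex() run later with the variable's actual value. The variable names are only for illustration:

```perl
use strict;
use warnings;

my $index = 'AC00';

# The \x{...} escape is handled while the literal is parsed, so "$index"
# here is just text inside the braces, not a variable lookup. Warnings are
# switched off for this one literal because Perl complains about the
# non-hex characters.
my $literal = do { no warnings; "\x{$index}" };

# chr() and hex() are ordinary functions, so they see the value of
# $index at run time.
my $runtime = chr(hex($index));

printf "literal: U+%04X\n", ord($literal);
printf "runtime: U+%04X\n", ord($runtime);
```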

    >my $char = eval( "\"\\x{$index}\"" );


    Arggg, that's ugly!

    >I think there may be a better way to do this. I found that


    perldoc -f chr

    jue
     
    Jürgen Exner, Aug 30, 2009
    #3
  4. Klaus wrote:
    > On 30 Aug, 18:47, Raymundo <> wrote:
    >> To represent a Unicode character in a string or in a regexp, I can use
    >> "\x{hex}" notation.
    >>
    >> my $char = "\x{AC00}";
    >> # $char = "가" -- a Korean character, pronounced "GA"

    >
    > [...]
    >
    >> But I still wonder if there is a Perl built-in function or standard
    >> module that does the same thing.

    >
    > perldoc -f chr
    > perldoc -f oct
    >
    > the easiest would be:
    >
    > my $index = "AC00";
    > my $char = chr(oct("0x$index"));


    Or:

    my $char = chr hex $index;

    > print length($char),"\n";
    > print "[$char]\n";




    John
    --
    Those people who think they know everything are a great
    annoyance to those of us who do. -- Isaac Asimov
     
    John W. Krahn, Aug 30, 2009
    #4
  5. Raymundo

    Raymundo Guest

    On Aug 31, 3:26 am, "John W.Krahn" <> wrote:
    > Klaus wrote:
    > > the easiest would be:

    >
    > > my $index = "AC00";
    > > my $char = chr(oct("0x$index"));

    >
    > Or:
    >
    > my $char = chr hex $index;
    >



    Oops, so "chr" accepts a Unicode index as its argument.

    I thought it accepted only byte values (0-255)... I'm sorry for
    bothering you.
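A quick way to check this is to round-trip a few values through chr() and ord(). The code points below are arbitrary examples chosen for illustration, not values from the thread apart from AC00:

```perl
use strict;
use warnings;

# chr() maps a code point to a character and ord() maps it back; neither
# is limited to the byte range 0-255.
for my $code (0x41, 0xE9, 0xAC00, 0x1F600) {
    my $char = chr($code);
    printf "U+%04X -> length %d, ord gives U+%04X\n",
        $code, length($char), ord($char);
}
```

Each value should come back as a single character, whatever its size in bytes when encoded.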

    Thank you all.
    G.Y.Park in South Korea
     
    Raymundo, Aug 31, 2009
    #5
  6. "John W. Krahn" <> writes:
    > Klaus wrote:
    >> On 30 Aug, 18:47, Raymundo <> wrote:
    >>> To represent a Unicode character in a string or in a regexp, I can use
    >>> "\x{hex}" notation.
    >>>
    >>> my $char = "\x{AC00}";
    >>> # $char = "가" -- a Korean character, pronounced "GA"

    >>
    >> [...]
    >>
    >>> But I still wonder if there is a Perl built-in function or standard
    >>> module that does the same thing.

    >>
    >> perldoc -f chr
    >> perldoc -f oct
    >>
    >> the easiest would be:
    >>
    >> my $index = "AC00";
    >> my $char = chr(oct("0x$index"));

    >
    > Or:
    >
    > my $char = chr hex $index;


    Or:

    my $index = 0xAC00;
    my $char = chr $index;

    Though if you have the index as a string, you'll need to use hex().
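The difference between a numeric index and a string index can be seen in a short sketch. The variable names are made up for illustration:

```perl
use strict;
use warnings;

my $from_number = chr(0xAC00);        # numeric literal: no conversion needed
my $from_string = chr(hex('AC00'));   # hex string: convert with hex() first

# Passing the string straight to chr() does not parse it as hex. "AC00"
# has no leading digits, so it is treated as the number 0 (Perl warns
# about this when warnings are enabled, hence the "no warnings" here).
my $mistake = do { no warnings 'numeric'; chr('AC00') };

print $from_number eq $from_string ? "same character\n" : "different\n";
printf "chr('AC00') gives U+%04X\n", ord($mistake);
```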

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Aug 31, 2009
    #6
