Re: korean character sets

Discussion in 'Perl Misc' started by Ben Bacarisse, Aug 22, 2013.

  1. Cal Dershowitz <> writes:
    <snip>
    > Then I fire up google translate, pasting the 3 paragraphs in, and
    > first lifting out the cyrillic characters that say the right thing and
    > stuff them in a file called russian1 . Then I take the phonetic
    > output that they give you now just because you asked and stuffed it in
    > a file called russian2.
    >
    > http://merrillpjensen.com/pages/obamazombies_3.php


    Why do you think this is a Perl question? It is more likely to be a PHP
    one, so a PHP group might be better.

    In case it helps you frame the question better when you do post in a
    suitable group, I offer a few observations:

    The server does not specify a character encoding when serving the
    page. Not fatal, but it sure helps to get that sorted out.

    Almost all the key data is missing. What actually is in "russian1"?
    When you post it (to the right group) a hex dump might be the best way
    to show the contents, but "cat -A" would also work.
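
    A minimal sketch of such a hex dump in Perl itself, assuming the file
    holds UTF-8 encoded Cyrillic. The sample bytes below are a hypothetical
    stand-in, since the real contents of "russian1" were never posted:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the file contents: the UTF-8 bytes for
# the two Cyrillic letters U+0434 and U+0430, followed by a newline.
my $data = "\xD0\xB4\xD0\xB0\n";

# unpack 'C*' yields one number per byte; print each as two hex digits.
my $hex = join ' ', map { sprintf '%02x', $_ } unpack 'C*', $data;
print "$hex\n";
```

    Seeing pairs such as "d0 b4" rather than single bytes or "&...;"
    sequences is a quick hint that the file really is UTF-8.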

    You don't show the PHP code, and I am not sure you show the template
    that the code is acting on. You did post a file that might have been
    the template but it did not look like one to me. It certainly contained
    no clues as to what processing happens to build the page.

    Please don't take this as an invitation to discuss PHP, server
    configuration, or any other non-Perl topics here.

    --
    Ben.
    Ben Bacarisse, Aug 22, 2013
    #1

  2. Cal Dershowitz <> writes:

    > On 08/22/2013 05:17 AM, Ben Bacarisse wrote:
    >> Cal Dershowitz <> writes:
    >> <snip>
    >>> Then I fire up google translate, pasting the 3 paragraphs in, and
    >>> first lifting out the cyrillic characters that say the right thing and
    >>> stuff them in a file called russian1 . Then I take the phonetic
    >>> output that they give you now just because you asked and stuffed it in
    >>> a file called russian2.
    >>>
    >>> http://merrillpjensen.com/pages/obamazombies_3.php

    >>
    >> Why do you think this is a Perl question? It is more likely to be a PHP
    >> one, so a PHP group might be better.

    >
    > [x-posted to c.l.php]
    >
    > I find that many applications are mixtures. For example, I would not
    > use php to template a php page. The topic here is the perl templating
    > of php and is topical in both these groups.


    Your previous post had no Perl and no Perl question. You now
    cross-post to a PHP group and all I can find is some Perl code and a
    report that you've fixed the PHP issue.

    I'll remove comp.lang.php since I see nothing PHP related here.

    <snip>
    > Anyways, the problem was that I was running utf-8 characters through
    > html entities, and if you want to see a bunch of upside-down question
    > marks, you can do that too.


    That can't explain what you were seeing. HTML-encoding a bunch of bytes
    that just happen to be the UTF-8 encoding of some other character can do
    what you were seeing, but that is not quite the same thing.

    At every stage you need to bear in mind two things: (a) what is the
    character encoding of the data, and (b) what character encoding this
    part of the system /thinks/ is being used.

    Simply running a UTF-8 encoded string through encode_entities (from
    HTML::Entities) will work correctly if the Perl code knows that the
    characters are UTF-8 encoded. If not, it will interpret the string as
    some other encoding -- probably just plain bytes -- and encode those.
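
    A small sketch of that difference, assuming the input is the UTF-8
    encoding of a single Cyrillic letter. The variable names are made up
    for illustration, and decode() comes from the core Encode module:

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# Raw UTF-8 bytes for one Cyrillic letter (U+0434): 0xD0 0xB4.
my $bytes = "\xD0\xB4";

# Perl has not been told these are UTF-8, so it sees two characters
# (U+00D0 and U+00B4) and turns each into its own entity.
my $from_bytes = encode_entities($bytes);

# After decoding, Perl sees a single character (U+0434) and emits
# a single entity for it.
my $chars      = decode('UTF-8', $bytes);
my $from_chars = encode_entities($chars);

print "undecoded: $from_bytes\n";
print "decoded:   $from_chars\n";
```

    The first line of output names two unrelated Latin-1 characters, which
    is the kind of garbage a browser will then faithfully display.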

    > Q1) If I'm typing on this keyboard and herd the corresponding utf-8
    > characters between paragraph tags, do I ever need to call
    > HTML::Entities to sort out what I did?


    That depends on a whole bunch of things. Very often you can use Perl
    transparently -- you read from a file and you output to the browser
    without ever having to encode or decode anything. This works if the
    file uses the same encoding that the browser will use.

    At other times you need to convert between encodings. Think of "HTML
    entities" simply as yet another character encoding.
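
    As a rough illustration of converting between encodings, here is a
    sketch that takes UTF-8 bytes, decodes them to characters, and
    re-encodes them as KOI8-R. KOI8-R is only an example target picked for
    this sketch; nothing in the thread says it is involved:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Render a byte string as space-separated hex pairs.
sub hex_bytes { join ' ', map { sprintf '%02X', $_ } unpack 'C*', shift }

my $utf8_bytes = "\xD0\xB4";                    # one Cyrillic letter as UTF-8
my $chars      = decode('UTF-8', $utf8_bytes);  # one character, U+0434
my $koi8_bytes = encode('KOI8-R', $chars);      # same character, other encoding

printf "UTF-8:  %s\n", hex_bytes($utf8_bytes);
printf "KOI8-R: %s\n", hex_bytes($koi8_bytes);
```

    The character is the same throughout; only the bytes used to represent
    it change. Turning it into an HTML entity is the same kind of step.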

    > There's always a bit of html in any properly-phrased question along
    > these lines.
    >
    > This counts as successful output, subject to comment:
    >
    > http://merrillpjensen.com/pages/obamazombies_1.php
    >
    >
    > # captions
    > my $caption = <$CAPTIONS>;
    > # I think the next line is a mistake.
    > # $caption = encode_entities($caption);


    It may be unnecessary. It will only be a mistake if Perl does not know
    that the file is UTF-8 encoded characters. Reading perlio will help.
    You also need to know about "use utf8" if your perl /source/ contains
    any UTF-8 encoded data.
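
    A sketch of what telling Perl about the encoding looks like with an
    I/O layer. An in-memory filehandle stands in for the captions file
    here, and the sample bytes are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the captions file: UTF-8 bytes for two
# Cyrillic letters (U+0434, U+0430) and a newline.
my $file_bytes = "\xD0\xB4\xD0\xB0\n";

# No layer: every byte is read as one character.
open my $raw, '<', \$file_bytes or die "open: $!";
my $as_bytes = <$raw>;
close $raw;

# With an :encoding(UTF-8) layer, perl decodes while reading.
open my $dec, '<:encoding(UTF-8)', \$file_bytes or die "open: $!";
my $as_chars = <$dec>;
close $dec;

chomp($as_bytes, $as_chars);
printf "no layer:   %d characters\n", length $as_bytes;
printf "with layer: %d characters\n", length $as_chars;
```

    For a real file the same layer goes in the open() mode, and the output
    handle needs a matching layer if decoded text is printed back out.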

    > printf $fh $template, "${word2}/" . $remote_file, $caption;


    <snip>
    --
    Ben.
    Ben Bacarisse, Aug 24, 2013
    #2

  3. Cal Dershowitz <> writes:

    > On 08/24/2013 06:32 AM, Ben Bacarisse wrote:
    >> Cal Dershowitz <> writes:

    >
    >>>> Why do you think this is a Perl question? It is more likely to be a PHP
    >>>> one, so a PHP group might be better.
    >>>
    >>> [x-posted to c.l.php]
    >>>
    >>> I find that many applications are mixtures. For example, I would not
    >>> use php to template a php page. The topic here is the perl templating
    >>> of php and is topical in both these groups.

    >>
    >> Your previous post had no Perl and no Perl question. You now
    >> cross-post to a PHP group and all I can find is some Perl code and a
    >> report that you've fixed the PHP issue.
    >>
    >> I'll remove comp.lang.php since I see nothing PHP related here.

    >
    > Alright, but it basically amounted to you suggesting it and then
    > unsuggesting it.


    Yes, you got me. The message I replied to had not one line of Perl in
    it. It did talk about PHP, linked to several .php URLs, and it
    contained a reference to some PHP template files. The only questions in
    it were a general one ("Why can't I get this right?") and one about HTML
    meta charset declarations. But I see now it was about Perl.

    This:
    > I get you pretty well, but I'm a little stuck. Now I think I'm
    > drawing a blank on the printf statement.
    >
    > Use of uninitialized value $caption in printf at ./russian1.pl line 109.


    Taking your code and clipping it to a file suggests that line 109 is
    the one before the printf. Did you post what you are running?

    <snip>
    > I thought I was figuring it out, but don't get why it cats nothing:
    >
    > http://merrillpjensen.com/pages/utf_1.php


    I am not sure what anyone can do with this.

    --
    Ben.
    Ben Bacarisse, Aug 25, 2013
    #3
  4. >>>>> "C" == Cal Dershowitz <> writes:

    C> I think my machine or my brain wasn't working.

    Finally we are all on the same page.

    Charlton

    --
    Charlton Wilbur
    Charlton Wilbur, Aug 28, 2013
    #4
