Re: Is Chinese C++ SourceCode restricted to ASCII?

Discussion in 'C++' started by Francesco, Sep 5, 2009.

  1. Francesco

    Francesco Guest

    On Sep 5, 19:25, "Peter Olcott" <> wrote:

    > Okay so C++ Identifier are limited to ASCII


    No, they aren't. The C++ standard requires implementations to support
    _at least_ a subset of ASCII. Beyond that, implementations are free
    to accept source code written in any charset whatsoever.

    This is a snippet of an old version of the standard:

    -------
    2.1 Phases of translation [lex.phases]

    1 The precedence among the syntax rules of translation is specified
    by
    the following phases.1)

    1 Physical source file characters are mapped, in an
    implementation-
    defined manner, to the source character set (introducing new-
    line
    characters for end-of-line indicators) if necessary.
    Trigraph
    sequences (_lex.trigraph_) are replaced by corresponding
    single-
    character internal representations. Any source file character
    not
    in the basic source character set (_lex.charset_) is replaced
    by
    the universal-character-name that designates that character.
    -------

    The current normative wording may differ, but the essence should be the
    same: your implementation is free to let you write the source code
    in the charset of your choice.

    Best regards,
    Francesco
     
    Francesco, Sep 5, 2009
    #1

  2. Francesco

    Francesco Guest

    On Sep 5, 20:47, Francesco <> wrote:
    > On 5 Set, 19:25, "Peter Olcott" <> wrote:
    >
    > > Okay so C++ Identifier are limited to ASCII

    >
    > No, they aren't. The C++ standard requires implementations to support
    > _at least_ a subset of ASCII. After that point, the implementations
    > are free to give the opportunity to write the source code in any
    > charset whatsoever.
    >
    > This is a snippet of an old version of the standard:
    >
    > -------
    > 2.1  Phases of translation                                [lex.phases]
    >
    > 1 The  precedence  among the syntax rules of translation is specified
    > by
    >   the following phases.1)
    >
    >     1 Physical  source file characters are mapped, in an
    > implementation-
    >       defined manner, to the source character set (introducing  new-
    > line
    >       characters  for  end-of-line  indicators)  if necessary.
    > Trigraph
    >       sequences (_lex.trigraph_) are replaced by  corresponding
    > single-
    >       character internal representations.  Any source file character
    > not
    >       in the basic source character set (_lex.charset_) is  replaced
    > by
    >       the universal-character-name that designates that character.
    > -------
    >
    > The current normative wording may vary but the essence should be the
    > same: your implementation is free to allow you writing the source code
    > in the charset of your preference.
    >
    > Best regards,
    > Francesco


    Citation messed up, sorry.
    The following should be more readable:

    -------
    2.1 Phases of translation [lex.phases]

    1
    The precedence among the syntax rules of translation is specified by
    the following phases.

    1
    Physical source file characters are mapped, in an implementation-
    defined manner, to the source character set (introducing new-line
    characters for end-of-line indicators) if necessary.
    Trigraph sequences (_lex.trigraph_) are replaced by corresponding
    single-character internal representations. Any source file character
    not in the basic source character set (_lex.charset_) is replaced by
    the universal-character-name that designates that character.
    -------

    Francesco
     
    Francesco, Sep 5, 2009
    #2

  3. James Kanze

    James Kanze Guest

    On Sep 5, 8:47 pm, Francesco <> wrote:
    > On 5 Set, 19:25, "Peter Olcott" <> wrote:


    > > Okay so C++ Identifier are limited to ASCII


    > No, they aren't. The C++ standard requires implementations to
    > support _at least_ a subset of ASCII.


    No. It requires implementations to support some encoding for
    characters in the basic character set. EBCDIC is just as valid
    as ASCII.
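
    To illustrate the point above with a hedged sketch (not taken from the
    standard text): comparing character literals with ASCII values only
    holds on ASCII-based implementations, whereas digit arithmetic is
    guaranteed everywhere. The names literals_match_ascii and digit_value
    are made up for this example, and a C++11 compiler is assumed for
    constexpr.

```cpp
// Hypothetical helper: true only if the literals tested here have their
// ASCII code values. On an EBCDIC implementation 'A' is 0xC1, so this
// would be false there.
constexpr bool literals_match_ascii() {
    return 'A' == 0x41 && 'a' == 0x61 && '0' == 0x30 && ' ' == 0x20;
}

// Hypothetical helper: portable on any conforming implementation, because
// the standard guarantees that '0'..'9' are contiguous and increasing,
// whatever the encoding. Returns -1 for anything that is not a digit.
constexpr int digit_value(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```

    Code that needs a letter's position in the alphabet should not use
    c - 'a' in the same way, since EBCDIC letters are not contiguous.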

    With regards to identifiers, the standard *requires* an
    implementation to support all Unicode characters classified as
    alphanumeric. If the desired character isn't available in the
    input encoding, then it can be specified by means of a universal
    character name.
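
    A minimal sketch of what that looks like, assuming a compiler that
    accepts universal character names in identifiers (whether a given
    compiler does is what the rest of this thread turns on). The loop
    variable is the identifier 计数器 ("counter") spelled with one
    universal character name per character; sum_to is a hypothetical name
    chosen for the example.

```cpp
// Hypothetical example function: sums 1..limit using a loop variable whose
// name is written entirely with universal character names, so the source
// text of this function contains only basic source characters.
int sum_to(int limit) {
    int total = 0;
    for (int \u8BA1\u6570\u5668 = 1; \u8BA1\u6570\u5668 <= limit; ++\u8BA1\u6570\u5668) {
        total += \u8BA1\u6570\u5668;
    }
    return total;
}
```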

    > After that point, the implementations are free to give the
    > opportunity to write the source code in any charset
    > whatsoever.


    > This is a snippet of an old version of the standard:


    > -------
    > 2.1 Phases of translation [lex.phases]


    > 1 The precedence among the syntax rules of translation is specified
    > by
    > the following phases.1)
    >
    > 1 Physical source file characters are mapped, in an
    > implementation-
    > defined manner, to the source character set (introducing new-
    > line
    > characters for end-of-line indicators) if necessary.
    > Trigraph
    > sequences (_lex.trigraph_) are replaced by corresponding
    > single-
    > character internal representations. Any source file character
    > not
    > in the basic source character set (_lex.charset_) is replaced
    > by
    > the universal-character-name that designates that character.
    > -------


    > The current normative wording may vary but the essence should
    > be the same: your implementation is free to allow you writing
    > the source code in the charset of your preference.


    Or the character encoding of its preference:).

    In practice, in this case, Java copied exactly what C++ (and
    later C99) required, with the difference that the first Java
    compiler actually implemented it, whereas even today, very few
    C++ compilers do.

    --
    James Kanze
     
    James Kanze, Sep 5, 2009
    #3
  4. Francesco

    Francesco Guest

    On Sep 5, 23:36, James Kanze <> wrote:
    > On Sep 5, 8:47 pm, Francesco <> wrote:
    >
    > > On 5 Set, 19:25, "Peter Olcott" <> wrote:
    > > > Okay so C++ Identifier are limited to ASCII

    > > No, they aren't. The C++ standard requires implementations to
    > > support _at least_ a subset of ASCII.

    >
    > No. It requires implementations to support some encoding for
    > characters in the basic character set. EBCDIC is just as valid
    > as ASCII.
    >
    > With regards to identifiers, the standard *requires* an
    > implementation to support all Unicode characters classified as
    > alphanumeric. If the desired character isn't available in the
    > input encoding, then it can be specified by means of a universal
    > character name.
    >
    >
    >
    > > After that point, the implementations are free to give the
    > > opportunity to write the source code in any charset
    > > whatsoever.
    > > This is a snippet of an old version of the standard:
    > > -------
    > > 2.1 Phases of translation [lex.phases]
    > > 1 The precedence among the syntax rules of translation is specified
    > > by
    > > the following phases.1)

    >
    > > 1 Physical source file characters are mapped, in an
    > > implementation-
    > > defined manner, to the source character set (introducing new-
    > > line
    > > characters for end-of-line indicators) if necessary.
    > > Trigraph
    > > sequences (_lex.trigraph_) are replaced by corresponding
    > > single-
    > > character internal representations. Any source file character
    > > not
    > > in the basic source character set (_lex.charset_) is replaced
    > > by
    > > the universal-character-name that designates that character.
    > > -------
    > > The current normative wording may vary but the essence should
    > > be the same: your implementation is free to allow you writing
    > > the source code in the charset of your preference.

    >
    > Or the character encoding of its preference:).
    >
    > In practice, in this case, Java copied exactly what C++ (and
    > later C99) required, with the difference that the first Java
    > compiler actually implemented it, whereas even today, very few
    > C++ compilers do.


    Thank you for refining my reply, James. With my limited knowledge,
    I've done my best to correct the misrepresentation of C++ that was
    being made here.

    Let me also add that the standard imposes no limit on the length of
    identifiers.

    As an addition, to respond to the original post, here is a valid C++
    program with Chinese identifiers and comments:

    -------
    #include <iostream>
    #include <iomanip>
    #include <string>

    using namespace std;

    /* 这个C + +程序打印毕达哥拉斯表。*/

    int main() {
      cout << "   X |";
      for (int 计数器 = 1; 计数器 <= 10; ++计数器) {
        cout << setw(4) << 计数器;
      }
      cout << endl << "-----+" << string(40, '-') << endl;
      for (int 首先 = 1; 首先 <= 10; ++首先) {
        cout << setw(4) << 首先 << " |";
        for (int 第二 = 1; 第二 <= 10; ++第二) {
          cout << setw(4) << 首先 * 第二;
        }
        cout << endl;
      }
      return 0;
    }
    -------

    Whether the code above displays correctly on machines other than
    mine depends on the installed fonts and the reader's active
    encoding.

    Whether a C++ compiler for the above encoding exists or not is
    irrelevant for the purpose of this post ;-)

    Cheers,
    Francesco
     
    Francesco, Sep 5, 2009
    #4
  5. Francesco wrote:

    > #include <iostream>
    > #include <iomanip>
    > #include <string>
    >
    > using namespace std;
    >
    > /* 这个C + +程序打印毕达哥拉斯表。*/
    >
    > int main() {
    > cout << " X |";
    > for (int 计数器 = 1; 计数器 <= 10; ++计数器) {
    > cout << setw(4) << 计数器;
    > }
    > cout << endl << "-----+" << string(40, '-') << endl;
    > for (int 首先 = 1; 首先 <= 10; ++首先) {
    > cout << setw(4) << 首先 << " |";
    > for (int 第二 = 1; 第二 <= 10; ++第二) {
    > cout << setw(4) << 首先 * 第二;
    > }
    > cout << endl;
    > }
    > return 0;
    > }


    I CAN'T GET THIS PROGRAM COMPILED BY A GCC 4.4.1 COMPILER RUNNING ON AN
    x86_64-unknown-linux-gnu !!!

    test.cpp:10: error: stray ‘\302’ in program
    test.cpp:10: error: stray ‘\240’ in program
    test.cpp:10: error: stray ‘\302’ in program
    test.cpp:10: error: stray ‘\240’ in program
    test.cpp:11: error: stray ‘\302’ in program
    test.cpp:11: error: stray ‘\240’ in program
    test.cpp:11: error: stray ‘\302’ in program
    test.cpp:11: error: stray ‘\240’ in program
    test.cpp:11: error: stray ‘\350’ in program
    test.cpp:11: error: stray ‘\256’ in program
    test.cpp:11: error: stray ‘\241’ in program
    test.cpp:11: error: stray ‘\346’ in program
    test.cpp:11: error: stray ‘\225’ in program
    test.cpp:11: error: stray ‘\260’ in program
    test.cpp:11: error: stray ‘\345’ in program
    test.cpp:11: error: stray ‘\231’ in program
    test.cpp:11: error: stray ‘\250’ in program
    test.cpp:11: error: stray ‘\350’ in program
    test.cpp:11: error: stray ‘\256’ in program
    test.cpp:11: error: stray ‘\241’ in program
    test.cpp:11: error: stray ‘\346’ in program
    test.cpp:11: error: stray ‘\225’ in program
    test.cpp:11: error: stray ‘\260’ in program
    test.cpp:11: error: stray ‘\345’ in program
    test.cpp:11: error: stray ‘\231’ in program
    test.cpp:11: error: stray ‘\250’ in program
    test.cpp:11: error: stray ‘\350’ in program
    test.cpp:11: error: stray ‘\256’ in program
    test.cpp:11: error: stray ‘\241’ in program
    test.cpp:11: error: stray ‘\346’ in program
    test.cpp:11: error: stray ‘\225’ in program
    test.cpp:11: error: stray ‘\260’ in program
    test.cpp:11: error: stray ‘\345’ in program
    test.cpp:11: error: stray ‘\231’ in program
    test.cpp:11: error: stray ‘\250’ in program
    test.cpp:12: error: stray ‘\302’ in program
    test.cpp:12: error: stray ‘\240’ in program
    test.cpp:12: error: stray ‘\302’ in program
    test.cpp:12: error: stray ‘\240’ in program
    test.cpp:12: error: stray ‘\302’ in program
    test.cpp:12: error: stray ‘\240’ in program
    test.cpp:12: error: stray ‘\302’ in program
    test.cpp:12: error: stray ‘\240’ in program
    test.cpp:12: error: stray ‘\350’ in program
    test.cpp:12: error: stray ‘\256’ in program
    test.cpp:12: error: stray ‘\241’ in program
    test.cpp:12: error: stray ‘\346’ in program
    test.cpp:12: error: stray ‘\225’ in program
    test.cpp:12: error: stray ‘\260’ in program
    test.cpp:12: error: stray ‘\345’ in program
    test.cpp:12: error: stray ‘\231’ in program
    test.cpp:12: error: stray ‘\250’ in program
    test.cpp:13: error: stray ‘\302’ in program
    test.cpp:13: error: stray ‘\240’ in program
    test.cpp:13: error: stray ‘\302’ in program
    test.cpp:13: error: stray ‘\240’ in program
    test.cpp:14: error: stray ‘\302’ in program
    test.cpp:14: error: stray ‘\240’ in program
    test.cpp:14: error: stray ‘\302’ in program
    test.cpp:14: error: stray ‘\240’ in program
    test.cpp:15: error: stray ‘\302’ in program
    test.cpp:15: error: stray ‘\240’ in program
    test.cpp:15: error: stray ‘\302’ in program
    test.cpp:15: error: stray ‘\240’ in program
    test.cpp:15: error: stray ‘\351’ in program
    test.cpp:15: error: stray ‘\246’ in program
    test.cpp:15: error: stray ‘\226’ in program
    test.cpp:15: error: stray ‘\345’ in program
    test.cpp:15: error: stray ‘\205’ in program
    test.cpp:15: error: stray ‘\210’ in program
    test.cpp:15: error: stray ‘\351’ in program
    test.cpp:15: error: stray ‘\246’ in program
    test.cpp:15: error: stray ‘\226’ in program
    test.cpp:15: error: stray ‘\345’ in program
    test.cpp:15: error: stray ‘\205’ in program
    test.cpp:15: error: stray ‘\210’ in program
    test.cpp:15: error: stray ‘\351’ in program
    test.cpp:15: error: stray ‘\246’ in program
    test.cpp:15: error: stray ‘\226’ in program
    test.cpp:15: error: stray ‘\345’ in program
    test.cpp:15: error: stray ‘\205’ in program
    test.cpp:15: error: stray ‘\210’ in program
    test.cpp:16: error: stray ‘\302’ in program
    test.cpp:16: error: stray ‘\240’ in program
    test.cpp:16: error: stray ‘\302’ in program
    test.cpp:16: error: stray ‘\240’ in program
    test.cpp:16: error: stray ‘\302’ in program
    test.cpp:16: error: stray ‘\240’ in program
    test.cpp:16: error: stray ‘\302’ in program
    test.cpp:16: error: stray ‘\240’ in program
    test.cpp:16: error: stray ‘\351’ in program
    test.cpp:16: error: stray ‘\246’ in program
    test.cpp:16: error: stray ‘\226’ in program
    test.cpp:16: error: stray ‘\345’ in program
    test.cpp:16: error: stray ‘\205’ in program
    test.cpp:16: error: stray ‘\210’ in program
    test.cpp:17: error: stray ‘\302’ in program
    test.cpp:17: error: stray ‘\240’ in program
    test.cpp:17: error: stray ‘\302’ in program
    test.cpp:17: error: stray ‘\240’ in program
    test.cpp:17: error: stray ‘\302’ in program
    test.cpp:17: error: stray ‘\240’ in program
    test.cpp:17: error: stray ‘\302’ in program
    test.cpp:17: error: stray ‘\240’ in program
    test.cpp:17: error: stray ‘\347’ in program
    test.cpp:17: error: stray ‘\254’ in program
    test.cpp:17: error: stray ‘\254’ in program
    test.cpp:17: error: stray ‘\344’ in program
    test.cpp:17: error: stray ‘\272’ in program
    test.cpp:17: error: stray ‘\214’ in program
    test.cpp:17: error: stray ‘\347’ in program
    test.cpp:17: error: stray ‘\254’ in program
    test.cpp:17: error: stray ‘\254’ in program
    test.cpp:17: error: stray ‘\344’ in program
    test.cpp:17: error: stray ‘\272’ in program
    test.cpp:17: error: stray ‘\214’ in program
    test.cpp:17: error: stray ‘\347’ in program
    test.cpp:17: error: stray ‘\254’ in program
    test.cpp:17: error: stray ‘\254’ in program
    test.cpp:17: error: stray ‘\344’ in program
    test.cpp:17: error: stray ‘\272’ in program
    test.cpp:17: error: stray ‘\214’ in program
    test.cpp:18: error: stray ‘\302’ in program
    test.cpp:18: error: stray ‘\240’ in program
    test.cpp:18: error: stray ‘\302’ in program
    test.cpp:18: error: stray ‘\240’ in program
    test.cpp:18: error: stray ‘\302’ in program
    test.cpp:18: error: stray ‘\240’ in program
    test.cpp:18: error: stray ‘\302’ in program
    test.cpp:18: error: stray ‘\240’ in program
    test.cpp:18: error: stray ‘\302’ in program
    test.cpp:18: error: stray ‘\240’ in program
    test.cpp:18: error: stray ‘\302’ in program
    test.cpp:18: error: stray ‘\240’ in program
    test.cpp:18: error: stray ‘\351’ in program
    test.cpp:18: error: stray ‘\246’ in program
    test.cpp:18: error: stray ‘\226’ in program
    test.cpp:18: error: stray ‘\345’ in program
    test.cpp:18: error: stray ‘\205’ in program
    test.cpp:18: error: stray ‘\210’ in program
    test.cpp:18: error: stray ‘\347’ in program
    test.cpp:18: error: stray ‘\254’ in program
    test.cpp:18: error: stray ‘\254’ in program
    test.cpp:18: error: stray ‘\344’ in program
    test.cpp:18: error: stray ‘\272’ in program
    test.cpp:18: error: stray ‘\214’ in program
    test.cpp:19: error: stray ‘\302’ in program
    test.cpp:19: error: stray ‘\240’ in program
    test.cpp:19: error: stray ‘\302’ in program
    test.cpp:19: error: stray ‘\240’ in program
    test.cpp:19: error: stray ‘\302’ in program
    test.cpp:19: error: stray ‘\240’ in program
    test.cpp:19: error: stray ‘\302’ in program
    test.cpp:19: error: stray ‘\240’ in program
    test.cpp:20: error: stray ‘\302’ in program
    test.cpp:20: error: stray ‘\240’ in program
    test.cpp:20: error: stray ‘\302’ in program
    test.cpp:20: error: stray ‘\240’ in program
    test.cpp:20: error: stray ‘\302’ in program
    test.cpp:20: error: stray ‘\240’ in program
    test.cpp:20: error: stray ‘\302’ in program
    test.cpp:20: error: stray ‘\240’ in program
    test.cpp:21: error: stray ‘\302’ in program
    test.cpp:21: error: stray ‘\240’ in program
    test.cpp:21: error: stray ‘\302’ in program
    test.cpp:21: error: stray ‘\240’ in program
    test.cpp:22: error: stray ‘\302’ in program
    test.cpp:22: error: stray ‘\240’ in program
    test.cpp:22: error: stray ‘\302’ in program
    test.cpp:22: error: stray ‘\240’ in program
    test.cpp: In function ‘int main()’:
    test.cpp:11: error: expected unqualified-id before ‘=’ token
    test.cpp:11: error: expected primary-expression before ‘<=’ token
    test.cpp:11: error: expected primary-expression before ‘)’ token
    test.cpp:12: error: expected primary-expression before ‘;’ token
    test.cpp:15: error: expected unqualified-id before ‘=’ token
    test.cpp:15: error: expected primary-expression before ‘<=’ token
    test.cpp:15: error: expected primary-expression before ‘)’ token
    test.cpp:16: error: expected primary-expression before ‘<<’ token
    test.cpp:17: error: expected unqualified-id before ‘=’ token
    test.cpp:17: error: expected primary-expression before ‘<=’ token
    test.cpp:17: error: expected primary-expression before ‘)’ token
    test.cpp:18: error: expected primary-expression before ‘;’ token
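
    The stray bytes in the errors above are printed in octal, and they can
    be decoded as UTF-8. The sketch below is a hedged illustration, not
    anything GCC itself does; decode_utf8 is a hypothetical helper, and it
    does not reject overlong forms or surrogates. The pair \302 \240
    decodes to U+00A0 (NO-BREAK SPACE), which suggests that the
    indentation pasted from the web page was made of non-breaking spaces,
    and \350 \256 \241 decodes to U+8BA1, the first character of the
    first identifier.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decodes exactly one UTF-8 sequence (1 to 4 bytes).
// Returns the code point, or 0xFFFFFFFF if the bytes do not form a single
// sequence. Sketch only: overlong forms and surrogates are not rejected.
std::uint32_t decode_utf8(const std::vector<unsigned char>& bytes) {
    const std::uint32_t bad = 0xFFFFFFFFu;
    if (bytes.empty()) return bad;

    const unsigned char lead = bytes[0];
    std::size_t extra = 0;   // number of continuation bytes expected
    std::uint32_t cp = 0;    // code point being assembled

    if (lead < 0x80)                { extra = 0; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
    else return bad;         // a continuation byte cannot start a sequence

    if (bytes.size() != extra + 1) return bad;
    for (std::size_t i = 1; i <= extra; ++i) {
        if ((bytes[i] & 0xC0) != 0x80) return bad;  // must look like 10xxxxxx
        cp = (cp << 6) | (bytes[i] & 0x3F);
    }
    return cp;
}
```

    For example, decode_utf8({0302, 0240}) yields 0xA0 and
    decode_utf8({0350, 0256, 0241}) yields 0x8BA1.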
     
    Michael Tsang, Sep 9, 2009
    #5
  6. Francesco

    Francesco Guest

    On Sep 9, 2:38 pm, Michael Tsang <> wrote:
    > Francesco wrote:
    > > #include <iostream>
    > > #include <iomanip>
    > > #include <string>

    >
    > > using namespace std;

    >
    > > /* 这个C + +程序打印毕达哥拉斯表。*/

    >
    > > int main() {
    > >   cout << "   X |";
    > >   for (int 计数器 = 1; 计数器 <= 10; ++计数器) {
    > >     cout << setw(4) << 计数器;
    > >   }
    > >   cout << endl << "-----+" << string(40, '-') << endl;
    > >   for (int 首先 = 1; 首先 <= 10; ++首先) {
    > >     cout << setw(4) << 首先 << " |";
    > >     for (int 第二 = 1; 第二 <= 10; ++第二) {
    > >       cout << setw(4) << 首先 * 第二;
    > >     }
    > >     cout << endl;
    > >   }
    > >   return 0;
    > > }

    >
    > I CAN'T GET THIS PROGRAM COMPILED BY A GCC 4.4.1 COMPILER RUNNING ON AN
    > x86_64-unknown-linux-gnu !!!
    >


    [ snip whole lot of compiler errors ]

    Oh, you really can't? How strange! Eheheheh...

    I don't know whether GCC can compile it; you may have to find the
    appropriate option to make it accept Unicode characters. I've heard
    that Visual C++ may be able to compile it, if you have it (even
    there, you may need to set some option).

    But really, forget about that program, I posted it as an example.
    Stick to ASCII characters, the program above isn't portable at all.

    Use this instead:

    -------
    #include <iostream>
    #include <iomanip>
    #include <string>

    using namespace std;

    /* C++ program to print the Pythagorean table */

    int main() {
      cout << "   X |";
      for (int counter = 1; counter <= 10; ++counter) {
        cout << setw(4) << counter;
      }
      cout << endl << "-----+" << string(40, '-') << endl;
      for (int first = 1; first <= 10; ++first) {
        cout << setw(4) << first << " |";
        for (int second = 1; second <= 10; ++second) {
          cout << setw(4) << first * second;
        }
        cout << endl;
      }
      return 0;
    }
    -------

    If your post was meant as a joke: good! really funny ;-)

    Cheers,
    Francesco
     
    Francesco, Sep 9, 2009
    #6
  7. GCC by default guesses the encoding from the environment variables (my LANG
    is en_HK.UTF-8). If it is not available, it will assume the text is in
    UTF-8.
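
    As a rough sketch of the kind of lookup described above (this is not
    GCC's actual code, only an illustration under stated assumptions):
    POSIX-style locale names have the shape
    language[_territory][.codeset][@modifier], and LC_ALL, LC_CTYPE and
    LANG are consulted in that order. The function names codeset_of and
    guess_source_charset are hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: extracts the codeset from a POSIX-style locale name
// such as "en_HK.UTF-8" or "de_DE.ISO-8859-1@euro".
// Returns an empty string if the name carries no codeset.
std::string codeset_of(const std::string& locale_name) {
    const std::string::size_type dot = locale_name.find('.');
    if (dot == std::string::npos) return "";
    const std::string::size_type at = locale_name.find('@', dot);
    const std::string::size_type len =
        (at == std::string::npos) ? std::string::npos : at - dot - 1;
    return locale_name.substr(dot + 1, len);
}

// Hypothetical helper: looks at LC_ALL, LC_CTYPE and LANG in that order
// (the usual POSIX precedence) and falls back to "UTF-8" when no codeset
// can be found, mirroring the fallback described in the post above.
std::string guess_source_charset() {
    const char* names[] = {"LC_ALL", "LC_CTYPE", "LANG"};
    for (const char* name : names) {
        const char* value = std::getenv(name);
        if (value && *value) {
            const std::string cs = codeset_of(value);
            return cs.empty() ? "UTF-8" : cs;
        }
    }
    return "UTF-8";
}
```

    Passing -finput-charset explicitly, as suggested later in the thread,
    avoids depending on the environment at all.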
     
    Michael Tsang, Sep 9, 2009
    #7
  8. * Michael Tsang:
    > GCC by default guesses the encoding from the environment variables (my LANG
    > is en_HK.UTF-8). If it is not available, it will assume the text is in
    > UTF-8.


    It doesn't seem that MinGW g++ for Windows does that encoding guessing.


    C:\test> type x.cpp
    int main()
    {
    L"Blåbærsyltetøy";
    }

    C:\test> msvc x.cpp
    x.cpp

    C:\test> gnuc x.cpp
    x.cpp:3:5: converting to execution character set: Illegal byte sequence
    x.cpp: In function `int main()':
    x.cpp:3: warning: statement has no effect

    C:\test> set LANG=no-NO.Latin-1

    C:\test> gnuc x.cpp
    x.cpp:3:5: converting to execution character set: Illegal byte sequence
    x.cpp: In function `int main()':
    x.cpp:3: warning: statement has no effect

    C:\test> _


    I had the impression that the above would just not compile with MinGW g++.

    Is there any way to make it compile short of preprocessing the source code (or
    did I for example get the LANG variable wrong, I'm not familiar with it)?


    Cheers & TIA.,

    - Alf
     
    Alf P. Steinbach, Sep 9, 2009
    #8
  9. Francesco

    Francesco Guest

    On Sep 9, 4:32 pm, Michael Tsang <> wrote:
    > GCC by default guesses the encoding from the environment variables (my LANG
    > is en_HK.UTF-8). If it is not available, it will assume the text is in
    > UTF-8.


    Accepting the general encoding of a source file is one thing,
    accepting those characters outside of string literals and comments is
    another.

    Summarizing this thread: the C++ Standard expects implementations to
    accept those characters in identifiers too, while in reality not all
    implementations do. I know of none for sure, though I've been told
    that VC++ does.

    The following program compiles fine in my GCC - should do the same on
    yours, Michael:

    -------
    #include <iostream>
    /* 这个C + +程序打印毕达哥拉斯表。*/
    int main() {
      std::cout << "这个C + +程序打印毕达哥拉斯表。" << std::endl;
      // ...
      return 0;
    }
    -------

    The actual output depends on the settings and on the system (Windows XP
    lets you put the console in UTF-8 mode with an API call, but in
    practice it prints nothing outside of ASCII, at least on my
    system).

    Best regards,
    Francesco
     
    Francesco, Sep 9, 2009
    #9
  10. Michael Tsang <> writes:

    > Francesco wrote:
    >

    Since I use Emacs, I'd have put this comment on the first line:

    /* coding:utf-8 */
    >> #include <iostream>
    >> #include <iomanip>
    >> #include <string>
    >>
    >> using namespace std;
    >>
    >> /* 这个C + +程序打印毕达哥拉斯表。*/
    >>
    >> int main() {
    >> cout << " X |";
    >> for (int 计数器 = 1; 计数器 <= 10; ++计数器) {
    >> cout << setw(4) << 计数器;
    >> }
    >> cout << endl << "-----+" << string(40, '-') << endl;
    >> for (int 首先 = 1; 首先 <= 10; ++首先) {
    >> cout << setw(4) << 首先 << " |";
    >> for (int 第二 = 1; 第二 <= 10; ++第二) {
    >> cout << setw(4) << 首先 * 第二;
    >> }
    >> cout << endl;
    >> }
    >> return 0;
    >> }

    >
    > I CAN'T GET THIS PROGRAM COMPILED BY A GCC 4.4.1 COMPILER RUNNING ON AN
    > x86_64-unknown-linux-gnu !!!


    Have you used the -fextended-identifiers -finput-charset=UTF-8 options of g++?
    With possibly also: -fexec-charset=UTF-8?

    Not that they work with my version of gcc 4.4.1 either :-(
    -fextended-identifiers is an experimental option.
    You should complain on gnu.gcc.help, and try another version of gcc.

    --
    __Pascal Bourguignon__
     
    Pascal J. Bourguignon, Sep 10, 2009
    #10
  11. James Kanze

    James Kanze Guest

    On Sep 9, 5:26 pm, Francesco <> wrote:
    > On Sep 9, 4:32 pm, Michael Tsang <> wrote:


    > > GCC by default guesses the encoding from the environment
    > > variables (my LANG is en_HK.UTF-8). If it is not available,
    > > it will assume the text is in UTF-8.


    > Accepting the general encoding of a source file is one thing,
    > accepting those characters outside of string literals and
    > comments is another.


    > Summarizing this thread, the C++ Standard expects
    > implementations to accept those characters for identifiers
    > too, while in reality not all implementations do - I know none
    > for sure, I've been told that VC++ does.


    I just ran a quick test, encoding a file in UTF-16, and VC++
    accepts accented characters in variable names. (I don't know
    how to input Chinese characters on my system, so I can't test
    those, but presumably they'd work as well.) UTF-16 is the
    default encoding under Windows, so that makes sense.

    --
    James Kanze
     
    James Kanze, Sep 10, 2009
    #11
  12. Francesco

    Francesco Guest

    On Sep 10, 11:44, James Kanze <> wrote:
    > On Sep 9, 5:26 pm, Francesco <> wrote:
    >
    > > On Sep 9, 4:32 pm, Michael Tsang <> wrote:
    > > > GCC by default guesses the encoding from the environment
    > > > variables (my LANG is en_HK.UTF-8). If it is not available,
    > > > it will assume the text is in UTF-8.

    > > Accepting the general encoding of a source file is one thing,
    > > accepting those characters outside of string literals and
    > > comments is another.
    > > Summarizing this thread, the C++ Standard expects
    > > implementations to accept those characters for identifiers
    > > too, while in reality not all implementations do - I know none
    > > for sure, I've been told that VC++ does.

    >
    > I just ran a quick test, encoding a file in UTF16, and VC++
    > accepts accented characters in variable names.  (I don't know
    > how to input Chinese characters on my system, so I can't test
    > those, but presumably, they'd work as well.)  UTF16 is the
    > default encoding under Windows, so that makes sense.


    Thanks for reporting your test, James. Glad to read the confirmation,
    and glad to read in the previous posts that GCC is moving towards
    standard compliance here.

    On my system I've been able to insert Chinese characters by copying
    them from Firefox and pasting them into the editor (it displayed them
    as a bunch of empty squares, but it worked: my gcc reads the files as
    UTF-8). Since the console chokes on non-ASCII UTF-8, I wrote the
    values to an HTML file and opened it in Firefox again; the
    characters were displayed correctly (I had set the appropriate
    encoding in the HTML file).

    Now I've added Eastern Language support to my WinXP and I can see
    Chinese characters in my editor (even keeping the Courier New font).

    Just a couple of notes, nothing more.

    Cheers,
    Francesco
     
    Francesco, Sep 10, 2009
    #12
  13. Francesco

    Francesco Guest

    On Sep 10, 11:06, (Pascal J. Bourguignon)
    wrote:
    > Michael Tsang <> writes:
    > > Francesco wrote:

    >
    > Since I use emacs, I'd had on the first line this comment:
    >
    > /* coding:utf-8 */
    >
    >
    >
    > >> #include <iostream>
    > >> #include <iomanip>
    > >> #include <string>

    >
    > >> using namespace std;

    >
    > >> /* 这个C + +程序打印毕达哥拉斯表。*/

    >
    > >> int main() {
    > >>   cout << "   X |";
    > >>   for (int 计数器 = 1; 计数器 <= 10; ++计数器) {
    > >>     cout << setw(4) << 计数器;
    > >>   }
    > >>   cout << endl << "-----+" << string(40, '-') << endl;
    > >>   for (int 首先 = 1; 首先 <= 10; ++首先) {
    > >>     cout << setw(4) << 首先 << " |";
    > >>     for (int 第二 = 1; 第二 <= 10; ++第二) {
    > >>       cout << setw(4) << 首先 * 第二;
    > >>     }
    > >>     cout << endl;
    > >>   }
    > >>   return 0;
    > >> }

    >
    > > I CAN'T GET THIS PROGRAM COMPILED BY A GCC 4.4.1 COMPILER RUNNING ON AN
    > > x86_64-unknown-linux-gnu !!!

    >
    > Have you used the  -fextended-identifiers  -finput-charset=UTF-8  options of g++?
    > With possibly also: -fexec-charset=UTF-8?
    >
    > Not that they work with my version of gcc 4.4.1 either :-(
    > -fextended-identifiers is an experimental option.


    Good to read that they're getting there somehow!
    So even with those options it didn't compile on 4.4.1?

    My gcc version is quite a bit older (I have the one that comes with
    the latest MinGW release). I don't think it will work here, but I'll
    try :-/

    Cheers,
    Francesco
     
    Francesco, Sep 10, 2009
    #13
  14. On Sep 10, 3:29 am, Francesco <> wrote:
    > On 10 Set, 11:06, (Pascal J. Bourguignon)
    > wrote:
    >
    >
    >
    >
    >
    > > Michael Tsang <> writes:
    > > > Francesco wrote:

    >
    > > Since I use emacs, I'd had on the first line this comment:

    >
    > > /* coding:utf-8 */

    >
    > > >> #include <iostream>
    > > >> #include <iomanip>
    > > >> #include <string>

    >
    > > >> using namespace std;

    >
    > > >> /* 这个C + +程序打印毕达哥拉斯表。*/

    >
    > > >> int main() {
    > > >>   cout << "   X |";
    > > >>   for (int 计数器 = 1; 计数器 <= 10; ++计数器) {
    > > >>     cout << setw(4) << 计数器;
    > > >>   }
    > > >>   cout << endl << "-----+" << string(40, '-') << endl;
    > > >>   for (int 首先 = 1; 首先 <= 10; ++首先) {
    > > >>     cout << setw(4) << 首先 << " |";
    > > >>     for (int 第二 = 1; 第二 <= 10; ++第二) {
    > > >>       cout << setw(4) << 首先 * 第二;
    > > >>     }
    > > >>     cout << endl;
    > > >>   }
    > > >>   return 0;
    > > >> }

    >
    > > > I CAN'T GET THIS PROGRAM COMPILED BY A GCC 4.4.1 COMPILER RUNNING ON AN
    > > > x86_64-unknown-linux-gnu !!!

    >
    > > Have you used the  -fextended-identifiers  -finput-charset=UTF-8  options of g++?
    > > With possibly also: -fexec-charset=UTF-8?

    >
    > > Not that they work with my version of gcc 4.4.1 either :-(
    > > -fextended-identifiers is an experimental option.

    >
    > Good to read that they're going there somehow!
    > So then even with those options it didn't compile on 4.4.1?
    >
    > My gcc version is quite older (I have the one that comes with latest
    > MinGW release), I don't think it will work here, but I'll try :-/
    >
    > Cheers,
    > Francesco


    I am also interested in compiling code with foreign variable names.

    I have tried compiling a UTF-8 encoded source file with G++ 4.4.1
    using the flags suggested above (-fextended-identifiers
    -finput-charset=UTF-8 -fexec-charset=UTF-8), to no avail. (I tried
    both Greek and Japanese characters.) I can, however, get foreign
    characters to compile if they appear in strings or comments.

    As for identifiers, the GCC documentation reads:

    -fextended-identifiers
    Accept universal character names in identifiers. This option is
    experimental; in a future version of GCC, it will be enabled by
    default for C99 and C++.

    So if "extended identifiers" doesn't mean Unicode identifiers, what
    *does* it mean?
    Thanks,

    Trevor
     
    Trevor Goodchild, Sep 10, 2009
    #14
  15. Trevor Goodchild <> writes:

    > I am also interested in compiling code with foreign variable names.
    >
    > I have tried compiling a UTF-8 encoded source file with G++ 4.4.1
    > using the flags suggested above (-fextended-identifiers
    > -finput-charset=UTF-8 -fexec-charset=UTF-8), to no avail. (I have
    > tried both Greek and Japanese characters.) I can, however, get
    > foreign characters to compile if they appear in strings or comments.
    >
    > As for identifiers, the GCC documentation reads:
    >
    > -fextended-identifiers
    > Accept universal character names in identifiers. This option is
    > experimental; in a future version of GCC, it will be enabled by
    > default for C99 and C++.
    >
    > So if "extended identifiers" doesn't mean Unicode identifiers, what
    > *does* it mean?


    Perhaps \uABCD ?

    Let's try:

    $ cat e.cxx
    int foo\u0141bar(int f){
        return(f+1);
    }
    $ /opt/local/bin/gcc-mp-4.3 -fextended-identifiers -o e.o -c e.cxx
    $ nm /tmp/e.o
    00000000 T _Z8fooŁbari
    00000000 A _Z8fooŁbari.eh

    Yay!
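
    To spell out what happened there, here is a minimal sketch. An
    identifier may contain a universal-character-name, so the source file
    itself stays pure ASCII. The helper name call_it is made up for
    illustration, and whether -fextended-identifiers is needed depends on
    the compiler version, so treat that part as an assumption to check
    against your own toolchain.

```cpp
// \u0141 is U+0141, LATIN CAPITAL LETTER L WITH STROKE.
// The identifier below is spelled entirely in ASCII characters.
int foo\u0141bar(int f) {
    return f + 1;
}

// Hypothetical caller: it refers to the function with the same spelling.
int call_it() {
    return foo\u0141bar(41);
}
```

    If the compiler accepts the identifier, call_it() returns 42.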

    So we just need to convert the UTF-8 sources into ASCII sources,
    writing each non-ASCII character as a universal-character-name
    (\uXXXX), and compile them with -fextended-identifiers:


    [pjb@galatea :0.0 tmp]$ ~/bin/extend-identifiers < chinese.cxx > chinese-ext.cxx
    [pjb@galatea :0.0 tmp]$ /opt/local/bin/gcc-mp-4.2 -fextended-identifiers -c -o chinese.o chinese-ext.cxx
    [pjb@galatea :0.0 tmp]$ nm chinese.o
    00000454 s EH_frame1
    0000040e s GCC_except_table0
    0000016a t _GLOBAL__I__Z12添加一个i
    00000000 T _Z12添加一个i
    0000051c s __GLOBAL__I__Z12添加一个i.eh
    U __Unwind_Resume
    00000000 A __Z12添加一个i.eh
    00000118 t __Z41__static_initialization_and_destruction_0ii
    000004f0 s __Z41__static_initialization_and_destruction_0ii.eh
    U __ZNKSs4sizeEv
    U __ZNKSsixEm
    U __ZNSaIcEC1Ev
    U __ZNSaIcED1Ev
    U __ZNSolsEPFRSoS_E
    U __ZNSolsEi
    U __ZNSsC1EmcRKSaIcE
    U __ZNSsD1Ev
    U __ZNSt8ios_base4InitC1Ev
    U __ZNSt8ios_base4InitD1Ev
    0000000c t __ZSt17__verify_groupingPKcmRKSs
    000004c4 s __ZSt17__verify_groupingPKcmRKSs.eh
    000003d2 S __ZSt3minImERKT_S2_S2_
    0000049c S __ZSt3minImERKT_S2_S2_.eh
    U __ZSt4cout
    U __ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
    000003c0 S __ZSt4setwi
    00000474 S __ZSt4setwi.eh
    000007b0 b __ZSt8__ioinit
    U __ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
    U __ZStlsIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_St5_Setw
    U __ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKSbIS4_S5_T1_E
    U ___cxa_atexit
    U ___dso_handle
    U ___gxx_personality_v0
    000003f9 S ___i686.get_pc_thunk.bx
    00000186 t ___tcf_0
    00000544 s ___tcf_0.eh
    000001a6 T _main
    00000570 S _main.eh
    U dyld_stub_binding_helper
    [pjb@galatea :0.0 tmp]$ cat chinese.cxx
    // coding:utf-8
    #include <iostream>
    #include <iomanip>
    #include <string>

    using namespace std;

    /* 这个C + +程序打印毕达哥拉斯表。*/

    int 添加一个(int x){ return(x+1); }

    int main() {
        cout << "   X |";
        for (int 计数器 = 1; 计数器 <= 10; 计数器=添加一个(计数器)) {
            cout << setw(4) << 计数器;
        }
        cout << endl << "-----+" << string(40, '-') << endl;
        for (int 首先 = 1; 首先 <= 10; ++首先) {
            cout << setw(4) << 首先 << " |";
            for (int 第二 = 1; 第二 <= 10; ++第二) {
                cout << setw(4) << 首先 * 第二;
            }
            cout << endl;
        }
        return 0;
    }
    [pjb@galatea :0.0 tmp]$ cat chinese-ext.cxx
    // coding:utf-8
    #include <iostream>
    #include <iomanip>
    #include <string>

    using namespace std;

    /* \u8FD9\u4E2AC + +\u7A0B\u5E8F\u6253\u5370\u6BD5\u8FBE\u54E5\u62C9\u65AF\u8868\u3002*/

    int \u6DFB\u52A0\u4E00\u4E2A(int x){ return(x+1); }

    int main() {
        cout << "   X |";
        for (int \u8BA1\u6570\u5668 = 1; \u8BA1\u6570\u5668 <= 10; \u8BA1\u6570\u5668=\u6DFB\u52A0\u4E00\u4E2A(\u8BA1\u6570\u5668)) {
            cout << setw(4) << \u8BA1\u6570\u5668;
        }
        cout << endl << "-----+" << string(40, '-') << endl;
        for (int \u9996\u5148 = 1; \u9996\u5148 <= 10; ++\u9996\u5148) {
            cout << setw(4) << \u9996\u5148 << " |";
            for (int \u7B2C\u4E8C = 1; \u7B2C\u4E8C <= 10; ++\u7B2C\u4E8C) {
                cout << setw(4) << \u9996\u5148 * \u7B2C\u4E8C;
            }
            cout << endl;
        }
        return 0;
    }
    [pjb@galatea :0.0 tmp]$ cat ~/bin/extend-identifiers
    #!/usr/local/bin/clisp -ansi -q -Kfull -E utf-8

    (defun extend-characters (line)
      (with-output-to-string (*standard-output*)
        (loop
           for ch across line
           do (if (< (char-code ch) 128)
                  (princ ch)
                  (format t "\\u~4,'0X" (char-code ch))))))

    (loop
       for line = (read-line *standard-input* nil nil)
       while line
       do (write-line (extend-characters line)))

    (ext:exit 0)


    [pjb@galatea :0.0 tmp]$
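
    For those without clisp at hand, here is a rough C++ sketch of the
    same conversion. The function name to_ucn is hypothetical; it assumes
    the input is well-formed UTF-8 and copies through any byte it cannot
    decode. Unlike the Lisp script above, it switches to the eight-digit
    \UXXXXXXXX form for code points above U+FFFF. Like the script, it
    rewrites every non-ASCII character, including those in comments and
    string literals.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of any library): rewrite a UTF-8 string so
// that every non-ASCII character becomes a universal-character-name.
// Assumes well-formed UTF-8; bytes that cannot be decoded are copied as-is.
std::string to_ucn(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t extra = 0;   // continuation bytes that should follow
        std::uint32_t cp = 0;    // code point being assembled
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }

        // All continuation bytes must be present and look like 10xxxxxx.
        bool ok = extra > 0 && i + extra < utf8.size();
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) {               // plain ASCII, or a byte we cannot decode
            out += utf8[i++];
            continue;
        }

        char buf[16];
        if (cp <= 0xFFFF)
            std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(cp));
        else
            std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(cp));
        out += buf;
        i += extra + 1;
    }
    return out;
}
```

    A small main() that reads standard input line by line with
    std::getline and prints to_ucn(line) would give the same command-line
    behaviour as the script.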



    --
    __Pascal Bourguignon__
     
    Pascal J. Bourguignon, Sep 11, 2009
    #15
  16. Francesco

    Francesco Guest


    Uh, your post is pretty cryptic to me. It seems you used some tool
    to convert the Chinese characters in the original source into their
    corresponding universal-character-names, then fed the converted
    source to the compiler with the appropriate option, and the
    compilation succeeded. Have I got it all straight, Pascal?

    Really good to know, if so.

    Cheers,
    Francesco
     
    Francesco, Sep 11, 2009
    #16
  17. Francesco <> writes:

    > Uh, your post is pretty cryptic to me. It seems you used some tool
    > to convert the Chinese characters in the original source into their
    > corresponding universal-character-names, then fed the converted
    > source to the compiler with the appropriate option, and the
    > compilation succeeded. Have I got it all straight, Pascal?


    Yes, exactly.

    > Really good to know, if so.


    --
    __Pascal Bourguignon__
     
    Pascal J. Bourguignon, Sep 11, 2009
    #17
  18. Francesco

    Francesco Guest

    On 11 Sep, 14:40, (Pascal J. Bourguignon)
    wrote:
    > Francesco <> writes:
    > > Uh, your post is pretty cryptic to me. It seems you used some tool
    > > to convert the Chinese characters in the original source into their
    > > corresponding universal-character-names, then fed the converted
    > > source to the compiler with the appropriate option, and the
    > > compilation succeeded. Have I got it all straight, Pascal?

    >
    > Yes, exactly.


    Thanks for the confirmation. I've read your post again and now it
    doesn't look so cryptic. The similarity between the name of your
    Common Lisp utility (extend-identifiers) and the gcc option
    (-fextended-identifiers), along with the double redirection you used
    to call your tool (which looked like a template instantiation), made
    me uncertain even though I had got the essence :-/

    Cheers,
    Francesco
     
    Francesco, Sep 11, 2009
    #18