ANSI/UTF-8 file when saving a string to it

Discussion in 'C Programming' started by DDD, Feb 14, 2011.

  1. DDD

    DDD Guest

    Hi,
    I have a question about character encoding and file storage format.

    // xaM= is base64 codes of Chinese character '牛'
    // The following code produces a UTF-8 text file on XP,
    // and it shows a ţ .
    char *decodedText = PL_Base64Decode("xaM=", 4, nsnull);

    FILE *fp1;
    fp1=fopen("test.txt", "ab");
    fwrite(decodedText, sizeof(char), strlen(decodedText), fp1);
    fputc('\n', fp1);
    fclose(fp1);

    // uaTX98Wj is base64 codes of Chinese characters "工作牛"
    // The following code produces an ANSI text file on XP,
    // and it shows "工作牛".
    char *decodedText1 = PL_Base64Decode("uaTX98Wj", 8, nsnull);

    FILE *fp11;
    fp11=fopen("test1.txt", "ab");
    fwrite(decodedText1, sizeof(char), strlen(decodedText1), fp11);
    fputc('\n', fp11);
    fclose(fp11);

    So, what causes the fwrite function to choose different file storage
    formats, such as UTF-8 or ANSI, on Windows?

    Thanks in advance.
    DDD, Feb 14, 2011
    #1

  2. DDD <> wrote:
    > Hi,
    > I have a question about character encoding and file storage format.


    > // xaM= is base64 codes of Chinese character '牛'
    > // The following code produces a UTF-8 text file on XP,
    > // and it shows a ţ .
    > char *decodedText = PL_Base64Decode("xaM=", 4, nsnull);


    > FILE *fp1;
    > fp1=fopen("test.txt", "ab");
    > fwrite(decodedText, sizeof(char), strlen(decodedText), fp1);
    > fputc('\n', fp1);
    > fclose(fp1);


    > // uaTX98Wj is base64 codes of Chinese characters "工作牛"
    > // The following code produces an ANSI text file on XP,
    > // and it shows "工作牛".
    > char *decodedText1 = PL_Base64Decode("uaTX98Wj", 8, nsnull);


    > FILE *fp11;
    > fp11=fopen("test1.txt", "ab");
    > fwrite(decodedText1, sizeof(char), strlen(decodedText1), fp11);
    > fputc('\n', fp11);
    > fclose(fp11);


    > So, what causes the fwrite function to choose different file storage
    > formats, such as UTF-8 or ANSI, on Windows?


    Nothing at all (and that holds for Windows and any other operating
    system). fwrite() faithfully writes the content of memory into a
    file and doesn't care a bit what those data are. If you want some
    external tool (e.g. the one you use to view the file) to recognize
    its content as UTF-8, then you must make sure that the data you pass
    to fwrite() have the correct form; fwrite() won't change them in any
    way. The same goes for ASCII.
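
A minimal sketch of that point, assuming a hosted C implementation and a writable current directory: bytes written with fwrite() to a stream opened in binary mode come back unchanged with fread(). The function name roundtrip_is_identical and the 64-byte limit are illustrative choices, not part of the original post.

```c
#include <stdio.h>
#include <string.h>

/* Write n bytes to path in binary mode, read them back, and return 1 if
 * the file content is byte-for-byte identical to the input, 0 otherwise.
 * Hypothetical helper for illustration; handles at most 64 bytes. */
int roundtrip_is_identical(const char *path,
                           const unsigned char *data, size_t n)
{
    unsigned char back[64];
    size_t got;
    FILE *fp;

    if (n > sizeof back)
        return 0;

    fp = fopen(path, "wb");
    if (fp == NULL)
        return 0;
    if (fwrite(data, 1, n, fp) != n) {
        fclose(fp);
        return 0;
    }
    if (fclose(fp) != 0)
        return 0;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    got = fread(back, 1, sizeof back, fp);
    fclose(fp);

    /* Same length and same bytes: nothing was re-encoded on the way. */
    return got == n && memcmp(back, data, n) == 0;
}
```

Calling it with any byte sequence, for example the two bytes 0xC5 0xA3, should return 1: no encoding is chosen or applied by the stdio layer, so whatever "format" a viewer reports is its own interpretation of those bytes.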

    Since you seem to set up the memory you write out with fwrite()
    using some function named PL_Base64Decode() it boils down to
    what this function is doing and what data you pass to it. But
    this isn't a standard C function but probably from a third-party
    library, so you will rather likely get better answers to that
    question in a support forum for that library.

    On the other hand you write: "xaM= is base64 codes of Chinese
    character '牛'". But it's only a representation of that character
    in a certain encoding system. Since it gets interpreted, after
    having been "decoded" and written out to a file, as UTF-8, it
    rather likely is the UTF-8 representation of that character. Now
    I'm not an expert on Chinese at all (those characters do not even
    show up with my newsreader), but if I remember correctly there are
    several encodings for Chinese characters in use. Perhaps the
    'uaTX98Wj' you give for the other characters is the base64 code in
    some encoding system other than UTF-8 that the tool you use to view
    the file doesn't know about. And it may tell you that it's an ASCII
    text file due to some faulty heuristics it applies to determine the
    file content type (it can be very difficult to get it right with
    only a few bytes in a file).
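
One way to test the heuristics idea is to look at the decoded bytes directly. The sketch below is an illustration under stated assumptions: b64_decode is a small stand-in written for this example (it is not PL_Base64Decode() and handles only the standard alphabet), and is_valid_utf8 is a hypothetical helper that checks whether a byte sequence is well-formed UTF-8. It also assumes, without confirmation from the thread, that the viewer guesses UTF-8 when the bytes are well-formed and otherwise falls back to the system's ANSI code page.

```c
#include <stddef.h>
#include <string.h>

/* Decode standard Base64 from src into out, stopping at '=' padding.
 * Returns the number of bytes written, or -1 on a character outside the
 * alphabet or if out is too small. Illustrative stand-in only. */
long b64_decode(const char *src, unsigned char *out, size_t cap)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    unsigned long acc = 0;   /* bit accumulator */
    int bits = 0;            /* number of pending bits in acc */
    size_t n = 0;

    for (; *src != '\0' && *src != '='; src++) {
        const char *p = strchr(tbl, *src);
        if (p == NULL)
            return -1;
        acc = (acc << 6) | (unsigned long)(p - tbl);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n == cap)
                return -1;
            out[n++] = (unsigned char)((acc >> bits) & 0xFF);
        }
    }
    return (long)n;
}

/* Return 1 if s[0..n) is well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t len, k;

        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return 0;               /* stray continuation or bad lead byte */

        if (i + len > n)
            return 0;                /* sequence cut off at end of data */
        for (k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;            /* continuation byte expected */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000))
            return 0;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += len;
    }
    return 1;
}
```

With these helpers, "xaM=" decodes to the two bytes C5 A3, which pass is_valid_utf8() because they happen to be the UTF-8 encoding of U+0163 (ţ). "uaTX98Wj" decodes to B9 A4 D7 F7 C5 A3, which fails at the first byte, since B9 can only be a continuation byte. Both strings look like GB2312/GBK text (C5 A3 is 牛 in that encoding), so a viewer that tries UTF-8 first would plausibly show ţ for the first file and fall back to the ANSI code page for the second.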
    Regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
    Jens Thoms Toerring, Feb 14, 2011
    #2

  3. I've found this useful and readable:
    The Absolute Minimum Every Software Developer Absolutely,
    Positively Must Know About Unicode and Character Sets
    (No Excuses!) by Joel Spolsky
    <http://www.joelonsoftware.com/articles/Unicode.html>

    Francois Grieu
    Francois Grieu, Feb 14, 2011
    #3
  4. (Jens Thoms Toerring) writes:
    > DDD <> wrote:

    [...]
    >> So, what causes the fwrite function to choose different file storage
    >> formats, such as UTF-8 or ANSI, on Windows?

    >
    > Nothing at all (and that holds for Windows and any other operating
    > system). fwrite() faithfully writes the content of memory into a
    > file and doesn't care a bit what those data are.


    If the file is opened in text mode, it will perform whatever
    binary-to-text translations are appropriate. For Unix-like systems,
    typically this does nothing; for Windows-like systems, it typically just
    translates '\n' characters to CRLF pairs.
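
A small sketch of that difference, assuming a hosted C implementation and a writable current directory: it writes a single '\n' using a caller-supplied fopen() mode and then counts the bytes actually stored. The function name newline_file_size is hypothetical.

```c
#include <stdio.h>

/* Write the single character '\n' to path using the given fopen() mode,
 * then reopen the file in binary mode and return its size in bytes.
 * Returns -1 on any I/O error. Hypothetical helper for illustration. */
long newline_file_size(const char *path, const char *mode)
{
    long size = 0;
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        return -1;
    if (fputc('\n', fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    /* Count raw bytes; binary mode avoids any translation on input. */
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        size++;
    fclose(fp);
    return size;
}
```

With mode "wb" the result should be 1 everywhere. With mode "w" it is typically 1 on Unix-like systems and 2 on Windows, where the newline is stored as CR LF. The code in the original post opens its files with "ab", a binary mode, so this translation does not apply there.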

    [...]

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Feb 14, 2011
    #4
  5. Keith Thompson <> wrote:
    > (Jens Thoms Toerring) writes:
    > > DDD <> wrote:

    > [...]
    > >> So, what causes the fwrite function to choose different file storage
    > >> formats, such as UTF-8 or ANSI, on Windows?

    > >
    > > Nothing at all (and that holds for Windows and any other operating
    > > system). fwrite() faithfully writes the content of memory into a
    > > file and doesn't care a bit what those data are.


    > If the file is opened in text mode, it will perform whatever
    > binary-to-text translations are appropriate. For Unix-like systems,
    > typically this does nothing; for Windows-like systems, it typically just
    > translates '\n' characters to CRLF pairs.


    Thanks, I forgot about that (I probably have to do some serious
    Windows programming and get bitten by it to make it stick ;-)

    Regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
    Jens Thoms Toerring, Feb 15, 2011
    #5
