ANSI/UTF-8 file when saving a string to it

DDD

Hi,
I have a question about character encoding and file storage format.

// "xaM=" is the base64 encoding of the Chinese character '牛'.
// The following code produces a file that XP treats as UTF-8,
// and it is displayed as ţ.
char *decodedText = PL_Base64Decode("xaM=", 4, nsnull);

FILE *fp1;
fp1=fopen("test.txt", "ab");
fwrite(decodedText, sizeof(char), strlen(decodedText), fp1);
fputc('\n', fp1);
fclose(fp1);

// "uaTX98Wj" is the base64 encoding of the Chinese string "工作牛".
// The following code produces a file that XP treats as ANSI,
// and it is displayed as "工作牛".
char *decodedText1 = PL_Base64Decode("uaTX98Wj", 8, nsnull);

FILE *fp11;
fp11=fopen("test1.txt", "ab");
fwrite(decodedText1, sizeof(char), strlen(decodedText1), fp11);
fputc('\n', fp11);
fclose(fp11);

So, what causes the fwrite function to choose a different file
storage format, such as UTF-8 or ANSI, on Windows?

Thanks in advance.
 
Jens Thoms Toerring

DDD said:
So, what causes the fwrite function to choose a different file
storage format, such as UTF-8 or ANSI, on Windows?

Nothing at all (and that holds for Windows and any other
operating system). fwrite() faithfully writes the content of
memory into a file and doesn't care a bit what those data are.
If you want some external tool (that you e.g. use to view the
file with) to recognize its content as UTF-8 then you must
make sure that the data you pass to fwrite() have the correct
form; fwrite() won't change them in any way. The same goes for
ANSI.
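
A minimal sketch of that point, assuming nothing beyond standard
C: write a few bytes with fwrite() to a file opened in binary
mode, read them back and compare. The function name
roundtrip_is_identical and the file name in the note below are
made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write n bytes to path in binary mode, read them back, and report
   whether the file content is byte-for-byte identical (1) or not (0).
   Returns -1 on an I/O error. Handles at most 64 bytes. */
int roundtrip_is_identical(const char *path,
                           const unsigned char *data, size_t n)
{
    unsigned char back[64];
    assert(data != NULL && n <= sizeof back);

    FILE *fp = fopen(path, "wb");
    if (fp == NULL) return -1;
    size_t written = fwrite(data, 1, n, fp);
    if (fclose(fp) != 0 || written != n) return -1;

    fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    size_t got = fread(back, 1, sizeof back, fp);
    fclose(fp);

    /* Same length and same bytes: nothing was added or converted. */
    return got == n && memcmp(back, data, n) == 0;
}
```

Called with the two bytes 0xC5 0xA3 (which is what "xaM=" decodes
to), e.g. roundtrip_is_identical("rt_test.bin", bytes, 2), this
should return 1: the file holds exactly those two bytes and no
encoding marker of any kind.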

Since you seem to set up the memory you write out with fwrite()
using some function named PL_Base64Decode() it boils down to
what this function is doing and what data you pass to it. But
this isn't a standard C function but probably from a third-party
library, so you will rather likely get better answers to that
question in a support forum for that library.

On the other hand you write: "xaM= is base64 codes of Chinese
character '牛'". But it's only a representation of that
character in a certain encoding system. Since it gets interpreted,
after having been "decoded" and written out to a file, as UTF-8
it rather likely is the UTF-8 representation of that character.
Now I'm not an expert on Chinese at all (those characters do
not even show up with my newsreader) but if I remember
correctly there are several encodings for Chinese characters in
use. Perhaps the 'uaTX98Wj' you give for the other string is the
base64 code in some other encoding system than UTF-8 that the
tool you use to view the file doesn't know about. And it may
tell you that it's an ANSI text file due to some faulty
heuristics it applies to determine the file content type (it can
be very difficult to get it right with only a few bytes in a
file).
Regards, Jens
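
To see what a viewer actually has to work with, one can decode
both strings and test whether the resulting bytes happen to be
well-formed UTF-8. The sketch below is only an illustration:
b64_decode is a hypothetical stand-in for PL_Base64Decode(), and
looks_like_utf8 is a simplified structural check, not a full
validator and not the heuristic any particular editor uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PL_Base64Decode(): decodes standard base64
   text into out[] and returns the number of bytes produced, or -1 on
   a character outside the alphabet or if out[] is too small.
   Decoding stops at the first '=' padding character. */
long b64_decode(const char *in, unsigned char *out, size_t cap)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    unsigned long acc = 0;   /* bit accumulator */
    int bits = 0;            /* number of not yet emitted bits in acc */
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0' && *in != '='; in++) {
        const char *p = strchr(tbl, *in);
        if (p == NULL) return -1;
        acc = (acc << 6) | (unsigned long)(p - tbl);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n == cap) return -1;
            out[n++] = (unsigned char)((acc >> bits) & 0xFF);
        }
    }
    return (long)n;
}

/* Returns 1 if s[0..n) has the byte structure of UTF-8, else 0.
   Simplified: checks lead and continuation bytes and rejects the
   lead bytes 0xC0, 0xC1 and 0xF5..0xFF, nothing more. */
int looks_like_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        size_t extra;
        if (s[i] < 0x80) extra = 0;
        else if (s[i] >= 0xC2 && s[i] <= 0xDF) extra = 1;
        else if ((s[i] & 0xF0) == 0xE0) extra = 2;
        else if (s[i] >= 0xF0 && s[i] <= 0xF4) extra = 3;
        else return 0;   /* stray continuation byte or invalid lead */
        if (extra > n - i - 1) return 0;   /* sequence is cut off */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;
        i += extra + 1;
    }
    return 1;
}
```

With these, "xaM=" decodes to the two bytes C5 A3, which pass the
check (read as UTF-8 they are U+0163, the 'ţ' from the question),
while "uaTX98Wj" decodes to B9 A4 D7 F7 C5 A3, which fail it at
the first byte. Note that the second string ends in the same two
bytes as the first, so both were presumably produced in the same
double-byte Chinese encoding (GBK/GB2312 as far as I can tell,
but treat that as an assumption). A viewer that guesses from the
bytes alone would then take the first file for UTF-8 only because
it happens to be valid UTF-8, and fall back to the ANSI code page
for the second.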
 
Keith Thompson

Jens Thoms Toerring said:
Nothing at all (and that holds for Windows and any other
operating system). fwrite() faithfully writes the content of
memory into a file and doesn't care a bit what those data are.

If the file is opened in text mode, it will perform whatever
binary-to-text translations are appropriate. For Unix-like systems,
typically this does nothing; for Windows-like systems, it typically just
translates '\n' characters to CRLF pairs.
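
A small sketch of how to observe this, using only standard C: it
writes a single '\n' with a given fopen() mode and then counts
the bytes that ended up in the file. The function name
newline_size_on_disk and the file names are made up for
illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Write the single character '\n' to path using the given fopen() mode
   and return the resulting file size in bytes, or -1 on error. */
long newline_size_on_disk(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);

    FILE *fp = fopen(path, mode);
    if (fp == NULL) return -1;
    int rc = fputc('\n', fp);
    if (fclose(fp) != 0 || rc == EOF) return -1;

    /* Reopen in binary mode so the bytes are counted untranslated. */
    fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    long size = 0;
    while (fgetc(fp) != EOF) size++;
    fclose(fp);
    return size;
}
```

newline_size_on_disk("nl_bin.txt", "wb") should give 1 on any
system. With mode "w" I would expect 1 on Unix-like systems and 2
on Windows (CR LF). The code in the original question opens its
files with "ab", i.e. in binary mode, so no such translation
applies there.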

[...]
 
Jens Thoms Toerring

Keith Thompson said:
If the file is opened in text mode, it will perform whatever
binary-to-text translations are appropriate. For Unix-like systems,
typically this does nothing; for Windows-like systems, it typically just
translates '\n' characters to CRLF pairs.

Thanks, forgot about that (probably got to do some serious
Windows programming to get bitten by it to make it stick;-)

Regards, Jens
 
