Code in ASCII UTF8??

chunhui_true

I know that in ASCII '\r' is 0x0d and '\n' is 0x0a.
But some people say that ASCII characters are unchanged in UTF-8.
So I want to know: in UTF-8, are '\r' and '\n' still 0x0d and 0x0a?
Could anybody tell me? Many thanks!
 
Serge Paccalin

On Thursday 17 February 2005 at 09:38, chunhui_true wrote in
comp.lang.c:
I know that in ASCII '\r' is 0x0d and '\n' is 0x0a.
But some people say that ASCII characters are unchanged in UTF-8.
So I want to know: in UTF-8, are '\r' and '\n' still 0x0d and 0x0a?
Could anybody tell me? Many thanks!

That's correct. All ASCII characters (0x00 to 0x7f) are unchanged in
UTF-8. All Unicode values from 0x80 up are encoded into 2, 3, or 4 bytes
that never reuse 0x00 to 0x7f.
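
A minimal sketch of that property, assuming the standard UTF-8 bit layout. The function name utf8_encode is made up for this illustration and is not part of any library. Code points below 0x80 come out as the same single byte, and every byte of a longer sequence has its high bit set, so it can never collide with 0x00 to 0x7f.

```c
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * out must have room for 4 bytes. Returns the number of bytes written,
 * or 0 if cp is not encodable (a surrogate, or above U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                  /* ASCII: one byte, value unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                 /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {               /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                 /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {             /* 4 bytes: 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this sketch, utf8_encode(0x0D, buf) and utf8_encode(0x0A, buf) each write one byte, 0x0D and 0x0A respectively, while U+00E9 becomes the two bytes 0xC3 0xA9.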

--
___________ 17/02/2005 11:21:46
_/ _ \_`_`_`_) Serge PACCALIN -- sp ad mailclub.net
\ \_L_) Men must therefore begin by not
-'(__) being fanatics in order to deserve
_/___(_) tolerance. -- Voltaire, 1763
 
infobahn

Lawrence said:
Wasn't that the other way around on Macs?

<shrug> depends what you mean by '\r' and '\n'. :)

On a Mac, a text file's end-of-line indicator (C89 draft 2.1.1.2)
is 0x0D. When this is read into a C program, it is translated into
a '\n' character.

If anyone's got a Mac with a C compiler, could they please tell us
what this program prints on that system?

#include <stdio.h>

int main(void)
{
    printf("\\\r = %d\n", '\r');
    printf("\\\n = %d\n", '\n');
    return 0;
}
 
Jirka Klaue

infobahn wrote:
....
If anyone's got a Mac with a C compiler, could they please tell us
what this program prints on that system?

#include <stdio.h>

int main(void)
{
    printf("\\\r = %d\n", '\r');
    printf("\\\n = %d\n", '\n');
    return 0;
}

Probably not what you think. ;-)

Jirka
 
Clark S. Cox III

#include <stdio.h>

int main(void)
{
    printf("\\\r = %d\n", '\r');
    printf("\\\n = %d\n", '\n');
    return 0;
}

ITYM:
#include <stdio.h>

int main(void)
{
    printf("\\r = %d\n", '\r');
    printf("\\n = %d\n", '\n');
    return 0;
}


Which outputs:
\r = 13
\n = 10
 
Clark S. Cox III

Wasn't that the other way around on Macs?

No, the actual values of '\r' and '\n' are unchanged on the Mac. The
difference was in how files were read and written in text mode on Mac OS
versions before 10. (As of Mac OS X, the UNIX convention has been adopted;
i.e., text mode and binary mode are identical.)
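
A rough way to observe the text-mode translation on a given system, using only standard C file I/O. The function names and the file name passed in are made up for this sketch. It writes one '\n' in text mode, then counts the raw bytes in binary mode; the count and the byte values depend on the platform's end-of-line convention, but reading the file back in text mode should give '\n' again.

```c
#include <stdio.h>

/* Hypothetical helper: write a single '\n' to path in text mode, then
 * return the file's size in bytes as seen in binary mode, or -1 on error. */
long raw_newline_size(const char *path)
{
    FILE *fp = fopen(path, "w");      /* text mode: '\n' may be translated */
    long n = 0;

    if (fp == NULL)
        return -1;
    if (fputc('\n', fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, "rb");           /* binary mode: no translation */
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}

/* Hypothetical helper: read the first character of path in text mode.
 * Returns EOF if the file cannot be opened or is empty. */
int first_char_text_mode(const char *path)
{
    FILE *fp = fopen(path, "r");
    int c;

    if (fp == NULL)
        return EOF;
    c = fgetc(fp);
    fclose(fp);
    return c;
}
```

On a system where text and binary mode are identical, raw_newline_size() should return 1; a system that stores line ends as a two-byte sequence would return 2. In both cases first_char_text_mode() is expected to return '\n'.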
 
infobahn

Clark S. Cox III said:
ITYM:
#include <stdio.h>

int main(void)
{
    printf("\\r = %d\n", '\r');
    printf("\\n = %d\n", '\n');

\E\r\,\ \y\e\s\,\ \I\ \d\i\d\ \m\e\a\n\ \t\h\a\t\.

Which outputs:
\r = 13
\n = 10

I expected as much, but since I haven't used a Mac since 1989, I
didn't want to risk making an idiot of myself. (Seems I did that
anyway, in a rather unexpected direction.)
 
Peter Nilsson

infobahn said:
<shrug> depends what you mean by '\r' and '\n'. :)

True.
On a Mac, a text file's end-of-line indicator (C89 draft 2.1.1.2)
is 0x0D. When this is read into a C program, it is translated into
a '\n' character.

If anyone's got a Mac with a C compiler, could they please tell us
what this program prints on that system?

#include <stdio.h>

int main(void)
{
    printf("\\r = %d\n", '\r'); /* corrected */
    printf("\\n = %d\n", '\n'); /* corrected */
    return 0;
}

Metrowerks 3.0, Mac IIvx, swap \n and \r enabled...

\r = 10
\n = 13

The compiler option allows Mac implementations to avoid having
to perform runtime translation of text-file end-of-lines.
 
chunhui_true

Do ASCII characters really remain unchanged under UTF-8? If they are
unchanged, why can't I printf them to the screen?
I use libcap to capture the FTP commands from Ethernet. I have one class to
collect all the packets, reassemble the flow, and buffer them. Then another
class can read a line (ending with \r\n) from the buffer. Each time I read a
line from the buffer I get one command.
When I use CuteFTP I can get all the commands and printf them to the
screen. But when I use IE for FTP I can see one command, "set utf8 on", and
after that I can't printf the following commands to the screen. Should I
convert UTF-8 to ASCII? :(
 
Peter Nilsson

Clark said:
No, the actual values of '\r' and '\n' are unchanged on the Mac.

Depends on the development tool and compiler settings.

Generally, most old Mac implementations used the normal form, but at
least two offer(ed) the option of setting \r to 10 and \n to 13.
 
Lawrence Kirby

On Thu, 17 Feb 2005 13:59:00 -0800, Peter Nilsson wrote:

....
Metrowerks 3.0, Mac IIvx, swap \n and \r enabled...

\r = 10
\n = 13

The compiler option allows mac implementations to avoid having
to perform runtime translations of text file end-of-lines.

So it is dangerous to assume that '\n' is 10 and '\r' is
13 even on an ASCII based system.

Lawrence
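
One way to sidestep that assumption when parsing bytes from a network protocol such as FTP, where lines end with the bytes 0x0D 0x0A, is to compare against the numeric values rather than against '\r' and '\n'. A minimal sketch; the macro and function names are made up for this illustration:

```c
#include <stddef.h>

#define WIRE_CR 0x0D   /* carriage return as it appears on the wire */
#define WIRE_LF 0x0A   /* line feed as it appears on the wire */

/* Hypothetical helper: return the index of the first CR LF pair in buf,
 * or -1 if the first len bytes contain none. */
long find_crlf(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i++) {
        if (buf[i] == WIRE_CR && buf[i + 1] == WIRE_LF)
            return (long)i;
    }
    return -1;
}
```

Because the comparison never mentions '\r' or '\n', the result does not depend on how a particular compiler maps those escape sequences.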
 
SM Ryan

# Do ASCII characters really remain unchanged under UTF-8? If they are
# unchanged, why can't I printf them to the screen?
# I use libcap to capture the FTP commands from Ethernet. I have one class to

You're going to have to deal with the sources or authors of your software.

When you're printing characters, the software is interpreting a subsequence of
bits as an index to a raster pattern or font character, and then painting that
on the screen or printer. Everybody has to agree where character divisions are
in the bit stream and how character codes are encoded. Or chaos ensues.

If you've got non-ASCII Unicode characters in your UTF-8 stream, they will
show up as one or more non-ASCII bytes. How the software deals with non-ASCII
bytes depends on the software. It might mask off the high bit; it might
discard them; it might treat them similarly to letters and digits. The software
might assume some other encoding, like Macintosh or Windows, and interpret
the non-ASCII codes as different characters than you wanted.
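
To see what a client actually sent, it can help to dump the raw bytes instead of handing them straight to printf. A minimal sketch, assuming an ASCII-based execution character set; the function name escape_bytes is made up for this illustration. Printable ASCII bytes are copied as-is and everything else is shown as \xHH.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: copy n bytes from in to out, keeping printable
 * ASCII (0x20..0x7E) and writing every other byte as \xHH.
 * The result is always NUL-terminated and is truncated if out is too
 * small; outsz must be at least 1. Returns the length of the result.
 * Note: a literal backslash in the input is copied unchanged. */
size_t escape_bytes(const unsigned char *in, size_t n, char *out, size_t outsz)
{
    size_t i, used = 0;

    for (i = 0; i < n; i++) {
        if (in[i] >= 0x20 && in[i] <= 0x7E) {
            if (used + 1 >= outsz)    /* need room for 1 char + NUL */
                break;
            out[used++] = (char)in[i];
        } else {
            if (used + 4 >= outsz)    /* need room for 4 chars + NUL */
                break;
            sprintf(out + used, "\\x%02X", (unsigned)in[i]);
            used += 4;
        }
    }
    out[used] = '\0';
    return used;
}
```

Run over a captured command line, this makes stray high-bit bytes and the terminating 0x0D 0x0A visible without relying on how the terminal interprets them.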

The same sort of confusion happened when 8-bit characters ousted 6-bit, but
at least then it was in a smaller community.
 
osmium

Lawrence Kirby said:
On Thu, 17 Feb 2005 13:59:00 -0800, Peter Nilsson wrote:

...


So it is dangerous to assume that '\n' is 10 and '\r' is
13 even on an ASCII based system.

I don't agree with that. K&R, p 193, says carriage return is \r. In ASCII,
carriage return is 13. So a platform that says \r = 10 is simply not ASCII.

I haven't been following this thread, so it may be that what you *mean* is
true, but the literal meaning of your post is simply wrong as I see it.
I think the thing actually being discussed has to do with how various platforms
handle the \n problem in a non-\n world, i.e., ASCII as originally
promulgated. And I agree that assumptions WRT \r and \n and what they
*do* are dangerous.
 
Michael Mair

osmium said:
I don't agree with that. K&R, p 193, says carriage return is \r. In ASCII,
carriage return is 13. So a platform that says \r = 10 is simply not ASCII.

Please, make that '\r' and '\n' vs <CR> and <LF>. C does not care about
the character encoding it works with. If there is a "non-ASCII" platform
which has a different binary representation (maybe including multiple
bytes) of such an escape sequence, '\r' and '\n' still are carriage
return and newline, respectively.
If I use your reasoning then only Unix text files are ASCII files
because only they use <LF> as newline encoding.

BTW: The only thing _I_ rely on if hearing ASCII is that
0x20 through 0x7F will represent the printable characters I am used to.


Cheers
Michael
 
Lew Pitcher

Michael Mair wrote:
[snip]
BTW: The only thing _I_ rely on if hearing ASCII is that
0x20 through 0x7F will represent the printable characters I am used to.

ITYM 0x20 through 0x7E will represent the printable characters you are
used to. ASCII 0x7F is a 'control character', the 'DEL' (delete) control
character to be precise.

But, IKWYM
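
As a small sketch of that range check (the function name is made up for this illustration), testing the numeric code directly keeps 0x7F out and, unlike isprint(), does not depend on the current locale:

```c
/* Hypothetical helper: nonzero if code is a printable ASCII character,
 * i.e. 0x20 (space) through 0x7E ('~'). 0x00..0x1F and 0x7F (DEL) are
 * control characters; anything above 0x7F is not ASCII at all. */
int is_printable_ascii(int code)
{
    return code >= 0x20 && code <= 0x7E;
}
```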

--

Lew Pitcher, IT Specialist, Enterprise Data Systems
Enterprise Technology Solutions, TD Bank Financial Group

(Opinions expressed here are my own, not my employer's)

 
Michael Mair

Lew said:
Michael Mair wrote:
[snip]
BTW: The only thing _I_ rely on if hearing ASCII is that
0x20 through 0x7F will represent the printable characters I am used to.


ITYM 0x20 through 0x7E will represent the printable characters you are
used to. ASCII 0x7F is a 'control character', the 'DEL' (delete) control
character to be precise.

Yes, that is what I meant... Thanks for the correction.

But, IKWYM

:)


Cheers
Michael
 
