Question about octal-formatted output


7stud --

eacute = ""
eacute << 0xC3 << 0xA9 #eacute<< 195 << 169 ; or é

p eacute

--output:---
"\303\251"

That output is in octal, although there is no leading 0.

1) Where does that format come from, i.e. no leading 0?
2) Why is the output in octal and not hex?

I looked up String#<< and it says it converts any Fixnum between 0-255
to a character.

3) Using what character set?


Thanks.
 

Eric Hodel


1) Where does that format come from, i.e. no leading 0?
2) Why is the output in octal and not hex?

It's at least as old as C. You'll probably have to ask some really
old-timers for the answer.

$ cat octal.c
#include <stdio.h>

int main(void) { printf("\303\251\n"); return 0; }
$ gcc octal.c
$ ./a.out
é

I looked up String#<< and it says it converts any Fixnum between 0-255
to a character.

3) Using what character set?

ASCII. It's your terminal that controls how it gets displayed. My
terminal is set to UTF-8.
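A note for anyone trying this on a newer Ruby: since 1.9, strings carry an encoding, and what String#<< does with an Integer depends on the receiver's encoding. The sketch below assumes Ruby 2.3 or later (for the `encoding:` keyword of String.new); the variable names are only for illustration.

```ruby
# Binary (ASCII-8BIT) receiver: each Integer 0..255 is appended as one raw byte.
raw = String.new                       # with no argument, String.new is ASCII-8BIT
raw << 0xC3 << 0xA9
p raw.encoding == Encoding::BINARY     # => true
p raw.bytes                            # => [195, 169]

# UTF-8 receiver: each Integer is taken as a Unicode codepoint instead,
# so 0xC3 becomes U+00C3 and 0xA9 becomes U+00A9, two bytes each.
utf = String.new("", encoding: Encoding::UTF_8)
utf << 0xC3 << 0xA9
p utf.length                           # => 2
p utf.bytes                            # => [195, 131, 194, 169]
```

So on a modern Ruby, the original snippet (`eacute = ""` in a UTF-8 source file) ends up holding the two characters Ã© rather than the two bytes of é.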
 

mortee


Actually, what's your problem with all that?

Your integers, written in hex, are simply appended to the string as
bytes. Interpreted as UTF-8, those two bytes mean é.

The conventional syntax for specifying bytes by their integer value in
string literals, used in C, shells and a number of other environments
(including Ruby), is a backslash followed by one to three octal digits.
(The leading 0 is used for specifying *integer* literals in octal.)
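A quick way to see both conventions side by side in Ruby (a sketch, assuming a UTF-8 source file, which is the default since Ruby 2.0):

```ruby
# Backslash + octal digits is a byte escape inside a string literal.
octal = "\303\251"
hex   = "\xC3\xA9"
p octal == hex        # => true
p octal.bytes         # => [195, 169]
p octal == "é"        # => true, since é is C3 A9 in UTF-8

# A leading 0 (or the explicit 0o prefix) marks an octal *integer* literal.
p 0303 == 0xC3        # => true, both are 195
p 0o251 == 0xA9       # => true, both are 169
```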

In Ruby 1.8, String#inspect (which p calls) adopts this syntax for
displaying non-ASCII or non-printing bytes in the string.
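For what it's worth, Ruby 1.9 and later escape the same bytes in hex instead of octal. The helper `octal_inspect` below is a hypothetical illustration, not part of Ruby, that approximates the old octal style (it does not reproduce 1.8's shorthand escapes such as \n or \t, so control characters come out as octal too).

```ruby
raw = [0xC3, 0xA9].pack("C*")    # the two bytes from the thread, ASCII-8BIT
p raw                            # Ruby 1.9+ prints "\xC3\xA9"

# Hypothetical helper (not part of Ruby) approximating the Ruby 1.8 octal style.
def octal_inspect(str)
  body = str.bytes.map do |b|
    case b
    when 34, 92  then "\\" + b.chr     # escape " and \ with a backslash
    when 32..126 then b.chr            # printable ASCII passes through unchanged
    else format("\\%03o", b)           # any other byte: three-digit octal escape
    end
  end
  "\"#{body.join}\""
end

puts octal_inspect(raw)          # prints "\303\251"
```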

I really don't get your third question. There's no character set
involved here, beyond how you intended your two bytes to be interpreted.
Those two bytes remain the same regardless of how they are displayed. They
may mean two characters in a plain old 8-bit charset (in ISO-8859-1, Ã
followed by ©), one é in UTF-8, or just what p displays for them.
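To make that concrete: the sketch below (assuming Ruby 1.9 or later, which added String#force_encoding) labels the same two bytes with two different encodings without touching the bytes themselves.

```ruby
bytes = [0xC3, 0xA9].pack("C*")   # the same two bytes throughout

# force_encoding only relabels the string; it never changes the bytes.
as_utf8   = bytes.dup.force_encoding(Encoding::UTF_8)
as_latin1 = bytes.dup.force_encoding(Encoding::ISO_8859_1)

p as_utf8.length                      # => 1 (é)
p as_latin1.length                    # => 2 (Ã and ©)
p as_utf8.bytes == as_latin1.bytes    # => true

# Transcoding the Latin-1 reading to UTF-8 shows which two characters it meant.
p as_latin1.encode(Encoding::UTF_8) == "Ã©"   # => true
```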

mortee
 
