question about some octal formatted output?

7stud -- · Oct 14, 2007

eacute = ""
eacute << 0xC3 << 0xA9 # eacute << 195 << 169; or é

p eacute

--output:---
"\303\251"

That output is in octal, although there is no leading 0.

1) Where does that format come from, i.e. no leading 0?
2) Why is the output in octal and not hex?

I looked up String#<< and it says it converts any Fixnum between 0-255
to a character.

3) Using what character set?

Thanks.

Eric Hodel · Oct 14, 2007

7stud said:
eacute = ""
eacute << 0xC3 << 0xA9 # eacute << 195 << 169; or é

p eacute

--output:---
"\303\251"

That output is in octal, although there is no leading 0.

1) Where does that format come from, i.e. no leading 0?
2) Why is the output in octal and not hex?

It's at least as old as C. You'll probably have to ask some really
old-timers for the answer.

$ cat octal.c
#include <stdio.h>

int main(void) { printf("\303\251\n"); return 0; }
$ gcc octal.c
$ ./a.out
é

7stud said:
I looked up String#<< and it says it converts any Fixnum between 0-255
to a character.

3) Using what character set?

ASCII. It's your terminal that controls how it gets displayed. My
terminal is set to UTF-8.

mortee · Oct 14, 2007

7stud said:
eacute = ""
eacute << 0xC3 << 0xA9 # eacute << 195 << 169; or é

p eacute

--output:---
"\303\251"

That output is in octal, although there is no leading 0.

1) Where does that format come from, i.e. no leading 0?
2) Why is the output in octal and not hex?

I looked up String#<< and it says it converts any Fixnum between 0-255
to a character.

3) Using what character set?

Actually, what's your problem with all that?

Your ints specified in hex are actually converted to bytes in the
string. Those bytes, interpreted as UTF-8, may mean an é.

The conventional syntax for specifying bytes by their integer value in
string literals, used in C, shells and a number of other environments
(including Ruby) is a backslash followed by octal digits. (The leading 0
is used for specifying *integer* literals in octal.)
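
A minimal sketch of that distinction, assuming a Ruby where String#bytes is available (1.8.7 or later). The variable names are only illustrative. The octal digits after a backslash name bytes inside a string literal, while the leading 0 only matters for integer literals outside of strings:

```ruby
# Octal escapes inside a string literal name individual bytes.
escaped     = "\303\251"   # two bytes, written in octal
hex_escaped = "\xC3\xA9"   # the same two bytes, written in hex

# Outside a string, a leading 0 marks an *integer* literal as octal.
octal_ints = [0303, 0251]

puts escaped.bytes.to_a.inspect           # [195, 169]
puts(escaped.bytes.to_a == octal_ints)    # true
puts(escaped == hex_escaped)              # true
```

So "\303\251", "\xC3\xA9" and the integers 195 and 169 all describe the same two byte values; only the notation differs.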

String#inspect (which I guess p is using) adopts this syntax for
displaying non-ASCII and/or non-printing bytes in the string.
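
To make that concrete, here is a rough sketch of the escaping rule. octal_escape is a hypothetical helper written for this post, not Ruby's actual implementation, and it is simplified (it does not special-case \n, \t and friends the way the real inspect does). It assumes nothing beyond core Ruby:

```ruby
# Hypothetical helper: render a string the way the output above looks,
# with printable ASCII kept as-is and every other byte shown as a
# backslash followed by three octal digits.
def octal_escape(str)
  body = str.bytes.map do |byte|
    if byte == 0x22 || byte == 0x5C        # double quote or backslash
      "\\" + byte.chr
    elsif byte >= 0x20 && byte <= 0x7E     # printable ASCII
      byte.chr
    else
      format("\\%03o", byte)               # e.g. 195 -> \303
    end
  end
  '"' + body.join + '"'
end

puts octal_escape([0xC3, 0xA9].pack("C*"))   # "\303\251"
```

Newer Ruby versions may render such bytes differently (for example as \xC3\xA9 hex escapes for a binary string), but the idea is the same: the escapes are just a printable spelling of the byte values.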

I really don't get your third question. There's no character set
involved here, beyond how you intended your two bytes to be interpreted.
Those two bytes remain the same, regardless of how they are displayed. They
may mean two characters in plain old 8-bit charsets, they may mean e.g.
one é in UTF-8, or they may mean what p displays for them.
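
A small sketch of that point, assuming Ruby 1.9 or later (String#force_encoding does not exist in 1.8). The same two bytes are tagged with two different encodings; the bytes never change, only the number of characters they are read as:

```ruby
# Two raw bytes with no character interpretation attached (binary string).
raw = [0xC3, 0xA9].pack("C*")

# Relabel copies of the same bytes; force_encoding does not touch the bytes.
as_utf8   = raw.dup.force_encoding("UTF-8")
as_latin1 = raw.dup.force_encoding("ISO-8859-1")

puts as_utf8.length                                  # 1
puts as_latin1.length                                # 2
puts(as_utf8.bytes.to_a == as_latin1.bytes.to_a)     # true

# Transcoding the Latin-1 reading to UTF-8 shows what a terminal or
# archive would display if it guessed the wrong charset.
puts as_latin1.encode("UTF-8")                       # Ã©
```

Read as ISO-8859-1, the two bytes come out as the two characters Ã and ©, which is the usual way an é ends up garbled when UTF-8 text is decoded with an 8-bit charset.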

mortee
