Unreadable source code

Ellixis

I have been looking at "sh" source code and have found this strange
thing:

/**** syntax.c ****/
#define ndx(ch) (ch + 1 - CHAR_MIN)
#define set(ch, val) [ndx(ch)] = val,
#define set_range(s, e, val) [ndx(s) ... ndx(e)] = val,

/* character classification table */
const char is_type[257] = { 0,
set_range('0', '9', ISDIGIT)
set_range('a', 'z', ISLOWER)
set_range('A', 'Z', ISUPPER)
set('_', ISUNDER)
set('#', ISSPECL)
set('?', ISSPECL)
set('$', ISSPECL)
set('!', ISSPECL)
set('-', ISSPECL)
set('*', ISSPECL)
set('@', ISSPECL)
};
/**** !syntax.c ****/


/**** gcc -E syntax.c ****/
const char is_type[257] = { 0,
[( '0' + 1 - (-0x7f-1) ) ... ( '9' + 1 - (-0x7f-1) ) ] =
01 ,
[( 'a' + 1 - (-0x7f-1) ) ... ( 'z' + 1 - (-0x7f-1) ) ] =
04 ,
[( 'A' + 1 - (-0x7f-1) ) ... ( 'Z' + 1 - (-0x7f-1) ) ] =
02 ,
[( '_' + 1 - (-0x7f-1) ) ] = 010 ,
[( '#' + 1 - (-0x7f-1) ) ] = 020 ,
[( '?' + 1 - (-0x7f-1) ) ] = 020 ,
[( '$' + 1 - (-0x7f-1) ) ] = 020 ,
[( '!' + 1 - (-0x7f-1) ) ] = 020 ,
[( '-' + 1 - (-0x7f-1) ) ] = 020 ,
[( '*' + 1 - (-0x7f-1) ) ] = 020 ,
[( '@' + 1 - (-0x7f-1) ) ] = 020 ,
};
/**** !gcc -E syntax.c ****/

It compiles without any error message or warning. Does anybody have an
explanation for this portion of the source code?
 
Chris Torek

> I have been looking at "sh" source code and have found this strange
> thing:
>
> /**** syntax.c ****/
> #define ndx(ch) (ch + 1 - CHAR_MIN)
> #define set(ch, val) [ndx(ch)] = val,

The first macro simply offsets a value, named ch, by 1-CHAR_MIN,
typically 1-(-128) or 1-0. In other words, it generally adds either
129 or 1. Note that ch is not parenthesized in the expansion, so the
macro misbehaves if the argument is an expression using a
low-precedence operator such as "&": ndx(1 & 3) expands to
(1 & 3 + 1 - CHAR_MIN), which does not work "as desired".
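To make the precedence problem concrete, here is a small sketch. The ndx_paren macro and the function names are mine, purely for illustration; only ndx itself comes from the posted source.

```c
#include <assert.h>
#include <limits.h>

/* The macro exactly as posted: the argument is not parenthesized. */
#define ndx(ch) (ch + 1 - CHAR_MIN)

/* Hypothetical variant with the argument parenthesized (not in the sh source). */
#define ndx_paren(ch) ((ch) + 1 - CHAR_MIN)

/* Expands to (1 & 3 + 1 - CHAR_MIN); "+" and "-" bind tighter than "&",
   so this is really 1 & (4 - CHAR_MIN). */
int as_posted(void)   { return ndx(1 & 3); }

/* Expands to ((1 & 3) + 1 - CHAR_MIN), i.e. 2 - CHAR_MIN. */
int as_intended(void) { return ndx_paren(1 & 3); }

/* Self-check; call it from main() if you want to try this out. */
void check_ndx(void)
{
    assert(as_intended() == 2 - CHAR_MIN);
    assert(as_posted() == (1 & (4 - CHAR_MIN)));
    assert(as_posted() != as_intended());
}
```

Since the table only ever passes plain character constants to ndx, the missing parentheses do no harm in the posted code; the sketch only shows what would go wrong with a more complicated argument.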

The second macro is designed to use the C99 "designated initializer"
syntax.

> #define set_range(s, e, val) [ndx(s) ... ndx(e)] = val,

In Standard C, the expansion of this macro is a syntax error.

(GCC has an extension in which the "..." range designator becomes
valid and meaningful, but this extension is *not* valid C99.)

> /* character classification table */
> const char is_type[257] = { 0,
> set_range('0', '9', ISDIGIT)

This uses the GCC extension to make sure that is_type[ndx('0')] is set
to ISDIGIT, is_type[ndx('1')] is set to ISDIGIT, is_type[ndx('2')] is
set to ISDIGIT, and so on, through is_type[ndx('9')]. Since Standard
C requires that the integer values of '0' through '9' be contiguous
and sequential, this always works (provided your compiler implements
the GCC extension).
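As a rough illustration of what the range designator abbreviates, the sketch below builds the digit entries twice: once with the GNU extension and once with ordinary C99 designators. It assumes a compiler that accepts the extension (GCC does; I believe clang does too). The table and function names are made up for the example, ISDIGIT is given the value 01 seen in the gcc -E output, and I have parenthesized the argument of ndx.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define ISDIGIT 01                       /* value taken from the gcc -E output */
#define ndx(ch) ((ch) + 1 - CHAR_MIN)    /* argument parenthesized, unlike the original */

/* GNU C range designator: an extension, not ISO C. */
const char with_range[257] = {
    [ndx('0') ... ndx('9')] = ISDIGIT
};

/* The same entries spelled out with C99 designators only. */
const char spelled_out[257] = {
    [ndx('0')] = ISDIGIT, [ndx('1')] = ISDIGIT, [ndx('2')] = ISDIGIT,
    [ndx('3')] = ISDIGIT, [ndx('4')] = ISDIGIT, [ndx('5')] = ISDIGIT,
    [ndx('6')] = ISDIGIT, [ndx('7')] = ISDIGIT, [ndx('8')] = ISDIGIT,
    [ndx('9')] = ISDIGIT
};

/* Self-check: both tables should be byte-for-byte identical. */
void check_digit_tables(void)
{
    assert(memcmp(with_range, spelled_out, sizeof with_range) == 0);
    assert(with_range[ndx('7')] == ISDIGIT);
    assert(with_range[ndx('a')] == 0);   /* elements not named are zero */
}
```

The spelled-out form is longer but should compile with any C99 compiler, which is the trade-off the macro is avoiding.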
> set_range('a', 'z', ISLOWER)

This uses the GCC extension to make sure that is_type[ndx('a')] is
set to ISLOWER, etc., as before. In EBCDIC, however, the lowercase
letters are not contiguous, so the range *also* sets is_type[ndx(c)]
to ISLOWER for various non-letter characters c, i.e., the code does
not work on many IBM mainframes. (It assumes instead that you are
using ASCII. There *are* GCC ports to IBM mainframes, but presumably
either the code will be run in "ASCII mode" or else this program
will never be compiled for use in "EBCDIC mode".)
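If one did want the letter entries to be independent of the character set, one possible approach (not what the sh source does) is to fill the table at run time, one letter at a time, so that any codes lying between the letters are never touched. The names below are hypothetical, ISLOWER is given the value 04 seen in the gcc -E output, and the table can no longer be const.

```c
#include <assert.h>
#include <limits.h>

#define ISLOWER 04                       /* value taken from the gcc -E output */
#define ndx(ch) ((ch) + 1 - CHAR_MIN)    /* argument parenthesized, unlike the original */

/* Filled at run time, so it cannot be const as in the original. */
char lower_table[257];

/* Mark each lowercase letter individually instead of marking a range. */
void mark_lowercase(void)
{
    const char *p;

    for (p = "abcdefghijklmnopqrstuvwxyz"; *p != '\0'; p++)
        lower_table[ndx(*p)] |= ISLOWER;
}

/* Count the entries carrying the ISLOWER bit. */
int count_lowercase(void)
{
    int i, n = 0;

    for (i = 0; i < 257; i++)
        if (lower_table[i] & ISLOWER)
            n++;
    return n;
}

/* Self-check: exactly the 26 letters are marked, whatever the character set. */
void check_lowercase(void)
{
    mark_lowercase();
    assert(count_lowercase() == 26);
    assert(lower_table[ndx('a')] == ISLOWER);
    assert(lower_table[ndx('z')] == ISLOWER);
}
```

The cost is an initialization call before first use, which is presumably why a shell that only targets ASCII systems prefers the static table.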

[Remainder snipped; it all works along the same lines.]

The new C99 syntax is, e.g.:

int a[10] = { [3] = 42, [9] = -1 };
/* sets a[] to the sequence {0,0,0,42,0,0,0,0,0,-1} */

and:

struct blah { int i; double d; char *s; };
struct blah x = { .s = "hello" };
/* sets x.i to 0, x.d to 0.0, and x.s to point to "hello" */

Again, these are called "designated initializers", because you
designate (name) the element to initialize. (Actually, the draft
of C99 I use just calls them "designators" and does not have a
formal term for the designator=value sub-syntax.)
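For anyone who wants to try these out, here is a small self-checking sketch of the two examples above, plus one case that mirrors the shape of the is_type table: a plain positional value mixed with designators. The function name and the array b are mine; everything else is the example code from this post.

```c
#include <assert.h>
#include <string.h>

struct blah { int i; double d; char *s; };

/* Self-check; call it from main() if you want to try this out. */
void check_designators(void)
{
    int a[10] = { [3] = 42, [9] = -1 };
    struct blah x = { .s = "hello" };
    /* A positional value after a designator continues with the next
       element, so the 6 lands in b[3]. */
    int b[6] = { 0, [2] = 5, 6 };
    int k;

    /* Elements that are not named are zero-initialized. */
    for (k = 0; k < 10; k++) {
        if (k == 3)
            assert(a[k] == 42);
        else if (k == 9)
            assert(a[k] == -1);
        else
            assert(a[k] == 0);
    }

    assert(x.i == 0);
    assert(x.d == 0.0);
    assert(strcmp(x.s, "hello") == 0);

    assert(b[0] == 0 && b[1] == 0 && b[2] == 5);
    assert(b[3] == 6 && b[4] == 0 && b[5] == 0);
}
```

The leading "0," in the is_type table works the same way as b[0] here: it is an ordinary positional initializer for element 0, and the designators that follow pick their own positions.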
 
