tools for manipulating (or pre-processing) data structures to simplify source

randy · Oct 23, 2013

Hi c,

Trying to understand somebody else's code.

I look at a simple loop, to write flash memory, a data structure 3 levels deep, and see stuff like this:

if(GSN_FW_APP_0 == fwupCtx->gsnExtFlashFwupPvtCtx.binInProgress)
{
fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size = fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size + buffsize;
/* Calculate the intermidiate checksum*/
while(orgbuffsize>0)
{
fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum = fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum + (*orgbuff++);
orgbuffsize--;
}

}

It has been a while, do we really write now, with variable names 30+ characters long, in complex data structures, or do we use the preprocessor and tools to manage this?

I do not see how to read code written this way, it looks tool generated to me.

I will have to rewrite it by hand, and refer to the defined data structures to see what's going on. This is totally illegible.

What's been happening since I have been out, for 10+ years?

Randy

Keith Thompson · Oct 23, 2013

Trying to understand somebody else's code.

I look at a simple loop, to write flash memory, a data structure 3 levels deep, and see stuff like this:

if(GSN_FW_APP_0 == fwupCtx->gsnExtFlashFwupPvtCtx.binInProgress)
{
fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size = fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size + buffsize;
/* Calculate the intermidiate checksum*/
while(orgbuffsize>0)
{
fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum = fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum + (*orgbuff++);
orgbuffsize--;
}

}

This line:

fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size = fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size + buffsize;

is absurd; it should be written as:

fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size += buffsize;

Likewise for the other assignment.

(Not commenting on the rest of the code.)
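
A minimal sketch of the posted fragment with compound assignment applied to both statements. The struct definitions, the member types and the value of GSN_FW_APP_0 are not shown anywhere in the thread, so they are hypothetical stand-ins here; only the member names come from the posted code. The function name fwup_account is also made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real definitions, which the thread
   does not show; only the member names come from the posted code. */
struct WlanCtrlBlk {
    uint32_t APP1Size;
    uint32_t app1Checksum;
};
struct ExtFlashFwupPvtCtx {
    int binInProgress;
    struct WlanCtrlBlk fwupWlanCtrlBlk;
};
struct FwupCtx {
    struct ExtFlashFwupPvtCtx gsnExtFlashFwupPvtCtx;
};
enum { GSN_FW_APP_0 = 0 };   /* value assumed */

/* Same logic as the posted fragment, using compound assignment. */
void fwup_account(struct FwupCtx *fwupCtx, const uint8_t *orgbuff,
                  size_t orgbuffsize, uint32_t buffsize)
{
    assert(fwupCtx != NULL);
    if (GSN_FW_APP_0 == fwupCtx->gsnExtFlashFwupPvtCtx.binInProgress) {
        fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size += buffsize;
        /* Calculate the intermediate checksum */
        while (orgbuffsize > 0) {
            fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum += *orgbuff++;
            orgbuffsize--;
        }
    }
}
```

Each member path now appears once per statement, so the reader no longer has to compare two 60-character expressions to confirm they name the same object.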

[...]

Willem · Oct 23, 2013

(e-mail address removed) wrote:
) Hi c,
)
) Trying to understand somebody else's code.
)
) I look at a simple loop, to write flash memory, a data structure 3 levels deep, and see stuff like this:
)
) if(GSN_FW_APP_0 == fwupCtx->gsnExtFlashFwupPvtCtx.binInProgress)
) {
) fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size = fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size + buffsize;
) /* Calculate the intermidiate checksum*/
) while(orgbuffsize>0)
) {
) fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum = fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum + (*orgbuff++);
) orgbuffsize--;
) }
)
) }
)
)
)
) It has been a while, do we really write now, with variable names 30+ characters long, in complex data structures, or do we use the preprocessor and tools to manage this?
)
) I do not see how to read code written this way, it looks tool generated to me.

They probably use an IDE with completion for variable and member names (you
type the first few and get a list of the possible members).

) What's been happening since I have been out, for 10+ years?

IDEs happened.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

BartC · Oct 23, 2013

if(GSN_FW_APP_0 == fwupCtx->gsnExtFlashFwupPvtCtx.binInProgress)
{
fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size =
fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size + buffsize;
/* Calculate the intermidiate checksum*/
while(orgbuffsize>0)
{
fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum =
fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum +
(*orgbuff++);
orgbuffsize--;

It has been a while, do we really write now, with variable names 30+
characters long, in complex data structures, or do we use the preprocessor
and tools to manage this?

At least they seem to have used abbreviations; it could have been a lot
worse!

No, this is terrible code. Since many of these seem to be struct member
names, you don't need 20- or 30-character field names to distinguish between
a dozen different members.

And it's surely possible to give a hint as to what a field does, without
putting its entire history, background and every conceivable bit of information
into the name. If an IDE is in fact being used, that will tell you all you
need to know.

My struct member names tend to be either one or two simple words. I don't
use abbreviations much either; there's no need.

BartC · Oct 23, 2013

Richard said:
Nonsense.

Well, I wouldn't want to have to read any of your code, if you think this is clear!

That's what they do, and do it consistently: much more important than
saving a few bytes in source code. That code is pretty much self
documenting.

That's your opinion. I think it shouldn't be necessary to build practically
a complete set of documentation into every identifier, especially if that
identifier has a context which can take care of some of that (namely, being
a member of a specific struct type, and being used with a specific
instance).

And if an IDE isn't being used?

Then I would find it more effective to write one, than to try and decipher
what looks at first glance like MIME-encoded binary data!

(BTW I don't use any fancy IDEs that can do that sort of stuff. All the more
reason to keep my identifiers simple.)

Try looking into linux drivers and marvel at the long struct and field
names then tell Torvalds and co they're writing "terrible code". Should be
an interesting exchange.

(I've tried to have a look, but as usual with linux-related matters have run
into a dead-end, because nothing is ever straightforward! Not content with
having .tar, .gz, .gz2, .bz and .bz2 extensions, apparently the kernel
sources I located now use a tar.xz extension! My 7-Zip program couldn't
understand it, and attempts to download a new version nearly crashed my
machine (and lost me my first draft of this post). So that will have to
wait.)

BartC · Oct 23, 2013

(I've tried to have a look, but as usual with linux-related matters have
run into a dead-end, because nothing is ever straightforward!

I do happen to have some Python C sources lying around. Most of the struct
member names seem short, readable and completely reasonable. Either formed
of one or two words, or with a prefix, such as *name, HEAD, length, offset,
gc_prev, gc_next etc. There are some longer identifiers outside structs, but
these are still readable rather than look like gobbledygook, partly because
they are not so fantastically long that the words need to be abbreviated.

Maybe it's just a Linux thing to make things incomprehensible (and then try
and make out it's good coding practice!).

Eric Sosman · Oct 23, 2013

Hi c,

Trying to understand somebody else's code.

I look at a simple loop, to write flash memory, a data structure 3 levels deep, and see stuff like this:

if(GSN_FW_APP_0 == fwupCtx->gsnExtFlashFwupPvtCtx.binInProgress)
{
fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size = fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size + buffsize;
/* Calculate the intermidiate checksum*/
while(orgbuffsize>0)
{
fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum = fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum + (*orgbuff++);
orgbuffsize--;
}

}

It has been a while, do we really write now, with variable names 30+ characters long, in complex data structures, or do we use the preprocessor and tools to manage this?

To my taste, the identifiers are on the long side, and the
multiple appearances of "fwup" look redundant. Tastes vary, though,
and I don't know what other things these names distinguish from.
In any case the longest identifier I see has 21 letters, only
about two-thirds of your "30+".

Although one *could* use macros to abbreviate:

#define GIGGLE gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk
...
fwupCtx->GIGGLE.app1Checksum = ...

... I'd recommend against it. By introducing a second name for
the same thing, you'd add opportunities for confusion: Someone
looking for all uses of GIGGLE.app1Checksum could easily overlook
a reference via the true name.
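
One alternative that nobody in this thread proposes, offered only as a sketch: a block-scoped pointer to the inner struct. It is still a second name, but it lives in one function and the real member path sits on the line that defines it, so a search for the true name still lands in the right place. The struct definitions and the function name add_to_checksum are hypothetical; only the member names come from the posted code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical definitions; only the member names come from the thread. */
struct WlanCtrlBlk { uint32_t APP1Size; uint32_t app1Checksum; };
struct PvtCtx { int binInProgress; struct WlanCtrlBlk fwupWlanCtrlBlk; };
struct FwupCtx { struct PvtCtx gsnExtFlashFwupPvtCtx; };

/* Add the bytes of buf to the running checksum. */
void add_to_checksum(struct FwupCtx *fwupCtx, const uint8_t *buf, size_t len)
{
    assert(fwupCtx != NULL);
    /* Block-scoped alias: visible only in this function, and defined
       on a line that still contains the real member names. */
    struct WlanCtrlBlk *blk = &fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk;

    while (len > 0) {
        blk->app1Checksum += *buf++;
        len--;
    }
}
```

Unlike a macro, the alias has a type the compiler checks and it goes out of scope at the closing brace.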

I do not see how to read code written this way, it looks tool generated to me.

Doesn't look tool-generated to me (when tools deign to write
comments, they're usually about the mechanics and not about the
purpose), but I suppose it might be. If it is, you should look for
the tool and its input: Work with them, not with their output.

(A guy once sought my help in debugging some code, and I studied
it in vain for any explanation of the symptom he'd seen. It was
utterly inexplicable: There was simply no way his code could produce
the output he showed me. Come to find out he'd been using the
compiler to generate assembly code, hand-"optimizing" the assembly,
and running *that* -- and when it didn't work, he showed me the
original source code ... Don't edit tool output.)

I will have to rewrite it by hand, and refer to the defined data structures to see what's going on. This is totally illegible.

A few uses of the "+=" operator would make a world of difference.

What's been happening since I have been out, for 10+ years?

The explanation is too long for the margin of this post.

Malcolm McLean · Oct 23, 2013

Hi c,

Trying to understand somebody else's code.

I look at a simple loop, to write flash memory, a data structure 3 levels
deep, and see stuff like this:

if(GSN_FW_APP_0 == fwupCtx->gsnExtFlashFwupPvtCtx.binInProgress)

That's all we need to know

The highlighting rule hasn't been understood.

Plain ASCII text is reasonably understandable, but a bit boring to look at.
We can improve legibility by putting some words in colours, or in bold,
or in italics. But only to a point. When every other word is highlighted or
decorated in some way, the text becomes far more difficult to read.

In C we need to avoid namespace collisions, so a short prefix is unfortunately
necessary. BabyX prefixes virtually all its external symbols with "bbx_", for
this reason. But once you've done that, that's all that's really necessary.
You can then use normal identifiers, like "flash" or "context".
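
A rough sketch of that convention: one short prefix on the external symbols, ordinary words everywhere else. The prefix "fw_" and every name below are made up for this example and are not taken from BabyX or from the code under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* "fw_" is a made-up library prefix for this sketch. */
typedef struct {
    size_t size;        /* bytes received so far */
    unsigned checksum;  /* running byte sum */
} fw_context;

void fw_context_init(fw_context *context)
{
    assert(context != NULL);
    context->size = 0;
    context->checksum = 0;
}

void fw_context_add(fw_context *context, const unsigned char *data, size_t n)
{
    assert(context != NULL);
    context->size += n;
    while (n > 0) {
        context->checksum += *data++;
        n--;
    }
}
```

The prefix handles the collision problem once, at the external name; the members inside the struct need no decoration because the struct type already gives them a context.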

Jorgen Grahn · Oct 25, 2013

Hi c,

Trying to understand somebody else's code. ....
What's been happening since I have been out, for 10+ years?

Nothing. People use a lot of different styles, and many of them seem
awful. You could easily have encountered this one in 2003 too.

The solution isn't macros -- adding another parallel set of names
which disappear at compile time would just make it a lot worse.

/Jorgen

Jorgen Grahn · Oct 25, 2013

Try looking into linux drivers and marvel at the long struct and field
names then tell Torvalds and co they're writing "terrible code". Should be an
interesting exchange.

I read kernel code a lot, and it's far more pleasant and elegant than
this.

Not hardware drivers though; I can imagine they are often written by
outsiders. Also they may want to adapt their naming to hardware specs
et cetera.

/Jorgen

Jorgen Grahn · Oct 25, 2013

Any particular problems? A lot of things about Linux are very
straightforward.

I do happen to have some Python C sources lying around. Most of the struct
member names seem short, readable and completely reasonable. Either formed
of one or two words, or with a prefix, such as *name, HEAD, length, offset,
gc_prev, gc_next etc. There are some longer identifiers outside structs, but
these are still readable rather than look like gobbledygook, partly because
they are not so fantastically long that the words need to be abbreviated.

Maybe it's just a Linux thing to make things incomprehensible (and then try
and make out it's good coding practice!).

I should have responded here instead of upthread: the Linux [kernel]
sources I've seen are nothing like this, and quite readable.

/Jorgen

Kenny McCormack · Oct 25, 2013

Maybe it's just a Linux thing to make things incomprehensible (and then
try and make out it's good coding practice!).

I should have responded here instead of upthread: the Linux [kernel]
sources I've seen are nothing like this, and quite readable.

Caveat: I've not looked at any of his code (either the kernel or git), but I
have watched a talk he gave once in which he discussed (among other things)
his coding style.

The take-away from that talk was that he does have an, er, shall we say,
"unique" coding style, and the implied statement was that you either love
it or hate it. I get the impression that the world kinda splits about
50/50 into the love/hate camps.

So, arguing about whether or not the Linux kernel is "readable" is going to
be like arguing about any other "love/hate" kind of thing; you're not going
to convince anyone to change their stance.

BartC · Oct 26, 2013

Jorgen Grahn said:
Maybe it's just a Linux thing to make things incomprehensible (and then
try and make out it's good coding practice!).

I should have responded here instead of upthread: the Linux [kernel]
sources I've seen are nothing like this, and quite readable.

I've since managed to download the Linux sources. The one or two files I've
glanced at seem nothing like as bad as what the OP posted either. (But there
are about 45,000 files I haven't yet looked at.)

glen herrmannsfeldt · Oct 26, 2013

(snip)

It was even worse back in the X3J Committee days.
We had compilers which accepted 16, 32, ... 128 character length
identifiers, BUT only the first 5, 8, 13, ... characters were
'unique/significant'.

Kernighan wanted only 5 significant characters written into the
standard. Real C programmers didn't need more. Imagine if that had
happened? <g>

The early PL/I compilers used the first four and last three for
external symbols. (The linker only knew about 8.) Internal names
could be longer, such as 31. Using some from each end allows for
long_name1, long_name2, etc.
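
A small sketch of the first-four-plus-last-three scheme as described above. How the real compilers treated names of seven characters or fewer, and whether they folded case, is not stated in this post; copying short names unchanged is an assumption, and the function name external_name is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build a 7-character external name from the first four and last three
   characters of `name`. Names of seven characters or fewer are copied
   unchanged (an assumption). `out` must have room for 8 bytes. */
void external_name(const char *name, char out[8])
{
    size_t len;

    assert(name != NULL && out != NULL);
    len = strlen(name);
    if (len <= 7) {
        memcpy(out, name, len + 1);          /* includes the terminator */
    } else {
        memcpy(out, name, 4);                /* first four */
        memcpy(out + 4, name + len - 3, 3);  /* last three */
        out[7] = '\0';
    }
}
```

Under this rule long_name1 and long_name2 map to longme1 and longme2, which stay distinct, whereas keeping only the first seven characters would give long_na for both.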

The Fortran H compiler uses six trees for its symbol table, one for
each possible length. One manual suggests for faster compilation
distribute your names equally between 1 and 6 characters.
(No mention of readability of the program.)

-- glen

Ben Bacarisse · Oct 26, 2013

It was even worse back in the X3J Committee days.
We had compilers which accepted 16, 32, ... 128 character length
identifiers, BUT only the first 5, 8, 13, ... characters were
'unique/significant'.

Kernighan wanted only 5 significant characters written into the
standard. Real C programmers didn't need more. Imagine if that had
happened? <g>

Do you have a citation? It sounds like a peculiar thing for him to
have said.

Malcolm McLean · Oct 26, 2013

On 25 Oct 2013 19:15:45 GMT, Jorgen Grahn <[email protected]>

It was even worse back in the X3J Committee days.
We had compilers which accepted 16, 32, ... 128 character length
identifiers, BUT only the first 5, 8, 13, ... characters were
'unique/significant'.

Kernighan wanted only 5 significant characters written into the
standard. Real C programmers didn't need more. Imagine if that had
happened? <g>

Fortran would accept up to six, and C compilers would prefix an underscore
to the linker. So you could only call a C routine or use a C identifier
from Fortran if it was unique in the first five.
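
As a rough illustration of what "unique in the first five" means in practice, here is a check for whether two identifiers would look the same to a tool that keeps only a fixed number of leading characters. The function name and the example identifiers are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Nonzero if a and b are indistinguishable to a tool that keeps only
   the first `significant` characters of a name. */
int names_collide(const char *a, const char *b, size_t significant)
{
    assert(a != NULL && b != NULL);
    return strncmp(a, b, significant) == 0;
}
```

With five significant characters, flash_read and flash_write collide because both start with "flash"; with seven they are distinct.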

Mathematicians don't use long names. They virtually always use single letters,
resorting to Greek or even other alphabets when they run out of Latin.
But really in programming we've several types of variables. Minor variables
should be x, y, z for co-ordinates or real values, theta for an angle,
N for a count, i, for an index. I use ii, iii, iv, v etc for nested counters
and j, k for secondary counters. (Eg if you're removing runs of duplicates,
I'd iterate over the array with i, and keep j as the counter to the top
of the unique list). But a lot of people use j, k for nested counters.
z is a complex number, ptr a pointer, str a string, ch a character, fp a
file pointer.
There's quite a lot you can do with only five characters.
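
A minimal sketch of the duplicate-run example mentioned above, using i to walk the array and j as the top of the unique list. The function name remove_runs and the choice of int elements are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Collapse runs of equal adjacent values in place.
   Returns the new length. */
size_t remove_runs(int *a, size_t N)
{
    size_t i;      /* index over the whole array */
    size_t j = 0;  /* number of values kept so far */

    assert(a != NULL || N == 0);
    for (i = 0; i < N; i++) {
        /* Keep a[i] only if it differs from the last value kept. */
        if (j == 0 || a[i] != a[j - 1]) {
            a[j++] = a[i];
        }
    }
    return j;
}
```

Given the array 1, 1, 2, 2, 2, 3, 1, 1 the function returns 4 and leaves 1, 2, 3, 1 in the first four slots.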

Ben Bacarisse · Oct 26, 2013

ralph said:
Surprised me as well.

It was rather well known at the time so there must be some mention in
some of the earlier publications CPJ, Byte, Dr. Dobbs, ??? It is
definitely in the X3J meeting notes. All my paper is gone - too many
moves.

Maybe we are talking at cross purposes. Your quote suggests that
Kernighan did not want more because real programmers don't need more.
That seems entirely at odds with almost everything I've read by him.
For example, in 1974 -- four years before K&R 1 and more than a decade
before the ANSI standard he was advising, as a matter of style, to make
external identifiers unique in the first 6 characters. That was, as you
probably know, common at the time. Note, as a matter of style, not "you
don't need more" just that you may hit a linker limit if you assume that
more will be unique.

I can see him advocating for the standard to require no more than five
from an implementation if he had become aware in those ten or twelve
years of a system that could not guarantee even six, but that's not at
all the same as saying that real programmers don't need more.

As Mr. McLean points out - the idea while sounding strange today had
its points.

Something often over-looked by most language lawyers today is that the
process of "standardizing" C in the beginning was less an academic
exercise than it was a process of "codifying" a common ground out of
what compilers were already doing. It doesn't take much to appreciate
that migrating programs between compilers with different views of how
many characters are significant led to problems. Since all compilers
accepted at least 5, "5" certainly made sense.

Yes, but that's not how you presented the quote. If there were key
systems that could not guarantee five I can see him, and others, arguing
for four, but that would be out of desperation with broken linkers, not
because real programmers don't need more.

Lew Pitcher · Oct 26, 2013

Maybe we are talking at cross purposes. Your quote suggests that
Kernighan did not want more because real programmers don't need more.
That seems entirely at odds with almost everything I've read by him.
For example, in 1974 -- four years before K&R 1 and more than a decade
before the ANSI standard he was advising, as a matter of style, to make
external identifiers unique in the first 6 characters. That was, as you
probably know, common at the time. Note, as a matter of style, not "you
don't need more" just that you may hit a linker limit if you assume that
more will be unique.

IIRC, the MVS LKED linkage editor of the time had a 6-character limit on the
size of external names. The VSE linkage editor had a similar limit.

It wasn't too long later (a few years) that IBM came up with the LE370 tools
that extended both the assembler and linkage editor to handle larger
external names, and added a native C compiler to the language support.

[snip]

Lew Pitcher · Oct 26, 2013

IIRC, the MVS LKED

Correction, now that I've checked my archived JCL: the MVS Linkage Editor
was programname IEWL, later replaced by HEWL when LE370 came along.

glen herrmannsfeldt · Oct 26, 2013

(snip, someone wrote)

IIRC, the MVS LKED linkage editor of the time had a 6-character
limit on the size of external names. The VSE linkage editor had
a similar limit.

I don't know the DOS/360 or VSE well at all, but from OS/360 through
to MVS the limit is eight. Eight is a favorite number. Jobnames are
eight, DDnames are eight, PDS member names are eight, and DSNames
in the catalog have at most eight between periods.

VM/370 and descendants have eight character filenames and filetypes
(what many call extensions).

The six character limits came from BCD on the 36 bit machines,
and later SIXBIT on the DEC 36 bit machines.

It wasn't too long later (a few years) that IBM came up with
the LE370 tools that extended both the assembler and linkage
editor to handle larger external names, and added a native C
compiler to the language support.

Well, PL/I allowed longer names, too, but IBM restricted
external names by using, I believe, the first four and last
three characters. (Allows for more than one CSECT per PROC.)
LE is convenient for both PL/I and C. Is there a Fortran 90
compiler?

-- glen
