tools for manipulating (or pre-processing) data structures to simplify source

Discussion in 'C Programming' started by randy, Oct 23, 2013.

  1. randy

    randy Guest

    Hi c,

    Trying to understand somebody else's code.

    I'm looking at a simple loop that writes flash memory, using a data structure three levels deep, and I see stuff like this:

    if(GSN_FW_APP_0 == fwupCtx->gsnExtFlashFwupPvtCtx.binInProgress)
    {
        fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size = fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size + buffsize;
        /* Calculate the intermidiate checksum*/
        while(orgbuffsize>0)
        {
            fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum = fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum + (*orgbuff++);
            orgbuffsize--;
        }
    }

    It has been a while. Do we really write code now with variable names 30+ characters long, in complex data structures, or do we use the preprocessor and tools to manage this?

    I do not see how to read code written this way; it looks tool-generated to me.

    I will have to rewrite it by hand, and refer to the defined data structures to see what's going on. This is totally illegible.

    What's been happening in the 10+ years I have been out?

    Randy
     
    randy, Oct 23, 2013
    #1

  2. This line:

    fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size = fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size + buffsize;

    is absurd; it should be written as:

    fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size += buffsize;

    Likewise for the other assignment.

    (Not commenting on the rest of the code.)
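    Keith's point can be shown on a compilable version of the posted fragment. The struct types, their layout, the value of GSN_FW_APP_0 and the wrapper function account_chunk are all hypothetical, because the original definitions were not posted. Only the member names are taken from the code above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: the real definitions were not posted, so layout and
   member types are assumptions; only the member names come from the thread. */
struct fwup_wlan_ctrl_blk {
    uint32_t APP1Size;
    uint32_t app1Checksum;
};

struct ext_flash_fwup_pvt_ctx {
    int binInProgress;
    struct fwup_wlan_ctrl_blk fwupWlanCtrlBlk;
};

struct fwup_ctx {
    struct ext_flash_fwup_pvt_ctx gsnExtFlashFwupPvtCtx;
};

enum { GSN_FW_APP_0 = 0 };   /* assumed value */

/* Hypothetical wrapper around the posted fragment, rewritten with "+=". */
void account_chunk(struct fwup_ctx *fwupCtx, const uint8_t *orgbuff,
                   size_t orgbuffsize, uint32_t buffsize)
{
    if (GSN_FW_APP_0 == fwupCtx->gsnExtFlashFwupPvtCtx.binInProgress) {
        fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.APP1Size += buffsize;
        /* Calculate the intermediate checksum (a plain byte sum) */
        while (orgbuffsize > 0) {
            fwupCtx->gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk.app1Checksum += *orgbuff++;
            orgbuffsize--;
        }
    }
}
```

    Feeding it the four bytes 1, 2, 3 and 250 adds 4 to APP1Size and leaves app1Checksum at 256. The long left-hand side now appears once per statement instead of twice.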

    [...]
     
    Keith Thompson, Oct 23, 2013
    #2

  3. randy

    Willem Guest

    wrote:
    ) [snip]
    ) It has been a while, do we really write now, with variable names 30+ characters long, in complex data structures, or do we use the preprocessor and tools to manage this?
    )
    ) I do not see how to read code written this way, it looks tool generated to me.

    They probably use an IDE with completion for variable and member names (you
    type the first few characters and get a list of the possible members).

    ) Whats been happening since I have been out, for 10+ years?

    IDEs happened.


    SaSW, Willem
     
    Willem, Oct 23, 2013
    #3
  4. randy

    BartC Guest

    At least they seem to have used abbreviations; it could have been a lot
    worse!

    No, this is terrible code. Since many of these seem to be struct member
    names, you don't need 20- or 30-character field names to distinguish between
    a dozen different members.

    And it's surely possible to give a hint as to what a field does without
    putting its entire history, background and every conceivable bit of
    information into the name. If an IDE is in fact being used, that will tell
    you all you need to know.

    My struct member names tend to be either one or two simple words. I don't
    use abbreviations much either; there's no need.
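    To illustrate the one-or-two-word style, here is a hypothetical renaming of the same data. The names flash_update, app_image, in_progress, IMAGE_APP0 and flash_update_add are invented for illustration. The sketch also assumes that the chunk length and the checksummed length are the same, whereas the posted code used two separate counts, buffsize and orgbuffsize.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout: the struct types supply the context,
   so the members can be plain words. */
struct app_image {
    uint32_t size;
    uint32_t checksum;       /* running byte sum */
};

struct flash_update {
    int in_progress;         /* which image is currently being written */
    struct app_image app1;
};

enum { IMAGE_APP0 = 0 };     /* assumed value */

/* Assumes a single length serves for both the size and the checksum. */
void flash_update_add(struct flash_update *fu, const uint8_t *buf, size_t len)
{
    if (fu->in_progress != IMAGE_APP0)
        return;
    fu->app1.size += (uint32_t)len;
    while (len > 0) {
        fu->app1.checksum += *buf++;
        len--;
    }
}
```

    The expression fu->app1.checksum carries the same information as the 60-character original, because the types already say what fu and app1 are.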
     
    BartC, Oct 23, 2013
    #4
  5. randy

    BartC Guest

    Well, I wouldn't want to have to read any of your code, if you think this is clear!

    That's your opinion. I think it shouldn't be necessary to build practically
    a complete set of documentation into every identifier, especially if that
    identifier has a context which can take care of some of that (namely, being
    a member of a specific struct type, and being used with a specific
    instance).

    Then I would find it more effective to write one than to try to decipher
    what looks at first glance like MIME-encoded binary data!

    (BTW I don't use any fancy IDEs that can do that sort of stuff. All the more
    reason to keep my identifiers simple.)

    (I've tried to have a look, but as usual with Linux-related matters I've run
    into a dead end, because nothing is ever straightforward! Not content with
    having .tar, .gz, .gz2, .bz and .bz2 extensions, apparently the kernel
    sources I located now use a tar.xz extension! My 7-Zip program couldn't
    understand it, and attempts to download a new version nearly crashed my
    machine (and lost me my first draft of this post). So that will have to
    wait.)
     
    BartC, Oct 23, 2013
    #5
  6. randy

    BartC Guest

    I do happen to have some Python C sources lying around. Most of the struct
    member names seem short, readable and completely reasonable: they are either
    formed of one or two words or given a prefix, such as *name, HEAD, length,
    offset, gc_prev, gc_next, etc. There are some longer identifiers outside
    structs, but these are still readable rather than looking like gobbledygook,
    partly because they are not so fantastically long that the words need to be
    abbreviated.

    Maybe it's just a Linux thing to make things incomprehensible (and then try
    and make out it's good coding practice!).
     
    BartC, Oct 23, 2013
    #6
  7. randy

    Eric Sosman Guest

    To my taste, the identifiers are on the long side, and the
    multiple appearances of "fwup" look redundant. Tastes vary, though,
    and I don't know what other things these names distinguish from.
    In any case the longest identifier I see has 21 letters, only
    about two-thirds of your "30+".

    Although one *could* use macros to abbreviate:

    #define GIGGLE gsnExtFlashFwupPvtCtx.fwupWlanCtrlBlk
    ...
    fwupCtx->GIGGLE.app1Checksum = ...

    ... I'd recommend against it. By introducing a second name for
    the same thing, you'd add opportunities for confusion: someone
    looking for all uses of GIGGLE.app1Checksum could easily overlook
    a reference via the true name.

    Doesn't look tool-generated to me (when tools deign to write
    comments, they're usually about the mechanics and not about the
    purpose), but I suppose it might be. If it is, you should look for
    the tool and its input: Work with them, not with their output.

    (A guy once sought my help in debugging some code, and I studied
    it in vain for any explanation of the symptom he'd seen. It was
    utterly inexplicable: There was simply no way his code could produce
    the output he showed me. Come to find out he'd been using the
    compiler to generate assembly code, hand-"optimizing" the assembly,
    and running *that* -- and when it didn't work, he showed me the
    original source code ... Don't edit tool output.)

    A few uses of the "+=" operator would make a world of difference.
    The explanation is too long for the margin of this post.
     
    Eric Sosman, Oct 23, 2013
    #7
  8. That's all we need to know :)

    The highlighting rule hasn't been understood.

    Plain ASCII text is reasonably understandable, but a bit boring to look at.
    We can improve legibility by putting some words in colour, in bold,
    or in italics. But only to a point. When every other word is highlighted or
    decorated in some way, the text becomes far more difficult to read.

    In C we need to avoid namespace collisions, so a short prefix is unfortunately
    necessary. BabyX prefixes virtually all its external symbols with "bbx_", for
    this reason. But once you've done that, that's all that's really necessary.
    You can then use normal identifiers, like "flash" or "context".
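    A minimal sketch of the scheme described above, using a made-up prefix fw_ rather than anything from BabyX. The type and function names are invented for illustration. Only identifiers with external linkage carry the prefix. Struct members and static helpers keep ordinary names, because members live in their struct's own name space and static functions have internal linkage.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical library: everything with external linkage starts with "fw_". */
struct fw_context {
    char name[16];     /* members need no prefix: each struct has its own name space */
    size_t written;
};

/* static gives internal linkage, so this helper cannot clash with other
   translation units and needs no prefix either. */
static size_t clamp(size_t n, size_t max)
{
    return n < max ? n : max;
}

void fw_init(struct fw_context *context, const char *name)
{
    size_t len = clamp(strlen(name), sizeof context->name - 1);

    memcpy(context->name, name, len);
    context->name[len] = '\0';
    context->written = 0;
}

void fw_write(struct fw_context *context, size_t n)
{
    context->written += n;
}
```

    Inside the library, ordinary words such as context, name and written read naturally. The prefix only matters where the names meet the linker.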
     
    Malcolm McLean, Oct 23, 2013
    #8
  9. randy

    Jorgen Grahn Guest

    Nothing. People use a lot of different styles, and many of them seem
    awful. You could easily have encountered this one in 2003 too.

    The solution isn't macros -- adding another parallel set of names
    which disappear at compile time would just make it a lot worse.

    /Jorgen
     
    Jorgen Grahn, Oct 25, 2013
    #9
  10. randy

    Jorgen Grahn Guest

    I read kernel code a lot, and it's far more pleasant and elegant than
    this.

    Not hardware drivers though; I can imagine they are often written by
    outsiders. Also they may want to adapt their naming to hardware specs
    et cetera.

    /Jorgen
     
    Jorgen Grahn, Oct 25, 2013
    #10
  11. randy

    Jorgen Grahn Guest

    Any particular problems? A lot of things about Linux are very
    straightforward.

    I should have responded here instead of upthread: the Linux [kernel]
    sources I've seen are nothing like this, and quite readable.

    /Jorgen
     
    Jorgen Grahn, Oct 25, 2013
    #11
  12. Jorgen Grahn wrote:
    ) I should have responded here instead of upthread: the Linux [kernel]
    ) sources I've seen are nothing like this, and quite readable.

    Caveat: I've not looked at any of his code (either the kernel or git), but I
    have watched a talk he gave once in which he discussed (among other things)
    his coding style.

    The take-away from that talk was that he does have an, er, shall we say,
    "unique" coding style, and the implied statement was that you either love
    it or hate it. I get the impression that the world kinda splits about
    50/50 into the love/hate camps.

    So, arguing about whether or not the Linux kernel is "readable" is going to
    be like arguing about any other "love/hate" kind of thing; you're not going
    to convince anyone to change their stance.
     
    Kenny McCormack, Oct 25, 2013
    #12
  13. randy

    BartC Guest

    I've since managed to download the Linux sources. The one or two files I've
    glanced at seem nothing like as bad as what the OP posted either. (But there
    are about 45,000 files I haven't yet looked at.)
     
    BartC, Oct 26, 2013
    #13
  14. (snip)
    The early PL/I compilers used the first four and last three characters of a
    name for external symbols. (The linker only knew about 8.) Internal names
    could be longer (31, for example). Taking characters from both ends keeps
    names such as long_name1 and long_name2 distinct.
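    The four-plus-three rule can be sketched in C. This illustrates the rule as described, not IBM's actual algorithm. It assumes names of seven characters or fewer are used unchanged, and it ignores upper-casing and any character-set restrictions. The function name pli_external_name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the "first four + last three" rule.
   Assumption: names of up to seven characters pass through unchanged.
   `out` must have room for 8 bytes (7 characters plus the terminator). */
void pli_external_name(const char *name, char out[8])
{
    size_t len = strlen(name);

    if (len <= 7) {
        memcpy(out, name, len + 1);          /* short name: copy as-is */
        return;
    }
    memcpy(out, name, 4);                    /* first four characters */
    memcpy(out + 4, name + len - 3, 3);      /* last three characters */
    out[7] = '\0';
}
```

    With this rule long_name1 becomes longme1 and long_name2 becomes longme2, so the two stay distinct. Keeping only the first seven characters would map both to long_na.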

    The Fortran H compiler uses six trees for its symbol table, one for
    each possible name length. One manual suggests that, for faster
    compilation, you distribute your names equally among lengths 1 to 6.
    (No mention of the readability of the program.)

    -- glen
     
    glen herrmannsfeldt, Oct 26, 2013
    #14
  15. Do you have a citation? It sounds like a peculiar thing for him to
    have said.
     
    Ben Bacarisse, Oct 26, 2013
    #15
  16. Fortran would accept up to six characters, and C compilers would prefix an
    underscore to external names before passing them to the linker. So you could
    only call a C routine or use a C identifier from Fortran if its name was
    unique in the first five characters.
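    The arithmetic is easy to check with a small model. The function linker_symbol and the LINK_LIMIT constant are hypothetical. The sketch simply assumes a leading underscore and a six-character linker limit, as described above.

```c
#include <string.h>

/* Hypothetical model: the C compiler prefixes '_' and the linker keeps
   only the first LINK_LIMIT characters of the resulting symbol. */
#define LINK_LIMIT 6

void linker_symbol(const char *c_name, char out[LINK_LIMIT + 1])
{
    out[0] = '_';
    /* Only LINK_LIMIT - 1 characters of the C name survive;
       strncpy zero-fills the rest if the name is shorter. */
    strncpy(out + 1, c_name, LINK_LIMIT - 1);
    out[LINK_LIMIT] = '\0';
}
```

    Under these assumptions counter and countdown both become _count and collide, even though they differ in their sixth character.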

    Mathematicians don't use long names. They virtually always use single letters,
    resorting to Greek or even other alphabets when they run out of Latin.
    But really, in programming we have several kinds of variables. Minor variables
    should be x, y, z for co-ordinates or real values, theta for an angle,
    N for a count, i for an index. I use ii, iii, iv, v etc. for nested counters
    and j, k for secondary counters. (E.g. if you're removing runs of duplicates,
    I'd iterate over the array with i, and keep j as the counter to the top
    of the unique list.) But a lot of people use j, k for nested counters.
    z is a complex number, ptr a pointer, str a string, ch a character, fp a
    file pointer.
    There's quite a lot you can do with only five characters.
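    The duplicate-run example above can be written out as a short C function using those naming conventions. The function name remove_runs is hypothetical. It keeps the first element of each run of equal adjacent values and returns the new length. i scans the whole array and j marks the top of the unique list.

```c
#include <stddef.h>

/* Remove runs of adjacent duplicates in place; returns the new length.
   i walks the whole array, j is the top of the unique list. */
size_t remove_runs(int *a, size_t N)
{
    size_t i, j;

    if (N == 0)
        return 0;
    j = 1;                        /* a[0] is always kept */
    for (i = 1; i < N; i++) {
        if (a[i] != a[j - 1])     /* a new run starts here */
            a[j++] = a[i];
    }
    return j;
}
```

    For example, {1, 1, 2, 2, 2, 3, 1, 1} becomes {1, 2, 3, 1} with length 4. Only adjacent repeats are removed, so the second run of 1s survives.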
     
    Malcolm McLean, Oct 26, 2013
    #16
  17. Maybe we are talking at cross purposes. Your quote suggests that
    Kernighan did not want more because real programmers don't need more.
    That seems entirely at odds with almost everything I've read by him.
    For example, in 1974 (four years before K&R 1 and more than a decade
    before the ANSI standard), he was advising programmers, as a matter of
    style, to make external identifiers unique in the first 6 characters.
    That was, as you probably know, common at the time. Note: as a matter of
    style, not "you don't need more", just that you may hit a linker limit if
    you assume that more will be unique.

    I can see him advocating for the standard to require no more than five
    from an implementation if he had become aware in those ten or twelve
    years of a system that could not guarantee even six, but that's not at
    all the same as saying that real programmers don't need more.

    Yes, but that's not how you presented the quote. If there were key
    systems that could not guarantee five, I can see him, and others, arguing
    for four, but that would be out of desperation with broken linkers, not
    because real programmers don't need more.
     
    Ben Bacarisse, Oct 26, 2013
    #17
  18. randy

    Lew Pitcher Guest

    IIRC, the MVS LKED linkage editor of the time had a 6-character limit on the
    size of external names. The VSE linkage editor had a similar limit.

    It wasn't too long later (a few years) that IBM came up with the LE370 tools
    that extended both the assembler and linkage editor to handle larger
    external names, and added a native C compiler to the language support.

    [snip]
     
    Lew Pitcher, Oct 26, 2013
    #18
  19. randy

    Lew Pitcher Guest

    Correction, now that I've checked my archived JCL: the MVS Linkage Editor
    was the program IEWL, later replaced by HEWL when LE370 came along.
     
    Lew Pitcher, Oct 26, 2013
    #19
  20. (snip, someone wrote)
    I don't know the DOS/360 or VSE well at all, but from OS/360 through
    to MVS the limit is eight. Eight is a favorite number. Jobnames are
    eight, DDnames are eight, PDS member names are eight, and DSNames
    in the catalog have at most eight between periods.

    VM/370 and descendants have eight character filenames and filetypes
    (what many call extensions).

    The six-character limits came from BCD on the 36-bit machines,
    and later SIXBIT on the DEC 36-bit machines: six 6-bit characters
    fill exactly one 36-bit word.
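    The six-in-a-word arithmetic can be sketched with DEC SIXBIT, which maps ASCII codes 0x20 to 0x5F onto 0 to 63 by subtracting 0x20. IBM's BCD used a different code table, which is not modelled here. The function name sixbit_pack is hypothetical, and the 36-bit word is held in the low bits of a uint64_t.

```c
#include <stdint.h>

/* Pack up to six characters into one 36-bit word using DEC SIXBIT
   (ASCII 0x20..0x5F minus 0x20). Short names are padded with spaces,
   which encode as 0. Returns 0 on success, or -1 if a character has no
   SIXBIT code (lower-case letters, for example) or the name is longer
   than six characters. *word is only written on success. */
int sixbit_pack(const char *name, uint64_t *word)
{
    uint64_t w = 0;
    int k;

    for (k = 0; k < 6; k++) {
        unsigned char c = ' ';                  /* padding */
        if (*name != '\0')
            c = (unsigned char)*name++;
        if (c < 0x20 || c > 0x5F)
            return -1;
        w = (w << 6) | (uint64_t)(c - 0x20);    /* 6 bits per character */
    }
    if (*name != '\0')                          /* a seventh character will not fit */
        return -1;
    *word = w;
    return 0;
}
```

    Six characters times six bits gives exactly 36 bits, so a seventh character would need a second word.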
    Well, PL/I allowed longer names, too, but IBM restricted
    external names by using, I believe, the first four and last
    three characters. (This allows for more than one CSECT per PROC.)

    LE is convenient for both PL/I and C. Is there a Fortran 90
    compiler?

    -- glen
     
    glen herrmannsfeldt, Oct 26, 2013
    #20
