Segmentation Fault...

Discussion in 'C Programming' started by Verrigan, Oct 30, 2004.

  1. Verrigan

    Verrigan Guest

    I'm trying to pass an array to a function, and change the values of that
    array, but I can't seem to figure it out.. Please help me figure out what
    I'm doing wrong... Any help would be appreciated... Even if it's "You can't
    change the value of a passed array." :) Thanks!

    Code:
    static int get_command_args(char *msg, char **args)
    {
    char *p;
    int i, loops;

    i = 0;
    loops = 0;
    p = args;
    while(*msg && !(*msg == END_CHAR)) {
    loops++;
    fprintf(stdout, "%d\n", loops);
    *p++ = *msg++;
    if(*msg == ARG_CHAR) {
    i++;
    *p = '\0';
    p = args;
    *msg++;
    }
    }
    return i;
    }

    static int parse_command(int sock, char *msg)
    {
    char *args[10];
    char buf[BUFFER_LEN];

    get_command_args(msg, args);
    ...

    gdb debugging info:
    0x0804922e in get_command_args (msg=0xbfffe530
    "login§verrigan§82jsheud¶\n", args=0xbfffe0c0) at interface.c:65
    65 *p++ = *msg++;
    (gdb) back
    #0 0x0804922e in get_command_args (msg=0xbfffe530
    "login§verrigan§82jsheud¶\n", args=0xbfffe0c0) at interface.c:65
    #1 0x08049280 in parse_command (sock=9, msg=0xbfffe530
    "login§verrigan§82jsheud¶\n") at interface.c:81
     
    Verrigan, Oct 30, 2004
    #1
    1. Advertisements

  2. Verrigan

    Jack Klein Guest



    First, there is almost always a way to write a loop without a
    pre-initialization like the one you wrote above.

    Second, this assignment requires that args contain a valid pointer
    value.

    Third, if your data is now what you expect and contains too many
    'ARG_CHARA', your program will walk past the end of the 'args' array
    and produce undefined behavior. This type of sloppy handling of
    buffers is the cause of many security defects in C programs.


    Here is where you 'args' as an uninitialized array of ten pointers to
    char. The array does not contain valid pointers. Defining pointers,
    separately or in arrays, does not create anything for them to point to
    nor does it point them at anything.

    So inside your function when you execute 'p = args' statements, you
    are assigning uninitialized values to 'p'. Then you try to write to
    memory through the uninitialized pointer, causing undefined behavior.
    You need to understand more about pointers, how to define them,
    initialize them, and make sure they point to the proper type and
    amount of memory for what you plan to do with them. Consult a book on
    C programming.
     
    Jack Klein, Oct 30, 2004
    #2
    1. Advertisements

  3. When you post to the n.g. please post compilable code .
    Find my comments inline ..


    But you have not allocated memory for
    the members of array of pointers to char . (args).
    Do not post unnecessary code. It distracts the attention.


    You are assigning a pointer. Still at this point,
    args pointer has not yet been allocated.
    And you are dereferencing it here.
    Seems like you are trying to get the tokens of a string.
    Why would not you use strtok, if that is the case ?

     
    Karthik Kumar, Oct 30, 2004
    #3
  4. Verrigan

    Verrigan Guest

    I knew I was doing something wrong. I had been working on other functions
    in the code for way too long, and let this one get to me. :)

    Here is what I ended up with, and it works as expected.

    ...
    static int get_command_args(char *msg, char *args[])
    {
    char *p;
    int i = 0;

    p = args;
    while(*msg && !(*msg == END_CHAR)) {
    if(*msg == ARG_CHAR) {
    *p = '\0';
    i++;
    if(!(args = malloc(sizeof(args)))) {
    PERROR(malloc);
    return -1;
    }
    p = args;
    *msg++;
    } else {
    *p++ = *msg++;
    }
    }
    *p = '\0';
    return i++;
    }

    static int parse_command(int sock, char *msg)
    {
    char *args[BUFFER_LEN];

    if(get_command_args(msg, args) == -1)
    return -1;
    /* Put the rest of the code here. */
    }
    ...

    Thanks again!
     
    Verrigan, Oct 30, 2004
    #4
  5. That is, crashes with illegal address access?


    What if the first character in your 'msg' is an ARG_CHAR?
    So you are writing into an unknown address (p = args which hasn't
    been set up yet if the first character in 'msg' is ARG_CHAR);


    You are allocating a 4 byte (probably) string? Type of args is
    char*, size of that is 4 bytes on 32-bit machines. At this point you
    don't know how many characters are in each argument.


    And how about if there are multiple ARG_CHARs in a row, do you want them
    taken as separate arguments or should they be ignored (like whitespace
    in command lines)?

    What if there are more arguments than you expect?

    Chris C
     
    Chris Croughton, Oct 31, 2004
    #5
  6. Verrigan

    Chris Torek Guest

    You still are, although -- in the fashion of undefined behavior --
    it appears to be working at the moment anyway (hence probably just
    waiting for some "important" moment to crash :) ).


    What is the value of i? (Answer: 0. We just set it to 0 right
    here.) What is the value of args, i.e., args[0]? (Answer
    depends on caller, obviously, who supplied args. Presumably the
    value of "args" is valid, but args[0] had better hold some actual
    value as well, because we just copied to to "p".)

    Let us take a look at the caller for a moment so we can figure out
    what the value of args[0] is.

    So, in parse_command(), args has type "array (size BUFFER_LEN) of
    pointer to char". If BUFFER_LEN is (say) 1000, that makes it
    "array 1000 of pointer to char" -- 1000 elements, each of type
    "pointer to char", and each one uninitialized.

    Since "args" is an ordinary automatic variable, it is uninitialized
    and thus full of garbage. args[0] is garbage, args[1] is garbage,
    args[2] is garbage, and so on up through args[999] (still assuming
    BUFFER_LEN is 1000).
    This passes the array "args" by value to the function get_command_args()
    -- the place where we are trying to figure out what its args[0]
    is. By The Rule about arrays and pointers in C (see
    <http://web.torek.net/torek/c/pa.html>), the "value" of the array
    is a pointer to its first element. Ultimately, args in
    get_command_args winds up meaning the same thing as args here
    in parse_command().

    Thus, we can finally answer the question: "What is the value of
    args, i.e., args[0], that we are going to copy into p in
    get_command_args()?" The answer is: garbage. args[0] has never
    been set to anything. We copy the trash value, whatever it is,
    into "p". If we are really lucky, just copying the trash causes
    an immediate error, so that we find the bug before anything else
    goes wrong. Given what you have said elsewhere, we appear to be
    unlucky: not only does copying trash into p not stop the program
    immediately, but get_command_args() even runs to completion.

    Let us imagine that "p" happens to point to some write-able memory
    somewhere, such as the user's bank balance :) , and press on:
    What are the values of "msg" and "*msg"? Those depend on
    parse_command() too, but it just gets them from somewhere else.
    We will have to assume that they are something sensible (and
    I am sure they are; I have seen plenty of code like this, and
    this part is generally gotten right).

    We might as well assume that *msg != 0 and *msg != END_CHAR, so
    that we enter the loop. (If not we will just set *p to '\0',
    clobbering the user's bank balance. You were not planning to
    use that money, were you? :) )
    Again, this depends on *msg. If, on the first trip through the
    loop, *msg is ARG_CHAR, we will do this now; and likely *msg will
    be ARG_CHAR at least one or two times inside the loop and we will
    do this at least once or twice, after doing the "else" below a
    few times as well.
    Clobber the user's bank balance (on the first ARG_CHAR anyway),
    then:
    change i to 1 (on the first ARG_CHAR anyway; change it to 2 on
    the 2nd ARG_CHAR, and so on)...


    Here is another "guaranteed bug".

    Calls to malloc should generally have one of two forms:

    ptr = malloc(sizeof *ptr); /* to get one object */
    or: ptr = malloc(n * sizeof *ptr); /* to get N objects */

    This call looks like neither. Rather, it resembles:

    ptr = malloc(sizeof ptr); /* missing "*"; maybe also need "n *" */

    or in this case:

    args = malloc(sizeof args); /* still missing a "*" */

    So, this asks malloc() for "sizeof args" bytes -- probably 2 or
    4 but perhaps even 8 now, all depending on sizeof(char *) on your
    particular box.

    All of this is wrapped inside a test for malloc() failing (which
    is good!), and if malloc fails we have a funny-looking thing that
    must be a macro (from the all-caps):
    (Of course, by returning -1, we may lose some memory -- there could
    have been some earlier, succeeding malloc()s -- but if the program
    itself is going to exit soon, this is just a little sloppy, as long
    as the system releases all memory on program exit.)

    If the malloc() succeeds, on the other hand, we copy the result
    into "p" for additional trips around the while loop, then advance
    over the ARG_CHAR in *msg:


    This is the "*msg is not an ARG_CHAR" case:
    Here we copy the character from *msg to *p, advancing over the
    copied character in both. Again, p is either somewhere in the
    user's bank balance (which we are still faithfully clobbering) when
    i == 0, or is based on the result of malloc() (i != 0).

    If i != 0 (so that p *is* based on a malloc() call), how many chars
    have we copied into the pointer we got from malloc()? We can in
    fact find out: args and p were initially the same, and p gets
    incremented after each "char" is copied into *p, so the number of
    "char"s copied so far -- either before or after "*p++ = *msg++" --
    must be the same as (p - args). This will count up: 0, 1, 2,
    3, ..., as we copy "char"s from *msg++ to *p++. Each time *msg is
    not an ARG_CHAR, we copy another one.

    How many "char"s *can* we copy before we "copy too many"? The
    answer depends on sizeof(char *), because we only asked malloc()
    for sizeof(char *) bytes.[%] How many "char"s *will* we copy?
    That depends on how many there are between each ARG_CHAR.

    Note that each one will have a terminating '\0' added, either
    This finishes off the last one (or, if i==0, finishes clobbering
    the user's bank balance).
    -----
    [%footnote

    Remember that, in C:

    A byte is a char and a char is a byte
    Eight bits is common but nine is in sight
    Digital sig-i-nal processors? Whew!
    There you may find that you've got thirty-two!

    Each "C byte" is CHAR_BIT bits long, and CHAR_BIT is allowed to
    be more than 8. It has to be *at least* 8. Most systems make
    it exactly 8, because that works very well.]
    -----
    This "return" is a peculiar statement. "i++" says "find the current
    value of i, and schedule i+1 to be written back to i some time
    before the next sequence point." In other words, increment i, but
    also hang on to the original value, as the value of the entire
    expression. We then return the saved/original value. The act of
    returning destroys the variable -- so why did we increment it?

    This is not *wrong*, it is just peculiar. If get_command_args()
    has made two strings for args[0] and args[1] (so that i==1), it
    returns 1, not 2. If it has made just one, it returns 0, not 1.
    It is impossible for it to make none -- it always writes a '\0'
    byte. (It happens to fail to set args[0] at all, but this is a
    separate bug, which presumably we might fix by having it malloc()
    something for args[0] as well.)

    So, the two big problems here are:

    - args[0] is never malloc()ed at all, and
    - args, i != 0, *is* malloc()ed, but the number of bytes
    (aka "char"s) malloc()ed is not based on the number of bytes
    that will be written.

    A possible bug, but perhaps it is not a bug at all but rather part
    of the design, is that some argss may be empty strings. For
    instance, if msg[2] is ARG_CHAR and msg[3] is *also* ARG_CHAR,
    this winds up represented in the args[] array as an empty string.

    Another "medium-likely" bug is that get_command_args() returns 0
    even when given a completely empty string. Remember that returning
    0 means args[0] is valid -- we return one less than the number of
    strings we made. Thus, the empty string in "msg" represents one,
    instead of zero, "args". Probably it should represent no args at
    all, and get_command_args() should, e.g., return 2 when args[0]
    and args[1] are both valid.

    One other possible bug is the dreaded "buffer overflow". The
    get_command_args() function has no idea how many args's it
    can write on. The caller -- parse_command(), in this case --
    provided BUFFER_LEN of them, without telling get_command_args()
    that it did in fact provide BUFFER_LEN of them. This may well
    be OK, because we know that get_command_args() can only write
    at most "k" args[] elements, where k is the number of non-'\0'
    bytes in msg[]. (That is, k == strlen(msg) as originally passed
    in.) If k is guaranteed not to exceed BUFFER_LEN, this cannot
    cause an array-bounds error.

    Still, it might be better to re-design get_command_args() a bit so
    that it knows how many array elements it can write on. Or,
    alternatively, it could malloc() the array "args" itself, as well
    as each element args. (In this case, "args" will have type
    "char **" instead of "char *[N]" for some constant N, but -- due
    to The Rule about arrays and pointers in C -- these can be used
    *as if* they were interchangeable, other than the malloc() and
    free() that get the space for the "array".)

    Finally, there is one other "design note", as it were: when
    get_command_args() succeeds, its caller eventually needs to free
    each args element individually, making one call to free() for
    each call get_command_args() made to malloc(). It may (or may
    not) be possible and desirable to change this. In particular,
    it may be possible to use *no* calls to malloc() at all, and it
    is certainly possible (but maybe not desirable -- a lot depends
    on code not shown here) to use just one call.

    What if, instead of copying each "piece" of "msg" one piece at
    a time into separately-malloc()ed memory, we copy the entire
    thing into *one* malloc()ed area? This copy can then be whacked
    upon as much as we like, without damaging the original. This
    leads to the following routine:

    /*
    * Break up the provided string into a sequence of (possibly
    * empty) strings separated by ARG_CHARs. For instance,
    * "zero,one,,three" might become "zero" then "one" then ""
    * then "three" (assuming ARG_CHAR is ',').
    *
    * The array args[] must be large enough to hold the result,
    * but the result will not exceed strlen(msg).
    *
    * Failure (to obtain space) is represented by returning -1.
    * An empty string in "msg" is represented by returning 0.
    * Only nonempty strings result in args[] arrays.
    *
    * When all done, the caller must call free(args[0]).
    */
    static int get_command_args(char *msg, char **args, size_t argsize) {
    size_t len = strlen(msg);
    char *mem, *p;
    size_t i;

    /* this test is optional; without it, "" is one arg */
    if (len == 0)
    return 0; /* empty string = no arguments, no malloc */

    mem = malloc(len + 1); /* +1 for '\0' */
    if (mem == NULL) {
    /* you can print a message here if you wish */
    return -1; /* failure; no memory allocated at all */
    }

    memcpy(mem, msg, len + 1); /* copy the '\0' too */

    for (i = 0, p = mem;;) {
    if (i >= argsize)
    panic("get_command_args: i >= argsize");

    /*
    * At this point we have a (possibly empty) string that
    * starts at "p" and goes to the next (perhaps immediate)
    * ARG_CHAR. Save the start and advance p to the ARG_CHAR.
    * If there are no more ARG_CHARs, p becomes NULL and we
    * exit the loop.
    */
    args[i++] = p;
    p = strchr(p, ARG_CHAR);
    if (p == NULL)
    break;
    *p++ = '\0'; /* clobber ARG_CHAR and move forward */
    }

    return i;
    }

    (I used the library strchr() function, rather than a small loop,
    to advance p. As a result I have to check for NULL instead of
    *p==0.)

    Note that the caller must now pass the number of elements in the
    array "args":

    char *args[BUFFER_LEN];
    ...
    n = get_command_args(msg, args, BUFFER_LEN);

    or:

    n = get_command_args(msg, args, sizeof args / sizeof *args);

    and if the code would be about to overwrite an invalid args, it
    calls panic() (an "internal error detected" routine, which you must
    supply).

    The call to malloc() above does not have the "comp.lang.c-recommended"
    form:

    p = malloc(sizeof *p);
    or: p = malloc(n * sizeof *p);

    What happened to the "sizeof"? The answer is: I was lazy and left
    it out. We know that sizeof(char) is 1, and "mem" has type "char *"
    so "sizeof *mem" is "sizeof(char)" which is 1. Some people will
    prefer to write:

    mem = malloc((len + 1) * sizeof *mem);

    which is also quite correct.

    Finally, you might ask: "How can we get rid of even the one remaining
    malloc()?" Well, that depends on whether we can overwrite the
    original "msg" array, and point into it, instead of into malloc()ed
    memory. As should be fairly obvious -- at least if one draws this
    diagram (note, fixed-width font required):

    msg:
    +---+---+---+---+---+---+---+---+---+---+---+
    | o | n | e | , | t | w | o | , | , | 3 |\0 |
    +---+---+---+---+---+---+---+---+---+---+---+

    args:
    +-----+-----+-----+-----+
    | * | * | * | * |
    +--|--+--|--+--|--+--|--+
    | | \___ \________________
    / \________ \_____________ \
    | \ \ |
    v | | |
    mem: v v v
    +---+---+---+---+---+---+---+---+---+---+---+
    | o | n | e |\0 | t | w | o |\0 |\0 | 3 |\0 |
    +---+---+---+---+---+---+---+---+---+---+---+

    -- the only reason we copied "msg" in the first place was to be
    able to replace each ARG_CHAR (',' above) with '\0'. What if we
    just replace it directly in msg[]?

    Whether this is suitable depends on the lifetime of the original
    "msg" buffer, and whether you plan to re-use it while still wanting
    args[0] through args[3] to be valid. But it is often an option,
    and it does avoid the (now lone) call to malloc().
     
    Chris Torek, Oct 31, 2004
    #6
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.