strtok segfaults in CLI but not in GDB

Discussion in 'C Programming' started by Pietro Cerutti, May 16, 2007.

  1. Hello,
    here I have a strange problem with a real simple strtok example.

    The program is as follows:

    ### BEGIN STRTOK ###

    #include <string.h>
    #include <stdio.h>

    int main()
    {
        char *input1 = "Hello, World!";

        char *tok;

        tok = strtok(input1, " ");
        if(tok) printf("%s\n", tok);

        tok = strtok(NULL, " ");
        if(tok) printf("%s\n", tok);

        return(0);

    }

    ### END STRTOK ###


    Now, when I run it from the command line, I get a bus error:

    ### BEGIN COMMAND LINE OUTPUT ###

    > gcc -ggdb -Wall -o strtok strtok.c
    > ./strtok

    Bus error (core dumped)
    Exit 138

    ### END COMMAND LINE OUTPUT ###

    When I run it step by step in GDB, the program terminates normally:

    ### BEGIN DEBUGGER OUTPUT ###

    > gdb ./strtok

    GNU gdb 6.1.1 [FreeBSD]
    [snip]GDB copyright and bla bla[/snip]
    (gdb) break main
    Breakpoint 1 at 0x8048570: file strtok.c, line 6.
    (gdb) run
    Starting program: /home/piter/strtok

    Breakpoint 1, main () at strtok.c:6
    6 char *input1 = "Hello, World!";
    (gdb) next
    10 tok = strtok(input1, " ");
    (gdb)
    11 if(tok) printf("%s\n", tok);
    (gdb)
    Hello,
    13 tok = strtok(NULL, " ");
    (gdb)
    14 if(tok) printf("%s\n", tok);
    (gdb)
    World!
    16 return(0);
    (gdb)
    18 }
    (gdb)
    0x08048485 in _start ()
    (gdb)
    Single stepping until exit from function _start,
    which has no line number information.

    Program exited normally.
    (gdb)

    ### END DEBUGGER OUTPUT ###

    Is there something I'm missing wrt C and/or strtok, or is it rather a
    problem related to my environment (in which case I'll be happy to post
    in the right newsgroup)?

    Thanx in advance

    --
    Pietro Cerutti

    PGP Public Key ID:
    http://gahr.ch/pgp
     
    Pietro Cerutti, May 16, 2007
    #1

  2. Pietro Cerutti

    Ian Collins Guest

    Pietro Cerutti wrote:
    > Hello,
    > here I have a strange problem with a real simple strtok example.
    >
    > The program is as follows:
    >
    > ### BEGIN STRTOK ###
    >
    > #include <string.h>
    > #include <stdio.h>
    >
    > int main()
    > {
    > char *input1 = "Hello, World!";
    >
    > char *tok;
    >
    > tok = strtok(input1, " ");


    strtok alters its input. You are passing it a string literal, and
    modifying a string literal invokes the demons of undefined behavior. Don't.

    --
    Ian Collins.
     
    Ian Collins, May 16, 2007
    #2

  3. Pietro Cerutti said:

    > Hello,
    > here I have a strange problem with a real simple strtok example.
    >
    > The program is as follows:
    >
    > ### BEGIN STRTOK ###
    >
    > #include <string.h>
    > #include <stdio.h>
    >
    > int main()
    > {
    > char *input1 = "Hello, World!";
    >
    > char *tok;
    >
    > tok = strtok(input1, " ");


    strtok modifies the string you pass it. You pass it a string literal.
    You're not allowed to modify string literals.

    Change

    char *input1 = "Hello, World!";

    to

    char input1[] = "Hello, World!";


    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at the above domain, - www.
     
    Richard Heathfield, May 16, 2007
    #3
  4. Pietro Cerutti wrote:

    > char *input1 = "Hello, World!";


    just in case, I know that the string to be tokenized shouldn't be a
    constant, but rather an array of chars.
    So, it should be declared as

    char input1[14] = "Hello, World!";

    The thing I don't understand is: why does it work in GDB?

    --
    Pietro Cerutti

    PGP Public Key:
    http://gahr.ch/pgp
     
    Pietro Cerutti, May 16, 2007
    #4
  5. Pietro Cerutti

    Ian Collins Guest

    Pietro Cerutti wrote:
    > Pietro Cerutti wrote:
    >
    >> char *input1 = "Hello, World!";

    >
    > just in case, I know that the string to be tokenized shouldn't be a
    > constant, but rather an array of chars.
    > So, it should be declared as
    >
    > char input1[14] = "Hello, World!";
    >
    > The thing I don't understand is: why does it work in GDB?
    >

    Luck?

    --
    Ian Collins.
     
    Ian Collins, May 16, 2007
    #5
  6. Pietro Cerutti

    Chris Dollin Guest

    Pietro Cerutti wrote:

    > here I have a strange problem with a real simple strtok example.


    Guess: you're trying to use it on a literal string.

    > The program is as follows:
    >
    > ### BEGIN STRTOK ###
    >
    > #include <string.h>
    > #include <stdio.h>
    >
    > int main()
    > {
    > char *input1 = "Hello, World!";
    >
    > char *tok;
    >
    > tok = strtok(input1, " ");
    > if(tok) printf("%s\n", tok);
    >
    > tok = strtok(NULL, " ");
    > if(tok) printf("%s\n", tok);
    >
    > return(0);
    >
    > }


    (fx:dancing) Yes!

    `strtok` writes to its argument -- it sticks nuls in there to make
    the strings it returns.

    You're not allowed to write into a string literal: that gets you
    undefined behaviour.

    An implementation may just write into the string. Or it may abort in
    some way. Or it may ignore the write. Or it may write somewhere else
    entirely. Or it may mail a report to your co-coders, or start a game
    of rogue, or book you a holiday in the Lake District, or set fire to
    your keyboard, or arrange a date with your Most Preferred Person.

    [That last one never seems to happen, though.]

    --
    "You've spotted a flaw in my thinking, Trev" Big Al,/The Beiderbeck Connection/

    Hewlett-Packard Limited registered office: Cain Road, Bracknell,
    registered no: 690597 England Berks RG12 1HN
     
    Chris Dollin, May 16, 2007
    #6
  7. Ian Collins wrote:
    > Pietro Cerutti wrote:
    >> Pietro Cerutti wrote:
    >>
    >>> char *input1 = "Hello, World!";

    >> just in case, I know that the string to be tokenized shouldn't be a
    >> constant, but rather an array of chars.
    >> So, it should be declared as
    >>
    >> char input1[14] = "Hello, World!";
    >>
    >> The thing I don't understand is: why does it work in GDB?
    >>

    > Luck?
    >


    Ya, maybe.

    The point is:
    I understand what UB means, so WW3 could start now and I'd know why...

    But if a string literal is - by definition - not modifiable, then how
    can it happen that GDB actually modifies it using strtok?

    --
    Pietro Cerutti

    PGP Public Key:
    http://gahr.ch/pgp
     
    Pietro Cerutti, May 16, 2007
    #7
  8. Chris Dollin wrote:

    > You're not allowed to write into a string literal: that gets you
    > undefined behaviour.
    >
    > An implementation may just write into the string.


    Uh? So you mean that a string literal isn't unmodifiable by definition?


    --
    Pietro Cerutti

    PGP Public Key:
    http://gahr.ch/pgp
     
    Pietro Cerutti, May 16, 2007
    #8
  9. Pietro Cerutti writes:
    > Pietro Cerutti wrote:
    >
    >> char *input1 = "Hello, World!";

    >
    > just in case, I know that the string to be tokenized shouldn't be a
    > constant, but rather an array of chars.
    > So, it should be declared as
    >
    > char input1[14] = "Hello, World!";
    >
    > The thing I don't understand is: why does it work in GDB?


    Because it invokes undefined behavior. There are no rules about what
    happens. It can crash, it can "work", it can make demons fly out of
    your nose.

    (I suppose string literals are stored in write-protected memory when
    your program runs normally, but not when it runs under gdb -- which
    seems odd.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, May 16, 2007
    #9
  10. Keith Thompson wrote:

    > (I suppose string literals are stored in write-protected memory when
    > your program runs normally, but not when it runs under gdb -- which
    > seems odd.)


    Yes it's weird, but it's a logical explanation.
    I'll investigate with the freebsd people..
    Thank you.

    --
    Pietro Cerutti

    PGP Public Key:
    http://gahr.ch/pgp
     
    Pietro Cerutti, May 16, 2007
    #10
  11. In article <40054$464acdb7$50dabbcd$>,
    Pietro Cerutti wrote:

    >But if a string literal is - by definition - not modifiable, then how
    >can it happen that GDB actually modifies it using strtok?


    It's not modifiable in that you're not allowed to modify it. It's not
    required that the implementation signal an error when you do it. It's
    a constraint on you, not on the system.

    My guess as to why you don't see an error with GDB is that the
    debugger needs the text segment to be writable, so that it can set
    breakpoints.

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
     
    Richard Tobin, May 16, 2007
    #11
  12. Pietro Cerutti writes:
    > Ian Collins wrote:
    >> Pietro Cerutti wrote:
    >>> Pietro Cerutti wrote:
    >>>
    >>>> char *input1 = "Hello, World!";
    >>> just in case, I know that the string to be tokenized shouldn't be a
    >>> constant, but rather an array of chars.
    >>> So, it should be declared as
    >>>
    >>> char input1[14] = "Hello, World!";
    >>>
    >>> The thing I don't understand is: why does it work in GDB?
    >>>

    >> Luck?

    >
    > Ya, maybe.
    >
    > The point is:
    > I understand what UB means, so WW3 could start now and I'd know why...
    >
    > But if a string literal is - by definition - not modifiable, then how
    > can it happen that GDB actually modifies it using strtok?


    I think you don't *quite* understand what UB means.

    The actual definition (C99 3.4.3) is:

    behavior, upon use of a nonportable or erroneous program construct
    or of erroneous data, for which this International Standard
    imposes no requirements

    and C99 6.4.5p6 says:

    [...] If the program attempts to modify such an array, the
    behavior is undefined.

    For example, consider this program:

    #include <stdio.h>
    int main(void)
    {
        char *s = "Hello, world";
        s[0] = 'J'; /* attempt to modify a string literal */
        puts(s);
        return 0;
    }

    One of the infinitely many possible results is that the string literal
    is actually modified, and the program prints "Jello, world".

    The standard doesn't say that string literals are not modifiable. It
    says that attempting to modify a string literal invokes undefined
    behavior.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, May 16, 2007
    #12
  13. Pietro Cerutti

    Chris Dollin Guest

    Pietro Cerutti wrote:

    > Chris Dollin wrote:
    >
    >> You're not allowed to write into a string literal: that gets you
    >> undefined behaviour.
    >>
    >> An implementation may just write into the string.

    >
    > Uh? So you mean that a string literal isn't unmodifiable by definition?


    Yes, that's what I (well, the C standard) says.

    Specifically, it says that if you attempt to write into a string literal,
    /the effect is undefined/. Anything can happen. C washes its hands of
    your code. It cares not. Mind the gap. Do as you will.

    An implementation may implement this freedom by changing the content of
    the literal, if that's convenient.

    Hence: don't go writing into string literals. Even though it /might/
    get you a date, it probably won't, and I am assured that nasal demons
    are not fun to have.

    --
    "I'm still here and I'm holding the answers" - Karnataka, /Love and Affection/

    Hewlett-Packard Limited registered office: Cain Road, Bracknell,
    registered no: 690597 England Berks RG12 1HN
     
    Chris Dollin, May 16, 2007
    #13
  14. Keith Thompson wrote:

    > The standard doesn't say that string literals are not modifiable. It
    > says that attempting to modify a string literal invokes undefined
    > behavior.


    Got it. Thanks!

    --
    Pietro Cerutti

    PGP Public Key:
    http://gahr.ch/pgp
     
    Pietro Cerutti, May 16, 2007
    #14
  15. Richard Tobin wrote:

    > My guess as to why you don't see an error with GDB is that the
    > debugger needs the text segment to be writable, so that it can set
    > breakpoints.


    GDB on Debian/GNU Linux gives an error when I try to modify it.
    On FreeBSD it doesn't, that's why I'm asking right now the FreeBSD
    people whether the behavior is wanted or erroneous.

    Thanx

    >
    > -- Richard



    --
    Pietro Cerutti

    PGP Public Key:
    http://gahr.ch/pgp
     
    Pietro Cerutti, May 16, 2007
    #15
  16. Chris Dollin wrote:
    > Pietro Cerutti wrote:
    >
    >> Chris Dollin wrote:
    >>
    >>> You're not allowed to write into a string literal: that gets you
    >>> undefined behaviour.
    >>>
    >>> An implementation may just write into the string.

    >> Uh? So you mean that a string literal isn't unmodifiable by definition?

    >
    > Yes, that's what I (well, the C standard) says.
    >
    > Specifically, it says that if you attempt to write into a string literal,
    > /the effect is undefined/. Anything can happen. C washes its hands of
    > your code. It cares not. Mind the gap. Do as you will.
    >
    > An implementation may implement this freedom by changing the content of
    > the literal, if that's convenient.
    >
    > Hence: don't go writing into string literals. Even though it /might/
    > get you a date, it probably won't, and I am assured that nasal demons
    > are not fun to have.
    >


    Clear. Thanks to you too.

    --
    Pietro Cerutti

    PGP Public Key:
    http://gahr.ch/pgp
     
    Pietro Cerutti, May 16, 2007
    #16
  17. Pietro Cerutti said:

    > Richard Tobin wrote:
    >
    >> My guess as to why you don't see an error with GDB is that the
    >> debugger needs the text segment to be writable, so that it can set
    >> breakpoints.

    >
    > GDB on Debian/GNU Linux gives an error when I try to modify it.


    That's an acceptable outcome of undefined behaviour.

    > On FreeBSD it doesn't,


    So's that.

    > that's why I'm asking right now the FreeBSD
    > people whether the behavior is wanted or erroneous.


    It is neither Debian nor FreeBSD, but rather your program, that is
    erroneous.

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at the above domain, - www.
     
    Richard Heathfield, May 16, 2007
    #17
  18. In article <>,
    Richard Heathfield wrote:

    >> GDB on Debian/GNU Linux gives an error when I try to modify it.

    >
    >That's an acceptable outcome of undefined behaviour.
    >
    >> On FreeBSD it doesn't,

    >
    >So's that.
    >
    >> that's why I'm asking right now the FreeBSD
    >> people whether the behavior is wanted or erroneous.

    >
    >It is neither Debian nor FreeBSD, but rather your program, that is
    >erroneous.


    I think he meant "erroneous" in the sense of a mistake, rather than
    a violation of the C standard.

    It certainly seems desirable to have programs behave the same way
    under the debugger as without it, so it would be good if the FreeBSD
    version could be changed. Meanwhile, we at least have a clue that if
    a segmentation fault goes away in the debugger then the cause may well
    be attempted modification of literal strings.

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
     
    Richard Tobin, May 16, 2007
    #18
  19. Pietro Cerutti

    Don Porges Guest

    "Keith Thompson" wrote in message news:...
    > Pietro Cerutti writes:
    >> Ian Collins wrote:
    >>> Pietro Cerutti wrote:
    >>>> Pietro Cerutti wrote:
    >>>>
    >>>>> char *input1 = "Hello, World!";
    >>>> just in case, I know that the string to be tokenized shouldn't be a
    >>>> constant, but rather an array of chars.
    >>>> So, it should be declared as
    >>>>
    >>>> char input1[14] = "Hello, World!";
    >>>>
    >>>> The thing I don't understand is: why does it work in GDB?
    >>>>
    >>> Luck?

    >>
    >> Ya, maybe.
    >>
    >> The point is:
    >> I understand what UB means, so WW3 could start now and I'd know why...
    >>
    >> But if a string literal is - by definition - not modifiable, then how
    >> can it happen that GDB actually modifies it using strtok?

    >
    > I think you don't *quite* understand what UB means.
    >
    > The actual definition (C99 3.4.3) is:
    >
    > behavior, upon use of a nonportable or erroneous program construct
    > or of erroneous data, for which this International Standard
    > imposes no requirements
    >
    > and C99 6.4.5p6 says:
    >
    > [...] If the program attempts to modify such an array, the
    > behavior is undefined.
    >
    > For example, consider this program:
    >
    > #include <stdio.h>
    > int main(void)
    > {
    > char *s = "Hello, world";
    > s[0] = 'J'; /* attempt to modify a string literal */
    > puts(s);
    > return 0;
    > }
    >
    > One of the infinitely many possible results is that the string literal
    > is actually modified, and the program prints "Jello, world".
    >
    > The standard doesn't say that string literals are not modifiable. It
    > says that attempting to modify a string literal invokes undefined
    > behavior.


    <OT>
    Yes, _but_: from the point of view of gdb users and maintainers, they
    may still consider it a gdb bug if, on a single platform, _any_ program
    executes differently under gdb than it does when run normally. After all, the
    underlying problem -- writing into r/o storage -- could be triggered from
    an assembler program. And gdb doesn't have the same standards-contract
    relationship with anything that a C implementation does.

    It is, however, a separate issue from the fact that the program invokes UB.
    </OT>
     
    Don Porges, May 16, 2007
    #19
  20. CBFalconer writes:
    > Richard Tobin wrote:
    >> Pietro Cerutti wrote:
    >>> But if a string literal is - by definition - not modifiable, then
    >>> how can it happen that GDB actually modifies it using strtok?

    >>
    >> It's not modifiable in that you're not allowed to modify it. It's
    >> not required that the implementation signal an error when you do
    >> it. It's a constraint on you, not on the system.
    >>
    >> My guess as to why you don't see an error with GDB is that the
    >> debugger needs the text segment to be writable, so that it can set
    >> breakpoints.

    >
    > To get an error with gcc, add "-Wwrite-strings" to the command. No
    > quote chars used.


    That will cause gcc to emit a warning message if it can determine at
    compilation time that you've attempted to modify a string literal.

    Actually, it will generate warnings even in some cases where you
    *don't* attempt to modify a string literal. It works by internally
    applying a "const" qualifier to the array type. So, for example:

    % cat c.c
    char *s = "Hello, world";
    % gcc -c c.c
    % gcc -c -Wwrite-strings c.c
    c.c:1: warning: initialization discards qualifiers from pointer target type

    I haven't attempted to modify the string literal, but by assigning its
    address to a (non-const) char*, I've created the potential to do so.
    It would be nice if gcc were a bit smarter about this, at least
    marking the array type as some kind of "pseudo-const" so it can give
    more sensible warning messages. But since an implementation can warn
    about anything it likes, I don't believe the "-Wwrite-strings" option
    causes gcc to be non-conforming (unless you also add "-Werror").

    Consider the following program:

    #include <stdio.h>
    int main(void)
    {
        const char *s = "Hello, world";
        char *bogus = (char*)s;
        bogus[0] = 'J';
        puts(s);
        return 0;
    }

    It attempts to modify a string literal, and gcc doesn't complain about
    it (during compilation) even with "-Wwrite-strings", because I hid the
    evil part behind a pointer cast that dropped the "const" qualifier.
    On the system I'm using, it dies with a segmentation fault at run time
    -- *unless* I specify "-fwritable-strings", in which case it happily
    prints "Jello, world".

    Most of this is gcc-specific, of course. The topical point is that,
    apart from the fact that the "-Wwrite-strings -Werror" combination
    causes some valid programs to be rejected, all this behavior conforms
    to the standard.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, May 17, 2007
    #20