I think I'm gonna cry. (or newby problems with simple string function)

Discussion in 'C Programming' started by robbie.carlton@gmail.com, Sep 29, 2005.

  1. Guest

    Hello!

    I've programmed in c a bit, but nothing very complicated. I've just
    come back to it after a long sojourn in the lands of functional
    programming and am completely stumped on a very simple function I'm
    trying to write. I'm writing a function that takes a string, and
    returns an array of strings which are the result of splitting the input
    on whitespace and parentheses (but the parentheses should also be
    included in the array as strings).

    an example:

    explode("foo bar baz") -> ["foo", "bar", "boys"]
    explode("foo(bar)baz") -> ["foo", "(", "bar", ")", "baz"]

    Now I thought I had it. But what I've got now causes a bus error. So
    I'm going to post all of the code (sorry) and maybe wiser minds than me
    can work it out. Please remember I'm a noob so I would prefer things as
    unobfuscated as possible, and I know how bad the style of my code is
    also, I'm trying to make it work first.

    Thanks in advance, here's the code:

    #include <stdio.h>

    char* extract(char* str, int len) {
    char* out = (char*)malloc(len + 1);
    out = memcpy(out, str, len);
    out[len] = '\0';
    return out;
    }

    char istax(int ch) {
    int out = (ch=='(') | (ch==')');
    return out;
    }

    char** explode(char* str) {
    int nt = counttokens(str);
    if(!nt) {
    return 0;
    }

    char** ret = (char**)malloc(nt);

    int i = 0;
    int len = strlen(str);
    char ch;
    int start = 0;
    int mode = 0;
    int t = 0;
    for (i = 0; i < len; i++) {
    ch = str;
    if (mode == 0) {
    if(!isspace(ch)) {
    mode = 1;
    start = 0;
    }
    } else {
    if(istax(ch)) {
    ret[t] = extract(str + start, (i + 1) - start);
    t++;
    } else if(isspace(ch)) {
    mode = 0;
    ret[t] = extract(str + start, (i + 1) - start);
    t++;
    }
    }
    }
    return ret;
    }

    int counttokens(char* str) {
    char ch;
    char intoken = 0;
    int tokens = 0;
    while(ch = str[0]) {
    if(ch == '(') {
    tokens++;
    intoken = 0;
    } else if(ch == ')') {
    tokens++;
    intoken = 0;
    } else if(ch != ' ') {
    if(!intoken) {
    intoken = 1;
    tokens++;
    }
    } else {
    intoken = 0;
    }
    str++;
    }
    return tokens;
    }


    thanks again,

    robbie
    , Sep 29, 2005
    #1
    1. Advertising

  2. tedu Guest

    wrote:

    > char* extract(char* str, int len) {
    > char* out = (char*)malloc(len + 1);
    > out = memcpy(out, str, len);
    > out[len] = '\0';
    > return out;
    > }


    > char** explode(char* str) {
    > int nt = counttokens(str);
    > if(!nt) {
    > return 0;
    > }
    >
    > char** ret = (char**)malloc(nt);
    >
    > int i = 0;
    > int len = strlen(str);
    > char ch;
    > int start = 0;
    > int mode = 0;
    > int t = 0;
    > for (i = 0; i < len; i++) {
    > ch = str;
    > if (mode == 0) {
    > if(!isspace(ch)) {
    > mode = 1;
    > start = 0;
    > }
    > } else {
    > if(istax(ch)) {
    > ret[t] = extract(str + start, (i + 1) - start);
    > t++;
    > } else if(isspace(ch)) {
    > mode = 0;
    > ret[t] = extract(str + start, (i + 1) - start);
    > t++;
    > }
    > }
    > }


    you aren't reassigning start nor using t in a meaningful way. pick
    one. :)
    tedu, Sep 30, 2005
    #2
    1. Advertising

  3. tedu Guest

    tedu wrote:
    > wrote:
    > > int i = 0;
    > > int len = strlen(str);
    > > char ch;
    > > int start = 0;
    > > int mode = 0;
    > > int t = 0;
    > > for (i = 0; i < len; i++) {
    > > ch = str;
    > > if (mode == 0) {
    > > if(!isspace(ch)) {
    > > mode = 1;
    > > start = 0;
    > > }
    > > } else {
    > > if(istax(ch)) {
    > > ret[t] = extract(str + start, (i + 1) - start);
    > > t++;
    > > } else if(isspace(ch)) {
    > > mode = 0;
    > > ret[t] = extract(str + start, (i + 1) - start);
    > > t++;
    > > }
    > > }
    > > }

    >
    > you aren't reassigning start nor using t in a meaningful way.


    er, sorry, t is ok. i still think you want to be doing something more
    with start.
    tedu, Sep 30, 2005
    #3
  4. Dan Guest

    When is start anything other than 0?
    Dan, Sep 30, 2005
    #4
  5. Re: I think I'm gonna cry. (or newby problems with simple stringfunction)

    "Dan" <> writes:
    > When is start anything other than 0?


    I have no clue what you're talking about. You need to provide some
    context when you post a followup; not everyone has easy access to the
    parent article.

    A search for "google" "followup" in this very newsgroup currently gets
    1100 hits (and now it's going to be 1101).

    Dan, maybe you can help us out. We've been telling Google users for
    months how and why to post properly using the broken groups.google.com
    interface, but it's just not working. Do you have any advice on how
    we can get the word out so this stops happening?

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Sep 30, 2005
    #5
  6. Jack Klein Guest

    On 29 Sep 2005 15:45:11 -0700, wrote in
    comp.lang.c:

    > Hello!
    >
    > I've programmed in c a bit, but nothing very complicated. I've just
    > come back to it after a long sojourn in the lands of functional
    > programming and am completely stumped on a very simple function I'm
    > trying to write. I'm writing a function that takes a string, and
    > returns an array of strings which are the result of splitting the input
    > on whitespace and parentheses (but the parentheses should also be
    > included in the array as strings).
    >
    > an example:
    >
    > explode("foo bar baz") -> ["foo", "bar", "boys"]
    > explode("foo(bar)baz") -> ["foo", "(", "bar", ")", "baz"]


    You haven't shown us any code that calls this function. Do you
    actually call it with string literals, and does it attempt to modify
    them? Modifying string literals is undefined behavior.

    > Now I thought I had it. But what I've got now causes a bus error. So
    > I'm going to post all of the code (sorry) and maybe wiser minds than me
    > can work it out. Please remember I'm a noob so I would prefer things as
    > unobfuscated as possible, and I know how bad the style of my code is
    > also, I'm trying to make it work first.
    >
    > Thanks in advance, here's the code:
    >
    > #include <stdio.h>


    You haven't included <stdlib.h>, so you don't have a prototype for
    malloc() in scope. Calling malloc() without a prototype produces
    undefined behavior.

    You haven't included <string.h>, so you don't have a prototype for
    memcpy() in scope. Calling memcpy() without a prototype produces
    undefined behavior.

    You haven't included <ctype.h>, so you don't have a prototype for
    isspace() in scope.

    > char* extract(char* str, int len) {


    Since the sizeof operator yields a value of type size_t, and malloc()
    accepts a single argument of type size_t, why are you using int? It
    may not cause a problem in this case, but ultimately you are asking
    for a signed/unsigned clash.

    > char* out = (char*)malloc(len + 1);


    No, casting the value returned by malloc() is wrong. You probably did
    this to shut up a compiler diagnostic, caused by your failure to
    include <stdlib.h> and have a prototype in scope.

    > out = memcpy(out, str, len);
    > out[len] = '\0';
    > return out;
    > }
    >
    > char istax(int ch) {
    > int out = (ch=='(') | (ch==')');
    > return out;
    > }
    >
    > char** explode(char* str) {
    > int nt = counttokens(str);
    > if(!nt) {
    > return 0;
    > }
    >
    > char** ret = (char**)malloc(nt);


    Whatever you are using, it is not a conforming C compiler, not
    conforming to any version of the C language standard. Or you are not
    using it that way.

    Versions of the C standard prior to 1999 would not allow the
    declaration above, because it comes after executable statements in the
    current block. And versions of the C standard from and after 1999
    will not allow a call to a function without at least a declaration in
    scope, and you have none for malloc() or memcpy().

    Examine your compiler's documentation to determine how to invoke it as
    a conforming C compiler, or ask your question in a compiler-specific
    group.

    On the other hand, if you are compiling this code with a C++ compiler,
    ask in comp.lang.c++.

    [snip]

    Fix the problems that I've pointed out and then, if you are compiling
    this code with a conforming C compiler and still have problems, post
    again.

    --
    Jack Klein
    Home: http://JK-Technology.Com
    FAQs for
    comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
    comp.lang.c++ http://www.parashift.com/c -faq-lite/
    alt.comp.lang.learn.c-c++
    http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html
    Jack Klein, Sep 30, 2005
    #6
  7. wrote:
    > ...I'm writing a function that takes a string, and returns an array
    > of strings which are the result of splitting the input on whitespace
    > and parentheses (but the parentheses should also be included in the
    > array as strings).
    >
    > an example:
    >
    > explode("foo bar baz") -> ["foo", "bar", "boys"]
    > explode("foo(bar)baz") -> ["foo", "(", "bar", ")", "baz"]
    >

    <snip>
    >
    > #include <stdio.h>


    Avoid using unprototyped functions...

    #include <ctype.h>
    #include <stdlib.h>
    #include <string.h>

    > char* extract(char* str, int len) {


    String lengths and object sizes in general are better measured with
    size_t than int.

    > char* out = (char*)malloc(len + 1);


    You should check the return value of malloc.

    > out = memcpy(out, str, len);
    > out[len] = '\0';
    > return out;
    > }
    >
    > char istax(int ch) {
    > int out = (ch=='(') | (ch==')');


    Look up the difference between | and ||.

    > return out;
    > }


    This isn't really worth a function.

    > char** explode(char* str) {
    > int nt = counttokens(str);


    This design is somewhat poor. You have a separate function to
    count the number of tokens, yet you use duplicate code to
    extract the tokens. If the specifications change, then you
    need to maintain two separate pieces of code synchonously.

    > if(!nt) {
    > return 0;
    > }
    >
    > char** ret = (char**)malloc(nt);


    C90 won't let you mix declarations and statements.

    >
    > int i = 0;
    > int len = strlen(str);
    > char ch;
    > int start = 0;
    > int mode = 0;
    > int t = 0;

    <snip>

    You seem to have more indexing variables than you can handle.

    Here's one way that I might do this. The 'work' function does
    the counting and the allocation. I just scan through the string
    in question (s), and use another pointer t to mark the begining
    of an 'identifier' token. Since t can be null, it serves as a
    'mode' flag.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    char *span_dup(const char *s, const char *t)
    {
    size_t n = t ? t - s : strlen(s);
    char *m = malloc(n + 1);
    if (m) { memcpy(m, s, n); m[n] = 0; }
    return m;
    }

    size_t work(char **a, const char *s)
    {
    const char *t = 0;
    size_t n = 0;

    for (; *s; s++)
    {
    if (*s == ' ' || *s == '(' || *s == ')')
    {
    /* add any prior scanned identifier */
    if (t) { n++; if (a) *a++ = span_dup(t, s); t = 0; }

    /* add a ( or ) token */
    if (*s != ' ') { n++; if (a) *a++ = span_dup(s, s+1); }
    }
    else if (!t)
    t = s; /* start new identifier token */
    }

    /* add any last (outstanding) identifier token */
    if (t) { n++; if (a) *a++ = span_dup(t, s); }

    return n;
    }

    char **explode(const char *s)
    {
    size_t n = work(0, s);
    char **m = malloc((n + 1) * sizeof *m);
    if (m) { work(m, s); m[n] = 0; }
    return m;
    }

    int main(void)
    {
    char **s, **m = explode("Hello (World)");
    if (m == 0) return 0;
    for (s = m; *s; s++) printf("<%s>\n", *s);
    return 0;
    }

    --
    Peter
    Peter Nilsson, Sep 30, 2005
    #7
  8. Guest

    Thanks for all the responses.
    Start being assigned to 0 in the loop was very stupid and due to late
    night brain fever, it is supposed to be start = i in the loop. Yes I
    wanted ||, not |, although doesn't | do the same thing in this case?
    Anyway, I've fixed these two, and I'm still getting the same error.
    Yes I know the duplication is ugly. I wanted to get a naive
    implementation working and then factor it out, but the 'naive' version
    is turning out harder than I thought.
    The reason istax is a seperate function is because the definition of
    token delimiters is likely to change, so I wanted it to be in one place
    (and yes, I know I don't use it in counttokens. counttokens came first
    and I forgot to factor it out)
    As to the many non-standard c things I've done, thanks for pointing
    them out. I've been spoilt by a very forgiving compiler (gcc,
    presumably doing c++ on the side) which just isn't shouting at me about
    it. I'm going to fix all these mistakes, and repost the code.
    Thanks peter for the working code. I'll probably end up using that as
    it's much neater than what I've been doing, but I'd like to get mine
    working so I understand what's wrong with it.
    , Sep 30, 2005
    #8
  9. pemo Guest

    <> wrote in message
    news:...
    I've been spoilt by a very forgiving compiler (gcc,
    > presumably doing c++ on the side) which just isn't shouting at me about
    > it. I'm going to fix all these mistakes, and repost the code.
    > Thanks peter for the working code. I'll probably end up using that as
    > it's much neater than what I've been doing, but I'd like to get mine
    > working so I understand what's wrong with it.
    >


    You can tell gcc to use various dialects of C when compiling

    -ansi

    -std=c99

    Here's link:
    http://gcc.gnu.org/onlinedocs/gcc-4.0.2/gcc/C-Dialect-Options.html#C-Dialect-Options
    pemo, Oct 1, 2005
    #9
  10. Old Wolf Guest

    wrote:
    > char** explode(char* str) {
    > int nt = counttokens(str);
    > if(!nt) {
    > return 0;
    > }
    >
    > char** ret = (char**)malloc(nt);


    As well as the problems everyone else has pointed out, this one
    is quite likely to cause a crash. I think you want to allocate
    'nt' number of pointers. But actually you allocate 'nt' bytes.

    You could have avoided this problem by using 'the CLC form'
    of malloc:

    char **ret = malloc( nt * sizeof *ret );
    Old Wolf, Oct 3, 2005
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Haider Abbas Y. Narsinh

    Re: I m gonna mad

    Haider Abbas Y. Narsinh, Aug 22, 2003, in forum: ASP .Net
    Replies:
    0
    Views:
    1,004
    Haider Abbas Y. Narsinh
    Aug 22, 2003
  2. Rahmi Acar
    Replies:
    0
    Views:
    333
    Rahmi Acar
    Apr 20, 2004
  3. Michal
    Replies:
    25
    Views:
    272
    David Morton
    Mar 21, 2007
  4. Ara.T.Howard

    cry for help - make this faster.

    Ara.T.Howard, Sep 27, 2006, in forum: Ruby
    Replies:
    7
    Views:
    143
    M. Edward (Ed) Borasky
    Sep 28, 2006
  5. Victor Reyes
    Replies:
    3
    Views:
    180
    Leslie Viljoen
    Jun 11, 2008
Loading...

Share This Page