Why does strcat mess up the tokens in strtok (and strtok_r)?

Discussion in 'C Programming' started by DFS, Jun 11, 2014.

  1. DFS

    Ian Collins Guest

    Return value optimisation (RVO). The copy goes directly into the return
    value location to avoid creating (and copying) a temporary object. This
    handy little compiler trick makes returning non-trivial objects from
    functions efficient.
    Ian Collins, Jun 14, 2014
    1. Advertisements

  2. DFS

    Ian Collins Guest

    I thought gcc would if str where declared const, but it doesn't which
    just goes to show you can't rely on these things! Sun Studio cc inlines
    strlen, but again, it doesn't optimise to a single call.
    Ian Collins, Jun 14, 2014
    1. Advertisements

  3. DFS

    Joe Pfeiffer Guest

    It would seem like it, but this does not turn out to be the case. If
    you've got a working implementation of any sufficiently capable
    language, you can use that language to write a compiler for whatever
    language you want. Your compiler will (normally) translate to machine
    code or assembly code (relying on an assembler to go to machine code),
    but that doesn't mean the compiler itself needs to be written in
    Joe Pfeiffer, Jun 14, 2014
  4. DFS

    DFS Guest

    I'm curious: How do you know gcc evaluates strlen(str) each iteration?

    What does it mean to 'inline strlen'?
    DFS, Jun 14, 2014
  5. gcc here (4.8.2) will do both -- inline it and move it out of the loop,
    though at higher levels of optimisation it only does one (more it out of
    the loop). Of course, it depends on the fact that the code is simple
    and contains no transfers of control outside of the translation unit
    (other than strlen -- but the compiler is permitted to know what that

    (I don't think const can help here. After all, it says nothing that the
    compiler can't check anyway.)
    Ben Bacarisse, Jun 14, 2014
  6. DFS

    Joe Pfeiffer Guest

    It's defined as evaluating the termination condition on every
    iteration. That includes making any function calls.

    Didn't I already say that?
    Instead of generating code to actually call the function, it takes the
    actual code of the function itself and inserts it into the program (so
    it puts it in line).

    I'm a little surprised that a compiler that could inline something as
    simple as strlen() couldn't then optimize to a single calculation.
    Joe Pfeiffer, Jun 14, 2014
  7. DFS

    Ian Collins Guest

    By misreading the generated assembly :( It does move strlen(str) out of
    the loop.
    It includes the strlen cone inline rather than calling the function.
    Ian Collins, Jun 14, 2014
  8. I do it by adding -S and looking at the code generated (see man gcc).
    It means replacing the call with a copy of the body of the function, or,
    more loosely, replacing the call with code that is functionally
    equivalent. For example, when optimising for space on x86, gcc might
    replace a call to strlen with a "repnz scasb" micro-loop. (I don't want
    to call it "an instruction" but it's not a normal loop either.)
    Ben Bacarisse, Jun 14, 2014
  9. DFS

    James Kuyper Guest

    It uses the entire string stored in delim. It searches that string for
    the first match to str. It returns a pointer to the first such match,
    or NULL if there is no such match. The if() condition will therefore be
    true if str matches any of the delimiters, and false otherwise.
    James Kuyper, Jun 14, 2014
  10. DFS

    Tim Rentsch Guest

    Programming interview test question - What is wrong
    with this code, and why does it not matter?
    Tim Rentsch, Jun 14, 2014
  11. DFS

    Tim Rentsch Guest

    A few comments...

    1. Using floating point may give problems with rounding

    2. Using double probably limits the results to 50-ish bits
    (ie, on common processors).

    3. It's easy to write an integer version that gives results
    exact to 64 bits, and fairly quickly, by simple iteration, or
    three parameter recursion.

    4. Using d'Ocagne's identity, an integer four parameter
    recursive version can be written that is both 64-bit
    exact and runs faster than the floating point version.

    I leave 3 and 4 as exercises (but feel free to ask for
    help if you get stuck).
    Tim Rentsch, Jun 14, 2014
  12. What I mean is you can define strcat as
    char* strcat(char *tgt, const char *src)
    return strncat(tgt, src, strlen(src));

    Using strncat this way will always copy exactly strlen(src) bytes to
    the end of tgt and then append exactly one '\0' after that. This
    happens to be the exact definition of what strcat does.
    I said it could be an implementation of strcat, not a definition. See
    Barry Schwarz, Jun 14, 2014
  13. DFS

    BartC Guest

    So does it make a copy of that 1000000-character string or not? Or does it
    simply save having to copy it twice?

    (If I write this in Python:

    def copy-of_in_py(s):
    return s

    then it doesn't even copy it once! And is also *considerably* easier on the
    eyes without all that '::std::string' and 'const' clutter!)
    BartC, Jun 14, 2014
  14. DFS

    Ike Naar Guest

    Here lurks a subtle bug:
    1) if delim has less than three elements, the behaviour is undefined if
    e.g. delim[2] is ever accessed.
    2) if delim does have at least three elements but strlen(delim) < 2, the
    conditional expression will reference characters that appear after
    delim's null terminator, which may lead to surprising results.

    Here's an example:

    char words[] = "BACH:LISZT:CAGE";
    char delimiters[3];
    strcpy(delimiters, ":");
    int length = getMaxWordLength(words, delimiters);
    /* expecting length=5 i.e. length of longest word in {BACH,LISZT,CAGE} */

    but what might actually happen:

    char words[] = "BACH:LISZT:CAGE";
    char delimiters[3];
    /* delimiters[] contains three indeterminate values, say {'X','Y','Z'}
    strcpy(delimiters, ":");
    /* delimiters[] contains {':','\0','Z');
    int length = getMaxWordLength(words, delimiters);
    /* finding length=4 i.e. length of longest word in {BACH,LIS,T,CAGE} */
    Ike Naar, Jun 14, 2014
  15. DFS

    Ian Collins Guest

    Given the requirement is to copy the string, then the function wouldn't
    be much use if it didn't, would it? The the actual copy of the data
    will most likely not happen until the the returned value gets changed
    assuming the string uses copy on write.
    So how can it be a "copy-of_in"?

    Part of that (the initial "::") is unnecessary clutter in C++. The rest
    is telling the compiler we are using string from the std namespace, not
    Ian Collins, Jun 14, 2014
  16. DFS

    BartC Guest

    So the million characters (or whatever it might be), *are* copied, it just
    gets delayed until a bit later (presumably just so you can claim that the
    function itself is fast!). Unless of course the new string never gets
    modified, in which case you can probably write a more efficient C version
    too (which emulates the Python method).
    It's effectively a copy for all intents and purposes. Although for this to
    work, the string has to be immutable...

    In all cases, assuming the resulting string needs to be modified (otherwise
    there's little point in making a copy), then the 1M-char string of my
    example needs to be duplicated.

    (There might be a mild advantage in being able to create a 'copy' of a
    string, without being sure when or if it will need duplicating.

    I assume also that in the C++ example, if the original is written to or
    destroyed, that that will trigger a duplication of the copy. But with
    potential thousands of copies of the same string that could be in existence,
    that implies quite a lot going on behind the scenes.)
    BartC, Jun 14, 2014
  17. DFS

    Ian Collins Guest

    If you make a copy and then modify one of the strings, the data
    associated with the modified string will have to be changed somehow, in
    any language, won't it?
    No, why would it?
    No, just the copy that changes.
    Ian Collins, Jun 14, 2014
  18. DFS

    BartC Guest

    OK, so the claim originally made by Stefan Ram was a little disingenuous;
    the copy function itself is efficient, but only because the actual copy
    operation is done later. (In the meantime, any operations done to the string
    have to be monitored, to make sure that copy is performed, unlike the C
    version where there are no such overheads.)
    Suppose there is a string S = "abcdefghijklmnopqrstuvwxyz", and you do:

    S2 = copy_of_in_cpp(S);
    S3 = copy_of_in_cpp(S);
    S4 = copy_of_in_cpp(S);
    S5 = copy_of_in_cpp(S);

    Presumably, at this point, S2, S3, S4, S5 still point to the same string as

    In that case, what happens when I do S[0]='X'; ?

    What happens when I do S="Another string"; ?

    What you say implies that the original string exists independently of any of
    these (is not directly owned by them), and that there is a form of garbage
    collection in place that knows have many references are pointing to any
    string. I thought C++ didn't use GC.
    BartC, Jun 14, 2014
  19. DFS

    Ian Collins Guest

    With C "strings" you can't use copy on write, you have to make a copy.
    With C++ (and I expect most scripting languages) you can because you can
    detect and write operation on the string. That's where the saving
    comes: a copy can be as quick as copying two pointers (reference count
    and the string data), you only pay the price of copying the data if you
    have to.
    Could and should do, yes.
    Assuming the string uses reference counting, the reference count to the
    original data is decremented and a new bloc is assigned to S and the
    data copied.
    That's correct, typically this will be reference counting. No need for
    any of that GC nonsense in C++ thanks to RAII. Once all of the copies
    of the string are destroyed, the data will be freed.
    Ian Collins, Jun 14, 2014
  20. DFS

    Kaz Kylheku Guest

    What's wrong: that it doesn't do str += l + strspn(str + l, delim);

    Why it doesn't matter: l will be zero at the top of the next iteration, which
    doesn't affect the correctness of the value of max, and allows the strspn to
    proceed from the new current str location.
    Kaz Kylheku, Jun 14, 2014
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.