Efficiently compiled code

Discussion in 'C Programming' started by cyberscout@coooool.co.uk, Dec 1, 2005.

  1. Guest

    OK I have some code which I didn't write and I'm toying with whether I
    need to tidy it up.

    In the code is the line shown in Example 1

    Exampe 1:
    sprintf(stringvariable, "%s", "String");


    Is it any worse than simply putting.


    Example 2:
    sprintf(stringvariable, "String");


    Similarly elsewhere in the code I have found


    Example 3:
    {
    stringvariable[10];
    sprintf(stringvariable, "Help");
    textprintfunction(stringvariable);
    }


    But I would have thought the following would make tidier code:


    Example 4:
    {
    textprintfunction("Help");
    }


    Looks neater, still works, but do my bersions make more efficient code
    once
    compiled?
     
    , Dec 1, 2005
    #1
    1. Advertising

  2. Eric Sosman Guest

    wrote:

    > OK I have some code which I didn't write and I'm toying with whether I
    > need to tidy it up.
    >
    > In the code is the line shown in Example 1
    >
    > Exampe 1:
    > sprintf(stringvariable, "%s", "String");
    >
    >
    > Is it any worse than simply putting.
    >
    >
    > Example 2:
    > sprintf(stringvariable, "String");


    As shown, the outcome will be the same in both cases.
    But change "String" to "10th %ile" and there'll be a
    rather dramatic difference ...

    Better than either, IMHO, would be

    strcpy(stringvariable, "String");

    > Similarly elsewhere in the code I have found
    >
    > Example 3:
    > {
    > stringvariable[10];
    > sprintf(stringvariable, "Help");
    > textprintfunction(stringvariable);
    > }
    >
    > But I would have thought the following would make tidier code:
    >
    >
    > Example 4:
    > {
    > textprintfunction("Help");
    > }


    If textprintfunction() doesn't modify the string its
    argument points to, 3 and 4 should be equivalent. But if
    it does modify the string (e.g. by inserting linefeeds
    here and there to fit the output in a specified width),
    example 4 would produce undefined behavior -- it's a no-no
    to try to modify a string literal.

    > Looks neater, still works, but do my bersions make more efficient code
    > once
    > compiled?


    Impossible to say with certainty, since execution speed
    and the like are matters not addressed by the Standard. On
    many machines examples 1 and 2 will show roughly the same
    performance (either one might be the faster), and the strcpy()
    suggestion will usually be fastest of all. Example 4 will
    usually be faster than 3, but the difference may be negligible
    if there's actual I/O occurring.

    Suggestion: Your impulse to tidy up may be a good one (or
    may not; too many variables for me to judge), but your zeal
    for micro-optimization is probably misplaced. Pay heed to
    Jackson's Laws of Program Optimization:

    FIRST LAW OF PROGRAM OPTIMIZATION:
    Don't do it.

    SECOND LAW OF PROGRAM OPTIMIZATION (for experts only):
    Don't do it yet.

    Optimization without measurement is folly.

    --
    Eric Sosman
    lid
     
    Eric Sosman, Dec 1, 2005
    #2
    1. Advertising

  3. Simon Biber Guest

    wrote:
    > OK I have some code which I didn't write and I'm toying with whether I
    > need to tidy it up.
    >
    > In the code is the line shown in Example 1
    >
    > Exampe 1:
    > sprintf(stringvariable, "%s", "String");
    >
    >
    > Is it any worse than simply putting.
    >
    >
    > Example 2:
    > sprintf(stringvariable, "String");


    In this case, they both do the same thing. However, in general, example
    1 is much better than example 2. The problem is that if the string
    "String" could possibly be modified by an attacker, they could insert
    '%' characters and make the program read from memory that it shouldn't.
    It will cause undefined behaviour. By rigidly specifying the format
    string, you avoid this possibility of users inserting bad format strings.

    If there really is only one string to be copied into stringvariable, and
    no other content, then using strcpy would be the tidiest. Of course, you
    still need to be very careful of the length, so you don't exceed the
    storage available for stringvariable.

    > Similarly elsewhere in the code I have found
    >
    >
    > Example 3:
    > {
    > stringvariable[10];


    Missing a type here, you probably want 'char'.

    > sprintf(stringvariable, "Help");
    > textprintfunction(stringvariable);
    > }
    >
    >
    > But I would have thought the following would make tidier code:
    >
    >
    > Example 4:
    > {
    > textprintfunction("Help");
    > }
    >
    >
    > Looks neater, still works, but do my bersions make more efficient code
    > once
    > compiled?


    These two examples do not achieve the same thing at all. String literals
    are not modifiable; any attempt to modify them is undefined behaviour.
    You haven't shown us the definition of textprintfunction, but if it does
    try to modify any of the array, then example 4 has undefined behaviour.

    A clearer and better approach would be:
    char stringvariable[10] = "Help";
    textprintfunction(stringvariable);

    This still has the original meaning, of creating a 10-byte modifiable
    array and filling it with the string "Help", but avoids the cost of a
    call to sprintf.

    --
    Simon.
     
    Simon Biber, Dec 1, 2005
    #3
  4. In article <>, writes:
    > OK I have some code which I didn't write and I'm toying with whether I
    > need to tidy it up.
    >
    > Exampe 1:
    > sprintf(stringvariable, "%s", "String");
    >
    >
    > Is it any worse than simply putting.
    >
    >
    > Example 2:
    > sprintf(stringvariable, "String");


    Both are bad, but 2 is arguably worse, because it employs a style that
    encourages the introduction of format-string security holes.

    The f-functions (printf, etc) should be used to format data; that's
    what they're for. The "format string" argument to one of those
    functions should always be a format string. If you're not doing
    any formatting, use an alternative function that doesn't format; if
    there is no alternative (the case with some functions outside ISO C),
    use a format string of "%s" (possibly with appropriate size and
    precision specifiers).

    Here, use:

    strcpy(stringvariable, "String");

    or better:

    size_t length = [some expression evaluating to the size of stringvariable];
    stringvariable[0] = 0;
    strncat(stringvariable, "String", length-1);

    though there may be better ways to handle overflow, depending on
    the application.

    > Example 3:
    > {
    > stringvariable[10];
    > sprintf(stringvariable, "Help");
    > textprintfunction(stringvariable);
    > }
    >
    > But I would have thought the following would make tidier code:
    >
    > Example 4:
    > {
    > textprintfunction("Help");
    > }
    >
    >
    > Looks neater, still works, but do my bersions make more efficient code
    > once compiled?


    It's difficult to see how example 4 could be *less* efficient.
    Certainly sprintf in example 3 is pointless, and another example of
    the style which encourages format-string vulnerabilities.

    However, it's possible that textprintfunction modifies the contents
    of the area pointed to by its argument, in which case you'd need to
    keep the stringvariable array (I assume that's actually declared as
    char, not implicit int), but in that case strcpy would again be the
    better option.

    The original author of this code apparently didn't understand the
    purpose of sprintf. I'd be very leery about the quality of the rest
    of the code, and I'd recommend consulting some references on common C
    errors (some of the most common are covered in Howard, LeBlanc, and
    Viega's "19 Deadly Sins of Software Security", for example) before
    completing your review.

    --
    Michael Wojcik

    I would never understand our engineer. But is there anything in this world
    that *isn't* made out of words? -- Tawada Yoko (trans. Margaret Mitsutani)
     
    Michael Wojcik, Dec 1, 2005
    #4
  5. Mark B Guest

    "Michael Wojcik" <> wrote in message
    news:...
    >
    > In article <>,
    > writes:
    >> OK I have some code which I didn't write and I'm toying with whether I
    >> need to tidy it up.
    >>
    >> Exampe 1:
    >> sprintf(stringvariable, "%s", "String");
    >>
    >>
    >> Is it any worse than simply putting.
    >>
    >>
    >> Example 2:
    >> sprintf(stringvariable, "String");

    >
    > Both are bad, but 2 is arguably worse,
    > because it employs a style that
    > encourages the introduction of format-string security holes.
    >
    > The f-functions (printf, etc) should be used to format data; that's
    > what they're for. The "format string" argument to one of those
    > functions should always be a format string.

    The 'format string' argument to one of those functions IS always
    a format string... "String" happens to be a format string with no
    conversion specifiers... perfectly legal.

    > If you're not doing
    > any formatting, use an alternative function that doesn't format;

    Albeit not an issue with these examples, there are often 'reasons'
    for which sprintf(); is chosen over say for instance, strcpy().
    One may need to know the length of the string, or maybe they
    need to set a pointer to the back of the string for later appends...
    The formatted input functions are perfectly suited for such tasks.

    Mark
     
    Mark B, Dec 1, 2005
    #5
  6. Simon Biber <> writes:
    > wrote:
    >> OK I have some code which I didn't write and I'm toying with whether I
    >> need to tidy it up.
    >> In the code is the line shown in Example 1
    >> Exampe 1:
    >> sprintf(stringvariable, "%s", "String");
    >> Is it any worse than simply putting.
    >> Example 2:
    >> sprintf(stringvariable, "String");

    >
    > In this case, they both do the same thing. However, in general,
    > example 1 is much better than example 2. The problem is that if the
    > string "String" could possibly be modified by an attacker, they could
    > insert '%' characters and make the program read from memory that it
    > shouldn't. It will cause undefined behaviour. By rigidly specifying
    > the format string, you avoid this possibility of users inserting bad
    > format strings.


    In this particular case, I don't think an "attacker" is a realistic
    concern. This is a string literal in a program; any attacker able to
    modify it will be able to modify anything else in the program and make
    it do whatever he wants.

    The danger here for a string literal is program maintenance. If a
    future version of the program has a '%' character in the string, it's
    going to cause unexpected results (probably undefined behavior).

    And if the third argument is something other than a literal, it's
    especially important not to make assumptions about its value.

    (And of course strcpy() is probably better here than sprintf().)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Dec 1, 2005
    #6
  7. In article <KTGjf.41$>, "Mark B" <> writes:
    > "Michael Wojcik" <> wrote in message
    > news:...
    > >
    > > The f-functions (printf, etc) should be used to format data; that's
    > > what they're for. The "format string" argument to one of those
    > > functions should always be a format string.

    > The 'format string' argument to one of those functions IS always
    > a format string...


    A philosophical quibble. Read my original statement as "should always
    be an intentional format string" if you must.

    > "String" happens to be a format string with no
    > conversion specifiers... perfectly legal.


    I didn't claim it wasn't.

    > > If you're not doing
    > > any formatting, use an alternative function that doesn't format;

    >
    > Albeit not an issue with these examples, there are often 'reasons'
    > for which sprintf(); is chosen over say for instance, strcpy().


    I've never seen an example where using sprintf just to copy a
    string is justified, and the ones you provide are not compelling.

    > One may need to know the length of the string,


    If you don't already know the length of the source, you have a
    potential buffer overrun, and using sprintf rather than strcpy
    is the wrong solution.

    > or maybe they
    > need to set a pointer to the back of the string for later appends...


    Trivial if you already know the length of the source. Which you
    should.

    I'm with Richard Heathfield on this: the well-written program always
    needs to know that it won't overflow a buffer, and typically needs to
    know if truncation occurred; so in the vast majority of cases, it
    needs to already know the length of the source.

    > The formatted input functions are perfectly suited for such tasks.


    I disagree.

    --
    Michael Wojcik

    Every allegiance to some community eventually involves such a fetish,
    which functions as the disavowal of its founding crime: is not 'America'
    the fetish of an infinitely open space enabling every individual to
    pursue happiness in his or her own way? -- Slavoj Zizek
     
    Michael Wojcik, Dec 5, 2005
    #7
  8. Michael Wojcik said:

    > I'm with Richard Heathfield on this: the well-written program always
    > needs to know that it won't overflow a buffer, and typically needs to
    > know if truncation occurred; so in the vast majority of cases, it
    > needs to already know the length of the source.


    This is also true of poorly-written programs, of course. :)

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at above domain (but drop the www, obviously)
     
    Richard Heathfield, Dec 5, 2005
    #8
  9. Mark B Guest

    "Michael Wojcik" <> wrote in message
    news:...
    >
    > In article <KTGjf.41$>, "Mark B"
    > <> writes:
    >> "Michael Wojcik" <> wrote in message
    >> news:...
    >> >
    >> > The f-functions (printf, etc) should be used to format data; that's
    >> > what they're for. The "format string" argument to one of those
    >> > functions should always be a format string.

    >> The 'format string' argument to one of those functions IS always
    >> a format string...

    >
    > A philosophical quibble. Read my original statement as "should always
    > be an intentional format string" if you must.


    Bullshit. A string literal "IS" an intentional format string, no?
    Your assertion implied that a format string must contain a
    conversion specifier.

    >> "String" happens to be a format string with no
    >> conversion specifiers... perfectly legal.

    > I didn't claim it wasn't.


    But you most definately implied that it wasn't.

    >> > If you're not doing
    >> > any formatting, use an alternative function that doesn't format;

    >>
    >> Albeit not an issue with these examples, there are often 'reasons'
    >> for which sprintf(); is chosen over say for instance, strcpy().

    >
    > I've never seen an example where using sprintf just to copy a
    > string is justified, and the ones you provide are not compelling.


    Sure they are, you just lack imagination.

    >> One may need to know the length of the string,

    > If you don't already know the length of the source, you have a
    > potential buffer overrun, and using sprintf rather than strcpy
    > is the wrong solution.


    Hmm... I'd wager the opposite is true. More-often what
    matters is 'sizeof' destination.
    Consider this, regardless of source length,
    with sprintf() I can do something along the lines of:
    sprintf(buf, "%.*s", sizeof buf -1, src);
    How does strcpy() save you even if the length of src is known?

    >> or maybe they
    >> need to set a pointer to the back of the string for later appends...

    > Trivial if you already know the length of the source. Which you
    > should.


    Not necessarily. How did you determine the length of the source?
    Trivial example (since you've never seen one) requirements:
    Read lines from stdin until they are exausted.
    Although line lengths vary, the maximum length of 1 line is
    guaranteed not to exceed 255 bytes (including newline).
    Each line (whose actual length must also be determined) needs
    to be duplicated prior to calling functions to massage the data.

    <example>
    ....
    char buf[2][256];
    while(fgets(buf[0], sizeof buf[0], stdin) != NULL) {
    int len = sprintf(buf[1], "%s", buf[0]);
    ...
    }
    ....
    </example>

    OK, so now you can see an example.
    There is no potential for buffer overrun, agreed?
    One call to sprintf() copies the string and determines it's length.
    Maybe I'm just naive and using the wrong solution, but if so, how
    would you recommend I turn this garbage into a well-written program?

    > I'm with Richard Heathfield on this: the well-written program always
    > needs to know that it won't overflow a buffer,

    Granted.

    > and typically needs to
    > know if truncation occurred;

    Granted.

    > so in the vast majority of cases, it
    > needs to already know the length of the source.

    Bullshit.

    >> The formatted input functions are perfectly suited for such tasks.

    > I disagree.

    Still?
     
    Mark B, Dec 7, 2005
    #9
  10. In article <43973a73$0$8187$>, "Mark B" <> writes:
    > "Michael Wojcik" <> wrote in message
    > news:...
    > > In article <KTGjf.41$>, "Mark B"
    > > <> writes:
    > >> "Michael Wojcik" <> wrote in message
    > >> news:...
    > >> >
    > >> > The f-functions (printf, etc) should be used to format data; that's
    > >> > what they're for. The "format string" argument to one of those
    > >> > functions should always be a format string.
    > >> The 'format string' argument to one of those functions IS always
    > >> a format string...

    > >
    > > A philosophical quibble. Read my original statement as "should always
    > > be an intentional format string" if you must.

    >
    > Bullshit.


    Ah, the sign of an interlocutor confident in his argument.

    > A string literal "IS" an intentional format string, no?


    Only if the writer intended it to be one. That's what "intentional"
    means.

    > Your assertion implied that a format string must contain a
    > conversion specifier.


    It did no such thing. You inferred it, incorrectly.

    > >> "String" happens to be a format string with no
    > >> conversion specifiers... perfectly legal.

    > > I didn't claim it wasn't.

    >
    > But you most definately implied that it wasn't.


    Wrong, but thanks for playing.

    > >> > If you're not doing
    > >> > any formatting, use an alternative function that doesn't format;
    > >>
    > >> Albeit not an issue with these examples, there are often 'reasons'
    > >> for which sprintf(); is chosen over say for instance, strcpy().

    > >
    > > I've never seen an example where using sprintf just to copy a
    > > string is justified, and the ones you provide are not compelling.

    >
    > Sure they are, you just lack imagination.


    Clearly they aren't, since you haven't compelled any agreement.

    Let's check the score: so far you fail to understand the concept of
    intent; the difference between implication and inference; and what
    constitutes a compelling argument. You are pretty handy with the
    brusque dismissal, though. Pity that doesn't constitute an argument.

    > >> One may need to know the length of the string,

    > > If you don't already know the length of the source, you have a
    > > potential buffer overrun, and using sprintf rather than strcpy
    > > is the wrong solution.

    >
    > Hmm... I'd wager the opposite is true.


    The opposite of what? If you don't know the length of the source,
    you *don't* have a potential buffer overrun? You'd lose that wager.

    > More-often what
    > matters is 'sizeof' destination.


    Both matter, every time, as is obvious to anyone with any
    comprehension of the language.

    > Consider this, regardless of source length,
    > with sprintf() I can do something along the lines of:
    > sprintf(buf, "%.*s", sizeof buf -1, src);


    And you can do the equivalent with strncat (and it works in precisely
    the same cases, ie when "buf" refers to an array type and not a
    pointer type). And strncat has the following advantages:

    - It's not variadic, so it's type-safe.
    - It doesn't have to parse a format string, so it's almost certainly
    more efficient.
    - It doesn't require an extraneous string literal.
    - It requires one fewer parameter, which is more efficient on some
    implementations.

    Also, your sprintf call, as written, is incorrect and I believe
    invokes UB. The result of the sizeof operator has type size_t, so
    "sizeof buf -1" has type size_t; when an asterisk is used as a field
    width or precision specifier, it requires an argument of type int.
    So we can add another:

    - For some reason, many programmers seem to have difficulty using
    sprintf correctly.

    That's why sprintf is the wrong solution.

    > How does strcpy() save you even if the length of src is known?


    I never said that it was. (And no, I did not imply it, either.)
    I said that sprintf was the wrong solution, not that strcpy was
    the right one.

    > >> or maybe they
    > >> need to set a pointer to the back of the string for later appends...

    > > Trivial if you already know the length of the source. Which you
    > > should.

    >
    > Not necessarily. How did you determine the length of the source?


    I think that ought to be left as an exercise for the reader. Here's
    a hint: 7.21.6.3.

    > Trivial example (since you've never seen one) requirements:
    > Read lines from stdin until they are exausted.
    > Although line lengths vary, the maximum length of 1 line is
    > guaranteed not to exceed 255 bytes (including newline).
    > Each line (whose actual length must also be determined) needs
    > to be duplicated prior to calling functions to massage the data.


    This one isn't compelling, either.

    > char buf[2][256];
    > while(fgets(buf[0], sizeof buf[0], stdin) != NULL) {
    > int len = sprintf(buf[1], "%s", buf[0]);
    > ...
    > }


    > OK, so now you can see an example.


    Try reading for comprehension. What I hadn't seen an example of
    is a case "where using sprintf just to copy a string is justified".
    You haven't justified it.

    > There is no potential for buffer overrun, agreed?


    Why would there be?

    > One call to sprintf() copies the string and determines it's length.


    (That's "its". No apostrophe in the possessive pronoun.)

    So what? On what grounds do you claim this is superior to
    determining the string's length and copying it with separate calls to
    strlen and strcpy (or strlen and memcpy, or any other variation)?

    > Maybe I'm just naive and using the wrong solution, but if so, how
    > would you recommend I turn this garbage into a well-written program?


    The obviously superior solution is strlen and strcpy. Still O(N),
    without the overhead of sprintf, or the other problems listed above.
    Uses the obvious approach, improving code clarity.

    > > I'm with Richard Heathfield on this: the well-written program always
    > > needs to know that it won't overflow a buffer,

    > Granted.
    >
    > > and typically needs to
    > > know if truncation occurred;

    > Granted.
    >
    > > so in the vast majority of cases, it
    > > needs to already know the length of the source.

    > Bullshit.


    Very well. Show some data demonstrating that in a sizable minority
    of cases, C programs do not need to know the length of the source
    when copying strings.

    > >> The formatted input functions are perfectly suited for such tasks.

    > > I disagree.

    > Still?


    Yes.

    --
    Michael Wojcik

    Painful lark, labouring to rise!
    The solemn mallet says:
    In the grave's slot
    he lies. We rot. -- Basil Bunting
     
    Michael Wojcik, Dec 11, 2005
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. ibeetb

    Programming Excel Efficiently

    ibeetb, Dec 15, 2003, in forum: ASP .Net
    Replies:
    4
    Views:
    415
    Alvin Bruney
    Dec 16, 2003
  2. Klaus Schneider
    Replies:
    1
    Views:
    576
    Rolf Magnus
    Dec 2, 2004
  3. Pipiten
    Replies:
    3
    Views:
    407
    Gabriel Genellina
    Sep 20, 2006
  4. lander
    Replies:
    5
    Views:
    619
    bruce barker
    Mar 5, 2008
  5. Thomas Jollans
    Replies:
    5
    Views:
    1,742
    Thomas Jollans
    Jun 14, 2010
Loading...

Share This Page