Replacing NULLS with space (C strings)

Discussion in 'C Programming' started by peter, Feb 9, 2012.

  1. peter

    peter Guest

    In fact, I want to remove all NULLS and EOFs (0x1a)
    from a string then replace them all with spaces. The way I do it
    now is by using a for() loop:

    for(temp=0;temp<=strlen(buffer);temp++)
      {
       if(buffer[temp]== '\0' || buffer[temp]==0x1A)
         {buffer[temp]=' ';}
      }

    Is there a faster / more efficient way of doing this?
     
    peter, Feb 9, 2012
    #1

  2. peter

    James Kuyper Guest

    On 02/09/2012 03:19 PM, peter wrote:
    > In fact, I want to remove all NULLS and EOFs (0x1a)


    EOF is a macro defined in <stdio.h>. It's required to have a negative
    value, which 0x1A does not, so they can't be the same. EOF very
    commonly, though not universally, has a value of -1.

    There have been systems where 0x1A was used to indicate the end of a
    file. However, such systems are far from universal. I'd recommend making
    sure that this value is indeed being used that way in all of the
    contexts in which you want to use this code.
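A minimal sketch of the distinction, assuming an ASCII-based character set: control-Z is just the byte value 0x1A (26), while EOF is a negative int that getc() returns only when no more input is available. The macro name CTRL_Z and the function is_ctrl_z are made up for this illustration.

```c
#include <stdio.h>

/* 0x1A is the ASCII SUB character, typed as control-Z.
   The macro name CTRL_Z is hypothetical, chosen for this example. */
#define CTRL_Z 0x1A

/* c is a value as returned by getc(): either an unsigned char value
   converted to int, or the negative value EOF. Control-Z arrives as
   an ordinary character value (26), so this test never matches EOF. */
int is_ctrl_z(int c)
{
    return c == CTRL_Z;
}
```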

    > from a string then replace them all with spaces. The way I do it
    > now is by using a for() loop:
    >
    > for(temp=0;temp<=strlen(buffer);temp++)
    > {
    > if(buffer[temp]== '\0' || buffer[temp]==0x1A)
    > {buffer[temp]=' ';}
    > }
    >
    > Is there a faster / more efficient way of doing this?


    By definition, strlen(buffer) gives you the offset of the very first
    null character in buffer (or, if there is none, it keeps searching past
    the end of buffer until it finds one; this often results in memory
    access violations - make VERY sure that your buffer is in fact null
    terminated before calling strlen). Therefore, there's no point in
    checking for null characters before you reach the end of the loop; and
    you're guaranteed to find one once you reach that end.

    I suspect that you have some kind of misunderstanding, that led you to
    think that your code could find a null character in some other locations
    as well. However, for the rest of this message I'll assume you intended
    it to handle null characters exactly the way it actually does.

    You have strlen() scanning sequentially through buffer looking for the
    first null character, and then you have your for loop scanning
    sequentially through buffer looking for null characters and 0x1A. Why
    not do it in a single pass?

    Your code sets the final terminating null character to blank. This
    guarantees that strlen(buffer) can no longer be used to tell you where
    that character used to be. If you're planning to do anything further
    with that portion of buffer, you'd better do something to keep track of
    where it ends.

    You don't say what the element type of buffer is; I'll assume it's char;
    make appropriate adjustments below if it's something else.

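    char *p;                       /* declared outside the loop so it is still in scope afterwards */
    for (p = buffer; *p; p++)
        if (*p == 0x1A)
            *p = ' ';

    *p++ = ' ';                    /* replaces the terminating null, as your loop did */
    ptrdiff_t length = p - buffer; /* ptrdiff_t is declared in <stddef.h> */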
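The same single pass, wrapped in a function so it can be tried on its own. The function name is made up for this sketch; it assumes buffer is a null-terminated array of char, and it returns the number of characters processed (including the former terminator) so the caller still knows where the data ends.

```c
#include <stddef.h>

/* Replace every 0x1A with a blank in a single pass, then turn the
   terminating null into a blank as well (as the original code did).
   Returns the number of characters processed, former terminator
   included, because strlen() can no longer find the end. */
ptrdiff_t blank_ctrl_z_and_terminator(char *buffer)
{
    char *p;
    for (p = buffer; *p; p++)
        if (*p == 0x1A)
            *p = ' ';
    *p++ = ' ';          /* the former terminator is now a blank */
    return p - buffer;
}
```

For an array holding 'a', 0x1A, 'b', '\0', 'X' the function returns 4 and leaves "a b " in the first four elements; the 'X' after the old terminator is not touched.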
     
    James Kuyper, Feb 9, 2012
    #2

  3. peter <> writes:

    > In fact, I want to remove all NULLS and EOFs (0x1a)
    > from a string then replace them all with spaces. The way I do it
    > now is by using a for() loop:
    >
    > for(temp=0;temp<=strlen(buffer);temp++)
    > {
    > if(buffer[temp]== '\0' || buffer[temp]==0x1A)
    > {buffer[temp]=' ';}
    > }


    > Is there a faster / more efficient way of doing this?


    strlen(buffer) will return the offset of the first '\0' encountered,
    so the code above doesn't make that much sense. Also, it is not very
    efficient to call strlen for each iteration of the loop. Especially
    with pathological code like this, the compiler will be unable to
    optimize the repeated calls away, because you are modifying the object
    you are passing as the argument.

    Either call strlen once and use that result in the entire loop:

    len = strlen (buffer);
    for (temp = 0 ; temp < len ; temp++) {
        if (buffer[temp] == 0x1a) { buffer[temp] = ' '; }
    }

    Or skip the strlen call entirely, and check for end of string at the
    same time as check for modification:

    temp = 0;

    while (buffer[temp]) {
        if (buffer[temp] == 0x1a) { buffer[temp] = ' '; }
        temp++;
    }
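Both suggestions written out as complete functions, for comparison. The function names are hypothetical. Each assumes buffer is null-terminated, visits every character once, and leaves the terminator alone, so the result is still a valid string.

```c
#include <string.h>

/* Version 1: call strlen() once, before the loop. */
void blank_ctrl_z_len(char *buffer)
{
    size_t len = strlen(buffer);
    for (size_t i = 0; i < len; i++)
        if (buffer[i] == 0x1a)
            buffer[i] = ' ';
}

/* Version 2: no strlen() at all; stop at the terminating null. */
void blank_ctrl_z_scan(char *buffer)
{
    for (size_t i = 0; buffer[i] != '\0'; i++)
        if (buffer[i] == 0x1a)
            buffer[i] = ' ';
}
```

The second version makes one pass instead of two, which is the point of skipping strlen().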

    --
    /Wegge

    Looking for redundant peering of dk.*, linux.debian.*
     
    Anders Wegge Keller, Feb 9, 2012
    #3
  4. peter

    James Kuyper Guest

    On 02/09/2012 04:04 PM, Anders Wegge Keller wrote:
    > peter <> writes:
    >
    >> In fact, I want to remove all NULLS and EOFs (0x1a)
    >> from a string then replace them all with spaces. The way I do it
    >> now is by using a for() loop:
    >>
    >> for(temp=0;temp<=strlen(buffer);temp++)
    >> {
    >> if(buffer[temp]== '\0' || buffer[temp]==0x1A)
    >> {buffer[temp]=' ';}
    >> }

    >
    >> Is there a faster / more efficient way of doing this?

    >
    > strlen(buffer) will return the offset of the first '\0' encountered,
    > so the code above doesn't make that much sense. Also, it is not very
    > efficient to call strlen for each iteration of the loop.


    I didn't notice that - that's embarrassing (not as embarrassing as
    having written such code, but close). It's worse than merely being
    horrendously inefficient; with the terminating null character being
    replaced with ' ' inside the loop, followed by immediate recalculation
    of the length of the supposedly null-terminated string, the loop will
    never terminate until something goes very badly wrong (and possibly not
    even then).
     
    James Kuyper, Feb 9, 2012
    #4
  5. peter

    John Gordon Guest

    In <jh19o6$qnl$> peter <> writes:

    > In fact, I want to remove all NULLS and EOFs (0x1a)
    > from a string then replace them all with spaces. The way I do it
    > now is by using a for() loop:


    > for(temp=0;temp<=strlen(buffer);temp++)
    > {
    > if(buffer[temp]== '\0' || buffer[temp]==0x1A)
    > {buffer[temp]=' ';}
    > }


    C strings are terminated by a NULL character. Therefore, by definition,
    you won't find any NULLs in the string itself.

    --
    John Gordon A is for Amy, who fell down the stairs
    B is for Basil, assaulted by bears
    -- Edward Gorey, "The Gashlycrumb Tinies"
     
    John Gordon, Feb 9, 2012
    #5
  6. peter <> writes:
    > In fact, I want to remove all NULLS and EOFs (0x1a)
    > from a string then replace them all with spaces. The way I do it
    > now is by using a for() loop:
    >
    > for(temp=0;temp<=strlen(buffer);temp++)
    > {
    > if(buffer[temp]== '\0' || buffer[temp]==0x1A)
    > {buffer[temp]=' ';}
    > }
    >
    > Is there a faster / more efficient way of doing this?


    There's probably no faster way than a for loop, but yours can be
    improved considerably by not calling strlen() on each iteration.
    strlen() has to scan the entire string, and you're doing that once for
    each character.
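One way to see the cost is to count the calls. This sketch uses a hypothetical wrapper, counted_strlen, around the real strlen() and runs an empty loop of the original shape (using < and not modifying the string, so that it terminates). For a string of length n the condition calls strlen() n + 1 times, and each call rescans all n characters, so the work grows roughly with the square of the length.

```c
#include <string.h>

/* Hypothetical instrumentation: counts how often strlen() runs. */
static size_t strlen_calls = 0;

static size_t counted_strlen(const char *s)
{
    strlen_calls++;
    return strlen(s);
}

/* Runs an empty loop with strlen() in the condition, as in the
   original code, and returns how many times strlen() was called. */
size_t calls_in_condition(const char *buffer)
{
    strlen_calls = 0;
    for (size_t i = 0; i < counted_strlen(buffer); i++)
        ;   /* body omitted: only the condition matters here */
    return strlen_calls;
}
```

For "hello", calls_in_condition returns 6: five calls that let the loop continue plus one that ends it.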

    Also, the correct condition is "<", not "<=". For example if the
    string's value is "hello", then strlen() returns 5, but you want to
    check positions 0 through 4.

    const size_t len = strlen(buffer);
    for (i = 0; i < len; i++) {
        ...
    }

    And some terminology issues. NULL is (a macro that expands to)
    a null *pointer* constant; the null character is better referred
    to as NUL, or just '\0'. (Yes, some character set standards do
    call it NULL, but using that name can be confusing.)

    And EOF is a macro that expands to a negative integer constant
    expression, typically (-1). 0x1A is the control-Z character,
    which some systems use to indicate an end-of-file condition.

    Finally, strlen() searches for the '\0' character that marks the end
    of a string. If your buffer might have multiple '\0' characters in
    it, then it isn't a string, and you should use some other technique
    to determine how long it is (or how long the relevant portion of
    it is).
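If the buffer really can contain embedded '\0' bytes, the length has to come from wherever the data came from. A sketch, assuming the caller already knows how many bytes are meaningful; the function name is hypothetical.

```c
#include <stddef.h>

/* The caller passes the number of meaningful bytes, obtained some
   other way (for example from the return value of fread()), because
   strlen() would stop at the first '\0'. Every '\0' and 0x1A among
   the first len bytes is replaced with a blank. The result is not
   null-terminated unless the caller adds a terminator. */
void blank_nul_and_ctrl_z(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\0' || buf[i] == 0x1A)
            buf[i] = ' ';
}
```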

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Feb 9, 2012
    #6
  7. John Gordon <> writes:
    > In <jh19o6$qnl$> peter <> writes:
    >> In fact, I want to remove all NULLS and EOFs (0x1a)
    >> from a string then replace them all with spaces. The way I do it
    >> now is by using a for() loop:

    >
    >> for(temp=0;temp<=strlen(buffer);temp++)
    >> {
    >> if(buffer[temp]== '\0' || buffer[temp]==0x1A)
    >> {buffer[temp]=' ';}
    >> }

    >
    > C strings are terminated by a NULL character. Therefore, by definition,
    > you won't find any NULLs in the string itself.


    Null is (a macro that expands to) a null *pointer* constant.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Feb 9, 2012
    #7
  8. Keith Thompson <> writes:
    > John Gordon <> writes:
    >> In <jh19o6$qnl$> peter <> writes:
    >>> In fact, I want to remove all NULLS and EOFs (0x1a)
    >>> from a string then replace them all with spaces. The way I do it
    >>> now is by using a for() loop:

    >>
    >>> for(temp=0;temp<=strlen(buffer);temp++)
    >>> {
    >>> if(buffer[temp]== '\0' || buffer[temp]==0x1A)
    >>> {buffer[temp]=' ';}
    >>> }

    >>
    >> C strings are terminated by a NULL character. Therefore, by definition,
    >> you won't find any NULLs in the string itself.

    >
    > Null is (a macro that expands to) a null *pointer* constant.


    I meant to type NULL, of course.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Feb 9, 2012
    #8
  9. On Feb 9, 8:19 pm, peter <> wrote:
    > In fact, I want to remove all NULLS and EOFs (0x1a)
    > from a string then replace them all with spaces. The way I do it
    > now is by using a for() loop:
    >
    >  for(temp=0;temp<=strlen(buffer);temp++)
    >    {
    >     if(buffer[temp]== '\0' || buffer[temp]==0x1A)
    >       {buffer[temp]=' ';}
    >    }
    >
    > Is there a faster / more efficient way of doing this?
    >

    Yes.

    Get the length of the data in the buffer. Only you can do that.
    Probably you want to exclude the last terminating nul from the
    replacement, but maybe not, depending on how you're going to use the
    data. You might even need to add a nul.


    Then just do this.

    len = data_length_got_somehow;
    for(i=0;i<len;i++)
        if(buffer[i] == 0 || buffer[i] == 0x1a)
            buffer[i] = ' ';
    /* possibly you need to do this, but make sure that buffer is one
       bigger than len */
    buffer[len] = 0;

    If you call strlen() in the for control statement, the length of the
    string will be recalculated on each iteration, which is slow. Also,
    since you want to replace nuls, it's a bug.
    --
    Visit my website
    http://www.malcolmmclean.site11.com/www
     
    Malcolm McLean, Feb 9, 2012
    #9
  10. On Feb 10, 12:22 am, pete <> wrote:
    >
    > By definition, a string includes a null character.
    >
    > ISO/IEC 9899:201x Committee Draft — April 12, 2011 N1570
    > 7. Library
    > 7.1 Introduction
    > 7.1.1 Definitions of terms
    > 1     A string is a contiguous sequence of characters
    >       terminated by and including the first null character.
    >

    In ANSI C terminology. That's so that they can use the term "string"
    in describing library functions without constantly having to specify
    that it must be nul-terminated.
    However the strings in your C program may not be nul-terminated.
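A small illustration of the standard's definition quoted above: an array of characters can hold more than one null character, but the string that starts at its first element ends at the first one. The names text, array_size and string_length are made up for this sketch.

```c
#include <string.h>

/* An array holding "ab", a null character, then "cd". The literal
   adds one more null at the end, so the array is 6 bytes, but the
   string starting at text is just "ab". */
static const char text[] = "ab\0cd";

size_t array_size(void)    { return sizeof text; }   /* 6 */
size_t string_length(void) { return strlen(text); }  /* 2 */
```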
     
    Malcolm McLean, Feb 10, 2012
    #10
  11. Malcolm McLean <> writes:
    > On Feb 10, 12:22 am, pete <> wrote:
    >> By definition, a string includes a null character.
    >>
    >> ISO/IEC 9899:201x Committee Draft — April 12, 2011 N1570
    >> 7. Library
    >> 7.1 Introduction
    >> 7.1.1 Definitions of terms
    >> 1     A string is a contiguous sequence of characters
    >>       terminated by and including the first null character.
    >>

    > In ANSI C terminology. That's so that they can use the term "string"
    > in describing library functions without constantly having to specify
    > that it must be nul-terminated.
    > However the strings in your C program may not be nul-terminated.


    Then they're not strings, and calling them that will cause confusion.
    They might well be some data structure that acts like a string in a
    more general sense, but then there should be an unambiguous name for it.

    And so far, we have no idea what kind of data structure the OP is
    dealing with, other than an array of characters.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Feb 10, 2012
    #11
  12. peter

    Kaz Kylheku Guest

    On 2012-02-10, Keith Thompson <> wrote:
    > Malcolm McLean <> writes:
    >> However the strings in your C program may not be nul-terminated.

    >
    > Then they're not strings, and calling them that will cause confusion.


    No, it won't. If I have a "struct string" in my C program, nobody in their
    right mind assumes that this still refers to a null-terminated array of char,
    without looking inside the struct and the surrounding functions.

    What are you going to complain about next? That we should not use the term
    "header" in packet processing code because that refers to the units processed
    by the #include directive?

    > They might well be some data structure that acts like a string in a
    > more general sense, but then there should be an unambiguous name for it.


    That unambiguous name is "character string".

    You do not get to rename the fundamental concepts in computer science, sorry.
     
    Kaz Kylheku, Feb 10, 2012
    #12
  13. peter

    Shao Miller Guest

    On 2/9/2012 19:36, Malcolm McLean wrote:
    > However the strings in your C program may not be nul-terminated.


    "peter" versus "pete," for what it's worth.
     
    Shao Miller, Feb 10, 2012
    #13
  14. On Feb 10, 2:05 am, Kaz Kylheku <> wrote:
    > On 2012-02-10, Keith Thompson <> wrote:
    >
    > > Malcolm McLean <> writes:
    > >> However the strings in your C program may not be nul-terminated.

    >
    > > Then they're not strings, and calling them that will cause confusion.

    >
    > No, it won't. If I have a  "struct string" in my C program, nobody in their
    > right mind assumes that this still refers to a null-terminated array of char,
    > without looking inside the struct and the surrounding functions.
    >

    It does cause confusion, of course. Because if you create a struct
    string then you've got two string types in the program. But calling it
    struct text or something similar would cause even more confusion. Most
    people expect a struct string to consist of a character buffer, length
    member, and maybe a few oddments to indicate a read-only string or a
    non-ASCII alphabet. I would always nul-terminate the buffer if I
    could, but not everyone agrees. If for some reason you need strings
    that index into each other, this might not be possible.
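A sketch of the kind of type described above, assuming a buffer plus a length and, as Malcolm prefers, a null terminator kept after the last character. The type and function names are hypothetical, not from any particular library. Because the length is stored, the contents may include '\0' bytes; the terminator only helps when passing data without embedded nulls to the <string.h> functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: character buffer plus length. */
struct string {
    char  *data;
    size_t length;
};

/* Copies n bytes from src into a new buffer of n + 1 bytes and adds
   a null terminator. Returns 0 on success, -1 if malloc() fails. */
int string_from_bytes(struct string *s, const char *src, size_t n)
{
    s->data = malloc(n + 1);
    if (s->data == NULL)
        return -1;
    memcpy(s->data, src, n);
    s->data[n] = '\0';
    s->length = n;
    return 0;
}

/* Releases the buffer and resets the object to an empty state. */
void string_free(struct string *s)
{
    free(s->data);
    s->data = NULL;
    s->length = 0;
}
```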
    --
    Malcolm's website. Check out the MiniBasic project
    http://www.malcolmmclean.site11.com/www
     
    Malcolm McLean, Feb 10, 2012
    #14
  15. peter

    John Gordon Guest

    In <> Kenneth Brody <> writes:

    > On 2/9/2012 4:16 PM, John Gordon wrote:
    > > In<jh19o6$qnl$> peter<> writes:
    > >
    > >> In fact, I want to remove all NULLS and EOFs (0x1a)

    > [...]
    > > C strings are terminated by a NULL character. Therefore, by definition,
    > > you won't find any NULLs in the string itself.


    > <nit type="not so minor">
    > There is no such thing as a "NULL character" in C. Rather, strings are
    > terminated by a "null character". The all-caps "NULL" is a macro
    > representing the "null pointer".
    > </nit>


    Keith pointed out the same thing. I stand corrected!

    --
    John Gordon A is for Amy, who fell down the stairs
    B is for Basil, assaulted by bears
    -- Edward Gorey, "The Gashlycrumb Tinies"
     
    John Gordon, Feb 10, 2012
    #15
  16. peter <> writes:
    > In fact, I want to remove all NULLS and EOFs (0x1a)


    NULs and control-Zs

    > from a string


    from a buffer

    > then replace them all with spaces. The way I do it
    > now is by using a for() loop:
    >
    > for(temp=0;temp<=strlen(buffer);temp++)
    > {
    > if(buffer[temp]== '\0' || buffer[temp]==0x1A)
    > {buffer[temp]=' ';}
    > }
    >
    > Is there a faster / more efficient way of doing this?


    It's probably a good idea to investigate how those bytes got into
    your buffer in the first place. Where did the data come from?
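One way to start that investigation is to count the suspicious bytes at the source. This is a hypothetical diagnostic, assuming the data comes from a file. The file should be opened in binary mode ("rb"): on systems that distinguish text and binary streams, a text stream may translate bytes, and some implementations treat control-Z in a text stream as end of file.

```c
#include <stdio.h>

/* Hypothetical report: how many '\0' and 0x1A bytes a stream holds,
   and the offset of the first of each (-1 if none was seen). */
struct byte_report {
    long nul_count, ctrlz_count;
    long first_nul, first_ctrlz;
};

/* Reads fp to end of file with getc(). 0x1A comes back as an
   ordinary value (26); only the real end of input yields EOF. */
struct byte_report scan_stream(FILE *fp)
{
    struct byte_report r = {0, 0, -1, -1};
    long offset = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        if (c == '\0') {
            if (r.first_nul < 0)
                r.first_nul = offset;
            r.nul_count++;
        } else if (c == 0x1A) {
            if (r.first_ctrlz < 0)
                r.first_ctrlz = offset;
            r.ctrlz_count++;
        }
        offset++;
    }
    return r;
}
```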

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Feb 10, 2012
    #16
  17. peter

    Shao Miller Guest

    On 2/10/2012 14:08, Keith Thompson wrote:
    > peter<> writes:
    >> In fact, I want to remove all NULLS and EOFs (0x1a)

    >
    > NULs and control-Zs
    >
    >> from a string

    >
    > from a buffer
    >
    >> then replace them all with spaces. The way I do it
    >> now is by using a for() loop:
    >>
    >> for(temp=0;temp<=strlen(buffer);temp++)
    >> {
    >> if(buffer[temp]== '\0' || buffer[temp]==0x1A)
    >> {buffer[temp]=' ';}
    >> }
    >>
    >> Is there a faster / more efficient way of doing this?

    >
    > It's probably a good idea to investigate how those bytes got into
    > your buffer in the first place. Where did the data come from?
    >


    So far, we haven't seen much in the way of follow-ups from peter.
    Multiple people have asked about the books regarding "pointer versus
    array," but the lack of a peter-response doesn't prevent people from
    continuing to invest time in responses to peter's queries, which is
    fortunate for peter. :)
     
    Shao Miller, Feb 11, 2012
    #17
