Replacing NULLS with space (C strings)

peter

In fact, I want to remove all NULLS and EOFs (0x1a)
from a string then replace them all with spaces. The way I do it
now is by using a for() loop:

for(temp=0;temp<=strlen(buffer);temp++)
{
if(buffer[temp]== '\0' || buffer[temp]==0x1A)
{buffer[temp]=' ';}
}

Is there a faster / more efficient way of doing this?
 
James Kuyper

In fact, I want to remove all NULLS and EOFs (0x1a)

EOF is a macro defined in <stdio.h>. It's required to have a negative
value, which 0x1A does not, so they can't be the same. EOF very
commonly, though not universally, has a value of -1.

There have been systems where 0x1A was used to indicate the end of a
file. However, such systems are far from universal. I'd recommend making
sure that this value is indeed being used that way in all of the
contexts in which you want to use this code.

from a string then replace them all with spaces. The way I do it
now is by using a for() loop:

for(temp=0;temp<=strlen(buffer);temp++)
{
if(buffer[temp]== '\0' || buffer[temp]==0x1A)
{buffer[temp]=' ';}
}

Is there a faster / more efficient way of doing this?

By definition, strlen(buffer) gives you the offset of the very first
null character in buffer (or, if there is none, it keeps searching past
the end of buffer until it finds one; this often results in memory
access violations - make VERY sure that your buffer is in fact null
terminated before calling strlen). Therefore, there's no point in
checking for null characters before you reach the end of the loop; and
you're guaranteed to find one once you reach that end.
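
To make the point about strlen() concrete, here is a minimal sketch. The helper names bytes_after_first_null and count_nulls are made up for illustration, and the sketch assumes the caller knows the real size of the array, since strlen() cannot supply it.

```c
#include <stddef.h>
#include <string.h>

/* Number of bytes strlen() never examines in an array of `size` bytes.
   Assumes at least one null character lies within the first `size`
   bytes; otherwise strlen() would read past the end of the array. */
size_t bytes_after_first_null(const char *buf, size_t size)
{
    return size - (strlen(buf) + 1);
}

/* Count the null characters in the first `size` bytes of `buf`.
   The size has to come from the caller. */
size_t count_nulls(const char *buf, size_t size)
{
    size_t n = 0;
    for (size_t i = 0; i < size; i++)
        if (buf[i] == '\0')
            n++;
    return n;
}
```

For `char sample[] = "ab\0cd";` the array holds 6 bytes, strlen(sample) is 2, bytes_after_first_null(sample, sizeof sample) is 3, and count_nulls(sample, sizeof sample) is 2. Any loop bounded by strlen() only ever sees the first of those two null characters.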

I suspect that you have some kind of misunderstanding, that led you to
think that your code could find a null character in some other locations
as well. However, for the rest of this message I'll assume you intended
it to handle null characters exactly the way it actually does.

You have strlen() scanning sequentially through buffer looking for the
first null character, and then you have your for loop scanning
sequentially through buffer looking for null characters and 0x1A. Why
not do it in a single pass?

Your code sets the final terminating null character to blank. This
guarantees that strlen(buffer) can no longer be used to tell you where
that character used to be. If you're planning to do anything further
with that portion of buffer, you'd better do something to keep track of
where it ends.

You don't say what the element type of buffer is; I'll assume it's char;
make appropriate adjustments below if it's something else.

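char *p;   /* declared outside the loop so it is still in scope below */
for(p = buffer; *p; p++)
    if(*p == 0x1A)
        *p = ' ';

*p++ = ' ';   /* blank the terminating null character as well */
ptrdiff_t length = p - buffer;   /* ptrdiff_t is declared in <stddef.h> */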
 
Anders Wegge Keller

peter said:
In fact, I want to remove all NULLS and EOFs (0x1a)
from a string then replace them all with spaces. The way I do it
now is by using a for() loop:

for(temp=0;temp<=strlen(buffer);temp++)
{
if(buffer[temp]== '\0' || buffer[temp]==0x1A)
{buffer[temp]=' ';}
}
Is there a faster / more efficient way of doing this?

strlen(buffer) will return the offset of the first '\0' encountered,
so the code above doesn't make that much sense. Also, it is not very
efficient to call strlen for each iteration of the loop. Especially
with pathological code like this, the compiler will be unable to
optimize the repeated calls away, as you are modifying the object you
are giving as argument.

Either call strlen once and use that result in the entire loop:

len = strlen (buffer);
for (temp = 0 ; temp < len ; temp++) {
if (buffer[temp] == 0x1a) { buffer[temp] = ' '; }
}

Or skip the strlen call entirely, and check for end of string at the
same time as check for modification:

temp = 0;

while (buffer[temp]) {
if (buffer[temp] == 0x1a) { buffer[temp] = ' '; }
temp++;
}
 
James Kuyper

peter said:
In fact, I want to remove all NULLS and EOFs (0x1a)
from a string then replace them all with spaces. The way I do it
now is by using a for() loop:

for(temp=0;temp<=strlen(buffer);temp++)
{
if(buffer[temp]== '\0' || buffer[temp]==0x1A)
{buffer[temp]=' ';}
}
Is there a faster / more efficient way of doing this?

strlen(buffer) will return the offset of the first '\0' encountered,
so the code above doesn't make that much sense. Also, it is not very
efficient to call strlen for each iteration of the loop.

I didn't notice that - that's embarrassing (not as embarrassing as
having written such code, but close). It's worse than merely being
horrendously inefficient; with the terminating null character being
replaced with ' ' inside the loop, followed by immediate recalculation
of the length of the supposedly null-terminated string, the loop will
never terminate until something goes very badly wrong (and possibly not
even then).
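
The runaway behaviour can be observed safely with a bounded version of the original loop. This is only a sketch: bounded_len and run_original_loop are hypothetical names, and the `cap` parameter is an added fence that the original code does not have.

```c
#include <stddef.h>

/* Like strlen(), but never reads past `cap` bytes. */
static size_t bounded_len(const char *s, size_t cap)
{
    size_t n = 0;
    while (n < cap && s[n] != '\0')
        n++;
    return n;
}

/* The original loop, with every access fenced in by `cap`.
   Returns the index at which the loop stopped. */
size_t run_original_loop(char *buf, size_t cap)
{
    size_t temp;
    for (temp = 0; temp < cap && temp <= bounded_len(buf, cap); temp++) {
        if (buf[temp] == '\0' || buf[temp] == 0x1A)
            buf[temp] = ' ';
    }
    return temp;
}
```

With `char buf[8] = "abc";` the call run_original_loop(buf, sizeof buf) returns 8: each time the loop blanks a null character the recomputed length grows by one, so the loop walks through every remaining byte and stops only because of the added fence. Without that fence it would carry on past the end of the array.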
 
John Gordon

peter said:
In fact, I want to remove all NULLS and EOFs (0x1a)
from a string then replace them all with spaces. The way I do it
now is by using a for() loop:
for(temp=0;temp<=strlen(buffer);temp++)
{
if(buffer[temp]== '\0' || buffer[temp]==0x1A)
{buffer[temp]=' ';}
}

C strings are terminated by a NULL character. Therefore, by definition,
you won't find any NULLs in the string itself.
 
Keith Thompson

peter said:
In fact, I want to remove all NULLS and EOFs (0x1a)
from a string then replace them all with spaces. The way I do it
now is by using a for() loop:

for(temp=0;temp<=strlen(buffer);temp++)
{
if(buffer[temp]== '\0' || buffer[temp]==0x1A)
{buffer[temp]=' ';}
}

Is there a faster / more efficient way of doing this?

There's probably no faster way than a for loop, but yours can be
improved considerably by not calling strlen() on each iteration.
strlen() has to scan the entire string, and you're doing that once for
each character.

Also, the correct condition is "<", not "<=". For example if the
string's value is "hello", then strlen() returns 5, but you want to
check positions 0 through 4.

const size_t len = strlen(buffer);
for (i = 0; i < len; i ++) {
...
}
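
Filled in as a complete function, the loop might look like the sketch below. The function name replace_ctrl_z and the replacement count it returns are additions for illustration; it assumes a properly null-terminated string and only replaces 0x1A, since no null character can appear before the terminator.

```c
#include <stddef.h>
#include <string.h>

/* Replace every 0x1A byte in a null-terminated string with a space.
   Returns the number of replacements made. */
size_t replace_ctrl_z(char *s)
{
    const size_t len = strlen(s);        /* computed once, not per iteration */
    size_t replaced = 0;
    for (size_t i = 0; i < len; i++) {   /* "<": visits indices 0..len-1 */
        if (s[i] == 0x1A) {
            s[i] = ' ';
            replaced++;
        }
    }
    return replaced;
}
```

Because the terminator at index len is never touched, the result is still a valid string afterwards.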

And some terminology issues. NULL is (a macro that expands to)
a null *pointer* constant; the null character is better referred
to as NUL, or just '\0'. (Yes, some character set standards do
call it NULL, but using that name can be confusing.)

And EOF is a macro that expands to a negative integer constant
expression, typically (-1). 0x1A is the control-Z character,
which is used on some systems to indicate an end-of-file condition.
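
The difference between the two shows up when reading from a stream. In the sketch below (copy_replacing_ctrl_z is a hypothetical name, and it assumes both streams are already open), EOF is the value getc() returns when there is nothing more to read, while 0x1A is just another byte in the data.

```c
#include <stdio.h>

/* Copy bytes from `in` to `out`, turning control-Z (0x1A) into a space.
   Returns the number of bytes copied. */
long copy_replacing_ctrl_z(FILE *in, FILE *out)
{
    int c;            /* int, not char, so that EOF stays distinguishable */
    long count = 0;
    while ((c = getc(in)) != EOF) {       /* EOF: end of input (or error) */
        putc(c == 0x1A ? ' ' : c, out);   /* 0x1A: an ordinary data byte */
        count++;
    }
    return count;
}
```

Whether a 0x1A byte reaches the program at all can depend on the system and on whether the stream was opened in text or binary mode.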

Finally, strlen() searches for the '\0' character that marks the end
of a string. If your buffer might have multiple '\0' characters in
it, then it isn't a string, and you should use some other technique
to determine how long it is (or how long the relevant portion of
it is).
 
Keith Thompson

John Gordon said:
peter said:
In fact, I want to remove all NULLS and EOFs (0x1a)
from a string then replace them all with spaces. The way I do it
now is by using a for() loop:
for(temp=0;temp<=strlen(buffer);temp++)
{
if(buffer[temp]== '\0' || buffer[temp]==0x1A)
{buffer[temp]=' ';}
}

C strings are terminated by a NULL character. Therefore, by definition,
you won't find any NULLs in the string itself.

Null is (a macro that expands to) a null *pointer* constant.
 
Keith Thompson

Keith Thompson said:
John Gordon said:
peter said:
In fact, I want to remove all NULLS and EOFs (0x1a)
from a string then replace them all with spaces. The way I do it
now is by using a for() loop:
for(temp=0;temp<=strlen(buffer);temp++)
{
if(buffer[temp]== '\0' || buffer[temp]==0x1A)
{buffer[temp]=' ';}
}

C strings are terminated by a NULL character. Therefore, by definition,
you won't find any NULLs in the string itself.

Null is (a macro that expands to) a null *pointer* constant.

I meant to type NULL, of course.
 
Malcolm McLean

In fact, I want to remove all NULLS and EOFs (0x1a)
from a string then replace them all with spaces. The way I do it
now is by using a for() loop:

 for(temp=0;temp<=strlen(buffer);temp++)
   {
    if(buffer[temp]== '\0' || buffer[temp]==0x1A)
      {buffer[temp]=' ';}
   }

Is there a faster / more efficient way of doing this?

Yes.

Get the length of the data in the buffer. Only you can do that.
Probably you want to exclude the last terminating nul from the
replacement, but maybe not, depending on how you're going to use the
data. You might even need to add a nul.


Then just do this.

len = data_length_got_somehow;
for(i=0;i<len;i++)
  if(buffer[i] == 0 || buffer[i] == 0x1a)
    buffer[i] = ' ';
/* possibly you need to do this, but make sure that buffer is one
bigger than len */
buffer[len] = 0;

If you call strlen() in the for control statement, the length of the
string will be recalculated on each iteration, which is slow. Also,
since you want to replace nuls, it's a bug.
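
As a self-contained function, the explicit-length approach could be sketched like this. The name blank_nul_and_ctrl_z and the returned count are assumptions made for the example; how the caller obtains `len` is still up to the caller.

```c
#include <stddef.h>

/* Replace null characters and 0x1A bytes in the first `len` bytes of
   `buf` with spaces. Returns the number of bytes changed.
   No terminator is read or written: `len` alone bounds the loop. */
size_t blank_nul_and_ctrl_z(char *buf, size_t len)
{
    size_t changed = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\0' || buf[i] == 0x1A) {
            buf[i] = ' ';
            changed++;
        }
    }
    return changed;
}
```

For `char data[] = "ab\0cd\x1A" "ef";` a call with `len = sizeof data - 1` changes 2 bytes and leaves the final terminator alone, so data then holds the string "ab cd ef". The literal is split in two because "\x1Aef" would otherwise be read as a single hex escape.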
 
Malcolm McLean

By definition, a string includes a null character.

ISO/IEC 9899:201x Committee Draft — April 12, 2011 N1570
7. Library
7.1 Introduction
7.1.1 Definitions of terms
1     A string is a contiguous sequence of characters
      terminated by and including the first null character.

In ANSI C terminology. That's so that they can use the term "string"
in describing library functions without constantly having to specify
that it must be nul-terminated.
However the strings in your C program may not be nul-terminated.
 
Keith Thompson

Malcolm McLean said:
In ANSI C terminology. That's so that they can use the term "string"
in describing library functions without constantly having to specify
that it must be nul-terminated.
However the strings in your C program may not be nul-terminated.

Then they're not strings, and calling them that will cause confusion.
They might well be some data structure that acts like a string in a
more general sense, but then there should be an unambiguous name for it.

And so far, we have no idea what kind of data structure the OP is
dealing with, other than an array of characters.
 
Kaz Kylheku

Then they're not strings, and calling them that will cause confusion.

No, it won't. If I have a "struct string" in my C program, nobody in their
right mind assumes that this still refers to a null-terminated array of char,
without looking inside the struct and the surrounding functions.

What are you going to complain about next? That we should not use the term
"header" in packet processing code because that refers to the units processed
by the #include directive?

They might well be some data structure that acts like a string in a
more general sense, but then there should be an unambiguous name for it.

That unambiguous name is "character string".

You do not get to rename the fundamental concepts in computer science, sorry.
 
Malcolm McLean

No, it won't. If I have a "struct string" in my C program, nobody in their
right mind assumes that this still refers to a null-terminated array of char,
without looking inside the struct and the surrounding functions.

It does cause confusion, of course. Because if you create a struct
string then you've got two string types in the program. But calling it
struct text or something similar would cause even more confusion. Most
people expect a struct string to consist of a character buffer, length
member, and maybe a few oddments to indicate a read-only string or a
non-ASCII alphabet. I would always nul-terminate the buffer if I
could, but not everyone agrees. If for some reason you need strings
that index into each other, this might not be possible.
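
A rough sketch of such a type, with one string indexing into another, might look like the following. The layout and the string_slice helper are purely illustrative assumptions, not anything posted in this thread.

```c
#include <stddef.h>

/* Hypothetical counted-string type: a character buffer plus a length. */
struct string {
    char  *data;   /* characters; not necessarily nul-terminated */
    size_t len;    /* number of characters in use */
};

/* Return a string that indexes into `s`, sharing its storage.
   `start` and `count` are clamped to the bounds of `s`. */
struct string string_slice(struct string s, size_t start, size_t count)
{
    struct string out;
    if (start > s.len)
        start = s.len;
    if (count > s.len - start)
        count = s.len - start;
    out.data = s.data + start;
    out.len  = count;
    return out;
}
```

A slice taken from the middle of another string cannot be given its own nul terminator without overwriting a character of the parent, which is why the length member has to be authoritative here.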
 
John Gordon

In said:
In said:
In fact, I want to remove all NULLS and EOFs (0x1a)
[...]
C strings are terminated by a NULL character. Therefore, by definition,
you won't find any NULLs in the string itself.
<nit type="not so minor">
There is no such thing as a "NULL character" in C. Rather, strings are
terminated by a "null character". The all-caps "NULL" is a macro
representing the "null pointer".
</nit>

Keith pointed out the same thing. I stand corrected!
 
Keith Thompson

peter said:
In fact, I want to remove all NULLS and EOFs (0x1a)

NULs and control-Zs
from a string

from a buffer
then replace them all with spaces. The way I do it
now is by using a for() loop:

for(temp=0;temp<=strlen(buffer);temp++)
{
if(buffer[temp]== '\0' || buffer[temp]==0x1A)
{buffer[temp]=' ';}
}

Is there a faster / more efficient way of doing this?

It's probably a good idea to investigate how those bytes got into
your buffer in the first place. Where did the data come from?
 
Shao Miller

peter said:
In fact, I want to remove all NULLS and EOFs (0x1a)

NULs and control-Zs
from a string

from a buffer
then replace them all with spaces. The way I do it
now is by using a for() loop:

for(temp=0;temp<=strlen(buffer);temp++)
{
if(buffer[temp]== '\0' || buffer[temp]==0x1A)
{buffer[temp]=' ';}
}

Is there a faster / more efficient way of doing this?

It's probably a good idea to investigate how those bytes got into
your buffer in the first place. Where did the data come from?

So far, we haven't seen much in the way of follow-ups from peter.
Multiple people have asked about the books regarding "pointer versus
array," but the lack of a peter-response doesn't prevent people from
continuing to invest time in responses to peter's queries, which is
fortunate for peter. :)
 
