Parsing a string

Ian Collins · Nov 17, 2010

If you don't initialize a variable then I see the only other way is to
make sure you assign some value to the variable before you use it. Is
that right way ?

Some will say yes, some will say no. I say declare it and initialise it
when it's first needed.

Malcolm McLean · Nov 17, 2010

Some will say yes, some will say no. I say declare it and initialise it
when it's first needed.

The argument for is that it makes bugs deterministic.

Consider this

int str_count(char *str, char ch)
{
int answer;

while(*str)
if(*str++ == ch)
answer++;

return answer;
}
This will often appear to work when you test it.

This
int str_count(char *str, char ch)
{
int answer = 1;

while(*str)
if(*str++ == ch)
answer++;

return answer;
}

will always fail in the same way.

Ian Collins · Nov 17, 2010

The argument for is that it makes bugs deterministic.

Consider this

int str_count(char *str, char ch)
{
int answer;

while(*str)
if(*str++ == ch)
answer++;

return answer;
}
This will often appear to work when you test it.

This
int str_count(char *str, char ch)
{
int answer = 1;

while(*str)
if(*str++ == ch)
answer++;

return answer;
}

will always fail in the same way.

Because answer is declared and initialised when it's first needed.

arnuld · Nov 17, 2010

Some will say yes, some will say no. I say declare it and initialise it
when it's first needed.

You meant declare it and *assign* to it when it's first needed:

int i; /* declaration */
i = 0; /* assignment */

int i = 0; /* declaration and initialization */

arnuld · Nov 17, 2010

The argument for is that it makes bugs deterministic.

Consider this

int str_count(char *str, char ch)
{
int answer;

while(*str)
if(*str++ == ch)
answer++;

return answer;}

This will often appear to work when you test it.

No assignment to variable before its used. Thats what I consider a
spottable bug.

This
int str_count(char *str, char ch)
{
int answer = 1;

while(*str)
if(*str++ == ch)
answer++;

return answer;

}

will always fail in the same way.

Okay, Now I got what Eric Sosman was pointing out in my code.
Initialized values in my code were nowhere blonging to some reason.
(like here answer should start from zero, reason for its assignment/
initialization). My initialized values were just for the sake of
initialization.

Eric Sosman · Nov 17, 2010

Thanks for clarification else I kept on thinking you were talking about
initializing the array. My use of arrays is clumsy ... hummm....... I
used arrays like this only:

(1) initialize
(2) pass it to function to do the job
(3) print it

So what is clumsy here ?

First, the very existence of two arrays. The code as shown can
certainly do without at least one of them, perhaps both. A variable
that isn't needed (especially a "large" variable) is as clumsy as the
kitchen sink in your car's back seat.

Second, the way you use one of them strikes me as clumsy. You
initialize the entire array to zeroes, just so you can then memcpy()
some characters into the beginning and know that they're followed
by a zero terminator. The initial part of the array is overwritten
by the memcpy(), the tail is never used at all, and there's only
one byte that you actually need to have as a zero. Why not zero
that one necessary byte instead of clumsily zeroing a lot of bytes
you'll never care about?

Parsimony is the soul of programming.

Eric Sosman · Nov 17, 2010

If you don't initialize a variable then I see the only other way is to
make sure you assign some value to the variable before you use it. Is
that right way ?

The variable must be given a value (by whatever means) before you
attempt to use its value. But what you've done is, paraphrasing,

int variable = 42;
variable = 17;

.... and the initialization is pointless.

Keith Thompson · Nov 17, 2010

arnuld said:
You meant declare it and *assign* to it when it's first needed:

int i; /* declaration */
i = 0; /* assignment */

int i = 0; /* declaration and initialization */

What makes you think that was what he meant?

I won't speak for Ian, but my own preference is to use initialization
whenever a sensible initial value can be computed at the point
where the object is declared, and assignment otherwise. In C90,
I might occasionally introduce an inner block so I can delay the
declaration and use an initializer; in C99 <OT>or C++</OT> I might
just declare the object just before it's needed.

If every possible code path assigns a value to a given variable,
you don't actually *need* to initialize it. You might choose to
do so anyway, just for the sake of robustness, to guarantee that
it has some defined value. On the other hand, in such a case if
the initial value is used, it indicates a bug in your program --
one that the compiler might have detected for you if you had omitted
the initializer.

Consider this horribly contrived program:

#include <stdio.h>
#include <time.h>
int main (void)
{
int n;
switch (time(NULL) % 4) {
case 0:
n = 10;
break;
case 1:
n = 42;
break;
case 2:
n = 500;
break;
}
printf("n = %d\n", n);
return 0;
}

25% of the time, it uses the value of n without having initialized it,
which is clearly a bug. When I compile this with the right
optimizations and warnings enabled, I get:

c.c: In function `main':
c.c:5: warning: `n' may be used uninitialized in this function

If I changed the declaration "int n;" to "int n = -1;", I'd get more
predictable behavior (it would print "n = -1" 25% of the time),
but I wouldn't get the warning, since the compiler has no way of
knowing that -1 isn't a perfectly valid value.

BartC · Nov 17, 2010

Eric Sosman said:
On 11/16/2010 11:46 PM, arnuld wrote:

Second, the way you use one of them strikes me as clumsy. You
initialize the entire array to zeroes, just so you can then memcpy()
some characters into the beginning and know that they're followed
by a zero terminator. The initial part of the array is overwritten
by the memcpy(), the tail is never used at all, and there's only
one byte that you actually need to have as a zero. Why not zero
that one necessary byte instead of clumsily zeroing a lot of bytes
you'll never care about?

What's the problem?

During development, it might be useful to know that the array is all-zeros.
It also means any bugs cause the program to fail in a more consistent,
repeatable manner.

And if it's proven the initialisation is not necessary, then perhaps it can
be removed after development is over, as an optimisation step.

BartC · Nov 17, 2010

Eric Sosman said:
The variable must be given a value (by whatever means) before you
attempt to use its value. But what you've done is, paraphrasing,

int variable = 42;
variable = 17;

... and the initialization is pointless.

Until you change it to this:

int variable = 42;
// variable = 17;

or possibly:

if (cond) variable = 17;

(Yes I know this is the same point I made elsewhere in the thread, but is
another example.)

Ian Collins · Nov 17, 2010

What makes you think that was what he meant?

I won't speak for Ian, but my own preference is to use initialization
whenever a sensible initial value can be computed at the point
where the object is declared, and assignment otherwise. In C90,
I might occasionally introduce an inner block so I can delay the
declaration and use an initializer; in C99<OT>or C++</OT> I might
just declare the object just before it's needed.

That's also my preference.

Eric Sosman · Nov 17, 2010

Until you change it to this:

int variable = 42;
// variable = 17;

or possibly:

if (cond) variable = 17;

(Yes I know this is the same point I made elsewhere in the thread, but
is another example.)

Let me get this straight: Do you object to an assessment of
Program A because it might not apply to a completely different
Program B?

Keith Thompson · Nov 17, 2010

Eric Sosman said:
On 11/17/2010 12:11 PM, BartC wrote: [...]

During development, it might be useful to know that the array is
all-zeros. It also means any bugs cause the program to fail in a more
consistent, repeatable manner.

Click to expand...

(Shrug.) It might be even more useful if the compiler's data
flow analysis could produce a warning about an uninitialized variable,
which of course it cannot if the variable is initialized. You pays
your money and you takes your choice: You silence the compiler and
gain a repeatable (though not necessarily correct) outcome, or you
leave the compiler the opportunity to warn (it may or may not do so),
with the drawback that the behavior of the erroneous code might not
be reproducible.

[...]

It might be nice to have a language feature that lets you initialize
an object while telling the compiler that you never want to use
the initial value. The generated code would initialize the object
(though the initialization could be removed if the compiler can
prove that the value is never used), but the compiler could still
perform data flow analysis and warn you if it determines that the
value might be used.

Barry Schwarz · Nov 17, 2010

I wrote this program (parser ?) to extract two values from a string
separated by a comma. I extract the values without any problem. Error
checking is there with different kinds of input but in comparison to clc
folks my error checking always looks like Calvin and Hobbes plan for
saving themselves form world war 3.

As usual, any advice om improving the program ? (I wonder why the heck
the value of *pname never changes even when pointer pname is going to
next element for each loop)

/* A program that searches for 2 comma separated values inside a string:
name and number. Name can be anything while number can only be
* an integer greater than zero.
*
* VERSION 0.0
*/

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

enum { SIZE_NAME = 10 };

void get_values(const char*, char*, const int, int*);

int main(void)
{
const char* arr1 = "Arnul@23,1";
const char* arr2 = "00#$%^&@23,Arnuld";
const char* arr3 = "Arnuld,";
char name[SIZE_NAME+1] = {0};
int num = -1;

printf("arr: %s\tname: %s\tnum: %d\n", arr3, name, num);
printf("\n");

get_values(arr3, name, SIZE_NAME, &num);
printf("name = %s\tnum = %d\n", name, num);

return 0;
}

void get_values(const char* p, char* pname, const int pname_size, int*
pnum)
{
int idx;

for(idx = 0; (pname_size != idx) && *p && (',' != *p); ++p, ++idx, +
+pname)
{
printf("*pname = %c, *tmp = %c\n", *pname, *p);

Since there is no variable tmp, your choice of text is just
misleading.

What is the value of *pname (arr3[0]) when you enter this function?
What would you expect it to look like on your display?

*pname = *p;

You store a value in *pname, and then increment pname as part of the
for statement. What is the value of *pname immediately after the
first increment (arr3[1])? After the second increment (arr3[2])?

}

errno = 0;
*pnum = (int) strtol(p+1, NULL, 10);
if(errno)
{
perror("***ERROR***, can not convert string \"Max Phone Lines\"
into integer\n");

The only value guaranteed to be stored in errno is ERANGE. If that is
the case, then the previous assignment statement already invoked
implementation defined behavior which could preclude reaching the call
to perror.

Barry Schwarz · Nov 17, 2010

Here is the strchr() and memcpy() version, any issues in this:

Why do you post multiple versions of a program without waiting for or
including the comments on previous versions?

arnuld · Nov 18, 2010

(Shrug.) It might be even more useful if the compiler's data
flow analysis could produce a warning about an uninitialized variable,
which of course it cannot if the variable is initialized. You pays your
money and you takes your choice: You silence the compiler and gain a
repeatable (though not necessarily correct) outcome, or you leave the
compiler the opportunity to warn (it may or may not do so), with the
drawback that the behavior of the erroneous code might not be
reproducible.

I want to make some things clear here before I speak more. There are 2
things you can learn from clc or any programming language newsgroup:

(1) Genuine/Proper/Good/Real/(fill your favorite word here) way of
programming in that language. Its only because of clc that I *always*
compile with -ansi (-std=c99) -pedantic -Wall -Wextra flags to gcc.

(2) Style of Programming: This is personal. e.g. Richard Heathfield
considers it as a bug if his function contains more than one return
statement, while others consider having return statements at 3/4/N places
in the functions as a fine idea.

So, what you are saying belongs to (1) or (2) ? I think its (1) but I
want to be sure.

Ah. We can all be heroes and collect large bonuses for speeding
up code by umpty-huge percent. All we need to do is write a useless
delay loop in the original, and later remove it as an "optimization."
Ten out of ten for gaming the system, but zero points for honesty.

Thats why I love clc

arnuld · Nov 18, 2010

Why do you post multiple versions of a program without waiting for or
including the comments on previous versions?

The comments from one person made me realize that program was designed in
a very bad way, so I had to correct it. Is correcting after one comment
is wrong ?

Tim Rentsch · Nov 18, 2010

arnuld said:
I want to make some things clear here before I speak more. There are 2
things you can learn from clc or any programming language newsgroup:

(1) Genuine/Proper/Good/Real/(fill your favorite word here) way of
programming in that language. Its only because of clc that I *always*
compile with -ansi (-std=c99) -pedantic -Wall -Wextra flags to gcc.

(2) Style of Programming: This is personal. e.g. Richard Heathfield
considers it as a bug if his function contains more than one return
statement, while others consider having return statements at 3/4/N places
in the functions as a fine idea.

So, what you are saying belongs to (1) or (2) ? I think its (1) but I
want to be sure.

I consider it (2), ie, there are reasonable arguments on both
sides (even though, at the same time, most often I would agree
with the practice Eric Sosman advocates). There isn't any single
right answer because it depends on what compilers and tools are
available, not to mention numerous other potential factors.

BartC · Nov 18, 2010

Let me get this straight: Do you object to an assessment of
Program A because it might not apply to a completely different
Program B?

OK. I just see it differently; ie. as providing a known default value in a
program structure that might be subject to change, or that might have an as
yet unperceived bug. So it's just good practice.

And if that's obvious the initialisation is unnecessary, perhaps the
compiler will remove it.

(BTW why is it that uninitialized static variables are generally all set to
zero, rather than some completely arbitrary bit-pattern that is different on
every run?)

Chris M. Thomasson · Nov 18, 2010

arnuld said:
I wrote this program (parser ?) to extract two values from a string
separated by a comma. I extract the values without any problem. Error
checking is there with different kinds of input but in comparison to clc
folks my error checking always looks like Calvin and Hobbes plan for
saving themselves form world war 3.

As usual, any advice om improving the program ? (I wonder why the heck
the value of *pname never changes even when pointer pname is going to
next element for each loop)

/* A program that searches for 2 comma separated values inside a string:
name and number. Name can be anything while number can only be
* an integer greater than zero.

[...]

What about something simple like this:

http://codepad.org/soY31mYl
_____________________________________________________________
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <limits.h>

#define STRTOL_VALIDATE(mp_number) \
(! ((! (mp_number) \
|| (mp_number) == LONG_MAX \
|| (mp_number) == LONG_MIN) \
&& errno))

struct name_and_number
{
char* name;
int number;
};

struct name_and_number*
name_and_number_parse(
char const* string
){
long int number;
size_t name_size;
struct name_and_number* self;
char const* name = strchr(string, ',');

if (! name) return NULL;
errno = 0;
number = strtol(name + 1, NULL, 10);
if (! STRTOL_VALIDATE(number) || ((int)number) < 1) return NULL;
name_size = name - string;
if (! name_size) return NULL;

if ((self = malloc(sizeof(*self) + name_size + 1)))
{
self->name = (char*)(self + 1);
memcpy(self->name, string, name_size);
self->name[name_size] = '\0';
self->number = (int)number;
}

return self;
}

#define name_and_number_destroy free

void
name_and_number_parse_and_display(
char const* string
){
struct name_and_number* parsed;

if ((parsed = name_and_number_parse(string)))
{
printf("origin: %s\nname: %s\nnumber: %d\n\n",
string, parsed->name, parsed->number);

name_and_number_destroy(parsed);
}

else
{
printf("failed to parse: %s\n\n", string);
}

puts("------------------------------------------");
}

int
main(void)
{
name_and_number_parse_and_display("Arnul@23,1");
name_and_number_parse_and_display("00#$%^&@23,Arnuld");
name_and_number_parse_and_display("Arnuld,");
name_and_number_parse_and_display("Chris Thomasson, 12.65");
name_and_number_parse_and_display("John Doe,-1234");
name_and_number_parse_and_display("Jane Doe");
name_and_number_parse_and_display("");
name_and_number_parse_and_display("1234,34345,6");
name_and_number_parse_and_display("lzkfjclkdsjfaj,asd123");

puts("\n\n\n___________________________________________\n"
"The program has completed; hit <ENTER> to exit...");

getchar();

return 0;
}
_____________________________________________________________

?

Finding a string inside string	3	Nov 23, 2010
Array implementation of Stack	80	Jun 16, 2011
selection-sort in C	22	Aug 16, 2011
printing array of pointers	22	Oct 27, 2009
Finding a word inside a string	35	Sep 24, 2010
Can I declare a variable with an uncertainty number suffix?	3	Mar 4, 2024
Parsing a PATH null-terminated variable	24	Aug 12, 2008
A process take input from /proc/<pid>/fd/0, but won't process it	0	Oct 29, 2023

Parsing a string

Ian Collins

Malcolm McLean

Ian Collins

arnuld

arnuld

Eric Sosman

Eric Sosman

Keith Thompson

BartC

BartC

Ian Collins

Eric Sosman

Keith Thompson

Barry Schwarz

Barry Schwarz

arnuld

arnuld

Tim Rentsch

BartC

Chris M. Thomasson

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads