input word

B

Ben Pfaff

arnuld said:
I will like to see what other clc posters has to say about this say. If
they agree, I will drop this kind into my style-bucket. Personally, I
liked it :)

I always put a line break between an "if" statement and any
statement conditional upon it. I find it easier to read.
 
B

Barry Schwarz

okay, forget about general function writing procedure. Don't you think the
while loop (with memory re-allocation) in my function is placed in an odd
place. To me, it feels like, this whole while loop should not be in this
function, or I am just getting confused (emotionally ?):

I don't know about emotionally, but you are getting confused about
which is the current version of your routine.
int get_single_word( char** ppc )
{
unsigned ele_num;
int ch;
size_t word_length, word_length_interval;
char* word_begin;
char* new_mem;

ele_num = 0;

word_length = WORD_SIZE;
word_length_interval = 2;

*ppc = malloc(word_length * sizeof(**ppc));
word_begin = *ppc;


if( NULL == ppc )

This is still the wrong test.
{
return GSW_ENOMEM;

If you reach this statement, the statement above that calls malloc
invoked undefined behavior.
}


while( (EOF != (ch = getchar())) && isspace(ch) )
{
continue; /* trailing whitespace */

You corrected the comment on a previous version. Why is it back to
being incorrect now?
}


if( EOF != ch )
{
*word_begin++ = ch;
ele_num = 1;
}


while( (EOF != (ch = getchar())) && (! isspace(ch)) )
{
if( (word_length - 1) == ele_num++ )
{
new_mem = realloc( *ppc, (word_length_interval * word_length * sizeof **ppc) );
*ppc = new_mem;

This still creates a memory leak when realloc fails.
 
B

Barry Schwarz

I mean whatever whitespace there is, newlines, tabs etc etc. I will change
the comments to reflect this.





Oh.. my .. I reall don't understand why the program was working with this
bug.

It worked because malloc never failed.


snip white space discussion
I tried it but i get dangling pointer errors. See below for code.

Except you never showed us the errors.
/* a function written in ANSI C, that takes a word from stdin and manages the memory dynamically
*
* Since there is no generally agreed definition of what a 'word' is. I have taken a very simple
* approach:
*
* A word is a contiguous collection of non-whitespace characters. A whitespace is defined
* by isspace().
*
*/


#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>


enum { WORD_SIZE = 2 };
enum { GSW_OK, GSW_ENOMEM, GSW_ENORESIZE } ;


int get_single_word( char** );
int alloc_new_mem( char** , char**, char**, size_t* );


int main( void )
{
char* pword;


while( (GSW_OK == get_single_word(&pword)) && ( *pword != 0) )
{
printf("You entered: [%s]\n", pword);
free( pword );
}


return 0;
}




int get_single_word( char** ppc )
{
unsigned ele_num = 0;
int ch = EOF;
size_t word_length = WORD_SIZE;
char* word_begin = NULL;
char* new_mem = NULL;
size_t* pwl = &word_length;

*ppc = malloc(word_length * sizeof(**ppc));
word_begin = *ppc;


if( NULL == *ppc )
{
return GSW_ENOMEM;

}


while( (EOF != (ch = getchar())) && isspace(ch) )
{
continue; /* Leading whitespace */
}


if( EOF != ch )
{
*word_begin++ = ch;
ele_num = 1;
}


while( (EOF != (ch = getchar())) && (! isspace(ch)) )
{
if( (word_length - 1) == ele_num++ )
{
if( GSW_OK != alloc_new_mem( ppc, &word_begin, &new_mem, pwl ) )

Strange that you used the & operator for word_begin and new_mem but
felt the need for a new variable for &word_length.
{
return GSW_ENORESIZE;
}
}

*word_begin++ = ch;
}

*word_begin = '\0';

return GSW_OK;
}



int alloc_new_mem( char** ppc, char** word_begin, char** new_mem, size_t* pwl )
{
size_t word_length_interval = 2;

*new_mem = realloc( *ppc, (word_length_interval * WORD_SIZE * sizeof **new_mem) );

Look at the realloc statement in your previous code. The two
arguments should evaluate identically. Hint: In this code all three
terms of the second argument are constant. That is not what your
original code did.
if( *new_mem )
{
*word_begin = *new_mem + (*word_begin - *ppc);
*pwl *= word_length_interval;

This is now a lie and you will write beyond the end of allocated
memory invoking undefined behavior
*ppc = *new_mem;
}
else
{
**word_begin = '\0';
return GSW_ENORESIZE;
}

return GSW_OK;
}




========================= OUTPUT =====================
[arnuld@dune ztest]$ gcc -ansi -pedantic -Wall -Wextra get-single-word.c
[arnuld@dune ztest]$ ./a.out
Love
You entered: [Love]
Looooooooooooooooooooooooooooooooooooooooooooooooooove
*** glibc detected *** realloc(): invalid next size: 0x09068008 ***

This error makes no sense and is probably the result of writing beyond
your allocation. While the value you computed in the call to realloc
is not the one you wanted, it is a valid value (4).

I don't see any dangling pointers.
 
L

Lew Pitcher

On Fri, 03 Oct 2008 10:37:29 -0700, Lew Pitcher wrote: [snip]
From what I can see, your main() only looks for a
GSW_OK; /all/ other returns cause a silent program termination.


is that worse ?

Worse than what?

As I see it, you only care if the function returns GSW_OK or not. Thus, all
your existing error values are suspect (you want either GSW_OK or !GSW_OK),
and adding another error value just compounds the situation.

Personally, I'd code in error values based on whether or not something
unique has to happen for them. In your case, there are two unique follow-on
actions; one that happens when everything works, and one that happens when
something fails. This implies that two return values would suffice for your
logic.

On the other hand, if you intended your calling function to take some
remedial action, or report on the cause of a failure situation, having
multiple error return values would be handy.


--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
 
T

Tim Rentsch

James Kuyper said:
arnuld wrote:
...
I will like to see what other clc posters has to say about this say. If
they agree, I will drop this kind into my style-bucket. Personally, I
liked it :)

The compact form presents problems due to the fact that the if condition
and the return statement are on the same line. [...]

I think 'tradeoffs' is a more appropriate word than 'problems'.
The one-line form also has its own advantages (without trying to
decide which set of advantages might outweigh the other).

Furthermore, if the one-line form is used, it's a simple matter
to produce source automatically that uses the two-line form,
which makes (most of) both sets of advantages available.
 
T

Tim Rentsch

William Pursell said:
Also, having the statement on a separate line makes code
coverage tests more useful. For example, gcov has line based
granularity and can tell you how often each line has been
executed in your test suite. If the statement and
the condition appear on the same line, then it
generates less accurate reports.

Not less accurate; just sometimes having less information.

To give one example, information for how often each line has been
executed for this code sequence

if(x)
return 0;
...

won't have any more information than the same analysis for this
code sequence

if(x) return 0;
...
 
J

jameskuyper

Tim Rentsch wrote:
....
Not less accurate; just sometimes having less information.

To give one example, information for how often each line has been
executed for this code sequence

if(x)
return 0;
...

won't have any more information than the same analysis for this
code sequence

if(x) return 0;
...

I think you meant "accuracy" rather "information" in your last
sentence. As written, it makes no sense; there's quite clearly half as
much information in the second case as in the first.
 
B

Ben Bacarisse

Tim Rentsch wrote:
...

I think you meant "accuracy" rather "information" in your last
sentence. As written, it makes no sense; there's quite clearly half as
much information in the second case as in the first.

I, too, was confused but then I though that what was being illustrated
was the "sometimes". I.e. the text round the example is correct and
the example is to show that one may, sometimes, not _even_ loose
information (much less accuracy).

A coverage tool that reports

<a times> if (x)
<b times> return 0;
<c times> ...

gives exactly the same information as one that reports

<a times> if (x) return 0;
<c times> ...

since b = a - c.

Of course, if the tools don't work that way then this is wrong, but it
fits what was written.
 
T

Tim Rentsch

Tim Rentsch wrote:
...

I think you meant "accuracy" rather "information" in your last
sentence. As written, it makes no sense; there's quite clearly half as
much information in the second case as in the first.

I meant "information" in the sense of information theory. The
three per-line execution counts for the lines

if(x)
return 0;
...

have more /data/ than the two counts for the lines

if(x) return 0;
...

but they don't contain more information.
 
T

Tim Rentsch

Ben Bacarisse said:
I, too, was confused but then I though that what was being illustrated
was the "sometimes". I.e. the text round the example is correct and
the example is to show that one may, sometimes, not _even_ loose
information (much less accuracy).

A coverage tool that reports

<a times> if (x)
<b times> return 0;
<c times> ...

gives exactly the same information as one that reports

<a times> if (x) return 0;
<c times> ...

since b = a - c.

Of course, if the tools don't work that way then this is wrong, but it
fits what was written.

Yes, exactly my point.

Furthermore, this example is significant, because control
transfers (including break, continue, and yes even goto) usually
seem better placed (IMO) on the same level/line as the 'if',
rather than indented underneath it. All control transfers
have this information-preserving property with respect to
line execution counts.
 
D

David Thompson

[for OP's program:]
* A word is a contiguous collection of non-whitespace characters. A whitespace means anyone of
* a single apace, a newline or a tab.

You use isspace to detect whitespace. isspace will return true for
several other characters besides these three.
Though at least in the (default) C locale, all of them are characters
that if used (which they are not so much nowadays except where \r
leaks through) any sane person would treat unconditionally as nonword,
so the OP should just extend their spec to agree with isspace().

(The harder part to word delimiting, as previously discussed at
length, are things like apostrophe esp if also used for singlequote
and/or acute, other confusable quasi-accents, hyphen, period/stop,
etc., which the OP has explicitly decided not to deal with.)

They are mutually exclusive in this case so you only need to use one
or the other. If the one you use returns 1, then that condition is
true. (E.g., if you use feof and it returns 1, then you reached end
of file and don't have an error). If the one you use returns 0, then
the other condition is true.

Supernit: s/1/nonzero/
The builtin operators (== != < <= > >= && || !) yield one or zero, but
the library routines (e.g. feof and is[w]*) return nonzero or zero.

- formerly david.thompson1 || achar(64) || worldnet.att.net
 
D

David Thompson

That's a false choice. Certainly you should not shut down your brain,
and if you don't understand your code starting over is usually good.
But it is entirely possible to use a debugger with your brain on.
Then test as you go. Write a simple test, write the code to pass it,
check in. Repeat until done.
Yes. (But make sure the tests are correct. Sometimes, like bad specs,
it's easy to get both wrong in compatible ways; that doesn't help.)
Then if you add come code that breaks, just revert the change and do it
again.

But no. _Look_ at the change(s) first to try to figure out the
problem. 99% of the time you will find it, with much less effort than
investigating or analyzing the whole program. And that's a win.

But don't assume that the problem _must_ be in the last change, or
even last few. Sometimes a code change -- or a test change -- will
expose a latent bug that was already there, in code that didn't
change, sometimes code that has been there for years. We tell
ourselves, and each other, "oh, that code has been extensively tested
and widely used, it can't be wrong" but for (sub)programs of any
significant complexity and/or context the number of possible execution
histories is usually so huge that no testing could ever be exhaustive.
- formerly david.thompson1 || achar(64) || worldnet.att.net
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,228
Latest member
MikeMichal

Latest Threads

Top