adapting getline

Franken Sense

In Dread Ink, the Grave Hand of Keith Thompson Did Inscribe:
Franken Sense said:
I assume the NULL happens when dupstr can't malloc, but I don't see what I
can do about it.
[...]

Yes, there are no really good responses to running out of memory. In
some cases, you might be able to abort just the current action and let
the program continue running, particularly in interactive programs.
In other cases, it might even make sense to fall back to a less
memory-intensive algorithm (though typically if you have such an
algorithm you'll just use it in the first place).

But at least you can abort the program with an error message. That's
almost always better than ignoring the error and trying to continue.

    ptr = malloc(count * sizeof *ptr);
    if (ptr == NULL) {
        fputs("Malloc failed\n", stderr);
        exit(EXIT_FAILURE);
    }

There's two mallocs that could fail.

char *dupstr( char *pc )
{
    char *pc2;
    pc2 = malloc( strlen(pc) + 1 );
    if (pc2 == NULL)
    {
        fputs("Malloc failed\n", stderr);
        exit(EXIT_FAILURE);
    }
    if( pc2 )
    {
        strcpy( pc2, pc );
    }
    return pc2;
}

How do I unfurl this, so as to do likewise?

/* allocate memory for a node */
struct tnode *tree_alloc( void )
{
    return malloc( sizeof( struct tnode ) );
}

I've been able to create trees big enough to make this a serious issue.
--
Frank

And by the way, a few months ago, I trademarked the word 'funny.' So when
Fox calls me 'unfunny,' they're violating my trademark. I am seriously
considering a countersuit.
~~ Al Franken, in response to Fox's copyright infringement lawsuit
 
Franken Sense

In Dread Ink, the Grave Hand of Keith Thompson Did Inscribe:
This test is redundant; if you reach this point, you know that pc2 is
non-null.

ok

char *dupstr( char *pc )
{
    char *pc2;
    pc2 = malloc( strlen(pc) + 1 );
    if (pc2 == NULL)
    {
        fputs("Malloc failed\n", stderr);
        exit(EXIT_FAILURE);
    }
    strcpy( pc2, pc );
    return pc2;
}

struct tnode *tree_alloc( void )
{
    struct tnode *result = malloc(sizeof *result);
    if (result == NULL) {
        fputs("Malloc failed\n", stderr);
        exit(EXIT_FAILURE);
    }
    return result;
}

Looks good. Let's say we were running this with very little available
memory as compared to the size of the tree. Which malloc would be more
likely to fail?

I would think that that would be the ratio of the sizeof tnode to the
sizeof a pointer.

Or tree_alloc could be left as it is, and the caller could be given
the responsibility of handling failures; likewise for dupstr. If
these functions are only ever going to be used as part of this
program, and your only strategy for handling allocation failures is to
abort immediately, then having the exit in the functions is ok, at
least for now. If they're going to become more general-purpose
library routines, though, they should return an error indication (such
as a null pointer) to the caller, and let the caller handle it.
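
The library-style alternative described above might look roughly like the following. This is only a sketch: store_word is a hypothetical caller invented for illustration, and the parameter is declared const char * here, which the posted code does not do.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Library-style version: reports failure by returning a null pointer
   and leaves the decision about what to do to the caller. */
char *dupstr(const char *pc)
{
    char *pc2 = malloc(strlen(pc) + 1);
    if (pc2 != NULL) {
        strcpy(pc2, pc);
    }
    return pc2;
}

/* Hypothetical caller: replaces *slot with a copy of word.
   Returns 0 on success, -1 if the copy could not be allocated;
   on failure the old contents of *slot are left untouched. */
int store_word(char **slot, const char *word)
{
    char *copy = dupstr(word);
    if (copy == NULL) {
        fputs("store_word: out of memory\n", stderr);
        return -1;
    }
    free(*slot);
    *slot = copy;
    return 0;
}
```

With this split, a program that wants to abort can still call exit() when store_word returns -1, while another program could skip the word and carry on.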

I like having it in the subs, because they are the things that I'm going to
re-use. It might be of only pedagogical value.

Earlier I mentioned a few things that can be done in response to an
out-of-memory failure: abort immediately, abort just this part of the
program, fall back to a less memory-intensive algorithm. I forgot
another case: clean up as much as possible and *then* abort. For
example, if your program is an editor, you might try to ensure that
all the user's data is written to disk before aborting.

Bottom line: error handling is hard.

I think I want my program to tell me where it happened, as opposed to
windows:

E:\gfortran\dan>gfortran tree8.f03 -Wall -o out

E:\gfortran\dan>out
sort size is 10000000
g1 g2 and d1 are 2705375 2728953 23578
g3 g4 and d2 are 2728953 2736031 7078
sort size is 10000000
g1 g2 and d1 are 2736421 2760078 23657
g3 g4 and d2 are 2760078 2767203 7125
sort size is 10000000
g1 g2 and d1 are 2767625 2792234 24609
g3 g4 and d2 are 2792250 2802093 9843
sort size is 10000000
g1 g2 and d1 are 2805515 2829812 24297
g3 g4 and d2 are 2829937 2837078 7141
sort size is 10000000
g1 g2 and d1 are 2837484 2861781 24297
g3 g4 and d2 are 2861859 2869046 7187
sort size is 10000000
g1 g2 and d1 are 2869453 2893906 24453
g3 g4 and d2 are 2893968 2901140 7172
sort size is 10000000
g1 g2 and d1 are 2901546 2926125 24579
g3 g4 and d2 are 2926187 2933359 7172
sort size is 10000000
g1 g2 and d1 are 2933812 2958968 25156
g3 g4 and d2 are 2959031 2966265 7234
sort size is 10000000
Operating system error: Bad file descriptor
Out of memory

E:\gfortran\dan>
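
For the C side, one way to have an allocation failure say where it happened is to pass the call site along with the request. The sketch below assumes a helper named xmalloc_at and a macro XMALLOC, neither of which appears in the thread, and uses a stand-in struct tnode because the real layout was never posted.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: behaves like malloc, but on failure prints the
   file and line of the request to stderr and exits. */
void *xmalloc_at(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "%s:%d: malloc of %lu bytes failed\n",
                file, line, (unsigned long)size);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* __FILE__ and __LINE__ expand where XMALLOC is used, not here. */
#define XMALLOC(size) xmalloc_at((size), __FILE__, __LINE__)

/* Stand-in node type; the fields are an assumption. */
struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

/* allocate memory for a node, reporting the call site on failure */
struct tnode *tree_alloc(void)
{
    struct tnode *result = XMALLOC(sizeof *result);
    return result;
}
```

The message then names the source line of the failing request, which is more than the bare "Out of memory" in the run above.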

Thanks for your comment, Keith. Cheers and watch out karaoke world,
 
Keith Thompson

Franken Sense said:
In Dread Ink, the Grave Hand of Keith Thompson Did Inscribe: [...]
struct tnode *tree_alloc( void )
{
    struct tnode *result = malloc(sizeof *result);
    if (result == NULL) {
        fputs("Malloc failed\n", stderr);
        exit(EXIT_FAILURE);
    }
    return result;
}

Looks good. Let's say we were running this with very little available
memory as compared to the size of the tree. Which malloc would be more
likely to fail?

It's impossible to say, and almost certainly not worth worrying about.

I would think that that would be the ratio of the sizeof tnode to the
sizeof a pointer.

In one place, you malloc space for a struct tnode; in another place,
you malloc space for a string. Nowhere, as far as I can tell, do you
malloc space for a single pointer.

But generally it's not about probabilities. It's more about getting
into the habit of checking *all* allocations for failure, because even
a one-byte allocation could fail -- and Murphy's law says that the one
allocation you don't check will be the one that erases your hard drive
after e-mailing a nasty letter to your boss.

[...]
 
Barry Schwarz

In Dread Ink, the Grave Hand of Barry Schwarz Did Inscribe:

snip

So there is one statement in that while loop?

snip

We now enter the realm of semantics which will probably cause this
thread to be hijacked for a month or so. A while loop always consists
of a "single statement." But there are numerous types of statements
and some statements contain other statements. See section 6.8 of the
standard.

In the code
    while (...)
        a = b;
the while loop consists of the single "simple" assignment statement.
In the code
    while (...)
    {
        a = b;
        c = d;
    }
the while loop consists of the single compound statement (which itself
consists of two statements). These two are the easy cases. You could
have
    while (...)
        if (...)
            for (....)
                ...
where the while loop consists of the single selection statement, the
range of the if is the single iteration statement, etc. (In cases
like this I would insist on braces.)

snip
Yeah, I think I look "poofy" with whitespace after a paren. Can you talk
through the control, because I have clearly not understood it for the
duration of the thread?

Sorry but what does "talk through the control" mean?
 
Keith Thompson

Barry Schwarz said:
We now enter the realm of semantics which will probably cause this
thread to be hijacked for a month or so. A while loop always consists
of a "single statement." But there are numerous types of statements
and some statements contain other statements. See section 6.8 of the
standard.

In the code
    while (...)
        a = b;
the while loop consists of the single "simple" assignment statement.
In the code
    while (...)
    {
        a = b;
        c = d;
    }
the while loop consists of the single compound statement (which itself
consists of two statements). These two are the easy cases. You could
have
    while (...)
        if (...)
            for (....)
                ...
where the while loop consists of the single selection statement, the
range of the if is the single iteration statement, etc. (In cases
like this I would insist on braces.)
[...]

I'm afraid I'm going to have to quibble about the wording.

A while loop (the standard uses the term "while statement", but it's
the same thing) *is* a single statement. It consists of the 'while'
keyword, a '(' character, an expression, a ')' character, and another
statement, in that order. More briefly, the syntax is
    while ( expression ) statement

The subsidiary statement can be any kind of statement: an expression
statement, a compound statement, even another while statement.

You said that in
    while (...)
        a = b;
the while loop consists of the single assignment statement. In fact,
it *contains* the assignment statement; it consists of everything from
the while keyword to the semicolon, inclusive.

Given:
    while (condition) {
        a = b;
        c = d;
    }
there are, by my count, 4 distinct statements: the while statement,
the compound statement, and the two expression statements.

Apart from that fairly minor quibble, you're spot on. A while loop is
a single statement, and the substatement after the ')' is also a
single statement; the latter may be arbitrarily complex, and may
contain other substatements as well.
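
The nesting can be seen in a compilable form. The function below is a made-up illustration (count_pairs is not from the thread): the body of the while is a single if statement, whose body is a single for statement, whose body is a single expression statement, with no braces anywhere.

```c
/* For each even i in 1..n, counts the values j with 0 <= j < i.
   Each loop or selection body here is exactly one statement. */
int count_pairs(int n)
{
    int i = 0, j, total = 0;
    while (i++ < n)
        if (i % 2 == 0)
            for (j = 0; j < i; j++)
                total++;
    return total;
}

/* The same logic with every substatement wrapped in a compound
   statement; the behaviour is identical. */
int count_pairs_braced(int n)
{
    int i = 0, j, total = 0;
    while (i++ < n) {
        if (i % 2 == 0) {
            for (j = 0; j < i; j++) {
                total++;
            }
        }
    }
    return total;
}
```

Both functions return the same value for any n; the braces change how many statements there are, not what the loop does.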
 
lawrence.jones

Barry Schwarz said:
In the code
    while (...)
        a = b;
the while loop consists of the single "simple" assignment statement.

The *body* of the while loop consists of the single simple assignment
statement. The entire while loop is itself a single statement,
regardless of how simple or complex the body.
 
jameskuyper

Franken said:
In Dread Ink, the Grave Hand of Barry Schwarz Did Inscribe:


I'm asking you (or anyone who understands it) to describe execution in a
typical case.

/* getline: get line into s, return length */
int getline(FILE *fp, char *s, int lim)
{
    char *p;
    int c;
    p = s;
    while (--lim > 0 && (c = getc(fp)) != EOF && c != '\n')
    {
        *p++ = c;
    }
    if (c == '\n')
    {
        *p++ = c;
    }
    *p = '\0';
    return p - s;
}

It looks like the while control continues until lim hits zero or an EOF or
\n is encountered.

Correct. That is the main part of the routine, the one that loads the
characters from the line into the buffer. It stops when lim runs out,
because you don't want to overrun the end of the buffer. It stops when
getc() returns EOF, which normally indicates either an I/O error or
end-of-file. Finally, if nothing else interrupts it, it stops upon
reaching the end of the line.

What is the purpose of the if control?

To cause the '\n' character to be copied into the buffer, even though
it terminates the loop.

What does this do?
*p = '\0';

I know that strings are to be null-terminated, but I can't see what's going
on here.

It null-terminates the string, which you apparently already know, so
you've already seen everything that there is to see. I'm not at all
sure of what it is that you think you can't see.
 
Franken Sense

In Dread Ink, the Grave Hand of Barry Schwarz Did Inscribe:
Sorry but what does "talk through the control" mean?

I'm asking you (or anyone who understands it) to describe execution in a
typical case.

/* getline: get line into s, return length */
int getline(FILE *fp, char *s, int lim)
{
    char *p;
    int c;
    p = s;
    while (--lim > 0 && (c = getc(fp)) != EOF && c != '\n')
    {
        *p++ = c;
    }
    if (c == '\n')
    {
        *p++ = c;
    }
    *p = '\0';
    return p - s;
}

It looks like the while control continues until lim hits zero or an EOF or
\n is encountered.

What is the purpose of the if control?

What does this do?
*p = '\0';

I know that strings are to be null-terminated, but I can't see what's going
on here. I've got the Tondo and Gimpel solns for K&R, and they do a good
job of "talking through" their code, but they're a little bit light on this
one, which is exercise 5-6.
--
Frank

[G. W. Bush's] pro-air pollution Clear Skies Initiative is designed to
clear the skies of birds.
~~ Al Franken,
 
Richard Bos

Franken Sense said:
This reminds me of the last time my mom tried to spank me.

You remind me of a certain George, who reminded me of a certain Merrill
and Michell, who in turn reminded me of another poster whose name
escapes me. Are you, too, going to lapse into German?

Richard
 
Franken Sense

In Dread Ink, the Grave Hand of jameskuyper Did Inscribe:
Correct. That is the main part of the routine, the one that loads the
characters from the line into the buffer. It stops when lim runs out,
because you don't want to overrun the end of the buffer. It stops when
getc() returns EOF, which normally indicates either an I/O error or
end-of-file. Finally, if nothing else interrupts it, it stops upon
reaching the end of the line.

I have to wonder aloud what Ben Bacarisse meant when he thought lim had to
be greater than one.

I seem to have found more bugs when I used a larger text. I used the Book
of Acts that I downloaded from gutenberg.org.

I'm not picking up the lines that don't begin with numbers.

input:
44:028:030 And Paul dwelt two whole years in his own hired house, and
received all that came in unto him,

44:028:031 Preaching the kingdom of God, and teaching those things which
concern the Lord Jesus Christ, with all confidence, no man
forbidding him.
output:
44:028:030 And Paul dwelt two whole years in his own hired house, and
-- 1
44:028:031 Preaching the kingdom of God, and teaching those things which
-- 1

I looked at it with a hex editor, and see nothing incriminating:

http://lomas-assault.net/usenet/z25.jpg

To cause the '\n' character to be copied into the buffer, even though
it terminates the loop.


It null-terminates the string, which you apparently already know, so
you've already seen everything that there is to see. I'm not at all
sure of what it is that you think you can't see.

Are these equivalent:

*p++=c;

and

*p=c;
p++;
--
Frank

Drug war, well, as Rush Limbaugh said, anyone who uses drugs illegally
should be prosecuted and put away. I don't agree with him; I think they
should be treated, but that's what Rush believes and so, you know, we're
praying for Rush because he's in recovery and you take responsibilities for
your actions so I'm sure any day now Rush will demand to be put away for
the maximum sentence and ask for the most dangerous prison and we'll be
praying for maybe an African American cellmate who saw the Donovan McNabb
comments on ESPN. So we're prayin'.
~~ Al Franken, Book TV, on Rush Limbaugh's illegal drug arrest and racist
remarks
 
Franken Sense

In Dread Ink, the Grave Hand of Franken Sense Did Inscribe:
I'm not picking up the lines that don't begin with numbers.

input:
44:028:030 And Paul dwelt two whole years in his own hired house, and
received all that came in unto him,

44:028:031 Preaching the kingdom of God, and teaching those things which
concern the Lord Jesus Christ, with all confidence, no man
forbidding him.
output:
44:028:030 And Paul dwelt two whole years in his own hired house, and
-- 1
44:028:031 Preaching the kingdom of God, and teaching those things which
-- 1

I looked at it with a hex editor, and see nothing incriminating:

http://lomas-assault.net/usenet/z25.jpg

I was only seeing the last bit of the sorted data. These are earlier on:

Jerusalem, saying, After I have been there, I must also see
-- 1
Jerusalem, so must thou bear witness also at Rome.
-- 1
Jerusalem.
-- 3
Jerusalem; and they were all scattered abroad throughout the
-- 1
Jesus Christ.
-- 1
Jesus began both to do and teach,
-- 1
Jesus of Nazareth, whom thou persecutest.

I wouldn't have guessed that Jerusalem would be one of only three words in
Acts that appears on its own line three times, and would outnumber mention
of Jesus Christ, but this time is actually pretty close to when the latter
appellation began to be used.

Bet no one can guess the other two.
--
Frank

No Child Left Behind is the most ironically named act, piece of legislation
since the 1942 Japanese Family Leave Act.
~~ Al Franken, in response to the 2004 SOTU address
 
Franken Sense

In Dread Ink, the Grave Hand of Richard Bos Did Inscribe:
You remind me of a certain George, who reminded me of a certain Merrill
and Michell, who in turn reminded me of another poster whose name
escapes me. Are you, too, going to lapse into German?

Richard

Wenn's mir paßt.

Bin ich nicht der einzige Tor, der Deutsch kann?
--
Frank

[G. W. Bush's] pro-air pollution Clear Skies Initiative is designed to
clear the skies of birds.
~~ Al Franken,
 
James Kuyper

Franken said:
In Dread Ink, the Grave Hand of jameskuyper Did Inscribe:


I have to wonder aloud what Ben Bacarisse meant when he thought lim had to
be greater than one.

I did not examine your code in detail looking for bugs. If lim is not
greater than 1, the while loop exits before c is ever assigned a value. c is
neither explicitly nor implicitly initialized, so its value is
indeterminate. This means that it could contain the representation of
any legal int value; but it could also contain a trap representation.
This means that the value of c is completely unpredictable, and in the
worst case, attempting to determine the value stored in c would render
the behavior of your program undefined - in principle, anything could
happen.

The very next statement to get executed would be the if() statement,
which attempts to retrieve the value stored in c. This is bad. That's
what Ben was referring to.

I seem to have found more bugs when I used a larger text. I used the book
of acts that I downloaded from gutenberg.org.

I'm not picking up the lines that don't begin with numbers.

input:
44:028:030 And Paul dwelt two whole years in his own hired house, and
received all that came in unto him,

44:028:031 Preaching the kingdom of God, and teaching those things which
concern the Lord Jesus Christ, with all confidence, no man
forbidding him.
output:
44:028:030 And Paul dwelt two whole years in his own hired house, and
-- 1
44:028:031 Preaching the kingdom of God, and teaching those things which
-- 1

I see no obvious reason for such behavior, but the reason might be in
the calling code. Could you show us what the calling code looks like?

....
Are these equivalent:

*p++=c;

and

*p=c;
p++;

Yes.
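
A small check of that equivalence, written as two hypothetical helpers (fill_combined and fill_split are names made up for this sketch) that differ only in how the store and the increment are written:

```c
/* Writes n copies of c into buf, then a terminating '\0'.
   buf must have room for at least n + 1 characters. */
void fill_combined(char *buf, int n, char c)
{
    char *p = buf;
    while (n-- > 0)
        *p++ = c;       /* store through the old p, then advance p */
    *p = '\0';
}

/* Same job with the store and the increment as separate statements. */
void fill_split(char *buf, int n, char c)
{
    char *p = buf;
    while (n-- > 0) {
        *p = c;
        p++;
    }
    *p = '\0';
}
```

Given the same arguments, both leave identical contents in buf. The one thing to watch is that the split form is two statements, so it needs braces wherever the combined form was the sole body of a loop or an if.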
 
Franken Sense

In Dread Ink, the Grave Hand of Han from China Did Inscribe:
OK, let's consider the pathological cases first, assuming
`lim' is nonnegative.

lim == 0 (pathological)
-----------------------
`--lim' sets `lim' to -1, the `> 0' test fails, and evaluation
of the `&&' chain gets short-circuited at that failure. Then
we proceed to the `if' test. Since `c' didn't get assigned anything
(because getc() wasn't executed), `c' has a garbage value. Interesting
garbage values include a trap representation (in which case, boom)
and a garbage value that is coincidentally a '\n' (in which case,
the buffer will have two possibly problematic writes of '\n'
followed by '\0' as opposed to one possibly problematic write
of a single '\0').

lim == 1 (pathological)
-----------------------
`--lim' sets `lim' to 0, the `> 0' test fails, and evaluation
of the `&&' chain gets short-circuited at that failure. Then
we proceed to the `if' test. Since `c' didn't get assigned anything
(because getc() wasn't executed), `c' has a garbage value. Interesting
garbage values include a trap representation (in which case, boom)
and a garbage value that is coincidentally a '\n' (in which case,
the buffer will have '\n' followed by a possibly overflowing '\0'
as opposed to a single, safe '\0').

lim >= 2
--------
The loop gets executed at most (lim - 1) times -- this is governed
by the `--lim > 0'. Here are the possibilities:
(a) No EOF or '\n' encountered, hence no early loop termination.
We'll have (lim - 1) characters written to the buffer, the
`if' test will fail, and then we'll have the '\0' added
in the last byte of the buffer.
(b) '\n' encountered, in which case the loop will terminate,
the `if' test will succeed (thereby adding the '\n' to
the buffer after the previously read characters, if any),
and then we'll have the '\0' added after the '\n' in the
buffer. Note that this is safe, for in the worst-case
scenario, the '\n' will be encountered on the final loop
iteration (lim - 1), but the loop will be terminated without
a write, so the write that occurs when the `if' test succeeds
will still be at byte (lim - 1), with the terminating '\0'
added in the last byte of the buffer.
(c) EOF encountered, in which case the loop will terminate,
the `if' test will fail, and then we'll have the '\0'
added after the previously read characters, if any.

As you can see, `lim' should be at least 2, though you can
get away with `lim' being at least 1 if you initialize
`c':

int c = 'X';


Yours,
Han from China

thx han.
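
Han's cases can be folded into the function itself. What follows is only a sketch under a few assumptions that are not in the posted code: the function is renamed get_line to avoid clashing with the POSIX getline, c starts out as EOF rather than 'X', and a lim below 1 returns 0 without touching the buffer.

```c
#include <stdio.h>

/* get_line: read at most lim - 1 characters of one line into s,
   keep the '\n' if it was read, terminate with '\0', and return the
   number of characters stored (not counting the '\0'). */
int get_line(FILE *fp, char *s, int lim)
{
    char *p = s;
    int c = EOF;        /* defined value even if the loop never runs */

    if (lim < 1)
        return 0;       /* no room even for the terminator */

    while (--lim > 0 && (c = getc(fp)) != EOF && c != '\n')
        *p++ = c;
    if (c == '\n')
        *p++ = c;       /* the loop stopped on '\n' without storing it */
    *p = '\0';
    return (int)(p - s);
}
```

With lim equal to 1 the loop body never runs, c is still EOF, and the only write is the '\0' at s[0], which matches the "single, safe '\0'" outcome in the lim == 1 case above.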
 
Kenny McCormack

Malcolm McLean said:
That reminds me of a Chinese man who visited Britain, back in Chairman Mao's
time.
He was invited to dinner, and told that, in Britain, everyone must wear the
same black dinner jacket and bow tie. Being used to this sort of
regimentation, he accepted.
"What about that chap?" he asked at the dinner, "why is he wearing a purple
dinner jacket with a blank bow tie?"

I don't get it...
 
Guest

That reminds me of a Chinese man who visited Britain, back in Chairman Mao's
time.
He was invited to dinner, and told that, in Britain, everyone must wear the
same black dinner jacket and bow tie. Being used to this sort of
regimentation, he accepted.
"What about that chap?" he asked at the dinner, "why is he wearing a purple
dinner jacket with a blank bow tie?"

what?
 
Richard Bos

Franken Sense said:
In Dread Ink, the Grave Hand of Richard Bos Did Inscribe:

Wenn's mir paßt.

Bin ich nicht der einzige Tor, der Deutsch kann?

Sicher nicht.

But thanks for confirming that you are still the same nutcase troll. It
would be even less to your disgrace, though, if you stopped nymshifting
every handful of months.

Richard
 
David Thompson

It looks to me more like it creates errors:

E:\gfortran\dan>gcc sosman1.c -W -Wall -ansi -pedantic -O2 -o out
In file included from sosman1.c:2:
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/stdlib.h:317: error: syntax error before "double"

That's a bug that was fixed in mingwrt-3.15.2 in January;
see #2010966 in the bugs tracker on sourceforge.

I think you said you get your mingw in a gfortran bundle.
If so, check with the packager(s) about including this fix/update.

sosman1.c:34: error: syntax error before '/' token
sosman1.c:40: error: syntax error before "while"
sosman1.c:43: error: syntax error before '+' token

Those don't occur for me in the code you posted in this thread.
 
