bulk flush input

Willem

Tad McClellan wrote:
) I almost called Willem on the "usually" part, until I re-read
) the "Context" section in
)
) perldoc perldata
)
) ...
)
) User-defined subroutines may choose to care whether they are being
) called in a void, scalar, or list context.
)
) void context always translates to scalar context for built-in functions.
)
) void context usually translates to scalar context for user-defined functions.
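The quoted point about user-defined subroutines can be illustrated with wantarray. This is only a minimal sketch: the sub name record_context and the @seen array are made up for the example and are not from perldata.

```perl
use strict;
use warnings;

my @seen;

# Hypothetical helper: records the context it was called in.
# wantarray is undef in void context, false in scalar, true in list.
sub record_context {
    push @seen, !defined(wantarray) ? 'void'
              : wantarray           ? 'list'
              :                       'scalar';
    return;
}

record_context();                 # void context
my $scalar = record_context();    # scalar context
my @list   = record_context();    # list context

print "@seen\n";
```

A sub that never consults wantarray simply evaluates its return expression in whatever context the caller supplies.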

What about map ?

AFAIK, when you call <map> in void context, it turns into a <for> internally.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Dr.Ruud

Willem said:
AFAIK, when you call <map> in void context, it turns into a <for> internally.

Don't think that.

For example, in some older versions of Perl, a map nested inside a map
releases its memory only in the outermost map, which can make things horrible.
 
Ilya Zakharevich

Hm, sounds like I need to look more closely...

So a humongous temp array gets built with only
the resulting assignment being optimized away...?

Hmm, IN PRINCIPLE, one could have coded recognition of this construct,
and would somehow advise pp_readline() that its output is going to be
ignored. However, given the frequency of this construct, I doubt this
was ever done.
perl -MO=Concise -e "()=<>"
8  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v:{ ->3
7     <2> aassign[t3] vKS ->8
-        <1> ex-list lK ->6
3           <0> pushmark s ->4
5           <1> readline[t2] lK/1 ->6
4              <#> gv[*ARGV] s ->5
-        <1> ex-list lK ->7
6           <0> pushmark s ->7
-           <0> stub lPRM* ->-

The only way I know to advise an OP is via flags. So one should
compare flags on the `readline' OP with those on "usual" list contents
readline. If they are identical, there is little chance that this
construct is memory-optimized. (But they may differ by "other
reasons" as well...)
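The comparison suggested here could be scripted roughly as follows. This is a sketch only: the helper name readline_ops and the choice of '@lines=<>' as the "usual" list-context case are assumptions, and the exact flags printed depend on the perl version.

```perl
use strict;
use warnings;

# Hypothetical helper: return the B::Concise line(s) for the readline op
# in a one-line program.
sub readline_ops {
    my ($code) = @_;
    # List-form pipe open avoids shell quoting; $^X is the running perl.
    open my $pipe, '-|', $^X, '-MO=Concise', '-e', $code
        or die "cannot run $^X: $!";
    my @ops = grep { /\breadline\b/ } <$pipe>;
    close $pipe;
    s/^\s+|\s+$//g for @ops;    # trim the tree indentation
    return @ops;
}

# Print the readline op for the discarded and the ordinary list case.
for my $code ('()=<>', '@lines=<>') {
    print "$code\n";
    print "    $_\n" for readline_ops($code);
}
```

If the two readline lines show the same flags, that fits the reasoning above that no special void-style handling is signalled to the op.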

Yours,
Ilya
 
Ilya Zakharevich

I don't know if <> is smart enough to recognize void context though,
can probably be tested with a large file and a memory checker tool.

I use void-context-<> all the time (to skip one line); but only with
defined $/.

I vaguely remember that about 10 years ago, I put some code to
optimize behaviour of pp_readline() in void context (or at least, had
a WISH to do so; no way to distinguish now, sigh). And, definitely,
about the same time I had the same problem as the OP: avoiding SIGPIPE
on the OTHER side of the pipe.

Putting 2 and 2 together, I MIGHT have put there
optimization-of-<>-with-undefined-$/-in-void-context. But no, I have
no memory of actually doing it. And I have strong doubts about
somebody else doing it as well...

So I think it is not wise to expect that the core of Perl would be
able to help with this problem. I would just do $/ = (1<<20), and do
a loop.
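A minimal sketch of such a loop follows. Note that fixed-size record reads need $/ set to a reference to an integer, \(1<<20), as corrected later in this thread. The sub name drain_input and the in-memory demonstration handle are assumptions made for the example.

```perl
use strict;
use warnings;

# Read and discard everything left on $fh in fixed-size records.
# Returns the number of bytes thrown away.
sub drain_input {
    my ($fh, $record_size) = @_;
    $record_size ||= 1 << 20;    # assumed default: 1 MB per read

    # A REFERENCE to an integer selects fixed-size records; a plain
    # integer would be used as a separator string instead.
    local $/ = \$record_size;

    my $discarded = 0;
    while (defined(my $chunk = readline $fh)) {
        $discarded += length $chunk;
    }
    return $discarded;
}

# Demonstration on an in-memory handle; a real program would call
# drain_input(\*STDIN) once the interesting line has been read.
my $input = "first line\n" . ("junk\n" x 1000);
open my $fh, '<', \$input or die "open: $!";
my $first = <$fh>;
my $bytes = drain_input($fh, 4096);
print "kept: $first", "discarded $bytes bytes\n";
```

Only one record is held in memory at a time, so the record size bounds the memory cost of the flush.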

Hope this helps,
Ilya
 
Ilya Zakharevich

) void context always translates to scalar context for built-in functions.
)
) void context usually translates to scalar context for user-defined functions.
What about map ?
AFAIK, when you call <map> in void context, it turns into a <for> internally.

Irrelevant: SIDE EFFECTS of map in scalar and list context are the same.
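A small sketch of that point: the block runs once per element whichever context map is called in. The counter $calls and the sample list are made up for the illustration.

```perl
use strict;
use warnings;

my $calls = 0;
my @in    = (1 .. 5);

map { $calls++; $_ * 2 } @in;                # void context
my $count = map { $calls++; $_ * 2 } @in;    # scalar context: element count
my @out   = map { $calls++; $_ * 2 } @in;    # list context: the results

print "block ran $calls times; scalar map returned $count\n";
```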

Hope this helps,
Ilya
 
John Kelly

I had the same problem as the OP: avoiding SIGPIPE
on the OTHER side of the pipe.
So I think it is not wise to expect that the core of Perl would be
able to help with this problem. I would just do $/ = (1<<20), and do
a loop.

I went with the loop.

(1<<20) is 1 MB of memory. (1<<15) may run nearly as fast on a large
file (untested).
 
C.DeRykus

Hm, sounds like I need to look more closely...
So a humongous temp array gets built with only
the resulting  assignment being optimized away...?

Hmm, IN PRINCIPLE, one could have coded recognition of this construct,
and would somehow advise pp_readline() that its output is going to be
ignored.  However, given the frequency of this construct, I doubt this
was ever done.
perl -MO=Concise -e "()=<>"
8  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v:{ ->3
7     <2> aassign[t3] vKS ->8
-        <1> ex-list lK ->6
3           <0> pushmark s ->4
5           <1> readline[t2] lK/1 ->6
4              <#> gv[*ARGV] s ->5
-        <1> ex-list lK ->7
6           <0> pushmark s ->7
-           <0> stub lPRM* ->-

The only way I know to advise an OP is via flags.  So one should
compare flags on the `readline' OP with those on "usual" list contents
readline.  If they are identical, there is little chance that this
construct is memory-optimized.  (But they may differ by "other
reasons" as well...)

Thanks for all the explanations Ilya. As you mention,
this is infrequent. Probably a ton of work to fix too.
 
Peter Makholm

John Kelly said:
This code reads STDIN and remembers the first non-empty line. That's
all it cares about.

But it also keeps reading till EOF, acting like the "cat" utility, to
flush the extra input and avoid broken pipe errors.

But reading line by line, just to throw away the unwanted garbage, is
inefficient. I would like to jump out of the loop and "bulk flush" the
remaining input stream.

If your input stream is a terminal on a POSIX system, you can use the
tcflush() function.

    tcflush(0, TCIFLUSH)
        or warn "Couldn't flush stdin: $!";

This of course only works if stdin is a terminal and not a pipe from
some other program. It might not work on non-POSIX systems, and it
might not work for your specific need.

This works for me:

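#!/usr/bin/perl

use strict;
use warnings;

use POSIX;    # provides tcflush() and TCIFLUSH

my $data = '';

sleep 5;      # pause before reading (gives time to type ahead)

# Remember the first non-empty line.
while (<>) {
    chomp;
    /^\s*$/ and next;
    $data = $_;
    print "data=\"$data\"\n";
    last;
}

# Discard terminal input that has been received but not yet read.
tcflush(0, TCIFLUSH)
    or warn "Couldn't flush stdin: $!";

<> or die "1 EOF\n";
<> or die "2 EOF\n";
<> or die "3 EOF\n";
<> or die "4 EOF\n";
<> or die "5 EOF\n";
<> or die "6 EOF\n";
<> or die "7 EOF\n";

__END__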
 
Ilya Zakharevich

\(1<<20), of course
I went with the loop.

(1<<20) is 1 MB of memory. (1<<15) may run nearly as fast on a large
file (untested).

AFAIU, we are talking about one context switch (plus small change) per
N input characters. I do not know the price of context switch on
current hardware; 20 years ago it was about 200-300 cycles, and I
suspect it is more today.

Assuming 1cycle/char to generate the pipe output, and
1000cycles/switch, the slowdown of (1<<15) would be noticeable (3%).
These assumptions are plausible, but on the "pessimistic side"; so you
might be right. Nevertheless, today the price of memory is not large;
this is why I had chosen \(1<<20).
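The arithmetic behind the 3% figure can be worked out as below. The numbers are just the assumptions stated above (1 cycle per character, 1000 cycles per switch); the sub name overhead_percent is made up for the sketch.

```perl
use strict;
use warnings;

my $cycles_per_char   = 1;       # assumed cost to generate one char
my $cycles_per_switch = 1000;    # assumed cost of one context switch

# Extra time spent on one switch per record, as a percentage of the
# time spent producing the record itself.
sub overhead_percent {
    my ($record_size) = @_;
    return 100 * $cycles_per_switch / ($record_size * $cycles_per_char);
}

for my $shift (12, 15, 20) {
    printf "record size 1<<%d: %.3f%% overhead\n",
        $shift, overhead_percent(1 << $shift);
}
```

Under these assumptions 1<<15 costs about 3% and 1<<20 about 0.1%, while 4096-byte records would cost roughly 24%.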

Hope this helps,
Ilya
 
John Kelly

\(1<<20), of course


AFAIU, we are talking about one context switch (plus small change) per
N input characters. I do not know the price of context switch on
current hardware; 20 years ago it was about 200-300 cycles, and I
suspect it is more today.

Assuming 1cycle/char to generate the pipe output, and
1000cycles/switch, the slowdown of (1<<15) would be noticeable (3%).

OK. But a 3% difference seems small to me.

These assumptions are plausible, but on the "pessimistic side"; so you
might be right. Nevertheless, today the price of memory is not large

A gig of RAM on my PC is cheap. But I still like to conserve it. Just
old fashioned I guess.
 
John Kelly

exec a fork() of cat >/dev/null?

I sometimes use that idea in shell scripts, though at the cost of an
extra pid.

But I'm not sure if it would be more efficient than a perl loop with a
fixed buffer size.
 
Tim McDaniel

I read the newsgroup only occasionally, so I'm sorry to get to this
late.

I decided to use 4096 too. I also replaced the "length" test with a
regex, to ignore lines containing only superfluous whitespace, prior to
the first line of data:

/^\s*$/ and next;

I suspect

/\S/ or next;

is more efficient. Or at least I can't see how it can be less
efficient.
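One way to check both the equivalence and the speed is sketched below, using the core Benchmark module. The sample lines and the iteration count are arbitrary choices, and timings will vary by perl version and input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Arbitrary sample: three blank-ish lines and three with data.
my @lines = ("\n", "   \t\n", "", "data\n", "  data  \n", "x");

# Both tests should classify every line the same way.
for my $line (@lines) {
    my $blank_anchored = $line =~ /^\s*$/ ? 1 : 0;
    my $blank_negated  = $line =~ /\S/    ? 0 : 1;
    die "tests disagree on a line" unless $blank_anchored == $blank_negated;
}

# Compare the two forms; each sub counts the lines it would skip.
cmpthese(200_000, {
    'anchored /^\s*$/' => sub { my $n = grep { /^\s*$/ } @lines },
    'negated  /\S/'    => sub { my $n = grep { !/\S/ }  @lines },
});
```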
 
