bulk flush input

Willem · Jun 27, 2010

Tad McClellan wrote:
) I almost called Willem on the "usually" part, until I re-read
) the "Context" section in
)
) perldoc perldata
)
) ...
)
) User-defined subroutines may choose to care whether they are being
) called in a void, scalar, or list context.
)
) void context always translates to scalar context for built-in functions.
)
) void context usually translates to scalar context for user-defined functions.
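
To make the quoted perldata passage concrete, here is a minimal sketch of a user-defined subroutine that cares about its calling context. It relies only on the built-in wantarray; the names report_context and @seen are made up for this illustration.

```perl
use strict;
use warnings;

my @seen;

# Hypothetical helper: record the context this sub was called in.
# wantarray returns undef in void context, false in scalar context,
# and true in list context.
sub report_context {
    my $want = wantarray;
    push @seen, !defined $want ? 'void' : $want ? 'list' : 'scalar';
    return;
}

report_context();                 # void context
my $scalar = report_context();    # scalar context
my @list   = report_context();    # list context

print "@seen\n";
```

A built-in has no such hook to consult, which is why the thread goes on to ask what map and <> actually do when their result is thrown away.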

What about map ?

AFAIK, when you call <map> in void context, it turns into a <for> internally.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Dr.Ruud · Jun 27, 2010

Willem said:
AFAIK, when you call <map> in void context, it turns into a <for> internally.

Don't think that.

For example, in some older versions of Perl, a map inside a map only
releases memory in the outermost map, which can make things horrible.

Ilya Zakharevich · Jun 28, 2010

Hm, sounds like I need to look more closely...

So a humongous temp array gets built with only
the resulting assignment being optimized away...?

Hmm, IN PRINCIPLE, one could have coded recognition of this construct,
and would somehow advise pp_readline() that its output is going to be
ignored. However, given the frequency of this construct, I doubt this
was ever done.

perl -MO=Concise -e "()=<>"
8  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v:{ ->3
7     <2> aassign[t3] vKS ->8
-        <1> ex-list lK ->6
3           <0> pushmark s ->4
5           <1> readline[t2] lK/1 ->6
4              <#> gv[*ARGV] s ->5
-        <1> ex-list lK ->7
6           <0> pushmark s ->7
-           <0> stub lPRM* ->-

The only way I know to advise an OP is via flags. So one should
compare flags on the `readline' OP with those on "usual" list contents
readline. If they are identical, there is little chance that this
construct is memory-optimized. (But they may differ by "other
reasons" as well...)

Yours,
Ilya

Ilya Zakharevich · Jun 28, 2010

I don't know if <> is smart enough to recognize void context though,
can probably be tested with a large file and a memory checker tool.

I use void-context-<> all the time (to skip one line); but only with
defined $/.

I vaguely remember that about 10 years ago, I put some code to
optimize behaviour of pp_readline() in void context (or at least, had
a WISH to do so; no way to distinguish now, sigh). And, definitely,
about the same time I had the same problem as the OP: avoiding SIGPIPE
on the OTHER side of the pipe.

Putting 2 and 2 together, I MIGHT have put there
optimization-of-<>-with-undefined-$/-in-void-context. But no, I have
no memory of actually doing it. And I have strong doubts about
somebody else doing it as well...

So I think it is not wise to expect that the core of Perl would be
able to help with this problem. I would just do $/ = (1<<20), and do
a loop.

Hope this helps,
Ilya
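
A minimal sketch of the "set $/ and do a loop" idea, for readers who want to try it. It assumes the fixed-record form of $/, which needs a reference to an integer (written \(1<<20)) rather than the bare number. The name drain_input and its parameters are hypothetical, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: read and discard everything left on a filehandle
# in fixed-size records instead of line by line.
# Returns the number of characters thrown away.
sub drain_input {
    my ($fh, $record_size) = @_;

    # A reference to an integer puts readline into fixed-record mode;
    # local restores the previous $/ when the sub returns.
    local $/ = \$record_size;

    my $discarded = 0;
    while (defined(my $chunk = readline $fh)) {
        $discarded += length $chunk;
    }
    return $discarded;
}
```

After the first interesting line has been read, a call such as drain_input(\*STDIN, 1 << 20) would consume the rest of the pipe so the writer on the other side does not get SIGPIPE.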

Ilya Zakharevich · Jun 28, 2010

) void context always translates to scalar context for built-in functions.
)
) void context usually translates to scalar context for user-defined functions.

What about map ?

AFAIK, when you call <map> in void context, it turns into a <for> internally.

Irrelevant: SIDE EFFECTS of map in scalar and list context are the same.

Hope this helps,
Ilya

John Kelly · Jun 28, 2010

I had the same problem as the OP: avoiding SIGPIPE
on the OTHER side of the pipe.

So I think it is not wise to expect that the core of Perl would be
able to help with this problem. I would just do $/ = (1<<20), and do
a loop.

I went with the loop.

(1<<20) is 1 meg of memory. (1<<15) may run nearly as fast on a large
file (untested).

C.DeRykus · Jun 28, 2010

Hm, sounds like I need to look more closely...

So a humongous temp array gets built with only
the resulting assignment being optimized away...?

Hmm, IN PRINCIPLE, one could have coded recognition of this construct,
and would somehow advise pp_readline() that its output is going to be
ignored. However, given the frequency of this construct, I doubt this
was ever done.

perl -MO=Concise -e "()=<>"
8  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v:{ ->3
7     <2> aassign[t3] vKS ->8
-        <1> ex-list lK ->6
3           <0> pushmark s ->4
5           <1> readline[t2] lK/1 ->6
4              <#> gv[*ARGV] s ->5
-        <1> ex-list lK ->7
6           <0> pushmark s ->7
-           <0> stub lPRM* ->-

The only way I know to advise an OP is via flags. So one should
compare flags on the `readline' OP with those on "usual" list contents
readline. If they are identical, there is little chance that this
construct is memory-optimized. (But they may differ by "other
reasons" as well...)

Thanks for all the explanations, Ilya. As you mention,
this is infrequent. Probably a ton of work to fix too.

Peter Makholm · Jun 28, 2010

John Kelly said:
This code reads STDIN and remembers the first non-empty line. That's
all it cares about.

But it also keeps reading till EOF, acting like the "cat" utility, to
flush the extra input and avoid broken pipe errors.

But reading line by line, just to throw away the unwanted garbage, is
inefficient. I would like to jump out of the loop and "bulk flush" the
remaining input stream.

If your input stream is a terminal on a POSIX system, you can use the
tcflush() function.

tcflush(0, TCIFLUSH)
    or warn "Couldn't flush stdin: $!";

This of course only works if stdin is a terminal and not a pipe from
some other program. It might not work on non-POSIX systems, and it
might not work for your specific need.

This works for me:

#!/usr/bin/perl

use strict;
use warnings;

use POSIX;

my $data = '';

sleep 5;

while (<>) {
    chomp;
    /^\s*$/ and next;
    $data = $_;
    print "data=\"$data\"\n";
    last;
}

tcflush(0, TCIFLUSH)
    or warn "Couldn't flush stdin: $!";

<> or die "1 EOF\n";
<> or die "2 EOF\n";
<> or die "3 EOF\n";
<> or die "4 EOF\n";
<> or die "5 EOF\n";
<> or die "6 EOF\n";
<> or die "7 EOF\n";

__END__

Ilya Zakharevich · Jun 28, 2010

\(1<<20), of course

I went with the loop.

(1<<20) is 1 meg of memory. (1<<15) may run nearly as fast on a large
file (untested).

AFAIU, we are talking about one context switch (plus small change) per
N input characters. I do not know the price of context switch on
current hardware; 20 years ago it was about 200-300 cycles, and I
suspect it is more today.

Assuming 1 cycle/char to generate the pipe output, and
1000 cycles/switch, the slowdown of (1<<15) would be noticeable (3%).
These assumptions are plausible, but on the "pessimistic side"; so you
might be right. Nevertheless, today the price of memory is not large;
this is why I had chosen \(1<<20).

Hope this helps,
Ilya
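
The 3% figure can be checked with a few lines of arithmetic. This is only a sketch under the assumptions stated in the post (1 cycle per character, 1000 cycles per context switch, one switch per record); switch_overhead is a hypothetical name, and the 1<<12 row is added because 4096 also comes up as a buffer size in this thread.

```perl
use strict;
use warnings;

# Fraction of extra time spent on context switches when the reader
# consumes $record_size characters per switch. All inputs are assumed
# values, not measurements.
sub switch_overhead {
    my ($record_size, $cycles_per_switch, $cycles_per_char) = @_;
    return $cycles_per_switch / ($record_size * $cycles_per_char);
}

for my $shift (12, 15, 20) {
    printf "1<<%-2d  %6.2f%%\n",
        $shift, 100 * switch_overhead(1 << $shift, 1000, 1);
}
```

Under these assumptions (1<<15) costs about 3% and (1<<20) about 0.1%, so the choice comes down to whether a megabyte of buffer is worth the last few percent.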

John Kelly · Jun 28, 2010

\(1<<20), of course

AFAIU, we are talking about one context switch (plus small change) per
N input characters. I do not know the price of context switch on
current hardware; 20 years ago it was about 200-300 cycles, and I
suspect it is more today.

Assuming 1 cycle/char to generate the pipe output, and
1000 cycles/switch, the slowdown of (1<<15) would be noticeable (3%).

OK. But a 3% difference seems small to me.

These assumptions are plausible, but on the "pessimistic side"; so you
might be right. Nevertheless, today the price of memory is not large

A gig of RAM on my PC is cheap. But I still like to conserve it. Just
old fashioned I guess.

John Kelly · Jun 28, 2010

exec a fork() of cat >/dev/null?

I sometimes use that idea in shell scripts, though at the cost of an
extra pid.

But I'm not sure if it would be more efficient than a perl loop with a
fixed buffer size.

Tim McDaniel · Aug 15, 2010

I read the newsgroup only occasionally, so I'm sorry to get to this
late.

I decided to use 4096 too. I also replaced the "length" test with a
regex, to ignore lines containing only superfluous whitespace, prior to
the first line of data:

/^\s*$/ and next;

I suspect

/\S/ or next;

is more efficient. Or at least I can't see how it can be less
efficient.
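
A quick sketch showing that the two tests skip exactly the same lines, whatever their relative speed turns out to be. The sub names and sample lines are made up for illustration; no timing is claimed here.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two tests from the thread.
sub is_blank_anchored { return $_[0] =~ /^\s*$/ ? 1 : 0 }   # /^\s*$/ and next;
sub is_blank_negated  { return $_[0] =~ /\S/    ? 0 : 1 }   # /\S/ or next;

my @samples = ("", "\n", "   \n", "\t \t\n", "data\n", "  data  \n");

for my $line (@samples) {
    my ($first, $second) = (is_blank_anchored($line), is_blank_negated($line));
    die "tests disagree on a sample line\n" unless $first == $second;
}
print "both tests agree on all samples\n";
```

A line is all whitespace exactly when it contains no non-whitespace character, so the two forms are interchangeable in the reading loop. The /\S/ form can stop at the first non-blank character instead of scanning to the end of the line.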
