bulk flush input

John Kelly · Jun 26, 2010

#!/usr/bin/perl

use strict;
use warnings;

my $data;

while (<>) {
    chomp;
    if (!$data && $_) {
        $data = $_;
    }
}

print "data=$data\n";

This code reads STDIN and remembers the first non-empty line. That's
all it cares about.

But it also keeps reading till EOF, acting like the "cat" utility, to
flush the extra input and avoid broken pipe errors.

But reading line by line, just to throw away the unwanted garbage, is
inefficient. I would like to jump out of the loop and "bulk flush" the
remaining input stream.

I don't think

$io->flush

will work. The documentation says:

flush causes perl to flush any buffered data at the perlio
api level. Any unread data in the buffer will be discarded,

So it flushes only the buffer, not the entire input stream. I want to
flush the whole file. And keep in mind, I don't want a broken pipe
either.

Suggestions?

John Kelly · Jun 26, 2010

No, it remembers the first line that doesn't evaluate to boolean false.
Since you are chomping the lines, a line containing only "0" will be
considered 'empty'. You want to check length $data.
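
A minimal sketch of the difference between testing truth and testing length. The helper name first_nonempty() and the in-memory filehandle are assumptions made for illustration; they are not from the posted code.

```perl
use strict;
use warnings;

# Return the first line that is non-empty after chomp.
# first_nonempty() is a hypothetical helper, not code from the thread.
sub first_nonempty {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;   # "0" has length 1, so it is kept
        return $line;
    }
    return;    # no non-empty line found
}

# An in-memory filehandle stands in for STDIN.
my $input = "\n0\nsecond\n";
open my $fh, '<', \$input or die "open: $!";
my $first = first_nonempty($fh);
print "data=$first\n";    # prints data=0
```

A plain truth test on the chomped line would skip the "0" line and pick "second" instead.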

Yeah, shot myself in the foot again.

my $data;
while (<>) {
    chomp;
    length or next;
    $data = $_;
    last;
}
{
    local $/ = \2048;
    1 while <>;
}

Ben

That looks interesting, I get the first part, the rest will give me
something to chew on ...

Thanks.
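
For anyone else chewing on the second block: setting $/ to a reference to an integer makes readline return fixed-size records instead of lines. The sketch below shows the chunking on an in-memory filehandle; chunk_sizes() is a hypothetical helper, and the 10,000-byte input is made up for the demonstration.

```perl
use strict;
use warnings;

# Drain a handle in fixed-size records and return the size of each chunk read.
# chunk_sizes() is a hypothetical helper for illustration.
sub chunk_sizes {
    my ($fh, $size) = @_;
    local $/ = \$size;    # a reference to an integer selects record reads
    my @sizes;
    while (defined(my $chunk = <$fh>)) {
        push @sizes, length $chunk;
    }
    return @sizes;
}

my $input = "x" x 10_000;
open my $fh, '<', \$input or die "open: $!";
my @sizes = chunk_sizes($fh, 4096);
print "@sizes\n";    # prints 4096 4096 1808
```

Only one record is held in memory at a time, and the final record is simply shorter when the input is not a multiple of the record size.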

C.DeRykus · Jun 26, 2010

Yeah, shot myself in the foot again.

That looks interesting, I get the first part, the rest will give me
something to chew on ...

But, this'll just read/toss 2048 byte chunks
till EOF. Alternatively, if you want to toss
the entire stream after exiting the loop, a
single statement:

<>;

does what you want due to the list context. I
suspect there's little to gain by changing $/.

C.DeRykus · Jun 26, 2010

But, this'll just read/toss 2048 byte chunks
till EOF. Alternatively, if you want to toss
the entire stream after exiting the loop, a
single statement:

<>;

^^^^^^^

() = <>;

Ilya Zakharevich · Jun 26, 2010

() = <>;

Try to do it with a terabyte file...

Yours,
Ilya

John Kelly · Jun 26, 2010

But, this'll just read/toss 2048 byte chunks
till EOF. Alternatively, if you want to toss
the entire stream after exiting the loop, a
single statement:

<>;

does what you want due to the list context. I
suspect there's little to gain by changing $/.

That doesn't look like list context to me. Testing a small data set
containing:

one
two

three

four

with this code:

-----------------------

#!/usr/bin/perl

use strict;
use warnings;

my $data = '';

while (<>) {
    chomp;
    /^\s*$/ and next;
    $data = $_;
    print "data=\"$data\"\n";
    last;
}

<> or die "1 EOF\n";
<> or die "2 EOF\n";
<> or die "3 EOF\n";
<> or die "4 EOF\n";
<> or die "5 EOF\n";
<> or die "6 EOF\n";
<> or die "7 EOF\n";
<> or die "8 EOF\n";
<> or die "9 EOF\n";

close STDIN;

--------------------------------

produces output:

data="one"
6 EOF

Which seems to prove that a bare <> is scalar context.

C.DeRykus · Jun 26, 2010

Try to do it with a terabyte file...

Hm, sounds like I need to look more closely...

So a humongous temp array gets built with only
the resulting assignment being optimized away...?

--
Charles DeRykus

perl -MO=Concise -e "()=<>"
8 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
7 <2> aassign[t3] vKS ->8
- <1> ex-list lK ->6
3 <0> pushmark s ->4
5 <1> readline[t2] lK/1 ->6
4 <#> gv[*ARGV] s ->5
- <1> ex-list lK ->7
6 <0> pushmark s ->7
- <0> stub lPRM* ->-

John Kelly · Jun 26, 2010

Try to do it with a terabyte file...

Aim gun at foot. Pull trigger. Not my concept of fun.

John Kelly · Jun 26, 2010

Thanks for the ideas, wrong ones too. They taught me more Perl, and got
me pointed in the right direction.

Willem · Jun 26, 2010

John Kelly wrote:
<snip>
)<> or die "1 EOF\n";
)<> or die "2 EOF\n";
)<> or die "3 EOF\n";
)<> or die "4 EOF\n";
)<> or die "5 EOF\n";
)<> or die "6 EOF\n";
)<> or die "7 EOF\n";
)<> or die "8 EOF\n";
)<> or die "9 EOF\n";
<snip>
) produces output:
)
)>data="one"
)>6 EOF
)
)
) Which seems to prove that a bare <> is scalar context.

No, it proves that the left-hand side of 'or' has scalar context.

I'm pretty sure that a bare <> has void context, (which usually
translates to scalar context).
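
A small sketch of how context decides the number of lines readline consumes, using an in-memory filehandle in place of STDIN. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $input = "one\ntwo\nthree\nfour\n";
open my $fh, '<', \$input or die "open: $!";

<$fh> or die "unexpected EOF\n";   # operand of 'or': scalar context, reads "one"
my $next = <$fh>;                  # scalar assignment: reads "two"
my @rest = <$fh>;                  # list context: reads everything left

chomp($next, @rest);
print "next=$next rest=@rest\n";   # prints next=two rest=three four
```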

It is really a lot faster to change the line separator.
Setting it to \2048 means that it will always read that many bytes,
and undef'ing it would mean it will read the whole rest of the file.
I don't know if <> is smart enough to recognize void context though,
can probably be tested with a large file and a memory checker tool.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

John Kelly · Jun 26, 2010

John Kelly wrote:
<snip>
)<> or die "1 EOF\n";
)<> or die "2 EOF\n";
)<> or die "3 EOF\n";
)<> or die "4 EOF\n";
)<> or die "5 EOF\n";
)<> or die "6 EOF\n";
)<> or die "7 EOF\n";
)<> or die "8 EOF\n";
)<> or die "9 EOF\n";
<snip>
) produces output:
)
)>data="one"
)>6 EOF
)
)
) Which seems to prove that a bare <> is scalar context.

No, it proves that the left-hand side of 'or' has scalar context.

Maybe. But when I do this:

<>;
<>;
<> or die "3 EOF\n";
<> or die "4 EOF\n";
<> or die "5 EOF\n";
<> or die "6 EOF\n";
<> or die "7 EOF\n";
<> or die "8 EOF\n";
<> or die "9 EOF\n";

The output is the same, showing the first two <> diamonds read only one line each.

I'm pretty sure that a bare <> has void context, (which usually
translates to scalar context).

From ISBN 0-596-00027-8:
2.7.3. Void Context
Another peculiar kind of scalar context is the void context. This
context not only doesn't care what the return value's type is, it
doesn't even want a return value. From the standpoint of how functions
work, it's no different from an ordinary scalar context.

It is really a lot faster to change the line separator.

Setting it to \2048 means that it will always read that many bytes,

Right. That's what I decided to use, although 4096.

and undef'ing it would mean it will read the whole rest of the file.

Which I do NOT want.

I don't know if <> is smart enough to recognize void context though,
can probably be tested with a large file and a memory checker tool.

No need for the big hammer, considering the text of 2.7.3, and my test
code.

C.DeRykus · Jun 26, 2010

John Kelly wrote:

<snip>
)<> or die "1 EOF\n";
)<> or die "2 EOF\n";
)<> or die "3 EOF\n";
)<> or die "4 EOF\n";
)<> or die "5 EOF\n";
)<> or die "6 EOF\n";
)<> or die "7 EOF\n";
)<> or die "8 EOF\n";
)<> or die "9 EOF\n";
<snip>
) produces output:
)
)>data="one"
)>6 EOF
)
)
) Which seems to prove that a bare <> is scalar context.

No, it proves that the left-hand side of 'or' has scalar context.

I'm pretty sure that a bare <> has void context, (which usually
translates to scalar context).

Yes, bare <> does have void context:

perl -we '$SIG{INT}=sub{exit};undef $/; <> ; END{print}'
foo
bar
^CUse of uninitialized value in print at -e line 1.

whereas, with just scalar context:

perl -we '$SIG{INT}=sub{exit};undef $/; 1 while <> ; END{print}'
foo
bar
^Cfoo

It is really a lot faster to change the line separator.
Setting it to \2048 means that it will always read that many bytes,
and undef'ing it would mean it will read the whole rest of the file.
I don't know if <> is smart enough to recognize void context though,
can probably be tested with a large file and a memory checker tool.

So this'd be very fast.

undef $/;
<>;

Unfortunately, like () = <>, there's potentially
grave impact to a foot:

perl -we '@ARGV=("big.txt"); undef $/; <>'
Out of memory during "large" request ...
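
One way to keep memory bounded without touching $/ is a loop around the built-in read(). This is only a sketch: drain() and its default buffer size are assumptions for illustration, and the in-memory filehandle stands in for a real pipe.

```perl
use strict;
use warnings;

# Discard everything left on a handle, holding at most $bufsize bytes at once.
# drain() and its default buffer size are assumptions for illustration.
sub drain {
    my ($fh, $bufsize) = @_;
    $bufsize ||= 65_536;
    my $discarded = 0;
    while (1) {
        my $n = read($fh, my $buf, $bufsize);
        die "read failed: $!" unless defined $n;
        last if $n == 0;        # EOF
        $discarded += $n;
    }
    return $discarded;
}

my $input = "wanted\n" . ("junk\n" x 50_000);
open my $fh, '<', \$input or die "open: $!";
my $first = <$fh>;
my $discarded = drain($fh, 4096);
print "discarded $discarded bytes\n";   # prints discarded 250000 bytes
```

Because read() goes through the same PerlIO buffer as <>, it can safely follow a line read on the same handle.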

Peter J. Holzer · Jun 26, 2010

What is ISBN 0-596-00027-8?

http://oreilly.com/catalog/9780596000271/

hp

John Kelly · Jun 26, 2010

There is no "bare" <>.

The <> in

<>;

is in a statement.

The statement provides the context for <> (or for whatever else is
in the stmt).

I meant "bare" without a while or foreach which forcibly provide
context.

What is ISBN 0-596-00027-8?

Programming Perl 3rd edition.

Why do you quote that?
That is, what point are you trying to make by quoting that?

It says void context is a "kind of scalar context." So according to the
author, void == scalar for purposes of knowing whether <> reads one line
or many.

I almost called Willem on the "usually" part, until I re-read
the "Context" section in

perldoc perldata

...

User-defined subroutines may choose to care whether they are being
called in a void, scalar, or list context.

void context always translates to scalar context for built-in functions.

void context usually translates to scalar context for user-defined functions.
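
A user-defined subroutine can tell the three contexts apart with the built-in wantarray, which returns undef in void context, false in scalar context, and true in list context. A minimal sketch, with context() and @seen as hypothetical names:

```perl
use strict;
use warnings;

# Report the context this sub was called in, using the built-in wantarray.
# The @seen log and context() itself are hypothetical names.
my @seen;
sub context {
    my $want = wantarray;
    push @seen, !defined $want ? 'void' : $want ? 'list' : 'scalar';
    return 1;
}

context();                 # statement on its own: void
my $s = context();         # scalar assignment: scalar
my @l = context();         # array assignment: list
context() or die;          # operand of 'or': scalar

print "@seen\n";           # prints void scalar list scalar
```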

So the nameless book you quote above is in error.

I see it gets deep.

It should have qualified "functions":

From the standpoint of how built-in functions work, ...

That's done easily enough with a simple one-liner:

bash-4.0$ perl -we '<>'
foo
bash-4.0$

Since the program exited after only one input line, then the
readline() must have been in scalar context, else it would have
waited for more input rather than exit()ing. As we can see with:

perl -we 'print <>'

Note that here print() provides the (list) context for <>, and that
the statement provides the (void) context for print().

Since it is a built-in, void context is treated the same as scalar context.

My stupid test verified that.

Dr.Ruud · Jun 26, 2010

Ben said:
my $data;
while (<>) {
    chomp;
    length or next;
    $data = $_;
    last;
}
{
    local $/ = \2048;
    1 while <>;
}

This nicely keeps memory usage limited.
I always use 4096.

An alternative is to seek to the end:

seek STDIN, 0, SEEK_END;
<>;

John Kelly · Jun 26, 2010

This nicely keeps memory usage limited.
I always use 4096.

I decided to use 4096 too. I also replaced the "length" test with a
regex, to ignore lines containing only superfluous whitespace, prior to
the first line of data:

/^\s*$/ and next;

An alternative is to seek to the end:

seek STDIN, 0, SEEK_END;
<>;

I posted, hoping for some magical Perl incantation. After all, there
are so many of them! But seek should be just as good. However, it also
needs a while loop that tests EOF.

Otherwise, the pipe writer could race with you, and write more data
after you seek, but before you read. Without a while loop testing for
EOF, you may falsely assume EOF, and close STDIN while the writer is
still sending more data, thus breaking the pipe.

John Kelly · Jun 26, 2010

Err... no. Pipes are not seekable, so the seek will simply fail. (Ruud
should have checked the return value of seek for exactly this reason.)

I see there's no rabbit in that hat:

man lseek

ERRORS
EBADF fd is not an open file descriptor.

EINVAL whence is not one of SEEK_SET, SEEK_CUR, SEEK_END; or the
resulting file offset would be negative, or beyond the end of a
seekable device.

EOVERFLOW
The resulting file offset cannot be represented in an off_t.

ESPIPE fd is associated with a pipe, socket, or FIFO.

It was too good to be true. What was I thinking ...
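
A short sketch of checking the return value of seek on a pipe. A pipe created with the built-in pipe() stands in for a piped STDIN here, and the variable names are made up for the example. The exact error text depends on the system.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);
use Errno qw(ESPIPE);

# A pipe made with the built-in pipe() stands in for a piped STDIN.
pipe(my $reader, my $writer) or die "pipe: $!";

my $ok = seek($reader, 0, SEEK_END);
my ($errno, $error) = ($! + 0, "$!");   # capture $! before anything resets it

if ($ok) {
    print "seek succeeded\n";
} else {
    print "seek failed: $error\n";
}
```

On a POSIX system the seek is expected to fail with ESPIPE, so the read-and-discard loop is still needed for piped input.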

Xho Jingleheimerschmidt · Jun 26, 2010

John said:
I posted, hoping for some magical Perl incantation. After all, there
are so many of them! But seek should be just as good. However, it also
needs a while loop that tests EOF.

On my system it just doesn't work at all, setting $! to "Illegal seek".

Otherwise, the pipe writer could race with you, and write more data
after you seek, but before you read. Without a while loop testing for
EOF, you may falsely assume EOF, and close STDIN while the writer is
still sending more data, thus breaking the pipe.

But a broken pipe needn't be a problem. It is merely a condition, not
an error, unless the program decides to turn it into an error. Can you
instruct the other end of the pipe to just behave gracefully on SIGPIPE?
(Alas, bzcat can't be so instructed, as far as I can determine.) This
would be the ultimate in efficiency.

Xho
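
For a writer that happens to be written in Perl, behaving gracefully could look like the sketch below: SIGPIPE is ignored, so a write to a pipe with no reader fails with EPIPE instead of killing the process. The pipe with a closed read end is an assumption that simulates a reader that quit early. Whether a given upstream program can be made to behave this way is a separate question.

```perl
use strict;
use warnings;
use Errno qw(EPIPE);

# With SIGPIPE ignored, writing to a pipe that has no reader fails with
# EPIPE instead of killing the writer.
$SIG{PIPE} = 'IGNORE';

pipe(my $reader, my $writer) or die "pipe: $!";
close $reader;                           # simulate a reader that quit early

my $written = syswrite($writer, "more data\n");
my $errno = $! + 0;

if (!defined $written && $errno == EPIPE) {
    print "reader is gone; stopping quietly\n";
}
```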

C.DeRykus · Jun 26, 2010

Quoth "C.DeRykus":

...

That's not 'just scalar context'. <>-within-while is special-cased to
assign to $_ (and check 'defined', rather than simply truth). Try

perl -we '$SIG{INT}=sub{exit}; undef $/; $x = <>; END{print}'

Yes, there's more magic to it than "just" implies but
the special-casing does include an assignment to $_
in scalar context.
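
A sketch of the special-casing, assuming an in-memory filehandle whose last line is "0" with no trailing newline. The while form tests definedness, so it sees that line; a plain boolean test on the same value would not.

```perl
use strict;
use warnings;

# Final line is "0" with no trailing newline: false as a boolean, but defined.
my $input = "a\n0";

open my $fh, '<', \$input or die "open: $!";
my $count = 0;
$count++ while <$fh>;        # compiled as: while (defined($_ = <$fh>))

open $fh, '<', \$input or die "open: $!";
my $first  = <$fh>;          # "a\n"
my $second = <$fh>;          # "0"
my $truthy = $second ? 1 : 0;

print "lines=$count truthy=$truthy\n";   # prints lines=2 truthy=0
```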

John Kelly · Jun 26, 2010

But a broken pipe needn't be a problem. It is merely a condition, not
an error, unless the program decides to turn it into an error.

I prefer using:

set -e -u -o pipefail

in my bash scripts.

instruct the other end of the pipe to just behave gracefully on SIGPIPE?

To me it's like compiling with -Wall -Werror. I just do it, and fix
the warnings. I feel better knowing (or at least thinking) that my code
is clean, not sloppy.
