bulk flush input

John Kelly

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $data;

    while (<>) {
        chomp;
        if (!$data && $_) {
            $data = $_;
        }
    }

    print "data=$data\n";


This code reads STDIN and remembers the first non-empty line. That's
all it cares about.

But it also keeps reading till EOF, acting like the "cat" utility, to
flush the extra input and avoid broken pipe errors.

But reading line by line, just to throw away the unwanted garbage, is
inefficient. I would like to jump out of the loop and "bulk flush" the
remaining input stream.

I don't think

    $io->flush

        flush causes perl to flush any buffered data at the perlio
        api level. Any unread data in the buffer will be discarded,

will work, because it flushes the buffer, not the entire input
stream. I want to flush the whole file. And keep in mind, I don't want
a broken pipe either.

Suggestions?
 
John Kelly

> No, it remembers the first line that doesn't evaluate to boolean false.
> Since you are chomping the lines, a line containing only "0" will be
> considered 'empty'. You want to check length $data.

Yeah, shot myself in the foot again.
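A minimal sketch of Ben's point, assuming an in-memory filehandle stands in for STDIN so the example is self-contained: a truth test skips a line containing only "0", while a length test keeps it. The helper name first_line_where is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first chomped line that passes $keep.
sub first_line_where {
    my ($fh, $keep) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        return $line if $keep->($line);
    }
    return;    # no qualifying line
}

# Assumption: an in-memory handle stands in for STDIN.
my $input = "\n0\nfoo\n";

# Truth test: "0" is false in Perl, so it is skipped like an empty line.
open my $fh1, '<', \$input or die "open: $!";
my $by_truth = first_line_where($fh1, sub { $_[0] });

# Length test: only a genuinely empty line is skipped.
open my $fh2, '<', \$input or die "open: $!";
my $by_length = first_line_where($fh2, sub { length $_[0] });

print "truth:  $by_truth\n";     # truth:  foo
print "length: $by_length\n";    # length: 0
```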


>     my $data;
>     while (<>) {
>         chomp;
>         length or next;
>         $data = $_;
>         last;
>     }
>     {
>         local $/ = \2048;
>         1 while <>;
>     }
>
> Ben

That looks interesting, I get the first part, the rest will give me
something to chew on ...

Thanks.
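Ben's loop can be wrapped in a reusable helper. This is a sketch, not code from the thread: first_line_then_drain is a hypothetical name, the chunk size is a parameter (the thread uses 2048 and 4096), and an in-memory handle replaces STDIN so the example runs on its own. Setting $/ to a reference to an integer makes each readline return up to that many bytes, so memory stays bounded however much input remains.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first non-empty line of $fh, then read
# and discard the rest in fixed-size records so a pipe writer is not cut
# off early. Returns the line and the number of bytes drained.
sub first_line_then_drain {
    my ($fh, $chunk) = @_;
    local $_;                  # don't clobber the caller's $_
    my $data;
    while (<$fh>) {
        chomp;
        length or next;        # skip empty lines; a line of "0" still counts
        $data = $_;
        last;
    }
    my $drained = 0;
    {
        local $/ = \$chunk;    # each readline now returns up to $chunk bytes
        while (my $rec = <$fh>) {
            $drained += length $rec;
        }
    }
    return ($data, $drained);
}

# Usage: an in-memory handle stands in for STDIN (assumption).
my $input = "\n\nhello\nsecond line\nthird line\n";
open my $fh, '<', \$input or die "open: $!";
my ($line, $bytes) = first_line_then_drain($fh, 4096);
print "data=$line drained=$bytes\n";    # data=hello drained=23
```

With real input the call would be first_line_then_drain(\*STDIN, 4096).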
 
C.DeRykus

> Yeah, shot myself in the foot again.

> That looks interesting, I get the first part, the rest will give me
> something to chew on ...

But, this'll just read/toss 2048 byte chunks
till EOF. Alternatively, if you want to toss
the entire stream after exiting the loop, a
single statement:

     <>;

does what you want due to the list context. I
suspect there's little to gain by changing $/.
 
C.DeRykus

> But, this'll just read/toss 2048 byte chunks
> till EOF. Alternatively, if you want to toss
> the entire stream after exiting the loop, a
> single statement:
>
>      <>;
       ^^^

() = <>;
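A sketch of why "() = <>" differs from a plain "<>;", assuming an in-memory handle in place of STDIN: the empty-list assignment puts readline in list context, so every remaining line is read, and a list assignment used as a scalar yields the number of elements on its right-hand side. As the follow-ups point out, the whole list is still built in memory before being thrown away.

```perl
use strict;
use warnings;

# Assumption: an in-memory handle stands in for STDIN.
my $input = "one\ntwo\nthree\n";
open my $fh, '<', \$input or die "open: $!";

my $first = <$fh>;         # scalar context: reads exactly one line
my $count = () = <$fh>;    # list context: reads every remaining line;
                           # the list assignment, used as a scalar,
                           # yields how many lines were read
print "first=$first";      # first=one
print "count=$count\n";    # count=2
print eof($fh) ? "at EOF\n" : "not at EOF\n";    # at EOF
```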
 
John Kelly

> But, this'll just read/toss 2048 byte chunks
> till EOF. Alternatively, if you want to toss
> the entire stream after exiting the loop, a
> single statement:
>
>     <>;
>
> does what you want due to the list context. I
> suspect there's little to gain by changing $/.

That doesn't look like list context to me. Testing a small data set
containing:

    one
    two

    three

    four

with this code:

-----------------------

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $data = '';

    while (<>) {
        chomp;
        /^\s*$/ and next;
        $data = $_;
        print "data=\"$data\"\n";
        last;
    }

    <> or die "1 EOF\n";
    <> or die "2 EOF\n";
    <> or die "3 EOF\n";
    <> or die "4 EOF\n";
    <> or die "5 EOF\n";
    <> or die "6 EOF\n";
    <> or die "7 EOF\n";
    <> or die "8 EOF\n";
    <> or die "9 EOF\n";

    close STDIN;


--------------------------------

produces output:
    data="one"
    6 EOF


Which seems to prove that a bare <> is scalar context.
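A self-contained version of John's experiment, assuming an in-memory handle in place of STDIN: each "<$fh>;" statement on its own reads and discards exactly one line, so the next explicit read gets the third line. As Willem and Tad explain later in the thread, the statement gives readline void context, which built-ins treat like scalar context.

```perl
use strict;
use warnings;

# Assumption: an in-memory handle stands in for STDIN.
my $input = "one\ntwo\nthree\nfour\n";
open my $fh, '<', \$input or die "open: $!";

<$fh>;                  # void context: reads and discards one line ("one")
<$fh>;                  # ... and one more ("two")

my $next = <$fh>;       # scalar context
print "next=$next";     # next=three
```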
 
C.DeRykus

> Try to do it with a terabyte file...

Hm, sounds like I need to look more closely...

So a humongous temp array gets built with only
the resulting assignment being optimized away...?

--
Charles DeRykus

    perl -MO=Concise -e "()=<>"
    8  <@> leave[1 ref] vKP/REFC ->(end)
    1     <0> enter ->2
    2     <;> nextstate(main 1 -e:1) v:{ ->3
    7     <2> aassign[t3] vKS ->8
    -        <1> ex-list lK ->6
    3           <0> pushmark s ->4
    5           <1> readline[t2] lK/1 ->6
    4              <#> gv[*ARGV] s ->5
    -        <1> ex-list lK ->7
    6           <0> pushmark s ->7
    -           <0> stub lPRM* ->-
 
John Kelly

Thanks for the ideas, wrong ones too. They taught me more Perl, and got
me pointed in the right direction.
 
Willem

John Kelly wrote:
<snip>
)<> or die "1 EOF\n";
)<> or die "2 EOF\n";
)<> or die "3 EOF\n";
)<> or die "4 EOF\n";
)<> or die "5 EOF\n";
)<> or die "6 EOF\n";
)<> or die "7 EOF\n";
)<> or die "8 EOF\n";
)<> or die "9 EOF\n";
<snip>
) produces output:
)
)>data="one"
)>6 EOF
)
)
) Which seems to prove that a bare <> is scalar context.

No, it proves that the left-hand side of 'or' has scalar context.

I'm pretty sure that a bare <> has void context, (which usually
translates to scalar context).

It is really a lot faster to change the line separator.
Setting it to \2048 means that it will always read that many bytes,
and undef'ing it would mean it will read the whole rest of the file.
I don't know if <> is smart enough to recognize void context though,
can probably be tested with a large file and a memory checker tool.
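A minimal sketch of the three $/ settings Willem mentions, using an in-memory handle (assumption): the default reads one line, a reference to an integer reads a fixed-size record, and undef slurps everything that is left. Using local inside a do block restores $/ afterwards. The slurp is the fastest single read but holds the whole remainder in memory, which is the trade-off discussed further down the thread.

```perl
use strict;
use warnings;

# Assumption: an in-memory handle stands in for STDIN.
my $input = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$input or die "open: $!";

my $line = <$fh>;                         # default $/ = "\n": one line
my $rec  = do { local $/ = \4; <$fh> };   # fixed record: next 4 bytes
my $rest = do { local $/;      <$fh> };   # undef $/: everything left

print "line=[$line] rec=[$rec] rest=[$rest]\n";
```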


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
John Kelly

> John Kelly wrote:
> <snip>
> )<> or die "1 EOF\n";
> )<> or die "2 EOF\n";
> )<> or die "3 EOF\n";
> )<> or die "4 EOF\n";
> )<> or die "5 EOF\n";
> )<> or die "6 EOF\n";
> )<> or die "7 EOF\n";
> )<> or die "8 EOF\n";
> )<> or die "9 EOF\n";
> <snip>
> ) produces output:
> )
> )>data="one"
> )>6 EOF
> )
> )
> ) Which seems to prove that a bare <> is scalar context.
>
> No, it proves that the left-hand side of 'or' has scalar context.

Maybe. But when I do this:

    <>;
    <>;
    <> or die "3 EOF\n";
    <> or die "4 EOF\n";
    <> or die "5 EOF\n";
    <> or die "6 EOF\n";
    <> or die "7 EOF\n";
    <> or die "8 EOF\n";
    <> or die "9 EOF\n";

the output is the same, showing that the first two <> diamonds read only
one line each.

> I'm pretty sure that a bare <> has void context, (which usually
> translates to scalar context).

From ISBN 0-596-00027-8:

    2.7.3. Void Context
    Another peculiar kind of scalar context is the void context. This
    context not only doesn't care what the return value's type is, it
    doesn't even want a return value. From the standpoint of how functions
    work, it's no different from an ordinary scalar context.

> It is really a lot faster to change the line separator.
>
> Setting it to \2048 means that it will always read that many bytes,

Right. That's what I decided to use, although with 4096.

> and undef'ing it would mean it will read the whole rest of the file.

Which I do NOT want.

> I don't know if <> is smart enough to recognize void context though,
> can probably be tested with a large file and a memory checker tool.

No need for the big hammer, considering the text of 2.7.3, and my test
code.
 
C.DeRykus

> John Kelly wrote:
>
> <snip>
> )<> or die "1 EOF\n";
> )<> or die "2 EOF\n";
> )<> or die "3 EOF\n";
> )<> or die "4 EOF\n";
> )<> or die "5 EOF\n";
> )<> or die "6 EOF\n";
> )<> or die "7 EOF\n";
> )<> or die "8 EOF\n";
> )<> or die "9 EOF\n";
> <snip>
> ) produces output:
> )
> )>data="one"
> )>6 EOF
> )
> )
> ) Which seems to prove that a bare <> is scalar context.
>
> No, it proves that the left-hand side of 'or' has scalar context.
>
> I'm pretty sure that a bare <> has void context, (which usually
> translates to scalar context).


Yes, bare <> does have void context:

    perl -we '$SIG{INT}=sub{exit};undef $/; <> ; END{print}'
    foo
    bar
    ^CUse of uninitialized value in print at -e line 1.

whereas, with just scalar context:

    perl -we '$SIG{INT}=sub{exit};undef $/; 1 while <> ; END{print}'
    foo
    bar
    ^Cfoo

> It is really a lot faster to change the line separator.
> Setting it to \2048 means that it will always read that many bytes,
> and undef'ing it would mean it will read the whole rest of the file.
> I don't know if <> is smart enough to recognize void context though,
> can probably be tested with a large file and a memory checker tool.

So this'd be very fast:

    undef $/;
    <>;

Unfortunately, like () = <>, there's potentially
grave impact to a foot:

    perl -we '@ARGV=("big.txt"); undef $/; <>'
    Out of memory during "large" request ...
 
John Kelly

> There is no "bare" <>.
>
> The <> in
>
>     <>;
>
> is in a statement.
>
> The statement provides the context for <> (or for whatever else is
> in the stmt).

I meant "bare" without a while or foreach which forcibly provide
context.
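To illustrate the difference between the two loop forms, here is a sketch (in-memory handles assumed in place of STDIN): while puts readline in scalar context and reads one line per iteration, whereas foreach puts it in list context and reads every line before the first iteration runs. Checking eof inside each loop shows when the input was consumed.

```perl
use strict;
use warnings;

# Assumption: in-memory handles stand in for STDIN.
my $input = "a\nb\nc\n";

# while: <$fh> is in scalar context, so one line is read per iteration.
open my $fh_w, '<', \$input or die "open: $!";
my @eof_seen_while;
while (my $line = <$fh_w>) {
    push @eof_seen_while, eof($fh_w) ? 1 : 0;
}

# foreach: <$fh> is in list context, so every line is read before the
# first iteration runs.
open my $fh_f, '<', \$input or die "open: $!";
my @eof_seen_for;
for my $line (<$fh_f>) {
    push @eof_seen_for, eof($fh_f) ? 1 : 0;
}

print "while: @eof_seen_while\n";   # while: 0 0 1
print "for:   @eof_seen_for\n";     # for:   1 1 1
```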
> What is ISBN 0-596-00027-8?

Programming Perl 3rd edition.
> Why do you quote that?
> That is, what point are you trying to make by quoting that?

It says void context is a "kind of scalar context." So according to the
author, void == scalar for purposes of knowing whether <> reads one line
or many.

> I almost called Willem on the "usually" part, until I re-read
> the "Context" section in
>
>     perldoc perldata
>
>     ...
>
>     User-defined subroutines may choose to care whether they are being
>     called in a void, scalar, or list context.
>
> void context always translates to scalar context for built-in functions.
>
> void context usually translates to scalar context for user-defined functions.
>
> So the nameless book you quote above is in error.

I see it gets deep.

> It should have qualified "functions":
>
>     From the standpoint of how built-in functions work, ...
>
> That's done easily enough with a simple one-liner:
>
>     bash-4.0$ perl -we '<>'
>     foo
>     bash-4.0$
>
> Since the program exited after only one input line, then the
> readline() must have been in scalar context, else it would have
> waited for more input rather than exit()ing. As we can see with:
>
>     perl -we 'print <>'
>
> Note that here print() provides the (list) context for <>, and that
> the statement provides the (void) context for print().
> Since it is a built-in, void context is treated the same as scalar context.

My stupid test verified that.
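The perldata passage quoted above says user-defined subroutines can tell void, scalar, and list context apart. A minimal sketch of how, using the built-in wantarray, which returns undef in void context, false in scalar context, and true in list context; the sub name which_context is hypothetical.

```perl
use strict;
use warnings;

my @seen;

# Hypothetical sub: records the context it was called in.
sub which_context {
    push @seen, !defined wantarray ? 'void'
              : wantarray          ? 'list'
              :                      'scalar';
    return;
}

which_context();                # void
my $s = which_context();        # scalar
my @l = which_context();        # list

print "@seen\n";    # void scalar list
```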
 
Dr.Ruud

Ben said:
>     my $data;
>     while (<>) {
>         chomp;
>         length or next;
>         $data = $_;
>         last;
>     }
>     {
>         local $/ = \2048;
>         1 while <>;
>     }

This nicely keeps memory usage limited.
I always use 4096.

An alternative is to seek to the end:

    seek STDIN, 0, SEEK_END;
    <>;
 
John Kelly

> This nicely keeps memory usage limited.
> I always use 4096.

I decided to use 4096 too. I also replaced the "length" test with a
regex, to ignore lines containing only superfluous whitespace, prior to
the first line of data:

/^\s*$/ and next;
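A sketch of the whitespace-skipping variant, assuming an in-memory handle in place of STDIN and a hypothetical helper name first_data_line: unlike a plain truth test it still keeps a line containing only "0", and unlike a length test it also skips lines made only of spaces or tabs.

```perl
use strict;
use warnings;

# Hypothetical helper: first line that is not empty or whitespace-only.
sub first_data_line {
    my ($fh) = @_;
    local $_;                 # don't clobber the caller's $_
    while (<$fh>) {
        chomp;
        /^\s*$/ and next;     # skip blank and whitespace-only lines
        return $_;
    }
    return;                   # no data line found
}

# Assumption: an in-memory handle stands in for STDIN.
my $input = "\n   \n\t\n0\nmore\n";
open my $fh, '<', \$input or die "open: $!";
my $data = first_data_line($fh);
print "data=\"$data\"\n";     # data="0"
```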

> An alternative is to seek to the end:
>
>     seek STDIN, 0, SEEK_END;
>     <>;

I posted, hoping for some magical Perl incantation. After all, there
are so many of them! But seek should be just as good. However, it also
needs a while loop that tests EOF.

Otherwise, the pipe writer could race with you, and write more data
after you seek, but before you read. Without a while loop testing for
EOF, you may falsely assume EOF, and close STDIN while the writer is
still sending more data, thus breaking the pipe.
 
John Kelly

> Err... no. Pipes are not seekable, so the seek will simply fail. (Ruud
> should have checked the return value of seek for exactly this reason.)

I see there's no rabbit in that hat:

    man lseek
    ERRORS
        EBADF  fd is not an open file descriptor.

        EINVAL whence is not one of SEEK_SET, SEEK_CUR, SEEK_END; or the
               resulting file offset would be negative, or beyond the end of a
               seekable device.

        EOVERFLOW
               The resulting file offset cannot be represented in an off_t.

        ESPIPE fd is associated with a pipe, socket, or FIFO.

It was too good to be true. What was I thinking ...
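Checking the return value of seek, as Ben suggests, shows the failure directly. A minimal sketch, assuming an anonymous pipe stands in for a STDIN fed by another process; the %! hash (backed by the Errno module, loaded automatically) tests for ESPIPE, the error the lseek man page lists for pipes.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Assumption: an anonymous pipe stands in for STDIN fed by another process.
pipe(my $reader, my $writer) or die "pipe: $!";

my $ok     = seek($reader, 0, SEEK_END);
my $err    = "$!";                 # save the message right away
my $espipe = $!{ESPIPE} ? 1 : 0;   # true when seek failed with ESPIPE

if ($ok) {
    print "seek succeeded\n";
} else {
    print "seek failed: $err\n";   # e.g. "Illegal seek"
}
```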
 
Xho Jingleheimerschmidt

John said:
> I posted, hoping for some magical Perl incantation. After all, there
> are so many of them! But seek should be just as good. However, it also
> needs a while loop that tests EOF.

On my system it just doesn't work at all, setting $! to "Illegal seek".

> Otherwise, the pipe writer could race with you, and write more data
> after you seek, but before you read. Without a while loop testing for
> EOF, you may falsely assume EOF, and close STDIN while the writer is
> still sending more data, thus breaking the pipe.

But a broken pipe needn't be a problem. It is merely a condition, not
an error, unless the program decides to turn it into an error. Can you
instruct the other end of the pipe to just behave gracefully on SIGPIPE?
(Alas, bzcat can't be so instructed, as far as I can determine.) This
would be the ultimate in efficiency.

Xho
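Xho's alternative is for the writer to handle a vanished reader itself. A minimal sketch of such a writer in Perl, assuming an anonymous pipe whose read end is closed to simulate a reader that stopped early: with SIGPIPE ignored, the write fails with EPIPE and the program can decide what to do instead of being killed by the signal.

```perl
use strict;
use warnings;

# Assumption: the read end is closed to simulate a reader that quit early.
pipe(my $reader, my $writer) or die "pipe: $!";
close $reader;

my ($written, $err, $epipe);
{
    # Ignore SIGPIPE so the write returns an error instead of killing us.
    local $SIG{PIPE} = 'IGNORE';
    $written = syswrite($writer, "more data\n");
    $err     = "$!";
    $epipe   = $!{EPIPE} ? 1 : 0;
}

if (defined $written) {
    print "wrote $written bytes\n";
} else {
    print "write failed: $err\n";   # e.g. "Broken pipe"
}
```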
 
C.DeRykus

> Quoth "C.DeRykus" <[email protected]>:
>
> ...
>
> That's not 'just scalar context'. <>-within-while is special-cased to
> assign to $_ (and check 'defined', rather than simply truth). Try
>
>     perl -we '$SIG{INT}=sub{exit}; undef $/; $x = <>; END{print}'

Yes, there's more magic to it than "just" implies but
the special-casing does include an assignment to $_
in scalar context.
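A sketch of the special case Ben describes, assuming an in-memory handle in place of STDIN: a readline statement on its own leaves $_ alone, while while (<$fh>) is compiled as while (defined($_ = <$fh>)), so it assigns to $_ and tests definedness rather than truth.

```perl
use strict;
use warnings;

# Assumption: an in-memory handle stands in for STDIN.
my $input = "first\nsecond\n";
open my $fh, '<', \$input or die "open: $!";

$_ = 'untouched';
<$fh>;                       # reads "first" but discards it; $_ is unchanged
my $after_void = $_;

while (<$fh>) {              # compiled as while (defined($_ = <$fh>))
    last;                    # stop after the first iteration
}
my $after_while = $_;        # while does not localize $_, so it keeps the line

print "after void: $after_void\n";     # after void: untouched
print "after while: $after_while";     # after while: second
```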
 
John Kelly

> But a broken pipe needn't be a problem. It is merely a condition, not
> an error, unless the program decides to turn it into an error.

I prefer using:

    set -e -u -o pipefail

in my bash scripts.

> Can you instruct the other end of the pipe to just behave gracefully on SIGPIPE?

To me it's like compiling with -Wall -Werror. I just do it, and fix
the warnings. I feel better knowing (or at least thinking) that my code
is clean, not sloppy.
 
