No more than N elements of an array


Tim McDaniel

A cow-orker is coding something: he gets an array of results, but
wants to take no more than the first 1000 elements. His suggestion was

@results = @results[0 .. (MAXSEARCHRESULTS - 1, $#results)[MAXSEARCHRESULTS - 1 > $#results ]];

He felt it was "awesome". I thought it was way too cute.

Of course there's the ternary version:

@results = @results[0 .. (MAXSEARCHRESULTS - 1 > $#results ? $#results : MAXSEARCHRESULTS - 1)];
It has the same number of uses of variables, just in a different order.
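To check that the ternary bound really avoids padding, here is a minimal sketch that wraps it in a helper. The name first_n and its argument order are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return at most $max leading elements of @list,
# using the ternary form of the slice bound so short lists are not padded.
sub first_n {
    my ($max, @list) = @_;
    my $last = $max - 1 > $#list ? $#list : $max - 1;
    return @list[0 .. $last];    # 0 .. -1 is an empty range for an empty list
}

my @short = first_n(3, 'a', 'b');     # shorter than the limit
my @long  = first_n(3, 'a' .. 'e');   # longer than the limit
print scalar(@short), " ", scalar(@long), "\n";   # prints "2 3"
```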

I considered plain @results[0 .. MAXSEARCHRESULTS - 1]. It works if the
array is longer than the limit, but pads with undef if shorter:

$ perl -e 'use strict; use warnings; my @a = (0, 1, 2, 3, 4, 5, 6); @a = @a[0..10]; print join(",", @a), "\n"'
Use of uninitialized value $a[7] in join or string at -e line 1.
Use of uninitialized value $a[8] in join or string at -e line 1.
Use of uninitialized value $a[9] in join or string at -e line 1.
Use of uninitialized value $a[10] in join or string at -e line 1.
0,1,2,3,4,5,6,,,,

I thought of splice.

$ perl -e 'use strict; use warnings; my @a = (0, 1, 2, 3, 4, 5, 6); splice @a, 3; print join(",", @a), "\n"'
0,1,2
$ perl -e 'use strict; use warnings; my @a = (0, 1, 2, 3, 4, 5, 6); splice @a, 10; print join(",", @a), "\n"'
splice() offset past end of array at -e line 1.
0,1,2,3,4,5,6
$ perl -e 'print $], "\n"'
5.014002

But that's on my ISP's machine (Perl 5.14.2). On our office machines (Perl 5.16.2), there's no warning.

$ perl -e 'use strict; use warnings; my @a = (0, 1, 2, 3, 4, 5, 6); splice @a, 3; print join(",", @a), "\n"'
0,1,2
$ perl -e 'use strict; use warnings; my @a = (0, 1, 2, 3, 4, 5, 6); splice @a, 10; print join(",", @a), "\n"'
0,1,2,3,4,5,6
$ perl -e 'print $], "\n"'
5.016002
$ perldoc -f splice | cat
....
If OFFSET is past the end of the array,
Perl issues a warning, and splices at the end of the array.
....

So was the warning dropped on purpose, leaving the 5.16 documentation
simply outdated? Or was its absence just a bug in 5.16, and does the
warning come back in 5.18?

To protect against that,
splice @results, MAXSEARCHRESULTS if @results > MAXSEARCHRESULTS;
Doable (warning: that's untested code). But it's not as pretty.
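Since the guarded splice above is flagged as untested, here is a small sketch that exercises it on a long and a short array. The wrapper truncate_with_splice is a hypothetical name; the body is the same guarded call:

```perl
use strict;
use warnings;

# The guarded splice, wrapped in a hypothetical helper so both the
# "too long" and "short enough" cases can be tried. The guard means
# splice() is never handed an offset past the end of the array.
sub truncate_with_splice {
    my ($aref, $max) = @_;
    splice @$aref, $max if @$aref > $max;
    return $aref;
}

my @long  = (0 .. 6);
my @short = (0 .. 1);
truncate_with_splice(\@long, 3);
truncate_with_splice(\@short, 3);
print join(",", @long), " | ", join(",", @short), "\n";   # prints "0,1,2 | 0,1"
```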

Is there a cleaner way?

(As it happens, he now has to check the limit in an if anyway for
another purpose, so he knows that @results[0 .. MAXSEARCHRESULTS - 1]
won't go past the end, so the question is now moot. I'm still
curious.)
 

Rainer Weikusat

A cow-orker is coding something: he gets an array of results, but
wants to take no more than the first 1000 elements. His suggestion was

@results = @results[0 .. (MAXSEARCHRESULTS - 1, $#results)[MAXSEARCHRESULTS - 1 > $#results ]];

He felt it was "awesome". I thought it was way too cute.

I suggest 'awful' instead:

$#results = MAXSEARCHRESULTS - 1 unless $#results < MAXSEARCHRESULTS;
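A minimal sketch of the $#results assignment, assuming a small limit of 3 instead of 1000 so the effect is visible. It also shows why the guard matters: assigning to $#array without it can grow the array instead of shrinking it.

```perl
use strict;
use warnings;
use constant MAXSEARCHRESULTS => 3;   # small value for illustration; the thread uses 1000

my @results = ('a' .. 'e');
# Assigning to $#results sets the index of the last element, so the
# array keeps MAXSEARCHRESULTS elements and the rest are discarded.
$#results = MAXSEARCHRESULTS - 1 unless $#results < MAXSEARCHRESULTS;
print "@results\n";   # prints "a b c"

# Without the guard, a short array is extended and padded with undef.
my @few = ('x');
$#few = MAXSEARCHRESULTS - 1;   # @few now has 3 elements, the last two undef
```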
 

Rainer Weikusat

Rainer Weikusat said:
A cow-orker is coding something: he gets an array of results, but
wants to take no more than the first 1000 elements. His suggestion was

@results = @results[0 .. (MAXSEARCHRESULTS - 1, $#results)[MAXSEARCHRESULTS - 1 > $#results ]];

He felt it was "awesome". I thought it was way too cute.

I suggest 'awful' instead:

$#results = MAXSEARCHRESULTS - 1 unless $#results < MAXSEARCHRESULTS;

In case someone loves fancy calculations, this can also be written as

$#results -= (@results - MAXSEARCHRESULTS) * (@results > MAXSEARCHRESULTS);

:->
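The "fancy calculation" works because a numeric comparison yields 1 or Perl's special false value, which is 0 in numeric context and does not trigger an uninitialized or non-numeric warning. A small sketch checking both branches, using a hypothetical helper truncate_arith:

```perl
use strict;
use warnings;

# Branch-free truncation: the subtraction is (length - max) when the
# array is too long and 0 otherwise.
sub truncate_arith {
    my ($aref, $max) = @_;
    $#{$aref} -= (@$aref - $max) * (@$aref > $max);
    return scalar @$aref;
}

my @long  = (1 .. 10);
my @short = (1 .. 2);
my $n_long  = truncate_arith(\@long, 4);
my $n_short = truncate_arith(\@short, 4);
print "$n_long $n_short\n";   # prints "4 2"
```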
 

Charles DeRykus

Quoth (e-mail address removed):
A cow-orker is coding something: he gets an array of results, but
wants to take no more than the first 1000 elements. His suggestion was

@results = @results[0 .. (MAXSEARCHRESULTS - 1, $#results)[MAXSEARCHRESULTS - 1 > $#results ]];

...
$ perl -e 'use strict; use warnings; my @a = (0, 1, 2, 3, 4, 5, 6); splice @a, 3; print join(",", @a), "\n"'
0,1,2
$ perl -e 'use strict; use warnings; my @a = (0, 1, 2, 3, 4, 5, 6); splice @a, 10; print join(",", @a), "\n"'
...
But it's not as pretty.
...
Is there a cleaner way?

Perhaps

@results = splice @results, 0, MAXSEARCHRESULTS;
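A short sketch of this form, assuming a limit of 3. splice() returns the elements it removes, so taking the first elements from offset 0 and assigning them back keeps at most that many. Only an OFFSET past the end draws the warning; a LENGTH that runs past the end is simply clamped, so short arrays come through unchanged.

```perl
use strict;
use warnings;

my @long  = ('a' .. 'f');
my @short = ('a', 'b');

# Remove (and keep) the first 3 elements; the leftovers are discarded.
@long  = splice @long,  0, 3;
@short = splice @short, 0, 3;   # LENGTH past the end: whole array returned
print "@long | @short\n";   # prints "a b c | a b"
```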

If fewer strokes were to factor into cleanliness, even:

@results = @results[0..MAXSEARCHRESULTS - 1];


But, as you grow array size and MAXSEARCHRESULTS, it gets filthy slow...

Setting $#results = MAXSEARCHRESULTS - 1 (when the array is longer than
that) undoubtedly comes out of the wash purest and fastest.
 

Charles DeRykus

Quoth Charles DeRykus:
Quoth (e-mail address removed):
A cow-orker is coding something: he gets an array of results, but
wants to take no more than the first 1000 elements. His suggestion was [...]
Is there a cleaner way?

Perhaps

@results = splice @results, 0, MAXSEARCHRESULTS;

If fewer strokes were to factor into cleanliness, even:

@results = @results[0..MAXSEARCHRESULTS - 1];

Tim already pointed out that this returns extraneous undefs if @results
is too short.

Sigh, I missed it.

You could tweak it with [0..min($#results, MAXSEARCHRESULTS - 1)] but,
aside from the purist's objection of adding a module, it should be DOA
anyway with its inefficiency (I'm guessing that it does more copying so
is slower).
It's probably easiest, though turning off the warning and using Tim's

splice @results, MAXSEARCHRESULTS;

is probably better, on balance.

Turning off a warning category seems slightly unclean to me... even
though it's only because of the earlier version.
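For reference, "turning off the warning" can be scoped to a single block. perldiag lists "splice() offset past end of array" under the misc category, so a sketch might look like this (the $SIG{__WARN__} handler is only there to show that nothing is emitted):

```perl
use strict;
use warnings;

# Collect any warnings so we can see whether one was emitted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my @results = (0 .. 6);
{
    no warnings 'misc';     # silence only this category, only in this block
    splice @results, 10;    # offset past the end: nothing is removed
}
print scalar(@results), " elements, ", scalar(@warnings), " warnings\n";   # prints "7 elements, 0 warnings"
```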


Using $#ary as an lvalue has some permanent side-effects on the array;
you can see them with Devel::Peek.

[The side-effects are to do with the fact that $#ary is a scalar lvalue
and \$#ary should return a ref to the same scalar every time, so we need
an actual permanent scalar somewhere, which turns out to get stored in
the array's magic.]
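A small sketch of the point about \$#ary: taking a reference to $#ary gives a real scalar, and assigning through it changes the array's length. Devel::Peek (a core module) can then dump the array; the MAGIC section it prints, and the exact layout of the output, depend on the perl build, so no expected dump is shown here.

```perl
use strict;
use warnings;
use Devel::Peek;   # core module; Dump() writes to STDERR

my @ary = (1 .. 5);
my $len_ref = \$#ary;        # a reference to the "last index" scalar
$$len_ref = 1;               # same effect as $#ary = 1
print scalar(@ary), "\n";    # prints 2

# Inspect the array; look for a MAGIC entry attached to it.
Dump(\@ary);
```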

Interesting. IIUC, is there any real downside other than the extra
storage in magic? I thought I remembered that truncating via $#ary
doesn't return the memory to the process, unlike undef @ary.
 

Peter J. Holzer

Quoth Charles DeRykus:
@results = @results[0..MAXSEARCHRESULTS - 1];

Tim already pointed out that this returns extraneous undefs if @results
is too short.

Sigh, I missed it.

You could tweak it with [0..min($#results, MAXSEARCHRESULTS - 1)] but,
aside from the purist's objection of adding a module, it should be DOA
anyway with its inefficiency (I'm guessing that it does more copying so
is slower).

Why should @results[0..min($#results, MAXSEARCHRESULTS - 1)] do more copying
than @results[0..MAXSEARCHRESULTS - 1]?

hp
 

Charles DeRykus

Quoth Charles DeRykus <[email protected]>:
@results = @results[0..MAXSEARCHRESULTS - 1];

Tim already pointed out that this returns extraneous undefs if @results
is too short.

Sigh, I missed it.

You could tweak it with [0..min($#results, MAXSEARCHRESULTS - 1)] but,
aside from the purist's objection of adding a module, it should be DOA
anyway with its inefficiency (I'm guessing that it does more copying so
is slower).

Why should @results[0..min($#results, MAXSEARCHRESULTS - 1)] do more copying
than @results[0..MAXSEARCHRESULTS - 1]?

The min(..) tweak was to eliminate the padding with undef that occurs if
MAXSEARCHRESULTS > $#results. In general, I was referring to a
supposition (guessing as I put it) that @results = @results[...] will
likely do more copying than say splice(@results,MAXSEARCHRESULTS). At
any rate, it's slower.
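To measure "it's slower" rather than guess, a rough sketch using the core Benchmark module might look like this. Each variant copies the same source array first, so that cost is shared and only the truncation step differs; the sizes and iteration count are arbitrary assumptions, and the resulting table will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $max    = 1000;
my @source = (1 .. 10_000);

my %variants = (
    slice  => sub { my @r = @source; @r = @r[0 .. ($max - 1 > $#r ? $#r : $max - 1)]; scalar @r },
    splice => sub { my @r = @source; splice @r, $max if @r > $max; scalar @r },
    arylen => sub { my @r = @source; $#r = $max - 1 unless $#r < $max; scalar @r },
);

# Sanity check: every variant must keep exactly $max elements.
for my $name (sort keys %variants) {
    my $n = $variants{$name}->();
    die "$name kept $n elements" unless $n == $max;
}

cmpthese(300, \%variants);   # prints a rate comparison table
```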
 

Tim McDaniel

You could tweak it with [0..min($#results, MAXSEARCHRESULTS - 1)]

There's a min sub somewhere?

$ perl -e 'my $a = min(5,8); print $a, "\n"'
Undefined subroutine &main::min called at -e line 1.

Since this is my ork-place, I don't have control of modules.
 

Ivan Shmakov

Tim McDaniel said:
Charles DeRykus <[email protected]> wrote:
You could tweak it with [0..min($#results, MAXSEARCHRESULTS - 1)]
There's a min sub somewhere?
$ perl -e 'my $a = min(5,8); print $a, "\n"'
Undefined subroutine &main::min called at -e line 1.
Since this is my ork-place, I don't have control of modules.

How so? Doesn't

$ export PERLLIB="$HOME"/.perl/modules

help, for instance?

--cut: ~/.bash_profile --
CPAN=${HOME}/.cpan
cpan_pfx=${CPAN}/prefix
perl_ver=5.14.2
PERLLIB=${cpan_pfx}/lib/perl/5.14.2\
:${cpan_pfx}/share/perl/5.14.2\
:${cpan_pfx}/lib/perl5\
:${cpan_pfx}/lib/perl\
:${cpan_pfx}/share/perl5\
:${cpan_pfx}/lib/perl/5.14\
:${cpan_pfx}/share/perl/5.14\
:${cpan_pfx}/lib/perl5/x86_64-linux-gnu-thread-multi

export CPAN PERLLIB
--cut: ~/.bash_profile --
 

Rainer Weikusat

Ben Morrow said:
Quoth Charles DeRykus said:
On 7/25/2013 9:12 PM, Ben Morrow wrote:
[...]
Setting $#results = MAXSEARCHRESULTS - 1 (when the array is longer than that) undoubtedly comes out of the wash
purest and fastest.

It's probably easiest, though turning off the warning and using Tim's

splice @results, MAXSEARCHRESULTS;

is probably better, on balance. Using $#ary as an lvalue has some
permanent side-effects on the array; you can see them with Devel::Peek.

[The side-effects are to do with the fact that $#ary is a scalar lvalue
and \$#ary should return a ref to the same scalar every time, so we need
an actual permanent scalar somewhere, which turns out to get stored in
the array's magic.]

While there is little reason to prefer one or the other, I
nevertheless want to make an argument in favor of assigning to $#ary:

'Splicing' usually refers to connecting things together. This can
still be seen in the '4-argument splice', which 'works' the contents of
a list into an array. It is a more general 'array element manipulation
operator' in Perl, but statements like

"splice(@a, @a, 0, $x, $y) is equivalent to push(@a, $x, $y)"
[paraphrase of a part of 'perldoc -f splice']

remind me in an uncanny way of something I read about 'lambda
calculus' a while ago: The statement basically was "This (short string
of incomprehensible symbols) can be simplified to that (extremely long
string of incomprehensible symbols)". At this point, I concluded that
either I or the author of the text had obviously lost the plot and
that I'd rather continue to live in my own little world than endure
more revelations of this kind.

The splice operation I quoted above is similar: it expresses a
relatively simple, 'well-known' operation in a more complicated way
than necessary, invoking splice with two additional arguments
(compared to push) in order to work around the 'actual' semantics of
the 4-argument splice, namely, replacing some run of array elements
with a run of other "datas" (datums?). Making the simple appear
complicated may be good for achieving a "Wow!" effect, but it isn't a
good strategy for software: Things tend to get complicated on their
own, and the more complicated the simple stuff already is, the less
complicated the system as a whole can become before it collapses under
its own weight.

That splice can be made to perform many different array manipulation
tasks IMO means it is ill-defined and should usually be avoided. For
the case at hand, namely, truncating an array without knowing if it
needs to be truncated, the 'clever way to use splice' is actually so
'ill' that perl even issues a warning for it. While I don't usually
use Perl runtime warnings and would recommend disregarding them most
of the time, this is at least a clear hint that someone considered this to be
a rather bizarre way to express a particular operation.
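The perlfunc equivalence quoted above can be checked directly. In splice's OFFSET position an array is evaluated in scalar context, so splice(@a, @a, 0, ...) inserts at the end, just like push. A minimal sketch:

```perl
use strict;
use warnings;

my @via_push   = (1, 2);
my @via_splice = (1, 2);
my ($x, $y) = (3, 4);

push(@via_push, $x, $y);
# OFFSET = number of elements (end of array), LENGTH 0 = remove nothing.
splice(@via_splice, @via_splice, 0, $x, $y);

print "@via_push | @via_splice\n";   # prints "1 2 3 4 | 1 2 3 4"
```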

In contrast to this, 'assigning to $#ary' has the defined meaning of
'change the length of the array', maybe in a way peculiar to Perl (I
don't know of anything similar in another language) but "Perl written
in a way peculiar to Perl" is, in my opinion, not generally a bad
thing, more so if this means 'the code becomes simpler'. I do
consider

$#a = SOMETHING - 1 unless $#a < SOMETHING;

simpler than the more 'elegant' splice(@a, SOMETHING) which
performs the same 'Do we actually need to do something?' check
internally as part of validating its arguments, because it plainly
states the intent of the code: Get rid of the excess elements unless
there aren't any.

I don't think that 'But my arrays get the measles when I do that!'
is a valid counterargument: The magic scalar needs to be created in
order to perform this operation via assignment, and once it has been
created, it makes sense to keep it around unless memory is very tight.
If the array is short-lived, freeing the scalar when the array dies
instead of immediately won't make a noticeable difference, but if the
array is long-lived, the scalar will likely be needed again, if only
because this code path will be taken again, and then the proxy scalar
doesn't need to be created again.
 

Tim McDaniel

How so? Doesn't

$ export PERLLIB="$HOME"/.perl/modules

help, for instance?

This is not for my personal laptop. This is for production code on a
Web server. I can check in new files, so I could download some module
and put it in our section of the tree. But they have reasonable
suspicions of third-party code and it's generally better to use
builtins.

Or, of course, it would be easy for me to code a min sub.
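Both routes are short. List::Util, which provides min, has shipped with perl as a core module since 5.8, so it is usually already installed; the hand-written alternative below uses the hypothetical name my_min. This sketch assumes a limit of 3 and a two-element array:

```perl
use strict;
use warnings;
use List::Util qw(min);   # core module, no download needed

my $max     = 3;
my @results = ('a', 'b');

# Clamping the last index keeps the slice from padding with undef.
my @kept = @results[0 .. min($#results, $max - 1)];

# The same bound without any module (my_min is a hypothetical helper).
sub my_min { my $m = shift; for (@_) { $m = $_ if $_ < $m } return $m }
my @kept2 = @results[0 .. my_min($#results, $max - 1)];

print scalar(@kept), " ", scalar(@kept2), "\n";   # prints "2 2"
```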
 

Rainer Weikusat

Ben Morrow said:
'Splice' is not an ideal name for the operation; however, it's no worse
than 'substr', which is exactly the same operator on strings.

They're sort-of the inverse of each other in this respect: The
4-argument splice actually splices something (in a sense at least -- I
figure that whoever invented this name wasn't a conscript mariner in
some navy :), the other three don't: They seem to have 'grown' on the
splice implementation because it happened to be a suitable environment
for them. For substr, the same three cases actually extract substrings
from a string while the 4-argument one does something different.
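The parallel with substr is easy to see side by side: in their 4-argument forms, both take OFFSET, LENGTH, and a replacement, remove the run, put the replacement in its place, and return what was removed. A small sketch:

```perl
use strict;
use warnings;

my $str = "abcdef";
my $removed_str = substr($str, 1, 2, "XY");        # returns "bc"

my @arr = ('a' .. 'f');
my @removed_arr = splice(@arr, 1, 2, 'X', 'Y');    # returns ('b', 'c')

print "$str $removed_str | @arr | @removed_arr\n";
# prints "aXYdef bc | a X Y d e f | b c"
```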
It is a more general 'array element manipulation
operator' in Perl but statements like

"splice(@a, @a, 0, $x, $y) is equivalent to push(@a, $x, $y)"
[paraphrase of a part of 'perldoc -f splice'] [...]
The splice-operation I quoted above is
similar to this, expressing a relatively simple 'well-known' operation
in a more complicated way than necessary by invoking splice with two
additional arguments (compared to push) in order to work around the
'actual' semantics of the 4-argument splice, namely, replace some run
of array elements with a run of other "datas" (datums?). Making the
simple appear complicated may be good for achieving a "Wow!" effect
but it isn't a good strategy for software:

Did it occur to you that this was not intended to explain what 'push'
does, but rather to help explain what 'splice' does?

I think it was intended to explain what splice can be made to do by
feeding 'cleverly selected arguments' to it.
I would agree with you that the push is a simpler expression than
the splice, but for example the similar equivalence

shift(@a) is equivalent to splice(@a, 0, 1)

shows you how to use splice to shift multiple elements at once.
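For example, a sketch of shifting one element and then three at once:

```perl
use strict;
use warnings;

my @a = (1 .. 6);
my $first      = shift(@a);          # removes 1
my @next_three = splice(@a, 0, 3);   # removes (2, 3, 4), in order
print "$first | @next_three | @a\n";   # prints "1 | 2 3 4 | 5 6"
```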

Conceptually, the Perl splice can be thought of as a combination of
two 'primitive operations', namely a

remove(@array, $offset, $length)

which removes @array[$offset .. $offset + $length - 1] from @array and
a

insert(@array, $offset, @list)

which inserts the elements on @list into @array starting at offset
$offset. The latter can be expressed in Perl as

splice(@array, $offset, 0, @list)

(another 'clever abuse'). There doesn't seem to be any good reason for
combining both in this way except that this means the implementation
can be 'smart' wrt changing the array length for the 'splicing'
splice case. Where Perl provides another 'built-in' way to perform a
particular array manipulation, ie, shift/unshift, push/pop and
assignment to $#array for truncation, the 'generic splice' should IMHO
be avoided because of this 'optimized combo-opness'.
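The two conceptual primitives can be written as thin wrappers around splice. The names remove and insert are hypothetical and not part of Perl; this is only a sketch of the decomposition described above:

```perl
use strict;
use warnings;

# remove: take out @$aref[$offset .. $offset + $length - 1] and return them.
sub remove {
    my ($aref, $offset, $length) = @_;
    return splice(@$aref, $offset, $length);
}

# insert: put @list into @$aref starting at $offset (LENGTH 0 removes nothing).
sub insert {
    my ($aref, $offset, @list) = @_;
    splice(@$aref, $offset, 0, @list);
    return;
}

my @array = ('a' .. 'e');
my @gone = remove(\@array, 1, 2);   # @array is now (a, d, e)
insert(\@array, 1, 'X', 'Y');       # @array is now (a, X, Y, d, e)
print "@gone | @array\n";           # prints "b c | a X Y d e"
```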
 

Charles DeRykus

On 7/30/2013 9:52 AM, Rainer Weikusat wrote:
....
Conceptually, the Perl splice can be thought of as a combination of
two 'primitive operations', namely a

remove(@array, $offset, $length)

which removes @array[$offset .. $offset + $length - 1] from @array and
a
...

"Splicing" in a bit of a tangent here... the unsuspecting might think,
however briefly, that 'delete' on an array could pinch hit for 'remove'
above. Except that 'delete' on arrays was DWIM-challenged and almost
never what you wanted.

'delete' on arrays has been euthanized, ie, 'deprecated' (which is
euthanasia...just dragging it on for years).

A remove/zap/delete for arrays, though, would fill the gap nicely: maybe
a List::Util function that is passed an array and a list of indices,
e.g., some faster equivalent of:

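sub zap(+@) { my ($aref, @idx) = @_;
    die "not an array ref" unless ref $aref eq 'ARRAY';
    # remove the highest indices first so earlier removals don't shift later ones
    splice( @$aref, $_, 1 ) for sort { $b <=> $a } @idx; }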

called like, eg, zap( @results, MAXSEARCHRESULTS..$#results)
 
