skipping blank array items

ccc31807 · Sep 6, 2013

I have a csv file with 10K items. The header looks something like this:
CustNo,Name,FirstPaidItem,FirstOrderedItem,FirstShippedItem,Items

The first five fields are single valued. The last field (Items) has many items. Lines may look like this:

1,Joe,a,a,a,a,b,c,d,e,f,g,h,i
2,Jane,a,a,b,a,,,,,,a,b,c,d,e
3,Jim,b,a,a,a,b,c,d,e,f,g,h,i
4,Jill,b,b,b,b,,,,,c,,b,,e,f,g,h

I parse the file like this, putting the first five values in scalars and the remainder in the @items array:
while (my $line = <>) {    # read the CSV file line by line
    chomp $line;
    my ($custno,$name,$paid,$ordered,$shipped,@items) = split /,/, $line;
}

The objective is to take the first item in the @items array and match it to one (or more) of $paid, $ordered, or $shipped. I just want to grab the first item in the @items array that has a value, IOW, skip all the blank items (as in line 2 above) that precede the first actual value.

Is there a quick and dirty way to do this without having to do this?

my $item = '';
foreach my $ele (@items)
{
    # stop at the first element that contains a word character
    if ($ele =~ /\w/) { $item = $ele; last; }
}

Thanks, CC.
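For reference, a minimal runnable sketch of that loop. Note that `last` has to sit inside the condition; an unconditional `last` ends the loop after inspecting only the first element. The helper name `first_nonblank` and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first element that contains a word
# character, or '' if there is none. `last` sits inside the `if`, so the
# loop stops only once a non-blank value has been found.
sub first_nonblank {
    my $item = '';
    foreach my $ele (@_) {
        if ($ele =~ /\w/) {
            $item = $ele;
            last;
        }
    }
    return $item;
}

# Made-up field list: split /,/ keeps leading empty fields.
my @items = split /,/, ',,,,a,b,c';
print first_nonblank(@items), "\n";    # prints "a"
```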

Rainer Weikusat · Sep 6, 2013

use List::Util qw(first);

$item = first { /\w/ } @items;
$item = first { length() } @items;

But this is also trivial without using any module:

length() and $item = $_, last for @items;
(/\w/) and $item = $_, last for @items;
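A small sketch comparing the two forms above on a made-up list with leading blanks. `List::Util::first` returns the first element for which the block is true (or undef if there is none); like the module-free loop, it still walks the list but stops at the first match.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @items = ('', '', '13/T1', '14/A2');    # made-up sample

# First element with a non-zero length, via List::Util.
my $by_first = first { length() } @items;

# Module-free equivalent: a statement-modifier loop that stops at the
# first element with a non-zero length.
my $by_loop;
length() and $by_loop = $_, last for @items;

print "$by_first $by_loop\n";    # prints "13/T1 13/T1"
```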

J. Gleixner · Sep 6, 2013

If items doesn't need to be an array...

my ($custno,$name,$paid,$ordered,$shipped,$items) = split(/,/,$line,6);
my ( $item ) = $items =~ /(\w)/;
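A runnable sketch of the split-with-limit idea, on a made-up line. It uses `(\w+)` rather than `(\w)` so that multi-character values are captured whole; that change is an assumption, not part of the suggestion above.

```perl
use strict;
use warnings;

my $line = '5,Jack,a,a,b,,,,c,d,e';    # made-up line with leading blank items

# A limit of 6 splits off the five single-valued fields and leaves
# everything after the fifth comma in $items as one string.
my ($custno, $name, $paid, $ordered, $shipped, $items) = split /,/, $line, 6;

# In list context the match returns its captures: the first run of word
# characters, which skips the leading empty fields.
my ($item) = $items =~ /(\w+)/;

print "$item\n";    # prints "c"
```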

ccc31807 · Sep 6, 2013

use List::Util qw(first);

$item = first { /\w/ } @items;
$item = first { length() } @items;

But this is also trivial without using any module:

length() and $item = $_, last for @items;
(/\w/) and $item = $_, last for @items;

FOR iterates through the array. I have not looked at List::Util but I will tomorrow. I wanted to avoid having to iterate through the array, which may have 1000 items, and the first item with an actual value may be 500 places down.

Thanks, CC.

ccc31807 · Sep 6, 2013

If items doesn't need to be an array...

@items does not need to be an array, it could be a string. The database I mainly use is UniData, a multi-valued database where fields do not contain scalar values. Over the years, as a general rule, I find that using arrays rather than strings gives more consistent results overall, but there is no technical reason for using an array rather than a string.

my ($custno,$name,$paid,$ordered,$shipped,$items) = split(/,/,$line,6);
my ( $item ) = $items =~ /(\w)/;

Actually, all my 'items' consist of two digits, a forward slash, a character and another digit, like this: 13/T1. So the regex would be something like this:
$items =~ m!,(\d\d/\w\d),!;
$item = $1;

Just off hand, is using a regex much faster than iterating through arrays? My script takes about 30 seconds to run and speed is not critical (I have all day to run it). What offended me was having to iterate through an array with potentially 1,000 elements just to grab the first one with a value.

Thanks, CC.

Rainer Weikusat · Sep 6, 2013

ccc31807 said:
FOR iterates through the array. I have not looked at List::Util but
I will tomorrow. I wanted to avoid having to iterate through the
array, which may have 1000 items, and the first item with an actual
value may be 500 places down.

Eh ... provided you have an array with a number of empty leading
elements you want to skip but you don't know how many, how do you
propose to do that without 'looping through the array'?

Rainer Weikusat · Sep 6, 2013

Byzantinely recursive implementation:

sub fne { return (shift or &fne); }

NB: This will also skip over zeroes.

George Mpouras · Sep 6, 2013

On 6/9/2013 23:06, ccc31807 wrote:

FOR iterates through the array. I have not looked at List::Util but I will tomorrow. I wanted to avoid having to iterate through the array, which may have 1000 items, and the first item with an actual value may be 500 places down.

Thanks, CC.

What you could do is run two scripts:
a) one that builds a trie index from your CSV file
b) a second that gives you the answers instantly, without "for" loops, from
the index built by script a)
But since you only need one run per day, I do not know if this makes
sense; also, it is not quick and dirty.

J. Gleixner · Sep 6, 2013

If items doesn't need to be an array...

@items does not need to be an array, it could be a string. [...]

my ($custno,$name,$paid,$ordered,$shipped,$items) = split(/,/,$line,6);
my ( $item ) = $items =~ /(\w)/;

Actually, all my 'items' consist of two digits, a forward slash, a character and another digit, like this: 13/T1.

Oh, I guess my ESP isn't working again. Had you
provided that in your original post, you may have
received a more accurate solution. Regardless, we
don't really need to know your correct regex, you can
adjust it as needed without re-posting.

So the regex would be something like this:
$items =~ m!,(\d\d/\w\d),!;

Nope, the commas in there prevent matching the first or last value. There is
no need to include the separator in the regex in this case.

$item = $1;

Always test that the match was successful; never blindly assign $1.

dumb example...

$str = '123';
$str =~ /(\d+)/;
$val = $1;

$str =~ /(blah)/;
$val2 = $1; #oops
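A sketch of the safe pattern, using the 13/T1 item format described above. The helper name `first_item` is hypothetical. The pattern has no commas, so the first and last values can match, and $1 is only read after a successful match.

```perl
use strict;
use warnings;

# Items look like 13/T1: two digits, a slash, a word character, a digit.
sub first_item {
    my ($items) = @_;
    if ($items =~ m!(\d\d/\w\d)!) {
        return $1;    # safe: the match succeeded
    }
    return;           # no match: undef, never a stale $1
}

print first_item(',,,13/T1,14/A2'), "\n";    # prints "13/T1"
```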

Just off hand, is using a regex much faster than iterating through arrays? My script takes about 30 seconds to run and speed is not critical (I have all day to run it). What offended me was having to iterate through an array with potentially 1,000 elements just to grab the first one with a value.

You can answer that yourself using the Benchmark module.
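A rough Benchmark sketch under made-up assumptions: 1,000 items, the first 500 blank, values generated in the 13/T1 style. Keep in mind that both approaches scan past the blanks; the regex engine also has to step over the leading commas. Results will vary by machine and data, so treat this only as a starting point.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up worst case from the thread: 1,000 items, first value at 500.
my @items = (('') x 500,
             map { sprintf('%02d/T%d', $_ % 100, $_ % 10) } 500 .. 999);
my $line = join ',', @items;

# Array approach: return the first element with a non-zero length.
sub by_loop  { for (@items) { return $_ if length($_) } return }

# String approach: first value matching the item format.
sub by_regex { return $line =~ m!(\d\d/\w\d)! ? $1 : undef }

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    loop  => \&by_loop,
    regex => \&by_regex,
});
```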

ccc31807 · Sep 6, 2013

sub fne { return (shift or &fne); }

I like that.

It reminds me of how I discovered how to transpose a matrix in Common Lisp.

(defun transpose (matrix) (apply #'mapcar #'list matrix))

CC.

Rainer Weikusat · Sep 7, 2013

Ben Morrow said:
my ($item) = $items =~ m!(?:^|,)(\d\d/\w\d)(?:,|$)!a;

Add /x as desired, or use

my $comma = qr/^|$|,/;

$items =~ /([^,]+)/

?
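A sketch comparing the two patterns on a made-up value. The simpler pattern needs the `+` inside the capture group; with `([^,])+` the group repeats and $1 keeps only the last character it matched.

```perl
use strict;
use warnings;

my $items = ',,,13/T1,14/A2';    # made-up value with leading blanks

# Anchored version: an item delimited by commas or by the ends of the
# string (the /a flag needs Perl 5.14 or later).
my ($anchored) = $items =~ m!(?:^|,)(\d\d/\w\d)(?:,|$)!a;

# Simpler version: the first run of non-comma characters.
my ($first_run) = $items =~ /([^,]+)/;

# Pitfall: the quantifier outside the group captures one character only.
my ($last_char) = $items =~ /([^,])+/;

print "$anchored $first_run $last_char\n";    # prints "13/T1 13/T1 1"
```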

Rainer Weikusat · Sep 7, 2013

Rainer Weikusat said:

sub fne { return (shift or &fne); }

NB: This will also skip over zeroes.

It will also recurse forever (until perl aborts the recursion) if the
argument list didn't contain something which is regarded as true when
used in a boolean expression. This could be fixed with the even more
byzantine

sub fne { return @_ && (shift || &fne) || undef; }
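A short sketch exercising the guarded version. Because it tests truth, it skips '' and undef but also 0 and '0'. The `no warnings 'recursion'` line is an addition that silences the "Deep recursion" warning perl emits after 100 nested calls when there is a long run of false values.

```perl
use strict;
use warnings;
no warnings 'recursion';

# Returns the first true argument, or undef once the list is exhausted.
sub fne { return @_ && (shift || &fne) || undef; }

print fne('', '', '13/T1', '14/A2'), "\n";    # prints "13/T1"
```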

Peter J. Holzer · Sep 8, 2013

If $custno, $name, $paid, $ordered, $shipped are guaranteed to be
non-empty, you could split on /,+/ to skip empty items, and if you need
only the first, use a scalar instead of an array.

hp

ccc31807 · Sep 8, 2013

If $custno, $name, $paid, $ordered, $shipped are guaranteed to be
non-empty, you could split on /,+/ to skip empty items, and if you need
only the first, use a scalar instead of an array.

Unfortunately, only the first two items are guaranteed. This is the real world, and sometimes items are shipped without being ordered, sometimes items are ordered without being shipped, and, strange as it may be, items are paid for without being ordered or shipped. The first thing I had to do was to validate the file, and each of the shipped, paid, and ordered columns had an error rate of around 20 percent. In this instance, it's troubling but not a big deal, as the really critical piece of information is the value of the first item in the @items array.

CC.
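Since only the first two fields are guaranteed, here is a sketch of why splitting the whole line on /,+/ can misalign fields, and one possible workaround (split the fixed fields with a limit, then split only the remainder on /,+/). The sample line is made up.

```perl
use strict;
use warnings;

# Made-up line where the Ordered field is empty.
my $line = '6,Ann,a,,b,,,c,d';

# Splitting the whole line on /,+/ also collapses the empty Ordered
# field, so every later field shifts left by one.
my @naive = split /,+/, $line;

# Workaround: split off the five fixed fields with a limit, then split
# only the remainder on /,+/. A leading empty field still appears when
# the remainder starts with a comma, so grep drops empty strings.
my ($custno, $name, $paid, $ordered, $shipped, $rest) = split /,/, $line, 6;
my @items = grep { length } split /,+/, $rest;

print "$shipped / $items[0]\n";    # prints "b / c"
```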

Willem · Sep 8, 2013

I haven't seen this one yet (perhaps I missed it):

my ($custno,$name,$paid,$ordered,$shipped,@items) = split /,+/;

But I believe it's the simplest way to get what you want.

To explain: This splits on one-or-more-commas. (Split takes a regex.)

SaSW, Willem

Similar Threads

Blue J Ciphertext Program	2	Nov 22, 2023
An empty initializer is invalid for an array with unspecified bound	0	Jul 1, 2020
PHP RSS Feed Aggregator changing to todays date everytime feed is aggregated	1	Jan 10, 2022
Python3 string slicing	2	Dec 11, 2023
My Status, Ciphertext	2	Nov 27, 2023
Array of structs function pointer	10	Jul 16, 2023
How to play corresponding sound?	2	Jun 10, 2023
Survey details won't go through using php, ajax, Mysql	3	Oct 25, 2023
