skipping blank array items


ccc31807

I have a csv file with 10K items. The header looks something like this:
CustNo,Name,FirstPaidItem,FirstOrderedItem,FirstShippedItem,Items

The first five fields are single valued. The last field (Items) has many items. Lines may look like this:

1,Joe,a,a,a,a,b,c,d,e,f,g,h,i
2,Jane,a,a,b,a,,,,,,a,b,c,d,e
3,Jim,b,a,a,a,b,c,d,e,f,g,h,i
4,Jill,b,b,b,b,,,,,c,,b,,e,f,g,h

I parse the file like this, putting the first five values in scalars and the remainder in the @items array:
foreach line
my ($custno,$name,$paid,$ordered,$shipped,@items) = split on the commas

The objective is to take the first item in the @items array and match it to one (or more) of $paid, $ordered, or $shipped. I just want to grab the first item in the @items array that has a value, IOW, skip all the blank items (as in line 2 above) that precede the first actual value.

Is there a quick and dirty way to do this without having to do this?

my $item = '';
foreach my $ele (@items)
{
    # stop only once a non-blank element has been found
    if ($ele =~ /\w/) { $item = $ele; last; }
}

Thanks, CC.
 

Rainer Weikusat

ccc31807 said:
I have a csv file with 10K items. The header looks something like this:
CustNo,Name,FirstPaidItem,FirstOrderedItem,FirstShippedItem,Items

The first five fields are single valued. The last field (Items) has many items. Lines may look like this:

1,Joe,a,a,a,a,b,c,d,e,f,g,h,i
2,Jane,a,a,b,a,,,,,,a,b,c,d,e
3,Jim,b,a,a,a,b,c,d,e,f,g,h,i
4,Jill,b,b,b,b,,,,,c,,b,,e,f,g,h

I parse the file like this, putting the first five values in scalars and the remainder in the @items array:
foreach line
my ($custno,$name,$paid,$ordered,$shipped,@items) = split on the commas

The objective is to take the first item in the @items array and match it to one (or more) of $paid, $ordered, or $shipped. I just want to grab the first item in the @items array that has a value, IOW, skip all the blank items (as in line 2 above) that precede the first actual value.

Is there a quick and dirty way to do this without having to do this?

my $item = '';
foreach my $ele (@items)
{
    # stop only once a non-blank element has been found
    if ($ele =~ /\w/) { $item = $ele; last; }
}

use List::Util qw(first);

$item = first { /\w/ } @items;
$item = first { length() } @items;

But this is also trivial without using any module:

length() and $item = $_, last for @items;
(/\w/) and $item = $_, last for @items;
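
A minimal sketch of the two approaches above, run against a made-up @items list. The data and the $examined counter are illustrative assumptions, not part of the original post. It suggests that first stops at the first hit instead of scanning the whole array:

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical data: five blank leading items, then real values.
my @items = ('', '', '', '', '', 'a', 'b', 'c');

# Count how many elements the block is called on.
my $examined = 0;
my $item = first { $examined++; length($_) } @items;

print "first non-blank item: $item\n";
print "elements examined: $examined of ", scalar(@items), "\n";

# The same search without a module, using a statement-modifier loop.
my $plain;
length() and $plain = $_, last for @items;
print "plain loop found: $plain\n";
```

Either way the work is proportional to the number of leading blanks, not to the length of the array.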
 

J. Gleixner

I have a csv file with 10K items. The header looks something like this:
CustNo,Name,FirstPaidItem,FirstOrderedItem,FirstShippedItem,Items

The first five fields are single valued. The last field (Items) has many items. Lines may look like this:

1,Joe,a,a,a,a,b,c,d,e,f,g,h,i
2,Jane,a,a,b,a,,,,,,a,b,c,d,e
3,Jim,b,a,a,a,b,c,d,e,f,g,h,i
4,Jill,b,b,b,b,,,,,c,,b,,e,f,g,h

I parse the file like this, putting the first five values in scalars and the remainder in the @items array:
foreach line
my ($custno,$name,$paid,$ordered,$shipped,@items) = split on the commas

If items doesn't need to be an array...

my ($custno,$name,$paid,$ordered,$shipped,$items) = split(/,/,$line,6);
my ( $item ) = $items =~ /(\w)/;
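
A hedged sketch of the same idea for items longer than one character: /(\w)/ captures a single word character, so this variant uses /([^,]+)/ to take the whole first non-empty field. The input line is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input line: the Items field starts with blank entries.
my $line = '2,Jane,a,a,b,,,,,,abc,b,c';

# The limit of 6 leaves everything after the fifth comma in $items.
my ($custno, $name, $paid, $ordered, $shipped, $items) = split /,/, $line, 6;

# [^,]+ grabs the first run of non-comma characters, i.e. the first
# non-empty item, however long it is.
my ($item) = defined $items ? $items =~ /([^,]+)/ : ();

print "items string: $items\n";
print "first item:   ", (defined $item ? $item : '(none)'), "\n";
```

The regex engine still has to step over the leading commas, so this moves the scan rather than removing it.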
 

ccc31807

use List::Util qw(first);

$item = first { /\w/ } @items;
$item = first { length() } @items;

But this is also trivial without using any module:

length() and $item = $_, last for @items;
(/\w/) and $item = $_, last for @items;

FOR iterates through the array. I have not looked at List::Util but I will tomorrow. I wanted to avoid having to iterate through the array, which may have 1000 items, and the first item with an actual value may be 500 places down.

Thanks, CC.
 

ccc31807

If items doesn't need to be an array...

@items does not need to be an array, it could be a string. The database I mainly use is UniData, a multi-valued database where fields do not contain scalar values. Over the years, as a general rule, I find that using arrays rather than strings gives more consistent results overall, but there is no technical reason for using an array rather than a string.
my ($custno,$name,$paid,$ordered,$shipped,$items) = split(/,/,$line,6);
my ( $item ) = $items =~ /(\w)/;

Actually, all my 'items' consist of two digits, a forward slash, a character and another digit, like this: 13/T1. So the regex would be something like this:
$items =~ m!,(\d\d/\w\d),!;
$item = $1;

Just off hand, is using a regex much faster than iterating through arrays? My script takes about 30 seconds to run and speed is not critical (I have all day to run it). What offended me was having to iterate through an array with potentially 1,000 elements just to grab the first one with a value.

Thanks, CC.
 

Rainer Weikusat

ccc31807 said:
FOR iterates through the array. I have not looked at List::Util but
I will tomorrow. I wanted to avoid having to iterate through the
array, which may have 1000 items, and the first item with an actual
value may be 500 places down.

Eh ... provided you have an array with a number of empty leading
elements you want to skip but you don't know how many, how do you
propose to do that without 'looping through the array'?
 

Rainer Weikusat

Rainer Weikusat said:
[...]
FOR iterates through the array. I have not looked at List::Util but
I will tomorrow. I wanted to avoid having to iterate through the
array, which may have 1000 items, and the first item with an actual
value may be 500 places down.

Eh ... provided you have an array with a number of empty leading
elements you want to skip but you don't know how many, how do you
propose to do that without 'looping through the array'?

Byzantinely recursive implementation:

sub fne { return (shift or &fne); }

NB: This will also skip over zeroes.
 

George Mpouras

On 6/9/2013 23:06, ccc31807 wrote:
FOR iterates through the array. I have not looked at List::Util but I will tomorrow. I wanted to avoid having to iterate through the array, which may have 1000 items, and the first item with an actual value may be 500 places down.

Thanks, CC.


What you can do is run two scripts:
a) one that builds a trie index from your csv file
b) a second that gives you the answers instantly, without "fors", from the
index built by script a)
But since you only want one execution per day, I do not know if this makes
sense; also, it is not quick and dirty.
 

J. Gleixner

If items doesn't need to be an array...

@items does not need to be an array, it could be a string. [...]
my ($custno,$name,$paid,$ordered,$shipped,$items) = split(/,/,$line,6);
my ( $item ) = $items =~ /(\w)/;

Actually, all my 'items' consist of two digits, a forward slash, a character and another digit, like this: 13/T1.

Oh, I guess my ESP isn't working again. Had you
provided that in your original post, you might have
received a more accurate solution. Regardless, we
don't really need to know your correct regex; you can
adjust it as needed without re-posting.

So the regex would be something like this:
$items =~ m!,(\d\d/\w\d),!;

Nope, the ',' in there prevents matching the first or last value. There is
no need to include the separator in the regex in this case.

$item = $1;

Always test that the match was successful; never blindly assign $1.

dumb example...

my $str = '123';
$str =~ /(\d+)/;
my $val = $1;       # '123'

$str =~ /(blah)/;   # this match fails
my $val2 = $1;      # oops: still '123', left over from the last successful match
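
One way to avoid the stale $1, sketched under the assumption that items look like 13/T1 as described in the thread. The helper name first_item and the test strings are made up for illustration. Assigning the match in list context yields the capture on success and nothing on failure, so a leftover $1 is never read:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first item in 13/T1 form, or undef.
sub first_item {
    my ($items) = @_;
    # No commas in the pattern, so the first and last values can match too.
    my ($item) = $items =~ m!(\d\d/\w\d)!;
    return $item;    # undef when nothing matched
}

for my $items ('13/T1,14/B2', ',,,15/C3,,16/D4', ',,,,') {
    my $item = first_item($items);
    if (defined $item) {
        print "'$items' -> $item\n";
    }
    else {
        print "'$items' -> no item found\n";
    }
}
```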

Just off hand, is using a regex much faster than iterating through arrays? My script takes about 30 seconds to run and speed is not critical (I have all day to run it). What offended me was having to iterate through an array with potentially 1,000 elements just to grab the first one with a value.

You can answer that yourself using the Benchmark module.
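
A rough sketch of such a comparison, assuming 500 blank items ahead of the first value. The data and the candidate names are invented for illustration, and the timings will vary by machine and perl version. It measures only the search itself, not the cost of splitting each line:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(first);

# Hypothetical data: 500 blank items before the first value, 1000 in total.
my @items = (('') x 500, '13/T1', ('14/B2') x 499);
my $items = join ',', @items;

my %candidates = (
    # Search the already-split array.
    list_first => sub { first { length $_ } @items },
    # Search the unsplit string.
    regex      => sub { my ($item) = $items =~ /([^,]+)/; $item },
);

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, \%candidates);
```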
 

ccc31807

sub fne { return (shift or &fne); }

I like that.

It reminds me of how I discovered to transpose a matrix in Common Lisp.

(defun transpose (matrix) (apply #'mapcar #'list matrix))

CC.
 

Rainer Weikusat

Rainer Weikusat said:
Rainer Weikusat said:
[...]
length() and $item = $_, last for @items;
(/\w/) and $item = $_, last for @items;

FOR iterates through the array. I have not looked at List::Util but
I will tomorrow. I wanted to avoid having to iterate through the
array, which may have 1000 items, and the first item with an actual
value may be 500 places down.

Eh ... provided you have an array with a number of empty leading
elements you want to skip but you don't know how many, how do you
propose to do that without 'looping through the array'?
[...]

sub fne { return (shift or &fne); }

NB: This will also skip over zeroes.

It will also recurse forever (until perl aborts the recursion) if the
argument list didn't contain something which is regarded as true when
used in a boolean expression. This could be fixed with the even more
byzantine

sub fne { return @_ && (shift || &fne) || undef; }
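
A small sketch exercising the guarded version, with made-up argument lists, to show the three cases discussed: a normal hit, a skipped zero, and a list with nothing true in it.

```perl
use strict;
use warnings;

# The guarded version from the post: returns the first true argument,
# or undef once the argument list is exhausted.
sub fne { return @_ && (shift || &fne) || undef; }

my $hit  = fne('', '', 'a', 'b');   # blanks are skipped
my $zero = fne('', 0, 'x');         # 0 is false too, so it is skipped
my $none = fne('', '');             # nothing true: no runaway recursion

print "hit:  $hit\n";
print "zero: $zero\n";
print "none: ", (defined $none ? $none : 'undef'), "\n";
```

Note that it still recurses once per skipped element, so with warnings enabled a long run of leading blanks (100 or more) would trigger perl's "Deep recursion" warning.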
 

Peter J. Holzer

I have a csv file with 10K items. The header looks something like this:
CustNo,Name,FirstPaidItem,FirstOrderedItem,FirstShippedItem,Items

The first five fields are single valued. The last field (Items) has
many items. Lines may look like this:

1,Joe,a,a,a,a,b,c,d,e,f,g,h,i
2,Jane,a,a,b,a,,,,,,a,b,c,d,e
3,Jim,b,a,a,a,b,c,d,e,f,g,h,i
4,Jill,b,b,b,b,,,,,c,,b,,e,f,g,h

I parse the file like this, putting the first five values in scalars
and the remainder in the @items array:
foreach line
my ($custno,$name,$paid,$ordered,$shipped,@items) = split on the commas

The objective is to take the first item in the @items array and match
it to one (or more) of $paid, $ordered, or $shipped. I just want to
grab the first item in the @items array that has a value, IOW, skip
all the blank items (as in line 2 above) that precede the first actual
value.

Is there a quick and dirty way to do this without having to do this?

If $custno, $name, $paid, $ordered, $shipped are guaranteed to be
non-empty, you could split on /,+/ to skip empty items, and if you need
only the first, use a scalar instead of an array.

hp
 

ccc31807

If $custno, $name, $paid, $ordered, $shipped are guaranteed to be
non-empty, you could split on /,+/ to skip empty items, and if you need
only the first, use a scalar instead of an array.

Unfortunately, only the first two items are guaranteed. This is the real world, and sometimes items are shipped without being ordered, sometimes items are ordered without being shipped, and strange as it may be, items are paid for without being ordered or shipped. The first thing I had to do was to validate the file, and each of the shipped, paid, and ordered columns had an error rate of around 20 percent. In this instance, it's troubling but not a big deal, as the real critical piece of information is the value of the first item in the @items array.

CC.
 

Willem

ccc31807 wrote:
) I have a csv file with 10K items. The header looks something like this:
) CustNo,Name,FirstPaidItem,FirstOrderedItem,FirstShippedItem,Items
)
) The first five fields are single valued. The last field (Items) has many
) items. Lines may look like this:
)
) 1,Joe,a,a,a,a,b,c,d,e,f,g,h,i
) 2,Jane,a,a,b,a,,,,,,a,b,c,d,e
) 3,Jim,b,a,a,a,b,c,d,e,f,g,h,i
) 4,Jill,b,b,b,b,,,,,c,,b,,e,f,g,h
)
) I parse the file like this, putting the first five values in scalars and
) the remainder in the @items array:
) foreach line
) my ($custno,$name,$paid,$ordered,$shipped,@items) = split on the commas
)
) The objective is to take the first item in the @items array and match it
) to one (or more) of $paid, $ordered, or $shipped. I just want to grab the
) first item in the @items array that has a value, IOW, skip all the blank
) items (as in line 2 above) that precede the first actual value.
)
) Is there a quick and dirty way to do this without having to do this?
)
) my $item = '';
) foreach my $ele (@items)
) {
)     # stop only once a non-blank element has been found
)     if ($ele =~ /\w/) { $item = $ele; last; }
) }

I haven't seen this one yet (perhaps I missed it):

my ($custno,$name,$paid,$ordered,$shipped,@items) = split /,+/;

But I believe it's the simplest way to get what you want.

To explain: This splits on one-or-more-commas. (Split takes a regex.)
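
A short sketch of what this does in practice. The first line is line 2 from the original post; the second line is hypothetical and has an empty FirstPaidItem field, which illustrates why this only works when the fixed fields are never empty:

```perl
use strict;
use warnings;

# Line 2 from the original post: all five fixed fields are filled in.
my $ok = '2,Jane,a,a,b,a,,,,,,a,b,c,d,e';
my ($custno, $name, $paid, $ordered, $shipped, @items) = split /,+/, $ok;
print "items: @items\n";

# Hypothetical line with an empty FirstPaidItem field: the run of commas
# collapses, so every later value shifts one position to the left.
my $bad = '5,Jo,,a,b,x,,,y';
my (undef, undef, $paid2, $ordered2, $shipped2, @items2) = split /,+/, $bad;
print "paid=$paid2 ordered=$ordered2 shipped=$shipped2 items=@items2\n";
```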


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
