skipping blank array items

Discussion in 'Perl Misc' started by ccc31807, Sep 6, 2013.

  1. ccc31807

    ccc31807 Guest

    I have a csv file with 10K items. The header looks something like this:
    CustNo,Name,FirstPaidItem,FirstOrderedItem,FirstShippedItem,Items

    The first five fields are single valued. The last field (Items) has many items. Lines may look like this:

    1,Joe,a,a,a,a,b,c,d,e,f,g,h,i
    2,Jane,a,a,b,a,,,,,,a,b,c,d,e
    3,Jim,b,a,a,a,b,c,d,e,f,g,h,i
    4,Jill,b,b,b,b,,,,,c,,b,,e,f,g,h

    I parse the file like this, putting the first five values in scalars and the remainder in the @items array:
    foreach line
    my ($custno,$name,$paid,$ordered,$shipped,@items) = split on the commas

    The objective is to take the first item in the @items array and match it to one (or more) of $paid, $ordered, or $shipped. I just want to grab the first item in the @items array that has a value, IOW, skip all the blank items (as in line 2 above) that precede the first actual value.

    Is there a quick and dirty way to do this without having to do this?

    my $item = '';
    foreach my $ele (@items)
    {
        # stop at the first element that contains a word character
        if ($ele =~ /\w/) { $item = $ele; last; }
    }

    Thanks, CC.
     
    ccc31807, Sep 6, 2013
    #1

  2. ccc31807 <> writes:
    > I have a csv file with 10K items. The header looks something like this:
    > CustNo,Name,FirstPaidItem,FirstOrderedItem,FirstShippedItem,Items
    >
    > The first five fields are single valued. The last field (Items) has many items. Lines may look like this:
    >
    > 1,Joe,a,a,a,a,b,c,d,e,f,g,h,i
    > 2,Jane,a,a,b,a,,,,,,a,b,c,d,e
    > 3,Jim,b,a,a,a,b,c,d,e,f,g,h,i
    > 4,Jill,b,b,b,b,,,,,c,,b,,e,f,g,h
    >
    > I parse the file like this, putting the first five values in scalars and the remainder in the @items array:
    > foreach line
    > my ($custno,$name,$paid,$ordered,$shipped,@items) = split on the commas
    >
    > The objective is to take the first item in the @items array and match it to one (or more) of $paid, $ordered, or $shipped. I just want to grab the first item in the @items array that has a value, IOW, skip all the blank items (as in line 2 above) that precede the first actual value.
    >
    > Is there a quick and dirty way to do this without having to do this>
    >
    > my $item = '';
    > foreach my $ele (@items)
    > {
    > $item = $ele if $ele =~ /\w/;
    > last;
    > }


    use List::Util qw(first);

    $item = first { /\w/ } @items;
    $item = first { length() } @items;

    But this is also trivial without using any module:

    length() and $item = $_, last for @items;
    (/\w/) and $item = $_, last for @items;
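
For reference, a self-contained sketch of the `first` approach. The record in `$line` is made up to follow the layout from the question, with three blank items ahead of the first real one; it is not taken from the real file.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Made-up record in the question's layout, with three blank leading items.
my $line = '5,Jo,a,a,b,,,,c,d';
my ($custno, $name, $paid, $ordered, $shipped, @items) = split /,/, $line;

# first() returns the first element for which the block is true,
# or undef if no element matches.
my $item = first { length $_ } @items;

print defined $item ? "first item: $item\n" : "no item found\n";
```

Testing `length $_` rather than plain truth keeps an item of `0` from being skipped.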
     
    Rainer Weikusat, Sep 6, 2013
    #2

  3. J. Gleixner

    J. Gleixner Guest

    On 09/06/13 13:56, ccc31807 wrote:
    > I have a csv file with 10K items. The header looks something like this:
    > CustNo,Name,FirstPaidItem,FirstOrderedItem,FirstShippedItem,Items
    >
    > The first five fields are single valued. The last field (Items) has many items. Lines may look like this:
    >
    > 1,Joe,a,a,a,a,b,c,d,e,f,g,h,i
    > 2,Jane,a,a,b,a,,,,,,a,b,c,d,e
    > 3,Jim,b,a,a,a,b,c,d,e,f,g,h,i
    > 4,Jill,b,b,b,b,,,,,c,,b,,e,f,g,h
    >
    > I parse the file like this, putting the first five values in scalars and the remainder in the @items array:
    > foreach line
    > my ($custno,$name,$paid,$ordered,$shipped,@items) = split on the commas


    If items doesn't need to be an array...

    my ($custno,$name,$paid,$ordered,$shipped,$items) = split(/,/,$line,6);
    my ( $item ) = $items =~ /(\w)/;
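
A runnable sketch of the same idea, assuming a made-up record with blank leading items. It uses `\w+` instead of `\w`, which is a small variation so that an item longer than one character would be captured whole.

```perl
use strict;
use warnings;

my $line = '5,Jo,a,a,b,,,,c,d';   # made-up record, blank leading items

# A limit of 6 leaves everything after the fifth comma in $items, unsplit.
my ($custno, $name, $paid, $ordered, $shipped, $items) = split /,/, $line, 6;

# List-context match: $item gets the capture, or undef if nothing matched.
my ($item) = $items =~ /(\w+)/;

print "remainder: $items\n";
print defined $item ? "first item: $item\n" : "no item found\n";
```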


    >
    > The objective is to take the first item in the @items array and match it to one (or more) of $paid, $ordered, or $shipped. I just want to grab the first item in the @items array that has a value, IOW, skip all the blank items (as in line 2 above) that precede the first actual value.
    >
    > Is there a quick and dirty way to do this without having to do this>
    >
    > my $item = '';
    > foreach my $ele (@items)
    > {
    > $item = $ele if $ele =~ /\w/;
    > last;
    > }
    >
    > Thanks, CC.
     
    J. Gleixner, Sep 6, 2013
    #3
  4. ccc31807

    ccc31807 Guest

    On Friday, September 6, 2013 3:45:42 PM UTC-4, Rainer Weikusat wrote:
    > use List::Util qw(first);
    >
    >
    >
    > $item = first { /\w/ } @items;
    > $item = first { length() } @items;
    >
    >
    >
    > But this is also trivial without using any module:
    >
    >
    >
    > length() and $item = $_, last for @items;
    > (/\w/) and $item = $_, last for @items;


    FOR iterates through the array. I have not looked at List::Util but I will tomorrow. I wanted to avoid having to iterate through the array, which may have 1000 items, and the first item with an actual value may be 500 places down.

    Thanks, CC.
     
    ccc31807, Sep 6, 2013
    #4
  5. ccc31807

    ccc31807 Guest

    On Friday, September 6, 2013 3:47:08 PM UTC-4, J. Gleixner wrote:
    > If items doesn't need to be an array...


    @items does not need to be an array, it could be a string. The database I mainly use is UniData, a multi-valued database where fields do not contain scalar values. Over the years, as a general rule, I find that using arrays rather than strings gives more consistent results overall, but there is no technical reason for using an array rather than a string.

    > my ($custno,$name,$paid,$ordered,$shipped,$items) = split(/,/,$line,6);
    > my ( $item ) = $items =~ /(\w)/;


    Actually, all my 'items' consist of two digits, a forward slash, a character and another digit, like this: 13/T1. So the regex would be something like this:
    $items =~ m!,(\d\d/\w\d),!;
    $item = $1;

    Just off hand, is using a regex much faster than iterating through arrays? My script takes about 30 seconds to run and speed is not critical (I have all day to run it). What offended me was having to iterate through an array with potentially 1,000 elements just to grab the first one with a value.

    Thanks, CC.
     
    ccc31807, Sep 6, 2013
    #5
  6. ccc31807 <> writes:

    > On Friday, September 6, 2013 3:45:42 PM UTC-4, Rainer Weikusat wrote:
    >> use List::Util qw(first);
    >>
    >>
    >>
    >> $item = first { /\w/ } @items;
    >> $item = first { length() } @items;
    >>
    >>
    >>
    >> But this is also trivial without using any module:
    >>
    >>
    >>
    >> length() and $item = $_, last for @items;
    >> (/\w/) and $item = $_, last for @items;

    >
    > FOR iterates through the array. I have not looked at List::Util but
    > I will tomorrow. I wanted to avoid having to iterate through the
    > array, which may have 1000 items, and the first item with an actual
    > value may be 500 places down.


    Eh ... provided you have an array with a number of empty leading
    elements you want to skip but you don't know how many, how do you
    propose to do that without 'looping through the array'?
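
A sketch that may help with the worry about scanning 1000 elements: both `first` and the `last`-terminated loop stop at the first hit, so the elements after it are never examined. The data and the `$calls` counter are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(first);

# 500 blank items followed by 500 real ones (made-up data).
my @items = (('') x 500, ('13/T1') x 500);

# Count how many elements the block is actually asked about.
my $calls = 0;
my $item = first { $calls++; length $_ } @items;

print "found '$item' after $calls tests\n";   # found '13/T1' after 501 tests
```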
     
    Rainer Weikusat, Sep 6, 2013
    #6
  7. Rainer Weikusat <> writes:
    > ccc31807 <> writes:
    >> On Friday, September 6, 2013 3:45:42 PM UTC-4, Rainer Weikusat wrote:


    [...]

    >>> length() and $item = $_, last for @items;
    >>> (/\w/) and $item = $_, last for @items;

    >>
    >> FOR iterates through the array. I have not looked at List::Util but
    >> I will tomorrow. I wanted to avoid having to iterate through the
    >> array, which may have 1000 items, and the first item with an actual
    >> value may be 500 places down.

    >
    > Eh ... provided you have an array with a number of empty leading
    > elements you want to skip but you don't know how many, how do you
    > propose to do that without 'looping through the array'?


    Byzantinely recursive implementation:

    sub fne { return (shift or &fne); }

    NB: This will also skip over zeroes.
     
    Rainer Weikusat, Sep 6, 2013
    #7
  8. On 6/9/2013 23:06, ccc31807 wrote:
    > On Friday, September 6, 2013 3:45:42 PM UTC-4, Rainer Weikusat wrote:
    >> use List::Util qw(first);
    >>
    >>
    >>
    >> $item = first { /\w/ } @items;
    >> $item = first { length() } @items;
    >>
    >>
    >>
    >> But this is also trivial without using any module:
    >>
    >>
    >>
    >> length() and $item = $_, last for @items;
    >> (/\w/) and $item = $_, last for @items;

    >
    > FOR iterates through the array. I have not looked at List::Util but I will tomorrow. I wanted to avoid having to iterate through the array, which may have 1000 items, and the first item with an actual value may be 500 places down.
    >
    > Thanks, CC.
    >



    What you can do is run two scripts:
    a) one that builds a trie index from your csv file
    b) a second that gives you the answers instantly, without any "for" loops, from the
    index built by script a)
    But since you only want one execution per day I do not know if this makes
    sense; also, it is not quick and dirty.
     
    George Mpouras, Sep 6, 2013
    #8
  9. J. Gleixner

    J. Gleixner Guest

    On 09/06/13 15:15, ccc31807 wrote:
    > On Friday, September 6, 2013 3:47:08 PM UTC-4, J. Gleixner wrote:
    >> If items doesn't need to be an array...

    >
    > @items does not need to be an array, it could be a string. [...]


    >> my ($custno,$name,$paid,$ordered,$shipped,$items) = split(/,/,$line,6);
    >> my ( $item ) = $items =~ /(\w)/;

    >
    > Actually, all my 'items' consist of two digits, a forward slash, a character and another digit, like this: 13/T1.

    Oh, I guess my ESP isn't working again. Had you
    provided that in your original post, you may have
    received a more accurate solution. Regardless, we
    don't really need to know your correct regex, you can
    adjust it as needed without re-posting.

    >So the regex would be something like this:
    > $items =~ m!,(\d\d/\w\d),!;


    Nope, the ',' in there prevents matching the first or last value. There is no
    need to include the separator in the regex in this case.

    > $item = $1;

    Always test that the match was successful, never blindly assign $1.

    dumb example...

    $str = '123';
    $str =~ /(\d+)/;
    $val = $1;

    $str =~ /(blah)/;
    $val2 = $1; #oops
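
A minimal sketch of two common ways to avoid the stale `$1`, reusing the strings from the example above (`$val3` is an extra name added here for illustration):

```perl
use strict;
use warnings;

my $str = '123';
my ($val, $val2);

# Assign only when the match succeeded.
if ($str =~ /(\d+)/)  { $val  = $1 }
if ($str =~ /(blah)/) { $val2 = $1 }   # no match, so $val2 stays undef

# List-context form: the captures on success, an empty list on failure.
my ($val3) = $str =~ /(blah)/;

print defined $val2 ? "val2: $val2\n" : "val2 is undef\n";
```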


    >
    > Just off hand, is using a regex much faster than iterating through arrays? My script takes about 30 seconds to run and speed is not critical (I have all day to run it). What offended me was having to iterate through an array with potentially 1,000 elements just to grab the first one with a value.


    You can answer that yourself using the Benchmark module.
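
A rough sketch of such a comparison, assuming a made-up remainder string with 500 blank items followed by 500 items in the described format. The two candidates and their names are only illustrative, and the numbers will depend on the machine and on the real data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(first);

# Made-up record remainder: 500 blank items, then 500 items like '13/T1'.
my $items = join ',', ('') x 500, ('13/T1') x 500;

my %candidates = (
    # split the whole string, then scan for the first non-empty element
    split_first => sub { my $item = first { length $_ } split /,/, $items; $item },
    # match directly against the unsplit string
    regex       => sub { my ($item) = $items =~ m{([^,]+)}; $item },
);

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, \%candidates);
```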
     
    J. Gleixner, Sep 6, 2013
    #9
  10. ccc31807

    ccc31807 Guest

    On Friday, September 6, 2013 4:33:18 PM UTC-4, Rainer Weikusat wrote:
    > sub fne { return (shift or &fne); }


    I like that.

    It reminds me of how I discovered to transpose a matrix in Common Lisp.

    (defun transpose (matrix) (apply #'mapcar #'list matrix))

    CC.
     
    ccc31807, Sep 7, 2013
    #10
  11. Ben Morrow <> writes:
    > Quoth ccc31807 <>:
    >>
    >> Actually, all my 'items' consist of two digits, a forward slash, a
    >> character and another digit, like this: 13/T1. So the regex would be
    >> something like this:
    >> $items =~ m!,(\d\d/\w\d),!;
    >> $item = $1;

    >
    > my ($item) = $items =~ m!(?:^|,)(\d\d/\w\d)(?:,|$)!a;
    >
    > Add /x as desired, or use
    >
    > my $comma = qr/^|$|,/;


    $items =~ /([^,]+)/

    ?
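
A small sketch of why the position of the `+` matters in a pattern like this one, using a made-up remainder string. With the quantifier outside the parentheses the group is overwritten on every repetition; with it inside, the whole run of non-comma characters is captured.

```perl
use strict;
use warnings;

my $items = ',,,13/T1,14/T2';   # made-up remainder of a record

# Quantifier outside the group: only the last character of the run is kept.
my ($last_char) = $items =~ /([^,])+/;

# Quantifier inside the group: the whole first non-empty item is captured.
my ($item) = $items =~ /([^,]+)/;

print "$last_char\n";   # 1
print "$item\n";        # 13/T1
```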
     
    Rainer Weikusat, Sep 7, 2013
    #11
  12. Rainer Weikusat <> writes:
    > Rainer Weikusat <> writes:
    >> ccc31807 <> writes:
    >>> On Friday, September 6, 2013 3:45:42 PM UTC-4, Rainer Weikusat wrote:

    >
    > [...]
    >
    >>>> length() and $item = $_, last for @items;
    >>>> (/\w/) and $item = $_, last for @items;
    >>>
    >>> FOR iterates through the array. I have not looked at List::Util but
    >>> I will tomorrow. I wanted to avoid having to iterate through the
    >>> array, which may have 1000 items, and the first item with an actual
    >>> value may be 500 places down.

    >>
    >> Eh ... provided you have an array with a number of empty leading
    >> elements you want to skip but you don't know how many, how do you
    >> propose to do that without 'looping through the array'?


    [...]

    > sub fne { return (shift or &fne); }
    >
    > NB: This will also skip over zeroes.


    It will also recurse forever (until perl aborts the recursion) if the
    argument list didn't contain something which is regarded as true when
    used in a boolean expression. This could be fixed with the even more
    byzantine

    sub fne { return @_ && (shift || &fne) || undef; }
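
A short sketch exercising that second version on made-up lists. The recursion works because `&fne`, written with an ampersand and no parentheses, calls the sub with the current `@_`, which `shift` has already shortened by one element.

```perl
use strict;
use warnings;

# Returns the first true argument, or undef when the list runs out.
sub fne { return @_ && (shift || &fne) || undef; }

my $found = fne('', '', '13/T1', '14/T2');   # skips the two empty strings
my $none  = fne('', '', '');                 # nothing true: undef, no runaway recursion
my $zero  = fne(0, 'x');                     # 0 is false, so it is skipped as well

print "found: $found\n";
print defined $none ? "none: $none\n" : "none: undef\n";
print "zero case: $zero\n";
```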
     
    Rainer Weikusat, Sep 7, 2013
    #12
  13. On 2013-09-06 18:56, ccc31807 <> wrote:
    > I have a csv file with 10K items. The header looks something like this:
    > CustNo,Name,FirstPaidItem,FirstOrderedItem,FirstShippedItem,Items
    >
    > The first five fields are single valued. The last field (Items) has
    > many items. Lines may look like this:
    >
    > 1,Joe,a,a,a,a,b,c,d,e,f,g,h,i
    > 2,Jane,a,a,b,a,,,,,,a,b,c,d,e
    > 3,Jim,b,a,a,a,b,c,d,e,f,g,h,i
    > 4,Jill,b,b,b,b,,,,,c,,b,,e,f,g,h
    >
    > I parse the file like this, putting the first five values in scalars
    > and the remainder in the @items array:
    > foreach line
    > my ($custno,$name,$paid,$ordered,$shipped,@items) = split on the commas
    >
    > The objective is to take the first item in the @items array and match
    > it to one (or more) of $paid, $ordered, or $shipped. I just want to
    > grab the first item in the @items array that has a value, IOW, skip
    > all the blank items (as in line 2 above) that precede the first actual
    > value.
    >
    > Is there a quick and dirty way to do this without having to do this>


    If $custno, $name, $paid, $ordered, $shipped are guaranteed to be
    non-empty, you could split on /,+/ to skip empty items, and if you need
    only the first, use a scalar instead of an array.

    hp


    --
    _ | Peter J. Holzer | Fluch der elektronischen Textverarbeitung:
    |_|_) | | Man feilt solange an seinen Text um, bis
    | | | | die Satzbestandteile des Satzes nicht mehr
    __/ | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
     
    Peter J. Holzer, Sep 8, 2013
    #13
  14. ccc31807

    ccc31807 Guest

    On Sunday, September 8, 2013 7:24:28 AM UTC-4, Peter J. Holzer wrote:
    > If $custno, $name, $paid, $ordered, $shipped are guaranteed to be
    > non-empty, you could split on /,+/ to skip empty items, and if you need
    > only the first, use a scalar instead of an array.


    Unfortunately, only the first two items are guaranteed. This is the real world, and sometimes items are shipped without being ordered, sometimes items are ordered without being shipped, and strange as it may be, items are paid for without being ordered or shipped. The first thing I had to do was to validate the file, and each of the shipped, paid, and ordered columns had an error rate of around 20 percent. In this instance, it's troubling but not a big deal, as the real critical piece of information is the value of the first item in the @items array.

    CC.
     
    ccc31807, Sep 8, 2013
    #14
  15. Willem

    Willem Guest

    ccc31807 wrote:
    ) I have a csv file with 10K items. The header looks something like this:
    ) CustNo,Name,FirstPaidItem,FirstOrderedItem,FirstShippedItem,Items
    )
    ) The first five fields are single valued. The last field (Items) has many
    ) items. Lines may look like this:
    )
    ) 1,Joe,a,a,a,a,b,c,d,e,f,g,h,i
    ) 2,Jane,a,a,b,a,,,,,,a,b,c,d,e
    ) 3,Jim,b,a,a,a,b,c,d,e,f,g,h,i
    ) 4,Jill,b,b,b,b,,,,,c,,b,,e,f,g,h
    )
    ) I parse the file like this, putting the first five values in scalars and
    ) the remainder in the @items array:
    ) foreach line
    ) my ($custno,$name,$paid,$ordered,$shipped,@items) = split on the commas
    )
    ) The objective is to take the first item in the @items array and match it
    ) to one (or more) of $paid, $ordered, or $shipped. I just want to grab the
    ) first item in the @items array that has a value, IOW, skip all the blank
    ) items (as in line 2 above) that precede the first actual value.
    )
    ) Is there a quick and dirty way to do this without having to do this>
    )
    ) my $item = '';
    ) foreach my $ele (@items)
    ) {
    ) $item = $ele if $ele =~ /\w/;
    ) last;
    ) }

    I haven't seen this one yet (perhaps I missed it):

    my ($custno,$name,$paid,$ordered,$shipped,@items) = split /,+/;

    But I believe it's the simplest way to get what you want.

    To explain: This splits on one-or-more-commas. (Split takes a regex.)
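
A sketch of how this behaves on two made-up records. It works when all five leading fields are filled in, which is the condition Peter J. Holzer attached to the same idea; when one of them is empty, every later field shifts left.

```perl
use strict;
use warnings;

# All five leading fields present: the blank items are skipped as intended.
my @ok = split /,+/, '5,Jo,a,a,b,,,,c,d';
# ('5', 'Jo', 'a', 'a', 'b', 'c', 'd'), so the first item is $ok[5], 'c'

# Made-up record with an empty FirstPaidItem: later fields shift left by one.
my @shifted = split /,+/, '6,Al,,a,b,,,c,d';
# ('6', 'Al', 'a', 'b', 'c', 'd'), so $shifted[5] is 'd', not the first item 'c'

print "ok:      $ok[5]\n";
print "shifted: $shifted[5]\n";
```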


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
     
    Willem, Sep 8, 2013
    #15
