split problem

Discussion in 'Perl Misc' started by gabkin, Sep 20, 2004.

  1. gabkin

    gabkin Guest

    I am having a problem with the split function.
    Here is the sub that it is used in, it should illustrate what I'm
    doing, criticism is welcomed...


    <PERL SUB>
    sub parseLine()
    {
    #this parses a line which will be in a similar format to this
    #"0010230" "Book of the Dead" "Yendor books"
    #(tab delimited, escaped by quotes)
    #it will take as an argument the column headers and the string to parse
    #it will return a hash,using the columnheader as the key
    #and the column data as the element
    my $ParseMe = $_[0];
    my @ColumnHeaders = $_[1..@_];
    my %returnData;
    chop($ParseMe);
    my @Columns = split(/\t/,$ParseMe);
    #my $size=@Columns;print("Size = ",$size,"\n");
    for(my $i=0;$i<@Columns;$i++) {
    $Columns[$i] =~ s/\"//g; # remove extraneous quotes
    #print($ColumnHeaders[$i],"\t",$Columns[$i],"\n");
    $returnData{$ColumnHeaders[$i]} = $Columns[$i];
    }
    return %returnData;
    }
    </PERL SUB>
    (Sorry about the awful two-space indentation, but google seems to strip
    out tabs)

    A problem has arisen: in one example, the last four columns are
    blank (i.e. null). They're there, there's just nothing in them. For these
    last four, the split function seems to discard them. I checked this
    with the aid of the commented-out lines.

    Is there a way to force split to not lose the blank columns?

    Or would I have to 're-invent' the split algorithm so as to keep them?
    Any help would be greatly appreciated...
     
    gabkin, Sep 20, 2004
    #1

  2. Paul Lalli

    Paul Lalli Guest

    "gabkin" <> wrote in message
    news:...
    > I am having a problem with the split function.


    Did you consider reading the documentation for the function you're
    having problems with?

    <snipped a bunch of poorly formatted code>

    > A problem has arisen in that in one example, the last four columns are
    > blank (i.e. null) they're there, theres just nothing in them. For these
    > last four, the split function seems to discard them. I checked this
    > with the aid of the commented out lines.
    >
    > Is there a way to force split to not lose the blank columns?


    perldoc -f split
    4th paragraph.

    Paul Lalli
     
    Paul Lalli, Sep 20, 2004
    #2

  3. thundergnat

    thundergnat Guest

    gabkin wrote:
    > I am having a problem with the split function.
    > Here is the sub that it is used in, it should illustrate what I'm
    > doing, criticism is welcomed...
    >

    [snip]
    > A problem has arisen in that in one example, the last four columns are
    > blank (i.e. null) they're there, theres just nothing in them. For these
    > last four, the split function seems to discard them. I checked this
    > with the aid of the commented out lines.
    >
    > Is there a way to force split to not lose the blank columns?
    >
    > Or would I have to 're-invent' the split algorithm so as to keep them?
    > Any help would be greatly appreciated...
    >


    Did you read the docs for split? (Really. Not being sarcastic.)

    Seems like you are looking for the Limit option on split.

    Since you know how many columns you are looking for, specify that.
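
A minimal sketch of the LIMIT suggestion, assuming made-up sample data; the `split_columns` helper is hypothetical and not part of the original sub. With a positive LIMIT, split keeps trailing empty fields, but it also stops splitting once it has produced that many fields, so a limit that is too small leaves the remainder in the last field.

```perl
use strict;
use warnings;

# Hypothetical helper: split a tab-delimited line into at most $limit fields.
sub split_columns {
    my ( $line, $limit ) = @_;
    return split /\t/, $line, $limit;
}

# Made-up sample: three columns, the last two empty.
my @fields = split_columns( "0010230\t\t", 3 );
print scalar(@fields), "\n";    # 3

# A limit smaller than the real column count leaves the rest unsplit.
my @short = split_columns( "a\tb\tc", 2 );
print $short[1] eq "b\tc" ? "rest kept together\n" : "split fully\n";
```

If the number of columns in the data can exceed the number of headers, a negative limit may be the safer choice.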
     
    thundergnat, Sep 20, 2004
    #3
  4. gabkin wrote:
    > I am having a problem with the split function.
    > Here is the sub that it is used in, it should illustrate what I'm
    > doing, criticism is welcomed...

    ^^^^^^^^^^^^^^^^^^^^^
    Ok, you asked for it. :)


    > <PERL SUB>
    > sub parseLine()
    > {
    > #this parses a line which will be in a similar format to this
    > #"0010230" "Book of the Dead" "Yendor books"
    > #(tab delimited, escaped by quotes)
    > #it will take as an argument the column headers and the string to parse
    > #it will return a hash,using the columnheader as the key
    > #and the column data as the element
    > my $ParseMe = $_[0];
    > my @ColumnHeaders = $_[1..@_];

    ^^^^^^^^^
    That is wrong. The '$' at the beginning denotes a scalar value so you are
    assigning a single value from the @_ array to the @ColumnHeaders array. And
    even if you had used a proper array slice, you are accessing an extra element
    at the end of the array that does not exist.

    $ perl -le'@x="a".."f"; print @x . " @x"; @y = @x[1..@x]; print @y . " @y"'
    6 a b c d e f
    6 b c d e f

    The correct syntax is:

    my @ColumnHeaders = @_[ 1 .. $#_ ];

    However the usual way to do that is:

    my ( $ParseMe, @ColumnHeaders ) = @_;

    Or if you really want to do it on two lines:

    my $ParseMe = shift;
    my @ColumnHeaders = @_;
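
As a rough illustration of the two forms above, here is a small sketch; the `unpack_args` sub and the sample arguments are invented for the example. It shows the array slice with `$#` as the upper bound and the list assignment from `@_`.

```perl
use strict;
use warnings;

# Hypothetical sub: first argument is the line, the rest are column headers.
sub unpack_args {
    my ( $line, @headers ) = @_;
    return ( $line, scalar @headers );
}

my @args = ( "some line", "id", "title", "publisher" );

# @args[ 1 .. $#args ] is a slice of every element after the first.
my @slice = @args[ 1 .. $#args ];
print "@slice\n";    # id title publisher

my ( $line, $count ) = unpack_args(@args);
print "$count\n";    # 3
```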


    > my %returnData;
    > chop($ParseMe);


    chop() isn't really used very much anymore. You should use chomp() unless you
    have a valid reason not to.


    > my @Columns = split(/\t/,$ParseMe);


    As others have pointed out, use the third argument to split().

    my @Columns = split /\t/,$ParseMe, -1;
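
A short sketch of the difference, using a made-up line with five tab-separated columns of which the last four are empty. When LIMIT is omitted, split strips trailing empty fields; a negative LIMIT keeps them.

```perl
use strict;
use warnings;

# Made-up sample: five tab-separated columns, the last four empty.
my $line = "0010230\t\t\t\t";

my @stripped = split /\t/, $line;        # LIMIT omitted
my @kept     = split /\t/, $line, -1;    # negative LIMIT

print scalar(@stripped), "\n";    # 1
print scalar(@kept), "\n";        # 5
```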


    > #my $size=@Columns;print("Size = ",$size,"\n");
    > for(my $i=0;$i<@Columns;$i++) {


    That is usually written as:

    for my $i ( 0 .. $#Columns ) {


    > $Columns[$i] =~ s/\"//g; # remove extraneous quotes


    Double quote characters don't have to be escaped in regular expressions.


    > #print($ColumnHeaders[$i],"\t",$Columns[$i],"\n");
    > $returnData{$ColumnHeaders[$i]} = $Columns[$i];
    > }
    > return %returnData;
    > }
    > </PERL SUB>




    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Sep 21, 2004
    #4
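
Pulling together the suggestions from the review in post #4, the sub might end up looking something like the sketch below. The name `parse_line`, the sample data, and the guard that skips columns with no matching header are assumptions added for illustration, not something the posters wrote. The sketch also drops the empty prototype `()`, which declares a sub that takes no arguments.

```perl
use strict;
use warnings;

# Sketch of the reviewed sub; the name parse_line is illustrative.
# Takes the line first, then the column headers, and returns a hash
# mapping each header to its column value.
sub parse_line {
    my ( $line, @headers ) = @_;
    chomp $line;

    # -1 keeps trailing empty columns.
    my @columns = split /\t/, $line, -1;

    my %data;
    for my $i ( 0 .. $#columns ) {
        last if $i > $#headers;    # assumption: ignore columns with no header
        ( my $value = $columns[$i] ) =~ s/"//g;    # remove quotes
        $data{ $headers[$i] } = $value;
    }
    return %data;
}

my %row = parse_line( qq{"0010230"\t"Book of the Dead"\t\t\n},
    qw(id title publisher notes) );
print join( ",", map { "$_=$row{$_}" } sort keys %row ), "\n";
# id=0010230,notes=,publisher=,title=Book of the Dead
```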
  5. gabkin

    Gabkin Guest

    John W. Krahn wrote:
    > gabkin wrote:
    >
    >> I am having a problem with the split function.
    >> Here is the sub that it is used in, it should illustrate what I'm
    >> doing, criticism is welcomed...

    >
    > ^^^^^^^^^^^^^^^^^^^^^
    > Ok, you asked for it. :)


    I welcome criticism because I know I am new to perl and am probably
    carrying over mistakes from other languages (Java,VB,COBOL) into my perl
    writing.

    >> my @ColumnHeaders = $_[1..@_];

    >
    > ^^^^^^^^^
    > That is wrong. The '$' at the beginning denotes a scalar value so you
    > are assigning a single value from the @_ array to the @ColumnHeaders
    > array. And even if you had used a proper array slice, you are accessing
    > an extra element at the end of the array that does not exist.
    >
    > $ perl -le'@x="a".."f"; print @x . " @x"; @y = @x[1..@x]; print @y . " @y"'
    > 6 a b c d e f
    > 6 b c d e f
    >
    > The correct syntax is:
    >
    > my @ColumnHeaders = @_[ 1 .. $#_ ];
    >
    > However the usual way to do that is:
    >
    > my ( $ParseMe, @ColumnHeaders ) = @_;
    >
    > Or if you really want to do it on two lines:
    >
    > my $ParseMe = shift;
    > my @ColumnHeaders = @_;


    Uh, Thanks. I'm still trying to understand all of this but I have
    implemented the much cleaner single-line assignment. I have actually
    seen this before and thus should know about it. Thanks!

    I still have a lot of trouble with all of the 'magic' variables (like
    $#) and 'shift', it may be because I have never used C...

    >> my %returnData;
    >> chop($ParseMe);

    >
    > chop() isn't really used very much anymore. You should use chomp()
    > unless you have a valid reason not to.


    It's more a case of I started using chop from the start and it works, so
    I haven't changed it, I will try to use 'chomp' over 'chop' though.

    >
    >> my @Columns = split(/\t/,$ParseMe);

    >
    > As others have pointed out, use the third argument to split().


    Yes, I have found that out.
    I now use this instead..
    my @Columns = split(/\t/,$ParseMe,@ColumnHeaders);

    >> for(my $i=0;$i<@Columns;$i++) {

    >
    > That is usually written as:
    >
    > for my $i ( 0 .. $#Columns ) {


    I have never seen that before, it looks quite handy.
    I am not too familiar with the $# usage yet, so I will go look it up now.

    >> $Columns[$i] =~ s/\"//g; # remove extraneous quotes

    >
    > Double quote characters don't have to be escaped in regular expressions.


    I tend to err on the side of caution with regexes, due to their
    inconsistent handling between perl,sed,grep and vi (and probably others
    too).
    Thanks though, duly noted.

    > John


    Thanks for these little tips!

    I would love to get my entire program inspected and criticized like
    this, but I feel I might be amiss to post the entire thing (1252 lines
    in the main program, and 113 lines in the 'data verification' program),
    because I know of at least one major algorithm that I did wrong, I used
    a hash where I should have used a string.
     
    Gabkin, Sep 21, 2004
    #5
  6. gabkin

    Gabkin Guest

    thundergnat wrote:

    > gabkin wrote:
    >> Is there a way to force split to not lose the blank columns?
    >>
    >> Or would I have to 're-invent' the split algorithm so as to keep them?
    >> Any help would be greatly appreciated...
    >>

    >
    > Did you read the docs for split? (Really. Not being sarcastic.)
    >
    > Seems like you are looking for the Limit option on split.
    >
    > Since you know how many cloumns you are looking for, specify that.


    You are quite right, I did not read the help for split before posting this!

    I apologize, since it has answered my question perfectly...
     
    Gabkin, Sep 21, 2004
    #6
  7. Gabkin wrote:
    > John W. Krahn wrote:
    >>
    >> The correct syntax is:
    >>
    >> my @ColumnHeaders = @_[ 1 .. $#_ ];
    >>
    >> However the usual way to do that is:
    >>
    >> my ( $ParseMe, @ColumnHeaders ) = @_;
    >>
    >> Or if you really want to do it on two lines:
    >>
    >> my $ParseMe = shift;
    >> my @ColumnHeaders = @_;

    >
    > Uh, Thanks. I'm still trying to understand all of this but I have
    > implemented the much cleaner single-line assignment. I have actually
    > seen this before and thus should know about it. Thanks!
    >
    > I still have a lot of trouble with all of the 'magic' variables (like
    > $#) and 'shift', it may be because I have never used C...


    Don't confuse the magic variable $# (which is deprecated)

    perldoc perlvar

    with $#array, which is the index of the last element of the array @array

    perldoc perldata
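
For illustration, a small sketch of the last-index syntax; the array contents and the `last_index` helper are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: $#_ is the last index of the argument array @_.
sub last_index {
    return $#_;
}

my @columns = ( "id", "title", "publisher" );

print $#columns, "\n";           # 2, index of the last element
print scalar(@columns), "\n";    # 3, number of elements

# An empty array has a last index of -1.
my @empty;
print $#empty, "\n";             # -1
```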


    >>> my %returnData;
    >>> chop($ParseMe);

    >>
    >> chop() isn't really used very much anymore. You should use chomp()
    >> unless you have a valid reason not to.

    >
    > It's more a case of I started using chop from the start and it works, so
    > I haven't changed it, I will try to use 'chomp' over 'chop' though.


    chop() will always remove and return the last character in a string while
    chomp() will remove the value of $/ if it is at the end of the string.

    perldoc -f chop
    perldoc -f chomp
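
A brief sketch of that difference, assuming the default value of `$/` (a newline); the sample strings and variable names are invented for the example.

```perl
use strict;
use warnings;

my $with_newline    = "Yendor books\n";
my $without_newline = "Yendor books";

# chomp removes a trailing $/ (a newline by default) only if one is present.
chomp( my $chomped_nl    = $with_newline );
chomp( my $chomped_plain = $without_newline );

# chop removes the last character no matter what it is.
chop( my $chopped_nl    = $with_newline );
chop( my $chopped_plain = $without_newline );

print "$chomped_nl|$chomped_plain|$chopped_nl|$chopped_plain\n";
# Yendor books|Yendor books|Yendor books|Yendor book
```

This is why chop() on a line that happens to lack a trailing newline silently eats the last character of the data.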


    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Sep 21, 2004
    #7
