Counting column delimiters per row in a text file

Discussion in 'Perl Misc' started by hbar, Feb 8, 2005.

  1. hbar

    hbar Guest

    Ok, so I've got this text file which is supposed to have 6 columns per
    row. However, I know that some of the rows don't have the right amount
    of columns. Counting the number of delimiters in each row sounds like
    the most straightforward approach, but maybe there's a slick function I
    don't know about?

    At any rate, my goal is to identify the row/line numbers of those rows
    that don't have 6 columns.

    Can anyone either help me, or at least point me in the right
    directions?

    Thanks!!
    hbar, Feb 8, 2005
    #1

  2. hbar

    Guest

    A. Sinan Unur wrote:
    > "hbar" <> wrote in news:1107885124.542726.326620
    > @l41g2000cwc.googlegroups.com:
    >
    > > Ok, so I've got this text file which is supposed to have 6 columns
    > > per row.
    >
    > ...
    >
    > > Can anyone either help me, or at least point me in the right
    > > directions?
    >
    > perldoc -q delimited
    >
    > Sinan


    Wow, that was easy. Thanks!!!
    hbar, Feb 8, 2005
    #2

  3. hbar

    Guest

    Well, that seems to have got me about 98% of the way there. Thanks.
    However, I still have a problem.

    #!/user/bin/perl

    use Text::parseWords;

    my @fields;
    my @data;

    my $fldcnt = 15;
    my $rownum = 0;
    my $errCount = 0;

    $file="test.txt";

    open file or die "Cannot open $file for read:$!";

    while(<file>)
    {
        $rownum++;

        chomp;
        @fields = quotewords("~", 0, $_);

        if ($#fields != $fldcnt) {
            $errCount++;
            print "row $rownum is missing fields.\n";
            print "\n";
        }
        else {
            print "row $rownum is ok\n";
        }
    }

    print "There were $errCount total errors.\n";

    close file;

    __END__

    Well, I thought I had the problem licked....but it seems that if a
    tick, quote, or slash (I'm sure there are others, but that's all my
    working test turned up) appears in a field, the script returns that row
    as one that does not have the correct # of delimiters. So I guess I
    now have a new question. How can I make it ignore what is between the
    delimiters?
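
    One way to ignore the field contents entirely is to count the
    delimiter characters themselves. The following is only a sketch: it
    assumes the 6 columns mentioned in the first post (so 5 delimiters),
    that "~" never occurs inside a field, and the sample rows and the
    bad_rows name are made up for illustration.

    ```perl
    use strict;
    use warnings;

    use constant EXPECTED_COLUMNS => 6;   # assumption, from the first post
    my $delimiter_count = EXPECTED_COLUMNS - 1;

    # Hypothetical sample rows: the second is short, the third contains
    # a tick, which tr/// treats like any other character.
    my @rows = (
        "a~b~c~d~e~f",
        "a~b~c",
        "a~it's~c~d~e~f",
    );

    # Return the 1-based numbers of rows whose delimiter count is wrong.
    sub bad_rows {
        my @lines = @_;
        my @bad;
        for my $i (0 .. $#lines) {
            # tr/~// with an empty replacement list only counts matches.
            my $count = ($lines[$i] =~ tr/~//);
            push @bad, $i + 1 if $count != $delimiter_count;
        }
        return @bad;
    }

    print "Bad rows: @{[ bad_rows(@rows) ]}\n";
    ```

    Because tr/// never interprets quotes or backslashes, a stray tick
    or slash inside a field cannot change the count.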

    Thanks again.
    hbar, Feb 8, 2005
    #3
  4. <> wrote:
    > A. Sinan Unur wrote:
    >> "hbar" <> wrote in news:1107885124.542726.326620
    >> @l41g2000cwc.googlegroups.com:
    >>
    >> > Can anyone either help me, or at least point me in the right
    >> > directions?
    >>
    >> perldoc -q delimited
    >
    > Wow, that was easy. Thanks!!!


    That is exactly the point of compiling Frequently Asked Questions,
    so that you can get the answer easily.

    It all falls apart when folks do not have the courtesy to check
    the FAQ before posting though.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Feb 8, 2005
    #4
  5. wrote in news:1107896566.817073.315850
    @o13g2000cwo.googlegroups.com:

    > Well, that seems to have got me about 98% of the way there. Thanks.
    > However, I still have a problem.
    >
    > #!/user/bin/perl
    >
    > use Text::parseWords;


    I guess I should have been more obvious. I would have used
    Text::CSV_XS.

    Treat the word 'comma' in 'comma-separated' as a placeholder for any
    delimiter: pipe, semicolon, dash, and so on.

    When you construct the object, specify what you want to be interpreted
    as quote and escape characters etc. Then parse (see the parse method)
    your input line by line, look at the number of fields. Record the line
    number if the number of fields does not match what you were expecting.

    This can't be more than 20 lines or so including use strict and use
    warnings.
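
    A rough sketch of those steps might look like the following. It is
    not code from this thread: the sample data, the expected count of 6
    (taken from the first post), and the choice to switch off quote and
    escape handling are all assumptions, and it needs Text::CSV_XS
    installed from CPAN.

    ```perl
    use strict;
    use warnings;
    use Text::CSV_XS;

    use constant EXPECTED_FIELDS => 6;   # assumption, from the first post

    # '~' as separator; quote and escape handling are switched off so
    # that ticks, quotes and backslashes count as ordinary data.
    my $csv = Text::CSV_XS->new({
        sep_char    => '~',
        quote_char  => undef,
        escape_char => undef,
        binary      => 1,
    }) or die "Cannot construct Text::CSV_XS: " . Text::CSV_XS->error_diag;

    # Hypothetical sample data, read through an in-memory filehandle.
    my $data = join "\n", "a~b~c~d~e~f", "a~b~c", "a~it's~\"q\"~d\\~e~f", "";
    open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

    my @bad_lines;
    while ( my $line = <$fh> ) {
        chomp $line;
        my $ok     = $csv->parse($line);
        my @fields = $ok ? $csv->fields : ();
        # Record the line number when parsing fails or the count is off.
        push @bad_lines, $. unless $ok && @fields == EXPECTED_FIELDS;
    }
    close $fh;

    print "Lines with the wrong field count: @bad_lines\n";
    ```

    If the real data does use quoting, the quote_char and escape_char
    settings would need to match it instead of being undef.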

    Post a sample of your data along with the code if you run into problems.
    Put the data in the __DATA__ section of your script.

    As for your script:

    use strict;
    use warnings;

    are missing.

    > my @fields;
    > my @data;


    No need to declare these variables in this scope.

    > my $fldcnt = 15;


    use constant EXPECTED_FIELDS => 15;

    > my $rownum = 0;


    Please see perldoc perlvar for the $. variable.
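
    As a small illustration of $., here is a sketch that reads three
    made-up lines from an in-memory filehandle (an assumption made to
    keep the example self-contained) and labels each with its line
    number, without a hand-maintained counter.

    ```perl
    use strict;
    use warnings;

    # Hypothetical three-line input held in memory instead of a file.
    my $data = "first\nsecond\nthird\n";
    open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

    my @numbered;
    while ( my $line = <$fh> ) {
        chomp $line;
        # $. is the current line number of the last filehandle read.
        push @numbered, sprintf "%d: %s", $., $line;
    }
    close $fh;

    print "$_\n" for @numbered;
    ```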

    > my $errCount = 0;
    >
    > $file="test.txt";


    my $file = shift;
    $file ||= 'test.txt';

    > open file or die "Cannot open $file for read:$!";


    Are you trying to take advantage of:

    If EXPR is omitted, the scalar variable of the same name as
    the FILEHANDLE contains the filename. (Note that lexical
    variables--those declared with "my"--will not work for this
    purpose; so if you're using "my", specify EXPR in your call
    to open.)

    I do have a feeling that this might be error prone, and I personally
    prefer:

    open my $fh, '<', $file or die "Cannot open $file: $!";
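
    To show that form in a complete script, here is a sketch that first
    writes a throwaway file with File::Temp (purely so the example can
    run on its own; the file contents are made up) and then reads it
    back through a lexical filehandle.

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Create a small temporary file so the example is self-contained.
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    print {$tmp_fh} "a~b~c\n", "d~e\n";
    close $tmp_fh or die "Cannot close $path: $!";

    # Three-argument open with a lexical filehandle and an explicit mode.
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close $fh or die "Cannot close $path: $!";

    print "Read ", scalar(@lines), " lines\n";
    ```

    The explicit '<' mode means a filename that happens to start with
    ">" or "|" cannot change what open does.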

    > while(<file>)
    > {
    > $rownum++;


    You want to use $. here.

    >
    > chomp;
    > @fields = quotewords("~", 0, $_);
    >
    > if ($#fields != $fldcnt) {


    @fields in scalar context would return the number of elements in
    @fields. That would be a more natural comparison. This also means the
    constant I defined above should have been 16 rather than 15. I am going
    to leave it that way, however. That is for you to fix.
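
    The off-by-one is easy to see in isolation. This sketch uses a
    made-up row of 16 fields and plain split (an assumption, just to get
    an array to inspect) and compares $#fields with the element count.

    ```perl
    use strict;
    use warnings;

    use constant EXPECTED_FIELDS => 16;   # 16 fields, so the last index is 15

    # Hypothetical row with 16 '~'-separated fields; the -1 limit keeps
    # trailing empty fields that split would otherwise drop.
    my $row    = join '~', 1 .. 16;
    my @fields = split /~/, $row, -1;

    my $last_index = $#fields;          # index of the last element
    my $count      = scalar @fields;    # number of elements

    print "last index: $last_index, count: $count\n";
    print "row is ok\n" if @fields == EXPECTED_FIELDS;
    ```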

    Sinan.
    A. Sinan Unur, Feb 8, 2005
    #5
  6. hbar

    Tore Aursand Guest

    wrote:
    > #!/user/bin/perl


    Is the directory really called 'user'? I've never seen that one before.
    You're still missing these two, though;

    use strict;
    use warnings;

    > use Text::parseWords;


    Don't use Text::parseWords for this task. Instead, have a look at the
    excellent Text::CSV_XS module.

    > my @fields;
    > my @data;


    Don't declare your variables before you actually use them!

    > my $fldcnt = 15;


    Constants should be declared as constants;

    use constant FIELDS_EXPECTED => 15;

    > my $rownum = 0;


    No need to; Perl keeps track of that for you in the $. variable.

    > $file="test.txt";


    No need to use double quotes here;

    my $file = 'test.txt';

    > open file or die "Cannot open $file for read:$!";


    Bad way of opening files, and I don't really think you want to do it.
    Please read 'perldoc -f open' for more information;

    open my $fh, '<', $file or die "$!\n";

    > while(<file>)
    > {
    >     $rownum++;
    >
    >     chomp;
    >     @fields = quotewords("~", 0, $_);
    >
    >     if ($#fields != $fldcnt) {
    >         $errCount++;
    >         print "row $rownum is missing fields.\n";
    >         print "\n";
    >     }
    >     else {
    >         print "row $rownum is ok\n";
    >     }
    > }


    By using Text::CSV_XS, something like this should work;

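    use Text::CSV_XS;

    # The data is '~'-delimited, so override the default ',' separator.
    my $CSV = Text::CSV_XS->new({ sep_char => '~' });
    my $errCount = 0;

    while ( <$fh> ) {
        chomp;
        # parse() returns a false value when the line cannot be parsed.
        my $status = $CSV->parse( $_ );
        my @fields = $CSV->fields();

        unless ( $status && @fields == FIELDS_EXPECTED ) {
            $errCount++;
            print "Row $. is missing fields!\n";
        }
        else {
            print "Row $. is OK!\n";
        }
    }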


    --
    Tore Aursand <>
    "There are three kinds of lies: lies, damn lies, and statistics."
    (Benjamin Disraeli)
    Tore Aursand, Feb 8, 2005
    #6
  7. hbar

    Guest

    Wow. This started as an exercise to get comfortable in Perl. I was
    mostly learning from a few examples I found lying around here. Clearly
    not my best source of info. Thanks for all your help, even those who
    took the time out of their busy day to tell me to RTFFAQ. I'll digest
    this, and hopefully will not have to bother you any further.

    Oh, one more question. I checked the FAQ, but I still don't know where
    to find full docs on the packages like CSV_XS, and all the methods,
    etc. Where can I find that stuff?

    And /user/bin/perl is definitely a typo. Odd that it still ran.

    Thanks again.
    hbar, Feb 8, 2005
    #7
  8. wrote in news:1107902373.447820.234170
    @z14g2000cwz.googlegroups.com:

    > Wow. This started as an exercise to get comfortable in Perl. I was
    > mostly learning from a few examples I found lying around here. Clearly
    > not my best source of info. Thanks for all your help, even those who
    > took the time out of their busy day to tell me to RTFFAQ. I'll digest
    > this, and hopefully will not have to bother you any further.
    >
    > Oh, one more question. I checked the FAQ, but I still don't know where
    > to find full docs on the packages like CSV_XS, and all the methods,
    > etc. Where can I find that stuff?


    perldoc perldoc
    perldoc perltoc
    perldoc Text::CSV_XS

    In case you are using ActiveState Perl on Windows, the documentation is
    also available in HTML format in the Start menu.

    Finally, you are going to need to start quoting some context in your
    posts. See http://groups.google.com/googlegroups/posting_style.html

    Sinan.
    A. Sinan Unur, Feb 8, 2005
    #8
  9. hbar

    Eric Bohlman Guest

    Dave Weaver <> wrote in
    news:420b162f$0$4090$:

    > On Tue, 08 Feb 2005 23:20:15 +0100, Tore Aursand <>
    > wrote:
    >>
    >> > my @fields;
    >> > my @data;

    >>
    >> Don't declare your variables before you actually use them!
    >>

    >
    > You should declare them *after* you use them??
    >
    > ;-)


    "Before" and "after" do not an exhaustive partition make; we obsess over
    the past and the future to the point of forgetting about the present :)
    Eric Bohlman, Feb 10, 2005
    #9
