Counting column delimiters per row in a text file


hbar

OK, so I've got this text file which is supposed to have 6 columns per
row. However, I know that some of the rows don't have the right number
of columns. Counting the number of delimiters in each row sounds like
the most straightforward approach, but maybe there's a slick function I
don't know about?

At any rate, my goal is to identify the row/line numbers of those rows
that don't have 6 columns.

Can anyone either help me, or at least point me in the right
direction?

Thanks!!
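
A minimal sketch of the delimiter-counting approach described above. It assumes the "~" delimiter that appears later in the thread, and that the delimiter never occurs inside a field. The sub name and the sample data are made up for illustration, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Return the line numbers whose delimiter count does not match
# the expected number of columns (hypothetical helper).
sub rows_with_wrong_delimiter_count {
    my ($fh, $columns) = @_;
    my @bad;
    while (my $line = <$fh>) {
        chomp $line;
        # tr/// with an empty replacement list only counts matches.
        my $delims = ($line =~ tr/~//);
        # N columns are separated by N - 1 delimiters.
        push @bad, $. if $delims != $columns - 1;
    }
    return @bad;
}

# Illustrative data: row 2 is short, row 3 is long.
my $sample = "a~b~c~d~e~f\na~b~c\na~b~c~d~e~f~g\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @bad = rows_with_wrong_delimiter_count($fh, 6);
close $fh;

print "Rows without 6 columns: @bad\n";
```

Counting with tr is cheap, but it only gives the right answer when the delimiter cannot legally appear inside a field.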
 

hbar21

Well, that seems to have got me about 98% of the way there. Thanks.
However, I still have a problem.

#!/user/bin/perl

use Text::parseWords;

my @fields;
my @data;

my $fldcnt = 15;
my $rownum = 0;
my $errCount = 0;

$file="test.txt";

open file or die "Cannot open $file for read:$!";

while(<file>)
{
    $rownum++;

    chomp;
    @fields = quotewords("~", 0, $_);

    if ($#fields != $fldcnt) {
        $errCount++;
        print "row $rownum is missing fields.\n";
        print "\n";
    }
    else {
        print "row $rownum is ok\n";
    }
}

print "There were $errCount total errors.\n";

close file;

__END__

Well, I thought I had the problem licked, but it seems that if a
tick, quote, or slash (I'm sure there are others, but that's all my
testing turned up) appears in a field, the script reports that row
as one that does not have the correct number of delimiters. So I guess I
now have a new question: how can I make it ignore what is between the
delimiters?

Thanks again.
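
One way to ignore what sits between the delimiters is to skip quote-aware parsing and use Perl's built-in split, which treats ticks, quotes and backslashes as ordinary characters. This is only a sketch: it assumes "~" never appears inside a field, and count_fields is a made-up helper name.

```perl
use strict;
use warnings;

# Count fields by splitting on the delimiter only (hypothetical helper).
sub count_fields {
    my ($line) = @_;
    # The -1 limit keeps trailing empty fields, so "a~b~~" has 4 fields.
    my @fields = split /~/, $line, -1;
    return scalar @fields;
}

print count_fields(q{it's~a "quoted" value~c}), "\n";   # prints 3
print count_fields('a~b~~'), "\n";                      # prints 4
```

Without the -1 limit, split silently drops trailing empty fields, which would make rows that end in empty columns look short.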
 

Tad McClellan

> Wow, that was easy. Thanks!!!


That is exactly the point of compiling Frequently Asked Questions,
so that you can get the answer easily.

It all falls apart when folks do not have the courtesy to check
the FAQ before posting though.
 

A. Sinan Unur

(e-mail address removed) wrote in @o13g2000cwo.googlegroups.com:
> Well, that seems to have got me about 98% of the way there. Thanks.
> However, I still have a problem.
>
> #!/user/bin/perl
>
> use Text::parseWords;

I guess I should have been more explicit: I would have used Text::CSV_XS.

Consider the word 'comma' in 'comma separated' to be a placeholder for
things like pipe, semicolon, dash, etc.

When you construct the object, specify what you want to be interpreted
as quote and escape characters etc. Then parse (see the parse method)
your input line by line, look at the number of fields. Record the line
number if the number of fields does not match what you were expecting.

This can't be more than 20 lines or so including use strict and use
warnings.

Post a sample of your data along with the code if you run into problems.
Put the data in the __DATA__ section of your script.
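
The approach described above might look roughly like the sketch below. It assumes Text::CSV_XS is installed from CPAN, that "~" is the separator, and that quotes and escapes should be switched off so they count as ordinary field content. The sub name, the sample rows and the expected count of 6 are illustrative, and an in-memory filehandle stands in for the __DATA__ section.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Return the line numbers of rows that do not have the expected
# number of fields (hypothetical helper).
sub find_bad_rows {
    my ($fh, $expected) = @_;

    my $csv = Text::CSV_XS->new({
        sep_char    => '~',     # field separator used in this thread
        quote_char  => undef,   # treat quotes as ordinary characters
        escape_char => undef,   # no escape character either
        binary      => 1,       # accept any byte inside a field
    }) or die "Cannot create Text::CSV_XS parser";

    my @bad;
    while (my $line = <$fh>) {
        chomp $line;
        # A row that fails to parse is recorded as bad as well.
        unless ($csv->parse($line)) {
            push @bad, $.;
            next;
        }
        my @fields = $csv->fields;
        push @bad, $. if @fields != $expected;
    }
    return @bad;
}

# Illustrative data: the third row has too few fields.
my $sample = join "\n",
    'a~b~c~d~e~f',
    q{it's~a "quote"~c~d~e~f},
    'a~b~c',
    '';
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @bad = find_bad_rows($fh, 6);
close $fh;

print "Bad rows: @bad\n";
```

If the data does use quoting to protect delimiters inside fields, quote_char and escape_char would be set to the real characters instead of undef.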

As for your script:

use strict;
use warnings;

are missing.

> my @fields;
> my @data;

No need to declare these variables in this scope.

> my $fldcnt = 15;

use constant EXPECTED_FIELDS => 15;

> my $rownum = 0;

Please see perldoc perlvar for the $. variable.

> my $errCount = 0;
>
> $file="test.txt";

my $file = shift;
$file ||= 'test.txt';

> open file or die "Cannot open $file for read:$!";

Are you trying to take advantage of:

If EXPR is omitted, the scalar variable of the same name as
the FILEHANDLE contains the filename. (Note that lexical
variables--those declared with "my"--will not work for this
purpose; so if you're using "my", specify EXPR in your call
to open.)

I do have a feeling that this might be error prone, and I personally
prefer:

open my $fh, '<', $file or die "Cannot open $file: $!";
> while(<file>)
> {
> $rownum++;

You want to use $. here.
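
A small sketch of the two suggestions so far: a three-argument open with a lexical filehandle, and $. in place of a hand-maintained row counter. The sample text and variable names are illustrative, and the filehandle reads from a string so that the snippet is self-contained.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";

# Three-argument open with a lexical filehandle. A real script would
# pass a filename instead of a reference to a string.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @seen;
while (my $line = <$fh>) {
    chomp $line;
    # $. holds the line number of the last filehandle that was read.
    push @seen, "$.: $line";
}
close $fh;

print "$_\n" for @seen;
```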
> chomp;
> @fields = quotewords("~", 0, $_);
>
> if ($#fields != $fldcnt) {

@fields in scalar context would return the number of elements in
@fields. That would be a more natural comparison. This also means the
constant I defined above should have been 16 rather than 15. I am going
to leave it that way, however. That is for you to fix.
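
The off-by-one between the two comparisons can be seen in a short sketch. The 16-element array is illustrative, and the constant is set to 16 here on the assumption that the original test ($#fields != 15) really meant 16 fields per row.

```perl
use strict;
use warnings;

use constant EXPECTED_FIELDS => 16;

# An illustrative row with exactly 16 fields.
my @fields = ('x') x 16;

# $#fields is the index of the last element, one less than the count.
print "last index:  $#fields\n";
print "field count: ", scalar(@fields), "\n";

# A numeric comparison puts the array in scalar context, so this
# compares the number of elements with the constant.
print "row is ok\n" if @fields == EXPECTED_FIELDS;
```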

Sinan.
 

Tore Aursand

> #!/user/bin/perl

Is the directory really called 'user'? I've never seen that one before.
You're still missing these two, though;

use strict;
use warnings;

> use Text::parseWords;

Don't use Text::ParseWords for this task. Instead, have a look at the
excellent Text::CSV_XS module.

> my @fields;
> my @data;

Don't declare your variables before you actually use them!

> my $fldcnt = 15;

Constants should be declared as constants;

use constant FIELDS_EXPECTED => 15;

> my $rownum = 0;

No need to; Perl keeps track of that for you in the $. variable.

> $file="test.txt";

No need to use double quotes here;

my $file = 'test.txt';

> open file or die "Cannot open $file for read:$!";

Bad way of opening files, and I don't really think you want to do it.
Please read 'perldoc -f open' for more information;

open my $fh, '<', $file or die "$!\n";

> while(<file>)
> {
> $rownum++;
>
> chomp;
> @fields = quotewords("~", 0, $_);
>
> if ($#fields != $fldcnt) {
> $errCount++;
> print "row $rownum is missing fields.\n";
> print "\n";
> }
> else {
> print "row $rownum is ok\n";
> }
> }

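By using Text::CSV_XS, something like this should work;

# The separator defaults to a comma, so it is set to '~' for this data.
# Disabling quote_char and escape_char makes quotes and ticks ordinary
# field content.
my $CSV = Text::CSV_XS->new({
    sep_char    => '~',
    quote_char  => undef,
    escape_char => undef,
    binary      => 1,
});

while ( <$fh> ) {
    chomp;
    my $status = $CSV->parse( $_ );
    my @fields = $CSV->fields();

    # @fields here is the field count, not the last index, so
    # FIELDS_EXPECTED needs to be 16 to match the original $#fields test.
    unless ( $status && @fields == FIELDS_EXPECTED ) {
        $errCount++;
        print "Row $. is missing fields!\n";
    }
    else {
        print "Row $. is OK!\n";
    }
}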
 

hbar21

Wow. This started as an exercise to get comfortable in Perl. I was
mostly learning from a few examples I found lying around here. Clearly
not my best source of info. Thanks for all your help, even those who
took the time out of their busy day to tell me to RTFFAQ. I'll digest
this, and hopefully will not have to bother you any further.

Oh, one more question. I checked the FAQ, but I still don't know where
to find full docs on the packages like CSV_XS, and all the methods,
etc. Where can I find that stuff?

And /user/bin/perl is definitely a typo. Odd that it still ran.

Thanks again.
 

A. Sinan Unur

(e-mail address removed) wrote in @z14g2000cwz.googlegroups.com:
> Wow. This started as an exercise to get comfortable in Perl. I was
> mostly learning from a few examples I found lying around here. Clearly
> not my best source of info. Thanks for all your help, even those who
> took the time out of their busy day to tell me to RTFFAQ. I'll digest
> this, and hopefully will not have to bother you any further.
>
> Oh, one more question. I checked the FAQ, but I still don't know where
> to find full docs on the packages like CSV_XS, and all the methods,
> etc. Where can I find that stuff?

perldoc perldoc
perldoc perltoc
perldoc Text::CSV_XS

In case you are using ActiveState Perl on Windows, the documentation is
also available in HTML format in the Start menu.

Finally, you are going to need to start quoting some context in your
posts. See http://groups.google.com/googlegroups/posting_style.html

Sinan.
 

Eric Bohlman

> You should declare them *after* you use them??
>
> ;-)

"Before" and "after" do not an exhaustive partition make; we obsess over
the past and the future to the point of forgetting about the present :)
 
