Splitting filenames to extract a string

nj_perl_newbie

I am trying to extract the first string from a filename that is
delimited by "."s. I want this subroutine to read the contents of a
directory and define the first field as a variable.

a directory with contents:

joe.2004039201.xl.b.csv
judy.200312123.x5.b.csv

would return joe as username and then judy as a username

this is what I have thus far:

opendir(DIR,"<$dir") or die "Error $! opening directory.";
my(@filenames) = readdir(DIR);

my ($first,undef,undef,undef,undef) = split /\./, $_;
$username = $first;
&log_entry($logfh, "Username is: $username") if $debug;
close(DIR);

I am guessing that I need a while or something in line three to tell
it to split @filenames....but I am lost so that's why I am
posting...any help would be appreciated.
 
Josef Möllers

1. You don't specify "<$dir", but rather "$dir" (you will have a hard
time doing anything else but reading from a directory).
2. the while loop might be "while (my $filename = readdir(DIR))", then
you'll have the entry name in $filename.

OTOH you can also
foreach (@filenames) {
my ($first,undef) = split(/\./, $_, 2);

Josef
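
For reference, the two suggestions above could be combined roughly as follows. This is only a sketch: the helper name usernames_in, the lexical directory handle, and the rule for skipping dot-files are assumptions made for illustration and do not come from the original posts.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first dot-delimited field of every
# entry in $dir, skipping entries that start with a dot ("." and "..").
sub usernames_in {
    my ($dir) = @_;

    # Plain directory name, no "<" prefix; check the return value.
    opendir(my $dh, $dir) or die "Error $! opening directory $dir";

    my @usernames;
    # defined() keeps the loop going even for an entry named "0".
    while (defined(my $filename = readdir($dh))) {
        next if $filename =~ /^\./;
        my ($first) = split(/\./, $filename, 2);
        push @usernames, $first;
    }

    closedir($dh);
    return @usernames;
}
```

Calling usernames_in($dir) on the example directory should yield joe and judy, in whatever order readdir returns the entries.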
 
thumb_42

Note: I don't have a directory full of those types of filenames, but
generally this is how I'd do it:

opendir(DIR,$dir);
while( $filename = readdir(DIR)){

#...............
# Check for files beginning with ^\. etc.. here otherwise
# you'll get empty entries . .. , also might be wise to check for
# /\.csv$/ for "Only *.csv files"
#...........

$filename =~ /^([^\.]+)\./;
print "Uid: $1\n";
}
closedir(DIR);


You could slurp them into an array too, but with a large file listing
that could get to be a bit much. Since you're doing a loop anyway, you
might as well do them one at a time.

Jamie
 
Anno Siegel

Josef Möllers said:
OTOH you can also
foreach (@filenames) {
my ($first,undef) = split(/\./, $_, 2);

The trailing "undef" is unnecessary. It's still an improvement over
OP's code, which had four of them :)

Anno
 
Paul Lalli

1. You don't specify "<$dir", but rather "$dir" (you will have a hard
time doing anything else but reading from a directory).
2. the while loop might be "while (my $filename = readdir(DIR))", then
you'll have the entry name in $filename.

OTOH you can also
foreach (@filenames) {
my ($first,undef) = split(/\./, $_, 2);


Is there any gain to using split instead of a pattern match? Why mess
with undefs instead of:

my ($username) = /(.*)\./;

(of course, $username should be checked to make sure the pattern match
succeeded...)

Paul Lalli
 
Anno Siegel

thumb_42 wrote (in reply to nj_perl_newbie):

[extracting the first dot-separated portion of a file name]
opendir(DIR,$dir);
while( $filename = readdir(DIR)){

#...............
# Check for files beginning with ^\. etc.. here otherwise
# you'll get empty entries . .. , also might be wise to check for
# /\.csv$/ for "Only *.csv files"
#...........

$filename =~ /^([^\.]+)\./;
The "." isn't special in a character class, so there is no need to escape it as "\.".
print "Uid: $1\n";

Before you use $1 and friends you *must* make sure that the pattern
matched. Otherwise $1 may contain garbage from an earlier match. The
trailing dot is arguably also unneeded. The match must stop at a dot
anyway. You're only excluding filenames that don't have a dot.

Also, what is printed here is a username, not the (numeric) uid.
}
closedir(DIR);

Further, I wouldn't bother with naming the loop variable (what else could
readdir() produce but filenames?). That simplifies the match operation.
Instead I'd name the extracted part, whose meaning can't be otherwise
deduced. So (untested):

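# A bare readdir in a while condition sets $_ only on Perl 5.12 and later;
# on older perls write: while ( defined( $_ = readdir DIR ) ) {
while ( readdir( DIR ) ) {
next unless my ( $username ) = /^([^.]+)/;
print "User: $username\n";
}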

You could slurp them into an array too, but with a large file listing
that could get to be a bit much. Since you're doing a loop anyway, you
might as well do them one at a time.

Quite. Slurping the directory content has no advantage.

Anno
 
Anno Siegel

Paul Lalli said:
Is there any gain to using split instead of a pattern match? Why mess
with undefs instead of:

my ($username) = /(.*)\./;

Watch it, ".*" is greedy. You want /(.*?)\./.
(of course, $username should be checked to make sure the pattern match
succeeded...)

In general, it is safer to check the match itself instead of the result
of captures. Captures can legally be "" or undef, even if the pattern
matched.

Anno
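
A small illustration of why testing the match is safer than testing the capture. The filename .bashrc is a made-up example, not one from the thread; with it the pattern succeeds but captures the empty string.

```perl
use strict;
use warnings;

# Hypothetical example name: a dot-file whose first "field" is empty.
my $name = '.bashrc';

# In scalar context the match returns true or false for the match itself.
my $matched = $name =~ /^(.*?)\./;
my $capture = $1;

# The match succeeded, yet the capture is an empty (false) string.
print "pattern matched\n" if $matched;
print "capture is empty, so testing it alone would look like a failure\n"
    if $matched and not length $capture;
```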
 
David K. Wall

Paul Lalli said:
[snip]

my ($username) = /(.*)\./;

I think you mean

/(.*?)\./

Without the ? it will capture 'joe.2004039201.xl.b'.
(of course, $username should be checked to make sure the pattern match
succeeded...)

It's just my personal preference, but I'd probably write the regex as

my ($username) = /^([^.]+)/ or die;

I've never much liked .* in regexes, it just seems too vague most of the
time.

The 'or die' bit is an idiom I picked up here in c.l.p.m recently, as a
shorthand for "I expect this regex to ALWAYS match (but tell me if by some
strange chance it doesn't)". It might possibly be applicable to the OP's
situation.
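
To make the difference between the three patterns concrete, here is a short sketch run against one of the OP's sample filenames. The variable names are chosen only for illustration.

```perl
use strict;
use warnings;

my $file = 'joe.2004039201.xl.b.csv';

# Greedy: .* runs to the end and backs off to the LAST dot.
my ($greedy) = $file =~ /(.*)\./;

# Non-greedy: .*? stops at the FIRST dot.
my ($lazy) = $file =~ /(.*?)\./;

# Negated character class: everything up to, but excluding, the first dot.
my ($negated) = $file =~ /^([^.]+)/;

print "greedy:  $greedy\n";    # joe.2004039201.xl.b
print "lazy:    $lazy\n";      # joe
print "negated: $negated\n";   # joe
```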
 
Paul Lalli

David K. Wall said:
[snip]

my ($username) = /(.*)\./;

I think you mean

/(.*?)\./

Without the ? it will capture 'joe.2004039201.xl.b'.

Yes, of course I did indeed forget the ?. My mistake...
(of course, $username should be checked to make sure the pattern match
succeeded...)

It's just my personal preference, but I'd probably write the regex as

my ($username) = /^([^.]+)/ or die;

I've never much liked .* in regexes, it just seems too vague most of the
time.

Personal preference, I suppose. .* in my opinion is just as vague as it
should be. It literally means "any amount of anything", which is exactly
what the OP wanted.
 
Tad McClellan

generally this is how I'd do it:

opendir(DIR,$dir);


Using the correct directory name is a move in the right direction.

Removing the checking of the return value to ensure that you actually
got what you asked for is a move in the wrong direction.

$filename =~ /^([^\.]+)\./;
print "Uid: $1\n";


Using the dollar-digit variables without first ensuring that
the match _succeeded_ is a big move in the wrong direction.
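
A short sketch of what can go wrong, assuming two made-up inputs where the second one has no dot: a failed match leaves $1 untouched, so it still holds the value from the earlier successful match in the same scope.

```perl
use strict;
use warnings;

my $first  = 'joe.2004039201.xl.b.csv';
my $second = 'README';               # hypothetical name without a dot

$first  =~ /^([^.]+)\./;             # succeeds, $1 is 'joe'
$second =~ /^([^.]+)\./;             # fails, $1 is NOT reset
my $stale = $1;                      # still 'joe', which is wrong for README

# Checked version: only look at $1 when the match succeeded.
my $checked = $second =~ /^([^.]+)\./ ? $1 : undef;

print "unchecked: $stale\n";
print "checked:   ", ( defined $checked ? $checked : 'no match' ), "\n";
```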
 
Tad McClellan

Josef Möllers said:
nj_perl_newbie wrote:
1. You don't specify "<$dir", but rather "$dir" (you will have a hard


You don't specify "$dir", but rather $dir (without the useless
use of double quotes, as in the FAQ).

my ($first,undef) = split(/\./, $_, 2);


my($first) = split /\./;


Will do the same thing.
 
Tad McClellan

Paul Lalli said:
Is there any gain to using split instead of a pattern match?


I usually go with "Randal's rule":

Use split() when you want to say what to throw away.

Use m// in list context when you want to say what to keep.
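
As a rough illustration of the rule, both approaches below pull the first two fields out of one of the OP's sample filenames. The variable names, and the assumption that the second field is all digits, are mine and not part of the thread.

```perl
use strict;
use warnings;

my $file = 'judy.200312123.x5.b.csv';

# split: the pattern describes the separator, i.e. what to throw away.
my ( $user_split, $number_split ) = split /\./, $file;

# m// in list context: the pattern describes the fields, i.e. what to keep.
my ( $user_match, $number_match ) = $file =~ /^([^.]+)\.(\d+)/;

print "split: $user_split $number_split\n";
print "match: $user_match $number_match\n";
```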
 
Ben Morrow

The trailing "undef" is unnecessary. It's still an improvement over
OP's code, which had four of them :)

As is the count: if split is assigned to a list of variables Perl will
supply a count of one more than the number of variables in the list.

my ($first) = split /\./;

Ben
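
A quick sketch showing that the three spellings discussed in this thread produce the same first field. It assumes $_ holds one of the sample filenames; the variable names are illustrative only.

```perl
use strict;
use warnings;

local $_ = 'joe.2004039201.xl.b.csv';

# Josef's version: explicit limit plus a trailing undef placeholder.
my ( $with_undef, undef ) = split( /\./, $_, 2 );

# Explicit limit, no placeholder.
my ($with_limit) = split( /\./, $_, 2 );

# Shortest form: split defaults to $_ and Perl supplies the limit itself.
my ($implicit) = split /\./;

print "$with_undef $with_limit $implicit\n";
```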
 
nj_perl_newbie

Thanks for all of the suggestions. I went with what seemed the simplest.

I think that I might need something in line 145 like:

my @filenames = grep { $_ ne '..' and $_ ne '.' and -d $_ } readdir DIR;

so that I don't get this:

2004-02-24 15:25:53 * Now getting username
2004-02-24 15:25:53 * Username is: for .
2004-02-24 15:25:53 * Username is: for ..



sub get_username {
141
142 &log_entry($logfh, "Now getting username") if $debug;
143
144 opendir(DIR,"$dir") or die "Error $! opening directory.";
145 my(@filenames) = readdir(DIR);
146 foreach (@filenames) {
147 my ($first,undef) = split(/\./, $_ );
148 my $username = $first;
149 &log_entry($logfh, "Username is: $username for $_ ") if $debug;
150 }
151 close(DIR);
152 }
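
One possible way to drop the "." and ".." entries is sketched below. Note that -d tests for directories, and that readdir returns bare names, so a file test needs the directory prepended. The function name get_usernames, passing the directory as an argument, and returning a list instead of calling log_entry are all assumptions made so that the sketch is self-contained.

```perl
use strict;
use warnings;

# Hypothetical variant of get_username: takes the directory as an argument
# and returns the usernames instead of logging them.
sub get_usernames {
    my ($dir) = @_;

    opendir( my $dh, $dir ) or die "Error $! opening directory $dir";

    # Keep plain files only. "." and ".." are directories, so -f drops them.
    my @filenames = grep { -f "$dir/$_" } readdir($dh);

    # Directory handles are closed with closedir, not close.
    closedir($dh);

    my @usernames;
    foreach my $filename (@filenames) {
        my ($username) = split /\./, $filename;
        push @usernames, $username;
    }
    return @usernames;
}
```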
 
