n00b needs help pls.

Koncept

Sorry for asking, but I give up on this situation. I am a total n00b at
Perl and have only used it seriously for about 1 week now. I would
really appreciate somebody's help here because I am really feeling
stuck.

** THIS IS NOT A SPAM LIST FIRST AND FOREMOST **
I have a list of email accounts. Many of the accounts are from the
same domain.

I want to separate this huge list into separate lists, where each list
only contains one address from each domain.

Example:

If my addresses in the source file are as follows:

bill at one.com
jane at one.com
frank at two.com
ted at one.com
jess at three.com

My first run should return:
--------------------
bill at one.com
frank at two.com
jess at three.com

2nd run:
------
jane at one.com

3rd run:
------
ted at one.com

I came up with this to grep unique domains once, but the problem is
that the script only runs once.

#!/usr/bin/perl -w

sub div() { "+","-" x 50, "+\n"; }

die "Usage: $0 emailList" if (@ARGV!=1);

open( EML, $ARGV[0] ) || die "Can't open file : $!\n";

while(<EML>){
 chomp;
 push(@addys, $_) if $_ =~
/^[a-zA-Z0-9_\.\-]+\@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+$/;
}

close( EML );

foreach $email( @addys ) {
 @parts = split( "@", $email );
 $domain = $parts[1];
 unless( $seen{$domain} ) {
   push( @users, $email );
   $seen{$domain} = 1;
 }
}

if(@users>0){
 print &div, "The following are uniq users per domain:\n", &div;
 print join( "\n", sort( @users ) ), "\n", &div;
} else {
 print "Sorry. I could not find any email addresses.\n";
}
 
Koncept

Titus A said:
As a follow up surely you mean (e-mail address removed), (e-mail address removed) (e-mail address removed)
rather than (e-mail address removed) and b @y.co.uk etc...

1) re: emails => Yes. I just didn't want to make active links.
2) re: Excel => No. I don't intend to use Microsoft products. I would
really like to know how to do this in Perl, which is why I posted here.
Thanks for your advice though.
 
Bob Walton

Koncept wrote:

....
I want to separate this huge list into separate lists, where each list
only contains one address from each domain.

Example:

If my addresses in the source file are as follows:

bill at one.com
jane at one.com
frank at two.com
ted at one.com
jess at three.com

My first run should return:
--------------------
bill at one.com
frank at two.com
jess at three.com

2nd run:
------
jane at one.com

3rd run:
------
ted at one.com

I came up with this to grep unique domains once, but the problem is
that the script only runs once.


I added some commentary:

#!/usr/bin/perl -w


use strict; #let Perl help you all it can
use warnings; #preferable to -w switch

sub div() { "+","-" x 50, "+\n"; }
die "Usage: $0 emailList" if (@ARGV!=1);

open( EML, $ARGV[0] ) || die "Can't open file : $!\n";

----------------------------------------------------^^
The trailing \n suppresses the file name and line number that die would
otherwise append to the error message. Usually you don't want that.

my @addys; #with strict, you need to declare variables prior to use.

my %seen;

my @users;

while(<EML>){
chomp;
push(@addys, $_) if $_ =~
/^[a-zA-Z0-9_\.\-]+\@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+$/;

---------------^-^---^-----------^---------------^-^
It is not necessary to escape the indicated characters. It makes your
program harder to read and understand when unnecessary quoting is used.

}

close( EML );

foreach $email( @addys ) {
my--------^


@parts = split( "@", $email );

--------------------^-^
split takes a pattern. It is better written as:

my @parts = split( /@/, $email );

$domain = $parts[1];
my--^


unless( $seen{$domain} ) {
push( @users, $email );
$seen{$domain} = 1;
}
}

if(@users>0){
print &div, "The following are uniq users per domain:\n", &div;
print join( "\n", sort( @users ) ), "\n", &div;
} else {
print "Sorry. I could not find any email addresses.\n";
}
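
For reference, here is a rough sketch of how the core of the script might look with the comments above folded in. The helper names domain_of and first_per_domain are made up for illustration and are not from the original post. The backslash before "@" is left in as a conservative choice even though it was flagged as unnecessary.

```perl
use strict;
use warnings;

# Same address pattern as in the post, minus the flagged escapes.
# Inside [...], "." is literal, and "-" is literal when it comes last.
my $ADDRESS_RE = qr/^[a-zA-Z0-9_.-]+\@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$/;

# Hypothetical helper: return the domain of an address, or undef if the
# line does not match the pattern.
sub domain_of {
    my ($email) = @_;
    return unless $email =~ $ADDRESS_RE;
    my @parts = split /@/, $email;    # split takes a pattern
    return $parts[1];
}

# Hypothetical helper: first address seen for each domain, in input order.
sub first_per_domain {
    my @addys = @_;
    my (%seen, @users);
    foreach my $email (@addys) {
        my $domain = domain_of($email);
        next unless defined $domain;  # skip lines that are not addresses
        push @users, $email unless $seen{$domain}++;
    }
    return @users;
}

print "$_\n" for first_per_domain(qw(bill@one.com jane@one.com frank@two.com));
```

Unlike the posted script, this keeps the addresses in input order rather than sorting them.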

Well, it doesn't look like you really have much of a Perl problem, just
a minor logistics problem. I take it you want to run the program
multiple times, and want to have a list of email addresses, just one per
domain, come out each time. In order to do that you will need to save
what remains of the list of emails each time you run your program. It
would be convenient to save it back to the same file, providing you
don't need that file later. If you don't want to destroy the file, then
have the program copy it to a temporary file, and have the program
automatically take from the temporary file if it exists (and delete the
temporary file if it ends up empty). So you will need to close the
input file, open it again for output, and build and output to that file
a list of the email addresses that were not output by the program on the
current pass. Based on the program you've already written, you should
be able to handle that.
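
A minimal sketch of the temporary-file variant described above, in case it helps. It assumes one address per line with a literal "@", and the ".remaining" suffix and the names split_pass and run_once are invented for this example. Address validation is left out to keep it short.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list into (one address per domain, the rest).
sub split_pass {
    my @addys = @_;
    my (%seen, @picked, @rest);
    foreach my $email (@addys) {
        my $domain = (split /@/, $email)[1];
        next unless defined $domain;            # skip lines without an "@"
        if ($seen{$domain}++) { push @rest,   $email }
        else                  { push @picked, $email }
    }
    return (\@picked, \@rest);
}

# Hypothetical driver for one run. It reads the leftover file if an earlier
# run created it, otherwise the original list. The original is never changed.
sub run_once {
    my ($source) = @_;
    my $work  = "$source.remaining";            # assumed name for the temp file
    my $input = -e $work ? $work : $source;

    open my $in, '<', $input or die "Can't open $input: $!";
    chomp(my @addys = <$in>);
    close $in;

    my ($picked, $rest) = split_pass(@addys);

    if (@$rest) {
        # Save what was not output this time for the next run.
        open my $out, '>', $work or die "Can't write $work: $!";
        print {$out} "$_\n" for @$rest;
        close $out or die "Can't close $work: $!";
    }
    elsif (-e $work) {
        unlink $work or die "Can't remove $work: $!";   # nothing left over
    }
    return @$picked;
}

if (@ARGV == 1) {
    print "$_\n" for run_once($ARGV[0]);
}
```

With the five-address example from the question, three successive runs should print bill/frank/jess, then jane, then ted. Once the leftover file has been emptied and removed, the next run starts over from the original list.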
 
