A twist on Cookbook recipe 4.6

Amer Neely · Mar 22, 2007

I'm stuck trying to get a count of duplicate emails in a HoH.

Input is a pipe-delimited file of users with their email addresses.
Element 0 is a unique ID number; 10 is their email.

The problem is some users are present multiple times, so that their ID
number is unique but their email remains the same.

I'm trying to prepend a string to the duplicated emails, so I can use
them to populate a MySQL table along with the ID numbers. The email
addresses will not (likely) work. Not my problem.

What I have is a working script from the 'Perl Cookbook' (recipe #4.6)
that shows me which emails are duplicated.

I'm trying to get a count of those emails ALONG with their respective IDs.

My attempts so far:

#! /usr/bin/perl
use strict;
use warnings;

my ($ID,$Email);
my %SeenEmails=();

open IN,"<",$InFile or die "Can't open $InFile: $!\n";
foreach my $line (<IN>)
{
($ID,$Email) = ((split /\|/, $line)[0,10]);
$SeenEmails{$Email}++;
}
close IN or die "Can't close $InFile: $!\n";;

for (sort keys %SeenEmails)
{
if ($SeenEmails{$_} > 1)
{
print "$_ -> $SeenEmails{$_}\n";
for my $i (1..($SeenEmails{$_}-1))
{
my $thisone=$_;
$thisone = 'dupe' . $i . '_' . $_;
print "\t$thisone\n";
}
}
}

This prepends the email with a dynamically generated string, but I'm
missing the ID number when I populate the hash initially.

If I use
$SeenEmails{$ID}{$Email}++;
I get the ID number, but I'm counting the wrong thing.

How can I get the duplicated emails (and how many there are) with their
respective IDs, so I can prepend my string?

Ben Morrow · Mar 22, 2007

Quoth Amer Neely said:
I'm trying to get a count of those [duplicated] emails ALONG with
their respective IDs.

My attempts so far:

#! /usr/bin/perl
use strict;
use warnings;

my ($ID,$Email);

Don't declare variables before you need them. These should go inside the
loop.

my %SeenEmails=();

open IN,"<",$InFile or die "Can't open $InFile: $!\n";

I would suggest

open my $IN, ...

instead. Global filehandles are as bad as any other global variable.

foreach my $line (<IN>)

This will work, but is a terrible waste of time. <FH> in list context
returns a list of lines, so you are reading in the file, splitting that
into lines, and then going through them one at a time. If you want to go
through one at a time, use while instead:

while (my $line = said:
{
($ID,$Email) = ((split /\|/, $line)[0,10]);
$SeenEmails{$Email}++;

Here you want to store a list of the IDs, instead of just incrementing.
Something like

$SeenEmails{$Email} ||= [];
push @{ $SeenEmails{$Email} }, $ID;

which could be shortend to

push @{ $SeenEmails{$Email} ||= [] }, $ID;

once you're comfortable with the syntax.

If you don't understand refs (the [] and @{...} stuff) read perldoc
perlreftut.

}
close IN or die "Can't close $InFile: $!\n";;

for (sort keys %SeenEmails)
{
if ($SeenEmails{$_} > 1)

my $seen = $SeenEmails{$_};

if (@$seen > 1) {

I don't really understand what you want to generate from this, but
perhaps you want something like

print "$_ -> " . scalar(@$seen) . "\n";

for my $id (@$seen) {
print "\tdupe${id}_$_\n";
}

That print string is quite complicated; it can be split into

"\t" . 'dupe' . $id . '_' . $_ . "\n"

.. A decent highlighting editor will help, of course.

FWIW, while I often use $_ as an implicit 'for' loop variable, I only
use it on the *innermost* loop. I find that once a loop gets large
enough to have sub-loops of its own, it probably deserves a real
variable; and having a $_ around that isn't 'the current item' is
unhelpful and confusing

.

{
print "$_ -> $SeenEmails{$_}\n";
for my $i (1..($SeenEmails{$_}-1))
{
my $thisone=$_;
$thisone = 'dupe' . $i . '_' . $_;
print "\t$thisone\n";
}
}
}

Ben

John W. Krahn · Mar 22, 2007

Ben said:
Quoth Amer Neely said:

{
($ID,$Email) = ((split /\|/, $line)[0,10]);
$SeenEmails{$Email}++;

Click to expand...

Here you want to store a list of the IDs, instead of just incrementing.
Something like

$SeenEmails{$Email} ||= [];
push @{ $SeenEmails{$Email} }, $ID;

which could be shortend to

push @{ $SeenEmails{$Email} ||= [] }, $ID;

All you really need is:

push @{ $SeenEmails{ $Email } }, $ID;

(Lookup autovivification in the docs.)

John

Amer Neely · Mar 22, 2007

Ben said:
Quoth Amer Neely said:

I'm trying to get a count of those [duplicated] emails ALONG with
their respective IDs.

My attempts so far:

#! /usr/bin/perl
use strict;
use warnings;

my ($ID,$Email);

Click to expand...

Don't declare variables before you need them. These should go inside the
loop.

my %SeenEmails=();

open IN,"<",$InFile or die "Can't open $InFile: $!\n";

Click to expand...

I would suggest

open my $IN, ...

instead. Global filehandles are as bad as any other global variable.

foreach my $line (<IN>)

Click to expand...

This will work, but is a terrible waste of time. <FH> in list context
returns a list of lines, so you are reading in the file, splitting that
into lines, and then going through them one at a time. If you want to go
through one at a time, use while instead:

while (my $line = said:

{
($ID,$Email) = ((split /\|/, $line)[0,10]);
$SeenEmails{$Email}++;

Click to expand...

Here you want to store a list of the IDs, instead of just incrementing.
Something like

$SeenEmails{$Email} ||= [];
push @{ $SeenEmails{$Email} }, $ID;

which could be shortend to

push @{ $SeenEmails{$Email} ||= [] }, $ID;

once you're comfortable with the syntax.

If you don't understand refs (the [] and @{...} stuff) read perldoc
perlreftut.

}
close IN or die "Can't close $InFile: $!\n";;

for (sort keys %SeenEmails)
{
if ($SeenEmails{$_} > 1)

Click to expand...

my $seen = $SeenEmails{$_};

if (@$seen > 1) {

I don't really understand what you want to generate from this, but
perhaps you want something like

print "$_ -> " . scalar(@$seen) . "\n";

for my $id (@$seen) {
print "\tdupe${id}_$_\n";
}

That print string is quite complicated; it can be split into

"\t" . 'dupe' . $id . '_' . $_ . "\n"

. A decent highlighting editor will help, of course.

FWIW, while I often use $_ as an implicit 'for' loop variable, I only
use it on the *innermost* loop. I find that once a loop gets large
enough to have sub-loops of its own, it probably deserves a real
variable; and having a $_ around that isn't 'the current item' is
unhelpful and confusing .

{
print "$_ -> $SeenEmails{$_}\n";
for my $i (1..($SeenEmails{$_}-1))
{
my $thisone=$_;
$thisone = 'dupe' . $i . '_' . $_;
print "\t$thisone\n";
}
}
}

Click to expand...

Ben

Ben, thanks for the advice and direction. I can see I need to get a good
handle on references.

Amer Neely · Mar 22, 2007

John said:
Ben said:

Quoth Amer Neely said:

{
($ID,$Email) = ((split /\|/, $line)[0,10]);
$SeenEmails{$Email}++;

Click to expand...

Here you want to store a list of the IDs, instead of just incrementing.
Something like

$SeenEmails{$Email} ||= [];
push @{ $SeenEmails{$Email} }, $ID;

which could be shortend to

push @{ $SeenEmails{$Email} ||= [] }, $ID;

Click to expand...

All you really need is:

push @{ $SeenEmails{ $Email } }, $ID;

(Lookup autovivification in the docs.)

John

Thank you for the alternative John, I'll do more reading

Ben Morrow · Mar 23, 2007

Quoth "John W. Krahn said:
Ben said:

push @{ $SeenEmails{$Email} ||= [] }, $ID;

Click to expand...

All you really need is:

push @{ $SeenEmails{ $Email } }, $ID;

(Lookup autovivification in the docs.)

Duh... I even knew that, I'd just temporarily forgotten (and wasn't
trusting Perl to DWIM

). Thanks for the reminder...

Ben

John W. Krahn · Mar 23, 2007

Ben said:
Quoth "John W. Krahn said:

Ben said:

push @{ $SeenEmails{$Email} ||= [] }, $ID;

Click to expand...

All you really need is:

push @{ $SeenEmails{ $Email } }, $ID;

(Lookup autovivification in the docs.)

Click to expand...

Duh... I even knew that, I'd just temporarily forgotten (and wasn't
trusting Perl to DWIM ). Thanks for the reminder...

No problemo.

John

example4-25 of c++cookbook	3	Jul 10, 2011
Trim whitespace with cookbook recipe does not result in trimmed array	24	Feb 12, 2006
Need help with code on website (noob)	2	Jul 18, 2022
How to create a JSON array with values from DOM(HTML TABLE) when I click a button using JQuery/Javascript?	0	May 1, 2023
Random fucntion with a twist	7	Jan 27, 2007
Help with optionparse recipe	2	Jul 27, 2004
Survey details won't go through using php, ajax, Mysql	0	Oct 26, 2023
Trying to catch invalid emails	4	May 7, 2008

A twist on Cookbook recipe 4.6

Amer Neely

Ben Morrow

John W. Krahn

Amer Neely

Amer Neely

Ben Morrow

John W. Krahn

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads