unique elements in a list....

jim.goodman

i want to find out how many unique "regions" are in part of the list
attached.... i have also included the perl script that i have
created... i am new to this so please no flames.... it appears to
partially work, but the output isn't what i wanted... it
appears to loop and actually find non-unique items instead of just the
unique... I don't know, i'm lost, that's why i'm here :o)! As i look
at it more, i understand why it's doing what it is (it's a second loop
thing with the "uniqueness"), but i don't know how to fix it to get the
end result i want....

#!/usr/bin/perl
use strict;
use warnings;

open (ORG_STUFF, "/Users/goodman/Desktop/region.txt")
    or die "Can't open ORG_STUFF : $!";
my (@org, @new); # declare arrays
my ($org_data, $new_data, $org_line, $new_line); # declare variables
while( <ORG_STUFF> ) {
    push @org, $_; # push the data line onto the array
}
push @new, "helpme"; # load something into the array or it won't match at all
foreach $org_line (@org) { #loop through the original data
    $org_data = $3 if ($org_line =~ /(.*?\t)(.*?\t)(.*?\n)/); #get the data chunk
    chomp $org_data;
    foreach $new_line (@new) {
        if ($org_data ne $new_line) {
            push @new, $org_data;
        }
    }
}

close ORG_STUFF;
print @new;


Now part of the input file... i am interested in the "third" chunk of
data on each line.... and only the unique ones!

1 0,1160,10,00 Napa Valley
2 0,1160,100,00 Monterey Bay Area
3 0,1160,1000,00 Napa Valley
4 0,1160,1001,00 Napa Valley
5 0,1160,1002,00 Sonoma
6 0,1160,1003,00 South Central Coast
7 0,1160,1005,00 Sonoma
8 0,1160,1006,00 South Central Coast
9 0,1160,1007,00 South Central Coast
10 0,1160,1008,00 Napa Valley
11 0,1160,1009,00 Napa Valley
12 0,1160,101,00 Napa Valley
13 0,1160,1010,00 Sonoma
14 0,1160,1011,00 Sonoma
15 0,1160,1012,00 Sonoma
16 0,1160,1013,00 South Central Coast
17 0,1160,1014,00 Napa Valley
18 0,1160,1015,00 Piedmont
19 0,1160,1016,00 Lombardy
20 0,1160,1017,00 Veneto
21 0,1160,1018,00 Veneto
22 0,1160,1019,00 Tuscany
23 0,1160,102,00 Napa Valley
24 0,1160,1020,00 Sicily
25 0,1160,1021,00 Veneto
26 0,1160,1022,00 Piedmont
27 0,1160,1023,00 Piedmont
28 0,1160,1024,00 Piedmont
29 0,1160,1025,00 Piedmont
30 0,1160,1026,00 Piedmont
31 0,1160,1027,00 Veneto
32 0,1160,1028,00 Latium & Rome
33 0,1160,1029,00 Tuscany
34 0,1160,103,00 Sierra Foothills
35 0,1160,1030,00 Mendocino
36 0,1160,1031,00 Napa Valley
37 0,1160,1032,00 Central Valley
38 0,1160,1033,00 Willamette Valley
39 0,1160,1034,00 Napa Valley
40 0,1160,1035,00 New England
41 0,1160,1036,00 New England
42 0,1160,1037,00 South Central Coast
43 0,1160,1038,00 Hudson River Valley
44 0,1160,1039,00 Hudson River Valley
45 0,1160,104,00 Napa Valley
46 0,1160,1040,00 Hudson River Valley
47 0,1160,1041,00 Hudson River Valley
48 0,1160,1042,00 Hudson River Valley
49 0,1160,1043,00 Bordeaux
50 0,1160,1044,00 Bordeaux


from the above list, in the end, i should get output that looks
something like this... maybe sorted alphabetically...?

Napa Valley
Monterey Bay Area
Sonoma
South Central Coast
Piedmont
Lombardy
Veneto
Tuscany
Sicily
Latium & Rome
Sierra Foothills
Mendocino
Central Valley
Willamette Valley
New England
Hudson River Valley
Bordeaux
 
xhoster

> i want to find out how many unique "regions" are in part of the list
> attached.... i have also included the perl script that i have
> created... i am new to this so please no flames.... it appears to
> partially work, but the output isn't what i wanted...

You should use hashes for things like this.

> #!/usr/bin/perl
> use strict;
> use warnings;

Thank you.

> open (ORG_STUFF, "/Users/goodman/Desktop/region.txt")
> or die "Can't open ORG_STUFF : $!";

It would probably be better to use lexical file handles.

> my (@org, @new); # declare arrays

my %region;

> my ($org_data, $new_data, $org_line, $new_line); # declare variables

This is premature declaration. These could be declared as needed.

> while( <ORG_STUFF> ) {
> push @org, $_; # push the data line onto the array
> }

No need to store everything beforehand.

while (<ORG_STUFF>) {
    /(.*?\t)(.*?\t)(.*?\n)/ or next;
    ## split is probably better than the above, but I'll leave it.
    my $org_data=$3;
    chomp $org_data;
    ## why not just exclude \n from the capture in the first place?
    $region{$org_data}=();
    ## set this key in the hash
}

print "$_\n" foreach sort keys %region;
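
For readers following along, here is a minimal sketch of how the suggestions above (a lexical file handle, split instead of the regex, a hash for uniqueness) might fit together. The helper name unique_regions and the in-memory sample string are assumptions made for illustration, not part of the original script, and the sketch assumes the real file is tab-separated.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read lines from a file handle and return the
# distinct values of the third tab-separated field, sorted.
sub unique_regions {
    my ($fh) = @_;
    my %region;
    while ( my $line = <$fh> ) {
        chomp $line;
        my @fields = split /\t/, $line;
        next unless @fields >= 3;     # skip blank or malformed lines
        $region{ $fields[2] } = 1;    # only the key matters, the value is a placeholder
    }
    my @sorted = sort keys %region;
    return @sorted;
}

# A few sample rows stand in for region.txt so the sketch runs on its own.
my $sample = "1\t0,1160,10,00\tNapa Valley\n"
           . "2\t0,1160,100,00\tMonterey Bay Area\n"
           . "3\t0,1160,1000,00\tNapa Valley\n";

# Lexical file handle opened on the in-memory string (three-argument open).
open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
my @regions = unique_regions($fh);
close $fh;

print "$_\n" for @regions;
```

To run it against the real file, the open call would take the path to region.txt in place of \$sample.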

Xho
 
usenet

No flaming is warranted - you wrote a fairly good question (far better
than most newcomers). FYI, the posting guidelines for this newsgroup
(http://tinyurl.com/eu2vl) will help you continue to post good (and
even better) questions/responses.

Something like this might suit your needs:

#!/usr/bin/perl
use strict; use warnings;

my %region;
while (<DATA>) {
    chomp;
    $region{ (split /\s+/, $_ ,3)[2] }++;
}

print "$_\n" for sort keys %region;

__DATA__
1 0,1160,10,00 Napa Valley
2 0,1160,100,00 Monterey Bay Area
3 0,1160,1000,00 Napa Valley
4 0,1160,1001,00 Napa Valley
5 0,1160,1002,00 Sonoma
6 0,1160,1003,00 South Central Coast
7 0,1160,1005,00 Sonoma
8 0,1160,1006,00 South Central Coast
9 0,1160,1007,00 South Central Coast
10 0,1160,1008,00 Napa Valley
11 0,1160,1009,00 Napa Valley
12 0,1160,101,00 Napa Valley
13 0,1160,1010,00 Sonoma
14 0,1160,1011,00 Sonoma
15 0,1160,1012,00 Sonoma
16 0,1160,1013,00 South Central Coast
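
Since the original question asks how many unique regions there are, here is a small sketch of how the ++ counts built above could also answer that. It assumes whitespace-separated rows as shown in this post; the @rows array is only a stand-in for the data section.

```perl
use strict;
use warnings;

# A few rows copied from the sample data, used here as a stand-in.
my @rows = (
    '1 0,1160,10,00 Napa Valley',
    '2 0,1160,100,00 Monterey Bay Area',
    '3 0,1160,1000,00 Napa Valley',
    '4 0,1160,1001,00 Napa Valley',
    '5 0,1160,1002,00 Sonoma',
);

my %region;
for my $row (@rows) {
    # Limit of 3 keeps multi-word region names together in the last field.
    my $name = ( split /\s+/, $row, 3 )[2];
    $region{$name}++;    # the value becomes the number of times the region was seen
}

# keys in scalar context yields the number of distinct keys.
my $unique = keys %region;
print "$unique unique regions\n";

# Each region alongside how often it appeared.
printf "%-20s %d\n", $_, $region{$_} for sort keys %region;
```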
 
John W. Krahn

> i want to find out how many unique "regions" are in part of the list
> attached.... i have also included the perl script that i have
> created... i am new to this so please no flames.... it appears to
> partially work, but the output isn't what i wanted... it
> appears to loop and actually find non-unique items instead of just the
> unique... I don't know, i'm lost, that's why i'm here :o)! As i look
> at it more, i understand why it's doing what it is (it's a second loop
> thing with the "uniqueness"), but i don't know how to fix it to get the
> end result i want....

If you want a unique list think 'hash' and not 'array'. Something like
(UNTESTED):

#!/usr/bin/perl
use strict;
use warnings;

my $org_stuff = '/Users/goodman/Desktop/region.txt';

open ORG_STUFF, $org_stuff or die "Can't open $org_stuff : $!";

my %org; # declare hash

while( <ORG_STUFF> ) {
    chomp;
    my $org_data = ( split /\t/ )[ -1 ];
    $org{ $org_data } = (); # don't care about values
}

close ORG_STUFF;

for my $org_data ( sort keys %org ) {
    print "$org_data\n";
}

__END__


John
 
jim.goodman

ok, could someone please explain this... :o)! first, why a hash and
not an array?

second....

> my %org; # declare hash

i get this, just declaring the hash....

> while( <ORG_STUFF> ) {
>     chomp;

good here too, open the data file to read until the end and remove the
trailing newline character

> my $org_data = ( split /\t/ )[ -1 ];

splitting the line on the "tabs", but i don't understand the "[-1]"...?
i think that has something to do with in the array was $3...?

> $org{ $org_data } = (); # don't care about values

don't get this...?

> }
>
> close ORG_STUFF;
> for my $org_data ( sort keys %org ) {
>     print "$org_data\n";

kind of get this.... i need to understand hashes better and i'll get
the "sort" thing
 
Sherm Pendley

> ok, could someone please explain this... :o)! first, why a hash and
> not an array?

Hash keys are unique. When you think of "unique" and "list" in the same
sentence, you're more often than not thinking of a hash.

>> while( <ORG_STUFF> ) {
>>     chomp;
>
> good here too, open the data file to read until the end and remove the
> trailing newline character

That doesn't open the file - the open() function does that. The above is
looping over the file one line at a time, removing $/ from the end of
each line. $/ is documented in perlvar.

>> my $org_data = ( split /\t/ )[ -1 ];
>
> splitting the line on the "tabs", but i don't understand the "[-1]"...?

You know that arrays are zero-based, right? Well, imagine the array items
in a row like this, with index zero at the beginning:

[0] [1] [2] [3] [4]

Now, if you use negative indexes, you get elements starting from the *end*
of the list:

[-5] [-4] [-3] [-2] [-1]

To put it another way, where [0] refers to the first item in a list, [-1]
refers to the *last* item.
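
A short sketch of the same idea in code, using one sample row. The variable names are only illustrative, and the row is assumed to be tab-separated as in the original file.

```perl
use strict;
use warnings;

# One sample row, tab-separated.
my $line = "5\t0,1160,1002,00\tSonoma";

# Store the pieces in an array so both ends can be indexed.
my @fields = split /\t/, $line;
print "first field: $fields[0]\n";     # index 0 counts from the front
print "last field:  $fields[-1]\n";    # index -1 counts from the back

# The same thing without a named array: a list slice on split's result.
my $last = ( split /\t/, $line )[-1];
print "list slice:  $last\n";
```

With three fields per line, [2] and [-1] pick the same element; [-1] simply keeps working if a line has more or fewer columns.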

>> $org{ $org_data } = (); # don't care about values
>
> don't get this...?

When you associate a value with a hash key, perl first checks to see if
the hash key already exists. If so, the new value is simply associated
with the existing key. So keys are always unique.

We don't care about the values in this instance - we're only using a hash
to take advantage of the fact that hash keys are always unique.
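
To make that concrete, here is a small sketch with a hard-coded list of names (chosen only for illustration) showing that repeated assignments to the same key never create a second entry.

```perl
use strict;
use warnings;

my %org;
for my $name ( 'Sonoma', 'Napa Valley', 'Sonoma', 'Napa Valley', 'Veneto' ) {
    # Assigning () to a hash element stores undef. If the key already
    # exists it is reused, so no duplicate entry appears.
    $org{$name} = ();
}

# keys in scalar context yields the number of distinct keys.
my $count = keys %org;
print "$count distinct keys\n";

# The key is present even though nothing useful is stored under it.
print "Sonoma is present\n"       if exists $org{'Sonoma'};
print "but its value is undef\n"  unless defined $org{'Sonoma'};
```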

>> for my $org_data ( sort keys %org ) {
>>     print "$org_data\n";
>
> kind of get this.... i need to understand hashes better and i'll get
> the "sort" thing

The above uses keys() to get a list of keys in %org, and then sort() to
sort the list.

sherm--
 
Jürgen Exner

> ok, could someone please explain this... :o)!

Explain what? Please quote context!

> first, why a hash and
> not an array?

Based on the subject of the posting: because all keys are unique in a hash, by
definition of a hash.

>> my $org_data = ( split /\t/ )[ -1 ];
>
> splitting the line on the "tabs", but i don't understand the
> "[-1]"...? i think that has something to do with in the array was
> $3...?

Don't know what array you are talking about. split() returns a list and [-1]
selects the last element of that list.

>> $org{ $org_data } = (); # don't care about values
>
> don't get this...?

Creates a new element in %org (or overwrites an existing element) where the
value is undef, which is what assigning the empty list to a single hash
element stores. He could have chosen any other value like 0 or 1 or
'foobar' because apparently he's interested in the key, but not the value.

>> for my $org_data ( sort keys %org ) {
>>     print "$org_data\n";
>
> kind of get this.... i need to understand hashes better and i'll get
> the "sort" thing

Well, that's really simple: grab all the keys from the hash %org, sort that
list alphabetically, and then just print them one by one.
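
One caveat worth a sketch: sort with no comparison block compares plain strings, so uppercase letters sort before lowercase ones. The sample data is consistently capitalized, so this makes no difference there. The lowercase 'bordeaux' below is a made-up entry to show the effect, along with one possible case-insensitive comparison.

```perl
use strict;
use warnings;

# 'bordeaux' is deliberately lowercased here; it is not spelled that way in the data.
my %org = map { $_ => 1 } ( 'Sonoma', 'bordeaux', 'Napa Valley', 'Lombardy' );

# Default sort: plain string comparison, uppercase sorts before lowercase.
my @default = sort keys %org;

# Case-insensitive: compare lowercased copies of each key.
my @caseless = sort { lc($a) cmp lc($b) } keys %org;

print 'default:  ', join( ', ', @default ),  "\n";
print 'caseless: ', join( ', ', @caseless ), "\n";
```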

jue
 
