# counting number of uniques in a multidimensional array column

Discussion in 'Perl Misc' started by Jack, Jul 25, 2006.

1. ### Jack (Guest)

Hi I have data in a multidim array and DONT want to create another
array representing just 1 column from this multidim array.. I want to
determine the number of uniques, I did this easily with just a regular
array (code below), does anyone know how to do this over just 1 column
of a multidim array (in other words, number of uniques across 1 column
of the multi dim defined as: multidim[0][0],multidim[1][0],
multidim[2][0].... etc)

sort @\$columnarray;
@out = grep(\$_ ne \$prev && (\$prev = \$_, 1), @\$columnarray);
if (\$#out == -1) { \$#out = 0; }
\$out = \$#out +1; # makes \$#out of 0 = 1 so it gets counted !
push @distinctcounts, \$out;

Thanks!
Jack

Jack, Jul 25, 2006

2. ### Paul Lalli (Guest)

Jack wrote:
> Hi I have data in a multidim array and DONT want to create another
> array representing just 1 column from this multidim array..

Why?

> I want to
> determine the number of uniques, I did this easily with just a regular
> array (code below), does anyone know how to do this over just 1 column
> of a multidim array (in other words, number of uniques across 1 column
> of the multi dim defined as: multidim[0][0],multidim[1][0],
> multidim[2][0].... etc)
>
> sort @\$columnarray;

This does nothing at all. You are clearly not enabling warnings in
your development. Please start doing so.

> @out = grep(\$_ ne \$prev && (\$prev = \$_, 1), @\$columnarray);
> if (\$#out == -1) { \$#out = 0; }

"if @out is empty, create one undefined element in @out"

Why? Under what circumstances do you believe @out could ever be empty
from the above code (assuming you had sorted @\$columnarray correctly)?
Well, I suppose it could be if your array had nothing but undefined
values in it. Is that the circumstance you were going for?

> \$out = \$#out +1; # makes \$#out of 0 = 1 so it gets counted !

Now you're assigning \$out to be the size of @out. Why not just use the
size of @out?

> push @distinctcounts, \$out;

The above code looks remarkably like the first answer to
perldoc -q duplicate

Have you seen the other answers?

Have you considered using map to generate a list of the first "columns"
of each array, and using that as your list rather than @{\$columnarray}
?

map { \$_->[0] } @\$columnarray

will give you that.

Paul Lalli

Paul Lalli, Jul 25, 2006
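
The two points made in this reply (sort returns a new list instead of sorting in place, and map can pull one column out of the rows) can be combined roughly as in the sketch below. The sample data and the names @multidim, @column, @sorted and @distinct are made up for illustration; the grep step is the sorted-neighbours idiom from the question, with the undefined first comparison handled explicitly so it does not warn.

```perl
use strict;
use warnings;

# Hypothetical sample data: each inner array ref is one row.
my @multidim = ( [ 'b', 1 ], [ 'a', 2 ], [ 'b', 3 ], [ 'c', 4 ] );

# Pull out column 0 of every row.
my @column = map { $_->[0] } @multidim;

# sort returns the sorted list; the result has to be assigned to have any effect.
my @sorted = sort @column;

# Keep an element only when it differs from the one before it.
my $prev;
my @distinct = grep {
    my $is_new = !defined($prev) || $_ ne $prev;
    $prev = $_;
    $is_new;
} @sorted;

# An array in scalar context yields its element count; no $#array arithmetic needed.
my $count = @distinct;
print "distinct values in column 0: $count\n";
```

This still builds temporary lists; it only shows that the original idiom works once the sorted list is actually captured.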

3. ### Jack (Guest)

Paul Lalli wrote:
> Jack wrote:
> > Hi I have data in a multidim array and DONT want to create another
> > array representing just 1 column from this multidim array..

>
> Why?
>
> > I want to
> > determine the number of uniques, I did this easily with just a regular
> > array (code below), does anyone know how to do this over just 1 column
> > of a multidim array (in other words, number of uniques across 1 column
> > of the multi dim defined as: multidim[0][0],multidim[1][0],
> > multidim[2][0].... etc)
> >
> > sort @\$columnarray;

>
> This does nothing at all. You are clearly not enabling warnings in
> your development. Please start doing so.
>
> > @out = grep(\$_ ne \$prev && (\$prev = \$_, 1), @\$columnarray);
> > if (\$#out == -1) { \$#out = 0; }

>
> "if @out is empty, create one undefined element in @out"
>
> Why? Under what circumstances do you believe @out could ever be empty
> from the above code (assuming you had sorted @\$columnarray correctly)?
> Well, I suppose it could be if your array had nothing but undefined
> values in it. Is that the circumstance you were going for?
>
> > \$out = \$#out +1; # makes \$#out of 0 = 1 so it gets counted !

>
> Now you're assigning \$out to be the size of @out. Why not just use the
> size of @out?
>
> > push @distinctcounts, \$out;

>
> The above code looks remarkably like the first answer to
> perldoc -q duplicate
>
> Have you seen the other answers?
>
> Have you considered using map to generate a list of the first "columns"
> of each array, and using that as your list rather than @{\$columnarray}
> ?
>
> map { \$_->[0] } @\$columnarray
>
> will give you that.
>
> Paul Lalli

Just ignore the @\$ (this represents a variable) - assume the code is
this:
sort @columnarray;
@out = grep(\$_ ne \$prev && (\$prev = \$_, 1), @columnarray);
if (\$#out == -1) { \$#out = 0; }
print \$out;

Are you saying the above doesnt work ?? It works great on a single
array. Do you have a better code, if so, what is it ? Also, can you
please answer the question about how to get the distinct count of a
multidim column with an actual example. Appreciate your response.
Thanks, Jack

Jack, Jul 25, 2006
4. ### Paul Lalli (Guest)

Jack wrote:

> Just ignore the @\$ (this represents a variable)

There was no @\$ in your original snippet, so ignoring it is a no-op.
There was, however, @\$columnarray, which is a perfectly valid array. I
have no idea why you're saying to get rid of it now.

> - assume the code is this:
> sort @columnarray;

Once again, THIS LINE DOES NOTHING. You still have not bothered to
turn warnings on? Why? You are asking for help, help is being given
to you, and you're ignoring it. That's really very annoying.

> @out = grep(\$_ ne \$prev && (\$prev = \$_, 1), @columnarray);
> if (\$#out == -1) { \$#out = 0; }
> print \$out;
>
> Are you saying the above doesnt work ??

I did not say that at all. What part of my post implies that the code
doesn't work? I said that the first line of it does nothing at all,
and the messing about with \$#out is pointless.

> It works great on a single
> array. Do you have a better code, if so, what is it ?

Once again, I point you to the other responses in the FAQ that you
apparently saw to get this code:
perldoc -q duplicate
(Or did you never see that FAQ, and are instead just copy/pasting some
other code you found lying around somewhere?)
Once again, why are you ignoring what I've already told you to do,
preferring instead to believe that I'm just not bothering to help?

> Also, can you
> please answer the question about how to get the distinct count of a
> multidim column with an actual example

I *did*! Why are you ignoring my entire response?! I told you
precisely how to change your example to use a list of the first
columns, rather than a single array. The fact that you ignored that
advice is your problem, not mine.

> Appreciate your response.

Really doesn't appear that way.

Paul Lalli

Paul Lalli, Jul 25, 2006
5. ### Jack (Guest)

Paul Lalli wrote:
> Jack wrote:
>
> > Just ignore the @\$ (this represents a variable)

>
> There was no @\$ in your original snippet, so ignoring it is a no-op.
> There was, however, @\$columnarray, which is a perfectly valid array. I
> have no idea why you're saying to get rid of it now.
>
> > - assume the code is this:
> > sort @columnarray;

>
> Once again, THIS LINE DOES NOTHING. You still have not bothered to
> turn warnings on? Why? You are asking for help, help is being given
> to you, and you're ignoring it. That's really very annoying.
>
> > @out = grep(\$_ ne \$prev && (\$prev = \$_, 1), @columnarray);
> > if (\$#out == -1) { \$#out = 0; }
> > print \$out;
> >
> > Are you saying the above doesnt work ??

>
> I did not say that at all. What part of my post implies that the code
> doesn't work? I said that the first line of it does nothing at all,
> and the messing about with \$#out is pointless.
>
> > It works great on a single
> > array. Do you have a better code, if so, what is it ?

>
> Once again, I point you to the other responses in the FAQ that you
> apparently saw to get this code:
> perldoc -q duplicate
> (Or did you never see that FAQ, and are instead just copy/pasting some
> other code you found lying around somewhere?)
> Once again, why are you ignoring what I've already told you to do,
> preferring instead to believe that I'm just not bothering to help?
>
> > Also, can you
> > please answer the question about how to get the distinct count of a
> > multidim column with an actual example

>
> I *did*! Why are you ignoring my entire response?! I told you
> precisely how to change your example to use a list of the first
> columns, rather than a single array. The fact that you ignored that
> advice is your problem, not mine.
>
> > Appreciate your response.

>
> Really doesn't appear that way.
>
> Paul Lalli

Forgive me if I am limited to some degree. I am just asking if someone
can provide some sample code that works takes \$multidimarray[1][0],
\$multidimarray[2][0], (a column) and produces a distinct count...

I dont know how to take your suggestion of
map { \$_->[0] } @columnarray
and convert that into a solution for that counts the distinct entires
for the first column in a multidimensional array ..

Would you consider elaborating, or perhaps someone who is willing to
help/share.

Thank you,
Jack

Jack, Jul 25, 2006
6. ### Guest

"Jack" <> wrote:
> Hi I have data in a multidim array and DONT want to create another
> array representing just 1 column from this multidim array.. I want to
> determine the number of uniques, I did this easily with just a regular
> array (code below),

I don't know if the code below actually does work, but I will assume it
does.

> does anyone know how to do this over just 1 column
> of a multidim array (in other words, number of uniques across 1 column
> of the multi dim defined as: multidim[0][0],multidim[1][0],
> multidim[2][0].... etc)

my \$col_number=0; # or whatever column you want
my \$columnarray=[map \$_->[\$col_number], @multidim];

Now proceed as before with \$columnarray.

Xho


, Jul 25, 2006
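
If only the count is needed, the column does not have to be copied into a separate array at all: a hash can be filled straight from the rows. This is a minimal sketch, assuming the rows live in an array called @multidim as in the reply above; the sample data and the helper name count_distinct_in_column are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: number of distinct values in one column
# of an array of array refs.
sub count_distinct_in_column {
    my ( $rows, $col_number ) = @_;
    my %seen;
    $seen{ $_->[$col_number] } = 1 for @$rows;
    return scalar keys %seen;
}

# Made-up data: column 0 holds 1, 1, 4, 7 and column 1 holds x, y, x, x.
my @multidim = ( [ 1, 'x' ], [ 1, 'y' ], [ 4, 'x' ], [ 7, 'x' ] );

my @distinctcounts;
push @distinctcounts, count_distinct_in_column( \@multidim, $_ ) for 0 .. 1;
print "@distinctcounts\n";
```

Passing the rows by reference avoids copying them into the sub's argument list.
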
7. ### Ted Zlatanov (Guest)

On 25 Jul 2006, wrote:

> Forgive me if I am limited to some degree. I am just asking if someone
> can provide some sample code that works takes \$multidimarray[1][0],
> \$multidimarray[2][0], (a column) and produces a distinct count...
>
> I dont know how to take your suggestion of
> map { \$_->[0] } @columnarray
> and convert that into a solution for that counts the distinct entires
> for the first column in a multidimensional array ..

I'll try to help you. Keep in mind that the advice Paul gave was
useful, I'm just restating it and elaborating. Don't feel bad about
missing things here and there, everyone has to start somewhere.

That map call will return the first (0) column of the array as a list.

Your original question was how to find unique elements in a column.

You posted:

> sort @\$columnarray;
> @out = grep(\$_ ne \$prev && (\$prev = \$_, 1), @\$columnarray);
> if (\$#out == -1) { \$#out = 0; }
> \$out = \$#out +1; # makes \$#out of 0 = 1 so it gets counted !
> push @distinctcounts, \$out;

The first line does nothing at all. Paul mentioned that too. Use
warnings and strict mode, if possible, to avoid such code. Sort
*returns* the sorted list, it doesn't modify in place.

In addition your 'uniques' code is not very good. It may work in some
cases, but really you should use a hash. Look at 'perldoc -q
duplicates' and 'perldoc perldata' to get started. Actually all of
the perldoc info is good.

Here's a (very simple) function to give you the unique items from a
list you pass:

sub uniques
{
my %unique = ();
\$unique{\$_}++ foreach @_;
return keys %unique;
}

Now use it like this:

my @columnarray = ( [1,2,3], [1,2,3], [4,5,6], [7,8,9], );

foreach my \$column (1 .. scalar @{\$columnarray[0]})
{
print "Unique elements in column \$column: ";
print join ', ',
uniques(map { \$_->[\$column-1] }
@columnarray
);
print "\n";
}

I formatted this to be easy to understand, and I tested it with the
data above under

use warnings;
use strict;

and it worked correctly. Please learn from the code posted above - it
shows many useful techniques.

Ted

Ted Zlatanov, Jul 25, 2006
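
The loop above prints the unique values, while the original question asked for a count per column. A possible adaptation is sketched here, reusing the uniques function and the sample data from this post; the @distinctcounts array comes from the question, and the zero-based loop is a stylistic assumption.

```perl
use strict;
use warnings;

# Same behaviour as the uniques function in the post above.
sub uniques {
    my %unique;
    $unique{$_}++ foreach @_;
    return keys %unique;
}

my @columnarray = ( [ 1, 2, 3 ], [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] );

my @distinctcounts;
foreach my $column ( 0 .. $#{ $columnarray[0] } ) {
    # Assigning to an array keeps the call in list context.
    my @values = uniques( map { $_->[$column] } @columnarray );
    push @distinctcounts, scalar @values;

    # keys returns hash keys in no particular order, so sort before printing.
    print "column $column: ", join( ', ', sort { $a <=> $b } @values ), "\n";
}
print "counts: @distinctcounts\n";
```
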
8. ### Paul Lalli (Guest)

Jack wrote:
> Paul Lalli wrote:
> > > @out = grep(\$_ ne \$prev && (\$prev = \$_, 1), @columnarray);

> > > Also, can you
> > > please answer the question about how to get the distinct count of a
> > > multidim column with an actual example

> >
> > I *did*! Why are you ignoring my entire response?! I told you
> > precisely how to change your example to use a list of the first
> > columns, rather than a single array. The fact that you ignored that
> > advice is your problem, not mine.
> >

> Forgive me if I am limited to some degree.

Being new to Perl is not something that requires forgiveness. Being
unwilling to put forth effort of your own, and only accepting solutions
that are spoonfed to you, is not worthy of forgiveness.

> I am just asking if someone can provide some sample code

I know exactly what you're asking. I have answered it 3 times now.
The answer is "No, I will not write code for you. I will, however,
give you all the information you need to do it yourself." If that's
not good enough for you, I strongly suggest you hire a consultant.

> that works takes \$multidimarray[1][0],
> \$multidimarray[2][0], (a column) and produces a distinct count...
>
> I dont know how to take your suggestion of
> map { \$_->[0] } @columnarray
> and convert that into a solution for that counts the distinct entires
> for the first column in a multidimensional array ..

I told you to take that expression, and operate on that, rather than on
@columnarray itself. What part of that is confusing to you?

Take that expression right there, and put that where you currently have
'@columnarray' in the first quoted line of this message.

> Would you consider elaborating, or perhaps someone who is willing to
> help/share.

Implying that I am *not* willing to help or share? You have a very
bizarre definition of "help".

*PLONK*

Paul Lalli

Paul Lalli, Jul 25, 2006
9. ### Mumia W. (Guest)

On 07/25/2006 01:54 PM, Jack wrote:
> Paul Lalli wrote:
>> [ snipped ]

> [...]
> I dont know how to take your suggestion of
> map { \$_->[0] } @columnarray
> and convert that into a solution for that counts the distinct entires
> for the first column in a multidimensional array ..
>
> Would you consider elaborating, or perhaps someone who is willing to
> help/share.
>
> Thank you,
> Jack
>

Paul Lalli gave you half of the answer. You're supposed to
figure out the other half. The other half is storing the data
in a hash where the keys are the column data returned from the
map, and the values are incremented once for each entry in the
column.

Hashes have a "magical" quality that makes their keys unique.
Using a hash, you can count the number of unique items in an
array, because each key in a hash appears only once.

1: use Data::Dumper;
2: my @temps = (30, 38, 26, 38, 39);
3: my %hash;
4: for my \$tp (@temps) { \$hash{\$tp} += 1 }
5: print Dumper(\%hash);

Line 4 increments a hash value each time it's found[0] in the
array. Notice that 38 only appears once in the hash, despite
the fact that it appears twice in @temps.

:-O UNTESTED CODE :-O

--
[0] Simplified language. Untrue.

Mumia W., Jul 25, 2006
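
The "other half" described above, applied to one column of a two-dimensional array instead of a flat list, might look like the following sketch. The @rows data (the temperatures from the example paired with a made-up day label) is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical rows: temperature in column 0, a label in column 1.
my @rows = (
    [ 30, 'Mon' ], [ 38, 'Tue' ], [ 26, 'Wed' ], [ 38, 'Thu' ], [ 39, 'Fri' ],
);

# Count how often each column-0 value occurs.
my %hash;
$hash{$_}++ for map { $_->[0] } @rows;

# keys in scalar context gives the number of distinct keys.
my $distinct = keys %hash;
print "$distinct distinct temperatures\n";
print "$_ seen $hash{$_} time(s)\n" for sort { $a <=> $b } keys %hash;
```
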
10. ### Jack (Guest)

Ted Zlatanov wrote:
> On 25 Jul 2006, wrote:
>
> > Forgive me if I am limited to some degree. I am just asking if someone
> > can provide some sample code that works takes \$multidimarray[1][0],
> > \$multidimarray[2][0], (a column) and produces a distinct count...
> >
> > I dont know how to take your suggestion of
> > map { \$_->[0] } @columnarray
> > and convert that into a solution for that counts the distinct entires
> > for the first column in a multidimensional array ..

>
> I'll try to help you. Keep in mind that the advice Paul gave was
> useful, I'm just restating it and elaborating. Don't feel bad about
> missing things here and there, everyone has to start somewhere.
>
> That map call will return the first (0) column of the array as a list.
>
> Your original question was how to find unique elements in a column.
>
> You posted:
>
> > sort @\$columnarray;
> > @out = grep(\$_ ne \$prev && (\$prev = \$_, 1), @\$columnarray);
> > if (\$#out == -1) { \$#out = 0; }
> > \$out = \$#out +1; # makes \$#out of 0 = 1 so it gets counted !
> > push @distinctcounts, \$out;

>
> The first line does nothing at all. Paul mentioned that too. Use
> warnings and strict mode, if possible, to avoid such code. Sort
> *returns* the sorted list, it doesn't modify in place.
>
> In addition your 'uniques' code is not very good. It may work in some
> cases, but really you should use a hash. Look at 'perldoc -q
> duplicates' and 'perldoc perldata' to get started. Actually all of
> the perldoc info is good
>
> Here's a (very simple) function to give you the unique items from a
> list you pass:
>
> sub uniques
> {
> my %unique = ();
> \$unique{\$_}++ foreach @_;
> return keys %unique;
> }
>
> Now use it like this:
>
> my @columnarray = ( [1,2,3], [1,2,3], [4,5,6], [7,8,9], );
>
> foreach my \$column (1 .. scalar @{\$columnarray[0]})
> {
> print "Unique elements in column \$column: ";
> print join ', ',
> uniques(map { \$_->[\$column-1] }
> @columnarray
> );
> print "\n";
> }
>
> I formatted this to be easy to understand, and I tested it with the
> data above under
>
> use warnings;
> use strict;
>
> and it worked correctly. Please learn from the code posted above - it
> shows many useful techniques.
>
> Ted

Ted, great job that works killer... can you tell me, I want to exclude
from the counting any null values, I tried adding this without
success..any reply would be appreciated..thanks, Jack

sub uniques
{
my %unique = ();
if (@_ != /^\z/) { \$unique{\$_}++ foreach @_ } ;
return keys %unique;
}

Jack, Jul 25, 2006
11. ### Guest

Jack <> wrote:

> sort @columnarray;
> @out = grep(\$_ ne \$prev && (\$prev = \$_, 1), @columnarray);
> if (\$#out == -1) { \$#out = 0; }
> print \$out;

> Are you saying the above doesnt work ?? It works great on a single
> array. Do you have a better code, if so, what is it ?

It doesn't work. Even if the last line is a typo for

print \$#out;

my @columnarray = qw(a b c d b e c);
Results in:

Useless use of sort in void context at q1.pl line 11.
Use of uninitialized value in string ne at q1.pl line 12.
6

> Also, can you
> please answer the question about how to get the distinct count of a
> multidim column with an actual example. Appreciate your response.

People on this group do not take kindly to being told to answer questions or
write code.

Axel

, Jul 25, 2006
12. ### DJ Stunks (Guest)

Jack wrote:
> Ted, great job that works killer... can you tell me, I want to exclude
> from the counting any null values, I tried adding this without
> success..any reply would be appreciated..thanks, Jack
>
> sub uniques
> {
> my %unique = ();
> if (@_ != /^\z/) { \$unique{\$_}++ foreach @_ } ;

1) the != operator only operates on scalars; thus
2) the array @_ is forced into scalar context; and
3) an array evaluated in scalar context yields the count of
the number of elements in the array; but
4) with != you mistyped the negated binding operator (!~); therefore
5) the bare match /^\z/ attempts to match against whatever is
currently contained in \$_; and
6) if the return value for this test (1 or 0) is not equal to the
number of elements in @_ (likely > 1); then
7) the block will be evaluated

> return keys %unique;
> }

if only there were some way to test the value
foreach element of the array...

-jp

DJ Stunks, Jul 26, 2006
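
The hint above points at testing each element individually rather than the argument list as a whole. One possible reading is sketched here with grep; the sample values are made up, and treating undef the same as the empty string is an assumption about what "null" was meant to cover.

```perl
use strict;
use warnings;

sub uniques {
    my %unique;

    # Test every element on its own; skip undef and the empty string.
    $unique{$_}++ foreach grep { defined($_) && length($_) } @_;
    return keys %unique;
}

my @values = uniques( 'a', '', 'b', undef, 'a', '' );
@values = sort @values;    # hash key order is not predictable
print scalar(@values), " unique non-empty values: @values\n";
```
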
13. ### Tad McClellan (Guest)

Paul Lalli <> wrote:
> Jack wrote:

>> Would you consider elaborating, or perhaps someone who is willing to
>> help/share.

>
> Implying that I am *not* willing to help or share? You have a very
> bizarre definition of "help".
>
> *PLONK*

You are coming late to the party:

Message-ID: <>

:-(

--
Tad McClellan SGML consulting
Perl programming
Fort Worth, Texas

Tad McClellan, Jul 26, 2006
14. ### Tad McClellan (Guest)

Paul Lalli <> wrote:
> Jack wrote:

>> sort @\$columnarray;

>
> This does nothing at all.

It is useful only during the winter.

If you keep your tower under your desk, it will help to keep your feet warm!

--
Tad McClellan SGML consulting
Perl programming
Fort Worth, Texas

Tad McClellan, Jul 26, 2006
15. ### Tad McClellan (Guest)

Jack <> wrote:

> I want to exclude
> from the counting any null values,

> sub uniques
> {
> my %unique = ();
> if (@_ != /^\z/) { \$unique{\$_}++ foreach @_ } ;
> return keys %unique;

return grep length, keys %unique;

or, since there can only be one anyway:

delete \$unique{''};
return keys %unique;

> }
>

--
Tad McClellan SGML consulting
Perl programming
Fort Worth, Texas

Tad McClellan, Jul 26, 2006
16. ### Ted Zlatanov (Guest)

On 25 Jul 2006, wrote:

> Ted, great job that works killer... can you tell me, I want to exclude
> from the counting any null values, I tried adding this without
> success..any reply would be appreciated..thanks, Jack
>
> sub uniques
> {
> my %unique = ();
> if (@_ != /^\z/) { \$unique{\$_}++ foreach @_ } ;
> return keys %unique;
> }

Tad's solution is great, but I just wanted to clarify something. When
you say "null" that actually doesn't mean anything in Perl. Perl
calls undefined values "undef" - this is different from NULL in
C/C++. There are rules about undef and how it's converted to a string
or numeric context, but what's important is to realize that what
you're filtering above is the empty string "", and that's why Tad used
length() in his test.

Interestingly, length(undef) is also 0, which makes Tad's length()
test eliminate undef values as well. For extra credit and fun, figure
out why length(undef) is 0 - you'll learn about the rules I mentioned
above, and you'll be a better Perl programmer for it.

Also, it's good that you posted what you tried even though it didn't
work. People on this newsgroup are very, very helpful when they see
you've tried something on your own. They generally dislike open-ended
questions with vague requirements. This is why you got a good response
from Tad. There are some posting guidelines (look them up on Google
News) posted here regularly, which explain this and more.

Good luck

Ted

Ted Zlatanov, Jul 26, 2006
17. ### Ben Morrow (Guest)

Quoth Ted Zlatanov <>:

> Interestingly, length(undef) is also 0, which makes Tad's length()
> test eliminate undef values as well. For extra credit and fun, figure
> out why length(undef) is 0

... with a warning...

Ben

--
It will be seen that the Erewhonians are a meek and long-suffering people,
easily led by the nose, and quick to offer up common sense at the shrine of
logic, when a philosopher convinces them that their institutions are not based
on the strictest morality. [Samuel Butler, paraphrased]

Ben Morrow, Jul 26, 2006
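
The remark about warnings can be made concrete. With warnings enabled, using an undefined value as a string, for instance as a hash key, produces a "Use of uninitialized value" warning, and the key becomes the empty string. The sketch below captures that warning with a __WARN__ handler, then filters with defined() before length(). That sidesteps the question of what length(undef) returns: 0 with a warning on the perls current at the time of this thread, undef from Perl 5.12 on. All variable names are illustrative.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my %unique;
my $missing;            # deliberately left undefined
$unique{$missing}++;    # undef is stringified to '' when used as a hash key

my @keys = keys %unique;
print "number of keys: ", scalar(@keys), "\n";
print "warnings captured: ", scalar(@warnings), "\n";

# Checking defined() first avoids depending on length(undef).
my @clean = grep { defined($_) && length($_) } ( 'a', undef, '', 'b' );
print "kept: @clean\n";
```
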
18. ### Tad McClellan (Guest)

Tad McClellan <> wrote:
> Jack <> wrote:
>
>> I want to exclude
>> from the counting any null values,

>
>> sub uniques
>> {
>> my %unique = ();
>> if (@_ != /^\z/) { \$unique{\$_}++ foreach @_ } ;

Or, if you meant values rather than keys:

\$unique{\$_}++ foreach grep length, @_;

>> return keys %unique;

>
>
> return grep length, keys %unique;
>
> or, since there can only be one anyway:
>
> delete \$unique{''};
> return keys %unique;
>
>
>> }

--
Tad McClellan SGML consulting
Perl programming
Fort Worth, Texas

Tad McClellan, Jul 27, 2006
19. ### Ted Zlatanov (Guest)

On 26 Jul 2006, wrote:

> Quoth Ted Zlatanov <>:
>
>> Interestingly, length(undef) is also 0, which makes Tad's length()
>> test eliminate undef values as well. For extra credit and fun, figure
>> out why length(undef) is 0

>
> ... with a warning...

Hey, I'm training the OP to compete in fwp contests

Ted

Ted Zlatanov, Jul 27, 2006
20. ### Jack (Guest)

Ted Zlatanov wrote:
> On 25 Jul 2006, wrote:
>
> > Forgive me if I am limited to some degree. I am just asking if someone
> > can provide some sample code that works takes \$multidimarray[1][0],
> > \$multidimarray[2][0], (a column) and produces a distinct count...
> >
> > I dont know how to take your suggestion of
> > map { \$_->[0] } @columnarray
> > and convert that into a solution for that counts the distinct entires
> > for the first column in a multidimensional array ..

>
> I'll try to help you. Keep in mind that the advice Paul gave was
> useful, I'm just restating it and elaborating. Don't feel bad about
> missing things here and there, everyone has to start somewhere.
>
> That map call will return the first (0) column of the array as a list.
>
> Your original question was how to find unique elements in a column.
>
> You posted:
>
> > sort @\$columnarray;
> > @out = grep(\$_ ne \$prev && (\$prev = \$_, 1), @\$columnarray);
> > if (\$#out == -1) { \$#out = 0; }
> > \$out = \$#out +1; # makes \$#out of 0 = 1 so it gets counted !
> > push @distinctcounts, \$out;

>
> The first line does nothing at all. Paul mentioned that too. Use
> warnings and strict mode, if possible, to avoid such code. Sort
> *returns* the sorted list, it doesn't modify in place.
>
> In addition your 'uniques' code is not very good. It may work in some
> cases, but really you should use a hash. Look at 'perldoc -q
> duplicates' and 'perldoc perldata' to get started. Actually all of
> the perldoc info is good
>
> Here's a (very simple) function to give you the unique items from a
> list you pass:
>
> sub uniques
> {
> my %unique = ();
> \$unique{\$_}++ foreach @_;
> return keys %unique;
> }
>
> Now use it like this:
>
> my @columnarray = ( [1,2,3], [1,2,3], [4,5,6], [7,8,9], );
>
> foreach my \$column (1 .. scalar @{\$columnarray[0]})
> {
> print "Unique elements in column \$column: ";
> print join ', ',
> uniques(map { \$_->[\$column-1] }
> @columnarray
> );
> print "\n";
> }
>
> I formatted this to be easy to understand, and I tested it with the
> data above under
>
> use warnings;
> use strict;
>
> and it worked correctly. Please learn from the code posted above - it
> shows many useful techniques.
>
> Ted

Ted - this is excellent stuff - how exactly can I capture an example of
2 elements representing a duplicate in a variable from this code ???

thanks again,

Jack

Jack, Aug 8, 2006
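
The thread ends without a reply to this last question. One way it could be approached, sketched under the assumption that "capturing a duplicate" means recording which rows share a value in a given column: keep a list of row indices per value and pick out the values that have more than one. The names %rows_for and @duplicates are hypothetical; the data is the sample from the quoted post.

```perl
use strict;
use warnings;

my @columnarray = ( [ 1, 2, 3 ], [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] );

# value => list of row indices where that value appears in column 0
my %rows_for;
for my $i ( 0 .. $#columnarray ) {
    push @{ $rows_for{ $columnarray[$i][0] } }, $i;
}

# Values that occur in more than one row.
my @duplicates = grep { @{ $rows_for{$_} } > 1 } sort keys %rows_for;

for my $value (@duplicates) {
    print "value $value appears in rows @{ $rows_for{$value} }\n";
}
```

The same loop works for any other column by changing the column index.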