Finding the max number in a list


Ross Bamford

Of course, if we want to start off in that direction, we'd soon abandon
the plan to use a sort, per se, and replace it with a heap data
structure. Adding data to a heap is O(log N), and finding the maximum
value is still O(1). A binary dense heap may be easily maintained in an
array with no additional storage overhead. If finding the maximum value
is the most important or performance-critical operation on the
structure, it's definitely worth it to get O(log N) rather than O(N)
performance.

I'd agree. I was referring to exactly that with 'a hit on the add'. But
I thought (and forgive me if my memory has failed) that adding to a heap
runs in O(N log N)?
Of course, this is a specific instance of the more general concept.
Exactly how efficient the code can be made depends on the information
available about the numbers. For example, if I know that numbers will
only be added and never removed, then I can give you the maximum in O(1)
time with O(1) additions and constant additional space requirement as
well!

:)
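The heap alternative discussed above can be sketched with java.util.PriorityQueue, which is a binary heap kept in an array; with a reversed comparator it behaves as a max-heap. The class name is just for illustration:

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class MaxHeapDemo {
    public static void main(String[] args) {
        // A binary heap backed by an array; the reversed comparator
        // turns the default min-heap into a max-heap.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>(Collections.reverseOrder());

        for (int n : new int[] {3, 41, 7, 12}) {
            heap.offer(n);               // O(log N) per insertion
        }
        System.out.println(heap.peek()); // O(1) access to the maximum: 41
    }
}
```

offer() and peek() here match the O(log N) add / O(1) find-max figures quoted above.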
 

Chris Uppal

John said:
I have even seen a sort
that scales linearly for uniformly distributed data, but the coefficient
of N in that case would probably be a couple of orders of magnitude
greater than that for the simple comparison loop.

Got a link or reference for that ?

-- chris
 

Chris Uppal

Eric said:
Overkill.

And woefully inefficient on modern hardware, too.
To take advantage of SMP systems, CMT processors, even
hyperthreaded(tm) machines, you need some parallelism.
Here's the approach I'd recommend:

- If there are N numbers in the array, launch N/2
threads [...]

Very nice, but over-complicated. To get an O(1) algorithm, simply select one
of the elements of the list at random. You don't know about any internal
ordering of the array (by definition of the problem, which otherwise would be
trivial), so you can, without loss of generality, assume that the first element
is "random" enough. The algorithm, then, is: select the first element; that is
the max().

OK, the algorithm is heuristic rather than fully deterministic, and (like most
such) is not /guaranteed/ to produce the maximum element. Still, in practice,
it should be adequate for many applications. And, of course, the speed of the
algorithm will tend to mitigate any small losses in ultimate accuracy.

Note, also, that there are large classes of cases for which the algorithm
/does/ produce a guaranteed correct result.

Interestingly, the algorithm has the property that it is equally capable of
computing the min() with the /same code/. This code-sharing makes it
especially appealing for use in resource-constrained devices.

-- chris
 

Sharp

This won't be popular but it's easy:

int largest = Integer.MIN_VALUE; // not 0: the array might be all negative
for (int i : myInts) if (i > largest) largest = i;

It's not especially efficient, but then it's not especially inefficient
either.

Cheers,
Ross

I have never seen the for loop written like that before.
Could you please explain what
for (int i : myInts)

means?

Cheers
Sharp
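For context, for (int i : myInts) is the enhanced for statement ("for-each") added in Java 5; it visits each element of an array or Iterable in order. A minimal sketch of what it does for an array (class and method names are illustrative only):

```java
public class ForEachDemo {
    // Enhanced for statement (Java 5): iterates over the array's elements.
    static int maxEnhanced(int[] myInts) {
        int largest = Integer.MIN_VALUE;
        for (int i : myInts) {
            if (i > largest) largest = i;
        }
        return largest;
    }

    // For an array, the loop above is equivalent to this index-based form.
    static int maxIndexed(int[] myInts) {
        int largest = Integer.MIN_VALUE;
        for (int idx = 0; idx < myInts.length; idx++) {
            int i = myInts[idx];  // same variable the for-each declares
            if (i > largest) largest = i;
        }
        return largest;
    }

    public static void main(String[] args) {
        int[] myInts = {5, -2, 9, 3};
        System.out.println(maxEnhanced(myInts) + " " + maxIndexed(myInts)); // 9 9
    }
}
```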
 

Patricia Shanahan

Chris said:
John C. Bollinger wrote:
I have even seen a sort that scales linearly for uniformly
distributed data [...]

Got a link or reference for that ?

-- chris

A pairwise-comparison-based sort (as in sorting Comparable
objects) is Omega(n lg(n)). See
http://planetmath.org/encyclopedia/LowerBoundForSorting.html
for the reasoning.

However, in this case we are sorting ints, which are known
to be numbers in a bounded range, so a radix sort is
possible and is O(n). See
http://www.nist.gov/dads/HTML/topdownRadixSort.html. For
example, if one treated each bit as a character, it would
make 32n decisions.
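A minimal sketch of such a radix sort for ints, processing one byte per pass (four passes of a stable counting sort), with the sign bit flipped on the last pass so negative values order correctly; the class and method names are illustrative:

```java
import java.util.Arrays;

public class RadixSortDemo {
    // LSD radix sort on 32-bit ints: O(n) time, O(n) extra space.
    static void radixSort(int[] a) {
        int[] aux = new int[a.length];
        for (int shift = 0; shift < 32; shift += 8) {
            int[] count = new int[257];
            for (int x : a) {
                int b = (x >>> shift) & 0xFF;
                if (shift == 24) b ^= 0x80; // flip sign bit: negatives first
                count[b + 1]++;
            }
            for (int i = 0; i < 256; i++) count[i + 1] += count[i];
            for (int x : a) {               // stable redistribution pass
                int b = (x >>> shift) & 0xFF;
                if (shift == 24) b ^= 0x80;
                aux[count[b]++] = x;
            }
            System.arraycopy(aux, 0, a, 0, a.length);
        }
    }

    public static void main(String[] args) {
        int[] data = {170, -45, 75, -90, 802, 24, 2, 66};
        radixSort(data);
        System.out.println(Arrays.toString(data));
        // [-90, -45, 2, 24, 66, 75, 170, 802]
    }
}
```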

For those of us with longer memories, punch card sorters
used a bottom-up radix sort. Always remember to number the
lines in your program, so that you can sort it if you drop
it on the floor.

Patricia
 

Patricia Shanahan

Chris said:
Eric Sosman wrote:
- If there are N numbers in the array, launch N/2 threads [...]

Very nice, but over-complicated. To get an O(1) algorithm, simply
select one of the elements of the list at random. You don't know
about any internal ordering of the array (by definition of the
problem, which otherwise would be trivial), so you can, without
loss of generality, assume that the first element is "random"
enough. The algorithm, then, is: select the first element; that
is the max(). [...]

-- chris

This is yet another demonstration of the superiority of the
simple linear search. As a generalization, instead of
examining all N elements, look only at the first k, for some
k<=N.

The probability of finding the maximum, assuming uniformly
distributed data, is k/N. If a 1/N probability is good
enough, it reduces to the algorithm Chris describes. With k
equal to N, it is certain to find the maximum. With
intermediate values of k, there is a trade-off between the
probability of finding the actual maximum and the time spent
examining the list.

Patricia
 

Glen Pepicelli

You can bite the bullet and sort it -- then keep it sorted. That way
you can get the max int fast.

Alternatively, you must look at each element every time.

So it depends on when you want to spend the time to do what's needed.
 

Glen Pepicelli

Actually a third choice which is good is to just keep track of the max
int by itself. Every time someone inserts something into your array
check to see if it is bigger than the current max int.
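That third choice can be sketched as a small wrapper that does O(1) bookkeeping on each insert, so reading the maximum later is also O(1) (valid as long as elements are only added, never removed); the class name is just for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class MaxTrackingList {
    private final List<Integer> items = new ArrayList<>();
    private int max = Integer.MIN_VALUE;

    public void add(int value) {
        items.add(value);
        if (value > max) max = value; // O(1) bookkeeping per insert
    }

    public int max() {
        if (items.isEmpty()) throw new IllegalStateException("empty list");
        return max;
    }

    public static void main(String[] args) {
        MaxTrackingList list = new MaxTrackingList();
        list.add(3); list.add(42); list.add(-7);
        System.out.println(list.max()); // 42
    }
}
```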
 

Chris Uppal

Patricia said:
However, in this case we are sorting ints, which are known
to be numbers in a bounded range, so a radix sort is
possible and is O(n).

Ah, yes. I should have thought of that myself (too used to thinking of
int-s as a reasonable approximation to integers -- silly).

Come to think of it, a bucket sort (if that's the right name) could do it
too, although the space overhead (2**32 elements-worth of int[] arrays)
would make it a little impractical. (Unless the range of ints were known
to be more severely limited than just the 32-bit range.)

But neither algorithm sounds as if it's the one that John was talking about --
unless the "uniform distribution" precondition he mentioned was to allow most
of the steps of a radix sort to do some useful partitioning.

-- chris
 

googmeister

But neither algorithm sounds as if it's the one that John was talking
about -- unless the "uniform distribution" precondition he mentioned was
to allow most of the steps of a radix sort to do some useful
partitioning.

Perhaps this was his intent - one level of MSD radix sorting,
then cutoff to insertion sort: partition the interval, say [0, 1],
into n equally spaced subintervals. Put the n elements in the
appropriate buckets. Insertion sort the elements in bucket i
and concatenate all of the results. Let n_i be the number of
elements in bucket i. The expected time to insertion sort all
of the buckets is O(n), since E[sum_i (n_i)^2] <= 2n.
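A sketch of that scheme for doubles assumed to lie in [0, 1): n buckets, insertion sort within each bucket, then concatenation. Class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BucketSortDemo {
    // Bucket sort: expected O(n) time for uniformly distributed input.
    static void bucketSort(double[] a) {
        int n = a.length;
        List<List<Double>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) buckets.add(new ArrayList<>());

        for (double x : a) {
            buckets.get((int) (x * n)).add(x); // assumes 0 <= x < 1
        }

        int k = 0;
        for (List<Double> bucket : buckets) {
            // Insertion sort of one (expected tiny) bucket.
            for (int i = 1; i < bucket.size(); i++) {
                double key = bucket.get(i);
                int j = i - 1;
                while (j >= 0 && bucket.get(j) > key) {
                    bucket.set(j + 1, bucket.get(j));
                    j--;
                }
                bucket.set(j + 1, key);
            }
            for (double x : bucket) a[k++] = x; // concatenate results
        }
    }

    public static void main(String[] args) {
        double[] data = {0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12};
        bucketSort(data);
        System.out.println(Arrays.toString(data));
    }
}
```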
 

tzvika.barenholz

It would be extremely inefficient to sort an array at a cost of
O(n log n) just for the max value.
 

steve

It would be extremely inefficient to sort an array at a cost of
O(n log n) just for the max value.


All this bullshit in the thread, with people trying to show how clever
they can be.

But no one has asked the guy what he is doing with the list.
If he is going to refer to it just once, then the method he needs is
going to be completely different than if he is going to refer to it
10,000 times.

Only when you know what he is doing with the list & how he is going to
use it can a clear decision be made.

Sorting may actually be the best method, or perhaps a single scan
through the list, but who knows!!
 

Chris Smith

steve said:
All this bullshit in the thread, with people trying to show how clever
they can be.

But no one has asked the guy what he is doing with the list.

Threads don't belong to the OP. The discussion on this thread is
interesting independently of the original question. Since the original
question was pretty obviously homework, it's probably already due and
the OP doesn't care any more.
only when you know what he is doing with the list & how he is going to use
it, can a clear decision be made.

Clearly, though, it makes sense to be aware of the alternatives.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 

Sharp

Thanks for your consideration.
Finding the maximum integer from the array is only done once.
Threads don't belong to the OP. The discussion on this thread is
interesting independently of the original question. Since the original
question was pretty obviously homework, it's probably already due and
the OP doesn't care any more.

It's not homework.

-Sharp
 

Brooks Hagenow

steve said:
All this bullshit in the thread, with people trying to show how clever
they can be.

But no one has asked the guy what he is doing with the list.
If he is going to refer to it just once, then the method he needs is
going to be completely different than if he is going to refer to it
10,000 times.

Only when you know what he is doing with the list & how he is going to
use it can a clear decision be made.

Sorting may actually be the best method, or perhaps a single scan
through the list, but who knows!!

I agree. The first question I had is how long the list is. If it is 20
integers, then walk through it comparing each number. If we are talking
tens of thousands, then start considering the sorting, multi-threading,
fancy methods.

If you start using parallel processing to get the max of 10 numbers, I
would call you crazy.
 

Chris Smith

Brooks Hagenow said:
I agree. The first question I had is how long the list is. If it is 20
integers, then walk through it comparing each number. If we are talking
tens of thousands, then start considering the sorting, multi-threading,
fancy methods.

If it's 20 integers, then from a performance standpoint it really
doesn't matter what you do. There are plenty of situations, though,
where sequentially scanning a large data structure is the right choice
*for* performance reasons. All other solutions add time to operations
like adding and deleting to/from the list, and if these operations are
*far* more common than getting the maximum, it may be quite justified to
avoid that cost.
If you start using parallel processing to get the max of 10 numbers, I
would call you crazy.

I think (hope, at least) that Eric's reply was a joke. Although
starting n/2 threads per Eric's suggestion is merely silly for a 10-
number list, it is utterly fatal for a list with 20,000 elements. There
is no case in which it's a good idea.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 

Patricia Shanahan

Brooks said:
I agree. The first question I had is how long is the
list. If it is 20 integers then walk through it
comparing each number. If we are talking tens of
thousands then start considering the sorting,
multi-threading, fancy methods.

If you start using parallel processing to get the max of
10 numbers, I would call you crazy.

I think the parallel processing proposal, at least in the
extreme form described, was a joke. Even if I had a VERY
long array, and a multiprocessor, I would chop the linear
scan into one chunk per hardware thread (e.g. 8 chunks if
I were using four dual-threaded processors), spawn a thread
for each chunk, and when they finish, linear-scan their
results.
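That chunked scan might look something like this (a sketch using plain threads; the class and method names are illustrative):

```java
public class ParallelMax {
    // One thread per chunk finds a local max; the main thread then
    // linear-scans the per-chunk results.
    static int parallelMax(int[] a, int nThreads) throws InterruptedException {
        int[] partial = new int[nThreads];
        Thread[] threads = new Thread[nThreads];
        int chunk = (a.length + nThreads - 1) / nThreads;

        for (int t = 0; t < nThreads; t++) {
            final int lo = t * chunk;
            final int hi = Math.min(a.length, lo + chunk);
            final int slot = t;
            partial[slot] = Integer.MIN_VALUE;
            threads[t] = new Thread(() -> {
                int max = Integer.MIN_VALUE;
                for (int i = lo; i < hi; i++) {
                    if (a[i] > max) max = a[i];
                }
                partial[slot] = max; // each thread writes only its own slot
            });
            threads[t].start();
        }

        int max = Integer.MIN_VALUE;
        for (int t = 0; t < nThreads; t++) {
            threads[t].join();               // join gives the needed visibility
            if (partial[t] > max) max = partial[t];
        }
        return max;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] data = {12, 99, -4, 57, 3, 81, 42, 7};
        System.out.println(parallelMax(data, 4)); // 99
    }
}
```

Worth doing only for very large arrays, as discussed above; for small n, thread start-up costs swamp the scan itself.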

Just when would sorting beat linear scan?

If anything, greater length increases the advantage for
linear scan. For short arrays, the lg(n) term in the O(n
lg(n)) complexity of Arrays.sort(int []) is very small, and
could be dominated by differences in how much bookkeeping
each method does. I don't expect anything based on quicksort
to beat linear scan even under those conditions.

When you get up to tens of thousands of elements, the log
base two of the length is in the 13 to 16 range, and
Arrays.sort could lose even if it did less, rather than
more, work per comparison than linear scan.

For very long arrays, cache misses become an issue and
linear scan has an additional advantage because it is, well,
linear. There will be at most one cache miss per line and
those misses are easy to predict.

Even a radix sort, which could have O(n) time for int data,
would have a much higher constant multiplier than linear
scan, and be slower in practice.

The only case I've been able to think of so far in which one
should not do linear scan is if operations such as min, max,
and scanning in size order occurred often enough that it was
worth maintaining a TreeSet.

In my opinion, finding the maximum element in an unordered
array or list is one of the rare and delightful cases in
which the simple, easy, direct algorithm is highly
efficient, so there is no messy trade-off between simplicity
and performance, no need to ask "How many elements?".

Patricia
 

Eric Sosman

Chris said:
I think (hope, at least) that Eric's reply was a joke. Although
starting n/2 threads per Eric's suggestion is merely silly for a 10-
number list, it is utterly fatal for a list with 20,000 elements. There
is no case in which it's a good idea.

Actually, it's n-1 threads. "Even more actually,"
because of the way I decided to handle odd n, it's
2**ceil(lg(n))-1 threads. Plus, of course, the thread
that does all this launching and reaping, plus whatever
additional threads the JVM decides to give me gratis ...

(Yes, it was intended as a joke; and no, I cannot
think of any case where it would be a good idea. I was
just trying to poke fun at the idea of dragging out the
heavy artillery like TreeSet and Arrays.sort() to solve
such a dead-simple problem, by concocting an even heavier
and even less appropriate artillery piece: a cannon to
commit canaricide.)

Knuth describes some parallelizations of sorting
methods in "The Art of Computer Programming, Volume III:
Sorting and Searching," section 5.3.4, but they don't
seem amenable to effective implementation in Java.
 

John C. Bollinger

Chris said:
I think (hope, at least) that Eric's reply was a joke. Although
starting n/2 threads per Eric's suggestion is merely silly for a 10-
number list, it is utterly fatal for a list with 20,000 elements. There
is no case in which it's a good idea.

I certainly assumed it was a joke. For that reason I chose not to point
out (there) that the sum of 1/2 + 1/4 + 1/8 + ... is 1, thus his
algorithm was still O(N), as opposed to O(log N) as he claimed. Even if
you inverted it so that you initially started just two threads, one for
each half of the array, and they each started two threads, etc., you
would still end up with a number of comparisons governed by that same
sum, and hence have no better than linear complexity.

As Patricia said early on (paraphrased), each element has to be examined
at least once, so linear asymptotic complexity is the best you can do.
That's for a 100% reliable algorithm, of course :).
 
