[Algorithm] Sum of Primes < 1000000

  • Thread starter Luc The Perverse

Simon

Jean-Francois Briere said:
The sum should be 37550402023 and not 37551402022
(You're adding 999999 in error.)
Your code should read:

// move forward to next prime
prime++;
while (prime < numbers.length && numbers[prime]) {
prime++;
}

Regards

Oops. Thanks for the correction. I think the "-1" was a relic from a version
where I tested numbers[prime+1] or something. Actually this loop is pretty weird.
It should be a do-loop instead.
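
For illustration, the equivalent do-loop form might look like this (just a
sketch, assuming numbers[] flags composites as in the snippet above):

// move forward to next prime; the do-loop always steps past the
// current index before testing, same as the while version above
do {
    prime++;
} while (prime < numbers.length && numbers[prime]);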

Cheers,
Simon
 

Patricia Shanahan

Simon said:
I doubt that such micro-optimisations are of much use here.

This, I think, is the big message from this thread. Simply implementing
a very well-known algorithm changed the performance from somewhere
between 1 minute and 10 minutes to a fraction of a second. Tweaking the
code got a further factor of about 3.

It is very, very unlikely that any amount of micro-optimization applied
to the original algorithm could have brought it anywhere near the
performance of a simple, direct sieve implementation.
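
For concreteness, a direct sieve of the kind meant here fits in a few lines.
This is an illustrative sketch, not the exact code posted earlier in the thread:

// Sieve of Eratosthenes: sum all primes below 1,000,000.
public class PrimeSum {
    public static void main(String[] args) {
        final int LIMIT = 1000000;
        boolean[] composite = new boolean[LIMIT];
        long sum = 0;
        for (int i = 2; i < LIMIT; i++) {
            if (!composite[i]) {
                sum += i;                          // i is prime
                for (long j = (long) i * i; j < LIMIT; j += i) {
                    composite[(int) j] = true;     // mark multiples of i
                }
            }
        }
        System.out.println(sum);                   // prints 37550402023
    }
}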

Patricia
 

Chris Uppal

Patricia said:
If you use a boolean[], those million elements occupy a million bytes of
memory, because, at least in the implementations I've examined, each
boolean in an array gets its own byte.

BitSet uses a long[], and packs 64 bits into each element, so a million
elements only take 125,000 bytes of memory.

That may make a significant difference if at least one cache in the
system has a size in the range of tens to hundreds of megabytes.

I think you must have meant that last word to be kilobytes?

-- chris
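
A small sketch of the two representations under discussion (exact sizes are
implementation-dependent, as Patricia notes):

import java.util.BitSet;

// One flag per number below 1,000,000:
// boolean[] typically uses one byte per element (~1,000,000 bytes);
// BitSet packs 64 flags into each long (~125,000 bytes).
public class FlagStorage {
    public static void main(String[] args) {
        boolean[] flags = new boolean[1000000];
        BitSet bits = new BitSet(1000000);

        flags[999999] = true;
        bits.set(999999);

        System.out.println(flags[999999]);    // true
        System.out.println(bits.get(999999)); // true
    }
}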
 

Lew

Patricia said:
This, I think, is the big message from this thread. Simply implementing
a very well-known algorithm changed the performance from somewhere
between 1 minute and 10 minutes to a fraction of a second. Tweaking the
code got a further factor of about 3.

It is very, very unlikely that any amount of micro-optimization applied
to the original algorithm could have brought it anywhere near the
performance of a simple, direct sieve implementation.

Some years back I tried to make this point to my project manager on a contract
I was working. The optimization there was of a database structure, where the
difference for one report was between O(n^2) and O(n). I was forbidden to make
the algorithmic optimization, then got in trouble because the report was too
slow once they loaded all the data. It took months and lawyers to straighten
that one out.

I don't know why so many project managers seem to ignore this kind of
reasoning. Most are not as egregious, but the problem remains. Not only do we
as engineers need to know about the big O, but we have a fiduciary
responsibility to communicate such reasoning to our customers (including
employers).

- Lew
 

Luc The Perverse

Lew said:
Some years back I tried to make this point to my project manager on a
contract I was working. The optimization there was of a database
structure, where the difference for one report was between O(n^2) and
O(n). I was forbidden to make the algorithmic optimization, then got in
trouble because the report was too slow once they loaded all the data. It
took months and lawyers to straighten that one out.

ROFL - that is hilarious!

There is a sublime sense of irony when stupid ideas blow up in leaders'
faces. It is usually clouded over by the problem itself, though, and/or by
them yelling at you.
I don't know why so many project managers seem to ignore this kind of
reasoning. Most are not as egregious, but the problem remains. Not only do
we as engineers need to know about the big O, but we have a fiduciary
responsibility to communicate such reasoning to our customers (including
employers).

I had a unique method for dealing with this - I would generate a test case
which would cause unreasonably bad results and "demonstrate" it. Of
course, I would have the entire test case suite available, and a proposed
fix with a sample demonstration and presentation. This may not be as
practical for optimizations which take a great deal of time, or in other
environments, but it worked for me.

It was my reasoning that if I could not come up with a test case which
caused undesirable results then I didn't need a workaround.
 

Philipp

Lew said:
Some years back I tried to make this point to my project manager on a
contract I was working. The optimization there was of a database
structure, where the difference for one report was between O(n^2) and
O(n). I was forbidden to make the algorithmic optimization, then got in
trouble because the report was too slow once they loaded all the data.
It took months and lawyers to straighten that one out.

I don't know why so many project managers seem to ignore this kind of
reasoning. Most are not as egregious, but the problem remains. Not only
do we as engineers need to know about the big O, but we have a fiduciary
responsibility to communicate such reasoning to our customers (including
employers).

Big O describes only how the system scales, not the absolute running time. If
you have a number of entries that will not vary by a factor of 10 in the
near future, it would be wise to check whether the O(n^2) algorithm isn't
actually much faster than the O(n) algorithm on those entries...
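
(A toy illustration of that constant-factor point, not taken from the thread:
for a handful of entries, an O(n) scan over an int[] can beat an O(1) hash
lookup once boxing and hashing overhead are counted.)

import java.util.HashSet;
import java.util.Set;

// Both methods answer "is key present?"; which is faster for a small,
// fixed number of entries is an empirical question.
public class SmallNDemo {
    static boolean linearContains(int[] a, int key) {
        for (int v : a) {                // O(n), but each step is very cheap
            if (v == key) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int[] entries = {3, 1, 4, 1, 5, 9, 2, 6};
        Set<Integer> set = new HashSet<Integer>();
        for (int v : entries) set.add(v);           // boxing + hashing overhead

        System.out.println(linearContains(entries, 9)); // true
        System.out.println(set.contains(9));            // true
    }
}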
 

Chris Uppal

Lew said:
The optimization there was of a database
structure, where the difference for one report was between O(n^2) and
O(n). I was forbidden to make the algorithmic optimization, then got in
trouble because the report was too slow once they loaded all the data. It
took months and lawyers to straighten that one out.

I think part of the problem with this sort of situation is that we don't have a
commonly understood way of adding contingent sub-plans to projects. It seems
quite plausible to me that your manager might have correctly decided that the
risk and time involved in the optimisation was not acceptable at that time.
But ideally you and s/he should have been able to add a contingent extension to
the project plan, somewhat like (briefly)

Risk:
The XYZ report may run too slowly with production-scale datasets.
Reason:
O(N^2) algorithm used (unnecessarily).
Responsibility for monitoring and reviewing risk:
<some person, group, or organisation> at <some time>.
Suggested fix:
<details>
Probable time:
<N weeks> + testing.
NB, retesting required /at system integration scope/.

The "Responsibility for monitoring and reviewing risk:" bit would be, to my
mind, the single most important bit of this in that the system couldn't be
deemed complete unless the issue had been resolved.

At least, /I/ have never seen anything like this used in project planning (not
even informally).

-- chris
 

Lew

Chris said:
I think part of the problem with this sort of situation is that we don't have a
commonly understood way of adding contingent sub-plans to projects. It seems
quite plausible to me that your manager might have correctly decided that the
risk and time involved in the optimisation was not acceptable at that time.

The fix involved half a day's work and was routine - no risk.
But ideally you and s/he should have been able to add a contingent extension to
the project plan, somewhat like (briefly)

Risk:
The XYZ report may run too slowly with production-scale datasets.

There was no doubt that it would, given the projected data sizes.
Reason:
O(N^2) algorithm used (unnecessarily).
Responsibility for monitoring and reviewing risk:
<some person, group, or organisation> at <some time>.
Suggested fix:
<details>
Probable time:
<N weeks> + testing.
NB, retesting required /at system integration scope/.

No such reasoned approach was forthcoming from my project manager then. He
simply didn't want to hear it. The difference in that case was not weeks but a
few hours of development time to implement the fix.
The "Responsibility for monitoring and reviewing risk:" bit would be, to my
mind, the single most important bit of this in that the system couldn't be
deemed complete unless the issue had been resolved.

I had a fix ready to implement. I made sure my replacement knew of it. When he
in turn was finally permitted to make the change, it took half a day and
completely fixed the problem. That didn't happen until the project manager who
had forbidden the change was replaced. By the guy I'd trained with the fix.
At least, /I/ have never seen anything like this used in project planning (not
even informally).

Not sure of the antecedent. If you mean you've never seen someone irrationally
and arbitrarily refuse to let a product work when it would have been easy to
repair, lucky you. If you mean you've never seen anyone plan for such a fix,
using reasoned analysis and with fallback positions, unlucky you. I have seen
both ends of that spectrum.

- Lew
 

Chris Uppal

Lew wrote:

[me:]
The fix involved half a day's work and was routine - no risk.

/Nothing/ involves no risk. But yes, given your estimate, the manager's
position does sound a little, err, irrational...

Not sure of the antecedent. If you mean you've never seen someone
irrationally and arbitrarily refuse to let a product work when it would
have been easy to repair, lucky you.

Oddly enough, I don't think I've ever been in the position where "management"
hasn't heeded my warnings on specific technical issues ("heeded" != "agreed
with after due consideration"). I /have/ been in the position of saying
something like "but this plan only covers the functionality we actually know
about now; there's at least that much again, /really important/ stuff, which we
haven't even started to think about" -- naturally I was roundly criticised for
"being negative"...
If you mean you've never seen anyone
plan for such a fix, using reasoned analysis and with fallback positions,
unlucky you. I have seen both ends of that spectrum.

That's what I meant. Of course I have seen plans changed to allow for problems
as they arise, but I haven't seen anyone using a planning idiom to postpone
addressing, but still properly manage, potential risks as they are identified.

-- chris
 

Luc The Perverse

Chris Uppal said:
Oddly enough, I don't think I've ever been in the position where "management"
hasn't heeded my warnings on specific technical issues ("heeded" != "agreed
with after due consideration"). I /have/ been in the position of saying
something like "but this plan only covers the functionality we actually know
about now; there's at least that much again, /really important/ stuff, which we
haven't even started to think about" -- naturally I was roundly criticised for
"being negative"...

In general I would complain about my situation, a programmer being subjected
to menial work - but one benefit does come out of it.

It is very easy to quit your job on principle, if you are making less than
$10 (USD)/hr.
 

Daniel Pitts

Luc said:
In general I would complain about my situation, a programmer being subjected
to menial work - but one benefit does come out of it.

It is very easy to quit your job on principle, if you are making less than
$10 (USD)/hr.
Hmm, that's an extremely low wage for a programmer.
Actually, that's about what a Starbucks barista makes in San Francisco.
You'd be better off working in a coffee shop. At least then you could
get free caffeine!
 

Luc The Perverse

Daniel Pitts said:
Hmm, that's an extremely low wage for a programmer.
Actually, that's about what a Starbucks barista makes in San Francisco.
You'd be better off working in a coffee shop. At least then you could
get free caffeine!

They're called internships - what you take when you can't get real jobs...
and then intermittent manual labor jobs when you can't find an internship.

Keep in mind that the cost of living is MUCH higher in San Francisco than
here in BFE.
 

Daniel Pitts

Luc said:
They're called internships - what you take when you can't get real jobs...
and then intermittent manual labor jobs when you can't find an internship.

Keep in mind that the cost of living is MUCH higher in San Francisco than
here in BFE.
That is a valid point. Of course, you could get an internship in S.F.
for $20/hr :)
 

John Ersatznom

Simon said:
I'm not sure what you are saying.

Then you didn't read it and understand it. Please try again, or give up,
but there's really no need to attack.
 

Lew

John said:
Then you didn't read it and understand it. Please try again, or give up,
but there's really no need to attack.

I saw no attack in Simon's post. Did I miss something?

Simon raised some interesting mathematically-motivated points. I am in no
position yet to judge the relative merit of the arguments, but I was awaiting
with some interest your response to Simon, on the assumption that you were
going to continue the mathematical reasoning as you clarified the points about
which Simon asked.

I continue to hope for such information.

- Lew
 

Patricia Shanahan

Chris said:
Patricia said:
If you use a boolean[], those million elements occupy a million bytes of
memory, because, at least in the implementations I've examined, each
boolean in an array gets its own byte.

BitSet uses a long[], and packs 64 bits into each element, so a million
elements only take 125,000 bytes of memory.

That may make a significant difference if at least one cache in the
system has a size in the range of tens to hundreds of megabytes.

I think you must have meant that last word to be kilobytes ?

Thanks for the correction.

Patricia
 

Patricia Shanahan

Philipp said:
Big O describes only how the system scales, not the absolute running time. If
you have a number of entries that will not vary by a factor of 10 in the
near future, it would be wise to check whether the O(n^2) algorithm isn't
actually much faster than the O(n) algorithm on those entries...

I did that once, and created a lot of trouble. It was a system for
managing and applying patch cards when booting an operating system. I
was told that there would never be more than order tens of cards at a
time, before the OS would be recompiled with source code changes.

I did a very simple implementation that I knew was O(n^2), but that I
had tested for a couple of hundred cards, several times the number
required by the specification, and at that size it ran about as fast as
the card reader.

Over the years, the number of cards increased to about a thousand, and
it became a real bottleneck. I tried to explain the issue, and get
permission to rework the code to use e.g. a tree structure or a hash
table instead of a linear search...
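
(Purely as an illustration of that kind of rework, with invented names: a
hash-keyed table makes each lookup an expected O(1) probe instead of an
O(n) scan.)

import java.util.HashMap;
import java.util.Map;

// Sketch only: patch cards keyed by the address they patch.
public class PatchTable {
    private final Map<Integer, String> patches = new HashMap<Integer, String>();

    public void add(int address, String patchText) {
        patches.put(address, patchText);
    }

    public String lookup(int address) {
        return patches.get(address);   // null if no patch for this address
    }

    public static void main(String[] args) {
        PatchTable table = new PatchTable();
        table.add(0x1F3C, "replace instruction at 0x1F3C");
        System.out.println(table.lookup(0x1F3C));
    }
}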

A lot depends on your confidence that reimplementation will be permitted
if the problem outgrows the O(n^2) solution.

Patricia
 

Lew

Patricia said:
I did that once, and created a lot of trouble. It was a system for
managing and applying patch cards when booting an operating system. I
was told that there would never be more than order tens of cards at a
time, before the OS would be recompiled with source code changes.

I did a very simple implementation that I knew was O(n^2), but that I
had tested for a couple of hundred cards, several times the number
required by the specification, and at that size it ran about as fast as
the card reader.

Over the years, the number of cards increased to about a thousand, and
it became a real bottleneck. I tried to explain the issue, and get
permission to rework the code to use e.g. a tree structure or a hash
table instead of a linear search...

A lot depends on your confidence that reimplementation will be permitted
if the problem outgrows the O(n^2) solution.

Over the last thirty or so years, I have seen many projects where we were told
"we can always refactor it later", but later were told, "we are not going to
refactor given the investment in the existing codebase, besides it works,
doesn't it?"

As a practical matter it is almost impossible to get management buy-in to
throw away existing code no matter how terrible. One worst case scenario was a
nine-month project that had cost my employer nearly a million dollars to a
cosultant, and still didn't work - it gave wrong results and crashed the host.
(The consultant was known in our trenches to be a scammer, but we never could
convince management of that.) I stepped in, took two days and about 150 lines
of code (including blank lines and comments) and created a complete,
bulletproof solution. (Not actually a difficult project, you see - the
consultant was a scammer.) I nearly got in trouble, since I had completely
disregarded the nine months / million dollars worth of existing code that
didn't work. As it was, I was the focus of a lot of managerial concern. (They
always looked so sad when I told them, "Port Steve's code? I never even looked
at Steve's code; I just rewrote the application from the specification.") The
investors pulled out all their money from the company two weeks later
(possibly for unrelated reasons). Welcome to the future.

There has to be an awful lot of pain before there is buy-in for refactoring,
and often not even then.

- Lew
 

Simon

John said:
Then you didn't read it

Yes, I did.
and understand it.

No, I didn't. That's what I'm trying to say.
Please try again, or give up,
but there's really no need to attack.

I apologise for anything which you might have interpreted as an attack, which it
wasn't intended to be. I was trying to be precise as to which arguments of yours
I think are questionable. If you think I got you wrong, please explain. If you
think that my arguments are wrong, please try to be precise also, so I can clarify.

Cheers,
Simon
 

John Ersatznom

Lew said:
I saw no attack in Simon's post. Did I miss something?

Yes; the fact that he was accusing me of being bad at math.
Simon raised some interesting mathematically-motivated points.

Well, to start with, he basically accused Wikipedia of having the
integral of ln(x) wrong.
He also correctly stated that an average of a set exceeds 1/4 the size
of the set, but thereby was trying to imply that the optimization sucked
because something was therefore O(N) in the total number range instead
of O(log N). The set was of primes, and the size of that is O(log N), so
1/4 of that is O(log N). He actually ends up proving the weak result
that the size of the average prime less than N is *at least* O(log N)
(not ruling out O(N) or whatever); I had gotten the stronger result that
the size of the average prime *was* only O(log N).

The reasoning in a nutshell:

The algorithm to build the sieve up to N, as I stated it, has three parts. Loop
up to sqrt(N) (O(sqrt(N))). For each p reached that hasn't been marked
composite earlier, loop from p^2 up to N in steps of p. The number of these
inner loops is O(P), with P the number of primes up to sqrt(N). The length of a
given one is O(N/p). It turns out P is O(log(sqrt(N))) = O(1/2 log(N)) =
O(log(N)), so this loop is O(log(N)*X) where the average length of a
loop is O(X). What is the average length of a loop?

It's O(N/p) of course, for the average choice of p. Primes are
logarithmically distributed (P is O(log(N)) above) so the average p is
going to be small fairly independently of the range. It's hard to be
sure it's O(1) though. The average is the sum divided by the total
number. The sum of the primes, the way they are logarithmically
distributed, could be approximated by the integral of the logarithm. (It
might actually be better now I think of it to integrate N/p -- N*(1/2 +
1/3 + 1/5...) looks like N times an exponential decay function, whose
integral is itself and which is clearly O(1))

I concluded that the average length of the loop is bounded by O(N), and
the whole algorithm by O(N log N). If it's actually N log log N or can
be made such, so much the better.
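
(For reference, and not part of the original post: the standard analysis uses
Mertens' second theorem, which says the sum over primes p <= x of 1/p is
ln ln x + M + o(1). The total marking work is therefore

    sum over primes p <= sqrt(N) of N/p  ~  N * ln ln sqrt(N)  =  O(N log log N),

the usual bound for the sieve of Eratosthenes.)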
 
