find a number

H

Hendrik Maryns

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
NotDashEscaped: You need GnuPG to verify this message

(e-mail address removed) schreef:
Hi,

the solution without extra storage that I've got involve accessing each
and every element in the array (only once, as stated in the problem).

And how does it work?
Hower using extra-storage (like, say, 1000 bits) the search could
usually stop way before reaching the end.
Indeed.

Is your solution without extra-storage similar (in that it needs to
access every element)?

I have this little idea, it relies on the fact that the array is filled
with the numbers from 1 to 1000. Maybe this is enough of a tip?
It stops as soon as the double number is found. Which of course can be
the last one accessed, if one is unlucky.
Another hint: take care for numbers which are stored in the slot they
refer to.

H.
--
Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFEIt9Ve+7xMGD3itQRArC4AJ9/l8f/fRdHkj4AsyDlSjG+jDYzHgCfXbNn
DsK8M0AEb44XCKlSbKagh18=
=biCm
-----END PGP SIGNATURE-----
 
W

wolfgang zeidler

Roedy said:
The classic solution is to sort and scan for the dup, but of course
sorts require some aux storage.

A flat footed solution would be to scan for the first number, starting
at the second position then scan for the second starting at the third
etc.

Another even worse flat footed solution would be to scan for 0 then
when you find it set it to -1. Then scan for 1 and set it to -1. When
you are done scan for anything that is not -1.

Faster Solution:
set N to 500
test: scan all values for ...
number of values < N
number of values > N
depending of this test:
- result = N
- or new test with N = 250
- or new test with N = 750

Much Faster Solution:
use the *sum* of all values to ...
... o, i forgot it was the homework
of the original poster

regards, wz
 
A

alexandre_paterson

Dag Hendrik, hoe gaat het met uw ?

;)

Hendrik Maryns wrote:
....
And how does it work?

Quite fast, and I realize it's the same method that "wolfgang zeidler"
hinted in his "much faster solution".

But ssshhht, we're not supposed to solve homework here.
From your message I don't get if you're trying to help me or if you're
asking for help :)))

I'm not the OP and I've got a very fast solution... But it only works
if I access every element in the array once and exactly once.

If you've got a solution that works without extra storage and that
stops before using every single element in the array (in case it
finds the solution), you can mail it to me ;)

Tot ziens, Alex
 
M

Martin Gregorie

murali@pune said:
Hi,

I have an array of 1001 integers. The integers are in random order, but

I know each of the integers is between 1 and 1000 (inclusive). In
addition, each number appears only once in the array, except for one
number, which occurs twice. Assume that I can access each element of
the array only once.


Here I need an algorithm to find the repeated number. If I used
auxiliary storage in my algorithm, can you find an algorithm that does
not require it?
Yes. Easy-peasy.
 
C

Chris Uppal

If you've got a solution that works without extra storage and that
stops before using every single element in the array (in case it
finds the solution), you can mail it to me ;)

There is one simple solution which uses no auxiliary space (except for a fixed
number of local variables), and which doesn't have to scan the entire array.
Since it breaks the terms of th OP's problem, there's no reason not to describe
it exactly -- you just use the array itself to mark the numbers you've seen
before (you could negate the existing entry for instance). That doesn't
satisfy the OPs requirement, though, because you end up reading (on average)
each array element twice.

There is also a near-solution (which might be the one Hendrik has in mind) that
/almost/ solves the problem without using extra space (except local variables),
and which runs in (I currently believe) genuinely sub-linear time, BUT it reads
/exactly one/ array element twice...

I can't be bothered to write it up now, and anyway you may want to think about
it some more, so more later (unless Hendrik beats me to it ;-)

-- chris
 
R

Roedy Green

I know each of the integers is between 1 and 1000 (inclusive). In
addition, each number appears only once in the array, except for one
number, which occurs twice. Assume that I can access each element of
the array only once.

A solution you might use to such a problem in real life is to use a
java.util.BitSet which then uses as array of 16 longs to track if you
have seen a particular int before.

The artificial feature of the problem is you know there is exactly 1
dup. That would never happen in real life.

You chase a chain. Lets say the first number is 7. mark the 0th slot
as -1 and go to slot 7. Look at that number as your next slot, and
mark it -1 keep going until you hit a -1. You are still using one
extra piece of storage.
 
A

alexandre_paterson

Re Chris,

Chris said:
There is one simple solution which uses no auxiliary space (except for a fixed
number of local variables), and which doesn't have to scan the entire array.
Since it breaks the terms of th OP's problem, there's no reason not to describe
it exactly -- you just use the array itself to mark the numbers you've seen
before (you could negate the existing entry for instance). That doesn't
satisfy the OPs requirement, though, because you end up reading (on average)
each array element twice.

Definitely... By allowing to read the array more than once, the
array can easily be reused to store information (by negating
each instance, by adding a certain amount, etc.). However,
by not having to scan the entire array, as you're writing, you
wouldn't end up reading each element twice (on average)
but only 4/3 on average. I mean, you end reading twice
every element that you scan... Not those that you don't once
you've got the answer.
There is also a near-solution (which might be the one Hendrik has in mind) that
/almost/ solves the problem without using extra space (except local variables),
and which runs in (I currently believe) genuinely sub-linear time, BUT it reads
/exactly one/ array element twice...

Oh I'm curious about that one, even if it's only a "near-solution" ;)

Talk to you soon, Alex
 
H

Hendrik Maryns

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
NotDashEscaped: You need GnuPG to verify this message

(e-mail address removed) uitte de volgende tekst op 03/23/2006 07:25 PM:
Dag Hendrik, hoe gaat het met uw ?

Goed, but there's an excess w in there. No shame, even a lot of natives
make that error.
Hendrik Maryns wrote:
...

Quite fast, and I realize it's the same method that "wolfgang zeidler"
hinted in his "much faster solution".

But ssshhht, we're not supposed to solve homework here.

Ah, I'm sorry, I took you for the OP.
asking for help :)))

I'm not the OP and I've got a very fast solution... But it only works
if I access every element in the array once and exactly once.

If you've got a solution that works without extra storage and that
stops before using every single element in the array (in case it
finds the solution), you can mail it to me ;)

As Chris pointed out, it indeed accesses one slot twice. The trick is
to use the numbers themselves as pointers in the array. It is what
Roedy already described in his last post. I hope the OP has given up on
reading now, so I?ll post it here.

Insert this in the place where Thomas put // insert you code here:

int i = vi[0];
int tmp = 0;
while(vi != -1){
tmp = i;
i = vi;
vi[tmp] = -1;
}
return i;

You access the slot at the final position of i twice.

H.

--
Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFEI0UOe+7xMGD3itQRAiRjAJ9GvglDCfnBnFbxDiY/UiUQroPcJwCeKGR8
3Ukllv21rdau+Vvv38WBdqM=
=U6t2
-----END PGP SIGNATURE-----
 
T

Tom Leylan

Hi Roedy:

Keep in mind the goal isn't to determine if we've seen the number before :)
Say there are just 6 elements that contain numbers 1 thru 5 plus a dupe.
What do you get when you add them (one visit per element) and what would you
expect to get if there was no dupe?

As an alternative each element could be printed to paper and the list
searched manually.

I don't see how it can't be solved using "no other storage" so I'm guessing
a variable or two doesn't count.
 
C

Chris Uppal

Hendrik said:
int i = vi[0];
int tmp = 0;
while(vi != -1){
tmp = i;
i = vi;
vi[tmp] = -1;
}
return i;


Doesn't work, consider what happens if the array is:
[ 2 1 0 4 3 ]

It goes through the sequence:

i vi
0 [ 2 1 0 4 3 ]
2 [ -1 1 0 4 3 ]
0 [ -1 1 -1 4 3 ]

and so deduces that the repeated number is 2. Which is incorrect ;-)

You have to start with i = vi.length. See my other post for an attempted
explanation.

-- chris
 
C

Chris Uppal

Oh I'm curious about that one, even if it's only a "near-solution" ;)

OK. Here we go. But first I should warn you that I haven't /proved/ that this
works, and there is a certain amount of hand-waving in the explanation. I
/have/ verified that it works (by brute force iteration over all possible
inputs) for values of N up to 9.

The following explanation is rather complicated, I'd be fascinated if someone
can come up with a simper explanation.

First off, consider a related case. We have an array of size N, containing (in
some order) all the integers from 0 through N-1. Since every integer in the
array is also a valid array index, we can make "chains" through the array by
considering each array slot to "point" to another array slot. Now, since each
integer/index occurs exactly once, every slot must be pointed to by exactly one
other slot. That means that the chains are actually cycles -- a cycle is a
chain which ends up back at its starting point. The cycles do not overlap --
each slot is on exactly one chain. There may be anything from 1 to N cycles.
Draw them like this:
o O 0 o . o O
(which intended to be a picture of a number of different sized circles).

Next, consider the case where we have one extra integer. In this case "most"
of the array will still consist of non-overlapping cycles, but there will be
one slot which is on a cycle, /and/ is pointed to by another slot. That other
slot may, in turn, be pointed to by another, and so on. So that cycle will
have an extra "tail" -- a chain that hangs off it. Draw it like this:
o Q O o . o O
(where the Q is the cycle which has the tail).

The slot where the tail joins the Q is the one which is pointed to by two other
slots, and so its index is the repeated number. So, if we can find a point on
the tail, then we can follow the chain back to the cycle, and then around the
cycle, to find out where the tail joins on. We can use Hendrik's code to do
that.

So the only difficulty we still have is how to find a point on the tail.
That's where it gets a little bit magical. The /last/ slot in the array is
guaranteed to be at the end of the tail, because it's index is N, but all the
numbers in the array are < N, and so no other slot can point to it. So if we
change Hendrik's code slightly to start at the end of the array, we're done.

That manages to use O(1) extra space (two variables, and perhaps the odd stack
slot) and to find the dup with <=N+1 array reads. It's clearly necessary to
read one element twice (so this doesn't solve the original problem), but in
general it will require < N reads. Unfortunately my maths isn't good enough to
know how the time really varies with N -- I'd guess that the tail is on average
the same length as the average cycle, but I don't have a sense of how big the
average cycle is. I /conjecture/ that it's genuinely sub-linear (O(sqrt(N)) or
something, perhaps). If I get around to it I may do a few empirical tests.

-- chris
 
J

James Westby

Chris said:
OK. Here we go. But first I should warn you that I haven't /proved/ that this
works, and there is a certain amount of hand-waving in the explanation. I
/have/ verified that it works (by brute force iteration over all possible
inputs) for values of N up to 9.

The following explanation is rather complicated, I'd be fascinated if someone
can come up with a simper explanation.

First off, consider a related case. We have an array of size N, containing (in
some order) all the integers from 0 through N-1. Since every integer in the
array is also a valid array index, we can make "chains" through the array by
considering each array slot to "point" to another array slot. Now, since each
integer/index occurs exactly once, every slot must be pointed to by exactly one
other slot. That means that the chains are actually cycles -- a cycle is a
chain which ends up back at its starting point. The cycles do not overlap --
each slot is on exactly one chain. There may be anything from 1 to N cycles.
Draw them like this:
o O 0 o . o O
(which intended to be a picture of a number of different sized circles).

Next, consider the case where we have one extra integer. In this case "most"
of the array will still consist of non-overlapping cycles, but there will be
one slot which is on a cycle, /and/ is pointed to by another slot. That other
slot may, in turn, be pointed to by another, and so on. So that cycle will
have an extra "tail" -- a chain that hangs off it. Draw it like this:
o Q O o . o O
(where the Q is the cycle which has the tail).

The slot where the tail joins the Q is the one which is pointed to by two other
slots, and so its index is the repeated number. So, if we can find a point on
the tail, then we can follow the chain back to the cycle, and then around the
cycle, to find out where the tail joins on. We can use Hendrik's code to do
that.

So the only difficulty we still have is how to find a point on the tail.
That's where it gets a little bit magical. The /last/ slot in the array is
guaranteed to be at the end of the tail, because it's index is N, but all the
numbers in the array are < N, and so no other slot can point to it. So if we
change Hendrik's code slightly to start at the end of the array, we're done.

That manages to use O(1) extra space (two variables, and perhaps the odd stack
slot) and to find the dup with <=N+1 array reads. It's clearly necessary to
read one element twice (so this doesn't solve the original problem), but in
general it will require < N reads. Unfortunately my maths isn't good enough to
know how the time really varies with N -- I'd guess that the tail is on average
the same length as the average cycle, but I don't have a sense of how big the
average cycle is. I /conjecture/ that it's genuinely sub-linear (O(sqrt(N)) or
something, perhaps). If I get around to it I may do a few empirical tests.

-- chris

Nice argument. I have one point to make though.

If you read the problem again though you will see that the array is of
size 1001, and it contains numbers from 1 to 1000 inclusive, with 1
duplicate. So starting at the beginning is correct in the problem as stated.

However this doesn't affect the validity of your argument, it is just
that the first element is guaranteed to be on the tail in that formulation.

The bit about being composed of sub-cycles has triggered a memory in my
brain, but I can't grab hold of it. Some set theory lecture where we did
something similar I think.

This is similar to Pollard's rho method, with the random walk being
defined by the problem itself. That has expected time complexity
O(sqrt(N)), so your hunch could well be right. My notes say that that
proof uses an argument similar to the birthday paradox, but
unfortunately omits the details.

Again, nicely argued.


James
 
H

Hendrik Maryns

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
NotDashEscaped: You need GnuPG to verify this message

Chris Uppal schreef:
Hendrik said:
int i = vi[0];
int tmp = 0;
while(vi != -1){
tmp = i;
i = vi;
vi[tmp] = -1;
}
return i;


Doesn't work, consider what happens if the array is:
[ 2 1 0 4 3 ]


I assumed the array to be filled with numbers starting from 1. And
there is no double number in there?
It goes through the sequence:

i vi
0 [ 2 1 0 4 3 ]
2 [ -1 1 0 4 3 ]
0 [ -1 1 -1 4 3 ]

and so deduces that the repeated number is 2. Which is incorrect ;-)

You have to start with i = vi.length.

If the array contents start from 0, yes.
See my other post for an attempted
explanation.

You don't have to care about cycles, as the double number cannot be in
one of them, and 0 can't either (since it isn't in there). But that's
what you say in you other post.

You just misread the OP's exercise, so basically we're both right.

It is an interesting question what the complexity is. This will be the
average length of a cycle. Intuitively, I'd say this is n/2. I don't
expect something sublinear here.

(There are three characters too much in there, though: the
initialisation to 0 from tmp is not necessary, but I didn't have a Java
at hand when I wrote it, and wasn't sure)

H.
--
Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFEI/Vle+7xMGD3itQRAmpYAJ4lJ7XlyvOsGlY39Cbb/sI9caofaACfdWld
eMLXnCskazd6pvJ+hFqmY6Y=
=WW/I
-----END PGP SIGNATURE-----
 
T

tom fredriksen

Sorry I got a bit lost in your description, but I was thinking sort of
the same. Except I dont understand what you mean by tail and all of
that. So sorry if I am only repeating what you are saying.

My suggestion is to treat it as a recursion problem where you start of
as you do and for each entry you read the entry and then you recurse on
the number in the entry as the next index. And so you keep on going
until there are not more numbers left, at that point the recursion goes
backwards and you start writing numbers into their respective entries.
As soon as you meet an entry where the entry number and the index is
equal and you have a number to write you know you have found the second
number. Now you print the number found and let the program finish on its
own.

Is that what you where saying? I haven't programmed it so I cant verify
its correctness at the moment.

/tom
 
A

alexandre_paterson

Re,

Chris said:
The following explanation is rather complicated, I'd be fascinated if someone
can come up with a simper explanation.
(snipped)

Thanks a lot for taking the time to write that.

Sometimes the "simpler explanation" is a piece of code :)

We don't see that many homework questions here that end
up in such an active thread ;)

We can use Hendrik's code to do that.

Out of curiosity, I ran a few million random simulation with
Hendrik's code and, for an array of 1000 elements, and it
took on average 667 reads in the array.

See you soon, Alex
 
O

Oliver Wong

tom fredriksen said:
Sorry I got a bit lost in your description, but I was thinking sort of the
same. Except I dont understand what you mean by tail and all of that. So
sorry if I am only repeating what you are saying.

My suggestion is to treat it as a recursion problem where you start of as
you do and for each entry you read the entry and then you recurse on the
number in the entry as the next index. And so you keep on going until
there are not more numbers left, at that point the recursion goes
backwards and you start writing numbers into their respective entries. As
soon as you meet an entry where the entry number and the index is equal
and you have a number to write you know you have found the second number.
Now you print the number found and let the program finish on its own.

Is that what you where saying? I haven't programmed it so I cant verify
its correctness at the moment.

If I understand your algorithm correctly, you're using O(N) extra space,
because you're using 1 local variable per recusirve call, and in the worst
case, you make as many recursive calls as the size of the array.

- Oliver
 
H

Hendrik Maryns

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
NotDashEscaped: You need GnuPG to verify this message

tom fredriksen schreef:
Sorry I got a bit lost in your description, but I was thinking sort of
the same. Except I dont understand what you mean by tail and all of
that. So sorry if I am only repeating what you are saying.

My suggestion is to treat it as a recursion problem where you start of
as you do and for each entry you read the entry and then you recurse on
the number in the entry as the next index. And so you keep on going
until there are not more numbers left, at that point the recursion goes
backwards and you start writing numbers into their respective entries.
As soon as you meet an entry where the entry number and the index is
equal and you have a number to write you know you have found the second
number. Now you print the number found and let the program finish on its
own.

The recursion will never end, because at one point you will get into one
of the cycles Chris described. So you have to mark where you have been
*before* you do the next recursion call. Then as soon as you meet a
slot that has been marked before, you can break out of it immediately.
So basically it does the same as my algorithm, and I see no reason to
use recursion here.
Is that what you where saying?

More or less, if you take note of the above comments.

H.
--
Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFEJBGge+7xMGD3itQRAsamAJ4ijO7nDBX0sR0VyMob6FxZtrjAgwCaA9ro
v4JRRnoezOTPX9xtN2/q/Dw=
=IwxM
-----END PGP SIGNATURE-----
 
C

Chris Uppal

James said:
If you read the problem again though you will see that the array is of
size 1001, and it contains numbers from 1 to 1000 inclusive, with 1
duplicate. So starting at the beginning is correct in the problem as
stated.

Thanks for the correction. I used a language with 1-based array indexing to
play with the problem, and then translated everthing back into 0-based for the
post, including the number range -- and missed the fact that that changes the
"shape" of the problem.

This is similar to Pollard's rho method,

I hadn't heard of that before. An interesting algorithm -- almost magical ;-)

-- chris
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,432
Messages
2,571,681
Members
48,796
Latest member
Greg L.

Latest Threads

Top