# Optimize a discovery algorithm

Discussion in 'C Programming' started by pozz, Jan 1, 2014.

1. ### pozzGuest

I'm programming a small 8-bit microcontroller (AVR from Atmel). This is
the single master in a 2-wires RS485 network where N slaves (AVR based)
are connected. In RS485 networks, only one transmitter can be active at
a time, otherwise the transmitted characters will arrive corrupted to
the receivers. Moreover, the characters transmitted from the active
network that are listening if they aren't transmitting).

I want to design and implement a discovery technique: the master should
be able to understand how many slaves are connected and assign them a
I can use the 32-bits serial number stored in a non volatile memory in
the slaves and master (each node on the bus has a different serial number).

The discovery algorithm I was thinking of is based on the following
consideration: if two or more slaves answer to a query originated by the
master, two cases could happen.

1. the answer messages are separated in time, so they arrive
uncorrupted to the master;
2. two or more answer messages overlap, so they could arrive
corrupted to the master.

The master is able to detect a corrupted message (through some
mechanisms, such as CRC).

The algorithm I'm implementing is based on binary search: the master
sends a broadcast query with an interval of serial numbers. All the
slaves with a serial number in the interval answers to the master. As
the slave. When a slave receives a new address, it stops responding to
discovery queries from the master.

A test code for the algorithm is:

uint32_t rx_sn;

int
autodetect(void)
{
uint32_t value = 0;
uint32_t vbit = 0;
uint32_t mbit = 0;
int result = 0;
while (result == 0) {
vbit = 0;
mbit = 0;
continue;
} else if (r == NO_ANSWER) {
if (vbit == 0) {
result = 1; /* No more slaves to detect */
} else {
value |= mbit;
vbit = 1;
continue;
}
} else {
result = -1; /* Error */
}
} else { /* r == GARBAGE */
result = -2; /* Error */
} else {
if (mbit) mbit <<= 1; else mbit = 1;
vbit = 0;
}
}
}
return result;
}

The function query() simulates the transmitting on the bus of the
message "Are there any slave with serial numbers in the range defined by
this mask and this value?". The return value of query() could be
NO_ANSWER (no slave has responded at all), GARBAGE (two or more slave
If the result is ANSWER_OK, the serial number of the slave that has
responded is stored in global variable rx_sn.

The function assign_new_address() simulates the transmitting on the bus
of the message "The slave with this serial number will use this address
for further communication. From now on, it won't answer anymore to
discovery queries".

The interval of serial numbers is defined by a 32-bits MASK and a
32-bits VALUE: a serial number SN is in the range if

If garbage is received, the algorithm tries to reduce the interval
fixing the value of the least significant bit, first to 0 and after to
1. If more than one slave answer, the successive left bit is fixed.

I think this algorithm works, but it is slow and not optimized. As you
can note, when a new slave is discovered, it starts from the beginning.
This approach will loose some time to sends the same query many times.

Suppose to have three slaves on the network with serial numbers
00000001, 00000003 and 00000007. The previous algorithm will produce 12
queries:

Assigning new address for SN 0x00000001
Assigning new address for SN 0x00000003
Assigning new address for SN 0x00000007

The query 2 has no answer, so it's a lost of time to send again during
the algorithm (see query 6): it will never be answered.

I think this could be solved avoiding to restart the algorithm from the
beginning every time a new slave is found. I have the impression it
could be optimized with recursion, but I can't use recursion on AVR
small microcontroller (stack space is very small). I know every
recursive algorithm can be written without recursion, so it could be my
solution.

Is someone suggest an optimization of my technique?

Thank you.

pozz, Jan 1, 2014

2. ### pozzGuest

Il 02/01/2014 03:05, Robert Wessel ha scritto:
> I've posted this before in comp.arch.embedded,

Yes, I remember it. The thread you wrote in was originated always from
but now I've problem in implementing the algorithm in C.

> but don't use a mask
> and value, use a range instead:

With mask and value you don't have a mathematical range (min..max), but
always a group of serial numbers (serial numbers that have certain bits
in certain positions). Both tecnique are identical, I think.

I hope binary operations (AND) on 32-bits variables will be much more
simple for 8-bit core than mathematical comparisons..

> It's only necessary to be able to reasonably reliably detect that
> someone has answered. The proper way to do the query is to specify a
> range in which you'd like devices to respond. For example, let's say
> you have a sixteen bit serial number (too short, but it keeps the
> example short). The first "Are You There" query would specify
> 0x0000-0xffff, so any device would respond. If the host got no
> response, it's done. If it got a good response* (say 0x1234: "I'm
> Here"), then it would record that and then re-query for the ranges
> left and right of the response (IOW, 0x000-0x1233 and 0x1235-0xffff).
> If it got a gibberish response (a collision), it would simply split
> the range in two (IOW, a binary search) and re-query those subranges
> (IOW, 0x0000-0x7fff and 0x8000-0xffff). The process repeats until all
> queried subranges return nothing.

It's very similar to my algorithm except for two things.
You use a numerical min..max range and I use bitmask, but it is
equivalent (see above).
You don't repeat the entire process from the beginning, but you continue
querying remaining subranges until the last one. This is the complex
job I'm not able to make and it is my true question of my original post:
how to implement the discovery algorithm without needing to restart from
the beginning (and without using recursion). It's strictly a
programming problem, related to C language.

> The performance is usually only slightly worse than linear with the
> number of devices to find, although you can construct pathological
> cases where it's O(n*lg(m)), where m is the number of possible serial
> numbers. So much longer serial numbers have only modest impact.
>
> Repeating the whole process several times can help compensate for less
> than perfect collision detection.
>
> If you don't have a reasonable chance of detecting a collision, the
> problem is much harder.
>
>
> *Either because that was the only device in the subrange, or it
> "overpowered" all the other devices in the subrange, and the host got
> a good message from it.

I agree with you. Indeed I have chosen to use your tecnique, but now I
have to implement it

pozz, Jan 2, 2014

3. ### pozzGuest

Il 02/01/2014 02:51, Sam ha scritto:
> On 01/01/2014 23:51, pozz wrote:
>> I'm programming a small 8-bit microcontroller (AVR from Atmel). This is
>> the single master in a 2-wires RS485 network where N slaves (AVR based)
>> are connected. In RS485 networks, only one transmitter can be active at
>> a time, otherwise the transmitted characters will arrive corrupted to
>> the receivers. Moreover, the characters transmitted from the active
>> transmitter are received by all receivers (all other nodes in the
>> network that are listening if they aren't transmitting).
>>
>> I want to design and implement a discovery technique:

>
> FYI there was a thread in comp.arch.embedded in mid-December (e.g.
> around 18th) called: "Re: RS485 bus and auto-addressing". You might
> find some value in reading that one - even if that thread doesn't
> completely match your intended algorithm (e.g. that was discussing
> addresses), some of the issues described there might be relevant.

Yes, I know very well that thread, because it was originated by me.

I learned many things by reading the answers to my post, and now I want
to implement the technique described here, inspired from the thread in
comp.arch.embedded.

Now the problem is strictly related to programming: how to optimize the
algorithm I have chosen, so reducing the number of queries on the bus
that are time consuming, without using recursion.

pozz, Jan 2, 2014
4. ### pozzGuest

Il 02/01/2014 03:05, Robert Wessel ha scritto:
> I've posted this before in comp.arch.embedded, but don't use a mask
> and value, use a range instead:

(transformed with bitmask operations as I'm trying to do) and another
approach.

Because collision detection is problematic in RS485 bus, slaves answer
with a break character to queries and not with a data message. If there
are two or more slaves that answer, the break character is normally
detected by the master as there was just one slave.

This approach is more reliable for sure, but it costs much more queries
to detect the exact serial number. For example, if I have just one
slave on the bus, with your (and mine) approach a single query is
sufficient. With break characters I need about 32 queries, until I
remain with a one-element range.

I think our approach is good if auto-detection is controlled by the
user: a new slave is installed, the user push a button to start
auto-detection and check the result, otherwise it could restart again
the process.

The second approach could be better in situations where the
auto-detection isn't initiated by a user but automatically with a
regular frequency, it isn't user controlled (the result isn't checked by
a user) and the time needed to discover the new slave isn't critical.

pozz, Jan 2, 2014
5. ### Johannes BauerGuest

On 02.01.2014 08:51, pozz wrote:

> I hope binary operations (AND) on 32-bits variables will be much more
> simple for 8-bit core than mathematical comparisons..

Definitely not. I'd guess exactly the same amount of clock cycles per
operation (four instructions and one conditional branch). You'll see it
if you look at the assembly output via objdump.

Regards,
Johannes

--
>> Wo hattest Du das Beben nochmal GENAU vorhergesagt?

> Zumindest nicht öffentlich!

Ah, der neueste und bis heute genialste Streich unsere großen
Kosmologen: Die Geheim-Vorhersage.
- Karl Kaos über Rüdiger Thomas in dsa <hidbv3\$om2\$>

Johannes Bauer, Jan 2, 2014
6. ### pozzGuest

Il 02/01/2014 10:23, Johannes Bauer ha scritto:
>> I hope binary operations (AND) on 32-bits variables will be much more
>> simple for 8-bit core than mathematical comparisons..

>
> Definitely not. I'd guess exactly the same amount of clock cycles per
> operation (four instructions and one conditional branch). You'll see it
> if you look at the assembly output via objdump.

You are right... maybe bitmask operations are even less efficient.

Anyway the optimization problem is the same... or similar.

pozz, Jan 2, 2014
7. ### Keith ThompsonGuest

Johannes Bauer <> writes:
> On 02.01.2014 08:51, pozz wrote:
>> I hope binary operations (AND) on 32-bits variables will be much more
>> simple for 8-bit core than mathematical comparisons..

>
> Definitely not. I'd guess exactly the same amount of clock cycles per
> operation (four instructions and one conditional branch). You'll see it
> if you look at the assembly output via objdump.

I'm not so sure about that.

If the CPU can only perform 8-bit operations (I have no idea whether
that's the case or not), it can do a 32-bit AND, OR, or XOR in 4 8-bit
operations. For arithmetic operations, it has to deal with carry bits
for addition and subtraction, and more complex interactions for
multiplication and division.

Bitwise operations are, on a small scale, "embarassingly parallel".
https://en.wikipedia.org/wiki/Embarassingly_parallel

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson, Jan 2, 2014
8. ### WillemGuest

Keith Thompson wrote:
) Johannes Bauer <> writes:
)> On 02.01.2014 08:51, pozz wrote:
)>> I hope binary operations (AND) on 32-bits variables will be much more
)>> simple for 8-bit core than mathematical comparisons..
)>
)> Definitely not. I'd guess exactly the same amount of clock cycles per
)> operation (four instructions and one conditional branch). You'll see it
)> if you look at the assembly output via objdump.
)
) I'm not so sure about that.
)
) If the CPU can only perform 8-bit operations (I have no idea whether
) that's the case or not), it can do a 32-bit AND, OR, or XOR in 4 8-bit
) operations. For arithmetic operations, it has to deal with carry bits
) for addition and subtraction, and more complex interactions for
) multiplication and division.

The addition and subtraction operations on 8-bit CPU's (or any CPU for that
matter) already takes the carry bits into account, so 'it has to deal with
carry bits' has been built into the hardware.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Willem, Jan 2, 2014
9. ### glen herrmannsfeldtGuest

Keith Thompson <> wrote:
> Johannes Bauer <> writes:
>> On 02.01.2014 08:51, pozz wrote:
>>> I hope binary operations (AND) on 32-bits variables will be much more
>>> simple for 8-bit core than mathematical comparisons..

>> Definitely not. I'd guess exactly the same amount of clock cycles
>> per operation (four instructions and one conditional branch).
>> You'll see it if you look at the assembly output via objdump.

> I'm not so sure about that.

> If the CPU can only perform 8-bit operations (I have no idea whether
> that's the case or not), it can do a 32-bit AND, OR, or XOR in 4 8-bit
> operations. For arithmetic operations, it has to deal with carry bits
> for addition and subtraction, and more complex interactions for
> multiplication and division.

This is why some people like litte-endian systems. In this particular
case, of you execute four successive operations, each using the carry
generated by the previous (you also have a carry-in if needed)
You only need to increment an address, never decrement.

But past the 6502, I haven't been convinced. Note that the 6502
doesn't push the address of the following instruction on the stack
in the case of a subroutine call. (It hasn't computes the address
yet.)

But for anything past the 6502, and pretty much anything with
multiply, the advantage pretty much goes away. Well, if you
want the full description "Blaauw and Brooks" in their book
on computer architecture, explain it.

> Bitwise operations are, on a small scale, "embarassingly parallel".
> https://en.wikipedia.org/wiki/Embarassingly_parallel

If you are doing the operations sequentially, though, they aren't
parallel anyway.

-- glen

glen herrmannsfeldt, Jan 2, 2014
10. ### pozzGuest

Il 02/01/2014 17:38, Keith Thompson ha scritto:
> Johannes Bauer <> writes:
>> On 02.01.2014 08:51, pozz wrote:
>>> I hope binary operations (AND) on 32-bits variables will be much more
>>> simple for 8-bit core than mathematical comparisons..

>>
>> Definitely not. I'd guess exactly the same amount of clock cycles per
>> operation (four instructions and one conditional branch). You'll see it
>> if you look at the assembly output via objdump.

>
> I'm not so sure about that.
>
> If the CPU can only perform 8-bit operations (I have no idea whether
> that's the case or not), it can do a 32-bit AND, OR, or XOR in 4 8-bit
> operations. For arithmetic operations, it has to deal with carry bits
> for addition and subtraction, and more complex interactions for
> multiplication and division.
>
> Bitwise operations are, on a small scale, "embarassingly parallel".
> https://en.wikipedia.org/wiki/Embarassingly_parallel

Those are similar arguments I have to use bitwise operations. I tried
the following instructions with avr-gcc compiler (it's a compiler for
AVR8 core) and the result is the following.

eecfg.x, eecfg.y and eecfg.z are defined as uint32_t.

In the "bitwise method", the test is:

if ((eecfg.x & eecfg.y) == eecfg.z) {
13e: 40 91 da 00 lds r20, 0x00DA
142: 50 91 db 00 lds r21, 0x00DB
146: 60 91 dc 00 lds r22, 0x00DC
14a: 70 91 dd 00 lds r23, 0x00DD
14e: 80 91 d6 00 lds r24, 0x00D6
152: 90 91 d7 00 lds r25, 0x00D7
156: a0 91 d8 00 lds r26, 0x00D8
15a: b0 91 d9 00 lds r27, 0x00D9
15e: 48 23 and r20, r24
160: 59 23 and r21, r25
162: 6a 23 and r22, r26
164: 7b 23 and r23, r27
166: 80 91 de 00 lds r24, 0x00DE
16a: 90 91 df 00 lds r25, 0x00DF
16e: a0 91 e0 00 lds r26, 0x00E0
172: b0 91 e1 00 lds r27, 0x00E1
176: 48 17 cp r20, r24
178: 59 07 cpc r21, r25
17a: 6a 07 cpc r22, r26
17c: 7b 07 cpc r23, r27
17e: 09 f0 breq .+2 ; 0x182 <__stack+0x23>

In the "mathematical method", the test is:

if ((eecfg.x >= eecfg.y) && (eecfg.x <= eecfg.z)) {
13e: 40 91 d6 00 lds r20, 0x00D6
142: 50 91 d7 00 lds r21, 0x00D7
146: 60 91 d8 00 lds r22, 0x00D8
14a: 70 91 d9 00 lds r23, 0x00D9
14e: 80 91 da 00 lds r24, 0x00DA
152: 90 91 db 00 lds r25, 0x00DB
156: a0 91 dc 00 lds r26, 0x00DC
15a: b0 91 dd 00 lds r27, 0x00DD
15e: 48 17 cp r20, r24
160: 59 07 cpc r21, r25
162: 6a 07 cpc r22, r26
164: 7b 07 cpc r23, r27
166: 08 f4 brcc .+2 ; 0x16a <__stack+0xb>
168: a4 c0 rjmp .+328 ; 0x2b2 <__stack+0x153>
16a: 80 91 de 00 lds r24, 0x00DE
16e: 90 91 df 00 lds r25, 0x00DF
172: a0 91 e0 00 lds r26, 0x00E0
176: b0 91 e1 00 lds r27, 0x00E1
17a: 84 17 cp r24, r20
17c: 95 07 cpc r25, r21
17e: a6 07 cpc r26, r22
180: b7 07 cpc r27, r23
182: 08 f4 brcc .+2 ; 0x186 <__stack+0x27>

21 instructions for bitwise comparison versus 23 instructions for
mathematical comparison. There isn't a good reason to choose one method
for reducing instructions.

pozz, Jan 2, 2014
11. ### Keith ThompsonGuest

Willem <> writes:
> Keith Thompson wrote:
> ) Johannes Bauer <> writes:
> )> On 02.01.2014 08:51, pozz wrote:
> )>> I hope binary operations (AND) on 32-bits variables will be much more
> )>> simple for 8-bit core than mathematical comparisons..
> )>
> )> Definitely not. I'd guess exactly the same amount of clock cycles per
> )> operation (four instructions and one conditional branch). You'll see it
> )> if you look at the assembly output via objdump.
> )
> ) I'm not so sure about that.
> )
> ) If the CPU can only perform 8-bit operations (I have no idea whether
> ) that's the case or not), it can do a 32-bit AND, OR, or XOR in 4 8-bit
> ) operations. For arithmetic operations, it has to deal with carry bits
> ) for addition and subtraction, and more complex interactions for
> ) multiplication and division.
>
> The addition and subtraction operations on 8-bit CPU's (or any CPU for that
> matter) already takes the carry bits into account, so 'it has to deal with
> carry bits' has been built into the hardware.

Fair enough; multi-word addition and subtraction (where a "word" is the
size that can be handled by a CPU instruction) are likely to be as fast
as multi-word bitwise operations, assuming that single-word ADD and XOR,
for example, take the same number of clock cycles.

Multiplication and division are another story.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson, Jan 2, 2014
12. ### pozzGuest

Il 02/01/2014 23:05, Robert Wessel ha scritto:
>> It's very similar to my algorithm except for two things.
>> You use a numerical min..max range and I use bitmask, but it is
>> equivalent (see above).
>> You don't repeat the entire process from the beginning, but you continue
>> querying remaining subranges until the last one. This is the complex
>> job I'm not able to make and it is my true question of my original post:
>> how to implement the discovery algorithm without needing to restart from
>> the beginning (and without using recursion). It's strictly a
>> programming problem, related to C language.

>
> The advantage of using ranges is that you can eliminate a found device
> from a range fairly easily, and the continue checking on either side
> of that found device.

I think it's the same for bitwise ranges.

>> I agree with you. Indeed I have chosen to use your tecnique, but now I
>> have to implement it

>
> Well, in pseudocode:
>
> struct {first, last} stack[33];
> // (stack levels need to match number of bits in serial #s + 1)
> [...]

But you are using a "custom stack", 33 elements wide! It's a lot of
memory space lost (64 * 33 = 2kB!!). I can't statically allocate that
space in my AVR8 microcontroller. It's better to shoot a discovered
slave and restart the algorithm from the beginning: more queries so more
time to complete, but much less RAM requirements.

I think your code is very similar, in terms of memory requirements, to
an analog recursive algorithm that is much shorter and simpler.

> As I mentioned, doing this several times to increase reliability may
> be a good idea if your collision detection is not 100% reliable.

Yes, it is a possibility. Anyway the discovery is manually started by
the user that can check the result. If a problem appears, the user can
restart the detection.

pozz, Jan 2, 2014
13. ### pozzGuest

Il 02/01/2014 23:55, Robert Wessel ha scritto:
>>
>> But you are using a "custom stack", 33 elements wide! It's a lot of
>> memory space lost (64 * 33 = 2kB!!). I can't statically allocate that
>> space in my AVR8 microcontroller. It's better to shoot a discovered
>> slave and restart the algorithm from the beginning: more queries so more
>> time to complete, but much less RAM requirements.

>
>
> No, it's a stack of 33 pairs of 32 bit words. IOW, 264 bytes.

Yes, you're right. I was calculating bits and not bytes. Anyway it's a
big space to allocate (I'm using ATmega32A that has 2kB of RAM).

>> I think your code is very similar, in terms of memory requirements, to
>> an analog recursive algorithm that is much shorter and simpler.

>
>
> Well, it is the basic explicit stack version of the inherently
> recursive routine. A "real" recursive routine would probably use
> considerably more than 264 bytes of stack, though (because of the need
> to the parameters). The real recursive implementation would be less
> code, though.

You're right again. If I had RAM space, I would use a "real" recursive
routine: much more simple.

> You could reduce stack usage by iteratively looking at smaller ranges
> than normally are generated for each recursion (implicit or explicit).
> For example, you might subdivide the incoming range into 256 sub
> parts, and then issue queries for those 1/256th subranges until to get
> a response (valid or invalid), and then recurse (again implicitly or
> explicitly) on that range and the remaining area to the right. That
> would reduce the maximum smaller ranged recursed on, limiting the
> stack size (and if you used 256, you'd need no more than 5* stack
> levels). Of course that would increase the number of probes by about
> a factor of 256 as well. An intermediate approach (say 16 subranges),
> would reduce stack usage to 9, at an 16 fold increase in work.
>
>
> *It might be four (or 32, in the original), but I'm too lazy to figure
> out if that's true with the simpler empty range discarding method I'm
> using.

RAM space requirements is important as well as number of probes on the
bus: the probe could take about 100ms-200ms (the timeout of the answer).
If the number of probes increase, the overall time increases as well.

I think my original implementation could be a good compromise: I will
send more probes on the bus (not too much more) than the recursive
algorithm, but I don't need memory space for stack.

pozz, Jan 3, 2014
14. ### Aleksandar KuktinGuest

On Thu, 02 Jan 2014 16:55:22 -0600, Robert Wessel wrote:

> On Thu, 02 Jan 2014 23:17:41 +0100, pozz <> wrote:
>
>>Il 02/01/2014 23:05, Robert Wessel ha scritto:
>>>> It's very similar to my algorithm except for two things.
>>>> You use a numerical min..max range and I use bitmask, but it is
>>>> equivalent (see above).
>>>> You don't repeat the entire process from the beginning, but you
>>>> continue querying remaining subranges until the last one. This is
>>>> the complex job I'm not able to make and it is my true question of my
>>>> original post:
>>>> how to implement the discovery algorithm without needing to restart
>>>> from the beginning (and without using recursion). It's strictly a
>>>> programming problem, related to C language.
>>>
>>> The advantage of using ranges is that you can eliminate a found device
>>> from a range fairly easily, and the continue checking on either side
>>> of that found device.

[snip]

>>>> I agree with you. Indeed I have chosen to use your tecnique, but now
>>>> I have to implement it
>>>
>>> Well, in pseudocode:
>>>
>>> struct {first, last} stack[33];
>>> // (stack levels need to match number of bits in serial #s + 1)
>> > [...]

>>
>>But you are using a "custom stack", 33 elements wide! It's a lot of
>>memory space lost (64 * 33 = 2kB!!). I can't statically allocate that
>>space in my AVR8 microcontroller. It's better to shoot a discovered
>>slave and restart the algorithm from the beginning: more queries so more
>>time to complete, but much less RAM requirements.

>
> No, it's a stack of 33 pairs of 32 bit words. IOW, 264 bytes.
>
>>I think your code is very similar, in terms of memory requirements, to
>>an analog recursive algorithm that is much shorter and simpler.

>
> Well, it is the basic explicit stack version of the inherently recursive
> routine. A "real" recursive routine would probably use considerably
> more than 264 bytes of stack, though (because of the need to save
> parameters). The real recursive implementation would be less code,
> though.

[snip]

Well... IMHO, this problem does not really have a good solution.
Optimization (AKA, political decision making) is in order.

Using a large amount of memory is mandatory to achieve the minimum
possible packet traffic on the net^H^H^Hbus. Memory is needed to store
the intermediate state of the probing algorithm. Discarding it (AKA using
less memory) requires regenerating the state at a subsequent point in
execution, AKA sending more packets.

Below, I propose a solution (which has not been tested and may or may not
contain fence-post errors) that uses the absolute minimum amount of
memory and discards as little of the intermediate state as possible,
thereby sending less packets than it would otherwise need to. It's also
non-recursive (in a way) and short.

uint32_t top, bottom, reusable;

#define DEFAULT_BOTTOM (~0) /* Assuming the compiler will fill the
entire field with ones. */
top = 0;
bottom = DEFAULT_BOTTOM;

do {
reusable = do_querry_bus(top, bottom);
switch (reusable) {
space where the device was detected. */
bottom = DEFAULT_BOTTOM;
break;

/* Maybe it's in the other half. */
reusable = (bottom - top) /2;

top = bottom;
bottom = bottom + reusable; /* May or may not contain
fence-post errors. */
break;

default:
bottom = bottom - ((bottom - top) /2); /* May or may not contain
fence-post errors. */
};
} while (bottom > top);
#undef DEFAULT_BOTTOM
}

Aleksandar Kuktin, Jan 3, 2014
15. ### Aleksandar KuktinGuest

On Fri, 03 Jan 2014 17:02:10 +0000, Aleksandar Kuktin wrote:

God, I'm an idiot. Or not. The unpatched code will not incur any
additional cost when more than one slave is on the bus, but it /will/
send an unneccessary packet if there is only one slave on the bus.

@@ -10,9 +10,8 @@
reusable = do_querry_bus(top, bottom);
switch (reusable) {
- space where the device was detected. */
- top = do_assign_address(top, bottom) +1;
+ top = bottom +1;
bottom = DEFAULT_BOTTOM;
break;

Aleksandar Kuktin, Jan 3, 2014
16. ### Aleksandar KuktinGuest

On Fri, 03 Jan 2014 17:12:31 +0000, Aleksandar Kuktin wrote:

> On Fri, 03 Jan 2014 17:02:10 +0000, Aleksandar Kuktin wrote:
>

>
> God, I'm an idiot. Or not. The unpatched code will not incur any
> additional cost when more than one slave is on the bus, but it /will/
> send an unneccessary packet if there is only one slave on the bus.

And it also has a halting problem when either the bus is empty or all the
slaves are very close to the beggining of the address space.

Aleksandar Kuktin, Jan 3, 2014