Optimize a discovery algorithm

Discussion in 'C Programming' started by pozz, Jan 1, 2014.

  1. pozz

    pozz Guest

    I'm programming a small 8-bit microcontroller (AVR from Atmel). This is
    the single master in a 2-wires RS485 network where N slaves (AVR based)
    are connected. In RS485 networks, only one transmitter can be active at
    a time, otherwise the transmitted characters will arrive corrupted to
    the receivers. Moreover, the characters transmitted from the active
    transmitter are received by all receivers (all other nodes in the
    network that are listening if they aren't transmitting).

    I want to design and implement a discovery technique: the master should
    be able to understand how many slaves are connected and assign them a
    different address for further communication.
    I can use the 32-bits serial number stored in a non volatile memory in
    the slaves and master (each node on the bus has a different serial number).

    The discovery algorithm I was thinking of is based on the following
    consideration: if two or more slaves answer to a query originated by the
    master, two cases could happen.

    1. the answer messages are separated in time, so they arrive
    uncorrupted to the master;
    2. two or more answer messages overlap, so they could arrive
    corrupted to the master.

    The master is able to detect a corrupted message (through some
    mechanisms, such as CRC).

    The algorithm I'm implementing is based on binary search: the master
    sends a broadcast query with an interval of serial numbers. All the
    slaves with a serial number in the interval answers to the master. As
    soon as the master receives a valid answer, it assign a new address to
    the slave. When a slave receives a new address, it stops responding to
    discovery queries from the master.

    A test code for the algorithm is:

    uint32_t rx_sn;

    int
    autodetect(void)
    {
    uint32_t mask = 0;
    uint32_t value = 0;
    uint32_t vbit = 0;
    uint32_t mbit = 0;
    int result = 0;
    while (result == 0) {
    int r = query(mask, value);
    if (r == ANSWER_OK) {
    assign_new_address(rx_sn);
    mask = value = 0;
    vbit = 0;
    mbit = 0;
    continue;
    } else if (r == NO_ANSWER) {
    if (vbit == 0) {
    if (mask == 0) {
    result = 1; /* No more slaves to detect */
    } else {
    value |= mbit;
    vbit = 1;
    continue;
    }
    } else {
    result = -1; /* Error */
    }
    } else { /* r == GARBAGE */
    if (mask == UINT32_MAX) {
    result = -2; /* Error */
    } else {
    if (mbit) mbit <<= 1; else mbit = 1;
    mask |= mbit;
    vbit = 0;
    }
    }
    }
    return result;
    }

    The function query() simulates the transmitting on the bus of the
    message "Are there any slave with serial numbers in the range defined by
    this mask and this value?". The return value of query() could be
    ANSWER_OK (just one slave has responded with a correct answer),
    NO_ANSWER (no slave has responded at all), GARBAGE (two or more slave
    have responded so a corrupted answer is received).
    If the result is ANSWER_OK, the serial number of the slave that has
    responded is stored in global variable rx_sn.

    The function assign_new_address() simulates the transmitting on the bus
    of the message "The slave with this serial number will use this address
    for further communication. From now on, it won't answer anymore to
    discovery queries".

    The interval of serial numbers is defined by a 32-bits MASK and a
    32-bits VALUE: a serial number SN is in the range if

    (SN & MASK) == VALUE

    If garbage is received, the algorithm tries to reduce the interval
    fixing the value of the least significant bit, first to 0 and after to
    1. If more than one slave answer, the successive left bit is fixed.

    I think this algorithm works, but it is slow and not optimized. As you
    can note, when a new slave is discovered, it starts from the beginning.
    This approach will loose some time to sends the same query many times.

    Suppose to have three slaves on the network with serial numbers
    00000001, 00000003 and 00000007. The previous algorithm will produce 12
    queries:

    1. mask=00000000 value=00000000: GARBAGE
    2. mask=00000001 value=00000000: NO ANSWER
    3. mask=00000001 value=00000001: GARBAGE
    4. mask=00000003 value=00000001: ANSWER OK
    Assigning new address for SN 0x00000001
    5. mask=00000000 value=00000000: GARBAGE
    6. mask=00000001 value=00000000: NO ANSWER
    7. mask=00000001 value=00000001: GARBAGE
    8. mask=00000003 value=00000001: NO ANSWER
    9. mask=00000003 value=00000003: GARBAGE
    10. mask=00000007 value=00000003: ANSWER OK
    Assigning new address for SN 0x00000003
    11. mask=00000000 value=00000000: ANSWER OK
    Assigning new address for SN 0x00000007
    12. mask=00000000 value=00000000: NO ANSWER

    The query 2 has no answer, so it's a lost of time to send again during
    the algorithm (see query 6): it will never be answered.

    I think this could be solved avoiding to restart the algorithm from the
    beginning every time a new slave is found. I have the impression it
    could be optimized with recursion, but I can't use recursion on AVR
    small microcontroller (stack space is very small). I know every
    recursive algorithm can be written without recursion, so it could be my
    solution.

    Is someone suggest an optimization of my technique?

    Thank you.
    pozz, Jan 1, 2014
    #1
    1. Advertising

  2. pozz

    pozz Guest

    Il 02/01/2014 03:05, Robert Wessel ha scritto:
    > I've posted this before in comp.arch.embedded,


    Yes, I remember it. The thread you wrote in was originated always from
    me. I learned many things reading that thread in comp.arch.embedded,
    but now I've problem in implementing the algorithm in C.


    > but don't use a mask
    > and value, use a range instead:


    With mask and value you don't have a mathematical range (min..max), but
    always a group of serial numbers (serial numbers that have certain bits
    in certain positions). Both tecnique are identical, I think.

    I hope binary operations (AND) on 32-bits variables will be much more
    simple for 8-bit core than mathematical comparisons..


    > It's only necessary to be able to reasonably reliably detect that
    > someone has answered. The proper way to do the query is to specify a
    > range in which you'd like devices to respond. For example, let's say
    > you have a sixteen bit serial number (too short, but it keeps the
    > example short). The first "Are You There" query would specify
    > 0x0000-0xffff, so any device would respond. If the host got no
    > response, it's done. If it got a good response* (say 0x1234: "I'm
    > Here"), then it would record that and then re-query for the ranges
    > left and right of the response (IOW, 0x000-0x1233 and 0x1235-0xffff).
    > If it got a gibberish response (a collision), it would simply split
    > the range in two (IOW, a binary search) and re-query those subranges
    > (IOW, 0x0000-0x7fff and 0x8000-0xffff). The process repeats until all
    > queried subranges return nothing.


    It's very similar to my algorithm except for two things.
    You use a numerical min..max range and I use bitmask, but it is
    equivalent (see above).
    You don't repeat the entire process from the beginning, but you continue
    querying remaining subranges until the last one. This is the complex
    job I'm not able to make and it is my true question of my original post:
    how to implement the discovery algorithm without needing to restart from
    the beginning (and without using recursion). It's strictly a
    programming problem, related to C language.


    > The performance is usually only slightly worse than linear with the
    > number of devices to find, although you can construct pathological
    > cases where it's O(n*lg(m)), where m is the number of possible serial
    > numbers. So much longer serial numbers have only modest impact.
    >
    > Repeating the whole process several times can help compensate for less
    > than perfect collision detection.
    >
    > If you don't have a reasonable chance of detecting a collision, the
    > problem is much harder.
    >
    >
    > *Either because that was the only device in the subrange, or it
    > "overpowered" all the other devices in the subrange, and the host got
    > a good message from it.


    I agree with you. Indeed I have chosen to use your tecnique, but now I
    have to implement it :)
    pozz, Jan 2, 2014
    #2
    1. Advertising

  3. pozz

    pozz Guest

    Il 02/01/2014 02:51, Sam ha scritto:
    > On 01/01/2014 23:51, pozz wrote:
    >> I'm programming a small 8-bit microcontroller (AVR from Atmel). This is
    >> the single master in a 2-wires RS485 network where N slaves (AVR based)
    >> are connected. In RS485 networks, only one transmitter can be active at
    >> a time, otherwise the transmitted characters will arrive corrupted to
    >> the receivers. Moreover, the characters transmitted from the active
    >> transmitter are received by all receivers (all other nodes in the
    >> network that are listening if they aren't transmitting).
    >>
    >> I want to design and implement a discovery technique:

    >
    > FYI there was a thread in comp.arch.embedded in mid-December (e.g.
    > around 18th) called: "Re: RS485 bus and auto-addressing". You might
    > find some value in reading that one - even if that thread doesn't
    > completely match your intended algorithm (e.g. that was discussing
    > address allocation, whereas your slave devices seem to have pre-defined
    > addresses), some of the issues described there might be relevant.


    Yes, I know very well that thread, because it was originated by me.

    I learned many things by reading the answers to my post, and now I want
    to implement the technique described here, inspired from the thread in
    comp.arch.embedded.

    Now the problem is strictly related to programming: how to optimize the
    algorithm I have chosen, so reducing the number of queries on the bus
    that are time consuming, without using recursion.
    pozz, Jan 2, 2014
    #3
  4. pozz

    pozz Guest

    Il 02/01/2014 03:05, Robert Wessel ha scritto:
    > I've posted this before in comp.arch.embedded, but don't use a mask
    > and value, use a range instead:


    Just for your curiosity. I was in doubt between your technique
    (transformed with bitmask operations as I'm trying to do) and another
    approach.

    Because collision detection is problematic in RS485 bus, slaves answer
    with a break character to queries and not with a data message. If there
    are two or more slaves that answer, the break character is normally
    detected by the master as there was just one slave.

    This approach is more reliable for sure, but it costs much more queries
    to detect the exact serial number. For example, if I have just one
    slave on the bus, with your (and mine) approach a single query is
    sufficient. With break characters I need about 32 queries, until I
    remain with a one-element range.

    I think our approach is good if auto-detection is controlled by the
    user: a new slave is installed, the user push a button to start
    auto-detection and check the result, otherwise it could restart again
    the process.

    The second approach could be better in situations where the
    auto-detection isn't initiated by a user but automatically with a
    regular frequency, it isn't user controlled (the result isn't checked by
    a user) and the time needed to discover the new slave isn't critical.
    pozz, Jan 2, 2014
    #4
  5. On 02.01.2014 08:51, pozz wrote:

    > I hope binary operations (AND) on 32-bits variables will be much more
    > simple for 8-bit core than mathematical comparisons..


    Definitely not. I'd guess exactly the same amount of clock cycles per
    operation (four instructions and one conditional branch). You'll see it
    if you look at the assembly output via objdump.

    Regards,
    Johannes

    --
    >> Wo hattest Du das Beben nochmal GENAU vorhergesagt?

    > Zumindest nicht öffentlich!

    Ah, der neueste und bis heute genialste Streich unsere großen
    Kosmologen: Die Geheim-Vorhersage.
    - Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$>
    Johannes Bauer, Jan 2, 2014
    #5
  6. pozz

    pozz Guest

    Il 02/01/2014 10:23, Johannes Bauer ha scritto:
    >> I hope binary operations (AND) on 32-bits variables will be much more
    >> simple for 8-bit core than mathematical comparisons..

    >
    > Definitely not. I'd guess exactly the same amount of clock cycles per
    > operation (four instructions and one conditional branch). You'll see it
    > if you look at the assembly output via objdump.


    You are right... maybe bitmask operations are even less efficient.

    Anyway the optimization problem is the same... or similar.
    pozz, Jan 2, 2014
    #6
  7. Johannes Bauer <> writes:
    > On 02.01.2014 08:51, pozz wrote:
    >> I hope binary operations (AND) on 32-bits variables will be much more
    >> simple for 8-bit core than mathematical comparisons..

    >
    > Definitely not. I'd guess exactly the same amount of clock cycles per
    > operation (four instructions and one conditional branch). You'll see it
    > if you look at the assembly output via objdump.


    I'm not so sure about that.

    If the CPU can only perform 8-bit operations (I have no idea whether
    that's the case or not), it can do a 32-bit AND, OR, or XOR in 4 8-bit
    operations. For arithmetic operations, it has to deal with carry bits
    for addition and subtraction, and more complex interactions for
    multiplication and division.

    Bitwise operations are, on a small scale, "embarassingly parallel".
    https://en.wikipedia.org/wiki/Embarassingly_parallel

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Working, but not speaking, for JetHead Development, Inc.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Jan 2, 2014
    #7
  8. pozz

    Willem Guest

    Keith Thompson wrote:
    ) Johannes Bauer <> writes:
    )> On 02.01.2014 08:51, pozz wrote:
    )>> I hope binary operations (AND) on 32-bits variables will be much more
    )>> simple for 8-bit core than mathematical comparisons..
    )>
    )> Definitely not. I'd guess exactly the same amount of clock cycles per
    )> operation (four instructions and one conditional branch). You'll see it
    )> if you look at the assembly output via objdump.
    )
    ) I'm not so sure about that.
    )
    ) If the CPU can only perform 8-bit operations (I have no idea whether
    ) that's the case or not), it can do a 32-bit AND, OR, or XOR in 4 8-bit
    ) operations. For arithmetic operations, it has to deal with carry bits
    ) for addition and subtraction, and more complex interactions for
    ) multiplication and division.

    The addition and subtraction operations on 8-bit CPU's (or any CPU for that
    matter) already takes the carry bits into account, so 'it has to deal with
    carry bits' has been built into the hardware.


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
    Willem, Jan 2, 2014
    #8
  9. Keith Thompson <> wrote:
    > Johannes Bauer <> writes:
    >> On 02.01.2014 08:51, pozz wrote:
    >>> I hope binary operations (AND) on 32-bits variables will be much more
    >>> simple for 8-bit core than mathematical comparisons..


    >> Definitely not. I'd guess exactly the same amount of clock cycles
    >> per operation (four instructions and one conditional branch).
    >> You'll see it if you look at the assembly output via objdump.


    > I'm not so sure about that.


    > If the CPU can only perform 8-bit operations (I have no idea whether
    > that's the case or not), it can do a 32-bit AND, OR, or XOR in 4 8-bit
    > operations. For arithmetic operations, it has to deal with carry bits
    > for addition and subtraction, and more complex interactions for
    > multiplication and division.


    This is why some people like litte-endian systems. In this particular
    case, of you execute four successive operations, each using the carry
    generated by the previous (you also have a carry-in if needed)
    You only need to increment an address, never decrement.

    But past the 6502, I haven't been convinced. Note that the 6502
    doesn't push the address of the following instruction on the stack
    in the case of a subroutine call. (It hasn't computes the address
    yet.)

    But for anything past the 6502, and pretty much anything with
    multiply, the advantage pretty much goes away. Well, if you
    want the full description "Blaauw and Brooks" in their book
    on computer architecture, explain it.

    > Bitwise operations are, on a small scale, "embarassingly parallel".
    > https://en.wikipedia.org/wiki/Embarassingly_parallel


    If you are doing the operations sequentially, though, they aren't
    parallel anyway.

    -- glen
    glen herrmannsfeldt, Jan 2, 2014
    #9
  10. pozz

    pozz Guest

    Il 02/01/2014 17:38, Keith Thompson ha scritto:
    > Johannes Bauer <> writes:
    >> On 02.01.2014 08:51, pozz wrote:
    >>> I hope binary operations (AND) on 32-bits variables will be much more
    >>> simple for 8-bit core than mathematical comparisons..

    >>
    >> Definitely not. I'd guess exactly the same amount of clock cycles per
    >> operation (four instructions and one conditional branch). You'll see it
    >> if you look at the assembly output via objdump.

    >
    > I'm not so sure about that.
    >
    > If the CPU can only perform 8-bit operations (I have no idea whether
    > that's the case or not), it can do a 32-bit AND, OR, or XOR in 4 8-bit
    > operations. For arithmetic operations, it has to deal with carry bits
    > for addition and subtraction, and more complex interactions for
    > multiplication and division.
    >
    > Bitwise operations are, on a small scale, "embarassingly parallel".
    > https://en.wikipedia.org/wiki/Embarassingly_parallel


    Those are similar arguments I have to use bitwise operations. I tried
    the following instructions with avr-gcc compiler (it's a compiler for
    AVR8 core) and the result is the following.

    eecfg.x, eecfg.y and eecfg.z are defined as uint32_t.

    In the "bitwise method", the test is:

    if ((eecfg.x & eecfg.y) == eecfg.z) {
    13e: 40 91 da 00 lds r20, 0x00DA
    142: 50 91 db 00 lds r21, 0x00DB
    146: 60 91 dc 00 lds r22, 0x00DC
    14a: 70 91 dd 00 lds r23, 0x00DD
    14e: 80 91 d6 00 lds r24, 0x00D6
    152: 90 91 d7 00 lds r25, 0x00D7
    156: a0 91 d8 00 lds r26, 0x00D8
    15a: b0 91 d9 00 lds r27, 0x00D9
    15e: 48 23 and r20, r24
    160: 59 23 and r21, r25
    162: 6a 23 and r22, r26
    164: 7b 23 and r23, r27
    166: 80 91 de 00 lds r24, 0x00DE
    16a: 90 91 df 00 lds r25, 0x00DF
    16e: a0 91 e0 00 lds r26, 0x00E0
    172: b0 91 e1 00 lds r27, 0x00E1
    176: 48 17 cp r20, r24
    178: 59 07 cpc r21, r25
    17a: 6a 07 cpc r22, r26
    17c: 7b 07 cpc r23, r27
    17e: 09 f0 breq .+2 ; 0x182 <__stack+0x23>

    In the "mathematical method", the test is:

    if ((eecfg.x >= eecfg.y) && (eecfg.x <= eecfg.z)) {
    13e: 40 91 d6 00 lds r20, 0x00D6
    142: 50 91 d7 00 lds r21, 0x00D7
    146: 60 91 d8 00 lds r22, 0x00D8
    14a: 70 91 d9 00 lds r23, 0x00D9
    14e: 80 91 da 00 lds r24, 0x00DA
    152: 90 91 db 00 lds r25, 0x00DB
    156: a0 91 dc 00 lds r26, 0x00DC
    15a: b0 91 dd 00 lds r27, 0x00DD
    15e: 48 17 cp r20, r24
    160: 59 07 cpc r21, r25
    162: 6a 07 cpc r22, r26
    164: 7b 07 cpc r23, r27
    166: 08 f4 brcc .+2 ; 0x16a <__stack+0xb>
    168: a4 c0 rjmp .+328 ; 0x2b2 <__stack+0x153>
    16a: 80 91 de 00 lds r24, 0x00DE
    16e: 90 91 df 00 lds r25, 0x00DF
    172: a0 91 e0 00 lds r26, 0x00E0
    176: b0 91 e1 00 lds r27, 0x00E1
    17a: 84 17 cp r24, r20
    17c: 95 07 cpc r25, r21
    17e: a6 07 cpc r26, r22
    180: b7 07 cpc r27, r23
    182: 08 f4 brcc .+2 ; 0x186 <__stack+0x27>


    21 instructions for bitwise comparison versus 23 instructions for
    mathematical comparison. There isn't a good reason to choose one method
    for reducing instructions.
    pozz, Jan 2, 2014
    #10
  11. Willem <> writes:
    > Keith Thompson wrote:
    > ) Johannes Bauer <> writes:
    > )> On 02.01.2014 08:51, pozz wrote:
    > )>> I hope binary operations (AND) on 32-bits variables will be much more
    > )>> simple for 8-bit core than mathematical comparisons..
    > )>
    > )> Definitely not. I'd guess exactly the same amount of clock cycles per
    > )> operation (four instructions and one conditional branch). You'll see it
    > )> if you look at the assembly output via objdump.
    > )
    > ) I'm not so sure about that.
    > )
    > ) If the CPU can only perform 8-bit operations (I have no idea whether
    > ) that's the case or not), it can do a 32-bit AND, OR, or XOR in 4 8-bit
    > ) operations. For arithmetic operations, it has to deal with carry bits
    > ) for addition and subtraction, and more complex interactions for
    > ) multiplication and division.
    >
    > The addition and subtraction operations on 8-bit CPU's (or any CPU for that
    > matter) already takes the carry bits into account, so 'it has to deal with
    > carry bits' has been built into the hardware.


    Fair enough; multi-word addition and subtraction (where a "word" is the
    size that can be handled by a CPU instruction) are likely to be as fast
    as multi-word bitwise operations, assuming that single-word ADD and XOR,
    for example, take the same number of clock cycles.

    Multiplication and division are another story.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Working, but not speaking, for JetHead Development, Inc.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Jan 2, 2014
    #11
  12. pozz

    pozz Guest

    Il 02/01/2014 23:05, Robert Wessel ha scritto:
    >> It's very similar to my algorithm except for two things.
    >> You use a numerical min..max range and I use bitmask, but it is
    >> equivalent (see above).
    >> You don't repeat the entire process from the beginning, but you continue
    >> querying remaining subranges until the last one. This is the complex
    >> job I'm not able to make and it is my true question of my original post:
    >> how to implement the discovery algorithm without needing to restart from
    >> the beginning (and without using recursion). It's strictly a
    >> programming problem, related to C language.

    >
    > The advantage of using ranges is that you can eliminate a found device
    > from a range fairly easily, and the continue checking on either side
    > of that found device.


    I think it's the same for bitwise ranges.


    >> I agree with you. Indeed I have chosen to use your tecnique, but now I
    >> have to implement it :)

    >
    > Well, in pseudocode:
    >
    > struct {first, last} stack[33];
    > // (stack levels need to match number of bits in serial #s + 1)
    > [...]


    But you are using a "custom stack", 33 elements wide! It's a lot of
    memory space lost (64 * 33 = 2kB!!). I can't statically allocate that
    space in my AVR8 microcontroller. It's better to shoot a discovered
    slave and restart the algorithm from the beginning: more queries so more
    time to complete, but much less RAM requirements.

    I think your code is very similar, in terms of memory requirements, to
    an analog recursive algorithm that is much shorter and simpler.


    > As I mentioned, doing this several times to increase reliability may
    > be a good idea if your collision detection is not 100% reliable.


    Yes, it is a possibility. Anyway the discovery is manually started by
    the user that can check the result. If a problem appears, the user can
    restart the detection.
    pozz, Jan 2, 2014
    #12
  13. pozz

    pozz Guest

    Il 02/01/2014 23:55, Robert Wessel ha scritto:
    >>
    >> But you are using a "custom stack", 33 elements wide! It's a lot of
    >> memory space lost (64 * 33 = 2kB!!). I can't statically allocate that
    >> space in my AVR8 microcontroller. It's better to shoot a discovered
    >> slave and restart the algorithm from the beginning: more queries so more
    >> time to complete, but much less RAM requirements.

    >
    >
    > No, it's a stack of 33 pairs of 32 bit words. IOW, 264 bytes.


    Yes, you're right. I was calculating bits and not bytes. Anyway it's a
    big space to allocate (I'm using ATmega32A that has 2kB of RAM).


    >> I think your code is very similar, in terms of memory requirements, to
    >> an analog recursive algorithm that is much shorter and simpler.

    >
    >
    > Well, it is the basic explicit stack version of the inherently
    > recursive routine. A "real" recursive routine would probably use
    > considerably more than 264 bytes of stack, though (because of the need
    > to save registers, local variables, and return addresses in addition
    > to the parameters). The real recursive implementation would be less
    > code, though.


    You're right again. If I had RAM space, I would use a "real" recursive
    routine: much more simple.


    > You could reduce stack usage by iteratively looking at smaller ranges
    > than normally are generated for each recursion (implicit or explicit).
    > For example, you might subdivide the incoming range into 256 sub
    > parts, and then issue queries for those 1/256th subranges until to get
    > a response (valid or invalid), and then recurse (again implicitly or
    > explicitly) on that range and the remaining area to the right. That
    > would reduce the maximum smaller ranged recursed on, limiting the
    > stack size (and if you used 256, you'd need no more than 5* stack
    > levels). Of course that would increase the number of probes by about
    > a factor of 256 as well. An intermediate approach (say 16 subranges),
    > would reduce stack usage to 9, at an 16 fold increase in work.
    >
    >
    > *It might be four (or 32, in the original), but I'm too lazy to figure
    > out if that's true with the simpler empty range discarding method I'm
    > using.


    RAM space requirements is important as well as number of probes on the
    bus: the probe could take about 100ms-200ms (the timeout of the answer).
    If the number of probes increase, the overall time increases as well.

    I think my original implementation could be a good compromise: I will
    send more probes on the bus (not too much more) than the recursive
    algorithm, but I don't need memory space for stack.
    pozz, Jan 3, 2014
    #13
  14. On Thu, 02 Jan 2014 16:55:22 -0600, Robert Wessel wrote:

    > On Thu, 02 Jan 2014 23:17:41 +0100, pozz <> wrote:
    >
    >>Il 02/01/2014 23:05, Robert Wessel ha scritto:
    >>>> It's very similar to my algorithm except for two things.
    >>>> You use a numerical min..max range and I use bitmask, but it is
    >>>> equivalent (see above).
    >>>> You don't repeat the entire process from the beginning, but you
    >>>> continue querying remaining subranges until the last one. This is
    >>>> the complex job I'm not able to make and it is my true question of my
    >>>> original post:
    >>>> how to implement the discovery algorithm without needing to restart
    >>>> from the beginning (and without using recursion). It's strictly a
    >>>> programming problem, related to C language.
    >>>
    >>> The advantage of using ranges is that you can eliminate a found device
    >>> from a range fairly easily, and the continue checking on either side
    >>> of that found device.


    [snip]

    >>>> I agree with you. Indeed I have chosen to use your tecnique, but now
    >>>> I have to implement it :)
    >>>
    >>> Well, in pseudocode:
    >>>
    >>> struct {first, last} stack[33];
    >>> // (stack levels need to match number of bits in serial #s + 1)
    >> > [...]

    >>
    >>But you are using a "custom stack", 33 elements wide! It's a lot of
    >>memory space lost (64 * 33 = 2kB!!). I can't statically allocate that
    >>space in my AVR8 microcontroller. It's better to shoot a discovered
    >>slave and restart the algorithm from the beginning: more queries so more
    >>time to complete, but much less RAM requirements.

    >
    > No, it's a stack of 33 pairs of 32 bit words. IOW, 264 bytes.
    >
    >>I think your code is very similar, in terms of memory requirements, to
    >>an analog recursive algorithm that is much shorter and simpler.

    >
    > Well, it is the basic explicit stack version of the inherently recursive
    > routine. A "real" recursive routine would probably use considerably
    > more than 264 bytes of stack, though (because of the need to save
    > registers, local variables, and return addresses in addition to the
    > parameters). The real recursive implementation would be less code,
    > though.


    [snip]

    Well... IMHO, this problem does not really have a good solution.
    Optimization (AKA, political decision making) is in order.

    Using a large amount of memory is mandatory to achieve the minimum
    possible packet traffic on the net^H^H^Hbus. Memory is needed to store
    the intermediate state of the probing algorithm. Discarding it (AKA using
    less memory) requires regenerating the state at a subsequent point in
    execution, AKA sending more packets.

    Below, I propose a solution (which has not been tested and may or may not
    contain fence-post errors) that uses the absolute minimum amount of
    memory and discards as little of the intermediate state as possible,
    thereby sending less packets than it would otherwise need to. It's also
    non-recursive (in a way) and short.

    void alloc_addr(void) {
    uint32_t top, bottom, reusable;

    #define DEFAULT_BOTTOM (~0) /* Assuming the compiler will fill the
    entire field with ones. */
    top = 0;
    bottom = DEFAULT_BOTTOM;

    do {
    reusable = do_querry_bus(top, bottom);
    switch (reusable) {
    case ANSWER_OK:
    /* Assuming do_assign_address() returns the point in the address
    space where the device was detected. */
    top = do_assign_address(top, bottom) +1;
    bottom = DEFAULT_BOTTOM;
    break;

    case ANSWER_NONE:
    /* Maybe it's in the other half. */
    reusable = (bottom - top) /2;

    top = bottom;
    bottom = bottom + reusable; /* May or may not contain
    fence-post errors. */
    break;

    case ANSWER_COLLISION:
    default:
    bottom = bottom - ((bottom - top) /2); /* May or may not contain
    fence-post errors. */
    };
    } while (bottom > top);
    #undef DEFAULT_BOTTOM
    }
    Aleksandar Kuktin, Jan 3, 2014
    #14
  15. On Fri, 03 Jan 2014 17:02:10 +0000, Aleksandar Kuktin wrote:

    > void alloc_addr(void)


    God, I'm an idiot. Or not. The unpatched code will not incur any
    additional cost when more than one slave is on the bus, but it /will/
    send an unneccessary packet if there is only one slave on the bus.

    @@ -10,9 +10,8 @@
    reusable = do_querry_bus(top, bottom);
    switch (reusable) {
    case ANSWER_OK:
    - /* Assuming do_assign_address() returns the point in the address
    - space where the device was detected. */
    - top = do_assign_address(top, bottom) +1;
    + do_assign_address(top, bottom);
    + top = bottom +1;
    bottom = DEFAULT_BOTTOM;
    break;
    Aleksandar Kuktin, Jan 3, 2014
    #15
  16. On Fri, 03 Jan 2014 17:12:31 +0000, Aleksandar Kuktin wrote:

    > On Fri, 03 Jan 2014 17:02:10 +0000, Aleksandar Kuktin wrote:
    >
    >> void alloc_addr(void)

    >
    > God, I'm an idiot. Or not. The unpatched code will not incur any
    > additional cost when more than one slave is on the bus, but it /will/
    > send an unneccessary packet if there is only one slave on the bus.


    And it also has a halting problem when either the bus is empty or all the
    slaves are very close to the beggining of the address space.
    Aleksandar Kuktin, Jan 3, 2014
    #16
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Chris

    dynamic discovery ???

    Chris, Nov 12, 2004, in forum: ASP .Net
    Replies:
    0
    Views:
    408
    Chris
    Nov 12, 2004
  2. Replies:
    45
    Views:
    1,065
    Seth Breidbart
    Feb 16, 2006
  3. Ahmed Moustafa
    Replies:
    0
    Views:
    775
    Ahmed Moustafa
    Nov 15, 2003
  4. Richard Cavell
    Replies:
    4
    Views:
    451
    Simon Slavin
    Feb 15, 2005
  5. Bapaiah Katepalli
    Replies:
    1
    Views:
    1,495
    Mike Treseler
    Jun 23, 2006
Loading...

Share This Page