Multiple inputs adder

MariuszK · Jul 17, 2006

Hello,

I am looking for effective vhdl multiple input adders.
Inputs count could be up to 1024 inputs!
Requirements:
- adder should be full parameterized (inputs count, adder words width,
etc.)
- adder should be optimized from viewpoint of time (frequency) and
resources
- adder should be pipelined

Currently, I want build next tree structure.
a0+a1+a2+a3+a4+a5+a6+a7+......a1023 =

a0+a1 a2+a3 a4+a5 a6 +a7
a01+a23 a45+a67
a0123+a4567
a01234567 + ...
.........................................................
a0123....1023

This architecture for 1024 input adder will be included
- 1023 two inputs adder
- log2(1024)=10 pipeline stages

Thank you for any answer.

Best Regards
Mariusz

Andy · Jul 17, 2006

It would help to know how you get the input data (all at once, in order
one operand per clock, random order one operand per clock, etc.)

If you can only get one input per clock anyway, a simple accumulator
(sum = sum + input) will be as high performance as you can get (the sum
pops out 1 clock after the last input).

The only time you'd need a binary tree type of implementation is if you
have all inputs available at once (doubtful for 1024 inputs).

Most synthesis tools are pretty good at turning a straight sum into a
tree anyway:

a+b+c+d => (a+b)+(c+d)

But there are certainly limits as to how far they can go (1024
inputs?).

Andy

MariuszK · Jul 17, 2006

Andy napisal(a):

It would help to know how you get the input data (all at once, in order
one operand per clock, random order one operand per clock, etc.)

If you can only get one input per clock anyway, a simple accumulator
(sum = sum + input) will be as high performance as you can get (the sum
pops out 1 clock after the last input).

The only time you'd need a binary tree type of implementation is if you
have all inputs available at once (doubtful for 1024 inputs).

Most synthesis tools are pretty good at turning a straight sum into a
tree anyway:

a+b+c+d => (a+b)+(c+d)

But there are certainly limits as to how far they can go (1024
inputs?).

Andy

I will have all input data per clock - all at once. I want calculate
this sum as fast as possible and with pipeline "possibility".

By the way. How can I force synthesis tools to change straight sum into
a tree calculation.
Is that conversion made automatically? = a+b+c+d => (a+b)+(c+d)
Additionally that "tree calculation" in my opinion should be made
in few clocks tact to calculate i.e. 1024 inputs.
Is pipeline architecture will made automatically by synthesis tools??

Mariusz

Jonathan Bromley · Jul 18, 2006

I will have all input data per clock - all at once. I want calculate
this sum as fast as possible and with pipeline "possibility".

By the way. How can I force synthesis tools to change straight sum into
a tree calculation.

Consider a recursive function or a recursive generate.

Basically, to calculate sum(array):
if array has one element ->
result = that element,
else ->
result = sum(left half of array) + sum(right half of array)

Some care is required to get everything exactly right if the
array size is not an exact power of 2. Extreme care is
required to get the bit widths of the sums right.

Is that conversion made automatically? = a+b+c+d => (a+b)+(c+d)

Yes, for small calculations; probably not, for huge ones.

Additionally that "tree calculation" in my opinion should be made
in few clocks tact to calculate i.e. 1024 inputs.

If you use a straightforward tree it will have 1023 adders over
10 levels. So it would be very easy to pipeline it over 10 clocks,
or perhaps over 5 clocks if you can squeeze two adds into
one clock cycle.

Depending on your target technology, other structures
such as a Wallace tree may be more efficient than a
simple tree of adders. It's harder to parameterise, though.

Is pipeline architecture will made automatically by synthesis tools??

Not usually, although the better tools can do some pipelining if
you give them appropriate hints. Note also that, unless you
have a behavioural synth tool, you must describe the delays that
will be introduced by the pipelining - not necessarily in the
right place (the tool will move the delay registers into the
pipeline where they will do the most good) but you must
get the right number of cycles of delay.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
(e-mail address removed)
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

MariuszK · Jul 18, 2006

Hello,

Depending on your target technology, other structures
such as a Wallace tree may be more efficient than a
simple tree of adders. It's harder to parameterise, though.

My target platform is Xilinx.

Wallace tree structure could be used for multiple input adder???
I use Wallace tree for multiplication, only.
Could you send me any information (links, sample) how can I built
wallace tree structure for multiple input adder? What others structure
(except simple tree adder) could be used to build multiple input adder?

Thanks
Mariusz

Four Bit Adder Help For ALU	4	Nov 13, 2009
Use of index	0	Jul 12, 2008
geting error as unxpected symbol read: ". in array initialization	0	Mar 27, 2016
XML W3C Schema, check excistence of attribute depending on otherattributes	2	Aug 30, 2008
Question on List	2	Jun 27, 2008
what is often before a pattern ?	2	Jun 8, 2006
retriving escape unicode sequences from files ...	1	Aug 4, 2012
retriving escape unicode sequences from files ...	1	Aug 4, 2012

Multiple inputs adder

MariuszK

Andy

MariuszK

Jonathan Bromley

MariuszK

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads