Parity Check

E

Ed

Hi,

I have a 36 bit bus for which I need to generate an even parity bit. The
only way I can think to do this, is to use a loop and check each bit
individually. Is that the best way to do it? Does anyone know any better
techniques for generating the parity bit?

Thanks for any help,
 
G

Georg Acher

Ed said:
Hi,

I have a 36 bit bus for which I need to generate an even parity bit. The

Smells like PCI...
only way I can think to do this, is to use a loop and check each bit
individually. Is that the best way to do it? Does anyone know any better
techniques for generating the parity bit?

The loop will work and synthesize, but depending on the tool it may get
quite slow in a chip ;-)

You can write a function which calculates the parity for 4 bits, and combine the
9 outputs of 9 functions calls in two more similar stages. The function itself
maps easily to the usual FPGA logic elements and the cascading gives a balanced
and short path for all bits.
 
D

David Bishop

Better to do it recursively

function xor_reduce (arg : std_logic_vector )
return std_logic is
variable Upper, Lower : std_logic;
variable Half : integer;
variable BUS_int : std_logic_vector ( arg'length - 1 downto 0 );
variable Result : std_logic;
begin
if (arg'LENGTH < 1) then -- In the case of a NULL range
Result := '0';
else
BUS_int := to_ux01 (arg);
if ( BUS_int'length = 1 ) then
Result := BUS_int ( BUS_int'left );
elsif ( BUS_int'length = 2 ) then
Result := BUS_int ( BUS_int'right ) xor BUS_int ( BUS_int'left);
else
Half := ( BUS_int'length + 1 ) / 2 + BUS_int'right;
Upper := xor_reduce ( BUS_int ( BUS_int'left downto Half ));
Lower := xor_reduce ( BUS_int ( Half - 1 downto BUS_int'right));
Result := Upper xor Lower;
end if;
end if;
return Result;
end;

This code is from the VHDL-200X-ft packages, and a form of it will
appear in the VHDL-2005 std_logic_1164 and numeric_std packages.
http://www.eda.org/vhdl-200x/vhdl-200x-ft/
 
K

Kai Harrekilde-Petersen

David Bishop said:
Better to do it recursively

[snip function]

I can highly recommend this. A for-loop will elaborate into a linear
chain of XOR gates in Synopsys DC, while the recursive function
elaborates directly to a balanced binary tree.

Simulation time, however, will suffer.

Regards,


Kai
 
E

Ed

Ed said:
Hi,

I have a 36 bit bus for which I need to generate an even parity bit. The
only way I can think to do this, is to use a loop and check each bit
individually. Is that the best way to do it? Does anyone know any better
techniques for generating the parity bit?

Thanks for any help,

Thanks for the replies.

Yes it is for PCI. Well spotted.

The following seems like a simple (and I guess quick) way of doing it (din
is the 36 bit bus and parity is the output):

int1 <= din(0) xor din(1);
int2 <= int1 xor din(2);
int3 <= int2 xor din(3);
int4 <= int3 xor din(4);
int5 <= int4 xor din(5);
int6 <= int5 xor din(6);
int7 <= int6 xor din(7);
int8 <= int7 xor din(8);
int9 <= int8 xor din(9);
int10 <= int9 xor din(10);
int11 <= int10 xor din(11);
int12 <= int11 xor din(12);
int13 <= int12 xor din(13);
int14 <= int13 xor din(14);
int15 <= int14 xor din(15);
int16 <= int15 xor din(16);
int17 <= int16 xor din(17);
int18 <= int17 xor din(18);
int19 <= int18 xor din(19);
int20 <= int19 xor din(20);
int21 <= int20 xor din(21);
int22 <= int21 xor din(22);
int23 <= int22 xor din(23);
int24 <= int23 xor din(24);
int25 <= int24 xor din(25);
int26 <= int25 xor din(26);
int27 <= int26 xor din(27);
int28 <= int27 xor din(28);
int29 <= int28 xor din(29);
int30 <= int29 xor din(30);
int31 <= int30 xor din(31);
int32 <= int31 xor din(32);
int33 <= int32 xor din(33);
int34 <= int33 xor din(34);
parity <= int34 xor din(35);

How does this compare to other ways of doing it? I can't see any real
disadvantage to doing it this way.

Thanks again.
 
N

Nicolas Matringe

Ed a écrit:
The following seems like a simple (and I guess quick) way of doing it (din
is the 36 bit bus and parity is the output):

int1 <= din(0) xor din(1);
int2 <= int1 xor din(2); [...]
int34 <= int33 xor din(34);
parity <= int34 xor din(35);

How does this compare to other ways of doing it? I can't see any real
disadvantage to doing it this way.

First, it's very long to type. You could have used a for... loop :eek:)
Second, this creates a cascade of gates, with a big delay. Delay from
din(0) to parity is 36 times longer than delay from din(35) to parity.
Suppose a gate delay of 1ns, you end up with 36ns which is not PCI-33
compliant.
Balanced binary trees (such as created by the recursive function given
earlier) have constant delays for every input.
 
E

Ed

Ed said:
Hi,

I have a 36 bit bus for which I need to generate an even parity bit. The
only way I can think to do this, is to use a loop and check each bit
individually. Is that the best way to do it? Does anyone know any better
techniques for generating the parity bit?

Thanks for any help,

I've tried using the recursive function, but I get the following error:

ERROR:Xst:831 - C:/XilinxTest/Test/parity_gen.vhd (Line 29). Recursion
detected in function 'xor_reduce'.

So I guess my software (ISE 4.2i) doesn't allow it.

I've synthesized the following code:

parity <= (din(35) xor din(34) xor din(33) xor din(32)) xor
(din(31) xor din(30) xor din(29) xor din(28)) xor
(din(27) xor din(26) xor din(25) xor din(24)) xor
(din(23) xor din(22) xor din(21) xor din(20)) xor
(din(19) xor din(18) xor din(17) xor din(16)) xor
(din(15) xor din(14) xor din(13) xor din(12)) xor
(din(11) xor din(10) xor din(9) xor din(8)) xor
(din(7) xor din(6) xor din(5) xor din(4)) xor
(din(3) xor din(2) xor din(1) xor din(0));

And the synthesis report shows:

Maximum combinational path delay: 10.483ns

Which I think is good enough for the 33MHz PCI bus.
 
I

ivailokroumov

Dear Ed,
Let me tell you my commentary about your question. While I read all posts
and answers of your questions I was thinking about your problem. I'll give
you some suggestions:
1. Because you use WebPack from Xilinx, you should go to the
http://www.xilinx.com/xlnx/xebiz/de...ondaryNavPick=Design+Tools&key=DS-ISE-WEBPACK
and need to download the lastest version ot the software.
2. If you wish to use cascade chacking of parity like this:
int1 <= din(0) xor din(1);
int2 <= int1 xor din(2);
int3 <= int2 xor din(3);
int4 <= int3 xor din(4);
int5 <= int4 xor din(5);
int6 <= int5 xor din(6);
int7 <= int6 xor din(7);
int8 <= int7 xor din(8);
int9 <= int8 xor din(9);
int10 <= int9 xor din(10);
int11 <= int10 xor din(11);
int12 <= int11 xor din(12);
int13 <= int12 xor din(13);
int14 <= int13 xor din(14);
int15 <= int14 xor din(15);
int16 <= int15 xor din(16);
int17 <= int16 xor din(17);
int18 <= int17 xor din(18);
int19 <= int18 xor din(19);
int20 <= int19 xor din(20);
int21 <= int20 xor din(21);
int22 <= int21 xor din(22);
int23 <= int22 xor din(23);
int24 <= int23 xor din(24);
int25 <= int24 xor din(25);
int26 <= int25 xor din(26);
int27 <= int26 xor din(27);
int28 <= int27 xor din(28);
int29 <= int28 xor din(29);
int30 <= int29 xor din(30);
int31 <= int30 xor din(31);
int32 <= int31 xor din(32);
int33 <= int32 xor din(33);
int34 <= int33 xor din(34);
parity <= int34 xor din(35);

you should know that if, you are going to put it otu of process, it will
have behaviour like paralel assignment, and believe me, you and your
compiler would be confused.

3. If you decide to use "for loop" your compiler will generate much
feedbacks, and the circuit will slowly.

4. If you decide to use recursion, the circuit will be faster, however you
shoud be 100% sure that every thing work properly.

I guess that the manner which will give you the best productiveness will
be recirsion ot the example which (e-mail address removed) (Georg Acher) gave
you.
Best Regards
Ivaylo Krumov
 
R

rickman

Ed said:
I've tried using the recursive function, but I get the following error:

ERROR:Xst:831 - C:/XilinxTest/Test/parity_gen.vhd (Line 29). Recursion
detected in function 'xor_reduce'.

So I guess my software (ISE 4.2i) doesn't allow it.

I've synthesized the following code:

parity <= (din(35) xor din(34) xor din(33) xor din(32)) xor
(din(31) xor din(30) xor din(29) xor din(28)) xor
(din(27) xor din(26) xor din(25) xor din(24)) xor
(din(23) xor din(22) xor din(21) xor din(20)) xor
(din(19) xor din(18) xor din(17) xor din(16)) xor
(din(15) xor din(14) xor din(13) xor din(12)) xor
(din(11) xor din(10) xor din(9) xor din(8)) xor
(din(7) xor din(6) xor din(5) xor din(4)) xor
(din(3) xor din(2) xor din(1) xor din(0));

And the synthesis report shows:

Maximum combinational path delay: 10.483ns

Which I think is good enough for the 33MHz PCI bus.

This looks ok to me. I belive that Georg was saying that you could
define a 4 input function that would make the code more readable and
produce the same result. I am not sure that your original code would
create anything different in the end since the logic is the same in
either case. Since the tools have the freedom to reorganize
combinatorial logic the structure of the code is not always reflected in
the resulting chip level implementation.

--

Rick "rickman" Collins

(e-mail address removed)
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX
 
M

Marcus Harnisch

Kai Harrekilde-Petersen said:
I can highly recommend this. A for-loop will elaborate into a linear
chain of XOR gates in Synopsys DC, while the recursive function
elaborates directly to a balanced binary tree.

While I agree that recursive solutions are most elegant and *will* be
synthesized into a binary tree, I found that Synopsys DC is actually
pretty good at turning loops into trees all by itself. At least for
low-level functions such as AND, OR, XOR, etc.

In fact, a couple of months ago, I synthesized an iterative XOR-reduce
function and got a nice tree of gates in return. Can't remember if I
had VHDL-Presto (Synopsys' new VHDL compiler) enabled or not and
whether that made a difference.

Best regards,
Marcus
 
K

Kai Harrekilde-Petersen

Marcus Harnisch said:
While I agree that recursive solutions are most elegant and *will* be
synthesized into a binary tree, I found that Synopsys DC is actually
pretty good at turning loops into trees all by itself. At least for
low-level functions such as AND, OR, XOR, etc.

Note that I am talking about what the VHDL code _elaborates_ into, not
what you get after synthesis. The loop and the recursive XOR_reduce()
can result in the same logic after synthesis. Back when I looked at
it, the loop approach would take much more synthesis time and effort
for DC to turn it into a binary tree, than when DC got the tree served
on a silver plate.

But if DC can now elaborate from a loop to a binary tree, that just
good. And damn well about time, I might add.

How about muxes? Does "y <= a(sel);" elaborate to a mux these days or
do I still need to write a case or a for-loop with an exit statement?

Regards,


Kai
 
M

Marcus Harnisch

Kai,

Kai Harrekilde-Petersen said:
Note that I am talking about what the VHDL code _elaborates_ into, not
what you get after synthesis.

Sorry for misreading that. My bad.
But if DC can now elaborate from a loop to a binary tree, that just
good. And damn well about time, I might add.

I didn't say that and I am pretty sure it does not. And IMHO it is not
supposed to either. The elaboration result is a fairly generic
representation of the code. If you wrote a loop, you'll basically get
it. The `compile' command, which does optimization and technology
mapping, will figure out what it thinks is best suited to achieve the
goals (constraints).
How about muxes? Does "y <= a(sel);" elaborate to a mux these days or
do I still need to write a case or a for-loop with an exit statement?

Don't know and I don't have access to DC at the moment. Never noticed
anything wrong in that respect. What did you get instead? Around which
time frame did you find that to be an issue?

Just curious. Why do you care so much about elaboration results?

Best regards,
Marcus
 
K

Kai Harrekilde-Petersen

Marcus Harnisch said:
Sorry for misreading that. My bad.


I didn't say that and I am pretty sure it does not. And IMHO it is not
supposed to either.

Why not? it's a perfectly legal representation of the logic, it costs
no extra area/cells, and will in most cases be the fastest
implementation you can get. It's only when you DON'T have equal
input-delays that the binary tree (or ternary or whatever your std
cell library gives you) is not the optimal solution.
The elaboration result is a fairly generic
representation of the code. If you wrote a loop, you'll basically get
it. The `compile' command, which does optimization and technology
mapping, will figure out what it thinks is best suited to achieve the
goals (constraints).

Yeah, but DC is not extracting all the information from the VHDL that
it could. The MUX example below shows that clearly. (I might add
that VHDL also limits the amount of information that I can convey to
DC, but I'm not flogging that beast now).
Don't know and I don't have access to DC at the moment. Never noticed
anything wrong in that respect. What did you get instead? Around which
time frame did you find that to be an issue?

My original tests were done around 2000-2001. At the time, a straight

y <= a(sel);

would not turn into a nice MUX. I forget what DC did, but it churned
out a circuit that was bigger and slower than a MUX. And it would
take the longest time to synthesize.

A for loop like this:

for i in sel'range loop
if i = sel then
y <= a(i);
end if
end loop;

will result in a chain of 2:1 MUXes. Sure, DC can turn it into a real
N:1 mux, but it would take 2-3x the synthesis time of this loop:

for i in sel'range loop
if i = sel then
y <= a(i);
exit;
end if
end loop;

(note the addition of the exit statement). That elaborates to a
single N:1 MUX in GTECH, and will take up the minimum synthesis time
and have the best area/timing parameters.
Just curious. Why do you care so much about elaboration results?

Because what the code elaborates into will dictate, to a high degree,
what DC can get out of it in the long run. Also, if I can write code
that elaborates directly to what I want, DC will spend less time in
synthesis and timing optimization.

And when it takes about 24 hours to synthesize all RTL code for a
chip, with multiple DC licenses, it's worth paying attention to, IMHO.

For example, we have blocks that are notorious for spending 2-3 hours
in the elaboration phase, and a few more in synthesis. That limits
your turn-around time and effectiveness when trying out vaious options
during the synthesis or changing the VHDL code to get better (faster
or smaller logic).

Last week a colleague and I was discussing how to make the fastest
1-complement adder that would add 32 16-bit values (TCP and IP
checksum, it anyone wonders). Turns out that the fastest
implementation we could come up with includes 16 32-bit popcount()s.
The obvious implementation is a simple for-loop, but thats results in
poor timing and long synthesis times. So, out with a bunch of
Full-adders and make a tree out of that. It takes a bit of time and
will probably be fairly unreadable, but sometimes you just got to do
it.

I've never used AutoLogic-II myself, but colleagues that have used it
tell me that it was significantly better back then, than what DC have
even today.


Regards,


Kai
 
M

Marcus Harnisch

Hello Kai,

Kai Harrekilde-Petersen said:
it's a perfectly legal representation of the logic, it costs
no extra area/cells, and will in most cases be the fastest
implementation you can get.

I completely agree. But if I write a loop, it is my damn right to get
a loop ;-) If I later tell the tool that I want something more
optimized, fine. I guess we'll have to agree to disagree in that
respect.
Yeah, but DC is not extracting all the information from the VHDL that
it could. The MUX example below shows that clearly.
A for loop like this:

for i in sel'range loop
if i = sel then
y <= a(i);
end if
end loop;

will result in a chain of 2:1 MUXes. Sure, DC can turn it into a real
N:1 mux, but it would take 2-3x the synthesis time of this loop:

for i in sel'range loop
if i = sel then
y <= a(i);
exit;
end if
end loop;

(note the addition of the exit statement).

In the first case, it is more difficult to figure out that there will
be only one matching case. All values of `i' are unique and the `if'
is supposed to know that. Never mind that the conditional operator
would have to be taken into account as well, since otherwise there
might be multiple matches despite the unique `i'. In that case the
last one wins so that we'll get a single assignment again. Except if
there is another assignment involving perhaps a variable. Not
impossible, but not trivial either.

In the second case there will be at most one matching case --
ever. Simple.
Also, if I can write code that elaborates directly to what I want,
DC will spend less time in synthesis and timing optimization.

In a lot of cases, my time coming up with super-cool
pre-obfuscated^Woptimized code, explaining everyone how that stuff is
supposed to work, verifying that not a single error has crept in,
possibly better simulation performance, etc., is worth more than a few
hours of synthesis.
So, out with a bunch of Full-adders and make a tree out of that. It
takes a bit of time and will probably be fairly unreadable, but
sometimes you just got to do it.

Yes, but I would like to remind you that my original statement
mentioned basic functions:
[...] At least for low-level functions such as AND, OR, XOR, etc.

In cases as simple as these, the time spent on coming up and verifying
a recursive solution is really not well spent.

I'd be interested if anyone here has used the generic Wallace tree
components from Designware. It looks like they are internally used by
DW itself, but might possibly be used in your own blocks. I remember
seeing these in the DW documentation but I don't have DC access.

Best regards,
Marcus
 
K

Kai Harrekilde-Petersen

Marcus Harnisch said:
I completely agree. But if I write a loop, it is my damn right to get
a loop ;-) If I later tell the tool that I want something more
optimized, fine. I guess we'll have to agree to disagree in that
respect.

After thinking about it a bit more, I actually would prefer if DC
turned the loop into an N-input XOR gate (or whatever) at the GTECH
level. And only during the optimization decide how to implement that
N-input gate. I think that would satisfy both of us at the same time
:)

[snip example]
In the first case, it is more difficult to figure out that there will
be only one matching case. All values of `i' are unique and the `if'
is supposed to know that. Never mind that the conditional operator
would have to be taken into account as well, since otherwise there
might be multiple matches despite the unique `i'. In that case the
last one wins so that we'll get a single assignment again. Except if
there is another assignment involving perhaps a variable. Not
impossible, but not trivial either.

I consider the case to be trivial, but it might not be trivial to a
compiler.
In a lot of cases, my time coming up with super-cool
pre-obfuscated^Woptimized code, explaining everyone how that stuff is
supposed to work, verifying that not a single error has crept in,
possibly better simulation performance, etc., is worth more than a few
hours of synthesis.

It's not a clear-cut case, and sometimes it's worth optimizing the
code. Sometimes it's not. I just wish that DC was smarter, so I
could spend more time on the "big issues" rather than spending time to
obfuscate everyday code snippets in order for DC to understand it.
[...] At least for low-level functions such as AND, OR, XOR, etc.

In cases as simple as these, the time spent on coming up and verifying
a recursive solution is really not well spent.

If DC had a good VHDL frontend, yes. IMHO, it doesn't have a good
VHDL frontend.
I'd be interested if anyone here has used the generic Wallace tree
components from Designware. It looks like they are internally used by
DW itself, but might possibly be used in your own blocks. I remember
seeing these in the DW documentation but I don't have DC access.

Sorry, I don't have any experience with that.


Kai
 
T

Tom Verbeure

Up to now, I've always been using the xor_reduce function that's part
of std_logic_misc. I know that this is not an official standard, but
it's there so why no use it.

Anyway, after this discussion, I wanted to see what DC actually does. I
created a block that clocks a 512 input vector, does a xor-reduce and
outputs it to a FF. I created a clock of 1 GHz (to be sure I wouldn't
meet timing) and did a synthesis for 3 cases:
1. xor_reduce of std_logic_misc
2. a simple for loop
3. the xor_reduce as suggested above.
I used a simple 'compile', so no high effort for mapping or area.

The result for case 1 and case 2 came out identical:
area: 32142
negative slack: 1.08

case 3:
area: 32627
negative slack: 1.06

The results are more or less identical. Contrary to some claims made
above, DC is smart enough NOT to make a sequential string of XOR
elements.

I did the same experiment but with only 128 inputs. In this case, the
negative slack was identical, but the area for the cases 1 and 2 was
slightly higher instead of slightly lower.

Tom
 
K

Kai Harrekilde-Petersen

Tom Verbeure said:
Up to now, I've always been using the xor_reduce function that's part
of std_logic_misc. I know that this is not an official standard, but
it's there so why no use it.

Anyway, after this discussion, I wanted to see what DC actually does. I
created a block that clocks a 512 input vector, does a xor-reduce and
outputs it to a FF. I created a clock of 1 GHz (to be sure I wouldn't
meet timing) and did a synthesis for 3 cases:
1. xor_reduce of std_logic_misc
2. a simple for loop
3. the xor_reduce as suggested above.
I used a simple 'compile', so no high effort for mapping or area.

The result for case 1 and case 2 came out identical:
area: 32142
negative slack: 1.08

case 3:
area: 32627
negative slack: 1.06

The results are more or less identical. Contrary to some claims made
above, DC is smart enough NOT to make a sequential string of XOR
elements.

Either I didn't state myself clearly enough, or you missed my point:
I've always been saying that DC _can_ do the optimizations. My point
has been that DC takes extra time to do the optimization.

Try running the synthesis without a constraint, or a clock of 100nsec,
and resynthesize. Then go looking at the synthesis result. What I
have been seeing is that in case 2, DC creates a string of XOR gates,
and hence one of the bits has a really bad timing.

Regards,


Kai
 
W

Wallclimber

Either I didn't state myself clearly enough, or you missed my point:
I've always been saying that DC _can_ do the optimizations.

Ok. I must have misunderstood.
My point has been that DC takes extra time to do the optimization.

That may be true. Though probably not relevant enough to justify the
risk of writing error prone recursive code instead of a one-liner from
a fairly standard library.
Try running the synthesis without a constraint, or a clock of 100nsec,
and resynthesize. Then go looking at the synthesis result. What I
have been seeing is that in case 2, DC creates a string of XOR gates,
and hence one of the bits has a really bad timing.

Once it meets your specified timing constraints, how can it be 'really
bad timing'? A standard ripple adder has 'really bad timing' compared
to more exotic configurations, but you won't use those until there's a
good reason to do so.

Once my design meets the timing, I care about minimum area (which, in
this case, is equivalent)

Tom
 
Joined
Oct 6, 2009
Messages
1
Reaction score
0
Hi,

I have not read all the thread, so excuse me if my post is not precise.

I think I know a simple solution for checking parity:
Why not using the XOR reduction opeartor?

input [35:0] din;
wire parity_bit = ^din;

or, check here:
paritycheck.wordpress.com/2007/06/27/coding-even-parity-checking-logic/

Good luck,
Yassen
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,141
Latest member
BlissKeto
Top