Marcus Harnisch said:
> Sorry for misreading that. My bad.
> I didn't say that and I am pretty sure it does not. And IMHO it is not
> supposed to either.
Why not? It's a perfectly legal representation of the logic, it costs
no extra area/cells, and will in most cases be the fastest
implementation you can get. It's only when you DON'T have equal
input-delays that the binary tree (or ternary or whatever your std
cell library gives you) is not the optimal solution.
> The elaboration result is a fairly generic
> representation of the code. If you wrote a loop, you'll basically get
> it. The `compile' command, which does optimization and technology
> mapping, will figure out what it thinks is best suited to achieve the
> goals (constraints).
Yeah, but DC is not extracting all the information from the VHDL that
it could. The MUX example below shows that clearly. (I might add
that VHDL also limits the amount of information that I can convey to
DC, but I'm not flogging that beast now).
> Don't know and I don't have access to DC at the moment. Never noticed
> anything wrong in that respect. What did you get instead? Around which
> time frame did you find that to be an issue?
My original tests were done around 2000-2001. At the time, a straight
    y <= a(sel);
would not turn into a nice MUX. I forget what DC did, but it churned
out a circuit that was bigger and slower than a MUX. And it would
take the longest time to synthesize.
A for loop like this:
    for i in a'range loop
        if i = sel then
            y <= a(i);
        end if;
    end loop;
will result in a chain of 2:1 MUXes. Sure, DC can turn it into a real
N:1 mux, but it would take 2-3x the synthesis time of this loop:
    for i in a'range loop
        if i = sel then
            y <= a(i);
            exit;
        end if;
    end loop;
(note the addition of the exit statement). That elaborates to a
single N:1 MUX in GTECH, and will take up the minimum synthesis time
and have the best area/timing parameters.
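For anyone who wants to try the comparison, here is a minimal self-contained sketch of the exit-loop variant. The entity name mux_n, the generic N and the choice of an integer-typed sel are assumptions made for illustration and are not from the original tests; what it elaborates to will depend on the tool and version.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: the name, generic and port types are illustrative only.
entity mux_n is
  generic (
    N : positive := 8                        -- number of data inputs (assumed)
  );
  port (
    a   : in  std_logic_vector(N - 1 downto 0);
    sel : in  natural range 0 to N - 1;      -- integer select, so a(sel) is legal
    y   : out std_logic
  );
end entity mux_n;

architecture rtl of mux_n is
begin
  process (a, sel)
  begin
    y <= '0';                                -- default assignment, avoids a latch
    for i in a'range loop
      if i = sel then
        y <= a(i);
        exit;                                -- leave the loop at the single match
      end if;
    end loop;
  end process;
end architecture rtl;
```

Removing the exit line gives the first loop variant, so the two can be elaborated side by side.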
> Just curious. Why do you care so much about elaboration results?
Because what the code elaborates into will dictate, to a high degree,
what DC can get out of it in the long run. Also, if I can write code
that elaborates directly to what I want, DC will spend less time in
synthesis and timing optimization.
And when it takes about 24 hours to synthesize all RTL code for a
chip, with multiple DC licenses, it's worth paying attention to, IMHO.
For example, we have blocks that are notorious for spending 2-3 hours
in the elaboration phase, and a few more in synthesis. That limits
your turn-around time and effectiveness when trying out various options
during synthesis or changing the VHDL code to get better (faster
or smaller) logic.
Last week a colleague and I were discussing how to make the fastest
one's-complement adder that would add 32 16-bit values (TCP and IP
checksum, if anyone wonders). Turns out that the fastest
implementation we could come up with includes 16 32-bit popcount()s.
The obvious implementation is a simple for-loop, but that results in
poor timing and long synthesis times. So, out with a bunch of
full adders to make a tree out of them. It takes a bit of time and
will probably be fairly unreadable, but sometimes you've just got to do
it.
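To make the pieces concrete, here is a rough sketch of the building blocks mentioned above: the "obvious" for-loop popcount, a single full adder of the kind one would wire into a hand-made tree, and a one's-complement add of two 16-bit values with end-around carry. The package and subprogram names are hypothetical, and this is not the actual implementation discussed, only the textbook form of each piece.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package; all names are illustrative.
package csum_sketch_pkg is
  -- Baseline popcount: count the '1' bits with a plain loop.
  function popcount_loop (v : std_logic_vector) return natural;
  -- One full adder (3:2 compressor), the building block of a manual tree.
  procedure full_add (a, b, c : in std_logic; s, co : out std_logic);
  -- One's-complement sum of two 16-bit values (end-around carry).
  function ones_add (x, y : unsigned(15 downto 0)) return unsigned;
end package csum_sketch_pkg;

package body csum_sketch_pkg is

  function popcount_loop (v : std_logic_vector) return natural is
    variable n : natural range 0 to v'length := 0;
  begin
    for i in v'range loop
      if v(i) = '1' then
        n := n + 1;
      end if;
    end loop;
    return n;
  end function popcount_loop;

  procedure full_add (a, b, c : in std_logic; s, co : out std_logic) is
  begin
    s  := a xor b xor c;                        -- sum bit
    co := (a and b) or (a and c) or (b and c);  -- carry (majority) bit
  end procedure full_add;

  function ones_add (x, y : unsigned(15 downto 0)) return unsigned is
    variable s : unsigned(16 downto 0);
    variable r : unsigned(15 downto 0);
  begin
    s := resize(x, 17) + resize(y, 17);         -- s(16) is the carry out
    r := s(15 downto 0);
    if s(16) = '1' then
      r := r + 1;                               -- fold the carry back in
    end if;
    return r;
  end function ones_add;

end package body csum_sketch_pkg;
```

The loop version is the readable one; the tree version replaces the running count with layers of full_add calls that compress three bits of equal weight into a sum bit and a carry bit.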
I've never used AutoLogic-II myself, but colleagues who have used it
tell me that it was significantly better back then than what DC has
even today.
Regards,
Kai