Rotate by variable

Patrick Moore

Hi all,

sorry to drop in like this but I'm having a problem and thought I may be
able to gain some information from you all...

I'm trying to achieve a variable rotate as below, i.e. it takes in two
numbers, one 32 bits long, the other 5 bits long, and outputs the 32 bit
number, rotated left by the amount given in the 5 bit number.

Now, this will compile fine (i.e. it's syntactically correct) but can't be
synthesised in Synplify Pro.

Does anyone have any suggestions, or code snippets that would be able to
make this synthesisable for a Virtex II (Pro)?

tia,

patrick.



library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ro_lft is
  port(
    quantity : in STD_LOGIC_VECTOR(31 downto 0);
    amount   : in STD_LOGIC_VECTOR(4 downto 0);
    clk      : in STD_LOGIC;
    output   : out STD_LOGIC_VECTOR(31 downto 0)
  );
end ro_lft;

--}} End of automatically maintained section

architecture ro_lft of ro_lft is

  signal rotated : STD_LOGIC_VECTOR(31 downto 0);
  --signal result: integer;
  --signal a_std_vec : std_logic_vector (31 downto 0);
  signal rotate_by : std_logic_vector(4 downto 0);

begin

  -- enter your statements here --

  rotate_by <= amount;

  process (clk)
  begin
    if clk = '1' and clk'event then
      -- the rotate count is interpreted as a signed value here
      rotated <= std_logic_vector(unsigned(quantity) rol
                 to_integer(signed(rotate_by)));
    end if;
  end process;

  output <= rotated;

end ro_lft;
 
Egbert Molenkamp

In your case the rotate operator (ROL) can rotate data to the left or to
the right, depending on the sign of 'rotate_by', because the count is
converted through signed.
As an alternative you could try the functions that are part of the
numeric_std package:
shift_right and shift_left

I changed your code a little bit (it only shifts to the right). If your tool
supports this, you can extend it to shift to the left as well as the right.



library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ro_lft is
  port(
    quantity : in unsigned(31 downto 0);
    amount   : in unsigned(4 downto 0);
    clk      : in STD_LOGIC;
    output   : out STD_LOGIC_VECTOR(31 downto 0)
  );
end ro_lft;

--}} End of automatically maintained section

architecture ro_lft of ro_lft is

  signal rotated   : STD_LOGIC_VECTOR(31 downto 0);
  signal rotate_by : unsigned(4 downto 0);

begin

  -- enter your statements here --

  rotate_by <= amount;

  process (clk)
  begin
    if clk = '1' and clk'event then
      rotated <=
        std_logic_vector(shift_right(quantity, to_integer(unsigned(rotate_by))));
    end if;
  end process;

  output <= rotated;

end ro_lft;


Egbert Molenkamp
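
For a true rotate rather than a shift, numeric_std also provides rotate_left and rotate_right, which take an unsigned value and a natural count. Below is a minimal sketch, assuming the same ports as the original post; the entity name ro_lft_alt is made up for illustration. Converting the count through unsigned keeps it in the range 0 to 31, so the direction cannot flip. Whether a particular synthesis tool version accepts a variable count here still has to be tried.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name; ports follow the original post.
entity ro_lft_alt is
  port(
    quantity : in  std_logic_vector(31 downto 0);
    amount   : in  std_logic_vector(4 downto 0);
    clk      : in  std_logic;
    output   : out std_logic_vector(31 downto 0)
  );
end ro_lft_alt;

architecture rtl of ro_lft_alt is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- unsigned(amount) is always 0..31, so this only ever rotates left
      output <= std_logic_vector(
                  rotate_left(unsigned(quantity), to_integer(unsigned(amount))));
    end if;
  end process;
end rtl;
```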
 
Jonathan Bromley

Patrick Moore said:
I'm trying to achieve a variable rotate as below, i.e. it takes in two
numbers, one 32 bits long, the other 5 bits long, and outputs the 32 bit
number, rotated left by the amount given in the 5 bit number.

Now, this will compile fine (i.e. it's syntactically correct) but can't be
synthesised in Synplify Pro.

Can we ALL please bang VERY LOUDLY on Synplicity's door until they
finally support variable-length shifts?

Patrick Moore said:
Does anyone have any suggestions, or code snippets that would be able to
make this synthesisable for a Virtex II (Pro)?

Two suggestions. Both work, but you need to try them out in your
synthesis tool - we've found that this barrel shift construct is
one which shows up some very big differences among tools.
You may need to tweak the details to suit your precise definition
of a barrel shift - mine rotates to the right.

For the sake of argument, let's suppose we have the following
entity:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity barrel_shift is
  port (
    A : in  std_logic_vector(31 downto 0);  -- input word
    S : in  std_logic_vector(4 downto 0);   -- rotate count
    F : out std_logic_vector(31 downto 0)   -- rotated output
  );
end;

(1) Consider a funnel shifter. Stick two copies of the original
word together, then pick bits off it.

architecture funnel of barrel_shift is
begin
  process (A, S)
    variable double: std_logic_vector(63 downto 0);
  begin
    double := A & A; -- two copies of the input word
    for i in 0 to 31 loop
      F(i) <= double(to_integer(unsigned(S)) + i);
    end loop;
  end process;
end;


This depends on the tool correctly optimising the subscript
calculation by unrolling the loop.

(2) Use variable-length slices:

architecture slice of barrel_shift is
begin
  F <= A(to_integer(unsigned(S)) - 1 downto 0) &
       A(31 downto to_integer(unsigned(S)));
end;

I have seen this work correctly in some tools.
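
Since the thread stresses trying these out per tool, a small self-checking testbench may help on the simulation side. This is only a sketch: it assumes barrel_shift and its funnel architecture have been compiled into library work with the usual ieee library clauses, the testbench name and the test pattern are made up, and the reference value comes from numeric_std's rotate_right.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench name.
entity barrel_shift_tb is
end;

architecture sim of barrel_shift_tb is
  signal A : std_logic_vector(31 downto 0) := x"12345678";  -- arbitrary pattern
  signal S : std_logic_vector(4 downto 0)  := (others => '0');
  signal F : std_logic_vector(31 downto 0);
begin
  -- Change the architecture name in brackets to test a different one.
  dut : entity work.barrel_shift(funnel)
    port map (A => A, S => S, F => F);

  check : process
  begin
    for n in 0 to 31 loop
      S <= std_logic_vector(to_unsigned(n, 5));
      wait for 10 ns;  -- let the combinational output settle
      assert F = std_logic_vector(rotate_right(unsigned(A), n))
        report "mismatch at rotate count " & integer'image(n)
        severity error;
    end loop;
    report "barrel_shift check finished";
    wait;
  end process;
end;
```

This only confirms simulation behaviour; it says nothing about what a synthesis tool will build.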

HTH
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * Perl * Tcl/Tk * Verification * Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, Hampshire, BH24 1AW, UK
Tel: +44 (0)1425 471223 mail: (e-mail address removed)
Fax: +44 (0)1425 471573 Web: http://www.doulos.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
Patrick Moore

Jonathan Bromley said:
architecture funnel of barrel_shift is
begin
  process (A, S)
    variable double: std_logic_vector(63 downto 0);
  begin
    double := A & A; -- two copies of the input word
    for i in 0 to 31 loop
      F(i) <= double(to_integer(unsigned(S)) + i);
    end loop;
  end process;
end;

This one works, and even Synplify likes it, which is saying something.

What's the best way to reverse the rotate on this? (make it rotate left)

I guess doing a

temp := 32-S

and

F(i)<= double(to_integer(unsigned(temp)) + i);

Would do it, but I'm unsure of the fine tuning? :S (or if there's a
better way around it...)

As you can tell, I'm not very experienced in this whole thing, but
hopefully over time ...

Jonathan Bromley said:
This depends on the tool correctly optimising the subscript
calculation by unrolling the loop.

(2) Use variable-length slices:

architecture slice of barrel_shift is
begin
  F <= A(to_integer(unsigned(S)) - 1 downto 0) &
       A(31 downto to_integer(unsigned(S)));
end;

I have seen this work correctly in some tools.

Unfortunately, Synplify Pro 7.2 doesn't like that one.. ;/

Jonathan Bromley said:
HTH

It did, thanks. :)

atb,

Patrick
 
Jonathan Bromley

Patrick Moore said:
This one works, and even Synplify likes it

It's the version that I have found to be most portable
across synthesis tools. You may possibly get a harmless
warning about one of the bits of "double" being unused.

Patrick Moore said:
What's the best way to reverse the rotate on this?
(make it rotate left)

Reverse the order of bit numbering in all the vector
declarations - ports and internal signals A, F, double:
std_logic_vector(0 to 31) etc. Other code unchanged.
Unconventional but effective.
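
The renumbering described above can be sketched as follows. The entity name barrel_shift_left is made up for illustration, and S is left as (4 downto 0) since only A, F and double were named. With ascending ranges, index 0 is the leftmost bit, so taking F(i) from position S + i moves the word to the left.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name; same ports as barrel_shift, numbered 0 to 31.
entity barrel_shift_left is
  port (
    A : in  std_logic_vector(0 to 31);     -- input word, bit 0 is leftmost
    S : in  std_logic_vector(4 downto 0);  -- rotate count
    F : out std_logic_vector(0 to 31)      -- rotated output
  );
end;

architecture funnel of barrel_shift_left is
begin
  process (A, S)
    variable double : std_logic_vector(0 to 63);
  begin
    double := A & A;  -- two copies of the input word
    for i in 0 to 31 loop
      -- F(i) takes the bit S places to its right, so the word moves left
      F(i) <= double(to_integer(unsigned(S)) + i);
    end loop;
  end process;
end;
```

When this is connected to signals declared (31 downto 0), port association matches elements left to left, so the most significant bit stays the most significant bit.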

Patrick Moore said:
I guess doing a

temp := 32-S

and

F(i)<= double(to_integer(unsigned(temp)) + i);

Would do it, but I'm unsure of the fine tuning

That, or something very close, sounds OK. But I would get
a bit jumpy about doing any unnecessary arithmetic on the
shift value; it would be OK in simulation but you really,
really don't want any adders in the select path. Hence
my preference for getting the left shift by re-numbering bits.
It's probably fine either way in most tools, but I've been
bitten with such things before now.
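
For reference, here is one way the "fine tuning" of the subtraction could look. It is a sketch only, and the architecture name funnel_left is made up. The literal 32 does not fit in 5 bits, so the sketch computes 0 - S in 5-bit unsigned arithmetic, which wraps to (32 - S) mod 32 and keeps S = 0 as "no rotate". It does put a subtractor in the select path, which is exactly the concern raised above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical architecture name; uses the barrel_shift entity from earlier.
architecture funnel_left of barrel_shift is
begin
  process (A, S)
    variable double : std_logic_vector(63 downto 0);
    variable temp   : unsigned(4 downto 0);
  begin
    double := A & A;             -- two copies of the input word
    temp   := 0 - unsigned(S);   -- (32 - S) mod 32, wraps within 5 bits
    for i in 0 to 31 loop
      -- rotating right by (32 - S) mod 32 is the same as rotating left by S
      F(i) <= double(to_integer(temp) + i);
    end loop;
  end process;
end;
```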

Patrick Moore said:
As you can tell, I'm not very experienced in this whole thing, but
hopefully over time ...

It's the usual engineering thing: there's no substitute for
lots of experience and plenty of paranoia.

Patrick Moore said:
(2) Use variable-length slices:
[...]
Unfortunately, Synplify Pro 7.2 doesn't like that one.. ;/

Don't say I didn't warn you :) In fairness, that one is a
bit hard on synthesis tools; it implies data paths whose
widths vary as a function of one of the input values, which
doesn't sound very sensible. Only when you stick the two
variable-width pieces together can the tool discover that
the result is of constant width.

 
Egbert Molenkamp

Jonathan Bromley said:
Can we ALL please bang VERY LOUDLY on Synplicity's door until they
finally support variable-length shifts?

I did .. and just received the answer that the issue is solved in 7.5 and
that the version is available for download.

Egbert Molenkamp
 
Jonathan Bromley

Egbert Molenkamp said:
I did .. and just received the answer that the issue is solved in 7.5 and
that the version is available for download.

Yippee! Thanks Egbert.

 
ALuPin

Hi,

I tried to compile your code with
Altera QuartusII v3.0 SP2

and got the following warning:

Warning: VHDL Subtype or Type Declaration warning at numeric_std.vhd(878):
subtype or type has null range Switching left and right bound of range.

What does that mean?

Rgds
A.Vazquez
 
Jonathan Bromley

ALuPin said:
Hi,

I tried to compile your code with
Altera QuartusII v3.0 SP2

and got the following warning:

Warning: VHDL Subtype or Type Declaration warning at
numeric_std.vhd(878): subtype or type has null range

That's a completely reasonable warning, but VHDL is
supposed to allow null ranges. Many tools issue a warning,
but then go on to process the null range correctly. However,
Quartus says....
Switching left and right bound of range.

AARGH!!!! This is absurd. Given

signal s: std_logic_vector (7 downto 0);

* the range (0 downto 0) is just a single bit s(0)
* the range (0 downto 1) is a null range - no bits at all
* the range (1 downto 0) is two bits wide

Therefore, switching the left and right bound is
completely unacceptable - it changes the meaning of
the code in a disastrous way.

ALuPin said:
What does that mean?

It means Quartus is doing something that it has no right to do.

See my other replies in this thread for a different solution
that does not require null ranges.
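
The three ranges listed above can be checked in a simulator with a few assertions on the slice lengths. This is a simulation-only sketch with a made-up entity name; as noted in this thread, some tools will warn about the null slice, and some synthesis tools may reject it outright.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity name; simulation only.
entity null_range_demo is
end;

architecture sim of null_range_demo is
  signal s : std_logic_vector(7 downto 0) := (others => '0');
begin
  process
  begin
    assert s(0 downto 0)'length = 1 report "expected one bit"  severity error;
    assert s(0 downto 1)'length = 0 report "expected no bits"  severity error;
    assert s(1 downto 0)'length = 2 report "expected two bits" severity error;
    report "null range check finished";
    wait;
  end process;
end;
```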
 
ALuPin

Apart from that I get the info message
"No valid register-to-register paths exist for clock Clk"

What is going wrong with the timing calculation?

Rgds
 
Ray Andraka

Jonathan,

Which tools allow a null range? Last time I tried it (admittedly a
while back), Either synplify or modelsim (or both) choked on null
ranges, so I have avoided them.


--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email (e-mail address removed)
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
 
Ray Andraka

The problem with the funnel shifter is that it tends to generate a shifter implemented as a big
mux, which is inefficient (it is O(n)). A merged tree structure is O(log(n)),
but I've yet to see synthesis do it from a single layer definition. The merged
tree implementation splits the shift into layers of 2:1 muxes, each layer either
shifting by a power of 2 or passing the input unchanged. You can use a generate
statement to do the construction with an inferred 2:1 mux in each iteration, and
that works with any synth tool that supports the generate statement. Some
synths also need a little extra guidance in the form of keep buffers to prevent
'optimization' destroying the structure.
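
A minimal sketch of the layered construction described above, written as an extra architecture for the barrel_shift entity posted earlier in the thread. The architecture and signal names are made up, it rotates right to match the funnel version, and any keep attributes a given tool might need are left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical architecture name; uses the barrel_shift entity from earlier.
architecture log_tree of barrel_shift is
  type stage_array is array (0 to 5) of std_logic_vector(31 downto 0);
  signal stage : stage_array;  -- stage(k) is the word entering layer k
begin
  stage(0) <= A;

  layers : for k in 0 to 4 generate
    bits : for i in 0 to 31 generate
      -- One 2:1 mux per bit: layer k rotates right by 2**k when S(k) = '1',
      -- otherwise it passes the word through unchanged.
      stage(k + 1)(i) <= stage(k)((i + 2**k) mod 32) when S(k) = '1'
                         else stage(k)(i);
    end generate;
  end generate;

  F <= stage(5);
end;
```

This gives five layers of 32 two-input muxes. Because each layer is controlled by one bit of S, the per-layer rotations of 1, 2, 4, 8 and 16 add up to the full count.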




 
Jonathan Bromley

Ray Andraka said:
Jonathan,

Which tools allow a null range? Last time I tried it (admittedly a
while back), Either synplify or modelsim (or both) choked on null
ranges, so I have avoided them.

No simulator should ever object, because null ranges are in the
LRM. ModelSim is fine with them, though it issues a warning
(as it's entitled to do). There was some funny business
between VHDL-87 and VHDL-93 about concatenations, which may
have got some people worried about the effect of null ranges.

The synthesis situation is trickier.

Leo Spectrum is OK with null ranges, Synopsys DC (last time I
checked) couldn't cope. This is pretty wild, since
ieee.numeric_std includes some null range definitions -
just goes to show that the synth tools rely on built-in
versions of the standard libraries, not the source code.
I haven't ever checked with Synplify, I don't think.

Regards
 
Jonathan Bromley

Ray Andraka said:
The problem with this is it tends to generate a shifter implemented as a big
mux, which is inefficient

Yes.

Ray Andraka said:
(it is O(n))

I guess you mean its delay - there's another O(n) in the size,
because the whole affair has n outputs. I'm assuming you
mean (n) is the number of bits in the data word, so that
log2(n) is the number of bits in the shift count.

However, I don't quite follow - surely each individual mux is
implemented in a tree-like way, O(log(n)), by the synth tool?
Leo Spectrum, which I use most often, does essentially this,
but fiddles around quite a lot to make good use of the limited
fanin on Xilinx LUTs. I synth'd my "funnel shift" design
just now, and got propagation delay closely following O(log(n)).

Ray Andraka said:
A merged tree structure is O(log(n)),
but I've yet to see synthesis do it from a single layer
definition.

My synth runs today showed the area scaling just a little
faster than O(n.log(n)), presumably because there's some
logic replication going on to reduce fanout. AIUI your
merged tree would go exactly as O(n.log(n)), unless the
selector bits were buffered in some way for the same reason.

Ray Andraka said:
[...]
You can use a generate
statement to do the construction with an inferred 2:1 mux
in each iteration

Yes - I've never had cause to do this, but it's a nice approach.

Just for the record, here's what I got from a "dumb" synth run
on Leonardo (a rather elderly version, 2002e.16) targeting a
Spartan 2s15-6. Delays DON'T include I/O pad delays, and are
synth tool estimates rather than the actual values after P&R.
Experience suggests that Leo Spectrum is slightly pessimistic
in most of its timing estimates.

=========================================================
Select  Data   Area    Delay   Critical logic path
bits    bits   (LUTs)  (ns)    LUTs   MUXFs   XORCYs
=========================================================
3       8      23      4.1     3      0       0
4       16     66      5.5     4      0       0
5       32     189     7.2     3      3       1
6       64     448     9.7     7      0       0
7       128    1016    11.2    8      0       0
=========================================================

I have absolutely no idea why synth chose to build its odd
arrangement of MUXF4/5s and XORCYs in the N=32 case.

I am in no doubt that you could do better than
this, Ray, but I confess that I'm fairly impressed that
you can get reasonable behaviour from synthesising such
a no-brainer VHDL implementation.
 
Ray Andraka

I was referring to the amount of logic, which relates to the delay. If
the barrel shift is constructed the way I suggested, you'll get:

========================
Select  Data   Area
bits    bits   (LUTs)
========================
3       8      24
4       16     64
5       32     160
6       64     384
7       128    896
========================

The area for a merged tree is O(n log n), whereas a separate multiplexer for
each output has an area of O(n^2). The merged tree reuses subterms in the tree,
whereas a straight tree does not. The optimal merged tree uses 2 input muxes, which
'wastes' some of the lut inputs. It looks like your synthesis run did combine
some of the terms, enough that the smaller trees wound up close to the same
size as the optimal tree. As the shifter gets larger though, the synthesized
utilization grows faster than the optimal implementation, which is exactly as
I would expect. The merged tree with 2:1 muxes has fan-in and fan-out of 2 at
each node. The mux controls do have a high fanout, but they can easily be
duplicated to speed things up. The merged tree implementation is also very
easy to pipeline all the way down to one lut between each pipeline stage,
which makes for very high speed pipelined implementations.
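
As a rough sketch of the pipelining mentioned above, the same layers can each be given a register, so that only one 2:1 mux sits between pipeline stages. The entity, architecture and signal names are all made up for illustration, and the rotate direction is right. The select value travels down the pipeline with the data so that each layer sees the bit belonging to its own word.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity name.
entity barrel_shift_pipe is
  port (
    clk : in  std_logic;
    A   : in  std_logic_vector(31 downto 0);  -- input word
    S   : in  std_logic_vector(4 downto 0);   -- rotate count
    F   : out std_logic_vector(31 downto 0)   -- valid 5 clocks after A and S
  );
end;

architecture pipelined of barrel_shift_pipe is
  type data_array is array (0 to 5) of std_logic_vector(31 downto 0);
  type sel_array  is array (0 to 5) of std_logic_vector(4 downto 0);
  signal data : data_array;
  signal sel  : sel_array;
begin
  data(0) <= A;
  sel(0)  <= S;

  layers : for k in 0 to 4 generate
    process (clk)
    begin
      if rising_edge(clk) then
        sel(k + 1) <= sel(k);  -- carry the count along with its data word
        for i in 0 to 31 loop
          if sel(k)(k) = '1' then
            data(k + 1)(i) <= data(k)((i + 2**k) mod 32);  -- rotate right by 2**k
          else
            data(k + 1)(i) <= data(k)(i);                  -- pass through
          end if;
        end loop;
      end if;
    end process;
  end generate;

  F <= data(5);
end;
```

The whole count is carried through every stage here for simplicity; a tool would normally trim the bits that later stages never read.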

The results with the straight mux are also highly dependent on the synthesis
tool. Some do better than others at combining the terms.

 
Jonathan Bromley

Ray Andraka said:
The area for a merged tree is O(nlogn), where a multiplexer for each output
has an area of O(n^2).

Sorry, I was being dense. Of course you're right that a single
mux has size O(n).

Ray Andraka said:
The merged tree reuses subterms in the tree, where a
straight tree does not. The optimal merged tree uses 2 input muxes, which
'wastes' some of the lut inputs. It looks like your synthesis run did combine
some of the terms, enough that the smaller trees wound up close to the same
size as the optimal tree.

Yes; and it worked quite hard to make use of more than 3 inputs on each LUT.
Of course, that breaks the symmetry...

Ray Andraka said:
As the shifter gets larger though, the synthesized
utilization grows faster than the optimal implementation, which is exactly as
I would expect. The merged tree with 2:1 muxes has fan-in and fan-out of 2 at
each node. The mux controls do have a high fanout, but they can easily be
duplicated to speed things up. The merged tree implementation is also very
easy to pipeline all the way down to one lut between each pipeline stage,
which makes for very high speed pipelined implementations.

This last point is obviously very important.
If your VHDL code implies registers between the stages, I guess
there will be no need to apply don't-touch attributes to the muxes.

Ray Andraka said:
The results with the straight mux are also highly dependent on the synthesis
tool. Some do better than others at combining the terms.

Yes - I was expressing mild surprise that Leo Spectrum was getting
so close to the "best" answer. As you say, it merges at least some
of the subtrees.

Thanks,
 
Ray Andraka

I too am mildly surprised. I've seen some pretty gawdawful stuff come out with
a straight mux put in.

The syn_keeps aren't necessary with the pipeline. In my case, I have a VHDL
component that not only generates the optimal tree, it also places it. That one
is hard to beat with inferred code because of the lousy job the placer does
with multiple levels of LUTs.

 
Allan Herriman

Jonathan Bromley said:
No simulator should ever object, because null ranges are in the
LRM. ModelSim is fine with them, though it issues a warning
(as it's entitled to do). There was some funny business
between VHDL-87 and VHDL-93 about concatenations, which may
have got some people worried about the effect of null ranges.

The synthesis situation is trickier.

Leo Spectrum is OK with null ranges, Synopsys DC (last time I
checked) couldn't cope. This is pretty wild, since
ieee.numeric_std includes some null range definitions -
just goes to show that the synth tools rely on built-in
versions of the standard libraries, not the source code.
I haven't ever checked with Synplify, I don't think.

Synplify will issue an error if it sees a null range. (I haven't used
it for a while though; perhaps things have improved in the last six
months.)

Regards,
Allan.
 
Ray Andraka

That was my experience as well. I don't think that has been fixed either.


 
