Fast Counter

Jessica Shaw

Hi,

I need a 700 MHz to 800 MHz synchronous 16-bit counter. The counter
will have Start, Stop and Reset pins.

Reset will initialize the counter to zero. Start will let the counter
run on each rising edge of the 700 or 800 MHz clock. Stop will halt
the counter so the user can read the value.

I do not know

1. What FPGA or CPLD will be able to do this task at the
above-mentioned high frequency?
2. Do I need a PLL inside the FPGA or CPLD to produce such a clock?
3. How can I generate this kind of clock?

Any advice will be appreciated.

jess
 
Bart Fox

Hi,

Why is it difficult?
Why is it difficult to build and drive a car with 1000 km/h (620 mph)?
There are physical limits.
In both cases.

regards,
Bart
 
Jessica Shaw

Hi,

ok, so you can get a jet engine car but it will be difficult to drive
it on the road. So, what are the difficulties with making such an
FPGA?

Second, what should be the good solution?

jess
 
KJ

Hi,

jess said:
ok, so you can get a jet engine car but it will be difficult to drive
it on the road. So, what are the difficulties with making such an
FPGA?

The short answer is that there are tradeoffs that are based on
technology as well as market demand. The FPGA companies' customers
make use of capabilities that are unique to FPGAs and are willing to
compromise on things like top speed or low power consumption.

Providing things like re-programmability and a fairly generic pool of
logic that can implement basically any arbitrary function (which FPGAs
do quite well) doesn't come 'free'. Historically, the price to be paid
has been a higher piece price, slower operation and more power
consumption than you would see if you had the resources and business
case to develop a custom single-function part. Without
getting into a debate about the merits of each cost, suffice it to say
that there is sufficient market demand for such programmable products
to keep companies in business and profitable.

There are many niches that one can play in the programmable logic
world and be profitable. Some of these niches involve providing lower
power or higher performance than some other company's FPGA. However,
within each niche product, you'll find something that you can't do (or
can't do well) with that part that you can with some other part.
Before there were even FPGAs, there were PLDs, which provided much the
same type of functionality but were blazingly fast compared to those
first FPGAs...but again, there were tradeoffs, notably the amount of
logic that could be implemented in a single device.

So, in the end, if you're a user of an FPGA, it really doesn't matter
"what are the difficulties with making such a FPGA" as you asked.
Your job is to find the FPGA that has the right set of features for
your application.
jess said:
Second, what should be the good solution?
Before there can be a solution, there must first be a full discovery
of what the constraints really are, so nobody here will be able to
confidently offer up what will be a 'good' solution for you. You can
get possible solutions that happen to work for you, but since we don't
know your constraints we don't really know if any proposed solution
would really be 'good'.

For example, you didn't state any latency requirement on when the
count must be valid relative to the assertion of 'stop'. If there is
none, then one can play the simple game of having four counters
running on different phases of 200 MHz. At the end, simply add the
value of the four counters to get the final result.
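As a quick, hypothetical sanity check on that arithmetic (a Python sketch, not anything from the posts): if tick i of the fast clock is steered to phase counter i mod 4, the per-phase totals always add back up to the original count.

```python
# Hypothetical sketch: how many ticks land on each of four phase counters.
# Tick i (counting from 0) is handled by phase i % 4.
def per_phase_counts(ticks):
    return [(ticks + 3 - p) // 4 for p in range(4)]

print(per_phase_counts(10))       # [3, 3, 2, 2]
print(sum(per_phase_counts(10)))  # 10
```

Each phase counter only ever holds about a quarter of the total, which is why narrower counters suffice.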

This approach would obviously take more logic than a hypothetical
single 16-bit counter, but since you have not stated any logic
resource constraints for the counter, one would have no idea of
whether or not this approach is 'good' for you. So logic resources are
another possible constraint.

Counters do not have to be binary: a 16-bit LFSR will run quite fast,
but then it requires interpretation of the output in order to figure
out what the binary equivalent value is...maybe that's OK in your
application. So counting sequence is another possible constraint.
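To illustrate the LFSR idea, here is a behavioral Python sketch (a model under assumptions, not HDL; 0xB400 is the commonly cited maximal-length tap mask for a 16-bit Galois LFSR): the register steps through all 65535 non-zero states, and a decode table translates the final state back into a binary count.

```python
# Behavioral model of a 16-bit Galois LFSR used as a counter.
# The states are not in binary order, so reading the count back
# requires a decode step (here, a lookup table).

TAPS = 0xB400  # maximal-length tap mask (taps 16, 14, 13, 11)

def lfsr_step(state):
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state

# Build the decode table once: state -> number of steps from the seed.
decode = {}
s = 1  # any non-zero seed
for i in range(65535):
    decode[s] = i
    s = lfsr_step(s)

# "Count" 1000 ticks, then interpret the final state.
s = 1
for _ in range(1000):
    s = lfsr_step(s)
print(decode[s])  # 1000
```

The LFSR's next-state logic is just a shift and a few XORs, which is why it closes timing far more easily than a 16-bit ripple-carry increment.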

There likely is a programmable part that can implement a full 16-bit
binary counter in minimal resources but maybe the cost is too high and
it makes the product not profitable so that part can't be used.

As you can see, there are likely all kinds of constraints that one may
not necessarily realize up front. It is up to you to understand your
function and performance goals and the constraints that you must live
within, and to come up with the optimal solution...that's what is known as
engineering. In short, you have to look at tradeoffs.

Kevin Jennings
 
Hi

If you implement the two least significant bits as a Johnson counter, you will "only" need to implement a 14-bit 200 MHz counter for the rest of the bits.

You're welcome
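A behavioral Python sketch of that suggestion (hypothetical, just to show the bookkeeping): the two LSBs step through a 2-bit Johnson (switch-tail) sequence, in which only one flip-flop toggles per tick, and a 14-bit counter carries the rest.

```python
# 2-bit Johnson counter sequence: 00 -> 10 -> 11 -> 01 -> 00 ...
# Only one flip-flop toggles per tick, which eases high-speed timing.
JOHNSON_SEQ = [(0, 0), (1, 0), (1, 1), (0, 1)]
DECODE = {state: i for i, state in enumerate(JOHNSON_SEQ)}  # state -> 0..3

def count(n_ticks):
    johnson = (0, 0)   # the two "fast" LSBs
    upper = 0          # 14-bit counter, bumped once per Johnson wrap
    for _ in range(n_ticks):
        q1, q0 = johnson
        johnson = (1 - q0, q1)          # shift with inverted feedback
        if johnson == (0, 0):           # wrapped: carry into upper bits
            upper = (upper + 1) % (1 << 14)
    return upper * 4 + DECODE[johnson]  # reconstruct the 16-bit value

print(count(1234) == 1234)  # True
```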
 
Jessica Shaw

Kevin said:
For example, you didn't state any latency requirement on when the
count must be valid relative to the assertion of 'stop'.  If there is
none, then one can play the simple game of having four counters
running on different phases of 200 MHz.  At the end, simply add the
value of the four counters to get the final result.

Can you advise more on how I can use four counters running on
different phases of 200 MHz? I am a little confused about different
phases. Can you recommend some application notes or examples?

Thanks
jess
 
KJ

jess said:
Can you advise more on how I can use four counters running on
different phases of 200 MHz? I am a little confused about different
phases. Can you recommend some application notes or examples?

Actually, you don't need to run four different phases of the clock as
I mentioned; you can run the four counters on the same 200 MHz clock
that is phase-locked to the 800 MHz clock, which might be a simpler
situation to describe in this forum. Either approach is viable, and
there can be other ways to accomplish the same thing as well.

Start with a free-running two-bit counter that is clocked by your 800
MHz clock. If on a particular clock cycle you want to advance your
counter by 1, you set a bit in a four-bit vector. Something like
this...

signal Counter_Enable : std_ulogic_vector(0 to 3);
signal Counter        : natural range 0 to 3;
...
if rising_edge(Clock_800MHz) then
    -- Free-running counter
    if (Reset = '1') or (Counter = 3) then
        Counter        <= 0;
        Counter_Enable <= (others => '0');
    else
        Counter <= Counter + 1;
    end if;
    Counter_Enable(Counter) <= Count_By_1;
end if;

Now assume that you have a 200 MHz clock that is phase locked to the
800 MHz clock. The first thing you would want to do is resynchronize
the Counter_Enable to the slower clock like this...

Counter_Enable_Sync_200M <= Counter_Enable when rising_edge(Clock_200M);

The reason for this is that the individual bits of 'Counter_Enable',
since they are clocked by the 800 MHz clock, will be changing at times
that will make meeting timing difficult. By syncing them to the 200
MHz clock, now you have a counter enable that will be there for an
entire 200 MHz clock cycle. So you use those to bump the individual
counters like this...

for i in Counter_Enable_Sync_200M'range loop
    if (Counter_Enable_Sync_200M(i) = '1') then
        Counter_200M(i) <= Counter_200M(i) + 1;
    end if;
end loop;

At the end, you add the four counters up to get the final output...

Counter_Out <= Counter_200M(0) + Counter_200M(1) + Counter_200M(2) + Counter_200M(3);

Also, note that each of the four 'Counter_200M' counters would only
need to be 14 bits rather than 16 since at most they count one time
every four of the 800 MHz clocks.
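Pulling the pieces together, here is a behavioral Python model of the intent (not cycle-accurate; the clock-domain-crossing details that were left out above are idealized): one enable bit per fast cycle lands in a 4-bit vector, the slow domain samples each completed vector and bumps four narrow counters, and the final value is their sum.

```python
import random

def count_pulses(count_by_1):
    """count_by_1: one bit per 800 MHz cycle (1 = advance the count)."""
    counters = [0, 0, 0, 0]   # four 14-bit counters in the 200 MHz domain
    enables = [0, 0, 0, 0]    # the Counter_Enable vector in the fast domain
    for cycle, bit in enumerate(count_by_1):
        phase = cycle % 4     # free-running 2-bit counter at 800 MHz
        enables[phase] = bit
        if phase == 3:        # vector complete: slow domain samples it
            for i in range(4):
                counters[i] = (counters[i] + enables[i]) % (1 << 14)
            enables = [0, 0, 0, 0]
    return sum(counters) % (1 << 16)

bits = [random.randint(0, 1) for _ in range(4000)]
print(count_pulses(bits) == sum(bits) % (1 << 16))  # True
```

Any trailing partial group of fewer than four cycles is dropped in this sketch; in hardware the stop logic would have to account for it.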

I've left out some of the details, but outlined it enough that you
should be able to follow the general idea. One other point that
you've left out of your description is what is generating the 700-800
MHz input that you are counting in the first place. What I've
described presumes that you have such a clock and can then
derive the slower clock from that faster clock...maybe that's your
situation, maybe not. Like I said before, you haven't described
enough for anyone to know what problem is being solved.

Kevin Jennings
 
valtih1978

Hi,

Why is it difficult?

Jess

Because FPGA gates _are emulated_.

You might wonder why ASICs easily run at 4 GHz today while FPGAs
manage only about 400 MHz.

FPGAs are intended to emulate the logic. They do it much faster and
more efficiently than SW emulation on usual processor-based (super-)
computers. However, emulation is still achieved through configuration of
the real HW. That means that you must have an abundance of HW resources
that may be configured into this or that mode. Some true gates are used
as switches rather than doing useful work. Others stay unused, because
FPGA designers are not application-aware and, therefore, cannot be sure
which resources will be necessary; they cannot optimize placement and
routing for the user to minimize the paths. As a result, you have a 10x
larger, more expensive, power-hungry device than an ASIC. It is as much
slower, because signals must pass through configuration switches and
suboptimal routing, around unused resources, and drive unused gates.

Comp.arch.fpga is the place to ask about FPGA capabilities.
 
Martin Thompson

valtih1978 said:
Because FPGA gates _are emulated_.

I'm not sure that's a fair representation! True, there are no
user-accessible "gates" as such, but there are a plethora of low-level
configurable logic elements of various sorts, which are real enough!
You might wonder why ASICs easily run at 4 GHz today while FPGAs
manage only about 400 MHz.

FPGAs are intended to emulate the logic.

FPGAs are intended to *implement* some logic. If you want to use it for
the narrow world of emulating an ASIC, that's fine.

I don't. I build products with them.
They do it much faster and more efficiently than SW emulation on usual
processor-based (super-)computers. However, emulation is still achieved
through configuration of the real HW. That means that you must have an
abundance of HW resources that may be configured into this or that mode.

Those who target FPGAs must be well aware of these resources, so that
they use them to their best advantage. Just throwing an ASIC netlist at
them will not realise the sort of results that an experienced and
talented FPGA user will. (I had "designer" rather than "user" in here
first, but that might cause confusion with the paragraph below...)
Some true gates are used as switches rather than doing useful work.
Others stay unused because FPGA designers are not application-aware
and, therefore, cannot be sure which resources will be necessary,

I assume you mean the designers of the FPGA silicon, not people like me
who design logic in said silicon. Who are often called FPGA
designers as well...
they cannot optimize placement and routing for the user to minimize
the paths. As a result, you have a 10x larger, more expensive,
power-hungry device than an ASIC. It is as much slower because signals
must pass through configuration switches and suboptimal routing,
around unused resources and drive unused gates.

On the upside, they are

* Cheap enough
* Small enough
* Low power enough

for an awful lot of real applications outside of "emulating ASICs".

And they have vastly lower NRE and are completely field reconfigurable
(unlike ASICs).

Cheers,
Martin
 
valtih1978

but there are a plethora of low-level
configurable logic elements of various sorts, which are real enough!

At what point did I say otherwise?

FPGAs are intended to *implement* some logic. If you want to use it
for the narrow world of emulating an ASIC, that's fine. I don't. I
build products with them.
On the upside, they are
* Cheap enough
* Small enough
* Low power enough
And they have vastly lower NRE and are completely field reconfigurable (unlike ASICs).

That is why we emulate our circuits in FPGAs rather than produce them in
silicon. Please do not confuse emulation with simulation (aka
prototyping). Both emulation and simulation mimic some object. The
difference is that in simulation (prototyping) you study the behaviour,
including the internals, of your model. Using an emulator, you do not
care about the model. Emulation means (at least as I understand it) that
there is some SW (machine or circuit) that runs on top of another, HW
layer (machine or circuit). The emulated part will be more flexible but
executes 10x slower.

The prototypes are simulations implemented in an FPGA. FPGAs are ideal
for speeding up simulation. But they are also ideal for emulating any
user logic outside the domain of logic simulation.
 
Andy

Valtih,

I don't believe your definitions of emulation and simulation are
commonly used in industry, but I see your point (with your
clarification).

I agree with Martin however, that FPGAs IMPLEMENT logic. We often
think of them as emulating gates, but modern FPGA synthesis tools do
not compile a design down to gates and then emulate the (groups of)
gates with FPGA resources, they compile the design to FPGA resources
directly. Descriptions of the implementation (netlists, etc.) often
use familiar-sounding gate terminology, but that is a documentation
artifact, not based on how the synthesis tool does its job.

The implemented logic may be used as a prototype for an ASIC that it
is emulating (industry standard definition thereof, as a replacement
for, or augmentation of, simulation). Or the implemented logic may be
the final product. If you designed a board 20+ years ago, you would
not say that PALs and SSI circuits (74xx) emulated the logic you
wanted, you would say they IMPLEMENTED the logic you wanted. When I
was in college, we studied IMPLEMENTING logic functions (often
represented as series of gates, truth tables, sum-of-products, etc.)
using SSI components like multiplexers, decoders, etc. When I got into
industry, PALs/PLAs were the main tool of choice, and such tricks were
"obsolete". Little did I know that in a few short years, I would be
dusting off those same tricks designing FPGAs (before FPGA synthesis
got much better).

Throwing a figure like "10x slower" around is a bit short-sighted.
They can be 10x slower, but they can also be much less slow,
depending on what you are trying to accomplish, and on the available
resources in the FPGA device.

Andy
 
Jessica Shaw

Rob wrote:
Different phases as in, you run your external 200 MHz clock into a PLL
(or DLL, depending on what the FPGA you're using has, they'll serve the
same purpose) and bring out four 200 MHz clocks, each 1.25 ns apart from
the next (4 * 1.25 ns = 5 ns = 1/200 MHz).

Then you have four counters with four enable flip-flops, each running on
a different one of those clock phases.  Your start and stop pulses
control the enable flops.

Then you put some downstream logic on one of those clock phases that,
after you've gotten a stop pulse, adds up the results of those four
counters.  The fact that you'll have some number of counters with N
counts, and some with N-1 gives you an effective 1.25 ns resolution on
your timing.

Will I have four stop and start pulses to control the flip-flops? I
did not understand the part saying that "the fact that you'll have
some number of counters with N counts, and some with N-1 gives you an
effective 1.25 ns resolution on your timing."


Thanks
jess
 
Jessica Shaw

Hi KJ,

You are suggesting that I should use the 800 MHz clock divided into
four 200 MHz clocks. Each clock will be running a counter. A two-bit
counter will be running on the 800 MHz clock. Will the four counters
have their own enable, stop and start signals? Is the free-running
counter the "Counter" defined as a signal?

jess
 
valtih1978

Thank you for explaining the difference between implementation and
emulation. Indeed, personal computers, ASICs and FPGAs are
technologies to implement user algorithms. You just say "implement
that for me" and the compiler does the job. This is OK, but I do not
see why ASICs are different in this respect. There is Design Compiler.
It does the same thing as FPGA-oriented synthesis: it maps HDL to the
gates available in the target technology (see them packed in LUTs in
the FPGA technology view).

More importantly, this abstraction from implementation details leads
us away from the question: _why are universal computers ten times
slower than special-purpose ones?_ Highlighting that the synthesizer
produces "soft gates" out of an RTL description unveils the virtual
computation on top of the native one. This is important here because
it answers Jessica's question.
 
valtih1978

Support from the major FPGA vendors is very good. It is natural,
because HW achieves its supercomputer performance through fine-grained
parallelism. It consists of millions of tiny parallel processors - the
gates. Thus, it seems impossible to extract the available parallelism
out of an RTL description without compiling it into a netlist, be it a
target FPGA or a virtual technology.

Let's suppose you compile RTL directly into the target FPGA technology
right away and, thus, achieve the most optimal FPGA implementation
ever possible. How do you explain to Jessica why you are still 10x
behind an ASIC?
 
KJ

valtih1978 said:
Let's suppose you compile RTL directly into the target FPGA technology
right away and, thus, achieve the most optimal FPGA implementation
ever possible. How do you explain to Jessica why you are still 10x
behind an ASIC?

I already explained the reason in my first post on this topic over a
week ago, perhaps you should read the first half of the post.
http://groups.google.com/group/comp...75d11/173b156f9e0a5825?hl=en#173b156f9e0a5825

To reiterate a bit: ASICs are single-function parts; FPGAs are
run-time programmable. At the lowest level, both are built on the same
basic technology and will have the same speed at that level (i.e. the
transistor level).

In order to provide run-time programmable parts, FPGAs are designed
such that the end user does not have direct control all the way down
to the transistor level. The primitive elements available to a user
for implementing logic in an FPGA are mostly look-up tables and
flip-flops. There are no 'gates' that the user has control over.

The reason that FPGAs exist at all is that there is market demand for
a component that implements arbitrary logic, where the ability to
implement any design change is not limited by the FPGA, nor does it
require payment to the FPGA supplier to implement the change. In other
words, the cost and implementation time for a design change is
completely under the control of the designer that *uses* the FPGA, not
the supplier of the FPGA. And other technologies such as ASICs and
CPLDs have not been able to crush FPGAs out of the market. In fact,
the opposite has been happening for a long time: ASIC and CPLD design
starts are being squeezed out by FPGA designs.

The 'design cost' that a user will pay for choosing an FPGA over an
ASIC is speed and power. The market currently supports many niches
for implementing logic designs. FPGAs, CPLDs and ASICs fill different
niches, they each are optimal for certain designs and sub-optimal for
others...that's the way it is, get on with it.

Kevin Jennings
 
valtih1978

Actually, it was a rhetorical question, with the purpose of showing
that whether the mapping to the FPGA is immediate or goes through a
virtual gate representation is not important for FPGA vs. ASIC
performance.

Regarding your marketing manifesto, adding that "FPGAs are designed
such that the end user does not have direct control all the way down
to the transistor level" does not add very much to it. How do you run
your design on an FPGA if you have no control over its "gates"?
Actually, it says that "we do not allow you to turn our
general-purpose computer into an app-specific one by design". I'm sure
that the problem is not the design. You cannot do that in principle.
The FPGA stays a fixed, hardwired, general-purpose piece of computer.
It executes the user app at a higher level. In other words, it
emulates the user circuit rather than implementing it natively. As
with any emulation, it is 10x slower. 'Niches' do not change this
principle.

So, you cannot bypass this picture.
 
KJ

Actually, it was a rhetorical question, with the purpose of showing
that whether the mapping to the FPGA is immediate or goes through a
virtual gate representation is not important for FPGA vs. ASIC
performance.

It appears that you don't even read your postings. Your stated
question was "How do you explain Jessica why you are still 10x behind
ASIC?" That's not a very good example of a 'rhetorical question'...

Regarding your marketing manifest, adding that "FPGAs are designed such
that the end user does not have direct control all the way down to the
transistor level" does not add very much to it.

Actually it has everything to do with 'it', but you do not seem to be
understanding 'it'. In this case, 'it' is the difference in
system-level performance of an ASIC versus an FPGA. The reason for
that difference has to do with the fact that FPGA manufacturers saw a
market need for a device that can implement arbitrary logic (like an
ASIC can) but is user programmable. In order to implement the 'user
programmable' part of their product, some of the potential performance
of the raw silicon technology was used up, leaving less performance
for the end user. FPGA manufacturers were not the first to see that
need and market such a part; they are one of many.
How do you run your design on an FPGA if you have no control over its
"gates"?

Here you're wrong on several fronts:
- FPGAs implement logic with look-up table memory, not in logic gates.
- Since one can implement logic with look-up table memory and no
gates, the lack of 'control over its gates' is not relevant...there
are no 'gates' to control, and yet functional designs can be
implemented just fine.
- 'Gates' are not the real primitive device, they are themselves an
abstraction. Transistors are the primitive. Control of voltage,
current and charge is the game.
- I never said anything about controlling 'gates' in the first place.
What I said was "...does not have direct control all the way down to
the transistor level". 'Transistors' are not 'gates'. Transistors
can be used to implement a 'gate', but the reverse is not true.
Actually, it says that "we do not allow you to turn our
general-purpose computer into an app-specific one by design".

That's your interpretation...I disagree with it completely, but you
can have that. Computers have a definition (perhaps you should look
up generally accepted definitions), but those generally accepted
definitions do not include 'FPGA' or 'ASIC'. An FPGA or ASIC or
discrete logic gates or even discrete transistors can be used to
implement a computer. However, none of those devices is in any way a
'general-purpose computer' or any other type of computer.
I'm sure that the problem is not the design. You cannot do that in
principle. The FPGA stays a fixed, hardwired, general-purpose piece of
computer.

Not true at all...see previous paragraph...and you should probably
research the definition of computer as well.
It executes the user app at a higher level.

As does an ASIC design...unless you really think that ASIC designers
design everything down to the transistor level. Gates are an
abstraction.

A high-level design language like VHDL can be used to describe an
intended function. That description can be used to implement a design
in many technologies. The technology chosen does not change the 'user
app'; therefore that 'user app' cannot be at any different level than
if a different technology choice had been made.
In other words, it emulates the user circuit rather than implementing
it natively.

Not true. From a black box perspective, an FPGA and an ASIC can be
designed to implement exactly the same function. They simply have
different primitive elements that can be manipulated by the designer.
The choice of technology used to implement a design does not imply
that one is an emulation of the other.
As with any emulation, it is 10x slower.

Not true either. A discrete logic gate implementation or a discrete
transistor implementation would be much slower than an FPGA...but they
would not be an emulation as defined by most reasonable sources. But
you appear to suggest with this statement that an implementation that
is 10x slower is an emulation. If so, I've provided the
counter-example to your statement, thereby disproving it.

Perhaps if you peruse the following links and do some more research,
you will discover what the word emulation is generally accepted to
mean:
- http://en.wikipedia.org/wiki/Emulation
- http://www.merriam-webster.com/dictionary/emulation?show=0&t=1316306655
So, you cannot bypass this picture.

No idea what picture you think is being bypassed. You can choose to
use the words 'implementation' and 'emulation' how you want; that's
your choice. However, since those words already have accepted
definitions that are different from what you have chosen, don't expect
to get much acceptance of your usage.

This is the last I have to say on this thread.

Kevin Jennings
 
