The littlest CPU

rickman

I may need to add a CPU to a design I am doing. I had rolled my own
core once with a 16 bit data path and it worked out fairly well. But
it was 600 LUT/FFs and I would like to use something smaller if
possible. The target is a Lattice XP3 with about 3100 LUT/FFs and
about 2000 are currently used. I believe that once I add the CPU
core, I can take out a lot of the logic since it runs so slowly. The
fastest parallel data rate is 8 kHz with some at 1 kHz and the rest at
100 Hz. I probably would have used a CPU to start with instead of the
FPGA, but there was a possible need to handle higher speed signals
which seems to have gone away.

I recall that someone had started a thread about serial
implementations of processors that were supported by a C compiler. I
don't think any ever turned up. But the OP had some other
requirements that may have excluded a few very small designs. Are
there any CPU cores, serial or parallel, that are significantly
smaller than 600 LUT/FFs? The Lattice part has LUT memory even dual
port, so that is not a constraint, the LUTs can be used for
registers.

Rick

John McCaskill

The Xilinx PicoBlaze is less than 100 LUTs plus one block RAM.
Someone has been working on a simple C compiler for the PicoBlaze, but
I have not tried it yet. I have used the PicoBlaze in many projects
and I am quite happy with it.

I have not used it, but Lattice has the Mico8. Have you looked at
it? It has been mentioned here as the Lattice equivalent of the
PicoBlaze.

Regards,

John McCaskill
www.FasterTechnology.com

Antti

I'm the OP.

Hi. My interests may be different, but yes, the smallest non-serialized
CPU, as for your current task, is one of my wishes too, and here also
there is no single definitive winner.

PicoBlaze, PacoBlaze and Mico8 are out of the question; most of the
others are too large.

I have used a cut-down AVR core in an XP3, but I don't recall the LUT
count.

Antti

HT-Lab

I would suggest you check out one of the many free PIC cores available on
the web. The reason for suggesting a PIC is that it is accompanied by a
professional IDE from Microchip. Developing a processor is easy, and the
web is full of wonderful and clever implementations, but at the end of the
day, if you have to develop some software, you need a good IDE.

I just tried a quick push-button synthesis of a 16C54,

***********************************************
Device Utilization for LFXP3C/PQFP208
***********************************************
Resource    Used    Avail    Utilization

rickman

Have you tabulated your findings anywhere? The last time I did a
survey of ARM7 processors, I put it all into a spreadsheet and posted
it on the web. I think it was useful for a while, but the market
overtook it and I couldn't keep up!

I read your thread about the serial processor and it was interesting.
I think my project actually has the time to use such a processor, but
I think you never found one that met your requirements.

I am not looking for a large address space, but I would like for it to
be able to read data from an SD card. My design uses FPGAs both on
the application board and the test fixture. Ultimately I want the
test fixture to be able to read a programming file from an SD card and
configure the target FPGA without a programming cable.
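
For what it's worth, the cable-free configuration idea boils down to clocking the bitstream out one bit at a time. Below is a rough Python sketch of just the shifting step; the pin-driver callbacks and MSB-first ordering are assumptions for illustration, not Lattice specifics, and the SD-card reading and any preamble/DONE handling are left out entirely:

```python
# Sketch of slave-serial style configuration: shift a bitstream out
# bit-by-bit, one clock pulse per bit.  Pin drivers are abstract
# callbacks so the same routine works with any GPIO layer.
def shift_bitstream(data, set_din, pulse_dclk):
    """Shift `data` (bytes) out MSB-first, pulsing the clock per bit."""
    for byte in data:
        for bit in range(7, -1, -1):
            set_din((byte >> bit) & 1)   # present the data bit
            pulse_dclk()                 # clock it into the target

# Usage with stub pin drivers that just record what was shifted:
sent = []
shift_bitstream(b"\xA5", sent.append, lambda: None)
print(sent)   # [1, 0, 1, 0, 0, 1, 0, 1]
```

The MSB-first order and the pin names mirror common slave-serial schemes; the real device's data sheet would dictate bit order, preamble and status checking.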

Of all the suggestions, so far the PIC sounds like the best one. I
couldn't find a C compiler for the PicoBlaze or the PacoBlaze. There
is mention of someone creating one, but the web site is no longer
accessible.

Rick

Antti

Hi Rick, here is a reply to your post :)
http://antti-lukats.blogspot.com/2008/07/rules-of-life.html

In short, I am doing almost the same thing as you intend to at the moment.

Antti

Josep Duran

You can find a download link here :

http://www.asm.ro/fpga/

Disclaimer: I have never used it myself.


Josep

Henri

Maybe something worth checking:

http://www.zylin.com/zpu.htm

From the above website:

1. The ZPU is now open source. See ZPU mailing list for more details.
2. BSD license for HDL implementations--no hiccups when using in
proprietary commercial products. Under the open source royalty free
license, there are no limits on what type of technology (FPGA,
anti-fuse, or ASIC) in which the ZPU can be implemented.
3. GPL license for architecture, documentation and tools
4. Completely FPGA brand and type neutral implementation
5. 298 LUT @ 125 MHz after P&R with 16 bit datapath and 4kBytes BRAM
6. 442 LUT @ 95 MHz after P&R with 32 bit datapath and 32kBytes BRAM
7. Code size 80% of ARM Thumb
8. Configurable 16/32 bit datapath
9. GCC toolchain (GDB, newlib, libstdc++)
10. Debugging via simulator or GDB stubs
11. HDL simulation feedback to simulator for powerful profiling
capabilities
12. Eclipse ZPU plug-in
13. eCos embedded operating system support.



Henri

Antti

Eh, this is still on my MUST-evaluate list :)

80% of Thumb? That's nice too; I just wrote my first Thumb assembly
program, an Atmel DataFlash bootstrap loader. It's about 60 bytes of
code (Thumb). It would be fun to compare whether code already optimized
down to minimal Thumb still gets more compact on the ZPU :)

My code is really funky: it loads one 32-bit constant and constructs all
the other constants from it, and it also uses the lower part of an I/O
address as a mask constant, etc.
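
That space-saving trick, loading a single 32-bit literal and deriving everything else from it, can be illustrated outside assembly. The sketch below is Python rather than Thumb, and every address and mask in it is invented for the example; only the technique mirrors the post:

```python
# Deriving all constants from ONE stored 32-bit literal.
# All addresses below are made up for illustration.
IO_BASE = 0xFFFF_F0FF        # the single literal the code actually loads

STATUS_REG = (IO_BASE & ~0xFF) | 0x10   # register address: base with a small offset
DATA_REG   = STATUS_REG + 4             # neighbouring register by simple addition
BYTE_MASK  = IO_BASE & 0xFF             # low byte of the address doubles as the 0xFF mask

print(hex(STATUS_REG), hex(DATA_REG), hex(BYTE_MASK))
# 0xfffff010 0xfffff014 0xff
```

In Thumb the payoff is that each derived value costs a one- or two-instruction shift/add/mask instead of a four-byte literal-pool entry.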

Antti

rickman

I'm pretty impressed. Small, fast and with GCC support!

Rick

Robert F. Jarnot

The '16 Bit Microcontroller' at OpenCores by Dr. Juergen Sauermann is
also an impressive piece of work.

rickman

Can you tell us what you find impressive about it? I took a look and
it is listed as 800 slices which means it can be as big as 1600 LUTs.
That is over three times the size of my CPU and an even larger ratio
compared to the ZPU and others.

Is it the fact that it has a C compiler and a simulator?

Rick

Robert F. Jarnot

What impresses me about this design is the approach -- determine what
kind of architecture a 'clean' compiler would like to see, and implement
the corresponding hardware and compiler. Throwing in an RTOS is a nice
bonus too.

I agree that your design is very impressive, both in resource usage and
performance. I like some of the architectural details too, especially
those borrowed from the transputer (looking back to the transputer for
ideas is a good idea in my opinion). Having GCC support is a big plus
too. What I do not have a feeling for is the relative performance of
the two designs -- do you have any feeling for this?

(Note to rickman: my initial reply was directly to you, not the
newsgroup. Sorry. This reply is very similar to the one I sent you
directly)

rickman

No problem. I was waiting for this one to appear so I could respond
in public. I think there is some interest in the discussion.

Yes, once I had a chance to look a bit more at the docs, I see the
history and I also like the idea. I'm not sure why it is so large
though. His design sounds simple with few registers and not even an
internal stack if I understand correctly. The various Forth like CPUs
all have one if not two internal stacks which in effect are local
memories (in FPGA implementations). I expect (without looking at the
design in detail) that this design suffers somewhat in speed in that
things are done sequentially that can be done in parallel in other
processors. But then those "other" processors are not built to run
C. So I expect any fair comparison needs to take that into account.

I can't say my design is impressive really. It is not complete in
that there are no tools of any sort. I made a crude assembler but
mostly hand coded in machine language. So I don't really have any
idea of how fast it would run an application written in a high level
language. I like to think that it would handle Forth pretty well, but
I have not spent the time to really get that underway.

I did see that the C16 (that is Dr. Juergen Sauermann's CPU name) is
constructed somewhat like the 8080. That processor had a three
machine cycle instruction timing and may have also used two input
clocks for each machine cycle (this is really stretching my wayback
machine). I remember this partly because I have an 8008 computer
which was the predecessor to the 8080. It used the three machine
cycles because it only had an 8 bit multiplexed bus. It used two
cycles to output a 14 bit address (IIRC) and the third cycle was for
the 8 bits of data. Every instruction was built of these three
machine cycle memory ops (even if it was a register transfer).

His machine seems to have emulated that and so uses up to 6 clock
cycles for a basic instruction. I don't know much about the ZPU, but
my CPU uses one clock cycle for any instruction other than program
memory reads which require three cycles.

You like the variable-length literal instructions a la the transputer?
They are used to set up the immediate addresses for jumps and calls
too. Unfortunately this makes for some trouble with defining
addresses in the assembler; I never did get that to work correctly.
Every time a byte was added to or subtracted from the opcodes, it
would move all of the other labels, and you had to start over with the
calculations. I think you could have situations that never
converged.
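
The convergence worry has a standard fix: iterate the layout, but only ever let a literal grow, never shrink. Since sizes are monotone and bounded, the loop must terminate. Here is a toy Python sketch with a made-up two-instruction model (this is not rickman's assembler or the real transputer encoding):

```python
# Toy model of the label/size feedback loop with variable-length
# literal (prefix) instructions: a call's literal size depends on the
# target address, and changing the size moves the labels again.

def literal_size(value):
    """Bytes needed to encode `value` four bits at a time (prefix-style)."""
    size = 1
    while value >= (1 << (4 * size)):
        size += 1
    return size

def assemble(program):
    """program: list of ('op',) or ('call', target_index) entries.
    Iterates layout until a fixed point; sizes may only grow, which
    guarantees termination even for pathological programs."""
    sizes = [1] * len(program)
    while True:
        # pass 1: lay out addresses using the current size estimates
        addr, addresses = 0, []
        for s in sizes:
            addresses.append(addr)
            addr += s
        # pass 2: re-estimate literal sizes from those addresses
        changed = False
        for i, instr in enumerate(program):
            if instr[0] == 'call':
                need = literal_size(addresses[instr[1]]) + 1  # prefixes + opcode
                if need > sizes[i]:        # grow-only rule
                    sizes[i] = need
                    changed = True
        if not changed:
            return addresses

prog = [('op',)] * 20 + [('call', 30)] + [('op',)] * 10
print(assemble(prog)[-1])   # 32
```

Allowing sizes to shrink again is exactly what lets real assemblers oscillate; the grow-only rule trades a few wasted prefix bytes for guaranteed convergence.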

Otherwise I was pretty happy with my CPU. But I don't want to
continue using it if there are better CPUs available. But it will be
a couple of weeks before I can really spend any time on this.

Rick

Robert F. Jarnot

Yes, I like the idea of prefix instructions -- I am a believer in
compact instruction sets, even if it makes the CPU slightly more
complex. The transputer linker had the same issues you allude with
yours -- the linker would sometimes have to make many 10's, or even a
few hundred passes (for a large program) to make all of the variable
length prefix instructions as short as possible. That is probably one
of the reasons that the successor to the transputer from www.xmos.com
looks much more like a modern register-based architecture with a lot of
other clever transputer features retained or extended. Sauermann
started with the 8080/Z80 only to come across the poor match to a C
compiler. Since this was his starting point, I am not surprised that
his final design shows some heritage from these designs. I would be
very interested in knowing how your design fares with a C compiler (if
someone smarter than me has the strength to do the port).
 
