SMP MPP VLIW for machine code Java, Forth, C, Scheme, etc.

Discussion in 'Java' started by Reply7471859353@wmconnect.com, Jul 23, 2005.

  1. Guest


    > > > > 4) that's it! this is my whole base model list!
    > > > >
    > > > > 5) iterate testing and recurse testing of my sixteen bit VLIW decode.
    > > > >
    > > > > maw
    > > > >
    > > > > ---
    > > > > Mr Moore's X18 homepage ( obsolete? )
    > > > > ---
    > > > > Mr Moore's 25 x ( obsolete? )
    > > > > ---
    > > > > <head><title>Chuck Moore's X18 Forth Microcomputer Core</title>
    > > > > <meta name=description content="A high-performance, low-power
    > > > > microcomputer core. Available as a GDS II file. On-chip memory and
    > > > > stacks. Forth instruction set.">
    > > > > <meta name=keywords content="microprocessor, stacks, push-down stacks,
    > > > > mips, power, low power, instructions, instruction set, DRAM, ROM,
    > > > > watchdog, watchdog timer">
    > > > > </head><body bgcolor=#d0ffd0>
    > > > >
    > > > > Updated 2001 June
    > > > > <h1>X18 Microcomputer core</h1>
    > > > > High performance, low power Forth engine. Optimized for compute-bound
    > > > > portable applications. 18 bit address/data matches cache SRAM.
    > > > >
    > > > > <h1>Features</h1><ul>
    > > > > <li>2400 Mips, sustained
    > > > > <li>Asynchronous (no external clock)
    > > > > <li>2 16-deep push-down stacks
    > > > > <li>27 0-operand instructions
    > > > > <li>128 words ROM, 384 DRAM
    > > > >
    > > > > <li>Watchdog timer
    > > > > <li>20 mW @ 1.8 V
    > > > > <li>.2 sq mm</ul>
    > > > >
    > > > > <h1>Architecture</h1>
    > > > > The X18 is an evolution of the F21 and i21 microprocessors. With .18um
    > > > > transistors, it has 5x their speed and 1/5 their power. It has their
    > > > > 16-deep Return and Data stacks and 27 0-operand instructions, packed 3
    > > > > per word. A 100ms watchdog timer assures continued operation. Boots
    > > > > from on-chip ROM.
    > > > >
    > > > > <p>Redesigned with new layout and simulation tools to be robust and to
    > > > > minimize power. The computer can be throttled by a factor of 1024 to
    > > > > provide 2.4 Mips using 20 uW. It may be stopped altogether, but will
    > > > > have to reboot.
    > > > >
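A quick check of the throttling figures quoted above, as a minimal sketch. It assumes that throttling by 1024 scales both the instruction rate and the power draw linearly, which the page implies but does not state. The function name `throttle` is made up for illustration.

```python
def throttle(mips, power_mw, factor):
    """Scale an instruction rate (Mips) and a power draw (mW) down by `factor`.

    Assumption: both quantities scale linearly with the throttle factor.
    Returns a tuple (Mips, power in microwatts).
    """
    return mips / factor, power_mw * 1000.0 / factor


# Figures taken from the X18 feature list: 2400 Mips at 20 mW, throttled by 1024.
slow_mips, slow_uw = throttle(2400, 20, 1024)
print(f"{slow_mips:.2f} Mips at {slow_uw:.1f} uW")
```

Under that assumption the exact values come out slightly below the 2.4 Mips and 20 uW the page quotes, which look like the result of dividing by roughly 1000 rather than 1024.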
    > > > > <p>Multiply (125 Mops) and divide (40 Mops) have been improved.
    > > > > Internal memory is fast enough (1 ns) to sustain 2400 Mips. Data
    > > > > access, especially to external SRAM, will slow this. Code is loaded
    > > > > into on-chip DRAM for execution.
    > > > >
    > > > > <h1> CPU </h1>
    > > > > Forth code is highly factored into many small subroutines. An optimized
    > > > > processor requires an efficient call/return mechanism. This is best
    > > > > achieved with 2 push-down stacks. Each is implemented as a register
    > > > > feeding a 16x18-bit RAM with 8-transistor bit cells. The current entry
    > > > > is indicated by a 16-bit bidirectional, circular shift register.
    > > > >
    > > > > <p>One stack is used to store subroutine return addresses. All
    > > > > processors have such a stack. The other is used to pass parameters to
    > > > > and from subroutines. Other processors use registers or stack frames
    > > > > for this purpose. However, all languages use an implicit stack to
    > > > > evaluate expressions. Forth makes it explicit.
    > > > >
    > > > > <p> As if emphasizing their importance, the stacks require 2/3 of the
    > > > > CPU silicon area. It is difficult to achieve their 1-cycle access
    > > > > timing.
    > > > >
    > > > >
    > > > > <p> The merits of stack vs. register designs have been argued for
    > > > > decades. A comprehensive book, <a
    > > > > href=http://www.cs.cmu.edu/~koopman/stack_computers/index.html><em>Stack
    > > > > Computers,</em></a> by Phil Koopman has been published online. To quote
    > > > > Sec 6.2: "0-operand stack addressing ... makes stack machines superior
    > > > > to conventional machines in the areas of program size, processor
    > > > > complexity, system complexity, processor performance, and consistency
    > > > > of program execution."
    > > > >
    > > > > <p> The Forth ALU operates on the top 1 or 2 items of the parameter
    > > > > stack, leaving the result there. This permits 0-operand instructions.
    > > > > Eliminating register addresses permits shorter instructions, in this
    > > > > case 5-bit. Several instructions are required to rearrange the stack.
    > > > > And it's convenient to move things to the return stack.
    > > > >
    > > > > <p> An address register is useful to reduce stack manipulation. It also
    > > > > supports incrementing to address successive words in memory. Similar
    > > > > use of the top of the return stack provides 2 addresses for
    > > > > memory-memory moves.
    > > > >
    > > > > <p> A demultiplexor allows the packing of up to 3 instructions per
    > > > > word. This increases the density of compiled code and reduces the
    > > > > interference between instruction and data memory access. It keeps the
    > > > > CPU busy while the next instruction is being fetched, providing a
    > > > > sustained execution speed of 2400 Mips.
    > > > >
    > > > > <p> This is implemented by a 3-bit shift register. The current bit
    > > > > enables its slot into the instruction latch. A ready pulse from the
    > > > > memory manager latches the high-order 5 bits (slot 0). The pulse is
    > > > > delayed by a string of 14 inverters so that it repeats 2 ns later,
    > > > > latching the next slot. Slot 2 stops the process, as does a jump or
    > > > > fetch/store, until the next ready pulse.
    > > > >
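The slot packing described above can be sketched in a few lines. This is only an illustration: the page says slot 0 is the high-order 5 bits of the 18-bit word, and the sketch assumes slots 1 and 2 follow contiguously below it, leaving the low 3 bits unused. The page does not say what those 3 bits hold, and the names `pack_slots` and `unpack_slots` are hypothetical.

```python
WORD_BITS = 18       # X18 word width, from the page
SLOT_BITS = 5        # one 0-operand instruction
SLOTS_PER_WORD = 3   # "packed 3 per word"
SLOT_MASK = (1 << SLOT_BITS) - 1


def slot_shift(index):
    """Bit position of the low end of a slot (assumed layout: 13, 8, 3)."""
    return WORD_BITS - SLOT_BITS * (index + 1)


def pack_slots(opcodes):
    """Pack up to three 5-bit opcodes into one 18-bit word, slot 0 highest."""
    if len(opcodes) > SLOTS_PER_WORD:
        raise ValueError("at most 3 opcodes per word")
    word = 0
    for index, opcode in enumerate(opcodes):
        word |= (opcode & SLOT_MASK) << slot_shift(index)
    return word


def unpack_slots(word):
    """Return the three 5-bit opcodes in execution order (slot 0 first)."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word does not fit in 18 bits")
    return [(word >> slot_shift(i)) & SLOT_MASK for i in range(SLOTS_PER_WORD)]


# dup, +, nop (codes 1a, 17, 1e in the opcode table on the page)
word = pack_slots([0x1A, 0x17, 0x1E])
slots = unpack_slots(word)
```

The hardware does this with a shift register and a delayed ready pulse rather than with shifts and masks; the sketch only shows which bits land in which slot under the assumed layout.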
    > > > > <p> There are 27 simple instructions, exactly suited to Forth. This
    > > > > allows 1-1 compilation of Forth source to machine code. On other
    > > > > processors, each Forth primitive requires several instructions. The
    > > > > situation is reversed for other languages: several Forth instructions
    > > > > may be required for their primitives.
    > > > >
    > > > > <p><table border>
    > > > > <tr><td>...<td>Register
    > > > >
    > > > > <tr><td>T<td>Top of stack
    > > > > <tr><td>S<td>2nd number on stack
    > > > > <tr><td>R<td>Top of Return stack
    > > > > <tr><td>A<td>Address</table>
    > > > >
    > > > > <p>Remember that fetch pushes the stack, store and binary operations
    > > > > pop it.<table border>
    > > > > <tr><td>Code<td>Op<td>Action
    > > > > <tr><td>0<td>word ;<td>Jump to subroutine; tail recursion
    > > > >
    > > > > <tr><td>1<td>if<td>Jump to 'then' if T0-T17 are zero
    > > > > <tr><td>2<td>word<td>Call subroutine
    > > > > <tr><td>3<td>-if<td>Jump to 'then' if T17 is one
    > > > > <tr><td>6<td>;<td>Return
    > > > >
    > > > > <tr><td>8<td>@r<td>Fetch from address in R
    > > > > <tr><td>9<td>@+<td>Fetch from address in A; increment A
    > > > >
    > > > > <tr><td>a<td>n<td>Fetch literal
    > > > > <tr><td>b<td>@<td>Fetch from address in A
    > > > > <tr><td>c<td>!r<td>Store into address in R
    > > > > <tr><td>d<td>!+<td>Store into address in A; increment A
    > > > > <tr><td>f<td>!<td>Store into address in A
    > > > >
    > > > > <tr><td>10<td>-<td>Ones-complement T
    > > > >
    > > > > <tr><td>11<td>2*<td>Shift T left 1 bit
    > > > > <tr><td>12<td>2/<td>Shift T right 1 bit; preserve T17
    > > > > <tr><td>13<td>+*<td>Add S to T if T0=1 (multiply step)
    > > > > <tr><td>14<td>or<td>Exclusive-or S to T
    > > > > <tr><td>15<td>and<td>And S to T
    > > > > <tr><td>17<td>+<td>Add S to T
    > > > >
    > > > >
    > > > > <tr><td>18<td>pop<td>Fetch R
    > > > > <tr><td>19<td>a<td>Fetch A
    > > > > <tr><td>1a<td>dup<td>Duplicate T
    > > > > <tr><td>1b<td>over<td>Fetch S
    > > > > <tr><td>1c<td>push<td>Store into R
    > > > > <tr><td>1d<td>a!<td>Store into A
    > > > >
    > > > > <tr><td>1e<td>nop<td>Do nothing
    > > > > <tr><td>1f<td>drop<td>Store T nowhere
    > > > > </table>
    > > > >
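To make the opcode table concrete, here is a rough sketch of a few of the ALU and stack instructions running on an 18-bit data stack. It is not the X18 itself: the stack is an ordinary Python list rather than a 16-deep circular stack, memory, the return stack and control flow are left out, and the function names `step` and `run` are made up for the example. The opcode numbers are the hex codes from the table.

```python
MASK = (1 << 18) - 1   # values are 18 bits wide
SIGN = 1 << 17         # T17, the top bit


def step(stack, opcode):
    """Apply one opcode to the data stack in place. T is stack[-1], S is stack[-2]."""
    if opcode == 0x10:                       # -   ones-complement T
        stack[-1] = ~stack[-1] & MASK
    elif opcode == 0x11:                     # 2*  shift T left 1 bit
        stack[-1] = (stack[-1] << 1) & MASK
    elif opcode == 0x12:                     # 2/  shift T right 1 bit, preserve T17
        t = stack[-1]
        stack[-1] = (t >> 1) | (t & SIGN)
    elif opcode in (0x14, 0x15, 0x17):       # binary operations pop the stack
        t = stack.pop()
        s = stack.pop()
        if opcode == 0x14:                   # or  (exclusive-or)
            result = s ^ t
        elif opcode == 0x15:                 # and
            result = s & t
        else:                                # +
            result = s + t
        stack.append(result & MASK)
    elif opcode == 0x1A:                     # dup
        stack.append(stack[-1])
    elif opcode == 0x1B:                     # over
        stack.append(stack[-2])
    elif opcode == 0x1E:                     # nop
        pass
    elif opcode == 0x1F:                     # drop
        stack.pop()
    else:
        raise NotImplementedError(f"opcode {opcode:#x} is not modelled here")


def run(stack, opcodes):
    """Run a sequence of opcodes and return the resulting stack."""
    for opcode in opcodes:
        step(stack, opcode)
    return stack


# over + 2*  on a stack holding 3 4
result = run([3, 4], [0x1B, 0x17, 0x11])
```

Starting from `3 4`, `over` pushes a copy of 3, `+` adds it to 4, and `2*` doubles the sum, so the stack ends as `3 14`.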
    > > > > <p> Another advantage of the 5-bit instruction is ease of decoding. A
    > > > > tree of NAND and NOR gates leads from the instruction bus to the enable
    > > > > for each register. This is facilitated by the limit of 10 lines to be
    > > > > routed: each bit and its complement.
    > > > > </body>
    > > > > ---
    > > > > <head><title>Chuck Moore's 25x Forth Multicomputer Chip</title>
    > > > > <meta name=description content="A parallel computer with 25 computers
    > > > > on a chip. An on-chip network goes off-chip to array even more
    > > > > computers.">
    > > > > <meta name=keywords content="microcomputer, microprocessor, parallel,
    > > > > network, array, memory, coprocessors">
    > > > > </head><body bgcolor=#d0ffd0>
    > > > >
    > > > > Updated 2001 June
    > > > > <h1>25x Microcomputer</h1>
    > > > > An array of 25 microcomputers on a 7 sq mm die.
    > > > >
    > > > > <h1>Features</h1><ul>
    > > > > <li>.2 sq mm asynchronous microcomputer core
    > > > > <li>5 x 5 array of cores: 60,000 Mips
    > > > > <li>5 horizontal, 5 vertical parallel interconnect buses: 180 GHz
    > > > > bandwidth
    > > > > <li>Specialized computers to interface off-chip.
    > > > > <li>Max power 500 mW @ 1.8 V, with 25 computers running
    > > > >
    > > > > <li>100mAh battery life is 1 year, with 1 computer running throttled
    > > > > <li>64-pin SOIC: mirrored pin-out to 4ns cache SRAM
    > > > > <li>Array chips on 2-sided PCB</ul>
    > > > >
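The battery-life line in the feature list can be sanity-checked with a little arithmetic. This is a sketch under idealised assumptions that are mine, not the page's: the battery delivers its full 100 mAh at 1.8 V with no conversion loss or self-discharge, and the one throttled computer draws the 20 uW quoted on the X18 page. The function name is hypothetical.

```python
def battery_life_days(capacity_mah, power_uw, volts):
    """Days of operation for an ideal battery feeding a constant load."""
    current_ua = power_uw / volts                 # I = P / V, in microamps
    hours = capacity_mah * 1000.0 / current_ua    # mAh -> uAh, divided by uA
    return hours / 24.0


days = battery_life_days(capacity_mah=100, power_uw=20, volts=1.8)
print(f"about {days:.0f} days")
```

With those inputs the load is about 11 uA, which drains 100 mAh in roughly 9000 hours, so the "1 year" figure is at least self-consistent.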
    > > > > <h1>Description</h1>
    > > > > Availability of the tiny (.2 sq mm), asynchronous <a href=X18.html>X18
    > > > > microcomputer core</a> naturally suggested arraying it on a chip. Its
    > > > > extremely low power (20 mW) made that feasible. A 5x5 array was chosen
    > > > > to fit on a 7 sq mm die, the smallest available prototype, though
    > > > > larger arrays are possible. 25 computers running at 2400 Mips is a
    > > > > total of 60,000 Mips. An unlimited supply.
    > > > >
    > > > > <p>Communication among the computers is provided by a network with 5
    > > > > horizontal and 5 vertical buses. Each computer has 2 bus registers to
    > > > > access a horizontal and a vertical bus. Each bus is 18-bits wide and
    > > > > can run at 1 GHz. All 10 buses can be active at once connecting a
    > > > > 20-computer subset. So total bandwidth is 180 GHz.
    > > > >
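The aggregate numbers on the 25x page follow from the per-core and per-bus figures by plain multiplication. A small sketch, using only values quoted on the two pages; the variable names are made up, and the result for the buses is labelled in Gbit/s, where the page writes "180 GHz".

```python
CORES = 25             # 5 x 5 array
MIPS_PER_CORE = 2400   # X18 sustained rate
MW_PER_CORE = 20       # X18 power at 1.8 V

BUSES = 10             # 5 horizontal + 5 vertical
BUS_WIDTH_BITS = 18
BUS_CLOCK_GHZ = 1

total_mips = CORES * MIPS_PER_CORE                       # all cores running
total_mw = CORES * MW_PER_CORE                           # max power
bus_gbit_per_s = BUSES * BUS_WIDTH_BITS * BUS_CLOCK_GHZ  # all 10 buses active

print(total_mips, "Mips,", total_mw, "mW,", bus_gbit_per_s, "Gbit/s")
```

These match the 60,000 Mips, 500 mW and 180 figures on the page, assuming every core and every bus is busy at once.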
    > > > > <p>Each computer can be customized. Registers are added to the 16
    > > > > processors at the edge of the array and connected to package pins. Each
    > > > > computer is responsible for a particular interface. Protocols are
    > > > > implemented with software.<ul>
    > > > > <li>SRAM controller
    > > > > <li>Flash controller
    > > > > <li>4 serial controllers
    > > > >
    > > > > <li>USB controller
    > > > > <li>D/A controller
    > > > > <li>A/D controller</ul>
    > > > > After booting from ROM, the computers await code downloaded from one of
    > > > > these interfaces.
    > > > >
    > > > > <h1>Pinout</h1>
    > > > > Chosen to be the mirror image of an 18-bit cache memory chip. This is
    > > > > the fastest memory available, with 4 ns access. Its package is a
    > > > > 100-pin SOIC. The 18-bit Multicomputer thus has 256K words of external
    > > > > memory in 1 chip.
    > > > >
    > > > > <p>Putting the Multicomputer chip on the top of a 2-sided PCB and the
    > > > > SRAM chip on the bottom gives a very small footprint. A decoupling
    > > > > capacitor is the only other component needed. An array of such pairs is
    > > > > a multicomputer board. Connecting Multicomputer to SRAM is trivial,
    > > > > with mm traces. Routing for power and a serial network is also easy.
    > > > > Computers load code from the network.
    > > > >
    > > > > <p>A parallel computer with 60Gips nodes! Power is determined by the
    > > > > SRAM.
    > > > >
    > > > > <h1>Cost/Availability</h1>
    > > > > The chip is awaiting funding. If interested, contact <a
    > > > > href=mailto:></a>
    > > > >
    > > > > <p>A 7 sq mm die, packaged, will cost about $1 in quantity 1,000,000.
    > > > > Cost per Mip is 0.
    > > > >
    > > > >
    > > > > <p>25 prototypes can be obtained from <a
    > > > > href=http://www.mosis.com>MOSIS</a> for $14,000 with 16 week
    > > > > turn-around. The TSMC .18um process has monthly submissions.
    > > > > </body>
    > > > > ---
    > > >
    > > > Maybe an important note.
    > > >
    > > > ONLY the diagonal needs the X[],Y[] and *SPECIAL* C[] register ( each
    > > > is unique for parallel ram access, a 4 x 4 x MemWidth multiplex for
    > > > maybe four Direct RAM Bus DRAMS)
    > > >
    > > > the other 200+ nodes are used for programmable multiplexing
    > > >
    > > > Night all'
    > > >
    > > > maw

    > >
    > > stack_machine_id[A/B-select, [ A[0..15]] or B[0==self,1..15]]
    > >
    > > I am IBM.

    >
    >
    > > the other 200+ nodes are used for programmable multiplexing

    >
    > in my hypothetical super scalable parallel architecture ,
    >
    > stack_machine_id[A/B-select, [ A[0..15]] or B[0==self or ZERO ID
    > ,1..15]]
    > in self mode the program may programmatically generate message routing
    >
    > machine code for a DirectMemoryAccess ( DMA) like transfer of data.
    > INTRA-PROCESSOR vs INTER-PROCESSOR data transfer.
    > ( at least three states, a[0..15], b[0..15] or self[0..15] )
    >
    > I am IBM.


    self represents the diagonal

    ( as within this message's previously mentioned use of the term
    /diagonal/)
    Jul 23, 2005

  2. Guest

    wrote:
    > maybe PICTURE THIS


    > Mr. Moore's type of 25 X model, HOWEVER,


    > 1) expanded to a sixteen by sixteen array for A and B busses, (5x5 -->
    > 16x16)
    > 1a) both A and B are MULTIPLEXED into TWO buses, each with an ID
    > multiplier of sixteen for inter and intra processor reg maps,
    > ( SIXTEEN is for a MAXIMUM bandwidth! )
    > 1b) both A and B have peek ahead, two register element stacks
    > 1c) !!! TIMES FOUR FOR FAULT TOLERANT ( SUPER COOLED?) VERSION OPTION
    > !!!


    > 2) special C bus for local parallel memory ( Direct RamBus DRAM ?)


    > 3) extra X and Y stacks ( along with the T/S/parameter stack )


    > 4) that's it! this is my whole base model list!


    > 5) iterate testing and recurse testing of my sixteen bit VLIW decode.


    > maw


    > ---
    > Mr Moore's X18 homepage ( obsolete? )
    > ---
    > Mr Moore's 25 x ( obsolete? )
    > ---


    Maybe an important note.

    ONLY the diagonal needs the X[],Y[] and *SPECIAL* C[] register ( each
    is unique for parallel ram access, a 4 x 4 x MemWidth multiplex for
    maybe four Direct RAM Bus DRAMS)

    the other 200+ nodes are used for programmable multiplexing

    Night all'

    maw

    ---

    > > > the other 200+ nodes are used for programmable multiplexing


    > > in my hypothetical super scalable parallel architecture ,
    > >
    > > stack_machine_id[A/B-select, [ A[0..15]] or B[0==self or ZERO ID
    > > ,1..15]]
    > > in self mode the program may programmatically generate message routing
    > >
    > > machine code for a DirectMemoryAccess ( DMA) like transfer of data.
    > > INTRA-PROCESSOR vs INTER-PROCESSOR data transfer.
    > > ( at least three states, a[0..15], b[0..15] or self[0..15] )
    > >
    > > I am IBM.

    >


    'self' represents the /diagonal/ ( of Mr. Moore's modified 25x model),
    as within this posting's previously mentioned use of the term
    /diagonal/.
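
One way to picture the diagonal in the proposed 16 x 16 array, as a rough sketch. The reading used here is an assumption: each node is identified by an A-bus index and a B-bus index, both 0..15, and a node is on the diagonal ('self') when the two indices are equal. The names `is_self` and `classify` are hypothetical.

```python
SIZE = 16  # the 5x5 array expanded to 16x16


def is_self(a, b):
    """Assumed meaning of 'self': the node sits on the diagonal of the array."""
    return a == b


def classify(size=SIZE):
    """Split all (a, b) node ids into diagonal nodes and the remaining nodes."""
    diagonal = []
    others = []
    for a in range(size):
        for b in range(size):
            (diagonal if is_self(a, b) else others).append((a, b))
    return diagonal, others


diagonal, others = classify()
print(len(diagonal), "diagonal nodes,", len(others), "other nodes")
```

Under this reading there are 16 diagonal nodes carrying the X[], Y[] and C[] registers, and 240 remaining nodes, which fits the "200+ nodes" left over for programmable multiplexing.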


    ---

    REPOSTED
    Jul 31, 2005
