How to design this datapath unit for DSP using VHDL/Verilog?

Discussion in 'VHDL' started by walala, Aug 30, 2003.

  1. walala

    walala Guest

    Dear all,

    I want to design an arithmetic datapath unit for digital signal processing
    using VHDL and/or Verilog.

    The input is a set of 5 elements (arriving either sequentially or in
    parallel), each 8 bits wide. Each of these 5 inputs has to be multiplied by
    a predefined constant matrix (10x10, floating-point values scaled and
    rounded to integers). The output is a 10x10 matrix that is the sum of the
    five resulting matrices, with each element 12 bits wide. So for each
    element of the matrix, I can have a MAC unit. The internal computation will
    be 16 bits.

    Hence, for each set of 5 inputs x1, x2, x3, x4, x5, the output matrix is

    Y = x1*C1 + x2*C2 + x3*C3 + x4*C4 + x5*C5, where Y, C1, C2, C3, C4, C5 are matrices.
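
    As a software cross-check for whatever HDL ends up being written, the equation above can be modelled bit for bit. The following is a minimal Python sketch, not HDL. `wrap_signed` and `reference_output` are hypothetical helper names, and the sketch assumes signed two's-complement values, a 16-bit accumulator that wraps on overflow, and a 12-bit result obtained by dropping the 4 least significant bits. None of those choices is fixed by the description above.

```python
def wrap_signed(value, bits):
    """Wrap an integer to a two's-complement value of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    # If the sign bit is set, reinterpret the pattern as a negative number.
    return value - (1 << bits) if value >> (bits - 1) else value


def reference_output(xs, cs, acc_bits=16, out_bits=12):
    """Compute Y = sum_k xs[k] * cs[k] element by element.

    xs: list of input samples (assumed signed integers).
    cs: list of constant matrices, one per input, as nested lists.
    Assumption: the accumulator wraps at acc_bits and the output keeps
    the top out_bits of it (arithmetic shift right, no rounding).
    """
    rows, cols = len(cs[0]), len(cs[0][0])
    shift = acc_bits - out_bits
    y = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0
            for x, cmat in zip(xs, cs):
                acc = wrap_signed(acc + x * cmat[r][c], acc_bits)
            y[r][c] = acc >> shift
    return y
```

    Feeding the same input vectors to the HDL testbench and to a model like this gives a direct way to compare results element by element.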

    If I put a MAC on each element, I will have a purely parallel
    architecture, but that needs 100 16-bit MAC units, which would consume too
    many resources.

    I am considering a parallel-serial architecture that outputs one row at a
    time, i.e. 10x12 bits per step, so the output is produced row by row.
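
    One possible reading of the row-by-row idea is one MAC per column (10 MACs), with each row taking 5 clock cycles, one per input term. The Python sketch below only models that schedule; it is not HDL, `row_serial_schedule` is a hypothetical name, and the one-term-per-clock timing is an assumption rather than something stated above.

```python
def row_serial_schedule(xs, cs):
    """Yield (cycle, row_index, row_values) using one MAC per column.

    xs: list of input samples.
    cs: list of constant matrices (nested lists), one per input.
    Assumption: each MAC consumes one input term per clock, so a row
    is complete after len(xs) cycles.
    """
    rows, cols = len(cs[0]), len(cs[0][0])
    cycle = 0
    for r in range(rows):
        accs = [0] * cols                # one accumulator per column MAC
        for x, cmat in zip(xs, cs):      # one input term per clock cycle
            cycle += 1
            for col in range(cols):      # these MACs run in parallel in hardware
                accs[col] += x * cmat[r][col]
        yield cycle, r, accs
```

    With 5 inputs and 10 rows, this schedule would finish a full 10x10 output in 50 cycles, which can then be compared against the required output rate.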

    I also need to streamline the datapath operation. Since sets of 5 input
    elements will arrive in a non-stop stream, the output will also be a
    non-stop stream. So once one row has been output, that row can be reused
    to compute and store the results for the next 5 input elements.

    I am OK up to this point, but thinking further leaves me confused. How do
    I do the sequential timing control (i.e. decide what happens in which
    cycle)? Do I need pipelining? How do I design the architecture? I know
    pipelining in theory from a one-semester course, but now that I have to
    implement it, I am totally lost...

    Finally, how do I code this? Are there any examples?

    Please help me!

    Thanks a lot,

    -Walala
     
    walala, Aug 30, 2003
    #1

  2. David Jones

    David Jones Guest

    In article <bipblj$53j$>, walala <> wrote:
    >Dear all,
    >
    >I want to design an arithmetic datapath unit for digital signal processing
    >using VHDL and/or Verilog.
    >
    >The inputs are 5 elements (either sequential or parallel), each having 8 bits.
    >It needs to multiply each of these 5 inputs with a predefined constant
    >matrix (10x10, floating point scaled and rounded to integer). The output will
    >be a 10x10 matrix summing the above five matrices up, each element having 12
    >bits. So for each element of the matrix, I can have a MAC unit. The
    >internal computation will be 16 bits.
    >
    >Hence for each 5 inputs x1, x2, x3, x4, x5, the output matrix
    >
    >Y=x1*C1+x2*C2+x3*C3+x4*C4+x5*C5 where Y, C1, C2, C3, C4, C5 are matrices;


    What is your throughput requirement and what technology are you using?

    That will determine the amount of parallelism that you need.

    If the requirement is low enough, then only one MAC unit will be required.

    Next, you must define the timing of the inputs. If they are serial, then
    it's easy: stuff the data into the MAC unit. Being pipelined (right?),
    the MAC unit will output the answer N clocks later.
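
    To make the "answer N clocks later" behaviour concrete, here is a small cycle-level Python model of a pipelined MAC. It is a sketch for reasoning about timing, not HDL; the class name `PipelinedMac`, the `start` flag that restarts the accumulation, and the use of `None` for pipeline bubbles are all assumptions made for illustration.

```python
from collections import deque


class PipelinedMac:
    """Cycle-level model: products reach the accumulator `latency` clocks late."""

    def __init__(self, latency):
        # Each slot holds (product, start_flag), or None for a bubble.
        self.pipe = deque([None] * latency)
        self.acc = 0

    def clock(self, x=None, c=None, start=False):
        """Advance one clock edge and return the accumulator after it."""
        self.pipe.append(None if x is None else (x * c, start))
        item = self.pipe.popleft()
        if item is not None:
            product, start_flag = item
            # A start flag loads the product instead of adding to the old sum.
            self.acc = product if start_flag else self.acc + product
        return self.acc
```

    With a latency of 2 and five terms fed on consecutive clocks, the complete sum appears on the 7th clock, so the control logic has to delay its "result valid" strobe by the same number of cycles as the data.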

    If you have more parallelism in your input data than you want in your
    MAC units, then you will need to buffer the data. This circuit will be
    easy to design once you define the timing requirements.
     
    David Jones, Aug 30, 2003
    #2

  3. walala

    walala Guest

    Hi David,

    Thanks for your answer!

    The required output throughput is 33-50 MHz, i.e. the unit should output 33
    million to 50 million 12-bit elements per second, and each set of 5 inputs
    corresponds to 10x10 = 100 such 12-bit output elements.
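
    These numbers can be turned into a rough MAC budget. The Python sketch below does only that arithmetic; `mac_budget` is a hypothetical helper, and the clock frequency passed to it is an assumption, since no clock rate is given here. Each output element is the sum of 5 products, so it costs 5 multiply-accumulate operations.

```python
import math


def mac_budget(output_rate_hz, clock_hz, terms_per_output=5, outputs_per_set=100):
    """Estimate how many MAC units a given output rate needs.

    output_rate_hz: required output elements per second.
    clock_hz: assumed clock frequency (not specified in the thread).
    """
    macs_per_second = output_rate_hz * terms_per_output
    return {
        "macs_per_second": macs_per_second,
        # Each MAC unit is assumed to complete one operation per clock.
        "mac_units": math.ceil(macs_per_second / clock_hz),
        "input_sets_per_second": output_rate_hz / outputs_per_set,
    }
```

    For example, `mac_budget(50_000_000, 50_000_000)` gives 250 million MAC operations per second and 5 MAC units, with 500,000 input sets per second. Under that assumed clock, a handful of MACs would be enough, far fewer than 100.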

    The technology I am going to use is a 0.25 um process.

    I think the inputs are naturally serial, but again, I am not sure how to
    partition the internal MACs between parallel and serial operation, or how
    to pace the outputs.

    It seems the inputs arrive faster than the outputs can be produced, so
    maybe I should make the input wait after it has been fed into the unit?

    Can you give some further advice on how to design this architecture and
    its timing? I find it really difficult. Could you also point me to some
    resources?

    Thanks very much,

    -Walala

     
    walala, Aug 30, 2003
    #3
  4. walala

    walala Guest

    Can we assume the inputs are all present at once (in parallel)? Since there
    are only 5 inputs (5x8 = 40 bits), is that a reasonable assumption?

     
    walala, Aug 30, 2003
    #4