Addressing 8 Channel A to D Data

A

Artist

In an ADSP-BF547 I will have a buffer of data from an AD7606 A to D Converter. There will be 200k samples from each of eight channels for a total of 1..6M samples. On this data I need to do one DFT for each channel in real time. It is essential the DFT be as efficient as possible and so arises the question about the most efficient way to address the data in a channel.

For this many samples in a channel I do not have a choice about the sequence the DMA will put the data in the memory. The sequence is defined by the below equation which maps a sample row and a channel number to an index in the dynamically allocated linear array that data is stored in:

i = s*8 + c eq 1

Where:
i is the index number of an A to D data element in the linear array
s is the sample row number, 0 <= s < 200e3
c is the A to D channel number (column). There are eight of them. 0 <= c< 8

I could use the above equation 1 which would require one multiplication andone addition.

Since the number of channels is a convenient power of two I could instead shift and OR:

i = s<<3 | c eq 2

I could also dynamically allocate a linear array of pointers to each row:

u16 ** D2 = (u16 **)malloc( 200e3 * sizeof( u16 ** ) );
for( j=0; j<200e3; j++ ){
D2[j] = &D[ J*8 ];
}

Where:
D is the array DMA stored the A to D data in.
D2 is an array of pointers.

This way the data in D can be accessed by

Dadc = D2[c] eq 3

I expect the above for loop to be executed only once per buffer at initialization because the DMA alternates between two buffers. While it is filling one, the DFT will be done on the other. A DFT for a buffer has to be completed before DMA has finished filling the other buffer.

I need to know which of the above equations, 1, 2, or 3, will access the data quickest.
 
I

Ike Naar

I could also dynamically allocate a linear array of pointers to each row:

u16 ** D2 = (u16 **)malloc( 200e3 * sizeof( u16 ** ) );

That should probably be sizeof(u16*).
You can avoid this mistake by using the idiom

u16 ** D2 = malloc(200e3 * sizeof *D2);
I need to know which of the above equations, 1, 2, or 3, will access
the data quickest.

Try them all, and measure.
 
W

Willem

Artist wrote:
) In an ADSP-BF547 I will have a buffer of data from an AD7606 A to D
) Converter. There will be 200k samples from each of eight channels for a
) total of 1.6M samples. On this data I need to do one DFT for each channel
) in real time. It is essential the DFT be as efficient as possible and so
) arises the question about the most efficient way to address the data in a
) channel.
)
) For this many samples in a channel I do not have a choice about the
) sequence the DMA will put the data in the memory. The sequence is defined
) by the below equation which maps a sample row and a channel number to an
) index in the dynamically allocated linear array that data is stored in:
)
) i = s*8 + c eq 1
)
) Where:
) i is the index number of an A to D data element in the linear array
) s is the sample row number, 0 <= s < 200e3
) c is the A to D channel number (column). There are eight of them. 0 <= c < 8
)
) I could use the above equation 1 which would require one multiplication and one addition.
)
) Since the number of channels is a convenient power of two I could instead shift and OR:
)
) i = s<<3 | c eq 2

Any halfway decent compiler will optimize s*8 to s<<3 anyway, or will
optimize both to some different kind of operation. So effectively they're
likely to give the exact same result.

) I could also dynamically allocate a linear array of pointers to each row:
)
) u16 ** D2 = (u16 **)malloc( 200e3 * sizeof( u16 ** ) );
) for( j=0; j<200e3; j++ ){
) D2[j] = &D[ J*8 ];
) }
)
) Where:
) D is the array DMA stored the A to D data in.
) D2 is an array of pointers.
)
) This way the data in D can be accessed by
)
) Dadc = D2[c] eq 3

That seems like it would be slower except on hardware where memory lookups
are cheaper than arithmetics (old 8-bit hardware, or PICs, for example).

) I expect the above for loop to be executed only once per buffer at
) initialization because the DMA alternates between two buffers. While it
) is filling one, the DFT will be done on the other. A DFT for a buffer has
) to be completed before DMA has finished filling the other buffer.
)
) I need to know which of the above equations, 1, 2, or 3, will access the data quickest.

If you *need* to know, then try all of them.

Also, you could maybe convert the pointer to something like: ((u16[8]) *)
I.E. a pointer to an array of 8 values. Then you can use double-indexing
on that and let the compiler figure out the arithmetics.

That will probably be equivalent to solutions 1 and 2, though.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
K

Keith Thompson

Artist said:
u16 ** D2 = (u16 **)malloc( 200e3 * sizeof( u16 ** ) );
for( j=0; j<200e3; j++ ){
D2[j] = &D[ J*8 ];
}
[...]

Ike Naar already pointed out that "sizeof (u16 **)" should be
"sizeof (u16*) -- or, better, "sizeof *D2". (u16** and u16* are
very likely to have the same size, but it's not guaranteed.)

But you should also be aware that 200e3 is a *floating-point* constant,
which means that it might be subject to rounding errors. You should
replace it with 200000 -- or, better, with a macro or constant with that
value.
 
B

Ben Bacarisse

Artist said:
I need to know which of the above equations, 1, 2, or 3, will access
the data quickest.

Since I think you'll have to try them all, you might want to consider
two other options:

4) Use a 2D array to get the compiler to do the address arithmetic.
It's unlikely, but the compiler might be able to do it better than
you can, but if it's no worse you get cleaner code and that can only
be a Good Thing.

5) Transpose the array, i.e. switch rows with columns. This alters the
access pattern and may be better or worse depending on things like
cache configuration and so on.
 
W

Willem

Ben Bacarisse wrote:
)<snip>
)> I need to know which of the above equations, 1, 2, or 3, will access
)> the data quickest.
)
) Since I think you'll have to try them all, you might want to consider
) two other options:
)
) 4) Use a 2D array to get the compiler to do the address arithmetic.
) It's unlikely, but the compiler might be able to do it better than
) you can, but if it's no worse you get cleaner code and that can only
) be a Good Thing.

It's already a 2D array, isn't it?
You just need to cast it to the correct type.

) 5) Transpose the array, i.e. switch rows with columns. This alters the
) access pattern and may be better or worse depending on things like
) cache configuration and so on.

Nope. Quote from the OP:
)> For this many samples in a channel I do not have a choice about the
)> sequence the DMA will put the data in the memory.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
8

88888 Dihedral

Artistæ–¼ 2012å¹´8月18日星期六UTC+8上åˆ11時47分59秒寫é“:
In an ADSP-BF547 I will have a buffer of data from an AD7606 A to D Converter. There will be 200k samples from each of eight channels for a total of1.6M samples. On this data I need to do one DFT for each channel in real time. It is essential the DFT be as efficient as possible and so arises the question about the most efficient way to address the data in a channel.



For this many samples in a channel I do not have a choice about the sequence the DMA will put the data in the memory. The sequence is defined by thebelow equation which maps a sample row and a channel number to an index inthe dynamically allocated linear array that data is stored in:



i = s*8 + c eq 1



Where:

i is the index number of an A to D data element in the linear array

s is the sample row number, 0 <= s < 200e3

c is the A to D channel number (column). There are eight of them. 0 <=c < 8



I could use the above equation 1 which would require one multiplication and one addition.



Since the number of channels is a convenient power of two I could insteadshift and OR:



i = s<<3 | c eq 2



I could also dynamically allocate a linear array of pointers to each row:



u16 ** D2 = (u16 **)malloc( 200e3 * sizeof( u16 ** ) );

for( j=0; j<200e3; j++ ){

D2[j] = &D[ J*8 ];

}



Where:

D is the array DMA stored the A to D data in.

D2 is an array of pointers.



This way the data in D can be accessed by



Dadc = D2[c] eq 3



I expect the above for loop to be executed only once per buffer at initialization because the DMA alternates between two buffers. While it is filling one, the DFT will be done on the other. A DFT for a buffer has to be completed before DMA has finished filling the other buffer.



I need to know which of the above equations, 1, 2, or 3, will access the data quickest.


OK, can you relax the precesion required?

Do you just need to track down the main carrier only ?

Do you really need an accurate spectrum analysis?

There are 4 cores in most mass production cspu on the streets nowadays,
and also ?00G routers are so cheap that one can put up 8 PCs in a lan for
the 8 DFTs.
 
B

Ben Bacarisse

Willem said:
Ben Bacarisse wrote:
)<snip>
)> I need to know which of the above equations, 1, 2, or 3, will access
)> the data quickest.
)
) Since I think you'll have to try them all, you might want to consider
) two other options:
)
) 4) Use a 2D array to get the compiler to do the address arithmetic.
) It's unlikely, but the compiler might be able to do it better than
) you can, but if it's no worse you get cleaner code and that can only
) be a Good Thing.

It's already a 2D array, isn't it?
You just need to cast it to the correct type.

Yes, but the OP's 1, 2 and 3 were about the accesses, so I was adding a
fourth -- use 'native' [x][y] rather than treating the data as
1-dimensional (the OP's 1 and 2), or as an array of pointers (which is
how I interpreted their number 3).

<snip>
 
A

Artist

Artistæ–¼ 2012å¹´8月18日星期六UTC+8上åˆ11時47分59秒寫é“:
In an ADSP-BF547 I will have a buffer of data from an AD7606 A to D Converter. There will be 200k samples from each of eight channels for a total of 1.6M samples. On this data I need to do one DFT for each channel in realtime. It is essential the DFT be as efficient as possible and so arises the question about the most efficient way to address the data in a channel.
For this many samples in a channel I do not have a choice about the sequence the DMA will put the data in the memory. The sequence is defined by the below equation which maps a sample row and a channel number to an index in the dynamically allocated linear array that data is stored in:
i = s*8 + c eq 1

i is the index number of an A to D data element in the linear array
s is the sample row number, 0 <= s < 200e3
c is the A to D channel number (column). There are eight of them. 0 <= c < 8
I could use the above equation 1 which would require one multiplicationand one addition.
Since the number of channels is a convenient power of two I could instead shift and OR:
i = s<<3 | c eq 2
I could also dynamically allocate a linear array of pointers to each row:
u16 ** D2 = (u16 **)malloc( 200e3 * sizeof( u16 ** ) );
for( j=0; j<200e3; j++ ){
D2[j] = &D[ J*8 ];

D is the array DMA stored the A to D data in.
D2 is an array of pointers.
This way the data in D can be accessed by
Dadc = D2[c] eq 3
I expect the above for loop to be executed only once per buffer at initialization because the DMA alternates between two buffers. While it is filling one, the DFT will be done on the other. A DFT for a buffer has to be completed before DMA has finished filling the other buffer.
I need to know which of the above equations, 1, 2, or 3, will access the data quickest.




OK, can you relax the precesion required?



Do you just need to track down the main carrier only ?



Do you really need an accurate spectrum analysis?



There are 4 cores in most mass production cspu on the streets nowadays,

and also ?00G routers are so cheap that one can put up 8 PCs in a lan for

the 8 DFTs.


I need the main carrier only. It is a lock in amplifier. In normal operation DFT will output the amplitude of the main carrier's harmonic only, and none of the other harmonics. For cross talk rejection the each channel's harmonic will be different. For the DFT the sine wave multiplier will be factored out of the additions and subtractions as much as possible to minimize the floating point multiplications. This what an FFT does.

All eight channels has to be done by only one Blackfin ADSP-BF547. The choice of this processor was a compromise between computing power and power consumption. It has two 32 bit ALU's. It will be operated at clock speed of 525MHz.
 
A

Artist

Also, you could maybe convert the pointer to something like: ((u16[8]) *)

I.E. a pointer to an array of 8 values. Then you can use double-indexing

on that and let the compiler figure out the arithmetics.

That will probably be equivalent to solutions 1 and 2, though.

Casting to ((u16[8]) *) is what I will do. I was looking for a way to simply cast it like this earlier. What I had read about C and C++ is that both do 2 dimensional arrays by creating a linear array of pointers in the first dimension that point to an arrays of values as the second dimension. So I thought I would have to create an array of pointers as I did for eq 3.
 
A

Artist

5) Transpose the array, i.e. switch rows with columns. This alters the

access pattern and may be better or worse depending on things like

cache configuration and so on.

The ADSP-BF547 does have a way for DMA to put each A to D channel in its own section of the buffer so the DFT could do its operation on linear arrays.But the registers that its DMA uses to do this are 16 bits wide. This is not wide enough for 200k samples.

Transposing or otherwise rearranging data after the DMA would cost more time than handling the channels as a two dimensional arrays in the DFT.
 
K

Keith Thompson

[165 lines deleted]

It would be helpful if you could post using something other than
Google Groups. Their interface to Usenet groups has never been
very good, and it's become much worse recently. Quoted text is
double-spaced, quoted quoted text is quadruple-spaced, and there's
no convenient way to limit line length to a reasonable width like
72 columns. I don't know how the result looks when viewed in GG's
web interface, but it's very difficult to read in my newsreader.

Consider signing up with a free NNTP service like
news.eternal-september.org and using an actual Usenet client.
Firefox supports NNTP, or you can use a dedicated client. (I use
Gnus myself, but if you're not already an Emacs user you'd probably
hate it.)

If you can't do that, consider copy-and-pasting your article into a text
editor, fixing up the formatting, and copy-and-pasting back into Google
Groups.

Most of us use text-based, not web-based, interfaces. My newsreader
does wrap very long lines, but not on word boundaries. I can re-wrap
an article to use shorter lines, but that destroys code layout.
 
K

Keith Thompson

Artist said:
Also, you could maybe convert the pointer to something like:
((u16[8]) *)

I.E. a pointer to an array of 8 values. Then you can use
double-indexing on that and let the compiler figure out the
arithmetics. That will probably be equivalent to solutions 1 and 2,
though.

Casting to ((u16[8]) *) is what I will do. I was looking for a way to
simply cast it like this earlier. What I had read about C and C++ is
that both do 2 dimensional arrays by creating a linear array of
pointers in the first dimension that point to an arrays of values as
the second dimension. So I thought I would have to create an array of
pointers as I did for eq 3.

A C 2-dimensional array is simply an array of arrays.

You can build a data structure that *acts* like a 2-dimensional array
(e.g., you can index it using the same arr[x][y] syntax) by building a
1-d array of pointers, each of which points to the first element of a
1-d array of elements. This requires substantial programming overhead
to create, initialize, and then deallocate all the arrays.

If you're casting a pointer of one type to a pointer of another type,
it's very likely that you're doing something wrong, non-portable, and/or
dangerous, and that there's a cleaner way to do it.

Suggested reading: Section 6 of the comp.lang.c FAQ
<http://www.c-faq.com/>.
 
W

Willem

Keith Thompson wrote:
) If you're casting a pointer of one type to a pointer of another type,
) it's very likely that you're doing something wrong, non-portable, and/or
) dangerous, and that there's a cleaner way to do it.

In this case, he's casting a pointer to a bit of memory, which is filled
by an external source with something that fits the target pointer type.
So that's in the non-portable field, and it's actually the best way to
handle this specific situation.

You really should read the whole thread, or at least the OP, before you
jump to conclusions and chip in with generic (and in this case wrong)
advice.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
8

88888 Dihedral

The ADSP-BF547 does have a way for DMA to put each A to D channel in its own section of the buffer so the DFT could do its operation on linear arrays. But the registers that its DMA uses to do this are 16 bits wide. This isnot wide enough for 200k samples.



Transposing or otherwise rearranging data after the DMA would cost more time than handling the channels as a two dimensional arrays in the DFT.

An N-point DFT algorithm normally requires N*log(N) operations per N samples.

In theory it is better to use larger N in OFDM or DMT like modulation schemes
within a segment of data. But in the commercial word the cost per unit
is so limited that it is not as priced high as some phase array systems.


In the commercial word the mass production concerns for lower cost per unit
with low N <4096 in cheap asics are quite popular.
 
A

Artist

An N-point DFT algorithm normally requires N*log(N) operations per N samples.



In theory it is better to use larger N in OFDM or DMT like modulation schemes

within a segment of data. But in the commercial word the cost per unit

is so limited that it is not as priced high as some phase array systems.





In the commercial word the mass production concerns for lower cost per unit

with low N <4096 in cheap asics are quite popular.

N*log(N) operations per N samples is the cost of doing a complete spectrum FFT where the number of samples N is a power of two. In my case I am doing an FFT that outputs only one of the harmonics, is optimized for that harmonic, and also optimizes for an arbitrary N. It is fastest if N is a power oftwo. It is slowest if N is a prime number.
 
M

Michael Karas

[This followup was posted to comp.lang.c and a copy was sent to the
cited author.]

The ADSP-BF547 does have a way for DMA to put each A to D channel in its own section of the buffer so the DFT could do its operation on linear arrays. But the registers that its DMA uses to do this are 16 bits wide. This is not wide enough for 200k samples.

Transposing or otherwise rearranging data after the DMA would cost more time than handling the channels as a two dimensional arrays in the DFT.


What dictates the use of buffers of 200K samples?

Could you use the smaller number of samples afforded by the 16-bit
register sizes and then run your algorithm more often?
 
A

Artist

[This followup was posted to comp.lang.c and a copy was sent to the

cited author.]


Artist says...
The ADSP-BF547 does have a way for DMA to put each A to D channel in its own section of the buffer so the DFT could do its operation on linear arrays. But the registers that its DMA uses to do this are 16 bits wide. This is not wide enough for 200k samples.

Transposing or otherwise rearranging data after the DMA would cost moretime than handling the channels as a two dimensional arrays in the DFT.





What dictates the use of buffers of 200K samples?



Could you use the smaller number of samples afforded by the 16-bit

register sizes and then run your algorithm more often?



--



Michael Karas

Carousel Design Solutions

http://www.carousel-design.com

There is a compromise to be made between bandwidth and speed of the response. The tighter, or smaller, the bandwidth of the Fourier Transform, the greater the duration of time the Fourier Transform is done over. The tighter the bandwidth the more noise is filtered out. This, and quantization noise analysis, determines the sample rate and the number of samples.

It is simpler for me to recast the buffer than deal with more FFTs more often and combining the results.
 
A

Artist

Consider signing up with a free NNTP service like

news.eternal-september.org and using an actual Usenet client.

I will look at using this. I didn't know a free USENET service like this existed. Thanks for the tip.
 
A

Artist

Also, you could maybe convert the pointer to something like: ((u16[8]) *)

I have trouble finding the right syntax to do this recast. My best effort so far gets these error messages:

Attempt 1:

".\ADC.c", line 205: cc0070: error: incomplete type is not allowed
u16 *pData[8][] = &((u16 * *)(pBuff->Data));
^

".\ADC.c", line 205: cc0158: error: expression must be an lvalue or a
function designator
u16 *pData[8][] = &((u16 * *)(pBuff->Data));
^
Attempt 2:

".\imi_ADC.c", line 205: cc0070: error: incomplete type is not allowed
u16 *pData[8][] = &(((u16[8]) *)(pBuff->Data));
^

".\imi_ADC.c", line 205: cc0119: error: cast to type "u16 [8]" is not allowed
u16 *pData[8][] = &(((u16[8]) *)(pBuff->Data));
^
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top