Continuously concatenating binary data

Unforgiven

I have an application where I continuously receive new binary data input in
the form of a char*. The data comes from the Windows Multimedia wave input
functions, but that's not important. What it means is that every 2 seconds I
need to append 22050 bytes to an ever-expanding buffer, and I have no idea at
the start how large that buffer will need to be.

Now there are several possibilities to do this, as I see it:
1. Just make the buffer a void* (or char*), and realloc it every 2 seconds,
copying the new data to the end. This isn't a good idea of course, because
realloc will become very expensive as the buffer grows.
2. Use something like this, with ssBuffer an ostringstream (sketched after
this list):
ssBuffer.write(newdata, 22050);
(operator<< would be wrong here: inserting a char* stops at the first zero
byte, and binary audio data is full of those.)
Then just read out the entire stream at the end.
I don't know how ostringstream manages buffer growth, so this might not be
any better (performance-wise) than the realloc approach.
3. Do the same as above, but with an ofstream. This can handle really huge
input (although I don't expect input to be more than 10-15 seconds of audio
data ever), and should be reasonably efficient since Windows buffers file
I/O, but it does require the user to have write permissions wherever I'm
going to put this file.
4. Copy every 2 seconds of data into its own 'minibuffer', add those to a
std::list, and at the end create a large buffer only once, copying all the
individual pieces into it.
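For illustration, here is a minimal sketch of option 2 with the write() fix
noted above; onWaveData is a made-up name for wherever the wave-input
callback delivers its block:

#include <sstream>
#include <string>

std::ostringstream ssBuffer;

// Hypothetical callback, invoked every 2 seconds with a new block.
void onWaveData(const char* newdata, std::streamsize len)   // len == 22050
{
    // write() copies raw bytes; operator<< on a char* stops at the
    // first zero byte, and audio data is full of zeros.
    ssBuffer.write(newdata, len);
}

void finish()
{
    // One contiguous copy of everything, made once at the end.
    std::string all = ssBuffer.str();
}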

What would be the best approach in your opinions? Or perhaps you have an
even better one that I didn't think of.

Thanks in advance.
 
Victor Bazarov

Unforgiven said:
I have an application where I continuously receive new binary data input in
the form of a char*. The data comes from the Windows Multimedia wave input
functions, but that's not important. What it means is that every 2 seconds I
need to append 22050 bytes to an ever-expanding buffer, and I have no idea at
the start how large that buffer will need to be.

What do you need the buffer for? Do you use it right away? Does
the buffer have to be contiguous during your input?

If not, use a list<your22050bytes>. I suspect that even if you do
need to use the "stream" right away, the list is quick enough for
all your streaming needs.
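For concreteness, a minimal sketch of this (Block, kBlockSize and onWaveData
are made-up names):

#include <cstddef>
#include <cstring>
#include <list>

const std::size_t kBlockSize = 22050;    // one 2-second block

struct Block { char data[kBlockSize]; };

std::list<Block> blocks;   // one node per block; nothing stored ever moves

void onWaveData(const char* newdata)
{
    blocks.push_back(Block());           // O(1), no reallocation ever
    std::memcpy(blocks.back().data, newdata, kBlockSize);
}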

Victor
 
Nitin Rajput

I think having a vector<char> should be good enough. Element access through
a vector should be no more than about twice as slow as raw array access -
they are pretty fast. They would also let you expand as more data comes in.

You can look at the vector allocation strategy - it grows its capacity
geometrically (typically doubling) whenever it runs out of room, so appends
stay cheap on average.
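Concretely, appending each block to a single vector<char> looks like this
(a sketch; the growth factor is implementation-defined, so "doubling" is an
assumption):

#include <cstddef>
#include <vector>

std::vector<char> buffer;

void onWaveData(const char* newdata, std::size_t len)
{
    // Amortized O(1): capacity grows geometrically, so the occasional
    // reallocate-and-copy is spread over many cheap appends.
    buffer.insert(buffer.end(), newdata, newdata + len);
}

// If an upper bound is known (say 15 seconds at 11025 bytes/sec),
// buffer.reserve(15 * 11025) up front avoids reallocation entirely.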

-nitin
 
K_Lee

Just a thought:

If your user has only a small amount of memory, or records a large amount
of data, all your malloc/realloc traffic will turn into swap-disk I/O.

At that point it would be no different from the stream approach. In fact,
a stream gives you better control over the amount of memory your app needs.
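A sketch of that stream approach with an ofstream ("capture.tmp" is a
placeholder path; in practice pick a directory the user can actually write
to, e.g. the system temp directory):

#include <fstream>

std::ofstream out("capture.tmp", std::ios::binary);

void onWaveData(const char* newdata, std::streamsize len)
{
    out.write(newdata, len);   // OS-buffered file I/O; memory use stays flat
}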


--
The source is out there. Browse and document open/share source
projects such as Apache, Tcl, Ethereal, Mozilla, .Net SSCLI.
http://www.slink-software.com

Victor Bazarov said:
Unforgiven said:
I have an application where I continuously receive new binary data input in
the form of a char*. The data comes from the Windows Multimedia wave input
functions, but that's not important. What it means is that every 2 seconds I
need to append 22050 bytes to an ever-expanding buffer, and I have no idea at
the start how large that buffer will need to be.

What do you need the buffer for? Do you use it right away? Does
the buffer have to be contiguous during your input?

If not, use a list<your22050bytes>. I suspect that even if you do
need to use the "stream" right away, the list is quick enough for
all your streaming needs.

Victor
 
lilburne

Nitin said:
I think having a vector<char> should be good enough. Element access through
a vector should be no more than about twice as slow as raw array access -
they are pretty fast. They would also let you expand as more data comes in.

You can look at the vector allocation strategy - it grows its capacity
geometrically (typically doubling) whenever it runs out of room, so appends
stay cheap on average.

A raw char vector is probably not a good idea. As the vector grows, you not
only start moving large amounts of data about, but you also run the risk of
being unable to allocate enough contiguous memory.

Vectors are all right if you know in advance that the number of elements to
be used is reasonably small (a few thousand at most).

A list of vectors, each holding 2 seconds' worth of data, is probably
sufficient in this case (a sketch follows).
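A sketch of that combination (essentially the original option 4):

#include <cstddef>
#include <list>
#include <vector>

std::list< std::vector<char> > chunks;

void onWaveData(const char* newdata, std::size_t len)
{
    chunks.push_back(std::vector<char>(newdata, newdata + len));
}

// One big allocation and one pass of copying, at the very end.
std::vector<char> coalesce()
{
    std::size_t total = 0;
    std::list< std::vector<char> >::const_iterator it;
    for (it = chunks.begin(); it != chunks.end(); ++it)
        total += it->size();

    std::vector<char> whole;
    whole.reserve(total);
    for (it = chunks.begin(); it != chunks.end(); ++it)
        whole.insert(whole.end(), it->begin(), it->end());
    return whole;
}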
 
Unforgiven

lilburne said:
A raw char vector is probably not a good idea. As the vector grows, you not
only start moving large amounts of data about, but you also run the risk of
being unable to allocate enough contiguous memory.

This is one of the reasons I didn't even list a vector as an option.
Doubling the capacity may limit reallocs, but once you get into really big
amounts of data it can waste a *lot* of memory.

Another problem with any approach that uses contiguous memory (which would
include C-style arrays, std::vector, and I suppose also memory-based streams
such as std::ostringstream) is that freeing memory tends to be very expensive
on Windows (and a realloc is basically a malloc, memcpy, free sequence). I
believe it has to do with the memory manager wanting to compact the heap
after each free. I once had to deallocate a 300MB (don't ask) 4-dimensional
jagged array of bools (bool****) and it took nearly 5 minutes on a Pentium
III 600MHz.

Contiguous memory should not be much of a problem, though. All we need is
contiguous address space, not actual contiguous physical memory, thanks to
virtual memory. And since the heap is compacted at least every so often, it
shouldn't cause any problems any time soon.
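On Windows that idea can even be made explicit: reserve a large stretch of
address space once, then commit pages only as data arrives. A sketch (error
handling omitted; kMaxBytes is an assumed upper bound, not something from
the thread):

#include <windows.h>
#include <cstring>

const SIZE_T kMaxBytes = 64 * 1024 * 1024;   // assumed worst-case recording

// MEM_RESERVE claims address space only; no physical memory or
// pagefile is consumed until pages are committed.
char* base = (char*)VirtualAlloc(0, kMaxBytes, MEM_RESERVE, PAGE_NOACCESS);
SIZE_T used = 0;

void onWaveData(const char* newdata, SIZE_T len)
{
    // Committing an already-committed page is a no-op, so this simply
    // extends the usable region by whatever new pages are needed.
    VirtualAlloc(base + used, len, MEM_COMMIT, PAGE_READWRITE);
    std::memcpy(base + used, newdata, len);
    used += len;
}

The buffer stays contiguous in address space the whole time, and nothing is
ever copied twice.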
 
