Continuously concatenating binary data

Unforgiven

I have an application where I continuously receive new binary data input in
the form of a char*. The data comes from the Windows Multimedia wave input
functions, but that's not important. What it means is that every 2 seconds I
need to append 22050 bytes to an ever-expanding buffer, and I have no idea at
the start how large that buffer will need to be.

Now there are several possibilities to do this, as I see it:
1. Just make the buffer a void* (or char*), and realloc it every 2 seconds,
copying the new data to the end. This isn't a good idea of course, because
realloc will become very expensive as the buffer grows.
2. Use something like this, with ssBuffer an ostringstream (sketched after
this list):
ssBuffer.write(newdata, 22050);
(operator<< would be wrong here: inserting a char* stops at the first zero
byte, and binary audio data is full of those.)
Then just read out the entire stream at the end.
I don't know how ostringstream manages buffer growth, so this might not be
any better (performance-wise) than the realloc approach.
3. Do the same as above, but with an ofstream. This can handle really huge
input (although I don't expect input to be more than 10-15 seconds of audio
data ever), and should be reasonably efficient since Windows buffers file
I/O, but it does require the user to have write permissions wherever I'm
going to put this file.
4. Copy every 2 seconds of data into its own 'minibuffer', add those to a
std::list, and at the end create a large buffer only once, copying all the
individual pieces into it.
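For illustration, here is a minimal sketch of option 2 with the write() fix
noted above; onWaveData is a made-up name for wherever the wave-input
callback delivers its block:

#include <sstream>
#include <string>

std::ostringstream ssBuffer;

// Hypothetical callback, invoked every 2 seconds with a new block.
void onWaveData(const char* newdata, std::streamsize len)   // len == 22050
{
    // write() copies raw bytes; operator<< on a char* stops at the
    // first zero byte, and audio data is full of zeros.
    ssBuffer.write(newdata, len);
}

void finish()
{
    // One contiguous copy of everything, made once at the end.
    std::string all = ssBuffer.str();
}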

What would be the best approach in your opinions? Or perhaps you have an
even better one that I didn't think of.

Thanks in advance.
 
Victor Bazarov

Unforgiven said:
I have an application where I continuously receive new binary data input in
the form of a char*. The data comes from the Windows Multimedia wave input
functions, but that's not important. What it means is that every 2 seconds I
need to append 22050 bytes to an ever-expanding buffer, and I have no idea at
the start how large that buffer will need to be.

What do you need the buffer for? Do you use it right away? Does
the buffer have to be contiguous during your input?

If not, use a list<your22050bytes>. I suspect that even if you do
need to use the "stream" right away, the list is quick enough for
all your streaming needs.
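For concreteness, a minimal sketch of this (Block, kBlockSize and onWaveData
are made-up names):

#include <cstddef>
#include <cstring>
#include <list>

const std::size_t kBlockSize = 22050;    // one 2-second block

struct Block { char data[kBlockSize]; };

std::list<Block> blocks;   // one node per block; nothing stored ever moves

void onWaveData(const char* newdata)
{
    blocks.push_back(Block());           // O(1), no reallocation ever
    std::memcpy(blocks.back().data, newdata, kBlockSize);
}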

Victor
 
Nitin Rajput

I think having a vector<char> should be good enough. Element access through
a vector should be no more than about twice as slow as raw array access -
they are pretty fast. They would also let you expand as more data comes in.

You can look at the vector allocation strategy - it grows its capacity
geometrically (typically doubling) whenever it runs out of room, so appends
stay cheap on average.
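Concretely, appending each block to a single vector<char> looks like this
(a sketch; the growth factor is implementation-defined, so "doubling" is an
assumption):

#include <cstddef>
#include <vector>

std::vector<char> buffer;

void onWaveData(const char* newdata, std::size_t len)
{
    // Amortized O(1): capacity grows geometrically, so the occasional
    // reallocate-and-copy is spread over many cheap appends.
    buffer.insert(buffer.end(), newdata, newdata + len);
}

// If an upper bound is known (say 15 seconds at 11025 bytes/sec),
// buffer.reserve(15 * 11025) up front avoids reallocation entirely.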

-nitin
 
K_Lee

Just a thought:

If your user has only a small amount of memory, or records a large amount
of data, all your malloc/realloc traffic will turn into swap-disk I/O.

At that point it would be no different from the stream approach. In fact,
a stream gives you better control over the amount of memory your app needs.
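A sketch of that stream approach with an ofstream ("capture.tmp" is a
placeholder path; in practice pick a directory the user can actually write
to, e.g. the system temp directory):

#include <fstream>

std::ofstream out("capture.tmp", std::ios::binary);

void onWaveData(const char* newdata, std::streamsize len)
{
    out.write(newdata, len);   // OS-buffered file I/O; memory use stays flat
}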


--
The source is out there. Browse and document open/share source
projects such as Apache, Tcl, Ethereal, Mozilla, .Net SSCLI.
http://www.slink-software.com

Victor Bazarov said:
Unforgiven said:
I have an application where I continuously receive new binary data input in
the form of a char*. The data comes from the Windows Multimedia wave input
functions, but that's not important. What it means is that every 2 seconds I
need to append 22050 bytes to an ever-expanding buffer, and I have no idea at
the start how large that buffer will need to be.

What do you need the buffer for? Do you use it right away? Does
the buffer have to be contiguous during your input?

If not, use a list<your22050bytes>. I suspect that even if you do
need to use the "stream" right away, the list is quick enough for
all your streaming needs.

Victor
 
lilburne

Nitin said:
I think having a vector<char> should be good enough. Element access through
a vector should be no more than about twice as slow as raw array access -
they are pretty fast. They would also let you expand as more data comes in.

You can look at the vector allocation strategy - it grows its capacity
geometrically (typically doubling) whenever it runs out of room, so appends
stay cheap on average.

A raw char vector is probably not a good idea. As the vector grows, you not
only start moving large amounts of data about, but you also run the risk of
being unable to allocate enough contiguous memory.

Vectors are all right if you know in advance that the number of elements to
be used is reasonably small (a few thousand at most).

A list of vectors, each holding 2 seconds' worth of data, is probably
sufficient in this case (a sketch follows).
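A sketch of that combination (essentially the original option 4):

#include <cstddef>
#include <list>
#include <vector>

std::list< std::vector<char> > chunks;

void onWaveData(const char* newdata, std::size_t len)
{
    chunks.push_back(std::vector<char>(newdata, newdata + len));
}

// One big allocation and one pass of copying, at the very end.
std::vector<char> coalesce()
{
    std::size_t total = 0;
    std::list< std::vector<char> >::const_iterator it;
    for (it = chunks.begin(); it != chunks.end(); ++it)
        total += it->size();

    std::vector<char> whole;
    whole.reserve(total);
    for (it = chunks.begin(); it != chunks.end(); ++it)
        whole.insert(whole.end(), it->begin(), it->end());
    return whole;
}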
 
Unforgiven

lilburne said:
A raw char vector is probably not a good idea. As the vector grows, you not
only start moving large amounts of data about, but you also run the risk of
being unable to allocate enough contiguous memory.

This is one of the reasons I didn't even list a vector as an option.
Doubling the capacity may limit reallocs, but once you get into really big
amounts of data it can waste a *lot* of memory.

Another problem with any approach that uses contiguous memory (which would
include C-style arrays, std::vector, and I suppose also memory-based streams
such as std::ostringstream) is that freeing memory tends to be very expensive
on Windows (and a realloc is basically a malloc, memcpy, free sequence). I
believe it has to do with the memory manager wanting to compact the heap
after each free. I once had to deallocate a 300MB (don't ask) 4-dimensional
jagged array of bools (bool****) and it took nearly 5 minutes on a Pentium
III 600MHz.

Contiguous memory should not be much of a problem, though. All we need is
contiguous address space, not actual contiguous physical memory, thanks to
virtual memory. And since the heap is compacted at least every so often, it
shouldn't cause any problems any time soon.
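On Windows that idea can even be made explicit: reserve a large stretch of
address space once, then commit pages only as data arrives. A sketch (error
handling omitted; kMaxBytes is an assumed upper bound, not something from
the thread):

#include <windows.h>
#include <cstring>

const SIZE_T kMaxBytes = 64 * 1024 * 1024;   // assumed worst-case recording

// MEM_RESERVE claims address space only; no physical memory or
// pagefile is consumed until pages are committed.
char* base = (char*)VirtualAlloc(0, kMaxBytes, MEM_RESERVE, PAGE_NOACCESS);
SIZE_T used = 0;

void onWaveData(const char* newdata, SIZE_T len)
{
    // Committing an already-committed page is a no-op, so this simply
    // extends the usable region by whatever new pages are needed.
    VirtualAlloc(base + used, len, MEM_COMMIT, PAGE_READWRITE);
    std::memcpy(base + used, newdata, len);
    used += len;
}

The buffer stays contiguous in address space the whole time, and nothing is
ever copied twice.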
 
