Library design for downloading an unknown amount of data?

Jef Driesen · Aug 24, 2009

Hi,

Let me give some background info first. I'm writing a library to
download data from a number of external devices (dive computers). All
devices have different transfer protocols, but they can be grouped in
two categories:

1. Devices which support random access to the internal memory.

For this type of device, my library exposes the following function to
read a block of data at a specific memory address:

device_status_t
device_read (device_t *device,
             unsigned int address,
             unsigned char data[],
             unsigned int size);

2. Devices which support sequential access to the internal memory.

These types of devices only support downloading all the data as a single
continuous stream. In some cases the stream is split into multiple
smaller packets, but you can only request the packets sequentially.

Currently I have this function to download the data:

device_status_t
device_dump (device_t *device,
             unsigned char data[],
             unsigned int size,
             unsigned int *result);

To use this function, you need to pass a buffer that is large enough to
store the downloaded data, and you get the actual size in the "result"
parameter after the download.
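For the first category, a complete memory dump can in principle be assembled from repeated block reads. The sketch below assumes a hypothetical callback type `read_block_fn` with the same shape as `device_read()` (returning 0 on success) and a caller-chosen block size; `read_range` and `fake_read` are illustrative names, not part of the library.

```c
#include <string.h>

/* Hypothetical callback with the same shape as device_read();
 * returns 0 on success. */
typedef int (*read_block_fn)(void *dev, unsigned int address,
                             unsigned char data[], unsigned int size);

/* Reads `total` bytes starting at address 0 in blocks of at most `block`
 * bytes (block must be > 0). Random access means each block can be
 * requested independently, in any order. */
int read_range(read_block_fn rd, void *dev, unsigned char out[],
               unsigned int total, unsigned int block)
{
    unsigned int addr;
    for (addr = 0; addr < total; addr += block) {
        unsigned int n = total - addr < block ? total - addr : block;
        if (rd(dev, addr, out + addr, n) != 0)
            return -1;
    }
    return 0;
}

/* Memory-backed fake device, for demonstration only. */
int fake_read(void *dev, unsigned int address, unsigned char data[],
              unsigned int size)
{
    memcpy(data, (const unsigned char *) dev + address, size);
    return 0;
}
```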

Unfortunately for some devices the actual amount of data is not known in
advance and you have to assume a worst case scenario. This is not
exactly optimal, because the actual amount can be much less than the
maximum (e.g. only a few KB downloaded from a device with a few MB of
memory), and the maximum amount of memory varies greatly between
different devices (e.g. ranging from a few KB to a few MB). And to make
it even worse, for some devices the maximum amount of memory is simply
not known.

So I would like to replace this api function with something better. The
question is of course how to do this? I already have a number of ideas,
but none of them is perfect.

One idea would be to turn the "data" and "size" parameters into output
parameters (optionally wrapped in a new opaque buffer object):

device_status_t
device_dump (device_t *device,
             const unsigned char *data[],
             unsigned int *size);

But now you have to decide who owns the data. Either the library or the
application needs to own the data (to be able to free it again).

A. If the library owns the data, the data pointer needs to remain valid
after the function has returned control to the application. Thus the
buffer will have to be stored inside the device handle, and destroyed
with the device handle (or by another function call). The consequences for
the application are:

+ No need to free the returned data.
- The lifetime of the buffer is now tied to the lifetime of the device
handle. If the device handle is destroyed (or another device_* function
is called on the same handle), the internal buffer will (or might) become
invalid. Thus if you want to preserve the data, you'll need to copy it,
but then you end up with two copies!
- The internals of the device object are leaked to the outside.
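A minimal sketch of option A, using hypothetical names (`handle_t`, `handle_dump`, `handle_close`) rather than the library's real API; the `received`/`n` parameters merely stand in for the bytes the protocol layer produced. The handle keeps the downloaded bytes, and every new dump or close frees the previous buffer, which is exactly why a pointer returned earlier can become invalid:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes and handle, not the library's real API. */
typedef enum { STATUS_SUCCESS, STATUS_NOMEMORY } status_t;

typedef struct {
    unsigned char *cache;          /* owned by the handle */
    unsigned int cache_size;
} handle_t;

/* Option A: *data stays valid only until the next dump on the same
 * handle or until handle_close(). */
status_t handle_dump(handle_t *h, const unsigned char *received, unsigned int n,
                     const unsigned char **data, unsigned int *size)
{
    unsigned char *buf = malloc(n ? n : 1);
    if (buf == NULL)
        return STATUS_NOMEMORY;
    memcpy(buf, received, n);
    free(h->cache);                /* invalidates any pointer handed out earlier */
    h->cache = buf;
    h->cache_size = n;
    *data = h->cache;
    *size = h->cache_size;
    return STATUS_SUCCESS;
}

void handle_close(handle_t *h)
{
    free(h->cache);                /* the pointer returned by handle_dump dies here */
    h->cache = NULL;
    h->cache_size = 0;
}
```

An application that wants to keep the data past the next call has to copy it, which is the double-copy drawback listed above.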

B. If the application owns the data, the library can malloc the required
amount of memory and pass that pointer to the application.

+ No lifetime or leak issues.
- The application needs to free the data once it doesn't need it
anymore. If it forgets to do so, you end up with a memory leak.
- Different design compared with some other areas of my api (see below).
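For comparison, a sketch of option B under the same caveats: `dump_owned` and its status codes are hypothetical, and the static array stands in for data received from a device. The library allocates exactly the downloaded size and the caller becomes responsible for `free()`:

```c
#include <stdlib.h>
#include <string.h>

typedef enum { B_SUCCESS, B_NOMEMORY } b_status_t;

/* Hypothetical option-B dump: the library allocates exactly *size bytes
 * and transfers ownership; the caller must free(*data). */
b_status_t dump_owned(unsigned char **data, unsigned int *size)
{
    static const unsigned char received[] = {0x01, 0x02, 0x03}; /* stand-in for device data */
    unsigned char *buf = malloc(sizeof received);
    if (buf == NULL)
        return B_NOMEMORY;
    memcpy(buf, received, sizeof received);
    *data = buf;
    *size = sizeof received;
    return B_SUCCESS;
}
```

Every successful call must be paired with a `free()` in the application, which is the leak risk mentioned in the list.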

I also have a higher-level api that does not download raw memory, but
individual pieces of information (dive profiles). This api is built on
top of the lower-level device_read() and device_dump() functions. It's
an iterator-style api, which is used like this (very similar to a
database cursor):

device_t *device = ...;

device_entry_t *entry = NULL;
while (device_entry_next (device, &entry) == SUCCESS) {
    device_entry_get_data (entry, &data, &size);
    device_entry_get_datetime (entry, &dt);
    device_entry_get_fingerprint (entry, &fp_data, &fp_size);
    device_entry_get_devid (entry, &id);
}

device_entry_reset (device);

In this case I think it makes more sense to use "library owns the data".
There is no real lifetime problem, because device_entry_reset() exists
and can easily be used to destroy the buffer before the device handle is
destroyed. Using "application owns the data" would result in extra data
copying, because internally a single buffer is usually allocated for the
data of all entries. Returning a newly allocated buffer for each entry
would result in many small malloc'ed buffers and additional copying,
which is not optimal either.
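The single-buffer approach can be sketched as follows, assuming a hypothetical iterator (`entry_iter_t`, `entry_next`) that stores an offset and a length for each dive. Handing out an entry is then just pointer arithmetic into the shared buffer, with no per-entry `malloc()` or copy; the pointers remain valid until the library frees that buffer (for example in `device_entry_reset()`):

```c
#include <stddef.h>

/* Hypothetical: all dives live in one buffer owned by the iterator;
 * an entry is just an offset/length pair into it. */
typedef struct {
    const unsigned char *buffer;   /* owned by the library, freed on reset */
    const size_t *offsets;         /* start of each dive */
    const size_t *lengths;         /* size of each dive */
    size_t count;
    size_t next;
} entry_iter_t;

typedef struct {
    const unsigned char *data;     /* points into the shared buffer */
    size_t size;
} entry_t;

/* Returns 1 and fills *e while entries remain, 0 at the end. */
int entry_next(entry_iter_t *it, entry_t *e)
{
    if (it->next >= it->count)
        return 0;
    e->data = it->buffer + it->offsets[it->next];
    e->size = it->lengths[it->next];
    it->next++;
    return 1;
}
```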

Any other ideas or even an alternative design?

Thanks in advance,

Jef

Loïc Domaigné · Aug 25, 2009

Hi Jef,

Let me give some background info first. I'm writing a library to
download data from a number of external devices (dive computers).

[snip]

I don't know if that makes sense, but you could use an in-out parameter
for the size argument:

device_status_t
device_dump (device_t *device,
             unsigned char data[],
             unsigned int *size);

As input parameter, size contains the size of my data[] array, and as
output value the size of the data downloaded. If the array data[] is
too small, you should guarantee that you won't overflow the data
array. The application could then handle such situation by reissuing a
device_dump() as long as there are data to download.
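A sketch of the in-out semantics, assuming a hypothetical `stream_t` that plays the role of the device handle: on entry `*size` is the capacity of `data[]`, on exit it is the number of bytes stored, and `IO_MORE` tells the caller to call again:

```c
#include <string.h>

typedef enum { IO_DONE, IO_MORE } io_status_t;

/* Hypothetical stream state standing in for the device handle. */
typedef struct {
    const unsigned char *src;
    unsigned int total;
    unsigned int pos;
} stream_t;

/* In: *size = capacity of data[]. Out: *size = bytes actually stored.
 * Never writes more than the capacity; returns IO_MORE while data remains. */
io_status_t stream_dump(stream_t *s, unsigned char data[], unsigned int *size)
{
    unsigned int remaining = s->total - s->pos;
    unsigned int n = remaining < *size ? remaining : *size;
    memcpy(data, s->src + s->pos, n);
    s->pos += n;
    *size = n;
    return s->pos < s->total ? IO_MORE : IO_DONE;
}
```

One pitfall of an in-out parameter is that the caller must reset the size variable to the buffer capacity before every call, because each call overwrites it with the byte count.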

Alternatively, you could use two parameters instead of an in-out
parameter for the size:

device_status_t
device_dump (device_t *device,
             unsigned char data[],
             unsigned int size,
             unsigned int *download_size);

HTH,
Loïc
--
My Blog: http://www.domaigne.com/blog

"The most amazing achievement of the computer software industry is its
continuing cancellation of the steady and staggering gains made by the
computer hardware industry." -- Henry Petroski

Jef Driesen · Aug 26, 2009

Loïc Domaigné said:
Hi Jef,

Let me give some background info first. I'm writing a library to
download data from a number of external devices (dive computers).

[snip]

I don't know if that makes sense, but you could use an in-out parameter
for the size argument:

device_status_t
device_dump (device_t *device,
             unsigned char data[],
             unsigned int *size);

As input parameter, size contains the size of my data[] array, and as
output value the size of the data downloaded. If the array data[] is
too small, you should guarantee that you won't overflow the data
array. The application could then handle such situation by reissuing a
device_dump() as long as there are data to download.

Alternatively, you could use two parameters instead of an in-out
parameter for the size:

device_status_t
device_dump (device_t *device,
             unsigned char data[],
             unsigned int size,
             unsigned int *download_size);

This last version is exactly what I have at the moment, except that I
return an error code if the buffer is too small. Of course, I never
overflow the buffer.
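A sketch of this behaviour, with hypothetical names (`copy_out`, `DUMP_TOOSMALL`). Reporting the required size on failure is an assumption added here to show what a retry-based design would need; as the next paragraph explains, a retry means repeating a slow download, so that information is of limited use:

```c
#include <string.h>

typedef enum { DUMP_SUCCESS, DUMP_TOOSMALL } dump_status_t;

/* Hypothetical: `received`/`nreceived` stand in for the downloaded data.
 * On success *result is the byte count; on DUMP_TOOSMALL nothing is
 * written and *result holds the size that would have been needed
 * (an assumption; the post only says an error code is returned). */
dump_status_t copy_out(const unsigned char *received, unsigned int nreceived,
                       unsigned char data[], unsigned int size,
                       unsigned int *result)
{
    *result = nreceived;
    if (nreceived > size)
        return DUMP_TOOSMALL;      /* never overflow the caller's buffer */
    memcpy(data, received, nreceived);
    return DUMP_SUCCESS;
}
```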

What do you mean by "... reissuing a device_dump() ..."?

Do you mean returning the exact size to the application, so it can retry
the download with a buffer of the correct size? The problem with that
approach is that retrying the download is not really an option in my
situation. Downloading is often very slow (e.g. order of minutes) and in
some cases even requires some action by the end user (e.g. pressing a
button on the device). So this is not something you want to do.

If you mean returning chunks of data until the application has retrieved
all the data, that is a different story. I suppose you have something in
mind that resembles the typical file I/O pattern where you keep reading
data until some EOF condition is reached:

unsigned int nbytes;
unsigned char buffer[1024];
while (device_dump (device, buffer, sizeof (buffer), &nbytes) != EOF) {
    /* Process the chunk */
}

A potential problem with this approach is that the buffer size will be
very different from the packet size that is used in the underlying
transfer protocol. Some devices send everything in a single packet,
others use multiple packets, where the packet size can be fixed or
variable. Thus this would require some internal caching, making the
implementation more complex. The simplest implementation is probably to
download everything, cache the data, return it chunk by chunk to the
application and free the cached data once the final chunk is delivered.
(Note that this resembles somewhat one of my candidate solutions where
the application is handed a pointer to the internal cache, but now we
don't have to expose the internal cache and we can also destroy it earlier.)
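A sketch of that simplest implementation, assuming a hypothetical `chunker_t` whose cache already holds the complete download. It returns `CH_EOF` instead of the stdio `EOF` used in the loop above, and frees the cache as soon as the final chunk has been copied out:

```c
#include <stdlib.h>
#include <string.h>

typedef enum { CH_OK, CH_EOF } ch_status_t;

/* Hypothetical handle: the whole download is cached once, then handed
 * out in caller-sized chunks. */
typedef struct {
    unsigned char *cache;
    unsigned int size;
    unsigned int pos;
} chunker_t;

ch_status_t chunker_read(chunker_t *c, unsigned char buf[], unsigned int cap,
                         unsigned int *nbytes)
{
    unsigned int n;
    if (c->cache == NULL || c->pos >= c->size) {
        free(c->cache);                /* free(NULL) is a no-op */
        c->cache = NULL;
        *nbytes = 0;
        return CH_EOF;
    }
    n = c->size - c->pos < cap ? c->size - c->pos : cap;
    memcpy(buf, c->cache + c->pos, n);
    c->pos += n;
    *nbytes = n;
    if (c->pos >= c->size) {           /* final chunk delivered: drop the cache early */
        free(c->cache);
        c->cache = NULL;
    }
    return CH_OK;
}
```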

Downside is that if the application wants the data in a single buffer
(which is very likely), it will have to dynamically increase the buffer,
or use a very large "worst case" buffer. But in that last case, we are
back to the situation that I wanted to fix. That is the area where I
like the solutions in my original post, because there you get the exact
size together with the data, so you can allocate a buffer of the right
size in one shot.

Nick Keighley · Aug 26, 2009

Loïc Domaigné wrote:

as in a device for recording the time and depth of a dive?

<snip>

[try a download and retry on failure now knowing the data size]

Do you mean returning the exact size to the application, so it can retry
the download with a buffer of the correct size? The problem with that
approach is that retrying the download is not really an option in my
situation. Downloading is often very slow (e.g. order of minutes) and in
some cases even requires some action by the end user (e.g. pressing a
button on the device). So this is not something you want to do.

sounds icky. Having grown up on fairly slow comms links (or even
mind-boggling slow comms links), I find that repeating transfers
unnecessarily Just Seems Wrong.

If you mean returning chunks of data until the application has retrieved
all the data, that is a different story. I suppose you have something in
mind that resembles the typical file I/O pattern where you keep reading
data until some EOF condition is reached:

unsigned int nbytes;
unsigned char buffer[1024];
while (device_dump (device, buffer, sizeof (buffer), &nbytes) != EOF) {
    /* Process the chunk */
}

A potential problem with this approach is that the buffer size will be
very different from the packet size that is used in the underlying
transfer protocol. Some devices send everything in a single packet,
others use multiple packets, where the packet size can be fixed or
variable. Thus this would require some internal caching, making the
implementation more complex.

any way of finding out the packet size? Could you make the buffer and
packet size identical, or could you make the buffer size bigger than
any packet size?

Even if you can't do that the book-keeping isn't *that* hard.

The simplest implementation is probably to
download everything, cache the data, return it chunk by chunk to the
application and free the cached data once the final chunk is delivered.
yup

(Note that this resembles somewhat one of my candidate solutions where
the application is handed a pointer to the internal cache, but now we
don't have to expose the internal cache and we can also destroy it earlier.)

Downside is that if the application wants the data in a single buffer
(which is very likely), it will have to dynamically increase the buffer,
or use a very large "worst case" buffer.

or build a linked list of buffers and then glue them all together when
the EOM occurs.
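A sketch of the linked-list idea on the application side, with hypothetical names (`chunk_list_t`, `chunk_list_append`, `chunk_list_glue`). Each chunk is copied into its own node, and at end of message the nodes are concatenated into one exactly-sized buffer, at the cost of one extra copy of the whole download:

```c
#include <stdlib.h>
#include <string.h>

typedef struct node {
    struct node *next;
    size_t len;
    unsigned char data[];          /* C99 flexible array member */
} node_t;

typedef struct {
    node_t *head, *tail;
    size_t total;
} chunk_list_t;

int chunk_list_append(chunk_list_t *l, const unsigned char *p, size_t len)
{
    node_t *n = malloc(sizeof *n + len);
    if (n == NULL)
        return -1;
    n->next = NULL;
    n->len = len;
    memcpy(n->data, p, len);
    if (l->tail) l->tail->next = n; else l->head = n;
    l->tail = n;
    l->total += len;
    return 0;
}

/* Concatenates and frees all nodes; returns NULL on allocation failure
 * (the list is left intact in that case). */
unsigned char *chunk_list_glue(chunk_list_t *l, size_t *size)
{
    unsigned char *out = malloc(l->total ? l->total : 1);
    size_t off = 0;
    node_t *n, *next;
    if (out == NULL)
        return NULL;
    for (n = l->head; n != NULL; n = next) {
        next = n->next;
        memcpy(out + off, n->data, n->len);
        off += n->len;
        free(n);
    }
    *size = l->total;
    l->head = l->tail = NULL;
    l->total = 0;
    return out;
}
```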

But in that last case, we are
back to the situation that I wanted to fix. That is the area where I
like the solutions in my original post, because there you get the exact
size together with the data, so you can allocate a buffer of the right
size in one shot.

the realloc() case isn't too bad. A common strategy is to double the
buffer size (or multiply it by some other constant between 1 and 2) each
time it runs out of memory. If you are really memory impoverished at the
receive end (unlikely, I think), then you could use another realloc() to
trim the buffer when you get to the end.
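A sketch of the doubling strategy, assuming a hypothetical `growbuf_t` and an arbitrary 1024-byte starting capacity. Doubling keeps the number of `realloc()` calls logarithmic in the final size, and `growbuf_trim()` is the optional final shrink:

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *data;
    size_t size, capacity;
} growbuf_t;

int growbuf_append(growbuf_t *b, const unsigned char *p, size_t len)
{
    if (b->size + len > b->capacity) {
        size_t cap = b->capacity ? b->capacity : 1024;   /* arbitrary start */
        unsigned char *tmp;
        while (cap < b->size + len)
            cap *= 2;
        tmp = realloc(b->data, cap);   /* old block stays valid if this fails */
        if (tmp == NULL)
            return -1;
        b->data = tmp;
        b->capacity = cap;
    }
    memcpy(b->data + b->size, p, len);
    b->size += len;
    return 0;
}

/* Shrink to the exact size once the end of the data is reached. */
void growbuf_trim(growbuf_t *b)
{
    if (b->size > 0 && b->size < b->capacity) {
        unsigned char *tmp = realloc(b->data, b->size);
        if (tmp != NULL) {
            b->data = tmp;
            b->capacity = b->size;
        }
    }
}
```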

Jef Driesen · Aug 26, 2009

Nick said:
as in a device for recording the time and depth of a dive?

Indeed.

Some additional information about the library can be found on our
website: http://www.divesoftware.org/libdc/

If you mean returning chunks of data until the application has retrieved
all the data, that is a different story. I suppose you have something in
mind that resembles the typical file I/O pattern where you keep reading
data until some EOF condition is reached:

unsigned int nbytes;
unsigned char buffer[1024];
while (device_dump (device, buffer, sizeof (buffer), &nbytes) != EOF) {
    /* Process the chunk */
}

A potential problem with this approach is that the buffer size will be
very different from the packet size that is used in the underlying
transfer protocol. Some devices send everything in a single packet,
others use multiple packets, where the packet size can be fixed or
variable. Thus this would require some internal caching, making the
implementation more complex.

any way of finding out the packet size? Could you make the buffer and
packet size identical or could you make the buffer size bigger than
any packet size?

For protocols where the data is sent in multiple packets, the packet
size is usually known in advance. In some protocols it is simply fixed,
and for others you can choose it yourself (within some limits), but you
should use the maximum allowable size for maximum speed.

For protocols where everything is sent in a single packet, the first few
bytes usually contain the total length of the packet. So you can only
get the length during the transfer itself.

One of the goals of my project is to hide the protocol details from the
user and provide a common, easy to use api for all supported devices.
Thus the idea is that the user should be able to download data without
having to know about packet sizes, etc. That should be handled
internally by the device backend code.

Right now the device with the highest memory capacity has 2MB of memory.
With the current api, you could allocate a 2MB buffer and that would
work for all supported devices. But that is a lot of overkill for those
devices that only have 32KB of memory. And what if someday there appears
a new device with even more memory?

Even if you can't do that the book-keeping isn't *that* hard.

I didn't say it was too complex, only more complex. Thus I was only
saying that if there is an easier solution that is equally good, I would
prefer that.

or build a linked list of buffers and then glue them all together when
the EOM occurs.

the realloc() case isn't too bad. A common strategy is to double the
buffer size (or multiply it by some other constant between 1 and 2) each
time it runs out of memory. If you are really memory impoverished at the
receive end (unlikely, I think), then you could use another realloc() to
trim the buffer when you get to the end.

I'm not really concerned that an application will run out of memory. The
receiver end is typically a desktop PC, where these amounts of data are
rather small compared to the available memory. But I don't think that is
an excuse for consuming more memory than necessary, especially because
for most devices I do know the total size in advance. But if I want to
support all devices with a single api, I also need to take into account
those few devices where it is not known in advance. And who knows,
somebody might want to run this on a less powerful mobile device.

I'm actually more concerned about the added complexity on the
application side. This linked list or growing buffer is something that
will have to be implemented in every application.

Nick Keighley · Aug 26, 2009

Indeed.

Some additional information about the library can be found on our
website: http://www.divesoftware.org/libdc/

sounds interesting. Multiple manufacturers just adds to the fun!

If you mean returning chunks of data until the application has retrieved
all the data, that is a different story. I suppose you have something in
mind that resembles the typical file I/O pattern where you keep reading
data until some EOF condition is reached:
unsigned int nbytes;
unsigned char buffer[1024];
while (device_dump (device, buffer, sizeof (buffer), &nbytes) != EOF) {
    /* Process the chunk */
}
A potential problem with this approach is that the buffer size will be
very different from the packet size that is used in the underlying
transfer protocol. Some devices send everything in a single packet,
others use multiple packets, where the packet size can be fixed or
variable. Thus this would require some internal caching, making the
implementation more complex.


any way of finding out the packet size? Could you make the buffer and
packet size identical or could you make the buffer size bigger than
any packet size?

For protocols where the data is sent in multiple packets, the packet
size is usually known in advance. In some protocols it is simply fixed,
and for others you can choose it yourself (within some limits), but you
should use the maximum allowable size for maximum speed.

For protocols where everything is sent in a single packet, the first few
bytes usually contain the total length of the packet. So you can only
get the length during the transfer itself.

but at least you know at the beginning of transmission. At that point
you could malloc the correct sized buffer.
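A sketch of reading such a packet under a loudly hypothetical framing: a 4-byte big-endian length prefix followed by the payload (real protocols differ per device). `read_fn` stands in for the transport, and `memsrc_t`/`mem_read` are an in-memory fake used only for demonstration. Once the header has arrived, the payload buffer can be allocated at its exact size in one step:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical transport: fills buf with exactly n bytes, 0 on success. */
typedef int (*read_fn)(void *ctx, unsigned char *buf, size_t n);

/* Assumes a 4-byte big-endian length prefix; returns a malloc'ed payload
 * of exactly *size bytes, or NULL on error. */
unsigned char *read_packet(read_fn rd, void *ctx, size_t *size)
{
    unsigned char hdr[4];
    unsigned char *payload;
    size_t len;
    if (rd(ctx, hdr, sizeof hdr) != 0)
        return NULL;
    len = ((size_t) hdr[0] << 24) | ((size_t) hdr[1] << 16) |
          ((size_t) hdr[2] << 8) | (size_t) hdr[3];
    payload = malloc(len ? len : 1);   /* exact size, allocated once */
    if (payload == NULL)
        return NULL;
    if (rd(ctx, payload, len) != 0) {
        free(payload);
        return NULL;
    }
    *size = len;
    return payload;
}

/* In-memory stand-in for the serial line, for demonstration only. */
typedef struct { const unsigned char *p; size_t left; } memsrc_t;

int mem_read(void *ctx, unsigned char *buf, size_t n)
{
    memsrc_t *m = ctx;
    if (n > m->left)
        return -1;
    memcpy(buf, m->p, n);
    m->p += n;
    m->left -= n;
    return 0;
}
```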

One of the goals of my project is to hide the protocol details from the
user and provide a common, easy to use api for all supported devices.

good idea

Thus the idea is that the user should be able to download data without
having to know about packet sizes, etc. That should be handled
internally by the device backend code.
ok

Right now the device with the highest memory capacity has 2MB of memory.
With the current api, you could allocate a 2MB buffer and that would
work for all supported devices.

couldn't you allocate an amount specific to the device? i.e. have the
library make the decision? So you get an API like so:

Result device_download (Device_context*, Byte** data, size_t *data_size);

But that is a lot of overkill for those
devices that only have 32KB of memory. And what if someday there appears
a new device with even more memory?

hide that detail in the device driver

I didn't say it was too complex, only more complex. Thus I was only
saying that if there is an easier solution that is equally good, I would
prefer that.

I'm not really concerned that an application will run out of memory.

until now I wasn't certain you were running on modern "desktop"
hardware. Since you are, memory isn't *that* important.

The
receiver end is typically a desktop PC, where these amounts of data are
rather small compared to the available memory. But I don't think that is
an excuse for consuming more memory than necessary,

oh I agree, but it becomes *less* critical

especially because
for most devices I do know the total size in advance. But if I want to
support all devices with a single api, I also need to take into account
those few devices where it is not know in advance. And who knows
somebody will want to run this on a less powerful mobile device.
ah

I'm actually more concerned about the added complexity on the
application side. This linked list or growing buffer is something that
will have to be implemented in every application

I'd kind of assumed you wrote that once and hid it in a library.

application
device specific code
memory manager

Depends how clever you want to be. You could have one memory
management strategy or you could select one. Your device context could
specify which.

thinking out loud...

typedef unsigned char Byte;

typedef struct
{
    size_t initial_buffer_size;
    double buffer_growth_multiplier;
    Byte* (*grow_memory) (double);
    Byte* (*eom_action) (Byte*);
} Device_context;

gets tricky but not undoable to handle realloc() and linked list
memory management. Depends how clever you want to be.
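One way to read the "select a memory strategy" idea, simplified to a single growth-policy function plus an initial size. All names (`mem_strategy_t`, `STRATEGY_DOUBLING`, `STRATEGY_EXACT`, `strategy_reserve`) are hypothetical, and this is only one possible interpretation of the struct above:

```c
#include <stdlib.h>

typedef struct {
    size_t initial_size;
    size_t (*next_capacity)(size_t current, size_t needed);
} mem_strategy_t;

static size_t grow_double(size_t cur, size_t needed)
{
    while (cur < needed)           /* cur is never 0 here, see strategy_reserve */
        cur *= 2;
    return cur;
}

static size_t grow_exact(size_t cur, size_t needed)
{
    (void) cur;
    return needed;
}

const mem_strategy_t STRATEGY_DOUBLING = {1024, grow_double};
const mem_strategy_t STRATEGY_EXACT = {0, grow_exact};

/* Grows *buf to hold at least `needed` bytes according to the strategy. */
int strategy_reserve(const mem_strategy_t *s, unsigned char **buf,
                     size_t *cap, size_t needed)
{
    size_t start = *cap, newcap;
    unsigned char *tmp;
    if (needed <= *cap)
        return 0;
    if (start == 0)
        start = s->initial_size ? s->initial_size : needed;
    newcap = s->next_capacity(start, needed);
    tmp = realloc(*buf, newcap);
    if (tmp == NULL)
        return -1;
    *buf = tmp;
    *cap = newcap;
    return 0;
}
```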

Nobody · Aug 26, 2009

One of the goals of my project is to hide the protocol details from the
user and provide a common, easy to use api for all supported devices.
Thus the idea is that the user should be able to download data without
having to know about packet sizes, etc. That should be handled
internally by the device backend code.

But what do you want to do with the data? That should be the main
influence on the API you provide, not the mechanisms for getting at the
data. Work from the top down rather than bottom up.

In general, the more you specify, the less flexibility you have for
implementation. Don't specify anything the application doesn't need to
have specified.

E.g. if a function returns a pointer to an array of structures, that means
that the back-end has to provide that data as a contiguous array in a
specific format. OTOH, if you provide an abstract handle along with
iteration and/or indexing operations, you have more flexibility in
implementing each backend.
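A sketch of the abstract-handle style, with hypothetical names (`dive_list_t`, `dive_list_count`, `dive_list_get`). The public part only forward-declares the type; this particular backend keeps each dive in its own block, and a backend using one contiguous buffer could implement the same two functions without any change for the application:

```c
#include <stddef.h>

/* --- public header (hypothetical): the layout stays hidden --- */
typedef struct dive_list dive_list_t;
size_t dive_list_count(const dive_list_t *l);
int dive_list_get(const dive_list_t *l, size_t i,
                  const unsigned char **data, size_t *size);

/* --- one possible backend: one block per dive --- */
struct dive_list {
    unsigned char **dives;
    size_t *sizes;
    size_t count;
};

size_t dive_list_count(const dive_list_t *l)
{
    return l->count;
}

/* Returns 0 and a pointer into backend-owned memory, or -1 if out of range. */
int dive_list_get(const dive_list_t *l, size_t i,
                  const unsigned char **data, size_t *size)
{
    if (i >= l->count)
        return -1;
    *data = l->dives[i];
    *size = l->sizes[i];
    return 0;
}
```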

Jef Driesen · Aug 26, 2009

Nick said:
sounds interesting. Multiple manufacturers just adds to the fun!

If it wasn't interesting, I wouldn't have started this project

but at least you know at the beginning of transmission. At that point
you could malloc the correct sized buffer.

Indeed, and that's also what I'm currently doing.

The point is how to pass the data back to the application, with an api
that is easy to use (e.g. without having to go through many pages of
documentation) and at the same time also efficient (e.g. no unnecessary
memory copies etc). That's what I would like to solve.

couldn't you allocate an amount specific to the device? i.e. have the
library make the decision? So you get an API like so:

Result device_download (Device_context*, Byte** data, size_t *data_size);

That's one of the suggestions I made in my original post. With this
approach, the backend would allocate a buffer of the right size and pass
it back to the application by means of the two output parameters. The
backend either knows the total size in advance, or gets to know it
during the transfer.

(The alternative with this api is to store the downloaded data into the
device handle itself and destroy it together with the device handle. But
then the lifetime of the returned pointer is tied to the lifetime of the
device handle.)

Downside is that the application needs to free the data, or you end up
with a memory leak. I know this is really the responsibility of the
application writer, but it feels "wrong" when memory that is allocated in
one place (inside the library) needs to be freed somewhere else (in the
application). If the malloc is explicit, you immediately know that at
some point you need to free that memory again, but here it would be
hidden.

But the biggest disadvantage is that this api would be inconsistent with
the rest of my api. It would be the only place where the returned data
needs to be destroyed by the application.

This other api also downloads data, but not the entire memory contents.
Only the dive profiles (most dive computers also have lots of other
data) because that's what the user is interested in. To reduce download
time, we try to only download new dives based on the fingerprint of the
last downloaded dive. Like I mentioned in my original post, this is an
iterator style api, that downloads dives until all dives are downloaded
(e.g. like an EOF). Each call returns a single dive entry object, which
is basically also a binary blob, but with some additional metadata that
is required for decoding the data, but which is not present in the main
data itself. For instance model and serial numbers, date/time of the
dive, the fingerprint data, and optionally some backend specific data.

When the application wants to access the dive data or the fingerprint
data it gets a pointer to this data. Since this data is already owned by
the dive entry object, it would be a little silly to return a copy of
this data.

(Currently, the dive entry object itself would be stored in the device
handle, so the object is owned by the library and the application
doesn't need to free it. But that doesn't change anything for the dive
or fingerprint data inside.)

hide that detail in the device driver

I assume you are referring to the above, where the memory is allocated
inside the library?

I'd kind of assumed you wrote that once and hid it in a library.

application
device specific code
memory manager

Depends how clever you want to be. You could have one memory
management strategy or you could select one. Your device context could
specify which.

thinking out loud...

typedef unsigned char Byte;

typedef struct
{
    size_t initial_buffer_size;
    double buffer_growth_multiplier;
    Byte* (*grow_memory) (double);
    Byte* (*eom_action) (Byte*);
} Device_context;

gets tricky but not undoable to handle realloc() and linked list
memory management. Depends how clever you want to be.

If memory is allocated inside the library, the realloc() or linked list
isn't necessary. Inside the backend we do know the exact size and can
allocate the right amount of memory straight away. It's only when we
want to store the data in an application buffer that things get more
complicated.

Ben Bacarisse · Aug 26, 2009

Jef Driesen said:
The point is how to pass the data back to the application, with an api
that is easy to use (e.g. without having to go through many pages of
documentation) and at the same time also efficient (e.g. no
unnecessary memory copies etc). That's what I would like to solve.

[Apologies if I have missed some context that renders this suggestion
pointless -- I have not been following this thread properly.]

Have you considered a call-back API? The "user" passes in a function
that is called for every chunk of data. The user then has more
control: they can allocate and copy, copy to a pre-existing buffer,
or simply do something with the data with no copying at all.

It is more complex but it can sometimes solve more problems.

Two things to bear in mind. This works best if the chunks can be
meaningful to the application -- so I would try to avoid presenting
the call-back with incomplete messages for example. Always provide
some way to get user-supplied data included in the call-back. A void *
is often all you need. The user includes it when registering a
call-back and it is passed to the called function along with the "real"
data.

I know this does not answer any of your questions about data
allocation and ownership, but you seem to have a handle on what the
problems are with that.
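A minimal sketch of a per-chunk callback with a `void *userdata` pass-through, using hypothetical names (`chunk_cb`, `deliver_chunks`, `count_bytes`). Each complete packet is delivered as one chunk, and the example callback only counts bytes, showing that the application need not copy anything:

```c
#include <stddef.h>

/* Hypothetical per-chunk callback; `userdata` is passed through untouched.
 * A non-zero return aborts the transfer. */
typedef int (*chunk_cb)(const unsigned char *data, size_t size, void *userdata);

/* Library side (sketch): deliver each complete packet as one chunk. */
int deliver_chunks(const unsigned char *const *packets, const size_t *sizes,
                   size_t npackets, chunk_cb cb, void *userdata)
{
    size_t i;
    for (i = 0; i < npackets; i++)
        if (cb(packets[i], sizes[i], userdata) != 0)
            return -1;
    return 0;
}

/* Application side: one possible callback, counting bytes without copying. */
int count_bytes(const unsigned char *data, size_t size, void *userdata)
{
    (void) data;
    *(size_t *) userdata += size;
    return 0;
}
```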

Jef Driesen · Aug 26, 2009

Nobody said:
But what do you want to do with the data? That should be the main
influence on the API you provide, not the mechanisms for getting at the
data. Work from the top down rather than bottom up.

In general, the more you specify, the less flexibility you have for
implementation. Don't specify anything the application doesn't need to
have specified.

E.g. if a function returns a pointer to an array of structures, that means
that the back-end has to provide that data as a contiguous array in a
specific format. OTOH, if you provide an abstract handle along with
iteration and/or indexing operations, you have more flexibility in
implementing each backend.

The library contains two major layers.

The first one is the device layer, which itself consists of a protocol
layer and a memory layout layer. The protocol layer is all about the
low-level data transfer (packets, timings, etc). The memory layout is
the information that describes where the dives are stored in the memory.
This leads to two different api's in the library:

1. An api for downloading the entire memory (using only the protocol layer).

2. An api for downloading individual dives (using both the protocol and
memory layout).

Both api's return binary data, and its structure is entirely defined by
the manufacturer of the device. For the memory dump it's a single blob
of binary data. This data is typically used for diagnostic
purposes. When the other api fails for some reason, the end user can send
us a memory dump so we can try to reproduce the problem, without the
need for the device itself. So an application only needs to save this
data. It could also try to parse the memory dump, but the other api is
intended for that purpose.

A single dive is also a blob of binary data, but with some additional
metadata that is required for parsing that data, such as the current time
at the moment of the download, model/serial number, firmware revision, etc.

The second layer is the parser layer that contains the information to
extract the interesting information (depths, pressures, etc) from the
binary dive data.

The typical usage is that an application opens a device handle and
downloads some dives. Next, it creates a parser for that type of device
and processes each dive with it. The data that comes out of the parser
is then displayed in the application, stored in its database, etc.

Jef Driesen · Aug 27, 2009

Ben said:
Jef Driesen said:

The point is how to pass the data back to the application, with an api
that is easy to use (e.g. without having to go through many pages of
documentation) and at the same time also efficient (e.g. no
unnecessary memory copies etc). That's what I would like to solve.

[Apologies if I have missed some context that renders this suggestion
pointless -- I have not been following this thread properly.]

That's fine. All feedback is appreciated here.

Have you considered a call-back API? The "user" passes in a function
that is called for every chunk of data. The user then has more
control: they can allocate and copy, copy to a pre-existing buffer,
or simply do something with the data with no copying at all.

It is more complex but it can sometimes solve more problems.

Two things to bear in mind. This works best if the chunks can be
meaningful to the application -- so I would try to avoid presenting
the call-back with incomplete messages for example. Always provide
some way to get user-supplied data included in the call-back. A void *
is often all you need. The user includes it when registering a
call-back and it is passed to the called function along with the "real"
data.

I know this does not answer any of your questions about data
allocation and ownership, but you seem to have a handle on what the
problems are with that.

I do use callbacks in my api. For instance for reporting progress and
other notifications. In my opinion, callbacks are almost perfect for
this kind of task. But using them as a mechanism to return data to the
application feels wrong.

Let's have a look at my device_dump() function. Its purpose is to
download a blob of binary data and return that back to the application.
There is no need to return partial chunks. Progress notifications are
handled separately. Thus the callback function would only be called
once, right before the function returns control back to the caller.

typedef void (*callback_t) (const unsigned char *data, unsigned int size,
                            void *userdata);

device_status_t
device_dump (device_t *device,
             callback_t callback,
             void *userdata)
{
    unsigned char buffer[SOMESIZE]; /* Or malloc'ed of course. */

    /* Do some work here. */

    if (callback)
        callback (buffer, sizeof (buffer), userdata);

    return SUCCESS;
}

That does work of course, but it also scatters the code over at least
two places.

#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *data;
    unsigned int size;
} mydata_t;

static void
mycallback (const unsigned char *data,
            unsigned int size,
            void *userdata)
{
    mydata_t *mydata = (mydata_t *) userdata;

    mydata->data = malloc (size);
    if (mydata->data == NULL) {
        mydata->size = 0;          /* allocation failed: report no data */
        return;
    }
    mydata->size = size;

    memcpy (mydata->data, data, size);
}

int
main (int argc, char *argv[])
{
    mydata_t mydata = {NULL, 0};   /* stays empty if the callback never runs */
    device_t *device = ...;

    device_dump (device, mycallback, &mydata);

    free (mydata.data);

    return 0;
}

Ben Bacarisse · Aug 27, 2009

Jef Driesen said:
Ben Bacarisse wrote:

I do use callbacks in my api. For instance for reporting progress and
other notifications. In my opinion, callbacks are almost perfect for
this kind of task. But using them as a mechanism to return data to the
application feels wrong.

Let's have a look at my device_dump() function. Its purpose is to
download a blob of binary data and return that back to the
application. There is no need to return partial chunks. Progress
notifications are handled separately. Thus the callback function would
only be called once, right before the function returns control back to
the caller.

Yes, almost pointless in this case. I thought you also had a
chunk-by-chunk interface as well. Even so it may not be worth it. My
main purpose was to alert you to the option, and you are obviously
aware of it!
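To illustrate where a chunk-by-chunk callback does pay off, here is a minimal sketch, assuming the library invokes the callback once per packet as the packets arrive. The application can then stream each chunk straight to a file and never needs to know the total size in advance. The names sink_t, write_chunk() and fake_dump() are hypothetical; fake_dump() merely simulates a sequential-access device and is not part of any real library.

```c
#include <stdio.h>

/* Same callback signature as used earlier in the thread. */
typedef void (*callback_t) (const unsigned char *data, unsigned int size,
                            void *userdata);

/* Hypothetical application state: appends each chunk to a FILE. */
typedef struct {
    FILE *fp;
    unsigned long total;  /* Bytes written so far. */
    int failed;           /* Set if a write failed. */
} sink_t;

static void
write_chunk (const unsigned char *data, unsigned int size, void *userdata)
{
    sink_t *sink = (sink_t *) userdata;

    if (sink->failed)
        return;
    if (fwrite (data, 1, size, sink->fp) != size) {
        sink->failed = 1;
        return;
    }
    sink->total += size;
}

/* Hypothetical stand-in for a sequential download: delivers 'total'
   bytes (byte i has value i % 256) in packets of at most 'packet' bytes,
   calling the callback once per packet. */
static int
fake_dump (unsigned int total, unsigned int packet,
           callback_t callback, void *userdata)
{
    unsigned char buffer[256];
    unsigned int offset = 0;

    if (packet == 0 || packet > sizeof (buffer))
        return -1;

    while (offset < total) {
        unsigned int n = total - offset < packet ? total - offset : packet;
        unsigned int i;

        for (i = 0; i < n; ++i)
            buffer[i] = (unsigned char) ((offset + i) % 256);
        if (callback)
            callback (buffer, n, userdata);
        offset += n;
    }
    return 0;
}
```

The trade-off is the one Jef describes: the code is split between the callback and its caller, but memory use no longer depends on the (possibly unknown) size of the device memory.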

<snip>

Jef Driesen · Sep 2, 2009

Ben said:
Yes, almost pointless in this case. I thought you also had a
chunk-by-chunk interface as well. Even so it may not be worth it. My
main purpose was to alert you to the option, and you are obviously
aware of it!

The chunk-by-chunk interface was a suggestion by one of the other
posters. Just like each of my own candidate apis, it has a number of pros
and cons, some more significant than others, and I want to explore the
possibilities before picking the final design (because it will be
pretty hard to change afterwards).

One of the main reasons for my post is to get some feedback from people
not directly related to my project, which could result in fresh ideas or
enhancements that I didn't think of myself.

Nick Keighley · Sep 2, 2009

The chunk-by-chunk interface was a suggestion by one of the other
posters. Just like each of my own candidate apis, it has a number of pros
and cons, some more significant than others, and I want to explore the
possibilities before picking the final design (because it will be
pretty hard to change afterwards).

I'd argue that if you hide the implementation behind some abstract library,
then you *can* change it afterwards: the Open/Closed Principle.

Jef Driesen · Sep 2, 2009

Nick said:
I'd argue that if you hide the implementation behind some abstract library,
then you *can* change it afterwards: the Open/Closed Principle.

I try to adhere to that principle in my design as much as possible. For
instance my device handles are fully opaque objects that can only be
manipulated through the public api and thus hide the actual
implementation from the user.

But that isn't applicable everywhere. Take the variant of my
device_dump() function where the data is returned to the caller through
output parameters:

device_status_t
device_dump (device_t *device,
const unsigned char *data[],
unsigned int *size);

With this api, I'll have to specify who owns the data (i.e. the library or
the application), and thus who needs to destroy it when done. But once I
pick one option, I can't change it anymore without breaking applications.
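One way to keep ownership unambiguous, along the lines of the opaque buffer object mentioned at the start of the thread, is to let the application create and destroy a buffer object that the library only fills. A minimal sketch follows, assuming a signature like device_dump (device_t *device, buffer_t *buffer); buffer_t, buffer_new(), buffer_append(), buffer_get_data(), buffer_get_size() and buffer_free() are hypothetical names, not an existing API. Because the struct layout stays private, the growth strategy could change later without breaking applications.

```c
#include <stdlib.h>
#include <string.h>

/* In a real library only "typedef struct buffer_t buffer_t;" and the
   prototypes would be in the public header; the definition below would
   stay private, so its layout could change without breaking callers. */
typedef struct buffer_t {
    unsigned char *data;
    size_t size;      /* Bytes stored. */
    size_t capacity;  /* Bytes allocated. */
} buffer_t;

/* Called by the application before the download. */
buffer_t *
buffer_new (void)
{
    return calloc (1, sizeof (buffer_t));
}

/* Called by the application when done; the library never frees it. */
void
buffer_free (buffer_t *buffer)
{
    if (buffer == NULL)
        return;
    free (buffer->data);
    free (buffer);
}

/* Called by the library while downloading; returns 0 on success. */
int
buffer_append (buffer_t *buffer, const unsigned char *data, size_t size)
{
    if (size == 0)
        return 0;

    if (buffer->size + size > buffer->capacity) {
        size_t capacity = buffer->capacity ? buffer->capacity : 1024;
        unsigned char *tmp;

        /* Grow geometrically so repeated appends stay cheap. */
        while (capacity < buffer->size + size)
            capacity *= 2;
        tmp = realloc (buffer->data, capacity);
        if (tmp == NULL)
            return -1;
        buffer->data = tmp;
        buffer->capacity = capacity;
    }

    memcpy (buffer->data + buffer->size, data, size);
    buffer->size += size;
    return 0;
}

const unsigned char *
buffer_get_data (const buffer_t *buffer)
{
    return buffer->data;
}

size_t
buffer_get_size (const buffer_t *buffer)
{
    return buffer->size;
}
```

The ownership rule then becomes simple to document: whoever calls buffer_new() calls buffer_free(), however the library manages the memory inside.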

chad · Sep 2, 2009

Hi Jef,

Let me give some background info first. I'm writing a library to
download data from a number of external devices (dive computers).


[snip]

I don't know if that makes sense, but you could use an in/out parameter
for the size argument:

device_status_t
device_dump (device_t *device,
unsigned char data[],
unsigned int *size);

As an input parameter, size contains the size of my data[] array, and as
an output parameter, the size of the data downloaded. If the data[] array
is too small, you should guarantee that you won't overflow it. The
application could then handle that situation by calling device_dump()
again as long as there is data left to download.
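A minimal sketch of this in/out variant, assuming a hypothetical DEVICE_STATUS_MORE return code that tells the caller more data remains, and a fake device_t that copies from an in-memory array instead of talking to real hardware. Neither the status codes nor the struct come from the actual library.

```c
#include <string.h>

/* Hypothetical status codes; DEVICE_STATUS_MORE means "call again". */
typedef enum {
    DEVICE_STATUS_SUCCESS,
    DEVICE_STATUS_MORE,
    DEVICE_STATUS_ERROR
} device_status_t;

/* Hypothetical fake device holding 'total' bytes of memory. */
typedef struct {
    const unsigned char *memory;
    unsigned int total;
    unsigned int offset;  /* How much has been downloaded already. */
} device_t;

/* In: *size is the capacity of data[]. Out: *size is the number of bytes
   actually stored. Never writes more than the capacity. */
device_status_t
device_dump (device_t *device, unsigned char data[], unsigned int *size)
{
    unsigned int remaining, n;

    if (device == NULL || data == NULL || size == NULL)
        return DEVICE_STATUS_ERROR;

    remaining = device->total - device->offset;
    n = remaining < *size ? remaining : *size;
    memcpy (data, device->memory + device->offset, n);
    device->offset += n;
    *size = n;  /* Overwrites the capacity passed in. */

    return device->offset < device->total ? DEVICE_STATUS_MORE
                                          : DEVICE_STATUS_SUCCESS;
}
```

In a download loop the caller must reset size to the buffer capacity before every call, because the previous call overwrote it with the byte count.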

You could alternatively use two parameters instead of an in/out
parameter for the size:

device_status_t
device_dump (device_t *device,
unsigned char data[],
unsigned int size,
unsigned int *download_size
);

Is there a particular advantage of using one over the other in this
case?

Jef Driesen · Sep 3, 2009

chad said:
Hi Jef,

Let me give some background info first. I'm writing a library to
download data from a number of external devices (dive computers).


[snip]

I don't know if that makes sense, but you could use an in/out parameter
for the size argument:

device_status_t
device_dump (device_t *device,
unsigned char data[],
unsigned int *size);

As an input parameter, size contains the size of my data[] array, and as
an output parameter, the size of the data downloaded. If the data[] array
is too small, you should guarantee that you won't overflow it. The
application could then handle that situation by calling device_dump()
again as long as there is data left to download.

You could alternatively use two parameters instead of an in/out
parameter for the size:

device_status_t
device_dump (device_t *device,
unsigned char data[],
unsigned int size,
unsigned int *download_size
);


Is there a particular advantage of using one over the other in this
case?

It's mainly a matter of personal preference, I think.

I prefer the second form, because the size parameter remains unchanged
when the function returns. The initial value is never overwritten with
the result, and you can also pass constants.
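For comparison, a sketch of the two-parameter form under similar assumptions (a hypothetical fake device_t and hypothetical status codes, not the real library). Because size is input-only, a constant such as sizeof chunk can be passed directly on every call, and nothing needs to be reset between calls.

```c
#include <string.h>

/* Hypothetical status codes; DEVICE_STATUS_MORE means "call again". */
typedef enum { DEVICE_STATUS_SUCCESS, DEVICE_STATUS_MORE } device_status_t;

/* Hypothetical fake device, standing in for real hardware. */
typedef struct {
    const unsigned char *memory;
    unsigned int total;
    unsigned int offset;  /* How much has been downloaded already. */
} device_t;

/* size is input-only; the byte count goes to the separate *result. */
device_status_t
device_dump (device_t *device, unsigned char data[], unsigned int size,
             unsigned int *result)
{
    unsigned int remaining = device->total - device->offset;
    unsigned int n = remaining < size ? remaining : size;

    memcpy (data, device->memory + device->offset, n);
    device->offset += n;
    if (result)
        *result = n;

    return device->offset < device->total ? DEVICE_STATUS_MORE
                                          : DEVICE_STATUS_SUCCESS;
}
```

A caller would simply repeat status = device_dump (&device, chunk, sizeof chunk, &result); until the status is no longer DEVICE_STATUS_MORE.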
