Maps, filters and accumulators

  • Thread starter ballpointpenthief
  • Start date
B

ballpointpenthief

How good is the C language for implementing maps, filters and
accumulators?

SCHEME programmers would be able to pass procedures as arguments quite
easily, whereas C programmers would - I guess - have use variadic
function pointers which is frankly disgusting, and I'm not even sure I
want to sit down and hack at the language in that way.

Your thoughts? Any libraries you could refer me to?

Cheers, Matt
 
D

Dave Vandervies

How good is the C language for implementing maps, filters and
accumulators?

SCHEME programmers would be able to pass procedures as arguments quite
easily, whereas C programmers would - I guess - have use variadic
function pointers which is frankly disgusting, and I'm not even sure I
want to sit down and hack at the language in that way.

Your thoughts? Any libraries you could refer me to?

C programmers can pass pointers to functions around with (almost) the
same ease that Scheme programmers can pass the functions themselves,
and can use them in much the same way.
C doesn't have anonymous functions or closures, and you may miss those,
but anonymous functions are "only" a notational convenience (if sometimes
a major one!) and closures can be faked, if you control the interfaces
along the entire chain of callers that ends up invoking the function,
by passing a pointer to a struct containing anything that needs to last
longer than a single invocation of the function.

They don't have to be variadic functions (and trying to do it that way
would make it unnecessarily much uglier), but you do have to be more
careful with your data structures. Scheme has linked lists built into the
language, which makes dealing with lists of values Rather Easier. In C
you could build your own linked lists, but it's often more natural to use
arrays for the sorts of things you'd use a list for in a lispy language.

Here's some examples to (hopefully) get you started. Don't let the
verbiage scare you off; most of it is a consequence of the fact that C
has static typing and requires everything to be declared before you use
it, and becomes fairly simple once you understand it.
It may be more useful to start at the bottom and work up the first time
you read this, since the earlier stuff is mostly just setting up for
the interesting stuff at the end.

Standard disclaimers apply: Not compiled or tested. For explanatory
purposes only. You will have no trouble finding intelligent people who
disagree with everything I say (though hopefully it's not too hard to
find intelligent people who agree with me either). Don't trust anybody
who's posting to usenet on a Friday night.
--------
/*********************************************************************
* Some typedefs for what we're working with. *
* In real code I would probably not bother with these, but it makes *
* things clearer for demonstration purposes (especially since *
* I'm using the same type for counters and other bookkeeping as *
* for the actual data). *
*********************************************************************/

/*This is the type of the actual data elements we're working with.*/
typedef int my_data_type;


/*C doesn't have a native boolean type. Doing something like this often
doesn't make things any clearer and can invite horrible abuse, but for
demonstration purposes in short chunks it makes the intent clearer.
*/
typedef int boolean;


/*Typedefing function pointers makes things A LOT easier to read if you're
still working on wrapping your head around them, but learning to read
function pointer declarations without the typedefs is worth your time
if you'll be working with them regularly.
*/

/*trans_func_c is a pointer to a function that takes a my_data_type
and returns another my_data_type (useful for map).
(trans=="transform", _c suffix means it has a cookie argument)
The "cookie" argument is used to pass information to the function
(faking a closure). If the function doesn't need any external
information, it can be ignored.
*/
typedef my_data_type (*trans_func_c)(my_data_type in,void *cookie);

/*pred_func is a pointer to a predicate function on some value
(useful for filter)
*/
typedef boolean (*pred_func)(my_data_type in);

/*binary_func is a pointer to a binary operator on values
(useful for fold)
*/
typedef my_data_type (*binary_func)(my_data_type left,my_data_type right);



/*************************************************************************
* These are the functions that "really" do all the work. Once you get *
* the interfaces right (which will probably take a few tries to get *
* everything you need but not so much you forget how to use them) *
* these only have to be written once. *
*************************************************************************/
/*All of these use arrays for the lists they work on. Caller is
responsible for managing all memory.
*/
/*Note that these have to be written for every type you plan to use them
with (and, for ones that can sensibly work with multiple types, for
every combination of types).
C++ templates may make this easier if you want to work with several
different data types, but for a small number of types doing it this
way will work fine. (These are simple and specialized enough that
you could probably write a code generator to produce C code for the
types you want if you don't want to use C++ just to get templates.)
Even with templates the types still need to be statically known at
compile time. Implementing true dynamic typing is probably
sufficiently ugly that you'd be better off finding an embeddable
Scheme interpreter.
*/

/*Map: dest=func(src) for all 0<=i<num_in*/
void map_c(const my_data_type *src, my_data_type *dest, int num_in,
trans_func_c func, void *cookie)
{
int i;
for(i=0;i<num_in;i++)
dest=(*func)(src,cookie);
}


/*Filter: dest[] is populated with the elements of src[] for which
pred() returns true, in the order in which they appear in src[].
Returns the number of elements copied to dest[].
*/
int filter(const my_data_type *src,my_data_type *dest,int num_in,
pred_func pred)
{
int i;
int num_out=0;
for(i=0;i<num_in;i++)
if((*pred)(src))
dest[num_out++]=src;
return num_out;
}


/*Fold: Apply op to accumulator and src for each i in order.
Return final value of accumulator.
*/
my_data_type fold(const my_data_type *src,int num_in,
binary_func op,my_data_type acc)
{
int i;
for(i=0;i<num_in;i++)
acc=op(acc,src);
return acc;
}


/************************************************************************
* Everything above this can be re-used (but you probably want to write *
* it longhand a few times to make sure you understand what's going *
* on and have enough-but-not-too-much generality in the interface). *
* Next comes the functions we're giving to map, filter, and fold. *
************************************************************************/

/*An adder. The cookie argument needs to be a pointer to the value
we want to add.
This is a function we can point a trans_func_c at.
*/
my_data_type add(my_data_type val,void *vcookie)
{
my_data_type *cookie=vcookie;
return val + *cookie;
}

/*Is a value even?
This is a function we can point a pred_func at.
*/
boolean is_even(my_data_type val)
{
return (val%2)==0;
}

/*Add two values.
This is a function we can point a binary_func at.
*/
my_data_type plus(my_data_type left,my_data_type right)
{
return left+right;
}


/********************************************************
* Now that we've set everything up, let's show it off. *
********************************************************/

#include <stdio.h>

void print_array(const my_data_type *arr,int num)
{
int i;
for(i=0;i<num;i++)
printf("%d ",arr);
printf("\n");
}

int main(void)
{
const my_data_type src[]={0,1,2,3,4,5,6,7,8,9,10};
const int num_src=sizeof src / sizeof *src;
my_data_type dest[sizeof src / sizeof *src];
int num_dest;
my_data_type to_add;

printf("src[]: ");
print_array(src,num_src);

/*Add 17 to everything*/
to_add=17;
map_c(src,dest,num_src,add,&to_add);
num_dest=num_src; /*map() doesn't change or report size*/
printf("After map(): ");
print_array(dest,num_dest);

/*Filter for the even ones*/
num_dest=filter(src,dest,num_src,is_even);
printf("After filter(): ");
print_array(dest,num_dest);

/*Fold addition over entire list*/
printf("Sum of all values: %d\n",fold(src,num_src,plus,0));
/*Fold addition over filtered list*/
printf("Sum of even values: %d\n",fold(dest,num_dest,plus,0));

return 0;
}
 
B

ballpointpenthief

[Thanks Dave Vandervies, that was the most useful piece of information
I've ever read on USENET.]

It's still not possible to play with an arbritrary number of
base_types? It might be easy to set everything up to make it easily
extendable.
That was the most useful piece of information I've ever read on USENET.

I've studied your answer, and I now understand one way of setting this
up. I've chucked some simple code below.

One thing: What if we needed to play with an arbritrary number of
base_types? There seems to be no 'means of combination' (which I have
learnt is important off of those online MIT lectures.)

It's probably easy to guess that I've got zero real-world programming
experience.
Maybe problems like that aren't so common outside of language design.

Here's proof that I read the code:
=================================
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <stdbool.h>

struct record
{
int value;
char *data;
};

typedef struct record basic_type;

/*******************************/

void map(const basic_type *src, basic_type *dst, void *ext_var,
basic_type (*trans_func)(basic_type *, void *), size_t arr_len)
{
for (int i=0; i<arr_len; i++)
{
dst = trans_func(&src, ext_var);
}
}

size_t filter(const basic_type *src, basic_type *dst, void *ext_var,
_Bool (*pred_func)(basic_type *, void *), size_t arr_len)
{
basic_type *node_in = src;
basic_type *node_out = dst;
size_t count = 0;
for (int i=0; i<arr_len; i++)
{
if (pred_func(src, ext_var))
{
*dst++ = *src++;
count++;
}
else src++;
}
return count;
}
/****************************************************************/
/************Transformation functions****************************/
/****************************************************************/

basic_type trans_one(basic_type *in, void *ext_var)
{
return *in;
}

basic_type trans_two(basic_type *in, void *ext_var)
{
return *in;
}

/******************************************************************/
/**************Predicate functions*********************************/
/******************************************************************/

_Bool pred_one(basic_type *in, void *ext_var)
{

return true;
}

_Bool pred_two(basic_type *in, void *ext_var)
{
return false;
}
============================================
 
D

Dave Vandervies

[Thanks Dave Vandervies, that was the most useful piece of information
I've ever read on USENET.]

There is only one conclusion that can be drawn from that: You have
obviously not read anything by Chris Torek.

I've studied your answer, and I now understand one way of setting this
up. I've chucked some simple code below.

....and I have a few comments on it that I'll throw in inline.
One thing: What if we needed to play with an arbritrary number of
base_types?

Then you have to be a bit more clever.

The obvious ways to handle that are to use generics (like C++ templates)
or to have dynamic typing (like in Scheme or Smalltalk). Unfortunately,
C has neither of those, and trying to re-create them is usually (but
not always) the wrong solution.
(One of the big benefits of C is that you *can* build any high-level
abstraction you want to use[1]; one of the big problems of C is that
you *need* to build any high-level abstraction you want to use.)

A slightly less obvious way, but one that's probably more general and
better-suited to C in general, is to give the type-agnostic code a void
pointer that points at your data along with the size of the data, and let
the callback functions (which know what they're working with) convert the
void pointer they get back to a pointer to the appropriate type of data.
qsort and bsearch use this; the prototype for qsort is
--------
void qsort(void *base, size_t nmemb, size_t size,
int(*compar)(const void *, const void *));
--------
and the implementation will do something equivalent to this:
--------
void qsort(void *base, size_t nmemb, size_t size,
int(*compar)(const void *, const void *))
{
/*Get a pointer we can do arithmetic on*/
char *base_c=base;

/*...do sort stuff, including...*/
if((*compar)(base_c+i*size,base_c+j*size) > 0)
__swap_memory(base_c+i*size,base_c+j*size,size);
}
--------
(so the type-agnostic code only needs to be told how big the data
elements are so it can throw them around, and lets caller-provided code
do everything else with them).

A fully type-agnostic map with this type of interface would look something
like this:
--------
void map(const void *vfrom,size_t from_size,void *vto,size_t to_size,
size_t nelem,
void (*trans_func)(const void *from,void *to,void *cookie),
void *cookie)
{
const char *from=vfrom;
unsigned char *to=vto;
size_t i;

for(i=0;i<nelem;i++)
{
trans_func(from+i*from_size,to+i*to_size,cookie);
}
}
--------
and the transform function would look like this:
--------
void frob(const void *vfrom,void *to,void *vcookie)
{
const type1 *from=vfrom;
type2 *to=vto;
cookie_type *cookie=vcookie;

/*Now do stuff with *from (and optionally *cookie) and
stuff the results into *to
*/
}
--------
Note that to do things this way you'd still need a different transform
function for each type you use, even if you're doing the same operations
on them - an add-seventeen-to-int function would probably not give you
the results you want if you ask map to invoke it on your array of doubles.

Note also that this doesn't do any type-checking on the callback you
give it. If you're careful and know what you're doing, that's probably
not a major lack, but it does mean that if you get it wrong the compiler
will happily generate code that will silently corrupt your data (or,
if you're lucky, crash your program) instead of complaining that you've
gotten something wrong. (Of course, if you're not careful or don't
know what you're doing, you probably shouldn't be trying to program a
computer at all, but that doesn't seem to stop a lot of people.)


There seems to be no 'means of combination' (which I have
learnt is important off of those online MIT lectures.)

I'm not quite sure what you mean by "means of combination".

If you mean being able to use different aggregate types, you do that
the same way as you'd do it with different basic types - either by
writing one version for every type or by applying some cleverness to
avoid having to do all that work (either by getting a code generator to
do it for you or by avoiding it altogether).

If you mean composing functions, you can do that with fake-closures the
same way you'd do it with real closures; the equivalent of this:
--------
(define (anti-filter lis pred)
(filter lis (lambda (x) (not (pred x)))))
--------
would be something like this:
--------
struct not_cookie
{
void *orig_cookie;
_Bool (*orig_pred)(basic_type *in,void *cookie);
};

static _Bool not(basic_type *in, void *vcookie)
{
struct not_cookie *cookie=vcookie;
return !(*(cookie->orig_pred))(in,cookie->orig_cookie);
}

size_t anti_filter(const basic_type *src, basic_type *dst, void *ext_var,
_Bool (*pred_func)(basic_type *, void *), size_t arr_len)
{
struct not_cookie nc;
nc.orig_cookie=ext_var;
nc.orig_pred=pred_func;
return filter(src,dst,&nc,not,arr_len);
}
--------
(and the relative length of these two examples is why some people will
try to tell you you need closures and anonymous functions to usefully
use higher-order functions. They're wrong, but they do have a point
that's worth paying attention to.)

It's probably easy to guess that I've got zero real-world programming
experience.

For having zero real-world programming experience, your code is Rather
Good. I assume you have at least some non-real-world experience?
(The differences aren't always as big as big as a lot of people will
try to tell you they are.)


Here's proof that I read the code:

[most code snipped, leaving only the parts I have comments on]
#include <stdbool.h>

Note that <stdbool.h> and _Bool are new in C99. It's worth keeping in
mind that C99 implementations are still rather rare in the wild, so for
maximal portability you probably still want to avoid C99isms.
(Whether or not this is a disgraceful state of affairs for a standard
that's been around for seven years is a discussion that I try to avoid
getting into, but the pragmatic consequences aren't affected by your
position on that.)
If you use int for values with boolean meaning, most C programmers will
have no trouble figuring out that that's what you meant, and they'll
also be able to compile it with their C90 compilers (which is generaly
considered a Good Thing).

void map(const basic_type *src, basic_type *dst, void *ext_var,
basic_type (*trans_func)(basic_type *, void *), size_t arr_len)

There's not really any reason to give the callback function a pointer to
its input instead of just the data object (unless your basic_data type
is big, in which case you might be better off just passing a pointer,
but then you probably want to do something comparable for dst).
If you are passing the pointer, you should probably make it const.
So the type of trans_func is probably wrong here, and should either be
basic_type (*trans_func)(basic_type,void *)
or
basic_type (*trans_func)(const basic_type *,void *)
depending on whether you really want to pass a pointer or not.
{
for (int i=0; i<arr_len; i++)
{
dst = trans_func(&src, ext_var);
}
}


Not all newsreaders play nicely with tabs, and not all the ones that do
handle them sensibly use the same width, so if you want your code lined
up nicely (especially for anything other than indents at the beginning
of the line) you're probably better off converting them to spaces before
you post.

size_t filter(const basic_type *src, basic_type *dst, void *ext_var,
_Bool (*pred_func)(basic_type *, void *), size_t arr_len)
{
basic_type *node_in = src;
basic_type *node_out = dst;

Since you're not using these for anything, there's not much point
keeping them around. (The caller probably needs the original values,
but it has its own copy.)
size_t count = 0;
for (int i=0; i<arr_len; i++)
{
if (pred_func(src, ext_var))
{
*dst++ = *src++;
count++;
}
else src++;
}
return count;
}

It seems clearer to me to do this as array indexing instead of walking
the pointers through. Array indexing lets you keep the output count
and the dst-position in the same place, and also the iteration count
and the src-position.
(That's a matter of taste and style, though; a moderately clever optimizer
should produce equivalent output either way. A C++ programmer would
probably find your version easier to understand, too, since it more
closely matches the C++ iterator idioms.)


dave


[1] Well, some of them are hard enough that it ends up being less
painful to just do without. (Sometime when you're feeling
masochistic, try implementing fully general coroutines without
invoking implementation-specific knowledge. But even that *can*
be done if you don't mind explicitly managing your call-stack frames
(and if you do it right you even get first-class continuations out
of the deal).)
 
B

ballpointpenthief

Dave said:
A fully type-agnostic map with this type of interface would look something
like this:
--------
void map(const void *vfrom,size_t from_size,void *vto,size_t to_size,
size_t nelem,
void (*trans_func)(const void *from,void *to,void *cookie),
void *cookie)
{
const char *from=vfrom;
unsigned char *to=vto;
size_t i;

for(i=0;i<nelem;i++)
{
trans_func(from+i*from_size,to+i*to_size,cookie);
}
}
--------
and the transform function would look like this:
--------
void frob(const void *vfrom,void *to,void *vcookie)
{
const type1 *from=vfrom;
type2 *to=vto;
cookie_type *cookie=vcookie;

/*Now do stuff with *from (and optionally *cookie) and
stuff the results into *to
*/
}

....And here's my attempt:

#include <stdio.h>
#include <stddef.h>

/**********************************************************/

void map(void *dst, size_t dst_size, const void *src,
size_t src_size, void *ext_var,
void (*trans_func)(void *dst, const void *src, void
*ext_var),
size_t arr_len)
{
for (size_t i=0; i<arr_len; i++)
{
trans_func((unsigned char *)dst+(i*dst_size),
(const unsigned char *)src+(i*src_size), ext_var);
}
}

/**********************************************************/

// Transformation function #1
void add_x_to_an_int(void *dst, const void *src, void *ext_var)
{
// This function assumes that all three arguments are int *'s
*((int *)dst) = *((const int *)src) + *((int *)ext_var);
}

/**********************************************************/

void print_arr(int arr[10])
{
for (int i=0; i<10; i++) printf("%i ", arr);
printf("\n");
}

int main(void)
{
int test_src[10] = {0,1,2,3,4,5,6,7,8,9};
int test_dst[10];
int test_num = 17;
map(test_dst, sizeof(int), test_src, sizeof(int),
&test_num, add_x_to_an_int, 10);
print_arr(test_src);
print_arr(test_dst);
return 0;
}

I struggled a bit with this: Why cast to a (char *)? Why not a char? At
first I was doing:

trans_func( (void *)((char)dst+(i*dst_size)), ... with a cast to
increment the pointer, and then casting back to a (void *).
The former needed a char * for some reason, and the latter seems to be
unneccesary.

I'm not quite sure what you mean by "means of combination".

I suppose I meant combining first-class objects to create new first
class objects on the fly. (I think I am using the term first-class
correctly.)
If you mean being able to use different aggregate types, you do that
the same way as you'd do it with different basic types - either by
writing one version for every type or by applying some cleverness to
avoid having to do all that work (either by getting a code generator to
do it for you or by avoiding it altogether).

[The following was written in a footnote]
Sometime when you're feeling
masochistic, try implementing fully general coroutines without

What is a 'coroutine' please? Also an example and a use might be
helpful.
invoking implementation-specific knowledge. But even that *can*
be done if you don't mind explicitly managing your call-stack frames
(and if you do it right you even get first-class continuations out
of the deal).)

A 'call-stack frame' must mean a list of function pointers that need to
be called FIFO?

What is a 'continuation'?

Thanks,
Matt Smiglarski
 
B

ballpointpenthief

Just realised that I'm being very stupid about the pointer fitting in a
char (see below), and have just realised why an (int *) doesn't work
since writing this update.
Sorry for top-posting.
Dave said:
A fully type-agnostic map with this type of interface would look something
like this:
--------
void map(const void *vfrom,size_t from_size,void *vto,size_t to_size,
size_t nelem,
void (*trans_func)(const void *from,void *to,void *cookie),
void *cookie)
{
const char *from=vfrom;
unsigned char *to=vto;
size_t i;

for(i=0;i<nelem;i++)
{
trans_func(from+i*from_size,to+i*to_size,cookie);
}
}
--------
and the transform function would look like this:
--------
void frob(const void *vfrom,void *to,void *vcookie)
{
const type1 *from=vfrom;
type2 *to=vto;
cookie_type *cookie=vcookie;

/*Now do stuff with *from (and optionally *cookie) and
stuff the results into *to
*/
}

...And here's my attempt:

#include <stdio.h>
#include <stddef.h>

/**********************************************************/

void map(void *dst, size_t dst_size, const void *src,
size_t src_size, void *ext_var,
void (*trans_func)(void *dst, const void *src, void
*ext_var),
size_t arr_len)
{
for (size_t i=0; i<arr_len; i++)
{
trans_func((unsigned char *)dst+(i*dst_size),
(const unsigned char *)src+(i*src_size), ext_var);
}
}

/**********************************************************/

// Transformation function #1
void add_x_to_an_int(void *dst, const void *src, void *ext_var)
{
// This function assumes that all three arguments are int *'s
*((int *)dst) = *((const int *)src) + *((int *)ext_var);
}

/**********************************************************/

void print_arr(int arr[10])
{
for (int i=0; i<10; i++) printf("%i ", arr);
printf("\n");
}

int main(void)
{
int test_src[10] = {0,1,2,3,4,5,6,7,8,9};
int test_dst[10];
int test_num = 17;
map(test_dst, sizeof(int), test_src, sizeof(int),
&test_num, add_x_to_an_int, 10);
print_arr(test_src);
print_arr(test_dst);
return 0;
}

I struggled a bit with this: Why cast to a (char *)? Why not a char? At
first I was doing:

trans_func( (void *)((char)dst+(i*dst_size)), ... with a cast to
increment the pointer, and then casting back to a (void *).
The former needed a char * for some reason, and the latter seems to be
unneccesary.

I'm not quite sure what you mean by "means of combination".

I suppose I meant combining first-class objects to create new first
class objects on the fly. (I think I am using the term first-class
correctly.)
If you mean being able to use different aggregate types, you do that
the same way as you'd do it with different basic types - either by
writing one version for every type or by applying some cleverness to
avoid having to do all that work (either by getting a code generator to
do it for you or by avoiding it altogether).

[The following was written in a footnote]
Sometime when you're feeling
masochistic, try implementing fully general coroutines without

What is a 'coroutine' please? Also an example and a use might be
helpful.
invoking implementation-specific knowledge. But even that *can*
be done if you don't mind explicitly managing your call-stack frames
(and if you do it right you even get first-class continuations out
of the deal).)

A 'call-stack frame' must mean a list of function pointers that need to
be called FIFO?

What is a 'continuation'?

Thanks,
Matt Smiglarski
 
D

Dave Vandervies

Dave Vandervies wrote:

void map(void *dst, size_t dst_size, const void *src,
size_t src_size, void *ext_var,
void (*trans_func)(void *dst, const void *src, void
*ext_var),
size_t arr_len)
{
for (size_t i=0; i<arr_len; i++)
{
trans_func((unsigned char *)dst+(i*dst_size),
(const unsigned char *)src+(i*src_size), ext_var);
}
}

It's probably worth getting in the habit of declaring local non-void
pointers for void-pointer arguments that you plan to use instead of
casting them where you use them, since pointer casts often mean there's
something sketchy going on. (Though in this case it's worth noting that
the casts are only doing what the castless conversion would do anyways.)

This lets you keep the "Pointer cast -> suspicious" neurons active
while you're using this technique to implement type-agnostic functions.
If you ever have the misfortune to find yourself working somewhere
where productivity is measured in lines of code, it also gives you a way
to pad your numbers while actually *reducing* the cognitive burden on
both yourself and maintenance programmers (since they get to keep their
"pointer cast -> suspicious" neurons active too).



I struggled a bit with this: Why cast to a (char *)? Why not a char? At
first I was doing:

trans_func( (void *)((char)dst+(i*dst_size)), ... with a cast to
increment the pointer, and then casting back to a (void *).
The former needed a char * for some reason, and the latter seems to be
unneccesary.

Remember that in C, "char" is a synonym for "byte" (as well as being
"holds a character", except when it's too small for that).

When you put "test_src" (or "test_dst") in the list of arguments to map(),
the compiler does a few things for you:
-Converts the array (int[10]) to a pointer to its first argument (int *)
(This happens any time you use an array name in a value context)
-Converts the int * (type of the actual value given as an argument) to a
void *-to-void (type of the function argument)
void * has a few special properties that make it a generic pointer:
-Any data pointer type can be converted to a void * (and compilers
shouldn't complain)
-A void * can be converted to any data pointer type (and compilers
shouldn't be complain)
-For any data type T, a void * that was obtained by converting a T *
into a void * can be converted back to a T * that will be equivalent
to the original one
map() doing its magic ends up depending on all of these, but actually
passing it to the function is where we're using the first one.
-Puts that void * wherever map() expects to find it


So the pointer that map() gets looks like this:
+-void * pointing here
v
---------------------------------------- <-- the array
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <-- bytes
^ ^ ^ ^ ^ ^ ^ ^ ^ ^ <-- ints

Since void * is a generic data type, the compiler doesn't know where
to point the result when you try to do pointer arithmetic on it[1].
Pointer arithmetic moves the pointer by the number of *things it's
pointing at* you add or subtract, but the internal representation
typically works in terms of which *byte* the pointer refers to[2].

But map() knows (because we told it) how many bytes each data object
takes up, and we know that "char" is exactly one byte in size, so we can
convert our void * to a char * (either at the beginning of the function
or every time we use it - this particular conversion is a no-op[3], so
any self-respecting optimizer will generate the same code either way),
and do the pointer arithmetic on *that* and get a byte pointer pointing at
the beginning of the data object we want the callback function to work on:

+-byte pointer pointing here
| +-add i*size to get byte pointer pointing here
v v
---------------------------------------- <-- the array
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <-- bytes
^ ^ ^ ^ ^ ^ ^ ^ ^ ^ <-- ints

Once we have that pointer, we can pass it to the callback; we have a
callback that expects a void * (generic pointer), but since the compiler
knows that the callback expects a void * it will take care of that
conversion (since that's why generic pointers exist).


But note that with all of this manipulation, we're only working with
pointers and offsets, not with "actual" values. The only reason "char"
shows up in the code at all is because we know that it has a useful size.

We can convert a pointer to an integer type (including char), but that
conversion isn't guaranteed to be reversible (and, for char, it's highly
unlikely that it will be). So converting the pointer to a char probably
threw away most of the information that the pointer contained, and then
converting it back to a pointer gave you a pointer to some random chunk
of memory that (invoking some knowledge of what kind of implementation
you're likely to be working with) the program probably wasn't allowed
to access, so when the callback function tried to use *that* pointer it
probably crashed.



[Everything below here is starting to drift off-topic for CLC.
comp.programming might be a good next stop to get more information.]
[The following was written in a footnote]
Sometime when you're feeling
masochistic, try implementing fully general coroutines without

What is a 'coroutine' please? Also an example and a use might be
helpful.

Coroutines are a control structure that allows two (or more) independent
"threads" of program execution to pass control between them. (This is
NOT what most programmers are referring to when they talk about threads,
but I can't think of a better term for it.)

Essentially, a coroutine can call another coroutine as if it were a
function call, but when a coroutine is called it will resume where it
left off (as if it called a function that's returning) instead of at
the beginning (as if it were a function being called).

Google will happily give you lots of information, some of which is
probably reliable.
There's a pretty good description and a not-pathologically-limited
implementation at
http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html .


A 'call-stack frame' must mean a list of function pointers that need to
be called FIFO?

No, it's the list of functions that *have been* called and haven't
returned yet, and contains things like local variables and where they
need to return to.
Most languages have a single FIFO call stack (hence the name "stack"),
and when a function returns it pops that function's call frame off the
stack and returns to the next function.
Coroutines need a separate call stack for each coroutine (since it needs
to be kept when control passes to another coroutine), and if you allow
the program to access them and don't require them to be strictly FIFO,
you end up getting closer to:
What is a 'continuation'?

It describes what happens next. In a lot of languages it's just
"the next operation gets run", but if you're working with a language
that lets you capture a continuation at an arbitrary point and return
to a captured continuation, you can use it to implement all sorts of
interesting things.
(It turns out that once you have this capability, you can use it to
implement any other flow control construct you want. I went and poked
my sigmonster to get some commentary on it in my .sig quote on this
post, f'rexample.)

One of the easier uses to describe is for early-exit: Capture a
continuation just before you start, and if you get the answer just feed
it to the continuation. At the continuation-capture point, you need some
way to tell whether it's just been captured (and you want to start the
computation) or it's been used for early-exit (and you want to carry on
with whatever it was you wanted the result for), but that can be easier
than backing out of, say, a deeply nested recursion one call at a time.


dave

[1] Actually, that's not quite correct, but if I didn't wave my hands
here I'd get rather too far away from the topic at hand.

[2] That's not required, but the compiler needs to be able to fake it
(because of, among other things, precisely the sorts of thing we're
talking about here). It's perfectly valid for most data pointers
to be word pointers and for byte pointers to contain a word pointer
and a byte offset in that word; and there have been real machines
that did this.

[3] char * and void * are required to have the same representation; no
other distinct types have that requirement (though it also applies
to "signed T *" and "unsigned T *" for any T for which signed and
unsigned are meaningful).
 
D

Dave Vandervies

ballpointpenthief said:
and have just realised why an (int *) doesn't work
since writing this update.

I have no idea what you're talking about here, and I just wrote a reply
to the post you're commenting on.
Sorry for top-posting.

....but you did provide a perfect example of why it's a Bad Thing.
If you'd put the comment quoted above after the section to which it was
relevant, and trimmed the rest of the quoted material, it would probably
have made a lot more sense.


dave
 
D

Dave Thompson

On Sat, 2 Sep 2006 05:06:07 +0000 (UTC),
C programmers can pass pointers to functions around <snip>
C doesn't have anonymous functions or closures, and you may miss those,
but anonymous functions are "only" a notational convenience (if sometimes
a major one!) and closures can be faked, <snip>
They don't have to be variadic functions (and trying to do it that way
would make it unnecessarily much uglier), <snip>

Agree so far.
Here's some examples to (hopefully) get you started. <snip>
/*Note that these have to be written for every type you plan to use them
with (and, for ones that can sensibly work with multiple types, for
every combination of types).
C++ templates may make this easier if you want to work with several
different data types, but for a small number of types doing it this
way will work fine. (These are simple and specialized enough that
you could probably write a code generator to produce C code for the
types you want if you don't want to use C++ just to get templates.)

As long as you use only named (or tagged) types -- and you can name
any type with typedef -- you can (I believe always) get this level of
templating with C preprocessor pasting. But this also gets ugly fast.

OTOH and OT, C++ can fake (or arguably implement) closures more
elegantly, by packaging the data (or references) with the functions in
a 'function object' which can be applied with more natural syntax.
It's 'only sugar', but sugar is still sweet. The C++ Standard library
even includes some basic, but still useful, examples. And FWIW with
C++ you can use nearly all of your C knowledge and experience and
often most or all of your tools, and have a guaranteed Standard
(portable) binding to actual C if and when you need it.
Even with templates the types still need to be statically known at
compile time. Implementing true dynamic typing is probably
sufficiently ugly that you'd be better off finding an embeddable
Scheme interpreter.
*/
C++ also provides dynamic typing within a hierarchy that you define --
which doesn't cover all types, and not necessarily ones supplied or
needed by thirdparty or library code, but with very little effort it
can be all types that your own code needs dynamically typed.


- David.Thompson1 at worldnet.att.net
 
D

Dave Thompson

On Fri, 8 Sep 2006 03:09:32 +0000 (UTC),
[3] char * and void * are required to have the same representation; no

The wording is imprecise but I believe it's actually pointers to all
flavors of char: plain, signed, and unsigned.
other distinct types have that requirement (though it also applies
to "signed T *" and "unsigned T *" for any T for which signed and
unsigned are meaningful).

I don't see that. Actual signed and unsigned integer type pairs (not
pointers) are required to have the same representation for values in
the shared (intersecting) range, and to have the same size and
alignment; so in practice there is no valid reason for pointers to
them to be different, but they aren't required to be the same.

What ARE required to have the same representation are:

- pointers to all struct types, and (separately) to all union types.
This is needed for opaque ones, and forward ones one-pass.

- pointers to differently-qualified types e.g. const int * and int *.
Which are formally distinct types in the type system, although
arguably not 'really' different.

- David.Thompson1 at worldnet.att.net
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top