Maps, filters and accumulators

Discussion in 'C Programming' started by ballpointpenthief, Sep 1, 2006.

  1. How good is the C language for implementing maps, filters and
    accumulators?

    SCHEME programmers would be able to pass procedures as arguments quite
    easily, whereas C programmers would - I guess - have use variadic
    function pointers which is frankly disgusting, and I'm not even sure I
    want to sit down and hack at the language in that way.

    Your thoughts? Any libraries you could refer me to?

    Cheers, Matt
    ballpointpenthief, Sep 1, 2006
    #1
    1. Advertising

  2. In article <>,
    ballpointpenthief <> wrote:
    >How good is the C language for implementing maps, filters and
    >accumulators?
    >
    >SCHEME programmers would be able to pass procedures as arguments quite
    >easily, whereas C programmers would - I guess - have use variadic
    >function pointers which is frankly disgusting, and I'm not even sure I
    >want to sit down and hack at the language in that way.
    >
    >Your thoughts? Any libraries you could refer me to?


    C programmers can pass pointers to functions around with (almost) the
    same ease that Scheme programmers can pass the functions themselves,
    and can use them in much the same way.
    C doesn't have anonymous functions or closures, and you may miss those,
    but anonymous functions are "only" a notational convenience (if sometimes
    a major one!) and closures can be faked, if you control the interfaces
    along the entire chain of callers that ends up invoking the function,
    by passing a pointer to a struct containing anything that needs to last
    longer than a single invocation of the function.

    They don't have to be variadic functions (and trying to do it that way
    would make it unnecessarily much uglier), but you do have to be more
    careful with your data structures. Scheme has linked lists built into the
    language, which makes dealing with lists of values Rather Easier. In C
    you could build your own linked lists, but it's often more natural to use
    arrays for the sorts of things you'd use a list for in a lispy language.

    Here's some examples to (hopefully) get you started. Don't let the
    verbiage scare you off; most of it is a consequence of the fact that C
    has static typing and requires everything to be declared before you use
    it, and becomes fairly simple once you understand it.
    It may be more useful to start at the bottom and work up the first time
    you read this, since the earlier stuff is mostly just setting up for
    the interesting stuff at the end.

    Standard disclaimers apply: Not compiled or tested. For explanatory
    purposes only. You will have no trouble finding intelligent people who
    disagree with everything I say (though hopefully it's not too hard to
    find intelligent people who agree with me either). Don't trust anybody
    who's posting to usenet on a Friday night.
    --------
    /*********************************************************************
    * Some typedefs for what we're working with. *
    * In real code I would probably not bother with these, but it makes *
    * things clearer for demonstration purposes (especially since *
    * I'm using the same type for counters and other bookkeeping as *
    * for the actual data). *
    *********************************************************************/

    /*This is the type of the actual data elements we're working with.*/
    typedef int my_data_type;


    /*C doesn't have a native boolean type. Doing something like this often
    doesn't make things any clearer and can invite horrible abuse, but for
    demonstration purposes in short chunks it makes the intent clearer.
    */
    typedef int boolean;


    /*Typedefing function pointers makes things A LOT easier to read if you're
    still working on wrapping your head around them, but learning to read
    function pointer declarations without the typedefs is worth your time
    if you'll be working with them regularly.
    */

    /*trans_func_c is a pointer to a function that takes a my_data_type
    and returns another my_data_type (useful for map).
    (trans=="transform", _c suffix means it has a cookie argument)
    The "cookie" argument is used to pass information to the function
    (faking a closure). If the function doesn't need any external
    information, it can be ignored.
    */
    typedef my_data_type (*trans_func_c)(my_data_type in,void *cookie);

    /*pred_func is a pointer to a predicate function on some value
    (useful for filter)
    */
    typedef boolean (*pred_func)(my_data_type in);

    /*binary_func is a pointer to a binary operator on values
    (useful for fold)
    */
    typedef my_data_type (*binary_func)(my_data_type left,my_data_type right);



    /*************************************************************************
    * These are the functions that "really" do all the work. Once you get *
    * the interfaces right (which will probably take a few tries to get *
    * everything you need but not so much you forget how to use them) *
    * these only have to be written once. *
    *************************************************************************/
    /*All of these use arrays for the lists they work on. Caller is
    responsible for managing all memory.
    */
    /*Note that these have to be written for every type you plan to use them
    with (and, for ones that can sensibly work with multiple types, for
    every combination of types).
    C++ templates may make this easier if you want to work with several
    different data types, but for a small number of types doing it this
    way will work fine. (These are simple and specialized enough that
    you could probably write a code generator to produce C code for the
    types you want if you don't want to use C++ just to get templates.)
    Even with templates the types still need to be statically known at
    compile time. Implementing true dynamic typing is probably
    sufficiently ugly that you'd be better off finding an embeddable
    Scheme interpreter.
    */

    /*Map: dest=func(src) for all 0<=i<num_in*/
    void map_c(const my_data_type *src, my_data_type *dest, int num_in,
    trans_func_c func, void *cookie)
    {
    int i;
    for(i=0;i<num_in;i++)
    dest=(*func)(src,cookie);
    }


    /*Filter: dest[] is populated with the elements of src[] for which
    pred() returns true, in the order in which they appear in src[].
    Returns the number of elements copied to dest[].
    */
    int filter(const my_data_type *src,my_data_type *dest,int num_in,
    pred_func pred)
    {
    int i;
    int num_out=0;
    for(i=0;i<num_in;i++)
    if((*pred)(src))
    dest[num_out++]=src;
    return num_out;
    }


    /*Fold: Apply op to accumulator and src for each i in order.
    Return final value of accumulator.
    */
    my_data_type fold(const my_data_type *src,int num_in,
    binary_func op,my_data_type acc)
    {
    int i;
    for(i=0;i<num_in;i++)
    acc=op(acc,src);
    return acc;
    }


    /************************************************************************
    * Everything above this can be re-used (but you probably want to write *
    * it longhand a few times to make sure you understand what's going *
    * on and have enough-but-not-too-much generality in the interface). *
    * Next comes the functions we're giving to map, filter, and fold. *
    ************************************************************************/

    /*An adder. The cookie argument needs to be a pointer to the value
    we want to add.
    This is a function we can point a trans_func_c at.
    */
    my_data_type add(my_data_type val,void *vcookie)
    {
    my_data_type *cookie=vcookie;
    return val + *cookie;
    }

    /*Is a value even?
    This is a function we can point a pred_func at.
    */
    boolean is_even(my_data_type val)
    {
    return (val%2)==0;
    }

    /*Add two values.
    This is a function we can point a binary_func at.
    */
    my_data_type plus(my_data_type left,my_data_type right)
    {
    return left+right;
    }


    /********************************************************
    * Now that we've set everything up, let's show it off. *
    ********************************************************/

    #include <stdio.h>

    void print_array(const my_data_type *arr,int num)
    {
    int i;
    for(i=0;i<num;i++)
    printf("%d ",arr);
    printf("\n");
    }

    int main(void)
    {
    const my_data_type src[]={0,1,2,3,4,5,6,7,8,9,10};
    const int num_src=sizeof src / sizeof *src;
    my_data_type dest[sizeof src / sizeof *src];
    int num_dest;
    my_data_type to_add;

    printf("src[]: ");
    print_array(src,num_src);

    /*Add 17 to everything*/
    to_add=17;
    map_c(src,dest,num_src,add,&to_add);
    num_dest=num_src; /*map() doesn't change or report size*/
    printf("After map(): ");
    print_array(dest,num_dest);

    /*Filter for the even ones*/
    num_dest=filter(src,dest,num_src,is_even);
    printf("After filter(): ");
    print_array(dest,num_dest);

    /*Fold addition over entire list*/
    printf("Sum of all values: %d\n",fold(src,num_src,plus,0));
    /*Fold addition over filtered list*/
    printf("Sum of even values: %d\n",fold(dest,num_dest,plus,0));

    return 0;
    }
    --------


    dave

    --
    Dave Vandervies
    I certainly assume that if the programmer is programming in Scheme,
    then he knows Scheme.
    --Joe Marshall in comp.lang.scheme
    Dave Vandervies, Sep 2, 2006
    #2
    1. Advertising

  3. [Thanks Dave Vandervies, that was the most useful piece of information
    I've ever read on USENET.]

    It's still not possible to play with an arbritrary number of
    base_types? It might be easy to set everything up to make it easily
    extendable.
    That was the most useful piece of information I've ever read on USENET.

    I've studied your answer, and I now understand one way of setting this
    up. I've chucked some simple code below.

    One thing: What if we needed to play with an arbritrary number of
    base_types? There seems to be no 'means of combination' (which I have
    learnt is important off of those online MIT lectures.)

    It's probably easy to guess that I've got zero real-world programming
    experience.
    Maybe problems like that aren't so common outside of language design.

    Here's proof that I read the code:
    =================================
    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>
    #include <stdbool.h>

    struct record
    {
    int value;
    char *data;
    };

    typedef struct record basic_type;

    /*******************************/

    void map(const basic_type *src, basic_type *dst, void *ext_var,
    basic_type (*trans_func)(basic_type *, void *), size_t arr_len)
    {
    for (int i=0; i<arr_len; i++)
    {
    dst = trans_func(&src, ext_var);
    }
    }

    size_t filter(const basic_type *src, basic_type *dst, void *ext_var,
    _Bool (*pred_func)(basic_type *, void *), size_t arr_len)
    {
    basic_type *node_in = src;
    basic_type *node_out = dst;
    size_t count = 0;
    for (int i=0; i<arr_len; i++)
    {
    if (pred_func(src, ext_var))
    {
    *dst++ = *src++;
    count++;
    }
    else src++;
    }
    return count;
    }
    /****************************************************************/
    /************Transformation functions****************************/
    /****************************************************************/

    basic_type trans_one(basic_type *in, void *ext_var)
    {
    return *in;
    }

    basic_type trans_two(basic_type *in, void *ext_var)
    {
    return *in;
    }

    /******************************************************************/
    /**************Predicate functions*********************************/
    /******************************************************************/

    _Bool pred_one(basic_type *in, void *ext_var)
    {

    return true;
    }

    _Bool pred_two(basic_type *in, void *ext_var)
    {
    return false;
    }
    ============================================
    ballpointpenthief, Sep 3, 2006
    #3
  4. In article <>,
    ballpointpenthief <> wrote:
    >[Thanks Dave Vandervies, that was the most useful piece of information
    >I've ever read on USENET.]


    There is only one conclusion that can be drawn from that: You have
    obviously not read anything by Chris Torek.


    >I've studied your answer, and I now understand one way of setting this
    >up. I've chucked some simple code below.


    ....and I have a few comments on it that I'll throw in inline.

    >One thing: What if we needed to play with an arbritrary number of
    >base_types?


    Then you have to be a bit more clever.

    The obvious ways to handle that are to use generics (like C++ templates)
    or to have dynamic typing (like in Scheme or Smalltalk). Unfortunately,
    C has neither of those, and trying to re-create them is usually (but
    not always) the wrong solution.
    (One of the big benefits of C is that you *can* build any high-level
    abstraction you want to use[1]; one of the big problems of C is that
    you *need* to build any high-level abstraction you want to use.)

    A slightly less obvious way, but one that's probably more general and
    better-suited to C in general, is to give the type-agnostic code a void
    pointer that points at your data along with the size of the data, and let
    the callback functions (which know what they're working with) convert the
    void pointer they get back to a pointer to the appropriate type of data.
    qsort and bsearch use this; the prototype for qsort is
    --------
    void qsort(void *base, size_t nmemb, size_t size,
    int(*compar)(const void *, const void *));
    --------
    and the implementation will do something equivalent to this:
    --------
    void qsort(void *base, size_t nmemb, size_t size,
    int(*compar)(const void *, const void *))
    {
    /*Get a pointer we can do arithmetic on*/
    char *base_c=base;

    /*...do sort stuff, including...*/
    if((*compar)(base_c+i*size,base_c+j*size) > 0)
    __swap_memory(base_c+i*size,base_c+j*size,size);
    }
    --------
    (so the type-agnostic code only needs to be told how big the data
    elements are so it can throw them around, and lets caller-provided code
    do everything else with them).

    A fully type-agnostic map with this type of interface would look something
    like this:
    --------
    void map(const void *vfrom,size_t from_size,void *vto,size_t to_size,
    size_t nelem,
    void (*trans_func)(const void *from,void *to,void *cookie),
    void *cookie)
    {
    const char *from=vfrom;
    unsigned char *to=vto;
    size_t i;

    for(i=0;i<nelem;i++)
    {
    trans_func(from+i*from_size,to+i*to_size,cookie);
    }
    }
    --------
    and the transform function would look like this:
    --------
    void frob(const void *vfrom,void *to,void *vcookie)
    {
    const type1 *from=vfrom;
    type2 *to=vto;
    cookie_type *cookie=vcookie;

    /*Now do stuff with *from (and optionally *cookie) and
    stuff the results into *to
    */
    }
    --------
    Note that to do things this way you'd still need a different transform
    function for each type you use, even if you're doing the same operations
    on them - an add-seventeen-to-int function would probably not give you
    the results you want if you ask map to invoke it on your array of doubles.

    Note also that this doesn't do any type-checking on the callback you
    give it. If you're careful and know what you're doing, that's probably
    not a major lack, but it does mean that if you get it wrong the compiler
    will happily generate code that will silently corrupt your data (or,
    if you're lucky, crash your program) instead of complaining that you've
    gotten something wrong. (Of course, if you're not careful or don't
    know what you're doing, you probably shouldn't be trying to program a
    computer at all, but that doesn't seem to stop a lot of people.)



    > There seems to be no 'means of combination' (which I have
    >learnt is important off of those online MIT lectures.)


    I'm not quite sure what you mean by "means of combination".

    If you mean being able to use different aggregate types, you do that
    the same way as you'd do it with different basic types - either by
    writing one version for every type or by applying some cleverness to
    avoid having to do all that work (either by getting a code generator to
    do it for you or by avoiding it altogether).

    If you mean composing functions, you can do that with fake-closures the
    same way you'd do it with real closures; the equivalent of this:
    --------
    (define (anti-filter lis pred)
    (filter lis (lambda (x) (not (pred x)))))
    --------
    would be something like this:
    --------
    struct not_cookie
    {
    void *orig_cookie;
    _Bool (*orig_pred)(basic_type *in,void *cookie);
    };

    static _Bool not(basic_type *in, void *vcookie)
    {
    struct not_cookie *cookie=vcookie;
    return !(*(cookie->orig_pred))(in,cookie->orig_cookie);
    }

    size_t anti_filter(const basic_type *src, basic_type *dst, void *ext_var,
    _Bool (*pred_func)(basic_type *, void *), size_t arr_len)
    {
    struct not_cookie nc;
    nc.orig_cookie=ext_var;
    nc.orig_pred=pred_func;
    return filter(src,dst,&nc,not,arr_len);
    }
    --------
    (and the relative length of these two examples is why some people will
    try to tell you you need closures and anonymous functions to usefully
    use higher-order functions. They're wrong, but they do have a point
    that's worth paying attention to.)


    >It's probably easy to guess that I've got zero real-world programming
    >experience.


    For having zero real-world programming experience, your code is Rather
    Good. I assume you have at least some non-real-world experience?
    (The differences aren't always as big as big as a lot of people will
    try to tell you they are.)



    >Here's proof that I read the code:


    [most code snipped, leaving only the parts I have comments on]

    >#include <stdbool.h>


    Note that <stdbool.h> and _Bool are new in C99. It's worth keeping in
    mind that C99 implementations are still rather rare in the wild, so for
    maximal portability you probably still want to avoid C99isms.
    (Whether or not this is a disgraceful state of affairs for a standard
    that's been around for seven years is a discussion that I try to avoid
    getting into, but the pragmatic consequences aren't affected by your
    position on that.)
    If you use int for values with boolean meaning, most C programmers will
    have no trouble figuring out that that's what you meant, and they'll
    also be able to compile it with their C90 compilers (which is generaly
    considered a Good Thing).


    >void map(const basic_type *src, basic_type *dst, void *ext_var,
    > basic_type (*trans_func)(basic_type *, void *), size_t arr_len)


    There's not really any reason to give the callback function a pointer to
    its input instead of just the data object (unless your basic_data type
    is big, in which case you might be better off just passing a pointer,
    but then you probably want to do something comparable for dst).
    If you are passing the pointer, you should probably make it const.
    So the type of trans_func is probably wrong here, and should either be
    basic_type (*trans_func)(basic_type,void *)
    or
    basic_type (*trans_func)(const basic_type *,void *)
    depending on whether you really want to pass a pointer or not.

    >{
    > for (int i=0; i<arr_len; i++)
    > {
    > dst = trans_func(&src, ext_var);
    > }
    >}


    Not all newsreaders play nicely with tabs, and not all the ones that do
    handle them sensibly use the same width, so if you want your code lined
    up nicely (especially for anything other than indents at the beginning
    of the line) you're probably better off converting them to spaces before
    you post.


    >size_t filter(const basic_type *src, basic_type *dst, void *ext_var,
    > _Bool (*pred_func)(basic_type *, void *), size_t arr_len)
    >{
    > basic_type *node_in = src;
    > basic_type *node_out = dst;


    Since you're not using these for anything, there's not much point
    keeping them around. (The caller probably needs the original values,
    but it has its own copy.)

    > size_t count = 0;
    > for (int i=0; i<arr_len; i++)
    > {
    > if (pred_func(src, ext_var))
    > {
    > *dst++ = *src++;
    > count++;
    > }
    > else src++;
    > }
    > return count;
    >}


    It seems clearer to me to do this as array indexing instead of walking
    the pointers through. Array indexing lets you keep the output count
    and the dst-position in the same place, and also the iteration count
    and the src-position.
    (That's a matter of taste and style, though; a moderately clever optimizer
    should produce equivalent output either way. A C++ programmer would
    probably find your version easier to understand, too, since it more
    closely matches the C++ iterator idioms.)


    dave


    [1] Well, some of them are hard enough that it ends up being less
    painful to just do without. (Sometime when you're feeling
    masochistic, try implementing fully general coroutines without
    invoking implementation-specific knowledge. But even that *can*
    be done if you don't mind explicitly managing your call-stack frames
    (and if you do it right you even get first-class continuations out
    of the deal).)

    --
    Dave Vandervies
    There are two kinds of program structures: Those that I can easily handle
    in my head without the use of paper, and those that wouldn't fit on any
    kind of paper anyway. --Christian Bau in comp.lang.c
    Dave Vandervies, Sep 4, 2006
    #4
  5. Dave Vandervies wrote:
    > In article <>,
    > ballpointpenthief <> wrote:


    > >One thing: What if we needed to play with an arbritrary number of
    > >base_types?

    >
    >
    > A fully type-agnostic map with this type of interface would look something
    > like this:
    > --------
    > void map(const void *vfrom,size_t from_size,void *vto,size_t to_size,
    > size_t nelem,
    > void (*trans_func)(const void *from,void *to,void *cookie),
    > void *cookie)
    > {
    > const char *from=vfrom;
    > unsigned char *to=vto;
    > size_t i;
    >
    > for(i=0;i<nelem;i++)
    > {
    > trans_func(from+i*from_size,to+i*to_size,cookie);
    > }
    > }
    > --------
    > and the transform function would look like this:
    > --------
    > void frob(const void *vfrom,void *to,void *vcookie)
    > {
    > const type1 *from=vfrom;
    > type2 *to=vto;
    > cookie_type *cookie=vcookie;
    >
    > /*Now do stuff with *from (and optionally *cookie) and
    > stuff the results into *to
    > */
    > }


    ....And here's my attempt:

    #include <stdio.h>
    #include <stddef.h>

    /**********************************************************/

    void map(void *dst, size_t dst_size, const void *src,
    size_t src_size, void *ext_var,
    void (*trans_func)(void *dst, const void *src, void
    *ext_var),
    size_t arr_len)
    {
    for (size_t i=0; i<arr_len; i++)
    {
    trans_func((unsigned char *)dst+(i*dst_size),
    (const unsigned char *)src+(i*src_size), ext_var);
    }
    }

    /**********************************************************/

    // Transformation function #1
    void add_x_to_an_int(void *dst, const void *src, void *ext_var)
    {
    // This function assumes that all three arguments are int *'s
    *((int *)dst) = *((const int *)src) + *((int *)ext_var);
    }

    /**********************************************************/

    void print_arr(int arr[10])
    {
    for (int i=0; i<10; i++) printf("%i ", arr);
    printf("\n");
    }

    int main(void)
    {
    int test_src[10] = {0,1,2,3,4,5,6,7,8,9};
    int test_dst[10];
    int test_num = 17;
    map(test_dst, sizeof(int), test_src, sizeof(int),
    &test_num, add_x_to_an_int, 10);
    print_arr(test_src);
    print_arr(test_dst);
    return 0;
    }

    I struggled a bit with this: Why cast to a (char *)? Why not a char? At
    first I was doing:

    trans_func( (void *)((char)dst+(i*dst_size)), ... with a cast to
    increment the pointer, and then casting back to a (void *).
    The former needed a char * for some reason, and the latter seems to be
    unneccesary.


    > I'm not quite sure what you mean by "means of combination".


    I suppose I meant combining first-class objects to create new first
    class objects on the fly. (I think I am using the term first-class
    correctly.)

    > If you mean being able to use different aggregate types, you do that
    > the same way as you'd do it with different basic types - either by
    > writing one version for every type or by applying some cleverness to
    > avoid having to do all that work (either by getting a code generator to
    > do it for you or by avoiding it altogether).
    >


    [The following was written in a footnote]

    > Sometime when you're feeling
    > masochistic, try implementing fully general coroutines without


    What is a 'coroutine' please? Also an example and a use might be
    helpful.

    > invoking implementation-specific knowledge. But even that *can*
    > be done if you don't mind explicitly managing your call-stack frames
    > (and if you do it right you even get first-class continuations out
    > of the deal).)


    A 'call-stack frame' must mean a list of function pointers that need to
    be called FIFO?

    What is a 'continuation'?

    Thanks,
    Matt Smiglarski
    ballpointpenthief, Sep 6, 2006
    #5
  6. Just realised that I'm being very stupid about the pointer fitting in a
    char (see below), and have just realised why an (int *) doesn't work
    since writing this update.
    Sorry for top-posting.

    ballpointpenthief wrote:
    > Dave Vandervies wrote:
    > > In article <>,
    > > ballpointpenthief <> wrote:

    >
    > > >One thing: What if we needed to play with an arbritrary number of
    > > >base_types?

    > >
    > >
    > > A fully type-agnostic map with this type of interface would look something
    > > like this:
    > > --------
    > > void map(const void *vfrom,size_t from_size,void *vto,size_t to_size,
    > > size_t nelem,
    > > void (*trans_func)(const void *from,void *to,void *cookie),
    > > void *cookie)
    > > {
    > > const char *from=vfrom;
    > > unsigned char *to=vto;
    > > size_t i;
    > >
    > > for(i=0;i<nelem;i++)
    > > {
    > > trans_func(from+i*from_size,to+i*to_size,cookie);
    > > }
    > > }
    > > --------
    > > and the transform function would look like this:
    > > --------
    > > void frob(const void *vfrom,void *to,void *vcookie)
    > > {
    > > const type1 *from=vfrom;
    > > type2 *to=vto;
    > > cookie_type *cookie=vcookie;
    > >
    > > /*Now do stuff with *from (and optionally *cookie) and
    > > stuff the results into *to
    > > */
    > > }

    >
    > ...And here's my attempt:
    >
    > #include <stdio.h>
    > #include <stddef.h>
    >
    > /**********************************************************/
    >
    > void map(void *dst, size_t dst_size, const void *src,
    > size_t src_size, void *ext_var,
    > void (*trans_func)(void *dst, const void *src, void
    > *ext_var),
    > size_t arr_len)
    > {
    > for (size_t i=0; i<arr_len; i++)
    > {
    > trans_func((unsigned char *)dst+(i*dst_size),
    > (const unsigned char *)src+(i*src_size), ext_var);
    > }
    > }
    >
    > /**********************************************************/
    >
    > // Transformation function #1
    > void add_x_to_an_int(void *dst, const void *src, void *ext_var)
    > {
    > // This function assumes that all three arguments are int *'s
    > *((int *)dst) = *((const int *)src) + *((int *)ext_var);
    > }
    >
    > /**********************************************************/
    >
    > void print_arr(int arr[10])
    > {
    > for (int i=0; i<10; i++) printf("%i ", arr);
    > printf("\n");
    > }
    >
    > int main(void)
    > {
    > int test_src[10] = {0,1,2,3,4,5,6,7,8,9};
    > int test_dst[10];
    > int test_num = 17;
    > map(test_dst, sizeof(int), test_src, sizeof(int),
    > &test_num, add_x_to_an_int, 10);
    > print_arr(test_src);
    > print_arr(test_dst);
    > return 0;
    > }
    >
    > I struggled a bit with this: Why cast to a (char *)? Why not a char? At
    > first I was doing:
    >
    > trans_func( (void *)((char)dst+(i*dst_size)), ... with a cast to
    > increment the pointer, and then casting back to a (void *).
    > The former needed a char * for some reason, and the latter seems to be
    > unneccesary.
    >
    >
    > > I'm not quite sure what you mean by "means of combination".

    >
    > I suppose I meant combining first-class objects to create new first
    > class objects on the fly. (I think I am using the term first-class
    > correctly.)
    >
    > > If you mean being able to use different aggregate types, you do that
    > > the same way as you'd do it with different basic types - either by
    > > writing one version for every type or by applying some cleverness to
    > > avoid having to do all that work (either by getting a code generator to
    > > do it for you or by avoiding it altogether).
    > >

    >
    > [The following was written in a footnote]
    >
    > > Sometime when you're feeling
    > > masochistic, try implementing fully general coroutines without

    >
    > What is a 'coroutine' please? Also an example and a use might be
    > helpful.
    >
    > > invoking implementation-specific knowledge. But even that *can*
    > > be done if you don't mind explicitly managing your call-stack frames
    > > (and if you do it right you even get first-class continuations out
    > > of the deal).)

    >
    > A 'call-stack frame' must mean a list of function pointers that need to
    > be called FIFO?
    >
    > What is a 'continuation'?
    >
    > Thanks,
    > Matt Smiglarski
    ballpointpenthief, Sep 6, 2006
    #6
  7. ballpointpenthief

    Default User Guest

    ballpointpenthief wrote:

    > Just realised that I'm being very stupid




    Please don't top-post. Your replies belong following or interspersed
    with properly trimmed quotes. See the majority of other posts in the
    newsgroup, or:
    <http://www.caliburn.nl/topposting.html>




    Brian
    Default User, Sep 6, 2006
    #7
  8. In article <>,
    ballpointpenthief <> wrote:
    >
    >Dave Vandervies wrote:



    >void map(void *dst, size_t dst_size, const void *src,
    > size_t src_size, void *ext_var,
    > void (*trans_func)(void *dst, const void *src, void
    >*ext_var),
    > size_t arr_len)
    >{
    > for (size_t i=0; i<arr_len; i++)
    > {
    > trans_func((unsigned char *)dst+(i*dst_size),
    > (const unsigned char *)src+(i*src_size), ext_var);
    > }
    >}


    It's probably worth getting in the habit of declaring local non-void
    pointers for void-pointer arguments that you plan to use instead of
    casting them where you use them, since pointer casts often mean there's
    something sketchy going on. (Though in this case it's worth noting that
    the casts are only doing what the castless conversion would do anyways.)

    This lets you keep the "Pointer cast -> suspicious" neurons active
    while you're using this technique to implement type-agnostic functions.
    If you ever have the misfortune to find yourself working somewhere
    where productivity is measured in lines of code, it also gives you a way
    to pad your numbers while actually *reducing* the cognitive burden on
    both yourself and maintenance programmers (since they get to keep their
    "pointer cast -> suspicious" neurons active too).




    >I struggled a bit with this: Why cast to a (char *)? Why not a char? At
    >first I was doing:
    >
    >trans_func( (void *)((char)dst+(i*dst_size)), ... with a cast to
    >increment the pointer, and then casting back to a (void *).
    >The former needed a char * for some reason, and the latter seems to be
    >unneccesary.


    Remember that in C, "char" is a synonym for "byte" (as well as being
    "holds a character", except when it's too small for that).

    When you put "test_src" (or "test_dst") in the list of arguments to map(),
    the compiler does a few things for you:
    -Converts the array (int[10]) to a pointer to its first argument (int *)
    (This happens any time you use an array name in a value context)
    -Converts the int * (type of the actual value given as an argument) to a
    void *-to-void (type of the function argument)
    void * has a few special properties that make it a generic pointer:
    -Any data pointer type can be converted to a void * (and compilers
    shouldn't complain)
    -A void * can be converted to any data pointer type (and compilers
    shouldn't be complain)
    -For any data type T, a void * that was obtained by converting a T *
    into a void * can be converted back to a T * that will be equivalent
    to the original one
    map() doing its magic ends up depending on all of these, but actually
    passing it to the function is where we're using the first one.
    -Puts that void * wherever map() expects to find it


    So the pointer that map() gets looks like this:
    +-void * pointing here
    v
    ---------------------------------------- <-- the array
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <-- bytes
    ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ <-- ints

    Since void * is a generic data type, the compiler doesn't know where
    to point the result when you try to do pointer arithmetic on it[1].
    Pointer arithmetic moves the pointer by the number of *things it's
    pointing at* you add or subtract, but the internal representation
    typically works in terms of which *byte* the pointer refers to[2].

    But map() knows (because we told it) how many bytes each data object
    takes up, and we know that "char" is exactly one byte in size, so we can
    convert our void * to a char * (either at the beginning of the function
    or every time we use it - this particular conversion is a no-op[3], so
    any self-respecting optimizer will generate the same code either way),
    and do the pointer arithmetic on *that* and get a byte pointer pointing at
    the beginning of the data object we want the callback function to work on:

    +-byte pointer pointing here
    | +-add i*size to get byte pointer pointing here
    v v
    ---------------------------------------- <-- the array
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <-- bytes
    ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ <-- ints

    Once we have that pointer, we can pass it to the callback; we have a
    callback that expects a void * (generic pointer), but since the compiler
    knows that the callback expects a void * it will take care of that
    conversion (since that's why generic pointers exist).


    But note that with all of this manipulation, we're only working with
    pointers and offsets, not with "actual" values. The only reason "char"
    shows up in the code at all is because we know that it has a useful size.

    We can convert a pointer to an integer type (including char), but that
    conversion isn't guaranteed to be reversible (and, for char, it's highly
    unlikely that it will be). So converting the pointer to a char probably
    threw away most of the information that the pointer contained, and then
    converting it back to a pointer gave you a pointer to some random chunk
    of memory that (invoking some knowledge of what kind of implementation
    you're likely to be working with) the program probably wasn't allowed
    to access, so when the callback function tried to use *that* pointer it
    probably crashed.



    [Everything below here is starting to drift off-topic for CLC.
    comp.programming might be a good next stop to get more information.]

    >[The following was written in a footnote]
    >
    >> Sometime when you're feeling
    >> masochistic, try implementing fully general coroutines without

    >
    >What is a 'coroutine' please? Also an example and a use might be
    >helpful.


    Coroutines are a control structure that allows two (or more) independent
    "threads" of program execution to pass control between them. (This is
    NOT what most programmers are referring to when they talk about threads,
    but I can't think of a better term for it.)

    Essentially, a coroutine can call another coroutine as if it were a
    function call, but when a coroutine is called it will resume where it
    left off (as if it called a function that's returning) instead of at
    the beginning (as if it were a function being called).

    Google will happily give you lots of information, some of which is
    probably reliable.
    There's a pretty good description and a not-pathologically-limited
    implementation at
    http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html .



    >> invoking implementation-specific knowledge. But even that *can*
    >> be done if you don't mind explicitly managing your call-stack frames
    >> (and if you do it right you even get first-class continuations out
    >> of the deal).)

    >
    >A 'call-stack frame' must mean a list of function pointers that need to
    >be called FIFO?


    No, it's the list of functions that *have been* called and haven't
    returned yet, and contains things like local variables and where they
    need to return to.
    Most languages have a single FIFO call stack (hence the name "stack"),
    and when a function returns it pops that function's call frame off the
    stack and returns to the next function.
    Coroutines need a separate call stack for each coroutine (since it needs
    to be kept when control passes to another coroutine), and if you allow
    the program to access them and don't require them to be strictly FIFO,
    you end up getting closer to:

    >What is a 'continuation'?


    It describes what happens next. In a lot of languages it's just
    "the next operation gets run", but if you're working with a language
    that lets you capture a continuation at an arbitrary point and return
    to a captured continuation, you can use it to implement all sorts of
    interesting things.
    (It turns out that once you have this capability, you can use it to
    implement any other flow control construct you want. I went and poked
    my sigmonster to get some commentary on it in my .sig quote on this
    post, f'rexample.)

    One of the easier uses to describe is for early-exit: Capture a
    continuation just before you start, and if you get the answer just feed
    it to the continuation. At the continuation-capture point, you need some
    way to tell whether it's just been captured (and you want to start the
    computation) or it's been used for early-exit (and you want to carry on
    with whatever it was you wanted the result for), but that can be easier
    than backing out of, say, a deeply nested recursion one call at a time.


    dave

    [1] Actually, that's not quite correct, but if I didn't wave my hands
    here I'd get rather too far away from the topic at hand.

    [2] That's not required, but the compiler needs to be able to fake it
    (because of, among other things, precisely the sorts of thing we're
    talking about here). It's perfectly valid for most data pointers
    to be word pointers and for byte pointers to contain a word pointer
    and a byte offset in that word; and there have been real machines
    that did this.

    [3] char * and void * are required to have the same representation; no
    other distinct types have that requirement (though it also applies
    to "signed T *" and "unsigned T *" for any T for which signed and
    unsigned are meaningful).

    --
    Dave Vandervies
    You can even use it, if you're in a particularly troublesome mood, to
    implement BASIC's goto statement or INTERCAL's comefrom statement.
    --Bear in comp.lang.scheme
    Dave Vandervies, Sep 8, 2006
    #8
  9. In article <>,
    ballpointpenthief <> wrote:

    > and have just realised why an (int *) doesn't work
    >since writing this update.


    I have no idea what you're talking about here, and I just wrote a reply
    to the post you're commenting on.

    >Sorry for top-posting.


    ....but you did provide a perfect example of why it's a Bad Thing.
    If you'd put the comment quoted above after the section to which it was
    relevant, and trimmed the rest of the quoted material, it would probably
    have made a lot more sense.


    dave

    --
    Dave Vandervies
    You can even use it, if you're in a particularly troublesome mood, to
    implement BASIC's goto statement or INTERCAL's comefrom statement.
    --Bear in comp.lang.scheme
    Dave Vandervies, Sep 8, 2006
    #9
  10. On Sat, 2 Sep 2006 05:06:07 +0000 (UTC),
    (Dave Vandervies) wrote:

    > In article <>,
    > ballpointpenthief <> wrote:
    > >How good is the C language for implementing maps, filters and
    > >accumulators?
    > >
    > >SCHEME programmers would be able to pass procedures as arguments quite
    > >easily, whereas C programmers would - I guess - have use variadic
    > >function pointers which is frankly disgusting, and I'm not even sure I
    > >want to sit down and hack at the language in that way.
    > >
    > >Your thoughts? Any libraries you could refer me to?

    >
    > C programmers can pass pointers to functions around <snip>
    > C doesn't have anonymous functions or closures, and you may miss those,
    > but anonymous functions are "only" a notational convenience (if sometimes
    > a major one!) and closures can be faked, <snip>
    > They don't have to be variadic functions (and trying to do it that way
    > would make it unnecessarily much uglier), <snip>


    Agree so far.

    > Here's some examples to (hopefully) get you started. <snip>
    > /*Note that these have to be written for every type you plan to use them
    > with (and, for ones that can sensibly work with multiple types, for
    > every combination of types).
    > C++ templates may make this easier if you want to work with several
    > different data types, but for a small number of types doing it this
    > way will work fine. (These are simple and specialized enough that
    > you could probably write a code generator to produce C code for the
    > types you want if you don't want to use C++ just to get templates.)


    As long as you use only named (or tagged) types -- and you can name
    any type with typedef -- you can (I believe always) get this level of
    templating with C preprocessor pasting. But this also gets ugly fast.

    OTOH and OT, C++ can fake (or arguably implement) closures more
    elegantly, by packaging the data (or references) with the functions in
    a 'function object' which can be applied with more natural syntax.
    It's 'only sugar', but sugar is still sweet. The C++ Standard library
    even includes some basic, but still useful, examples. And FWIW with
    C++ you can use nearly all of your C knowledge and experience and
    often most or all of your tools, and have a guaranteed Standard
    (portable) binding to actual C if and when you need it.

    > Even with templates the types still need to be statically known at
    > compile time. Implementing true dynamic typing is probably
    > sufficiently ugly that you'd be better off finding an embeddable
    > Scheme interpreter.
    > */
    >

    C++ also provides dynamic typing within a hierarchy that you define --
    which doesn't cover all types, and not necessarily ones supplied or
    needed by thirdparty or library code, but with very little effort it
    can be all types that your own code needs dynamically typed.


    - David.Thompson1 at worldnet.att.net
    Dave Thompson, Sep 10, 2006
    #10
  11. On Fri, 8 Sep 2006 03:09:32 +0000 (UTC),
    (Dave Vandervies) wrote:
    <snip lots about pointers and some other stuff>
    > [3] char * and void * are required to have the same representation; no


    The wording is imprecise but I believe it's actually pointers to all
    flavors of char: plain, signed, and unsigned.

    > other distinct types have that requirement (though it also applies
    > to "signed T *" and "unsigned T *" for any T for which signed and
    > unsigned are meaningful).


    I don't see that. Actual signed and unsigned integer type pairs (not
    pointers) are required to have the same representation for values in
    the shared (intersecting) range, and to have the same size and
    alignment; so in practice there is no valid reason for pointers to
    them to be different, but they aren't required to be the same.

    What ARE required to have the same representation are:

    - pointers to all struct types, and (separately) to all union types.
    This is needed for opaque ones, and forward ones one-pass.

    - pointers to differently-qualified types e.g. const int * and int *.
    Which are formally distinct types in the type system, although
    arguably not 'really' different.

    - David.Thompson1 at worldnet.att.net
    Dave Thompson, Sep 21, 2006
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Pedja

    IE filters and ASP controls

    Pedja, Jan 20, 2004, in forum: ASP .Net
    Replies:
    1
    Views:
    406
    Curt_C [MVP]
    Jan 20, 2004
  2. Simon Elliott
    Replies:
    4
    Views:
    1,155
    Simon Elliott
    Mar 10, 2005
  3. Eugene Van den Bulke

    accumulators

    Eugene Van den Bulke, Jun 12, 2004, in forum: Python
    Replies:
    7
    Views:
    501
    Scott David Daniels
    Jun 25, 2004
  4. Dieter Vanderelst

    using the Filters DLL (image filters)

    Dieter Vanderelst, Feb 15, 2006, in forum: Python
    Replies:
    1
    Views:
    411
    Michele Petrazzo
    Feb 15, 2006
  5. Marcus
    Replies:
    2
    Views:
    586
    Marcus
    Dec 9, 2005
Loading...

Share This Page