extending printf

Discussion in 'C Programming' started by jacob navia, Mar 8, 2014.

  1. jacob navia

    jacob navia Guest

    There are several implementations of printf that allow users to define
    new %X formats.

    gcc and trio_printf propose a new function like

    SetNewPrintfFormat(char format, ...); // arguments are compiler specific

    for instance:
    PrintfFlags pf_flags = FLAG_ALTERNATE;

    PrintfCallback MyFormattingFunction;

    SetNewPrintfFormat('C',MyFormattingFunction);

    Then, you can write:

    printf("Customer: %C\n",customer);

    The problem with this approach is that all functions that check printf
    arguments within the compiler do not know about the %C format and will
    report spurious warnings.

    I am thinking that a

    #pragma printf('C',MyFormattingFunction)

    would be much better in the sense that the compiler would be informed of
    the substitution and can take it into account in its printf checking
    routines.

    What do you think?

    What problems would be involved with this approach?

    Thanks in advance for your input.

    jacob
     
    jacob navia, Mar 8, 2014
    #1

  2. jacob navia

    Stefan Ram Guest

    The real advantage of OOP as in Java is /extensibility/.

    When a type like »com.example.Currency« is added, this type
    can specify its own toString() method for conversions to
    a string. The language dictates that the type name
    »com.example.Currency« is unique! Therefore, »myCurrency.toString()«
    will invoke the com.example.Currency#toString() method,
    no matter how many other toString() methods might be there.

    In the above case, I do not see how the conflict is resolved when
    two separate extensions from two independent vendors of C
    libraries /both/ define the same format »%C«. The
    uppercase English alphabet only has so many letters.
     
    Stefan Ram, Mar 8, 2014
    #2

  3. jacob navia

    BartC Guest

    Would the usual flags and options between % and C also be passed?

    If 'customer' was some kind of integer type, would it be necessary to write
    the requisite number of "l" width modifiers, or is that expected to be
    hard-coded within the spec of the custom function?
    Well, one alternative to this, would be:

    printf("Customer: %s\n",MyFormattingFunction(customer));

    The custom format would keep things shorter if there are lots of 'customer'
    types to be printed. Doing it as a standard function returning a string,
    also brings up the issue of managing the string memory, but it's not clear
    how the custom handler would deal with the same problem.

    But on the question of extensions to printf, something that came up in the
    'Padding involved' thread was the myriad different formats and modifiers
    that are now needed ("%zu" for size_t, or "%lld" for long long int,
    etc.).

    How hard would it be to implement something like "%?", where ? is
    substituted with an appropriate default format matching each argument? (Or
    perhaps %?d where ? is the requisite number of "l" modifiers needed.)
     
    BartC, Mar 8, 2014
    #3
  4. jacob navia

    BartC Guest

    I don't think this is intended as a general solution that can do everything
    Java might do.

    The %A to %Z format specifiers are clearly a limited resource, but that
    doesn't mean they should stay unused. This is a mechanism to make some
    limited use of them. Avoiding conflicts between different users of the same
    format is a separate matter.

    (And I can think of one way to share the same format between different parts
    of a program, but it might require more elaborate compiler support.)
     
    BartC, Mar 8, 2014
    #4
  5. jacob navia

    Eric Sosman Guest

    Here's a technique I've found useful:

    /**
    * Returns a pointer to the start of a "not too temporary" character
    * array of at least the stated size. The array will persist until
    * seven more calls have been made, after which it may be deallocated
    * or overwritten.
    */
    char *
    tempTextBuff(size_t size)
    {
        static struct {          /* a text buffer: */
            char *text;          /* the buffer's start */
            size_t size;         /* the buffer's size */
        } buff[8];
        static int index = 0;    /* index of last buffer used */

        index = (index + 1) & 7;
        if (buff[index].size < size) {
            free(buff[index].text);
            size = (size < 60) ? 60 : size;
            buff[index].text = getmem(size);
            buff[index].size = size;
        }
        return buff[index].text;
    }

    (The call to getmem() is just a "malloc() or die" wrapper,
    and the line before it tries to avoid multiple small allocations.)

    It's far from a "general solution" because somebody might
    try to keep using a pointer after it's gone stale and been
    re-gifted to somebody else: If there are P pointers in the pool
    and P+1 of them get used in one printf() call, you're sunk.
    Still, it seems to work well "in practice."
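
    A minimal sketch of how such a pool is typically used, here with a
    fixed-size static ring instead of the growable buffers above. The names
    Point, ring_slot and point_to_string are hypothetical and exist only for
    this illustration; the slot count and slot size are arbitrary assumptions.

```c
#include <stdio.h>

/* Hypothetical type, used only for this illustration. */
typedef struct { int x, y; } Point;

enum { RING_SLOTS = 8, SLOT_SIZE = 64 };

/* Hand out the next slot of a small static ring. A slot stays valid
   until RING_SLOTS - 1 further calls have been made. */
static char *ring_slot(void)
{
    static char slots[RING_SLOTS][SLOT_SIZE];
    static unsigned next = 0;
    char *slot = slots[next];

    next = (next + 1) % RING_SLOTS;
    return slot;
}

/* Format a Point into a "not too temporary" buffer the caller never frees. */
const char *point_to_string(const Point *p)
{
    char *buf = ring_slot();

    snprintf(buf, SLOT_SIZE, "(%d, %d)", p->x, p->y);
    return buf;
}
```

    With two points a = {1, 2} and b = {3, 4}, a call such as
    printf("A = %s, B = %s\n", point_to_string(&a), point_to_string(&b));
    works because each call gets its own slot. It would break only if more
    than RING_SLOTS results were needed at the same time.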
     
    Eric Sosman, Mar 8, 2014
    #5
  6. The joy of undefined behaviour! The standard allows for a very wide
    range of extensions. I can't see why "%Ua", "%Ub" and so on can't be
    used, nor, for that matter, "%{currency}" or even
    "%{matrix;4x3;float;5.3f}".
     
    Ben Bacarisse, Mar 8, 2014
    #6
  7. jacob navia

    BartC Guest

    I stopped short of suggesting that multi-letter extensions be used (because
    they start to become cryptic, they start to make more demands on
    (programmer) memory, and don't really resolve the issue of conflicts, just
    delay them).
    But I didn't realise you could go that far.

    Although if the specifier gets too long, the advantages of having it in the
    format string, instead of as a function operating on an argument (which will
    work anywhere), become less.
     
    BartC, Mar 8, 2014
    #7
  8. jacob navia

    BartC Guest

    I've used a pool of three strings (static, fixed allocation), for calls to
    external dynamic or OS functions.

    There are only three because, so far, the maximum number of char* arguments
    to such a function is three (this is where counted strings have to be
    converted to zero-terminated ones).

    I'm just hoping none of those functions makes a call-back to my program that
    requires another such call before the first one returns. Maybe I should have
    a pool of six strings instead...
     
    BartC, Mar 8, 2014
    #8
  9. jacob navia

    Melzzzzz Guest

    Custom handler could take FILE* as parameter and print directly.
    Same for snprintf, custom handler could take char* instead,
    so there would be two registered functions.
    Or you can combine formatting function with eg %S registered function
    that will print and free string.
    Problem is that all of this would be compiler specific so one can't use
    it universally.
     
    Melzzzzz, Mar 8, 2014
    #9
  10. jacob navia

    Stefan Ram Guest

    A general solution is dynamic memory. It must be
    communicated clearly to the programmer that he is obliged to
    free the memory. Therefore, it might be least surprising
    to follow the lead of the standard library and name the
    functions starting with »malloc_«. For example,

    if( s = malloc_sprintf( "%d\n", 2 )){ emit( s ); free( s ); }

    or

    if( s = malloc_sprintf( "%d\n", 2 ))emit_free( s );

    , where »_free« communicates that this function deallocates.

    malloc_sprintf can first call »vsnprintf( 0, 0,« to
    determine the size of the buffer needed and only then
    »vsprintf« to actually print into the buffer.
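
    The two-pass idea can be sketched as follows, assuming a C99 library.
    This is one possible shape for malloc_sprintf, not a reference
    implementation; it uses the bounded vsnprintf for the second pass as
    well, which is a small deviation from the description above.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into freshly allocated memory. The caller must free() the
   result. Returns NULL if formatting or allocation fails. */
char *malloc_sprintf(const char *fmt, ...)
{
    va_list ap, ap2;
    char *s = NULL;
    int len;

    va_start(ap, fmt);
    va_copy(ap2, ap);                      /* the list is walked twice */

    len = vsnprintf(NULL, 0, fmt, ap);     /* pass 1: measure only */
    va_end(ap);

    if (len >= 0 && (s = malloc((size_t)len + 1)) != NULL)
        vsnprintf(s, (size_t)len + 1, fmt, ap2);   /* pass 2: write */

    va_end(ap2);
    return s;
}
```

    The va_copy is needed because a va_list may not be reused after
    vsnprintf has consumed it.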
     
    Stefan Ram, Mar 8, 2014
    #10
  11. jacob navia

    Eric Sosman Guest

    The problem under discussion is not primarily about managing
    the memory, but about adding custom formatting to printf() et al.
    My suggestion of using a "not too temporary" buffer was aimed at
    a usage like

    printf("A = %s, B = %s\n",
    thingToString(aThing), thingToString(bThing));

    .... which your approach doesn't seem to handle well.

    Another advantage of the "not too temporary" buffer is that
    since the caller doesn't manage the memory, the formatter may
    choose to use or not use dynamic memory as circumstances dictate.
    Elsewhere in the program I snipped my code sample from is a
    function `const char *htmlEscape(const char *string)', which
    does what its name suggests: It returns a pointer to a string in
    which HTML meta-characters have been replaced by their escape
    sequences. In the case where the original string has no characters
    that need escaping it just returns the original string; otherwise,
    it builds the replacement in a "not too temporary" buffer and
    returns that, instead. The caller needn't care which pointer is
    returned, and needn't worry about managing it.
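
    A rough sketch of what such a function might look like. Only the
    signature comes from the description above; the body, the set of escaped
    characters, and the helper temp_buf (a four-slot stand-in for the pooled
    buffer) are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the pooled buffer: a ring of four heap
   blocks, each reused (and regrown if needed) on every fourth call. */
static char *temp_buf(size_t size)
{
    static char *slot[4];
    static size_t cap[4];
    static unsigned i = 0;

    i = (i + 1) % 4;
    if (cap[i] < size) {
        char *p = realloc(slot[i], size);
        if (p == NULL) {
            perror("realloc");
            exit(EXIT_FAILURE);
        }
        slot[i] = p;
        cap[i] = size;
    }
    return slot[i];
}

/* Return the input itself when nothing needs escaping, otherwise an
   escaped copy in a pooled buffer. The caller frees neither. */
const char *htmlEscape(const char *string)
{
    char *out, *w;

    if (strpbrk(string, "&<>\"") == NULL)
        return string;

    /* Worst case: every character becomes "&quot;" (6 bytes). */
    out = w = temp_buf(6 * strlen(string) + 1);
    for (; *string; string++) {
        switch (*string) {
        case '&': w += sprintf(w, "&amp;");  break;
        case '<': w += sprintf(w, "&lt;");   break;
        case '>': w += sprintf(w, "&gt;");   break;
        case '"': w += sprintf(w, "&quot;"); break;
        default:  *w++ = *string;            break;
        }
    }
    *w = '\0';
    return out;
}
```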

    It's not a fully general solution to this class of problem,
    as I wrote earlier. Still, I've found it remarkably helpful.
     
    Eric Sosman, Mar 8, 2014
    #11
  12. This sort of thing only works as long as only one library is involved.
    As soon as you have two, you get conflicts because one person sets %Z to
    mean one thing, another takes the same to mean another.
    Also, you don't really want to mess with something as fundamental as printf().
    Imagine trying to write debug code where you're not sure how printf() will
    behave because there are hooks into it which could do anything.

    However it's easy enough to write an xprintf() family of functions.
    Then it's simply a case of providing xformat(char *fmt, formatfunction fun);
    There are some rather tedious questions about formatfunction. For efficiency
    it needs to take a char *outputbuffer rather than a FILE *, so it needs to be
    passed a buffer size. Then you've got to decide on the behaviour if the buffer
    is too small. Remember the format function is user code, so you can't impose too
    much fiddliness on it.
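
    One possible shape for this, as a sketch only. It assumes that handlers
    are registered per uppercase letter, that letters are contiguous in the
    character set (true for ASCII), and that a handler reports the length it
    wanted to write, as snprintf does, so truncation can be detected. The
    signatures, xsnprintf, Point and format_point are all hypothetical, and
    this toy copies everything except registered %A..%Z verbatim rather than
    implementing the full printf language.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* A handler pulls its own argument(s) from *ap, writes at most `size`
   bytes (terminator included) to `out`, and returns the untruncated
   length, like snprintf. `out` may be NULL when `size` is 0. */
typedef int (*formatfunction)(char *out, size_t size, va_list *ap);

static formatfunction handlers[26];        /* one slot per 'A'..'Z' */

int xformat(char spec, formatfunction fun)
{
    if (spec < 'A' || spec > 'Z')
        return -1;
    handlers[spec - 'A'] = fun;
    return 0;
}

/* Returns the length the complete result needs, excluding the '\0'. */
int xsnprintf(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t used = 0;

    va_start(ap, fmt);
    for (; *fmt; fmt++) {
        char *dst = used < size ? out + used : NULL;
        size_t room = used < size ? size - used : 0;

        if (fmt[0] == '%' && fmt[1] >= 'A' && fmt[1] <= 'Z'
            && handlers[fmt[1] - 'A'] != NULL) {
            int n = handlers[fmt[1] - 'A'](dst, room, &ap);
            if (n < 0) {
                va_end(ap);
                return -1;
            }
            used += (size_t)n;
            fmt++;                         /* skip the specifier letter */
        } else {
            if (room > 1)                  /* keep one byte for '\0' */
                *dst = *fmt;
            used++;
        }
    }
    va_end(ap);
    if (size > 0)
        out[used < size ? used : size - 1] = '\0';
    return (int)used;
}

/* Example handler for a hypothetical Point type, passed by pointer. */
typedef struct { int x, y; } Point;

static int format_point(char *out, size_t size, va_list *ap)
{
    const Point *p = va_arg(*ap, Point *);
    return snprintf(out, size, "(%d, %d)", p->x, p->y);
}
```

    After xformat('P', format_point), a call such as
    xsnprintf(buf, sizeof buf, "at %P!", &p) hands the pointer to the
    handler. Returning the wanted length leaves the "buffer too small"
    decision with the caller, which keeps the handler itself simple.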

    You've also got to consider how many printf() arguments the format will take.
    Things like printf("%.*g", DBL_DIG, x) are useful, even indispensable in
    situations where you mustn't lose a single bit of accuracy and must be
    portable.

    You also want to be utf8-transparent. That solves the "not enough format
    specifiers" problem, because you can grab Greek or Chinese characters for
    exotic formatting.


    However a more general solution is to have a "to string" mechanism.
     
    Malcolm McLean, Mar 8, 2014
    #12
  13. jacob navia

    Ian Collins Guest

    I'm surprised no one has picked up on this; it looks like a good
    solution. It would certainly save having to remember the correct
    specifiers for built-in types!

    Specifiers for user defined types could be added with an appropriate
    pragma, something like:

    typedef struct
    {
    int x,y;
    } Point;

    void printPoint( FILE* fp, const Point* p );

    #pragma specifier (Point*,printPoint)

    ....

    Point point = {1,2};

    printf( "%?\n", &point );
     
    Ian Collins, Mar 8, 2014
    #13
  14. jacob navia

    jacob navia Guest

    On 08/03/2014 20:22, Ian Collins wrote:
    !!!!

    THAT LOOKS VERY INTERESTING!

    But the devil is in the details, since, for instance:

    #pragma specifier(int, printint)
    #pragma specifier(long,printlong)


    printf("%?\n",8);

    which one should be called?

    It is better to restrict that specifier to user defined types ONLY.
    They would ALL receive a pointer to their type. The function would
    need to be already declared.

    VERY good idea Ian
     
    jacob navia, Mar 8, 2014
    #14
  15. jacob navia

    Ian Collins Guest

    printint, 8 is an int.

    This is no different from the overloading rules in C++, for example:

    #include <stdio.h>

    void f( int n ) { puts("int"); }
    void f( long n ) { puts("long"); }

    int main()
    {
        f(8L);
        f(8);
    }

    is unambiguous in C++.

    I don't think so, given the above. If the compiler can check argument
    types for printf specifiers, it could equally well select the specifier
    to match the argument type.

    That's the idea.

    I'll frame that!
     
    Ian Collins, Mar 8, 2014
    #15
  16. jacob navia

    BartC Guest

    Wouldn't a freestanding 8 constant be of type int? Or does it always need a
    context to help determine the type? And an 8L might be of type long.

    However one problem with the "?" idea is that it can only map to one default
    format specifier (so any signed int type might use "%d", "%ld" or "%lld").

    Sometimes a choice is also needed between, say, "%d", "%x", "%X", and "%c"
    (and why not throw in a binary format too). In this case a way is needed to
    specify the alternate format while still being immune from having to
    maintain the right number of "l" width modifiers.
     
    BartC, Mar 8, 2014
    #16
  17. Yes, the literal 8 is always of type int, regardless of the context in
    which it appears. 8L is always of type long, and 8LL is always of type
    long long.

    The rules are N1570 section 6.4.4.1, "Integer constants".
     
    Keith Thompson, Mar 8, 2014
    #17
  18. jacob navia

    Ian Collins Guest

    I would reverse that argument and say the benefit of the "?" idea is the
    compiler can deduce the correct specifier for the parameter type. This
    is similar to writing

    T* p = malloc( sizeof *p );

    rather than

    T* p = malloc( sizeof(T) );
    The current specifiers could still be used, "?" could be a shorthand for
    the default specifier.
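
    Something close to a default specifier can be approximated today in
    standard C11 with _Generic, without any compiler extension. The sketch
    below is only an illustration of type-driven selection: DEFAULT_FMT and
    PRINT_DEFAULT are hypothetical names, the choice of defaults is an
    assumption, and types that are not listed (short, float, pointers other
    than char *) are rejected at compile time.

```c
#include <stdio.h>

/* Hypothetical helper: pick a default conversion from the static type
   of the expression. Requires a C11 compiler. */
#define DEFAULT_FMT(x) _Generic((x),        \
    char:               "%c",               \
    int:                "%d",               \
    unsigned int:       "%u",               \
    long:               "%ld",              \
    unsigned long:      "%lu",              \
    long long:          "%lld",             \
    unsigned long long: "%llu",             \
    double:             "%g",               \
    char *:             "%s",               \
    const char *:       "%s")

/* Print one value with its default conversion, followed by a newline. */
#define PRINT_DEFAULT(x) (printf(DEFAULT_FMT(x), (x)), putchar('\n'))
```

    PRINT_DEFAULT(8) selects "%d" and PRINT_DEFAULT(8L) selects "%ld". A
    size_t argument selects whichever unsigned entry size_t happens to be
    on the platform, so the caller no longer counts "l" modifiers.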
     
    Ian Collins, Mar 8, 2014
    #18
  19. jacob navia

    jacob navia Guest

    On 08/03/2014 22:00, Dr Nick wrote:

    I already use the second form. Of course I have a structure with a field
    that is a function pointer to a putchar-like function that puts a
    character into a file or into a string, so that the same code implements
    sprintf AND printf AND fprintf!

    One of the parameters of the callback will be a function pointer to the
    current output stream, so that the same code works for fprintf AND for
    sprintf!
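
    A minimal sketch of that kind of structure, under the assumption that a
    single putchar-like callback is enough. Sink, file_sink, string_sink,
    Point and format_point are hypothetical names for illustration and do
    not describe any particular compiler's library.

```c
#include <stddef.h>
#include <stdio.h>

/* Output abstraction: one putchar-like callback plus the state it needs. */
typedef struct Sink Sink;
struct Sink {
    int (*put)(Sink *self, int ch);  /* returns ch, or EOF on failure */
    FILE *fp;                        /* state for the file sink */
    char *buf;                       /* state for the string sink */
    size_t cap, len;
};

static int put_file(Sink *s, int ch)
{
    return fputc(ch, s->fp);
}

static int put_string(Sink *s, int ch)
{
    if (s->len + 1 >= s->cap)
        return EOF;                  /* always leave room for '\0' */
    s->buf[s->len++] = (char)ch;
    s->buf[s->len] = '\0';
    return ch;
}

Sink file_sink(FILE *fp)
{
    Sink s = { put_file, fp, NULL, 0, 0 };
    return s;
}

Sink string_sink(char *buf, size_t cap)
{
    Sink s = { put_string, NULL, buf, cap, 0 };
    if (cap > 0)
        buf[0] = '\0';
    return s;
}

static void put_text(Sink *s, const char *text)
{
    while (*text)
        s->put(s, (unsigned char)*text++);
}

/* A custom formatter written once, usable with either kind of sink. */
typedef struct { int x, y; } Point;

void format_point(Sink *s, const Point *p)
{
    char num[32];

    put_text(s, "(");
    snprintf(num, sizeof num, "%d", p->x);
    put_text(s, num);
    put_text(s, ", ");
    snprintf(num, sizeof num, "%d", p->y);
    put_text(s, num);
    put_text(s, ")");
}
```

    The formatter never knows whether it is writing to stdout or to a
    character array; only the Sink passed in differs.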
     
    jacob navia, Mar 8, 2014
    #19
  20. I think your proposal is just a little off. You need, in my opinion,
    %?d. The compiler can then determine the right length modifier, but
    surely you know you want a signed decimal conversion? %?x does hex
    conversion for an unsigned type of whatever length. For example, when
    the argument is a size_t, %?x would generate %zx. You could, as a final
    touch, allow %?? which would fill in the length *and* decide on one of
    'd', 'u' or 'g' for the conversion specifier based on the type.

    <snip>
     
    Ben Bacarisse, Mar 9, 2014
    #20
