Extending printf

jacob navia

There are several implementations of printf that allow users to define
new %X formats.

gcc and trio_printf propose a new function like

SetNewPrintfFormat(char format, ...); // arguments are compiler specific

for instance:
PrintfFlags pf_flags = FLAG_ALTERNATE;

PrintfCallback MyFormattingFunction;

SetNewPrintfFormat('C',MyFormattingFunction);

Then, you can write:

printf("Customer: %C\n",customer);

The problem with this approach is that all functions that check printf
arguments within the compiler do not know about the %C format and will
report spurious warnings.

I am thinking that a

#pragma printf('C',MyFormattingFunction)

would be much better in the sense that the compiler would be informed of
the substitution and can take it into account in its printf checking
routines.

What do you think?

What problems would be involved with this approach?

Thanks in advance for your input.

jacob
 
Stefan Ram

jacob navia said:
SetNewPrintfFormat(char format, ...); // arguments are compiler specific

The real advantage of OOP as in Java is /extensibility/.

When a type like »com.example.Currency« is added, this type
can specify its own toString() method for conversions to
a string. The language dictates that the type name
»com.example.Currency« is unique! Therefore, »myCurrency.toString()«
will invoke the com.example.Currency#toString() method,
no matter how many other toString() methods might be there.

In the above case, I do not see how it is resolved that two
separate extensions from two independent manufacturers of C
libraries /both/ will define the same format »%C«. The
uppercase English alphabet only has so many letters.
 
BartC

jacob navia said:
There are several implementations of printf that allow users to define new
%X formats.

gcc and trio_printf propose a new function like

SetNewPrintfFormat(char format, ...); // arguments are compiler specific

for instance:
PrintfFlags pf_flags = FLAG_ALTERNATE;

PrintfCallback MyFormattingFunction;

SetNewPrintfFormat('C',MyFormattingFunction);

Then, you can write:

printf("Customer: %C\n",customer);

Would the usual flags and options between % and C also be passed?

If 'customer' was some kind of integer type, would it be necessary to write
the requisite number of "l" width modifiers, or is that expected to be
hard-coded within the spec of the custom function?

What do you think?

What problems would be involved with this approach?

Well, one alternative to this, would be:

printf("Customer: %s\n",MyFormattingFunction(customer));

The custom format would keep things shorter if there are lots of 'customer'
types to be printed. Doing it as a standard function returning a string,
also brings up the issue of managing the string memory, but it's not clear
how the custom handler would deal with the same problem.
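One way to sidestep the memory question, sketched here under assumptions, is to let the caller supply the buffer. The Customer type and customer_to_string() below are hypothetical; the thread never says what 'customer' actually is.

```c
#include <stdio.h>

/* Hypothetical record type, invented for illustration only. */
typedef struct {
    int id;
    const char *name;
} Customer;

/* Formats *c into buf (capacity n, which must be at least 1) and returns
   buf, so the call can sit directly in a printf argument list.
   The text is truncated if the buffer is too small. */
const char *customer_to_string(const Customer *c, char *buf, size_t n)
{
    snprintf(buf, n, "#%d %s", c->id, c->name);
    return buf;
}

/* Usage: each argument gets its own buffer, so nothing is overwritten
   while printf is still reading the earlier result. */
void print_two(const Customer *a, const Customer *b)
{
    char sa[64], sb[64];
    printf("Customers: %s and %s\n",
           customer_to_string(a, sa, sizeof sa),
           customer_to_string(b, sb, sizeof sb));
}
```

The cost is one extra local array per argument; the benefit is that no allocation or cleanup is needed.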

But on the question of extensions to printf, something that came up in the
'Padding involved' thread was the myriad different formats and modifiers
that are now needed ("%zu" for size_t, or "%lld" for long long int,
etc.).

How hard would it be to implement something like "%?", where ? is
substituted with an appropriate default format matching each argument? (Or
perhaps %?d where ? is the requisite number of "l" modifiers needed.)
 
BartC

Stefan Ram said:
The real advantage of OOP as in Java is /extensibility/.

When a type like »com.example.Currency« is added, this type
can specify its own toString() method for conversions to
a string. The language dictates that the type name
»com.example.Currency« is unique! Therefore, »myCurrency.toString()«
will invoke the com.example.Currency#toString() method,
no matter how many other toString() methods might be there.

In the above case, I do not see how it is resolved that two
separate extensions from two independent manufacturers of C
libraries /both/ will define the same format »%C«. The
uppercase English alphabet only has so many letters.

I don't think this is intended as a general solution that can do everything
Java might do.

The %A to %Z format specifiers are clearly a limited resource, but that
doesn't mean they should stay unused. This is a mechanism to make some
limited use of them. Avoiding conflicts between different users of the same
format is a separate matter.

(And I can think of one way to share the same format between different parts
of a program, but it might require more elaborate compiler support.)
 
Eric Sosman

[...]
Well, one alternative to this, would be:

printf("Customer: %s\n",MyFormattingFunction(customer));

The custom format would keep things shorter if there are lots of 'customer'
types to be printed. Doing it as a standard function returning a string,
also brings up the issue of managing the string memory, but it's not clear
how the custom handler would deal with the same problem.

Here's a technique I've found useful:

/**
 * Returns a pointer to the start of a "not too temporary" character
 * array of at least the stated size. The array will persist until
 * seven more calls have been made, after which it may be deallocated
 * or overwritten.
 */
char *
tempTextBuff(size_t size)
{
    static struct {         /* a text buffer: */
        char *text;         /* the buffer's start */
        size_t size;        /* the buffer's size */
    } buff[8];
    static int index = 0;   /* index of last buffer used */

    index = (index + 1) & 7;
    if (buff[index].size < size) {
        free(buff[index].text);
        size = (size < 60) ? 60 : size;
        buff[index].text = getmem(size);
        buff[index].size = size;
    }
    return buff[index].text;
}

(The call to getmem() is just a "malloc() or die" wrapper,
and the line before it tries to avoid multiple small allocations.)
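The post does not show getmem() itself. A minimal sketch of what such a "malloc() or die" wrapper might look like follows; the error message and the handling of a zero size are assumptions, not the original code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed shape of a "malloc() or die" wrapper: it either returns usable
   memory or terminates the program, so callers never check for NULL. */
void *getmem(size_t size)
{
    /* malloc(0) may legitimately return NULL, so ask for at least 1 byte. */
    void *p = malloc(size ? size : 1);
    if (p == NULL) {
        fprintf(stderr, "getmem: out of memory (%zu bytes)\n", size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```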

It's far from a "general solution" because somebody might
try to keep using a pointer after it's gone stale and been
re-gifted to somebody else: If there are P pointers in the pool
and P+1 of them get used in one printf() call, you're sunk.
Still, it seems to work well "in practice."
 
Ben Bacarisse

BartC said:
I don't think this is intended as a general solution that can do everything
Java might do.

The %A to %Z format specifiers are clearly a limited resource,

The joy of undefined behaviour! The standard allows for a very wide
range of extensions. I can't see why "%Ua", "%Ub" and so on can't be
used, nor, for that matter, "%{currency}" or even
"%{matrix;4x3;float;5.3f}".
 
BartC

Ben Bacarisse said:
The joy of undefined behaviour! The standard allows for a very wide
range of extensions. I can't see why "%Ua", "%Ub" and so on can't be
used,

I stopped short of suggesting that multi-letter extensions be used (because
they start to become cryptic, they start to make more demands on
(programmer) memory, and don't really resolve the issue of conflicts, just
delay them).

nor, for that matter, "%{currency}" or even
"%{matrix;4x3;float;5.3f}".

But I didn't realise you could go that far.

Although if the specifier gets too long, the advantages of having it in the
format string, instead of as a function operating on an argument (which will
work anywhere), become less.
 
BartC

Eric Sosman said:
[...]
Well, one alternative to this, would be:

printf("Customer: %s\n",MyFormattingFunction(customer));

The custom format would keep things shorter if there are lots of 'customer'
types to be printed. Doing it as a standard function returning a string,
also brings up the issue of managing the string memory, but it's not clear
how the custom handler would deal with the same problem.

Here's a technique I've found useful:
static struct { /* a text buffer: */
char *text; /* the buffer's start */
size_t size; /* the buffer's size */
} buff[8];
static int index = 0; /* index of last buffer used */
It's far from a "general solution" because somebody might
try to keep using a pointer after it's gone stale and been
re-gifted to somebody else: If there are P pointers in the pool
and P+1 of them get used in one printf() call, you're sunk.
Still, it seems to work well "in practice."

I've used a pool of three strings (static, fixed allocation), for calls to
external dynamic or OS functions.

There are only three because, so far, the maximum number of char* arguments
to such a function is three (this is where counted strings have to be
converted to zero-terminated ones).

I'm just hoping none of those functions does a call-back to my program that
requires another such call before the first one returns. Maybe I should have
a pool of six strings instead...
 
Melzzzzz

Would the usual flags and options between % and C also be passed?

If 'customer' was some kind of integer type, would it be necessary to
write the requisite number of "l" width modifiers, or is that
expected to be hard-coded within the spec of the custom function?


Well, one alternative to this, would be:

printf("Customer: %s\n",MyFormattingFunction(customer));

The custom format would keep things shorter if there are lots of
'customer' types to be printed. Doing it as a standard function
returning a string, also brings up the issue of managing the string
memory, but it's not clear how the custom handler would deal with the
same problem.

A custom handler could take a FILE* parameter and print directly.
Likewise for snprintf, the handler could take a char* instead,
so there would be two registered functions.
Or you could combine a formatting function with, e.g., a registered %S
function that prints the string and then frees it.
The problem is that all of this would be compiler-specific, so it can't be
used universally.
 
Stefan Ram

BartC said:
I've used a pool of three strings (static, fixed allocation), for calls to
external dynamic or OS functions.

A general solution is dynamic memory. It must be
communicated clearly to the programmer that he is obliged to
free the memory. Therefore, it might be most non-surprising
to follow the lead of the standard library and name the
functions starting with »malloc_«. For example,

if( s = malloc_sprintf( "%d\n", 2 )){ emit( s ); free( s ); }

or

if( s = malloc_sprintf( "%d\n", 2 ))emit_free( s );

, where »_free« communicates that this function deallocates.

malloc_sprintf can first call »vsnprintf( 0, 0,« to
determine the size of the buffer needed and only then
»vsprintf« to actually print into the buffer.
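The two-pass idea can be sketched as below. The name malloc_sprintf comes from the post; the body is only one possible implementation. It uses vsnprintf for the second pass as well (rather than vsprintf) so that the write stays bounded by the size just measured.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a malloc()ed string holding the formatted text, or NULL on
   failure. The caller is obliged to free() the result. */
char *malloc_sprintf(const char *format, ...)
{
    va_list ap, ap2;
    va_start(ap, format);
    va_copy(ap2, ap);            /* a va_list cannot be traversed twice */

    /* Pass 1: a null buffer of size 0 makes vsnprintf measure only. */
    int len = vsnprintf(NULL, 0, format, ap);
    va_end(ap);

    char *s = NULL;
    if (len >= 0 && (s = malloc((size_t)len + 1)) != NULL) {
        /* Pass 2: write into a buffer of exactly the measured size. */
        vsnprintf(s, (size_t)len + 1, format, ap2);
    }
    va_end(ap2);
    return s;
}
```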
 
Eric Sosman

A general solution is dynamic memory. It must be
communicated clearly to the programmer that he is obliged to
free the memory. Therefore, it might be most non-surprising
to follow the lead of the standard library and name the
functions starting with »malloc_«. For example,

if( s = malloc_sprintf( "%d\n", 2 )){ emit( s ); free( s ); }

or

if( s = malloc_sprintf( "%d\n", 2 ))emit_free( s );

, where »_free« communicates that this function deallocates.

malloc_sprintf can first call »vsnprintf( 0, 0,« to
determine the size of the buffer needed and only then
»vsprintf« to actually print into the buffer.

The problem under discussion is not primarily about managing
the memory, but about adding custom formatting to printf() et al.
My suggestion of using a "not too temporary" buffer was aimed at
a usage like

printf("A = %s, B = %s\n",
thingToString(aThing), thingToString(bThing));

... which your approach doesn't seem to handle well.

Another advantage of the "not too temporary" buffer is that
since the caller doesn't manage the memory, the formatter may
choose to use or not use dynamic memory as circumstances dictate.
Elsewhere in the program I snipped my code sample from is a
function `const char *htmlEscape(const char *string)', which
does what its name suggests: It returns a pointer to a string in
which HTML meta-characters have been replaced by their escape
sequences. In the case where the original string has no characters
that need escaping it just returns the original string; otherwise,
it builds the replacement in a "not too temporary" buffer and
returns that, instead. The caller needn't care which pointer is
returned, and needn't worry about managing it.

It's not a fully general solution to this class of problem,
as I wrote earlier. Still, I've found it remarkably helpful.
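The htmlEscape() behaviour described above might look roughly like the sketch below. Only the function's signature comes from the post; temp_buf() is a simplified stand-in for the "not too temporary" pool, and the set of escaped characters is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the "not too temporary" pool: eight slots reused in turn. */
static char *temp_buf(size_t size)
{
    static char *slot[8];
    static size_t cap[8];
    static unsigned next = 0;

    next = (next + 1) & 7;
    if (cap[next] < size) {
        free(slot[next]);
        slot[next] = malloc(size);
        if (slot[next] == NULL) {
            fputs("out of memory\n", stderr);
            exit(EXIT_FAILURE);
        }
        cap[next] = size;
    }
    return slot[next];
}

/* Replacement for an HTML meta-character, or NULL if none is needed. */
static const char *escape_for(char c)
{
    switch (c) {
    case '&': return "&amp;";
    case '<': return "&lt;";
    case '>': return "&gt;";
    case '"': return "&quot;";
    default:  return NULL;
    }
}

const char *htmlEscape(const char *string)
{
    size_t need = 1;                 /* room for the terminating '\0' */
    int changed = 0;

    for (const char *p = string; *p; p++) {
        const char *e = escape_for(*p);
        need += e ? strlen(e) : 1;
        changed |= (e != NULL);
    }
    if (!changed)
        return string;               /* nothing to escape: return the input */

    char *out = temp_buf(need);
    char *w = out;
    for (const char *p = string; *p; p++) {
        const char *e = escape_for(*p);
        if (e) {
            size_t n = strlen(e);
            memcpy(w, e, n);
            w += n;
        } else {
            *w++ = *p;
        }
    }
    *w = '\0';
    return out;
}
```

Either way the caller just uses the returned pointer, for example printf("%s\n", htmlEscape(title)), and never frees it.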
 
Malcolm McLean

gcc and trio_printf propose a new function like

SetNewPrintfFormat(char format, ...); // arguments are compiler specific

What problems would be involved with this approach?

This sort of thing only works as long as only one library is involved.
As soon as you have two, you get conflicts because one person sets %Z to
mean one thing, another takes the same to mean another.
Also, you don't really want to mess with something as fundamental as printf().
Imagine trying to write debug code where you're not sure how printf() will
behave because there are hooks into it which could do anything.

However it's easy enough to write an xprintf() family of functions.
Then it's simply a case of providing xformat(char *fmt, formatfunction fun);
There are some rather tedious questions about formatfunction. For efficiency
reasons it needs to take a char *outputbuffer rather than a FILE *, so it needs
to be passed a buffer size. Then you've got to decide on the behaviour if the
buffer is too small. Remember the format function is user code, so you can't
impose too much fiddliness on it.
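A rough sketch of such an xprintf-style family is given below. Everything here is an assumption made for illustration: xformat() takes a single conversion letter rather than a string, the handler receives a pointer to the va_list so it can fetch its own arguments, and truncation follows the snprintf convention of returning the length the full output would need.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical handler type: write at most outsize bytes (including the
   '\0') to out, take arguments from *ap, and return the length the full
   text needs, as snprintf does. out may be NULL when outsize is 0. */
typedef int (*formatfunction)(char *out, size_t outsize, va_list *ap);

static formatfunction handlers[256];

void xformat(char spec, formatfunction fun)
{
    handlers[(unsigned char)spec] = fun;
}

/* Minimal xsnprintf: understands %%, %d, %s and registered letters only. */
int xsnprintf(char *out, size_t outsize, const char *fmt, ...)
{
    size_t used = 0;                 /* length the full output would have */
    va_list ap;

    if (outsize)
        out[0] = '\0';
    va_start(ap, fmt);
    for (; *fmt; fmt++) {
        char *dst = used < outsize ? out + used : NULL;
        size_t room = used < outsize ? outsize - used : 0;
        int n;

        if (*fmt != '%') {
            n = snprintf(dst, room, "%c", *fmt);
        } else if (*++fmt == '\0') {
            break;                   /* trailing lone '%': stop quietly */
        } else if (*fmt == '%') {
            n = snprintf(dst, room, "%%");
        } else if (*fmt == 'd') {
            n = snprintf(dst, room, "%d", va_arg(ap, int));
        } else if (*fmt == 's') {
            n = snprintf(dst, room, "%s", va_arg(ap, const char *));
        } else if (handlers[(unsigned char)*fmt]) {
            n = handlers[(unsigned char)*fmt](dst, room, &ap);
        } else {
            n = -1;                  /* unknown conversion */
        }
        if (n < 0) {
            va_end(ap);
            return -1;
        }
        used += (size_t)n;
    }
    va_end(ap);
    return (int)used;
}

/* Example user type and its handler, registered as %P below. */
typedef struct { int x, y; } Point;

int format_point(char *out, size_t outsize, va_list *ap)
{
    const Point *p = va_arg(*ap, const Point *);
    return snprintf(out, outsize, "(%d,%d)", p->x, p->y);
}
```

After xformat('P', format_point), a call such as xsnprintf(buf, sizeof buf, "p=%P", &pt) formats the point. Because printf itself is left untouched, the debugging concern raised above does not arise.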

You've also got to consider how many printf() arguments the format will take.
Things like printf("%.*g", DBL_DIG, x) are useful, even indispensable in
situations where you mustn't lose a single bit of accuracy and must be
portable.
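One caveat worth checking for the no-lost-bits case: DBL_DIG counts the decimal digits that survive a text to double to text trip, whereas a double to text to double round trip needs DBL_DECIMAL_DIG digits (added in C11, 17 for IEEE 754 binary64). The helper round_trips() below is a hypothetical test function, not part of any library.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef DBL_DECIMAL_DIG
#define DBL_DECIMAL_DIG 17   /* assumed value for IEEE 754 binary64 */
#endif

/* Returns 1 if printing x with 'digits' significant digits and parsing
   the text back yields exactly x, 0 otherwise. */
int round_trips(double x, int digits)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strtod(buf, NULL) == x;
}
```

With IEEE 754 doubles, round_trips(1.0 / 3.0, DBL_DIG) fails while round_trips(1.0 / 3.0, DBL_DECIMAL_DIG) succeeds, which still illustrates the point being made: a custom format mechanism has to cope with conversions that consume more than one argument.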

You also want to be utf8-transparent. That solves the "not enough format
specifiers" problem, because you can grab Greek or Chinese characters for
exotic formatting.


However a more general solution is to have a "to string" mechanism.
 
Ian Collins

BartC said:
How hard would it be to implement something like "%?", where ? is
substituted with an appropriate default format matching each argument? (Or
perhaps %?d where ? is the requisite number of "l" modifiers needed.)

I'm surprised no one has picked up on this; it looks like a good
solution. It would certainly save having to remember the correct
specifiers for built-in types!

Specifiers for user defined types could be added with an appropriate
pragma, something like:

typedef struct
{
int x,y;
} Point;

void printPoint( FILE* fp, const Point* p );

#pragma specifier (Point*,printPoint)

....

Point point = {1,2};

printf( "%?\n", &point );
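The pragma is hypothetical, but C11's _Generic can already pick a printer from the static type of an argument, which gives a feel for how "%?" could be resolved at compile time. In this sketch printInt, printLong and the print_any macro are invented names; only Point and printPoint come from the post.

```c
#include <stdio.h>

typedef struct {
    int x, y;
} Point;

void printPoint(FILE *fp, const Point *p)
{
    fprintf(fp, "(%d,%d)", p->x, p->y);
}

/* Hypothetical printers for two built-in types. */
void printInt(FILE *fp, int v)   { fprintf(fp, "%d", v); }
void printLong(FILE *fp, long v) { fprintf(fp, "%ld", v); }

/* Selects a printer from the static type of x at compile time,
   then calls it. Types not listed are a compile-time error. */
#define print_any(fp, x) _Generic((x),      \
        int:           printInt,            \
        long:          printLong,           \
        Point *:       printPoint,          \
        const Point *: printPoint)((fp), (x))
```

A call such as print_any(stdout, &point) resolves to printPoint, and print_any(stdout, 8) resolves to printInt. Unlike a real "%?", this handles one argument per call and sits outside the format string.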
 
jacob navia

On 08/03/2014 20:22, Ian Collins wrote:
I'm surprised no one has picked up on this; it looks like a good
solution. It would certainly save having to remember the correct
specifiers for built-in types!

Specifiers for user defined types could be added with an appropriate
pragma, something like:

typedef struct
{
int x,y;
} Point;

void printPoint( FILE* fp, const Point* p );

#pragma specifier (Point*,printPoint)

...

Point point = {1,2};

printf( "%?\n", &point );

!!!!

THAT LOOKS VERY INTERESTING!

But the devil is in the details, since, for instance:

#pragma specifier(int, printint)
#pragma specifier(long,printlong)


printf("%?\n",8);

which one should be called?

It is better to restrict that specifier to user defined types ONLY.
They would ALL receive a pointer to their type. The function would
need to be already declared.

VERY good idea Ian
 
Ian Collins

jacob said:
On 08/03/2014 20:22, Ian Collins wrote:

!!!!

THAT LOOKS VERY INTERESTING!

But the devil is in the details, since, for instance:

#pragma specifier(int, printint)
#pragma specifier(long,printlong)


printf("%?\n",8);

which one should be called?

printint, 8 is an int.

This is no different from the overloading rules in C++, for example:

#include <stdio.h>

void f( int n ) { puts("int"); }
void f( long n ) { puts("long"); }

int main()
{
f(8l);
f(8);
}

is unambiguous in C++.

It is better to restrict that specifier to user defined types ONLY.

I don't think so, given the above. If the compiler can check argument
types for printf specifiers, it could equally well select the specifier
to match the argument type.

They would ALL receive a pointer to their type. The function would
need to be already declared.

That's the idea.

VERY good idea Ian

I'll frame that!
 
BartC

jacob navia said:
On 08/03/2014 20:22, Ian Collins wrote:
#pragma specifier(int, printint)
#pragma specifier(long,printlong)


printf("%?\n",8);

which one should be called?

Wouldn't a freestanding 8 constant be of type int? Or does it always need a
context to help determine the type? And an 8L might be of type long.

However one problem with the "?" idea is that it can only map to one default
format specifier (so any signed int type might use "%d", "%ld" or "%lld").

Sometimes a choice is needed also, between, say, "%d", "%x", "%X", and "%c"
(and why not throw in a binary format too). In this case a way is needed to
specify the alternate format while still being immune from having to
maintain the right number of "l" width modifiers.
 
Keith Thompson

BartC said:
Wouldn't a freestanding 8 constant be of type int? Or does it always need a
context to help determine the type? And an 8L might be of type long.

Yes, the literal 8 is always of type int, regardless of the context in
which it appears. 8L is always of type long, and 8LL is always of type
long long.

The rules are N1570 section 6.4.4.1, "Integer constants".
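With a C11 compiler this can be checked directly. The TYPE_NAME macro below is a small illustrative helper, not a standard facility; it reports the static type of its argument.

```c
#include <stdio.h>

/* Maps the static type of an expression to a readable name. */
#define TYPE_NAME(x) _Generic((x),      \
        int:       "int",               \
        long:      "long",              \
        long long: "long long",         \
        default:   "other")

/* Prints the type of each constant as the compiler sees it. */
void show_constant_types(void)
{
    printf("8   is %s\n", TYPE_NAME(8));
    printf("8L  is %s\n", TYPE_NAME(8L));
    printf("8LL is %s\n", TYPE_NAME(8LL));
}
```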
 
Ian Collins

BartC said:
Wouldn't a freestanding 8 constant be of type int? Or does it always need a
context to help determine the type? And an 8L might be of type long.

However one problem with the "?" idea is that it can only map to one default
format specifier (so any signed int type might use "%d", "%ld" or "%lld").

I would reverse that argument and say the benefit of the "?" idea is the
compiler can deduce the correct specifier for the parameter type. This
is similar to writing

T* p = malloc( sizeof *p );

rather than

T* p = malloc( sizeof(T) );

Sometimes a choice is needed also, between, say, "%d", "%x", "%X", and "%c"
(and why not throw in a binary format too). In this case a way is needed to
specify the alternate format while still being immune from having to
maintain the right number of "l" width modifiers.

The current specifiers could still be used, "?" could be a shorthand for
the default specifier.
 
jacob navia

On 08/03/2014 22:00, Dr Nick wrote:
a) The formatting function either has to allocate its own memory (which
printf then clears up), use a circular buffer list or (what I
preferred) have access to the magic that printf uses - so get access to
"put this character to the active stream".


I already use the second form. Of course, I have a structure with a field
that is a function pointer to a putchar-like function that puts a
character into a file or into a string, so the same code implements
sprintf AND printf AND fprintf!

One of the parameters of the callback will be a function pointer to the
current output stream, so that the same code works for fprintf AND for
sprintf!
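A sketch of that arrangement follows. The Sink structure, its field names and the two constructors are all invented for illustration and are not taken from any existing implementation.

```c
#include <stdio.h>

/* Hypothetical output sink: one "put a character" callback plus the
   state that callback needs. */
typedef struct Sink {
    void (*put)(struct Sink *self, char c);
    FILE *fp;            /* used by the file sink */
    char *buf;           /* used by the string sink */
    size_t cap, len;
} Sink;

static void put_file(Sink *s, char c)
{
    fputc(c, s->fp);
}

static void put_string(Sink *s, char c)
{
    if (s->len + 1 < s->cap) {       /* keep one byte for the '\0' */
        s->buf[s->len++] = c;
        s->buf[s->len] = '\0';
    }
}

Sink file_sink(FILE *fp)
{
    Sink s = { put_file, fp, NULL, 0, 0 };
    return s;
}

Sink string_sink(char *buf, size_t cap)
{
    Sink s = { put_string, NULL, buf, cap, 0 };
    if (cap)
        buf[0] = '\0';
    return s;
}

typedef struct { int x, y; } Point;

/* A custom formatter written once against the sink interface; it does
   not know whether the characters end up in a file or in a string. */
void format_point(Sink *out, const Point *p)
{
    char tmp[48];
    snprintf(tmp, sizeof tmp, "(%d,%d)", p->x, p->y);
    for (const char *c = tmp; *c; c++)
        out->put(out, *c);
}
```

The same format_point() then serves a file_sink(stdout) and a string_sink(buf, sizeof buf) without change.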
 
Ben Bacarisse

BartC said:
However one problem with the "?" idea is that it can only map to one default
format specifier (so any signed int type might use "%d", "%ld" or "%lld").

Sometimes a choice is needed also, between, say, "%d", "%x", "%X", and
"%c"

I think your proposal is just a little off. You need, in my opinion,
%?d. The compiler can then determine the right length modifier, but
surely you know you want a signed decimal conversion? %?x does hex
conversion for an unsigned type of whatever length. For example, when
the argument is a size_t, %?x would generate %zx. You could, as a final
touch, allow %?? which would fill in the length *and* decide on one of
'd', 'u' or 'g' for the conversion specifier based on the type.
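Something close to %?d and %?x can be approximated today with C11 _Generic: the macro picks the length modifier from the argument's type and the programmer picks the conversion letter. LEN_MOD, format_q and FORMAT_Q are hypothetical names, and the sketch covers only int and wider integer types.

```c
#include <stdarg.h>
#include <stdio.h>

/* Length modifier for the static type of x (int and wider only). */
#define LEN_MOD(x) _Generic((x),                            \
        int: "",         unsigned int: "",                  \
        long: "l",       unsigned long: "l",                \
        long long: "ll", unsigned long long: "ll")

/* Builds "%<mod><conv>" at run time and formats one argument with it.
   conv is an int because the parameter before "..." must not be of a
   type that undergoes default argument promotion. */
int format_q(char *out, size_t n, const char *mod, int conv, ...)
{
    char fmt[8];
    va_list ap;

    snprintf(fmt, sizeof fmt, "%%%s%c", mod, conv);
    va_start(ap, conv);
    int r = vsnprintf(out, n, fmt, ap);
    va_end(ap);
    return r;
}

/* FORMAT_Q(buf, n, 'x', v) plays the role of a hypothetical "%?x". */
#define FORMAT_Q(out, n, conv, x) \
        format_q((out), (n), LEN_MOD(x), (conv), (x))
```

The caller is still responsible for pairing a signed conversion with a signed argument; only the length modifier is deduced.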

<snip>
 
