Using printf in C++

Martin B. · May 14, 2012

All the C++ I write is in operating systems, hypervisors and other bare-metal
code for which performance is priority #1, and where we don't have the standard
C++ runtime (such as it is) available. No rtti, no exceptions, no templates
and very few reference objects.

I suppose you'd call it C with classes, rather than C++. I'm happy with that.

Just wondering ... I can understand not using RTTI and disabling
exceptions for performance reasons. (And with disabled exceptions, it
obviously follows that much stuff in the standard library doesn't work
properly.)

But no templates? Why? Templates as such certainly don't add anything to
runtime performance and are a tremendously powerful tool.

cheers,
Martin

Luca Risolia · May 14, 2012

More readable is certainly a matter of opinion, in this case.

    if (bpp->b_core == -1) {
        snprintf(core, sizeof(core), "%s", "All");
    } else {
        snprintf(core, sizeof(core), "%2.2d", bpp->b_core);
    }
    lp->log(" %2.2zu %016llx %3.3s %s\n",
            b, bpp->b_address, core, bpp->b_enabled ? "Yes" : "No");

This is much more readable, and much more important to me, it will perform
much better than the C++ stuff you've written above (two function calls
versus many function calls, smaller cache footprint vs. larger cache footprint).

You should prove that your code will "perform much better". If I have to
bet, I'd say your code actually performs worse. Manipulators can be
easily optimized out by any modern compiler (see later for an example).
Also, clog is buffered, so it is faster than cout and printf() probably.

All the C++ I write is in operating systems, hypervisors and other bare-metal
code for which performance is priority #1, and where we don't have the standard
C++ runtime (such as it is) available. No rtti, no exceptions, no templates
and very few reference objects.

Templates themselves have no impact at run-time. Also, C++ streams were
designed with performance in mind and exceptions are deliberately
disabled by default. Note that I am not saying that printf() is actually
slower than clog.

Anyway, in your case it seems you just want to print out a simple table
to the user (from "%2.2zu" it seems the table is supposed to have fewer than
100 entries), so for these simple cases I suggest you pay attention
to writing readable code rather than trying to optimize where the
benefits will not be noticeable.

To interpret your formatting I had to take a C manual (and a coffee..):

lp->log(" %2.2zu %016llx %3.3s %s\n"...

Not to mention the fact that I had to count the number of white spaces
by moving the cursor from right to left.

With pure C++ style you will not have a headache, at least. In the
example below I have used a "custom" manipulator to make the code even
shorter than what I wrote in my previous post. As an exercise, you may want
to try to implement a whitespace() manipulator to further improve it:

    clog << pad(' ', 3) << ' '
         << pad('0', 2) << b
         << pad(' ', 8) << ' '
         << pad('0', 16)
         << hex << reinterpret_cast<unsigned long>(bpp->b_address)
         << pad(' ', 2) << ' '
         << core.str().substr(0, 3) << ' '
         << pad(' ', 3) << ' '
         << (bpp->b_enabled ? "Yes" : "No")
         << '\n';

The pad manipulator is so simple that it can easily be expanded in-line by
any compiler, with no performance loss:

    // Place this in a library for re-use
    struct smanip {
        ostream& (*f)(ostream&, char, int);
        char c;
        int i;
        smanip(ostream& (*ff)(ostream&, char, int), char cc, int ii)
            : f(ff), c(cc), i(ii) { }
    };

    ostream& operator<<(ostream& os, const smanip& m) {
        return m.f(os, m.c, m.i);
    }

    ostream& set_pad(ostream& s, char c, int n) {
        s.fill(c);
        s.width(n);
        return s;
    }

    inline smanip pad(char c, int n) {
        return smanip(set_pad, c, n);
    }

Ian Collins · May 14, 2012

Ian Collins said:

Scott Lurndal wrote:

Use it and be happy. cout is useless in real applications.

Can you provide an example where cout is useless and printf represents a
better option?

    if (p_num_breakpoints > 0) {
        lp->log("Breakpoint Address Core Enabled?\n");
        lp->log("---------- ---------------- ---- --------\n");

        for (size_t b = 0; b < p_num_breakpoints; b++) {
            const s_breakpoint *bpp = &p_breakpoints[b];
            char core[16];

            if (bpp->b_core == -1) {
                snprintf(core, sizeof(core), "%s", "All");
            } else {
                snprintf(core, sizeof(core), "%2.2d", bpp->b_core);
            }
            lp->log(" %2.2zu %016llx %3.3s %s\n",
                    b, bpp->b_address, core, bpp->b_enabled ? "Yes" : "No");
        }
    }

Is %016llx a valid format string?

Yes. It is posix compliant and C99 compliant. b_address is typed as 'unsigned long long'.

And you can see what happens if it isn't the same size as uint64_t!

Thank you for providing an example of why not to use printf.

Ian Collins · May 14, 2012

Luca Risolia said:

On 13/05/12 20:33, Luca Risolia wrote:
All that stuff can be written in a more readable, type-safe code by
using std::clog (or whatever stream you need) and standard manipulators.

Great!

Go ahead, and show us how.

    if (p_num_breakpoints > 0) {
        clog << "Breakpoint Address Core Enabled?\n"
                "---------- ---------------- ---- --------\n";

        for (size_t b = 0; b < p_num_breakpoints; b++) {
            const s_breakpoint *bpp = &p_breakpoints[b];
            ostringstream core;
            bpp->b_core == -1 ? core << "All" : core << setw(3) << bpp->b_core;
            clog << setfill(' ') << setw(3) << ' ' // indent
                 << setfill('0') << setw(2) << b
                 << setfill(' ') << setw(8) << ' '
                 << setfill('0') << setw(16)
                 << hex << reinterpret_cast<unsigned long>(bpp->b_address)
                 << setfill(' ') << setw(2) << ' '
                 << core.str().substr(0, 3) << ' '
                 << setfill(' ') << setw(3) << ' '
                 << boolalpha << static_cast<bool>(bpp->b_enabled)
                 << '\n';
        }
    }

More readable is certainly a matter of opinion, in this case.

    if (bpp->b_core == -1) {
        snprintf(core, sizeof(core), "%s", "All");
    } else {
        snprintf(core, sizeof(core), "%2.2d", bpp->b_core);
    }
    lp->log(" %2.2zu %016llx %3.3s %s\n",
            b, bpp->b_address, core, bpp->b_enabled ? "Yes" : "No");

This is much more readable, and much more important to me, it will perform
much better than the C++ stuff you've written above (two function calls
versus many function calls, smaller cache footprint vs. larger cache footprint).

It will also fail in amusing ways in a 32 bit build.

All the C++ I write is in operating systems, hypervisors and other bare-metal
code for which performance is priority #1, and where we don't have the standard
C++ runtime (such as it is) available. No rtti, no exceptions, no templates
and very few reference objects.

Not that old FUD again. The only feature there that may impact
performance is RTTI. If anything, the rest will improve performance on
any decent modern compiler. Please, move on from the 90s.

Jorgen Grahn · May 14, 2012

....

And you can see what happens if it isn't the same size as uint64_t!

Thank you for providing an example of why not to use printf.

To be fair, decent compilers warn about printf type mismatch problems,
and can be told to do the same to lp->log().

I use printf and friends in C and C++, but when that warning level is
(for some reason) unavailable, I am very, very careful.

/Jorgen

Dombo · May 14, 2012

On 14-May-12 20:30, Scott Lurndal wrote:

Code footprint in the icache (and memory) is the primary reason for no
templates. This dates back to the first implementation of templates in Cfront 3.0
which, if templates were specialized for more than one or two types, would bloat
the codebase tremendously (where the OS and all apps were running in 4MB DRAM,
space is important). icache footprint is still pretty scarce, every bit that the
OS uses evicts something the applications need more. Modern templates still cause
code bloat, even if the programmer never sees it (and modern 4+GB DRAM setups
insulate this from most programmers).

Naive use of templates may lead to unnecessary code bloat, but this can
be avoided by separating the parts that depend on the template
parameters from the parts that do not depend on the template parameters,
allowing the parts independent of the template arguments to be shared
between template instantiations.

In the OS, templates would be of very limited use, since the range of data structures
used in an OS is generally limited to tables and linked lists. If you're willing to use
casts, a good double linked list can be implemented as a base class using void* pointers
which can then be used stand-alone or derived from as necessary.

The same trick can be used with templates; implement a container for
void*, and use a template class to take care of the casting to pointers
of the type specified by the template argument. This way only code is
generated for a container of void*, instead of code for each template
instantiation. This technique is explained in the book "The C++
Programming Language" from Bjarne Stroustrup in the chapter about
template specialization.

Sure, you may give up
some type safety, but you gain pretty much everywhere else (footprint and performance).

With proper use of templates you don't have to give up type safety for
the sake of performance or a smaller memory footprint. In the example
you gave there is no reason why a properly implemented template version
would be slower or bigger than one which requires the programmer to
apply the casts at the right times and to make sure that the wrong
type is never added to the container.

Ian Collins · May 14, 2012

It's not FUD, but rather 20+ years of hard-won experience using C++. I've benchmarked
exceptions vs. return values (or even siglongjmp).

I'd like to see an example where exceptions slow things down. The first
compiler I did a comparative benchmark with was gcc 2.95 and I've yet to
see a case where an exception-based solution was slower.

I've benchmarked the effects of templates on the icache footprint.

Me too. Maybe you were using them inappropriately (the example with
lists indicates you might). A template solution often enables the
compiler to inline calls, improving the icache footprint.

Maybe this weekend, I'll benchmark the cout crap vs. snprintf and see who wins.

You will find it's six of one and half a dozen of the other.

Why? Lambdas? Give me a break. Not all new is good. You make the language both
harder to learn and more difficult to maintain, you end up with a fringe language like
Lisp or APL.

Now built-in thread support, that's useful, and I'll be happy to use it. Built-in atomics? Nice.

Also understanding how modern compilers optimise for specific targets.

Ian Collins · May 14, 2012

Ian Collins said:

Scott Lurndal wrote:

Use it and be happy. cout is useless in real applications.

Can you provide an example where cout is useless and printf represents a
better option?

    if (p_num_breakpoints > 0) {
        lp->log("Breakpoint Address Core Enabled?\n");
        lp->log("---------- ---------------- ---- --------\n");

        for (size_t b = 0; b < p_num_breakpoints; b++) {
            const s_breakpoint *bpp = &p_breakpoints[b];
            char core[16];

            if (bpp->b_core == -1) {
                snprintf(core, sizeof(core), "%s", "All");
            } else {
                snprintf(core, sizeof(core), "%2.2d", bpp->b_core);
            }
            lp->log(" %2.2zu %016llx %3.3s %s\n",
                    b, bpp->b_address, core, bpp->b_enabled ? "Yes" : "No");
        }
    }

Is %016llx a valid format string?

Yes. It is posix compliant and C99 compliant. b_address is typed as 'unsigned long long'.

Fair enough, when I was tinkering with the code I assumed b_address was
an address (void*).

But... it still illustrates that using printf is more of a maintenance
burden than iostreams.

I should also put my hand up and say that I do use sprintf for small
fixed format, fixed size strings (such as dates and times). For example,

    sprintf( tmp, "%04d%02d%02d%02d%02d%02dZ",
             tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
             tm.tm_hour, tm.tm_min, tm.tm_sec );

is by far the most concise way to make an LDAP timestamp string.

But I avoid printf/scanf in C++.

Ian Collins · May 14, 2012

1) You should _never_ use sprintf. snprintf is preferred (but I understand Microsoft doesn't have it).

Note what I wrote: "for small fixed format, fixed size strings". A case
where sprintf is acceptable.

2) 'strftime' is much better for formatting timestamps.
True.

3) I tried the cout stuff that was posted:

I wouldn't for your example, given the output is a small fixed format,
fixed size string!

Martin B. · May 15, 2012

Code footprint in the icache (and memory) is the primary reason for no

Sorry for the stupid q, but what is "icache"?

templates. This dates back to the first implementation of templates in Cfront 3.0
which, if templates were specialized for more than one or two types, would bloat
the codebase tremendously (where the OS and all apps were running in 4MB DRAM,
space is important). icache footprint is still pretty scarce, (...)
Modern templates still cause code bloat, even if the programmer never sees it (...)
(and modern 4+GB DRAM setups

Hmmm ... I have to say that I only work with MSVC, so my experience is
obviously very limited, but: While templates will "bloat" the object
code at first, it seems the linker will do a good job of folding
identical functions so that at the end it seems much of the duplicated
code of the templates will be removed.

In the OS, templates would be of very limited use, since the range of data structures
used in an OS is generally limited to tables and linked lists. (...)

What about algorithms? (Algorithm-like functions?)

Ah, obviously I don't have much of a clue about OS-level code, but still I'm
curious whether your not using templates is based on tradition for your
code base (not sayin' that's a bad thing) or whether it's based on
actual technical limitations of the platforms you currently use.

cheers,
Martin

Ian Collins · May 15, 2012

Sorry for the stupid q, but what is "icache"?

The processor's instruction cache. These are typically small and fast,
for example see:

http://www.tomshardware.com/reviews/Intel-i7-nehalem-cpu,2041-10.html

Juha Nieminen · May 15, 2012

Scott Lurndal said:
Code footprint in the icache (and memory) is the primary reason for no
templates.

And the alternative is what, exactly?

Writing the functions for each separate type manually? Exactly how would this
be different from the template (other than the template avoiding code
repetition)?

Making the same code support different types? And how exactly would this be
achieved? I see only two possibilities:

1) OO polymorphism, which isn't possible without RTTI (which was also banned
in that list) and would actually increase memory consumption by a significant
lot (because now objects would need to be heap-allocated).

2) Bypass type safety mechanisms and make the code handle things like void
pointers and such. Yeah, great solution. (Not only is it completely horrible
and unsafe code, it also cannot handle everything that a template can, and
in a much safer way at that.)

Religiously avoiding templates is actually detrimental to the quality of
the code and may, in fact, in some cases produce *less* efficient code
(in terms of speed and/or memory usage).

Fearing that caches will fill up quicker if templates are instantiated with
lots of types is also quite moot in most cases. Just because the executable
binary may get larger (which isn't actually always the case) doesn't mean
that caches fill up faster. The CPU only loads into the cache code that is
being run.

Juha Nieminen · May 15, 2012

Dombo said:
Naive use of templates may lead to unnecessary code bloat

Could someone please post a practical example of this mythical "code
bloat" caused by templates, and a better alternative?

(And "practical" above means not artificially contrived to be as
pathological as possible by using completely unconventional code that
no sane programmer would ever write.)

Juha Nieminen · May 15, 2012

Ian Collins said:
The only feature there that may impact performance is RTTI.

The impact of RTTI on performance is greatly exaggerated by people who
haven't actually tested it, but base their claims solely on assumptions
and impressions.

I have actually tested in practice the speed difference between calling
a regular function and a virtual function. The speed difference was not
measurable. In practice calling a virtual function is basically as fast
as calling a regular function. (Sure, there's an extra indirection step
involved in calling a virtual function, but this extra clock cycle or
so gets completely overwhelmed by everything else that's involved in
calling a function, such as putting values on the stack, branching,
having the function execute its code, and branching back.)

There might be some contrived inheritance situations (especially if you
use multiple inheritance, and especially if you use virtual inheritance)
where calling a virtual function *might* be measurably slower. However,
these situations are quite rare in practice (and a savvy programmer
wouldn't use such a solution in a speed-critical situation anyways).

Ian Collins · May 15, 2012

The impact of RTTI on performance is greatly exaggerated by people who
haven't actually tested it, but base their claims solely on assumptions
and impressions.

Hence "may". RTTI only really costs if you use it, for example a
dynamic_cast.

Tobias Müller · May 15, 2012

Juha Nieminen said:
The impact of RTTI on performance is greatly exaggerated by people who
haven't actually tested it, but base their claims solely on assumptions
and impressions.

As others already said, you don't need RTTI for virtual functions.

I have actually tested in practice the speed difference between calling
a regular function and a virtual function. The speed difference was not
measurable. In practice calling a virtual function is basically as fast
as calling a regular function. (Sure, there's an extra indirection step
involved in calling a virtual function, but this extra clock cycle or
so gets completely overwhelmed by everything else that's involved in
calling a function, such as putting values on the stack, branching,
having the function execute its code, and branching back.)

And this is exactly where the biggest performance penalty of virtual
functions lies. Non-virtual functions can be inlined much easier and thus
omit that "calling a function" stuff entirely.

Tobi

Tobias Müller · May 15, 2012

Juha Nieminen said:
1) OO polymorphism, which isn't possible without RTTI (which was also banned
in that list)

Virtual functions don't need RTTI, they are perfectly fine!
However you are possibly losing performance, due to the impossibility of
inlining with that solution.

and would actually increase memory consumption by a significant
lot (because now objects would need to be heap-allocated).

No they don't have to. Why do you think so?

    class Base
    {
    public:  // public so that callDoSomething() below may call it
        virtual void doSomething() = 0;
    };

    class Derived1 : public Base
    {
    public:
        virtual void doSomething()
        {
            // do something
        }
    };

    class Derived2 : public Base
    {
    public:
        virtual void doSomething()
        {
            // do something other
        }
    };

    void callDoSomething(Base& obj)
    {
        obj.doSomething();  // dispatched through the vtable
    }

    int main()
    {
        Derived1 obj1;  // lives on the stack, no heap allocation
        callDoSomething(obj1);

        Derived2 obj2;
        callDoSomething(obj2);

        return 0;
    }

Tobi

BGB · May 15, 2012

As others already said, you don't need RTTI for virtual functions.

yep.

internally, it is generally done with vtables, which are "essentially"
arrays of function pointers. each method call is basically accessing the
table at a fixed index.

at least in the ABIs I am familiar with, RTTI is typically implemented
by having a pointer in the first entry of the vtable pointing to a
structure containing information about the class (which may in turn link
to parent classes, ...). (something like RTTI can then be implemented,
say, by walking this list and comparing the pointers).

a class using MI though will often have multiple vtables and multiple
bodies, usually with the parent classes being placed as complete objects
next to each other in memory (following the data for the current class),
whereas with SI it is simply appending new data or methods onto the end
of the existing object or vtable.

(in my VM, it is the reverse, where the object points to the class which
points to the vtable, but there are reasons for doing it this way,
mostly related to classes being mutable at runtime...).

And this is exactly where the biggest performance penalty of virtual
functions lies. Non-virtual functions can be inlined much easier and thus
omit that "calling a function" stuff entirely.

it is a little better on newer HW, which is usually able to
branch-predict through the return instruction (vs older HW where nearly
any indirect call or return would stall the pipeline).

currently, an if-statement and an unpredictable jump are a bit more costly.

in cases where the call target or return are harder to predict, there
will often be a pipeline stall.

this is extra true with a "switch" which can be extra costly given the
CPU will most often get it wrong.

meanwhile, the extra indirection is "nearly free" AFAICT.

BGB · May 15, 2012

What do vtables have to do with RTTI? They are completely orthogonal.

RTTI has a run-time cost to look up the types, and a space cost to store
the type information (the space cost applies to bare-metal applications such
as operating systems and hypervisors, for regular applications, the rtti
information is stored in the ELF or PE codefile and looked up at runtime
using disk I/O).

well, except disk IO isn't generally involved in using RTTI.

as I understand it, it is generally contained within the raw image,
typically in ".data" or ".rdata" or similar, so it is loaded along with
everything else, and treated as raw data.

Dombo · May 15, 2012

On 15-May-12 11:23, Juha Nieminen wrote:

Could someone please post a practical example of this mythical "code
bloat" caused by templates, and a better alternative?

Please notice the *may* in the above sentence. It depends on the code
and the compiler/linker. For example I know for a fact (having stepped
through the assembly code) that with recent versions of the Visual C++
compiler the void*/casting trick I mentioned in my previous post is not
necessary (at least not for trivial cases); the compiler takes care of
sharing the code between template instantiations if possible. I haven't
checked this with other compilers so your (or Scott's) mileage may vary.
