Why Does C++ Name-Mangle Identifiers?

  • Thread starter Karl Heinz Buchegger

Karl Heinz Buchegger

Randy said:
Why is this necessary? The identifiers provided by the
programmer are unique, and that is what is important - why
map what the user has specified to something else?

Because most C++ compilers are built around a system dependent
linker. Those linkers have a simple rule: if 2 functions have
the same name, then they are the same function. This was
true in most (if not all) programming languages which
required linking.

But in C++ it is perfectly valid to have 2 different functions
with the same name, if only their argument types are different.

That's why name mangling was introduced: To add information to
the name of a function to create different names for functions
with the same name but different argument types.

The alternative would have been to write a new object file
format and a new linker that can deal with that. As long as you
only have C++ this would be no big problem. But as said: on most
'big machines' (such as mainframes), there is only one object
file format defined and only one system wide linker to link
an application. That's a good thing, because it means you can
freely link modules written in different programming languages.
But it also means: You have to introduce some mechanism for
the compiler to make function names unique even if they have identical
names in the source code.
 

Karl Heinz Buchegger

osmium said:
What a great post! Even though I am a mainframer by nature I had never
considered the fundamental difference between a mainframe and a PC or
workstation.

I first noticed this 1.5 decades ago, when people in the company I was
working for linked Fortran with Lisp and C code on a VAX/VMS and were able
to debug the program with the system wide debugger. It worked like a
charm and it made me think.
 

Randy Yates

Why is this necessary? The identifiers provided by the
programmer are unique, and that is what is important - why
map what the user has specified to something else?
 

Gianni Mariani

Randy said:
Why is this necessary? The identifiers provided by the
programmer are unique, and that is what is important - why
map what the user has specified to something else?

void X();
void X( int );

Which linker symbol should be X ?
 

John Harrison

Randy Yates said:
Why is this necessary? The identifiers provided by the
programmer are unique, and that is what is important - why
map what the user has specified to something else?

But they aren't.

int my_overloaded_function(int x)
{
return x;
}

int my_overloaded_function(int x, int y)
{
return x + y;
}

In C++ it's legal to define two different functions with the same name.
Typically C++ implementations mangle the names so non-C++ aware linkers can
distinguish the functions.

john
 

Rolf Magnus

Randy said:
Why is this necessary? The identifiers provided by the
programmer are unique,

No, they aren't.
and that is what is important - why map what the user has specified to
something else?

Because they aren't unique at all.

#include <iostream>

void draw()
{
    std::cout << "drawing\n";
}

void draw(int id)
{
    std::cout << "drawing object with id " << id << '\n';
}

class Triangle
{
public:
    void draw()
    {
        std::cout << "drawing a triangle\n";
    }

    void draw() const
    {
        std::cout << "drawing a constant triangle\n";
    }
};

namespace SomeAPI
{
    void draw()
    {
        std::cout << "yet another function with name 'draw'\n";
    }
}

Now, each of those functions is called draw. How would you resolve that
ambiguity?
 

osmium

Karl Heinz Buchegger said:
Because most C++ compilers are built around a system dependent
linker. Those linkers have a simple rule: if 2 functions have
the same name, then they are the same function. This was
true in most (if not all) programming languages which
required linking.

But in C++ it is perfectly valid to have 2 different functions
with the same name, if only their argument types are different.

That's why name mangling was introduced: To add information to
the name of a function to create different names for functions
with the same name but different argument types.

The alternative would have been to write a new object file
format and a new linker that can deal with that. As long as you
only have C++ this would be no big problem. But as said: on most
'big machines' (such as mainframes), there is only one object
file format defined and only one system wide linker to link
an application. That's a good thing, because it means you can
freely link modules written in different programming languages.
But it also means: You have to introduce some mechanism for
the compiler to make function names unique even if they have identical
names in the source code.

What a great post! Even though I am a mainframer by nature I had never
considered the fundamental difference between a mainframe and a PC or
workstation.
 

Mike Wahler

Randy Yates said:
Why is this necessary? The identifiers provided by the
programmer are unique,

Not always.

and that is what is important - why
map what the user has specified to something else?

void foo(int arg);
void foo(double arg);
void foo(int arg1, int arg2);

Look up 'function overloading'.

-Mike
 

Default User

Randy said:
Why is this necessary? The identifiers provided by the
programmer are unique, and that is what is important - why
map what the user has specified to something else?


As others have pointed out, your contention is untrue. That's the
reason C++ provides extern "C".

This tells the compiler not to mangle the name, so that it's available
to link with other languages that aren't expecting name-mangling.
Brian
 

Randy Yates

Default User said:
As others have pointed out, your contention is untrue. That's the
reason C++ provides extern "C".

This tells the compiler not to mangle the name, that way it's available
to link with other languages that aren't expecting name-mangling.
Brian

Yes, it's obvious now that you have pointed it out. Thanks everyone!

Thanks for this tip, D.U. - that was precisely what caused me to make this
post - linking C and C++ objects together and getting a problem. I
solved it a different way - I compiled the C program with the C++
compiler - it generates the same mangled name.

But actually now this gives me another question: Is the manner
in which names are mangled standardized? Otherwise how would
code from two different compiler vendors be linkable?
 

Greg Schmidt

But actually now this gives me another question: Is the manner
in which names are mangled standardized?
No.

Otherwise how would
code from two different compiler vendors be linkable?

They are not, in general, for more reasons than just name mangling
differences.
 

Ivan Vecerina

Randy Yates said:
But actually now this gives me another question: Is the manner
in which names are mangled standardized? Otherwise how would
code from two different compiler vendors be linkable?

No name mangling technique is specified in the ISO C++ standard.
This is because platform-specific techniques are often used,
and the concept of a linker or debugger is out of the scope
of the standard.

However, many platforms/processors/OSes define a standard name
mangling technique that all compilers are required (or invited)
to implement, as well as other aspects required for compatibility
(such as parameter passing convention, exception handling, etc).
This defines an ABI (Application Binary Interface, IIRC).

Try searching for: C++ ABI


hth,
Ivan
 

Randy Yates

Greg Schmidt said:
They are not, in general, for more reasons than just name mangling
differences.

So if I e.g. develop an ODBC library, I must deliver the source and
let each client compile for his target? That's hard to believe.
 

Serge Paccalin

On Thursday, 4 November 2004 at 20:33:00, Randy Yates wrote in
comp.lang.c++:

Calling conventions differ from one compiler to another. To prevent
unintentional mixes, symbol mangling is intentionally different too.

Randy Yates said:
So if I e.g. develop an ODBC library, I must deliver the source and
let each client compile for his target? That's hard to believe.

That's one solution. The other common solution is to deliver binaries
targeted for specific compilers (the compiler each binary has been built
with).

--
Serge PACCALIN -- sp ad mailclub.net (2004-11-04 22:20:07)
"Men must therefore begin by not being fanatical in order to deserve
tolerance." -- Voltaire, 1763
 

Jesper Madsen

You are always free to supply the library as a shared library/DLL built for
each OS you support. Give it a C interface and, if you are really generous,
a set of C++ wrapper classes as source code. C naming in C++ is
accomplished with:
extern "C" {
void available_Function();
}
If you want to build a shared library, do yourself a favor and build a
proper C interface, where all function parameters are PODs and objects are
opaque types (look at what operating system APIs look like).
Be sure to specify how the library was compiled: e.g. thread-safe
libraries, libraries linked as DLLs, and any settings for the size of int,
float, double or bool. If you pass a struct, assert that its size is the
same on the user's side as inside your library (check libJPEG for
something like that).
Most .lib files are completely compiler-specific. Some vendors choose a
few compilers/dev environments to support, offer .lib files for those, and
supply ActiveX components for all other environments.
 

Mike Smith

John said:
But they aren't.

int my_overloaded_function(int x)
{
return x;
}

int my_overloaded_function(int x, int y)
{
return x + y;
}

In C++ it's legal to define two different functions with the same name.
Typically C++ implementations mangle the names so non-C++ aware linkers can
distinguish the functions.

I guess another question would be: why allow implementations to come up
with their own weird incompatible mangling schemes? Why doesn't the
Standard specify a uniform scheme - preferably something simple wherein,
for instance, the two functions above might become:

int!my_overloaded_function(int)

int!my_overloaded_function(int,int)

or something similarly uniform and simple to understand.
 

E. Robert Tisdale

Mike said:
Not always.

void foo(int arg);
void foo(double arg);
void foo(int arg1, int arg2);

Look up 'function overloading'.
$ cat foo.cc
void foo(int arg) { }
void foo(double arg) { }
void foo(int arg1, int arg2) { }
$ g++ -Wall -ansi -pedantic -c foo.cc
$ nm foo.o
00000006 T _Z3food
00000000 T _Z3fooi
0000000c T _Z3fooii
$ c++filt _Z3food
foo(double)
$ c++filt _Z3fooi
foo(int)
$ c++filt _Z3fooii
foo(int, int)

Evidently, the function names are actually:

1. foo(double),
2. foo(int) and
3. foo(int, int)

respectively.
[Notice that the return type is *not* part of the function name.]
These identifier strings
are *not* acceptable symbols for some link editors
so the C++ compiler *mangles* them --
usually by *decorating* the base function name
(foo in this case) with prefixes and/or suffixes --
to construct a unique symbol acceptable to the link editor.

Perhaps, eventually, link editors will be able to accept
the actual function name as a symbol
and mangling will no longer be required.
 

E. Robert Tisdale

Mike said:
I guess another question would be,
"Why allow implementations to come up with
their own weird incompatible mangling schemes?
Why doesn't the Standard specify a uniform scheme?"
Preferably something simple wherein,
for instance, the two functions above might become:

int!my_overloaded_function(int)

int!my_overloaded_function(int,int)

or something similarly uniform and simple to understand.

The names would be just

my_overloaded_function(int)

my_overloaded_function(int, int)

because function resolution is independent of return type.

The problem is that not all link editors recognize such names
as acceptable symbols. This problem appears to be resolving itself
and soon mangling may no longer be necessary.
 

David Lindauer

overloaded functions and operators can have the same names but be unique
identifiers.

David
 

Jack Klein

I guess another question would be: why allow implementations to come up
with their own weird incompatible mangling schemes? Why doesn't the
Standard specify a uniform scheme - preferably something simple wherein,
for instance, the two functions above might become:

int!my_overloaded_function(int)

int!my_overloaded_function(int,int)

or something similarly uniform and simple to understand.

First of all, if you try to mandate this then why not mandate object
file format (OMF86, DWARF, ELF, COFF, etc.)? Why not mandate the
instruction set, for example 32-bit X86, and to heck with the
riff-raff using PowerPC or ARM or SPARC.

If you read Karl's reply to the original post (or reread it), you will
note that on some systems security concerns mandate the use of the
system-supplied linker. What if that linker does not support the '!'
character in symbol names?

There isn't even a general language standard for the names of C
external symbols (some preface an '_' character, some add one as a
suffix, some do something else or don't modify the symbol at all).
 
