Making a std::string a member of a union ???

Jim Langston

Peter Olcott said:
I could already infer those details. The detail that I am missing is how
the data itself is declared.

union AnyType {
    std::string String;
    int Integer;
};

Will not compile. The best that I could figure so far is this:

union AnyType {
    std::string* String;
    int Integer;
};

I would go with:
class AnyType
{
    std::string String;
    union {
        int Integer;
        double Double;
        float Float;
        // etc...
    };
};

In VC 2003 I get this error if I try to put the std::string inside the union:
error C2621: member 'AnyType::String' of union 'AnyType' has copy
constructor

which makes sense, because a union is *only* for POD types. Non-POD types
cannot exist inside a union (according to MS, anyway).
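
For illustration only, here is a minimal pre-C++11 sketch of the pointer-in-union workaround mentioned above. The tag enum and the helper functions are invented for the example; the point is that the union only ever contains POD members, while the std::string lives on the heap and has to be managed by hand:

#include <string>

enum AnyTag { TAG_INTEGER, TAG_STRING };

struct AnyType {
    AnyTag Tag;
    union {
        int          Integer;
        std::string* String;    // a pointer is POD, so this member is allowed
    };
};

inline void SetInteger(AnyType& a, int i) {
    a.Tag = TAG_INTEGER;
    a.Integer = i;
}

inline void SetString(AnyType& a, const std::string& s) {
    a.Tag = TAG_STRING;                 // call Clear() first if a previously
    a.String = new std::string(s);      // held a string, or the old one leaks
}

inline void Clear(AnyType& a) {
    if (a.Tag == TAG_STRING)
        delete a.String;                // manual lifetime management
    SetInteger(a, 0);
}

The price of keeping the union is exactly this kind of manual bookkeeping, which is what the class-plus-anonymous-union suggestion above avoids.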
 
Simon G Best

Peter said:
How can providing a "C" interface to C++ data possibly be either wrong of
confused?

It isn't. But your assertion that "a language that is incapable of
accessing polymorphic functions must have direct access to the
underlying data" certainly seems wrong and confused.
I did not say that there are two interpreted languages.

In one post, you said, "I am creating my own computer language and I
need a simple way to store the various elemental data types." In
another, you said, "the interpreted language is provided by a third
party. I am hooking this third party interpreted language into my system
and then exposing another different interpreted language implemented in
terms of the third party language." Clearly, you /have/ said that there
are two, "different", interpreted languages.
There are two different
abstractions of the same interpreted language.

See what I mean about your lack of clarity? Sometimes it's an
interpreted language you're creating yourself. Sometimes there are two,
different, interpreted languages, one of which is from a third party.
Sometimes they're not different languages after all, and are actually
the same language. Since you don't seem to actually know yourself what
you're doing, it's hardly surprising that I don't, either!
For all practical purposes these
details can be abstracted out of the problem. For all practical purposes the
problem is simply providing "C" access to C++ data.

Oh! Well, why didn't you say so?! *If* I understand you correctly on
this (which is a big 'if'), you basically want your C++ data to be
accessible from within C. Is that it?

If so, then you're doing it really wrongly. Instead of properly
encapsulating your data in C++, you're trying to expose it directly to
the C stuff. What you /should/ be doing is properly encapsulating it as
normal, and then providing a suitable extern "C" interface to wrap up
your C++ stuff. No need to muck about with unions.

Or, of course, you could just have it in C to begin with.
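
For what it's worth, the sort of extern "C" wrapper described here usually ends up looking something like the sketch below. All of the names are invented; the idea is simply that the C++ data stays encapsulated, and C only ever sees an opaque handle plus plain functions:

#include <string>

struct Value {                      // the real C++ type, never seen by C
    std::string text;
    double      number;
};

extern "C" {
    typedef void* ValueHandle;      // all the C code ever works with

    ValueHandle Value_Create(void) { return new Value(); }
    void Value_Destroy(ValueHandle h) { delete static_cast<Value*>(h); }
    const char* Value_Text(ValueHandle h) {
        return static_cast<Value*>(h)->text.c_str();
    }
    double Value_Number(ValueHandle h) {
        return static_cast<Value*>(h)->number;
    }
}

A C header would then declare ValueHandle and those functions, and every access from C would go through one of the calls.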
It can often be quite annoying when people insist on having me provide all of
the irrelevant details before they are willing to answer the question, and they
then still refuse to answer the question because they have become confused by
all the irrelevant details.

No one asked for irrelevant details. Well, I certainly didn't. What I
wanted to know was what *specific* problem the union was supposed to
solve. Just as with The Halting Problem, much of the confusion is of
your own making.
I wish that people would stop trying to second guess my questions, and just
answer them.

:-(
 
Peter Olcott

Simon G Best said:
It isn't. But your assertion that "a language that is incapable of accessing
polymorphic functions must have direct access to the underlying data"
certainly seems wrong and confused.

I want to minimize the unnecessary overhead so that the resulting interpreted
language is as close as possible to the speed of compiled native code.
Alternatives that do things the "right" way are twenty-five fold slower than an
optimally designed interpreter.
In one post, you said, "I am creating my own computer language and I need a
simple way to store the various elemental data types." In another, you said,
"the interpreted language is provided by a third party. I am hooking this
third party interpreted language into my system and then exposing another
different interpreted language implemented in terms of the third party
language." Clearly, you /have/ said that there are two, "different",
interpreted languages.

I am simultaneously exploring several different alternatives. I want the
resulting design to be optimal for both of these alternatives.

See what I mean about your lack of clarity? Sometimes it's an
interpreted language you're creating yourself. Sometimes there are two,
different, interpreted languages, one of which is from a third party.
Sometimes they're not different languages after all, and are actually the same
language. Since you don't seem to actually know yourself what you're doing,
it's hardly surprising that I don't, either!


Oh! Well, why didn't you say so?! *If* I understand you correctly on this
(which is a big 'if'), you basically want your C++ data to be accessible from
within C. Is that it?

If so, then you're doing it really wrongly. Instead of properly encapsulating
your data in C++, you're trying to expose it directly to the C stuff. What
you /should/ be doing is properly encapsulating it as normal, and then
providing a suitable extern "C" interface to wrap up your C++ stuff. No need
to muck about with unions.

I don't want the overhead.
Or, of course, you could just have it in C to begin with.

I do want OOP and OOD and std::vector.
No one asked for irrelevant details. Well, I certainly didn't. What I wanted
to know was what *specific* problem the union was supposed to solve.

I told you this from the very beginning. It has to be able to hold a set of
types including {double, int, std::string}. It is a form of VARIANT that can be
directly accessed from "C". These are GIVEN, and thus immutable requirements.
Sometimes exploring alternatives that I have not considered is helpful. This does
not seem to be one of these times.
 
Simon G Best

Peter said:
....
I want to minimize the unnecessary overhead so that the resulting interpreted
language is as close as possible to the speed of compiled native code.
Alternatives that do things the "right" way are twenty-five fold slower than an
optimally designed interpreter.

Where did you get the "twenty-five fold slower" figure from?
I am simultaneously exploring several different alternatives. I want the
resulting design to be optimal for both of these alternatives.

Well, if you're going to jump about from alternative to alternative
without being clear about it, it's hardly surprising that confusion results.
I don't want the overhead.

Sounds like you might not understand that famous Knuth quote: "Premature
optimization is the root of all evil."

Anyway, if it's going to be accessible from within C, but itself is
going to be in C++, then there is no alternative but to provide an
interface with C linkage. There is no alternative. That means using
extern "C". (It may well be that your compilers (and whatever) happen
to use compatible linkage conventions for both C and C++, in which case
the extern "C" won't actually introduce any overheads. Otherwise, if
your compilers (etc) use incompatible linkage conventions for C and C++,
you *will* have to use extern "C" anyway. Either way, there's no good
reason not to use extern "C".)
I told you this from the very beginning. It has to be able to hold a set of
types including {double, int, std::string}. It is a form of VARIANT that can be
directly accessed from "C". These are GIVEN, and thus immutable requirements.

They are contradictory requirements. std::strings are *not* accessible
from within C (except indirectly, when you provide an appropriate,
extern "C" interface).
 
Peter Olcott

Simon G Best said:
Where did you get the "twenty-five fold slower" figure from?

http://www.softintegration.com/ -
The best C/C++ Interpreter in terms of overall quality and reliability. The
documentation is fabulous. It is 250-fold slower than native code on loops. My
own carefully designed virtual machine-code interpreter is 10-fold slower than
native machine code on the same loops. 250/10 = twenty-five fold slower.
Well, if you're going to jump about from alternative to alternative without
being clear about it, it's hardly surprising that confusion results.

It was all details that you didn't need to know anyway.
Sounds like you might not understand that famous Knuth quote: "Premature
optimization is the root of all evil."

Although this is an error that I may sometimes make, my interpreter design does
beat all other alternatives by a wide margin.
Anyway, if it's going to be accessible from within C, but itself is going to
be in C++, then there is no alternative but to provide an interface with C
linkage. There is no alternative. That means using

Ah yes, but then that still ignores my direct access.
extern "C". (It may well be that your compilers (and whatever) happen to use
compatible linkage conventions for both C and C++, in which case the extern
"C" won't actually introduce any overheads. Otherwise, if your compilers
(etc) use incompatible linkage conventions for C and C++, you *will* have to
use extern "C" anyway. Either way, there's no good reason not to use extern
"C".)

I might do it this way if I have to, but I don't think that I have to.
They are contradictory requirements. std::strings are *not* accessible from
within C (except indirectly, when you provide an appropriate, extern "C"
interface).

It's not actually going to be a std::string anyway. I just used that as a
simplifying example. It is probably going to be a Unicode string. Maybe I can
translate my FastString to "C". I would lose a few things, but it could be
stored in a union.
 
Simon G Best

Peter said:
http://www.softintegration.com/ -
The best C/C++ Interpreter in terms of overall quality and reliability. The
documentation is fabulous. It is 250-fold slower than native code on loops. My
own carefully designed virtual machine-code interpreter is 10-fold slower than
native machine code on the same loops. 250/10 = twenty-five fold slower.

Even if those were relevant, appropriate figures (which I very much
doubt), on what basis do you justify dividing the 250 by 10? Your
interpreter is still going to be interpreting your C-like language, just
as the C/C++ interpreter does. The speed of your virtual machine code
interpreter seems irrelevant.
Ah yes, but then that still ignores my direct access.

That /includes/ your direct access. You can't do your direct access
from within C unless the data you're directly accessing has C linkage.
I might do it this way if I have to, but, I don't think that I have to.

If it's going to be directly accessed from within C, it *must* have C
linkage.
It's not actually going to be a std::string anyway. I just used that as a
simplifying example. It is probably going to be a Unicode string. Maybe I can
translate my FastString to "C". I would lose a few things, but it could be
stored in a union.

Well, whatever it is, if it's going to be directly accessed from within
C, it's going to have to have C linkage.

By the sounds of it, it would make sense for you to actually do the
union within C to begin with. But then it's not really a C++ question.
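
As a rough illustration of that last suggestion (all names invented), a variant done in C to begin with is just a tagged union of POD members, with the string reduced to a pointer-plus-length pair that both languages can read directly:

/* Written so that the same declaration compiles as C and as C++. */
#include <stddef.h>

enum AnyTag { TAG_INT, TAG_DOUBLE, TAG_STRING };

struct AnyType {
    enum AnyTag tag;
    union {
        int    i;
        double d;
        struct {
            char*  data;      /* allocated and owned elsewhere */
            size_t length;
        } str;
    } u;
};

static void example(void) {   /* usage is identical from C and C++ */
    struct AnyType a;
    a.tag = TAG_INT;
    a.u.i = 42;               /* direct field access, no function call */
}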
 
Peter Olcott

Simon G Best said:
Even if those were relevant, appropriate figures (which I very much doubt), on
what basis do you justify dividing the 250 by 10? Your

I just showed you the basis, my interpreter benchmarks at TEN (that's the basis)
fold slower than native code. The other interpreter benchmarks at 250-fold
slower, therefore my interpreter is 25-fold faster.
interpreter is still going to be interpreting your C-like language, just as
the C/C++ interpreter does. The speed of your virtual machine code
interpreter seems irrelevant.


That /includes/ your direct access. You can't do your direct access from
within C unless the data you're directly accessing has C linkage.

So it is impossible for a "C" function to access an array without calling a
function?

int Array[100];
int Num;
Num = Array[10];

What did you think that I meant by direct access?
If it's going to be directly accessed from within C, it *must* have C linkage.


Well, whatever it is, if it's going to be directly accessed from within C,
it's going to have to have C linkage.

The StringType will probably have to be written in "C" to be interfaced by "C"
and that entails "C" linkage.
 
Richard

Simon G Best said:
Where did you get the "twenty-five fold slower" figure from?


Well, if you're going to jump about from alternative to alternative
without being clear about it, it's hardly surprising that confusion
results.


Sounds like you might not understand that famous Knuth quote:
"Premature optimization is the root of all evil."

And with all due respect to Donald Knuth, who is certainly a lot cleverer
than most people here, NOT considering optimizing at an early stage can
lead to horrendously bad design and a framework which can be extremely
difficult and even impossible to alter in an economic time frame and
budget in order to process the data in a realistic time frame.

Like that awful quote about debugging being twice as hard as the
programming itself, this quote about premature optimization probably has
more validity in the dusty corridors of a university than it does in a
real development environment.
Anyway, if it's going to be accessible from within C, but itself is
going to be in C++, then there is no alternative but to provide an
interface with C linkage. There is no alternative. That means using
extern "C". (It may well be that your compilers (and whatever) happen
to use compatible linkage conventions for both C and C++, in which
case the extern "C" won't actually introduce any overheads.
Otherwise, if your compilers (etc) use incompatible linkage
conventions for C and C++, you *will* have to use extern "C" anyway.
Either way, there's no good reason not to use extern "C".)


They are contradictory requirements. std::strings are *not*
accessible from within C (except indirectly, when you provide an
appropriate, extern "C" interface).

 
Peter Olcott

Richard said:
And with all due respect to Donald Knuth, who is certainly a lot cleverer
than most people here, NOT considering optimizing at an early stage can
lead to horrendously bad design and a framework which can be extremely
difficult and even impossible to alter in an economic time frame and
budget in order to process the data in a realistic time frame.

Like that awful quote about debugging being twice as hard as the
programming itself, this quote about premature optimization probably has
more validity in the dusty corridors of a university than it does in a
real development environment.

You are right, and Knuth is right; the trick is finding the perfect balance
between not optimizing enough and optimizing too much. I tend to err on the
optimizing-too-much side. If you optimize too much, development costs can
increase ten-fold or more with little increase in performance.

If you don't put reasonable optimization in the design from the beginning, we
have the problem that you stated: a bad design that cannot be cost-effectively
improved.

I also err on the side of over-design. I spend at least half the total project
time on design. It seems that the more time spent on design, the
disproportionately less time is required for debugging.
 
Richard

Peter Olcott said:
You are right, and Knuth is right; the trick is finding the perfect balance
between not optimizing enough and optimizing too much. I tend to err on the
optimizing-too-much side. If you optimize too much, development costs can
increase ten-fold or more with little increase in performance.

If you don't put reasonable optimization in the design from the beginning, we
have the problem that you stated: a bad design that cannot be cost-effectively
improved.

I also err on the side of over-design. I spend at least half the total project
time on design. It seems that the more time spent on design, the
disproportionately less time is required for debugging.

Heading a little OT, but I am very "hands on" with design: I invariably
knock up a framework quickly and use the debugger at the earliest
possible stage in order to step through the program flow. Having the
program "debugger friendly" is a very crucial point in any system design,
IMO - possibly because I have spent a LOT of time in huge multiprogrammer
systems which incorporate a huge legacy as well as newer modules. Yes, I
know there are "geniuses" out there who maintain a debugger is only for
people who don't know how to design properly (although how that relates
to programmers coming onto a legacy design is beyond me), but I am not
one of them. I find the debugger to be one of the best tools for new
programmers to learn the data flow and program structure, while at the
same time using strategic parameter manipulation in the debugger to pull
up unusual cases in an easy and effort-free manner. Part of this has
always led me to ban multistatement lines - a nightmare to debug.
 
Simon G Best

Peter said:
I just showed you the basis, my interpreter benchmarks at TEN (that's the basis)
fold slower than native code. The other interpreter benchmarks at 250-fold
slower, therefore my interpreter is 25-fold faster.

You're comparing someone else's interpreter *for C and C++* with your
interpreter *for virtual machine code!* That's not a sensible
comparison. You're kidding yourself if you think it means your code
itself is 25 times faster.
So it is impossible for a "C" function to access an array without calling a
function?

Linkage isn't just about function calling. It's about data, too.
What did you think that I meant by direct access?

Access without going via intermediate functions.
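
A small sketch of what that means in practice (the names are invented): give the object itself C language linkage, and the C side can then read its fields directly, with no accessor functions involved:

// C++ translation unit:
extern "C" {
    struct Shared {              // POD, so the layout is usable from C
        int    Integer;
        double Double;
    };
    struct Shared TheValue;      // an object with C language linkage
}

/* C translation unit, sharing the same struct declaration:

       struct Shared { int Integer; double Double; };
       extern struct Shared TheValue;

   ...and then, inside any C function:

       int n = TheValue.Integer;    (direct access, no accessor call)
*/

The data itself still has to have a C-compatible layout, which is why a std::string cannot be exposed this way.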
 
Simon G Best

Richard said:
And with all due respect to Donald Knuth, who is certainly a lot cleverer
than most people here, NOT considering optimizing at an early stage can
lead to horrendously bad design and a framework which can be extremely
difficult and even impossible to alter in an economic time frame and
budget in order to process the data in a realistic time frame.

"*Premature* optimization". "*Premature* optimization".

For example, efficiency resulting from good design *is not premature.*
 
Puppet_Sock

Peter said:
Is there any way of doing this besides making my own string from scratch?

union AnyType {
    std::string String;
    double Number;
};

Quite apart from *what* you can put in a union, I'm thinking,
*why* would you put stuff in a union?

I'm sitting here trying to think of a case where a union would
be the preferred case over polymorphism of some kind.
I suppose it's probably my limited imagination, but I don't
tend to use unions. Or it may be that I've found people
using unions to do stuff that they really ought not to,
and so I'm shy of them.

So, when is a union the preferred way to put multiple format
data into a block?
Socks
 
Peter Olcott

Puppet_Sock said:
Quite apart from *what* you can put in a union, I'm thinking,
*why* would you put stuff in a union?

I'm sitting here trying to think of a case where a union would
be the preferred case over polymorphism of some kind.

It must be able to be directly accessed from a "C" (not C++) program, and yet
written in C++. It probably won't be a std::string; I just provided this example
to abstract out most of the irrelevant details. What I really need is a Unicode
string accessible from "C" that has as many of the capabilities of std::string
as possible.

I also need a String (or dynamic array) of a user-defined type. This type will
store hardware input actions from the mouse and keyboard. All of the persistent
suggestions of polymorphism can't work with "C".
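
One way this is sometimes approached (a sketch with invented names, not a recommendation) is to keep the string type itself POD, with a fixed layout of plain members, so that it can sit inside a union and be read directly from C, while C++ helper functions keep it consistent:

#include <cstdlib>     // std::realloc, std::free

// POD layout; a C header would repeat this same declaration:
struct CUnicodeString {
    unsigned int* data;          // e.g. UTF-32 code points
    unsigned int  length;
    unsigned int  capacity;
};

union AnyValue {                 // legal, because every member is POD
    int            Integer;
    double         Double;
    CUnicodeString String;
};

// C++-side helpers (invented; error handling omitted):
inline void Init(CUnicodeString& s) {
    s.data = 0;
    s.length = 0;
    s.capacity = 0;
}

inline void Append(CUnicodeString& s, unsigned int codePoint) {
    if (s.length == s.capacity) {
        unsigned int newCap = s.capacity ? s.capacity * 2 : 16;
        s.data = static_cast<unsigned int*>(
                     std::realloc(s.data, newCap * sizeof(unsigned int)));
        s.capacity = newCap;
    }
    s.data[s.length++] = codePoint;
}

inline void Free(CUnicodeString& s) {
    std::free(s.data);
    Init(s);
}

What gets lost, compared with std::string, is automatic copying and destruction: every copy and every release has to go through helpers like these, from C++ as well as from C.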
 
Peter Olcott

Simon G Best said:
You're comparing someone else's interpreter *for C and C++* with your
interpreter *for virtual machine code!* That's not a sensible comparison.
You're kidding yourself if you think it means your code itself is 25 times
faster.

Actual benchmark timings indicated that it was 25-fold faster accomplishing
exactly the same end-result. It was also much faster than another interpreter
that precompiled to virtual machine code. I think that a 10-fold degradation
from the speed of native code (which is what my interpreter achieves) is the
upper limit of performance for an interpreter on loop constructs; anything
faster than this probably would not meet the definition of an interpreter.
 
Simon G Best

Peter said:
Actual benchmark timings indicated that it was 25-fold faster accomplishing
exactly the same end-result.

But you were still comparing a C/C++ interpreter with a virtual machine
code interpreter. It's still not a sensible comparison.
It was also much faster than another interpreter
that precompiled to virtual machine code. I think that a 10-fold degradation
from the speed of native code (which is what my interpreter achieves) is the
upper limit of performance for an interpreter on loop constructs; anything
faster than this probably would not meet the definition of an interpreter.

Did you compile from the same source in both cases? Did you use the
same compiler for both? Or did you use different compilers? Did you
compile from source code for one, but write it directly in virtual
machine code for the other?

The fact that your virtual machine code interpreter is an order of
magnitude slower than native machine code does not strike me as being at
all remarkable. I've seen nothing here to justify a lack of proper
design and proper encapsulation. But I do think you've been fooling
yourself with clearly inappropriate speed comparisons.
 
Peter Olcott

Simon G Best said:
But you were still comparing a C/C++ interpreter with a virtual machine code
interpreter. It's still not a sensible comparison.

It's one C/C++ Virtual Machine code interpreter to another.
Did you compile from the same source in both cases? Did you use the same
compiler for both? Or did you use different compilers? Did you compile from
source code for one, but write it directly in virtual machine code for the
other?

The fact that your virtual machine code interpreter is an order of magnitude
slower than native machine code does not strike me as being at all remarkable.
I've seen nothing here to justify a lack of proper design and proper encapsulation.

Try and find another one that is this fast!
 
Simon G Best

Peter said:
It's one C/C++ Virtual Machine code interpreter to another.

According to http://www.softintegration.com/products/, Ch "parses and
executes C code directly without intermediate code or byte code."
Calling C and C++ "Virtual Machine code" really is stretching it. If
you're having to stretch things that far to try to justify your claims,
then you must already know that your claims are bogus.

You're only kidding yourself.
 
Peter Olcott

Simon G Best said:
According to http://www.softintegration.com/products/, Ch "parses and executes
C code directly without intermediate code or byte code." Calling C and C++
"Virtual Machine code" really is stretching it. If you're having to stretch
things that far to try to justify your claims, then you must already know that
your claims are bogus.

You're only kidding yourself.

http://root.cern.ch/root/Cint.html
I was not referring to their interpreter as using virtual machine code, this is
the interpreter that uses virtual machine byte codes. My implementation is much
faster than this one too. Do you really have to be so disagreeable?
 
Simon G Best

Here you say:-

*"C/C++ Virtual Machine code".*

But then you try to change which interpreter you're referring to:-

http://root.cern.ch/root/Cint.html

and say:-
I was not referring to their interpreter as using virtual machine code,

Too late.
this is
the interpreter that uses virtual machine byte codes. My implementation is much
faster than this one too. Do you really have to be so disagreeable?

This is silly. I'm done with this.
 
