Cast object from long in safe manner?

Ben Hutchings

Peter Koch Larsen wrote:
a) use a hash_map. Not standard yet, but it very soon will be. Lookup
is O(1) amortized.
<snip>

Big-O notation is supposed to tell you the worst case. (Big-omega is
used for the best case.) The worst case is, unfortunately, that all
the entries fall into the same hash bucket and lookup is O(n).
Amortisation makes no difference.
 
Ben Hutchings

Stefan Rupp wrote:
However, there is the defined "network byte order", into and out of
which values can be translated for purposes of sending data across
networks in heterogeneous environments. The keywords you need to search
for are: htonl and ntohl!

The design of htonl and ntohl is thoroughly wrong-headed. First, the
names suggest that they work on longs when actually they must work on
32-bit integers (similarly htons and ntohs work on 16-bit integers,
not short). Second, they can't be implemented on architectures that
have trap values for integers. Third, they require the caller to
read and write the network-ordered integers into and out of buffers
as 32-bit values, which is impossible if the value is not properly
aligned. (This last is not an issue in TCP/IP, which is what they
were designed for, since all TCP/IP header fields are aligned.)

Portable conversion functions that deal with binary formats should
read and write arrays of bytes. (Even that is problematic if you're
aiming for complete portability, since a byte may have more than 8
bits.)
 
Ed Avis

Ben Hutchings said:
Big-O notation is supposed to tell you the worst case. (Big-omega is
used for the best case.)

Not so. To say 'f is O(g)' means that for large enough n, f(n) <=
g(n). (And when giving the name of an algorithm we normally mean the
time taken by that algorithm.) You can say

In the best case, quicksort is O(n log n) in the num. of elements

In the best case, quicksort is O(n ** 2) (also true)

In the worst case, quicksort is O(n ** 2)

In the worst case, quicksort is O(n ** n ** n ** 99)

To say 'f is Omega(g)' means that g is O(f), in other words for large
enough n, f(n) >= g(n). So

Quicksort is Omega(1) (yes, it takes constant time or worse)

Quicksort is Omega(n log n) (also true)

In the worst case, quicksort is Omega(n ** 2).

However there is an informal convention that when quoting big-Os you
quote the best one you can, so saying

Algorithm X takes time O(g(n)) where n is the input size

is, by reading between the lines, saying that it is both O(g(n)) and
Omega(g(n)), because you wouldn't bother to give a worse big-O than is
possible. Otherwise people would go around saying 'quicksort is
O(n**99)' and it would all get a bit silly.
The worst case is, unfortunately, that all the entries fall into the
same hash bucket and lookup is O(n).

Indeed. For such a degenerate hash, lookup is O(n) - so it's no worse
than linear in the number of elements - but it is also Omega(n) - so
no better.
 
Ben Hutchings

Ed said:

My wording was sloppy; I really meant "upper bound" and "lower bound"
which might not actually be reached. However, I realise that even that
is not quite right since they are asymptotic bounds.
To say 'f is O(g)' means that for large enough n, f(n) <= g(n).

That should be f(n) <= k.g(n) for some constant k.
(And when giving the name of an algorithm we normally mean the
time taken by that algorithm.) You can say

In the best case, quicksort is O(n log n) in the num. of elements

In the best case, quicksort is O(n ** 2) (also true)

In the worst case, quicksort is O(n ** 2)

In the worst case, quicksort is O(n ** n ** n ** 99)
<snip>

I see what you mean and I accept that you're right, but it seems
misleading to me to give an asymptotic upper bound that's based on the
best case. It doesn't seem to represent any kind of upper bound at
all.
 
Dhruv

However there is an informal convention that when quoting big-Os you
quote the best one you can, so saying

Algorithm X takes time O(g(n)) where n is the input size

is, by reading between the lines, saying that it is both O(g(n)) and
Omega(g(n)), because you wouldn't bother to give a worse big-O than is
possible. Otherwise people would go around saying 'quicksort is
O(n**99)' and it would all get a bit silly.

Why not say whether the quoted complexity is an asymptotically tight
bound or not?


Regards,
-Dhruv.
 
Hyman Rosen

Ed said:
To say 'f is O(g)' means that for large enough n, f(n) <= g(n).

No, this is wrong. To say 'f is O(g)' means that there exist
constants N and C such that for all n > N, |f(n)| <= C * |g(n)|.

You can see that your definition is trivially wrong because it
claims that (n + 1) is not O(n).
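Spelling out the constants for that example:

```latex
% f(n) = n + 1 and g(n) = n. Take C = 2 and N = 1; then
\forall n > 1:\quad n + 1 \;\le\; 2n \;=\; C \cdot n,
\quad\text{so}\quad n + 1 = O(n).
% Under the constant-free definition f(n) <= g(n), there is no N
% beyond which n + 1 <= n, so (n + 1) would wrongly fail to be O(n).
```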
 
Jamie Burns

Errr, guys, I appreciate the thoughts on this, but look, what does all that
math mean to mere mortals like myself?!

:o)

I realise that a hashmap is the more elegant solution, but compared to
simply casting a memory address it is still going to be really slow right
(is that what the math proved)? And I think I have solved the dangers of
passing pointers around by using the magic string lookup.

Interestingly, I just did a quick test to see which is faster (given I am
simply casting now, not using hashmaps):

for (xx = 0; xx < 100000; xx++) {
    rDerived* derived = dynamic_cast<rDerived*>(remoteObject->object);
    if (derived) derived->setValue(message.data.messageSetInt.value);
}

against:

for (xx = 0; xx < 100000; xx++) {
    if (typeid(remoteObject->object) == typeid(rDerived*))
        ((rDerived*)remoteObject->object)->setValue(message.data.messageSetInt.value);
}

And the latter blew the former out of the water (typeid() with a C-style
cast was 94 times faster in my test than dynamic_cast).

Whoa.

Jamie.
 
Karl Heinz Buchegger

Jamie said:
Interestingly, I just did a quick test to see which is faster (given I am
simply casting now, not using hashmaps):

for (xx = 0; xx < 100000; xx++) {
    rDerived* derived = dynamic_cast<rDerived*>(remoteObject->object);
    if (derived) derived->setValue(message.data.messageSetInt.value);
}

against:

for (xx = 0; xx < 100000; xx++) {
    if (typeid(remoteObject->object) == typeid(rDerived*))
        ((rDerived*)remoteObject->object)->setValue(message.data.messageSetInt.value);
}

And the latter blew the former out of the water (typeid() with a C-style
cast was 94 times faster in my test than dynamic_cast).

IMHO that number (94) is still too high. Are you sure you didn't just
measure how well the optimizer can optimize version 1 versus
version 2?
 
kanze

Ben Hutchings said:
I see what you mean and I accept that you're right, but it seems
misleading to me to give an asymptotic upper bound that's based on the
best case. It doesn't seem to represent any kind of upper bound at
all.

Yes and no. Normally, the two interesting values are typical or
average, and worst case. But what do you say about quicksort? The
worst case is O(n^2). The best case is O(n ln n). I don't think that
we know how to calculate the typical or average, but empirically, I
think it has been established that quicksort usually performs fairly
close to its best case. No guarantees, but useful information anyway.
 
Ben Hutchings

Jamie said:
Errr, guys, I appreciate the thoughts on this, but look, what does all that
math mean to mere mortals like myself?!

:o)

I realise that a hashmap is the more elegant solution, but compared to
simply casting a memory address it is still going to be really slow right
(is that what the math proved)?

It didn't *prove* anything, but the point of all that was that a hash
table lookup can be slow, taking time proportional to the number of
entries.

However, provided that the hash table implementation increases the
number of buckets in proportion to the number of entries, which I think
all hash_map implementations do, and ignoring cache effects which may
come into play if the table grows very large, the *average* time taken
for a hash lookup will be about the same no matter how many entries
are in the table, and not very long at all.
And I think I have solved the dangers of passing pointers around by
using the magic string lookup.

What does that mean?
Interestingly, I just did a quick test to see which is faster (given I am
simply casting now, not using hashmaps):

for (xx = 0; xx < 100000; xx++) {
    rDerived* derived = dynamic_cast<rDerived*>(remoteObject->object);
    if (derived) derived->setValue(message.data.messageSetInt.value);
}

against:

for (xx = 0; xx < 100000; xx++) {
    if (typeid(remoteObject->object) == typeid(rDerived*))
        ((rDerived*)remoteObject->object)->setValue(message.data.messageSetInt.value);
}

And the latter blew the former out of the water (typeid() with a C-style
cast was 94 times faster in my test than dynamic_cast).

The latter doesn't do what you think it does. The typeid() of a
pointer doesn't depend on what it's pointing to. The condition in the
if statement should be:

typeid(*remoteObject->object) == typeid(rDerived)
 
kanze

Errr, guys, I appreciate the thoughts on this, but look, what does
all that math mean to mere mortals like myself?!

It means that you have to know what you are trying to do before you try
to do it.
I realise that a hashmap is the more elegant solution, but compared to
simply casting a memory address it is still going to be really slow
right (is that what the math proved)? And I think I have solved the
dangers of passing pointers around by using the magic string lookup.

The difference is that the hashmap will work. Passing a long around,
and casting it to a pointer whenever you feel like accessing memory,
won't. After that, of course, the quality of the hash function will
affect performance. As others have mentioned, however, if only one
process is doing all of the object creation (and id assignment), there
are fairly simple solutions which will allow using an std::vector (which
is nothing other than a very efficient map size_t->T, for a restricted
range of values on the size_t side).
 
Keith H Duggar

Hello Jamie,

This problem is common and you can find literature discussing various
solutions in the client server community.

All of the viable solutions so far have discussed creating a mapping
function:

f(g(object)) -> object

where g(object) is often called a "handle". This handle-body idiom is
well established and you should consider using one of the solutions
already outlined. Yes, you do incur a performance penalty for the added
safety. That is a tradeoff you need to evaluate for your application,
but that is the bottom line.

I just wanted to point you to one interesting alternative to the
mapping functions discussed so far: encryption. The general idiom is:

encrypt ( this-is-a-valid-pointer + pointer ) -> handle
decrypt ( handle ) -> this-is-a-valid-pointer + pointer

If you don't recover this-is-a-valid-pointer then you don't use the
pointer.

Using this idiom you adjust the safety-vs-performance tradeoff quite
easily by choosing an appropriate encryption algorithm. One advantage
of this approach is that it doesn't require linear space overhead as
do the other handle-body approaches discussed so far. One disadvantage
is that it doesn't allow you to easily correct for server-side memory
management of the objects ( moving them around, etc) since you can't
readily adjust the mapping of handles you have already distributed.
 
