J. Campbell wrote:
I recently decided to learn C++ in order to avoid the slow
(on my old compiler) conversions like you outlined above where you
take each byte and shift it to the proper place in the integer then
add to the underlying value.
It really shouldn't be very slow. Besides that, the "search for
efficiency" that many programmers feel is a core component of
programming is often misguided. There are many reasons for this. First,
speed simply isn't that important in most cases - at least, not compared
to other factors such as correctness, portability, maintainability, and
getting the program done on time. Attempts to speed up code are often at
odds with these other goals. Second, most code is executed infrequently
enough that optimizing it down to nothing would not significantly
improve overall program performance - to be worthwhile, optimizations
have to be carefully targeted at the parts of the program that really
need them. Third, the efficiency bottlenecks in a program tend to be
things like I/O accesses, not the actual code itself. Fourth,
algorithm-level optimizations almost always give much, much more
dramatic improvements than micro-tuning code, so worrying about things
like a few shift operations is rather foolish.
Don't get me wrong - I don't like slow programs. But it's much more
important to address the overall design than to worry about the
efficiency of any particular section of code, particularly because you
don't know from the start which sections are going to be taking up the
program's execution time. By addressing design first, you can ensure
correctness, get the program running, and pave the way for optimizations
later on if the program is deemed too slow - if it's fast enough, you've
saved yourself the effort. Besides that, good design tends to lead to
reasonably efficient code in the first place.
Sorry about the barely-topical (for the thread) rant. It's one of those
things I'm always going off on.
As such, I thought "C++...groovy, I can
load the data, then create a pointer to whatever position I choose,
select the type of data I want, create a second pointer at that
location, then grab the data without performing *any* conversions."
I used to think that also. There are a number of problems with it,
however. First and foremost, different types may require different
memory alignment. A long might need to be on a 4-byte boundary, for
example, so if I try to access a char array as a long, and that char
array is not on a 4-byte boundary, the program's behavior is undefined.
This particular error results in a bus error (causing a crash) on some
systems. On Intel-based systems I believe that improper alignment simply
causes your program to take a performance hit.
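
To make the alignment point concrete, here is a minimal sketch of one way
to read an integer out of a char buffer without ever forming a misaligned
pointer: copy the bytes into a properly aligned object with std::memcpy.
The helper name read_u32_native is my own invention for illustration, and
I'm assuming a 32-bit value is what you want. Note that this only deals
with alignment; the result still depends on the host's byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the thread): read a 32-bit value that
// starts at an arbitrary byte offset. std::memcpy copies the bytes into
// a properly aligned local object, so the source pointer does not need
// any particular alignment.
std::uint32_t read_u32_native(const unsigned char* p)
{
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;  // bytes are interpreted in the host's own byte order
}
```

Calling read_u32_native(buffer + 1) is fine even when buffer + 1 is not
on a 4-byte boundary, which is exactly the case where casting to a
pointer to a wider type and dereferencing it could fail.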
But that's just the first problem. You also have to worry about whether
the data in the array is the right format for the type you want to
interpret it as, whether there are padding bytes, and things like that. Even if
all that checks out, the same data on a different platform won't work
the same way. Byte order can be different, data type sizes can vary, etc.
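
For comparison, the shift-based conversion discussed earlier in the
thread might look something like the sketch below. I'm assuming the file
stores 32-bit values least-significant byte first (little-endian); that
is an assumption about the file format, not something stated in the
thread, and read_u32_le is just a name I made up. I use bitwise OR
rather than addition to combine the bytes, which gives the same result
here.

```cpp
#include <cstdint>

// Hypothetical helper (not from the thread): assemble a 32-bit value
// from four bytes stored least-significant byte first. Each byte is
// widened before shifting so no bits are lost. The result is the same
// on any host, regardless of its byte order or alignment rules.
std::uint32_t read_u32_le(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Because the code spells out which byte goes where, the byte order of the
machine running it never enters into the picture. A big-endian file
format would just reverse the order of the indices.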
A few final notes about alignment: void * and char * are both capable of
representing any other object pointer type, and chars don't have
alignment requirements, so you can always access anything as an array of
chars without alignment problems (though unless you use unsigned chars
there is also a possible problem with invalid representations - unsigned
char is the only type that is required to have none of those, and no
padding bits). Also, memory returned from malloc() is required to be
properly aligned for any type, so in theory it can be used as "common
ground" for any types, but this doesn't seem to be useful very often.
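
As a small illustration of the "anything can be viewed as an array of
unsigned chars" rule, here is a rough sketch that renders the bytes of
any object as hex. The name bytes_as_hex is hypothetical, and the
two-digits-per-byte output assumes 8-bit bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): view any object as an
// array of unsigned char and render each byte as two hex digits.
// Reading an object through unsigned char* is always permitted and has
// no alignment requirement. Assumes 8-bit bytes for the formatting.
template <typename T>
std::string bytes_as_hex(const T& object)
{
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(&object);
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out += digits[(p[i] >> 4) & 0x0F];  // high nibble
        out += digits[p[i] & 0x0F];         // low nibble
    }
    return out;
}
```

The order of the bytes you get back for a multi-byte type depends on the
platform, which is the byte-order problem again.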
I realize that this is non-portable, and that it's poor practice to
have 2 pointers to the same memory space,
I don't know about 2 pointers to the same place being bad practice. I
can see how it could lead to problems in some cases, but I think such
problems are more a result of other things, such as not sufficiently
limiting the scope of objects, or poor memory management. Having
multiple pointers to the same object is harmless by itself.
and I'm not trying to argue
that my method has any merit. I'm so used to thinking in terms of
"how to most efficiently get the results on the platform at hand" that
I feel like I'm missing the point of C++. Perhaps C++ is the wrong
language for me, since it was designed for large projects that need to
be maintained over time. Anyway, C++ is so much faster, that I can
probably discard the notion that I need to look for efficiency
shortcuts.
In many cases, that is true. Shortcuts for efficiency are often
counter-productive anyway.
I think you'll have greater success with the language if you learn to
use the language itself, rather than learning "C++ for <insert system
name here>". Code relying on particular properties of a given system,
aside from being non-portable, tends to be more brittle as well.
My question is this. If you were writing code for your own use, that
would be unlikely to be used by anyone else, would you bother with
properly converting your bytes to ints in a platform independent
manner, or would you take the shortcut of simply loading the file,
then accessing it in its native format?
It depends on how useful I expect the program to be. If it's a
quick-and-dirty program that will soon be discarded, it's likely that
I'd use the simplest method that I could think of, which may be
something like what you describe. For anything else, I'd do it the Right
Way, even if I'm only doing it for practice. I consider any and all
programming to be an opportunity to learn, helping me to become a better
programmer.
-Kevin