I have a few questions on conversions between "char*" and "unsigned
char*". I am assuming casting "unsigned char*" to
"char*" is safe because "char" can hold all the values that an
"unsigned char" can hold.
This is true if, and only if, you are on a system where "char" and "unsigned
char" have the exact same range of values. Otherwise, there will be values
that you can store in "unsigned char" that can't be stored in "char".
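As a rough illustration of that condition, here is a minimal sketch that checks it using the macros from <limits.h>. The function name is made up for this example; only the macros come from the standard.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when every "unsigned char" value
   also fits in plain "char", which is the case exactly when plain
   "char" is an unsigned type on this implementation. */
bool char_holds_all_uchar_values(void)
{
    return CHAR_MIN == 0 && CHAR_MAX == UCHAR_MAX;
}
```

On an implementation where plain "char" is signed, this should return false, since CHAR_MAX is then smaller than UCHAR_MAX.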
But conversion of "char*" to "unsigned char*" won't be safe as
"char" can hold more values.
No, it can't. At least, so far as I recall, it's absolutely necessary
that "unsigned char" have at least as many possible values as "char".
Is this understanding correct? In what
cases will "char*" have negative values?
Negative values are not meaningful for pointers. You probably meant "char".
The answer is, if you're on an implementation where "char" is a signed type,
then sometimes it could have negative values.
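To make that concrete, here is a small sketch, with function names invented for the example. It assumes nothing about whether plain "char" is signed; the first function simply reports what your implementation does with a given byte, and the second shows the usual way to get the byte value back.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if byte b, once stored in a plain char,
   compares below zero. Where char is signed and 8 bits wide, this
   typically happens for bytes 0x80 and above. */
bool byte_is_negative_as_char(unsigned char b)
{
    char c = (char)b;  /* result is implementation-defined if b > CHAR_MAX */
    return c < 0;
}

/* Hypothetical helper: converting to unsigned char first yields a
   value in the range 0..UCHAR_MAX whatever the signedness of char. */
unsigned int byte_value(char c)
{
    return (unsigned char)c;
}
```

Plain ASCII text never trips the first function, which is probably why you have never seen a negative value; bytes from a UTF-8 multi-byte sequence can.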
I have never seen negative values on a "char*" string. So is it
safe to convert from "char*" to "unsigned char*"?
Maybe.
By conversion, I mean using a cast: char* c = (char*) string;
where string is an "unsigned char*".
Maybe.
You haven't explained what you mean by "safe", though. If you convert any
numeric value whatsoever to "unsigned char", it is guaranteed "safe" in that
it cannot cause a processor trap, or result in a value that is not valid
for "unsigned char". It may, however, not be the value you expected to get.
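For instance, on any implementation where unsigned char is 8 bits wide
(nearly all modern ones), if you convert any of 256, 512, or 1024 to
unsigned char, you will quite safely and reliably get the value 0, because
the value is reduced modulo 256. But it won't crash.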
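A minimal sketch of that conversion follows; the function name is invented for the example. The standard defines conversion to an unsigned type as reduction modulo one more than the type's maximum, so the result is well defined even when it is not what you wanted.

```c
#include <limits.h>

/* Hypothetical helper: converting to unsigned char reduces the value
   modulo UCHAR_MAX + 1 (that is, modulo 256 when CHAR_BIT is 8). */
unsigned char to_uchar(long value)
{
    return (unsigned char)value;
}
```

With an 8-bit unsigned char, to_uchar(256), to_uchar(512), and to_uchar(1024) all give 0, and to_uchar(-1) gives UCHAR_MAX.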
If anyone is wondering why I use unsigned char: I use it for doing
some UTF8 processing on the string. I need to use that to skip the
multi-byte sequences correctly.
So you probably do. But before you go reinventing the wheel, why not check
to see what your implementation has for existing UTF-8 support?
If you're at a level of experience where you're not quite sure about how
char and unsigned char interact, I would suggest that you are probably not
ready to reliably and consistently implement UTF-8. If you're doing it just
to learn, hey, sounds like a fun project, good luck with that. If you're
doing it because you want to get something done, though, consider using the
existing code that already does it correctly.
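If you are doing it to learn, here is a rough sketch of the skipping step, showing where the "unsigned char*" cast comes in. The function name is invented, and the code assumes the input is already valid UTF-8; it does no validation at all, which is a large part of what real UTF-8 code has to get right.

```c
#include <stddef.h>

/* Hypothetical helper: counts code points in a NUL-terminated string
   that is ASSUMED to be valid UTF-8. No validation is performed. */
size_t utf8_count_code_points(const char *s)
{
    /* Reading the bytes through unsigned char keeps every value in
       0..255, so the bit tests below never see a negative number. */
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    for (; *p != '\0'; ++p) {
        /* Continuation bytes look like 10xxxxxx; count everything else. */
        if ((*p & 0xC0u) != 0x80u)
            ++count;
    }
    return count;
}
```

Accessing the bytes of a "char" array through an "unsigned char*" is always permitted, so the cast itself is not the risky part; malformed input is.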
-s