Converting char to int.

Daz

Hi all!

I know this sounds like a stupid question, but I need to verify whether
or not a single char (from a char array) is a number. I have tried
using atoi, but it seems to return '0' for any character. This is ok,
unless the character happens to be a '0'. I am certain I am missing
something here. Any pointers would be fantastic. Each character in the
array is verified individually and a new stringstream string is built
using all of the chars that are numbers. All chars that are letters are (or
at least 'should be') omitted.

Thanks in advance.

Daz
 
Rolf Magnus

Daz said:
Hi all!

I know this sounds like a stupid question, but I need to verify whether
or not a single char (from a char array) is a number.

Look up the function std::isdigit(). It does exactly that.
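
For what it's worth, atoi expects a null-terminated string rather than a single char, and it returns 0 when nothing can be converted, which is why a real '0' is indistinguishable from a failure. Below is a minimal sketch of the filtering described in the question, assuming the input is a null-terminated char array; keep_digits is a made-up name for illustration, not something from the standard library.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper (not from the thread): copies only the decimal
// digits of a null-terminated char array into a stringstream.
std::string keep_digits(const char* text)
{
    std::ostringstream out;
    for (const char* p = text; *p != '\0'; ++p) {
        // std::isdigit takes an int; a negative char value would be
        // undefined behaviour, so convert to unsigned char first.
        if (std::isdigit(static_cast<unsigned char>(*p))) {
            out << *p;
        }
    }
    return out.str();
}
```

Calling keep_digits("a1b20c") should give the string "120"; the letters are simply skipped.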
 
Daz

Ok, I don't get it :(

I don't understand what 'Locale' does. On MSDN, a snippet from the
example is:
| > locale loc ( "German_Germany" );
| > bool result1 = isdigit ( 'L', loc );
| > bool result2 = isdigit ( '@', loc );
| > bool result3 = isdigit ( '3', loc );

I don't understand how 'loc' ties in with the rest. I have looked up
'Locale', and the examples I found have simply confused me more. Please
could someone set me straight on this?

Best wishes

Daz
 
Rolf Magnus

Daz said:
Ok, I don't get it :(

I don't understand what 'Locale' does. On MSDN, a snippet from the
example is:
| > locale loc ( "German_Germany" );
| > bool result1 = isdigit ( 'L', loc );
| > bool result2 = isdigit ( '@', loc );
| > bool result3 = isdigit ( '3', loc );

I don't understand how 'loc' ties in with the rest. I have looked up
'Locale', and the examples I found have simply confused me more. Please
could someone set me straight on this?

Just use the function without locales:

#include <cctype>
#include <iostream>

int main()
{
    bool result1 = std::isdigit('L');
    bool result2 = std::isdigit('@');
    bool result3 = std::isdigit('3');
    std::cout << std::boolalpha << result1 << '\n'
              << result2 << '\n'
              << result3 << '\n';
}
 
Tomás

Daz posted:
Hehe. Ingenious!

Thanks Rolf! :)

Another alternative would have been:


bool IsDecimalDigit( char const c )
{
return !( c < '0' && c > '9' );
}


(The Standard guarantees that the decimal digits are one-after-another)
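
Because the digits are contiguous, the same guarantee also covers the conversion in the thread title: subtracting '0' from a digit character gives its numeric value. A small sketch, where digit_value is a made-up name and returning -1 for non-digits is just one possible convention:

```cpp
// Hypothetical helper: returns the numeric value of a decimal digit
// character, or -1 if c is not a decimal digit.
int digit_value(char c)
{
    // '0'..'9' are guaranteed to be consecutive, so c - '0' is 0..9.
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```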


-Tomás
 
Rolf Magnus

Tomás said:
Another alternative would have been:


bool IsDecimalDigit( char const c )
{
return !( c < '0' && c > '9' );
}

That is not correct. Make it

bool IsDecimalDigit( char const c )
{
return c >= '0' && c <= '9';
}
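
To see why the quoted version is not correct: no character can be both less than '0' and greater than '9', so the inner condition is always false and its negation accepts everything. The sketch below counts how many char values each predicate accepts; IsDecimalDigitBroken and count_accepted are names made up for this illustration.

```cpp
#include <climits>

// The condition from the quoted post: (c < '0' && c > '9') can never be
// true, so the negation is true for every character.
bool IsDecimalDigitBroken(char c)
{
    return !(c < '0' && c > '9');
}

bool IsDecimalDigit(char c)
{
    return c >= '0' && c <= '9';
}

// Hypothetical helper: counts how many distinct char values a predicate accepts.
int count_accepted(bool (*pred)(char))
{
    int n = 0;
    for (int i = 0; i <= UCHAR_MAX; ++i) {
        if (pred(static_cast<char>(static_cast<unsigned char>(i)))) {
            ++n;
        }
    }
    return n;
}
```

The broken form should accept every value (UCHAR_MAX + 1 of them), while the corrected one accepts exactly ten.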
 
Tomás

Rolf Magnus posted:
That is not correct. Make it

bool IsDecimalDigit( char const c )
{
return c >= '0' && c <= '9';
}

I should have written:

return !( c < '0' || c > '9' );

I figured that my form would be more efficient.

My form consists of:
Less than
Greater than
Or
Invert

Your form consists of:
Greater than or equal
Less than or equal
And


But if we were going for massive optimization, we'd be better off with:

inline bool NotDecimalDigit( char const c )
{
return c < '0' || c > '9';
}

inline bool IsDecimalDigit( char const c )
{
return !NotDecimalDigit(c);
}


Then use each form when it's most appropriate.

-Tomás
 
Rolf Magnus

Tomás said:
I should have written:

return !( c < '0' || c > '9' );

I figured that my form would be more efficient.

My form consists of:
Less than
Greater than
Or
Invert

Your form consists of:
Greater than or equal
Less than or equal
And

So yours needs one action more. What makes you think this is more efficient?
Compiler optimization will likely make something else from the code
anyway.

But if we were going for massive optimization, we'd be better off with:

I wasn't going for "massive optimization", but for correctness, simplicity
and clarity. I don't really think you can get any measurable difference in
execution time between all those variants.
 
Jim Langston

Tomás said:
I should have written:

return !( c < '0' || c > '9' );

I figured that my form would be more efficient.

My form consists of:
Less than
Greater than
Or
Invert

I count 4 opcodes there, and they will all be performed.

Your form consists of:
Greater than or equal
Less than or equal
And

I count 3 opcodes there.

How do you consider yours more efficient if it uses one more opcode?

Regardless, I consider his more readable and maintainable.
 
Daz

Jim said:
How do you consider yours more efficient if it uses one more opcode?

Regardless, I consider his more readable and maintainable.

I have to agree with both of you. Yes, technically, one 'may' be
slightly more optimized, but I believe that any speed lost
at runtime would be utterly negligible.

Daz
 
kwikius

Daz said:
Ok, I don't get it :(

I don't understand what 'Locale' does. On MSDN, a snippet from the
example is:
| > locale loc ( "German_Germany" );
| > bool result1 = isdigit ( 'L', loc );
| > bool result2 = isdigit ( '@', loc );
| > bool result3 = isdigit ( '3', loc );

I don't understand how 'loc' ties in with the rest. I have looked up
'Locale', and the examples I found have simply confused me more. Please
could someone set me straight on this?

Locale is meant to smooth out differences between countries. For
example, if you need to output currency units but you don't know what
country the user is in, you can just indirect via the locale. In theory
anyway!!!

Each locale has 'facets' for money, numeric, time and such. Each facet
has its own set of operations. The following is meant to show the currency
symbol for three countries. In fact it gets it wrong for the U.K. (shows USD
on my system). Further, the strings are OS specific. IOW it ain't really
much use IMO.

regards
Andy Little

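#include <iostream>
#include <locale>
#include <stdexcept>

void get_currency(const char* country)
{
    // std::locale throws std::runtime_error if the name is not recognised.
    std::locale loc = std::locale(country);
    std::moneypunct<char,true> const & moneypunct
        = std::use_facet<std::moneypunct<char,true> >(loc);
    std::cout << '\'' << country << "' currency symbol = '"
              << moneypunct.curr_symbol() << "'\n";
}

int main()
{
    // Names are os dependent
#ifdef _MSC_VER
    const char* countries[] = {"French","German","English-uk",
                               "Chinese"};
#else
    // Assumed POSIX-style names; they only work if those locales are installed.
    const char* countries[] = {"fr_FR.UTF-8","de_DE.UTF-8","en_GB.UTF-8",
                               "zh_CN.UTF-8"};
#endif
    for (int i = 0; i < 4; ++i) {
        try {
            get_currency(countries[i]);
        } catch (const std::runtime_error&) {
            std::cout << '\'' << countries[i] << "' is not available here\n";
        }
    }
}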
 
Daz

Most useful indeed! Thanks a lot Andy!
My understanding of the subject is so much clearer now. :eek:)
 
Jerry Coffin

[ ... ]
I should have written:

return !( c < '0' || c > '9' );

[ ... ]
I figured that my form would be more efficient.

My form consists of:
Less than
Greater than
Or
Invert

Your form consists of:
Greater than or equal
Less than or equal
And

That would make sense if 'greater than or equal' was
implemented as two separate tests (and likewise 'less
than or equal').

That's not generally true though -- I can hardly think of
a processor that can't combine each of those into a
single test.

But if we were going for massive optimization, we'd be better off with:

inline bool NotDecimalDigit( char const c )
{
return c < '0' || c > '9';
}

inline bool IsDecimalDigit( char const c )
{
return !NotDecimalDigit(c);
}

I can hardly think of a processor for which I'd consider
this a "massive optimization". If I really had to
implement my own versions of these, I'd consider
something like this:

inline bool IsNotDecimalDigit(char ch) {
    return static_cast<unsigned>(ch-'0') > 9;
}

inline bool IsDecimalDigit(char ch) {
    return static_cast<unsigned>(ch-'0') <= 9;
}
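
The trick relies on the conversion to unsigned: for any ch below '0' the difference is negative, and converting it to unsigned wraps it to a very large value, so a single comparison against 9 covers both ends of the range. A rough way to convince yourself the two forms agree is to compare them over every char value; in_range, by_subtraction and variants_agree are names invented for this sketch.

```cpp
#include <climits>

// Plain range comparison, as in Rolf's version.
bool in_range(char ch) { return ch >= '0' && ch <= '9'; }

// Single unsigned comparison, as in the version above.
bool by_subtraction(char ch) { return static_cast<unsigned>(ch - '0') <= 9; }

// Hypothetical check: true if both predicates agree for every char value.
bool variants_agree()
{
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
        char ch = static_cast<char>(i);
        if (in_range(ch) != by_subtraction(ch)) {
            return false;
        }
    }
    return true;
}
```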

To give a concrete comparison, here's what Visual C++
produces for the three versions in question:

?IsDecimalDigitRolf@@YI_ND@Z PROC NEAR
cmp cl, 48
jl SHORT $L315
cmp cl, 57
jg SHORT $L315
mov eax, 1
ret 0
$L315:
xor eax, eax
ret 0
?IsDecimalDigitRolf@@YI_ND@Z ENDP

?IsDecimalDigitTomas@@YI_ND@Z PROC NEAR
cmp cl, 48
jl SHORT $L307
cmp cl, 57
jg SHORT $L307
xor eax, eax
xor ecx, ecx
test al, al
sete cl
mov al, cl
ret 0
$L307:
mov eax, 1
xor ecx, ecx
test al, al
sete cl
mov al, cl
ret 0
?IsDecimalDigitTomas@@YI_ND@Z ENDP

?IsDecimalDigitJerry@@YI_ND@Z PROC NEAR
movsx eax, cl
sub eax, 48
mov ecx, 9
cmp ecx, eax
sbb eax, eax
add eax, 1
ret 0
?IsDecimalDigitJerry@@YI_ND@Z ENDP

I doubt anybody needs to read Intel assembly language to
guess that your attempted optimization seems to have
backfired. Rolf's code produces output that's
considerably shorter and simpler.

What may be a lot less obvious is that even though your
code is longer than Rolf's, the real difference in speed
will usually be pretty minimal. In both cases, you have a
couple of conditional branches that will often consume
the bulk of the time. In fact, either one of these might
easily consume 20 or more clock cycles on a modern CPU,
and in a bad case, you might hit that penalty twice.
Perhaps worse, the speed will often vary over a range of
3:1 or more depending on the input data.

My code avoids conditional execution entirely, so it's
not only short, but consistently fast (in fact, probably
always faster than even the best case for either of the
others).

This really isn't meant as a "You suck; I rule" kind of
post either. Rather, it's intended to point out that it
can be _really_ tricky to do micro-optimization like this
at all well. Unless you know quite a lot about your
compiler and your target CPU, it's entirely possible for
an attempted optimization to backfire, sometimes quite
badly.

Just for an obvious example, while my code works well for
a relatively typical target, on a processor that didn't
use two's complement integers, it would almost certainly
be truly terrible -- almost certainly quite a lot bigger
and slower than either your code or Rolf's.

Then use each form when it's most appropriate.

If you're after "massive optimization", I have my doubts
that either is likely to ever be "most appropriate".

The standard library isdigit implementation may well be
better than the ones I've given though. The standard
library will often use a table-driven approach, giving
code vaguely like this:

bool isdigit(int ch) {
    return (type_table[ch+1] & _Digit) != 0;
}

On older processors that ran about the same speed as
memory, this was often a big win. On current processors,
that's a lot less dependable. If the table is loaded
entirely into the cache, this will typically execute very
quickly. OTOH, if the data for the table has to be loaded
from main memory very often, this may easily be the
slowest of all.
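
For anyone curious what such a table might look like, here is a self-contained sketch of the idea. It is not how any particular library does it: the names (sketch, kDigit, TypeTable, is_digit) are invented, and it assumes EOF is -1 so that the +1 offset maps it to slot 0.

```cpp
#include <climits>

namespace sketch {

const unsigned char kDigit = 0x01;  // hypothetical classification bit

// One entry per unsigned char value, plus slot 0 for -1 (assumed EOF).
struct TypeTable {
    unsigned char flags[UCHAR_MAX + 2];
    TypeTable() : flags()  // zero-fill, then mark the ten digits
    {
        for (int c = '0'; c <= '9'; ++c) {
            flags[c + 1] = kDigit;
        }
    }
};

inline const TypeTable& table()
{
    static const TypeTable t;  // built once, on first use
    return t;
}

// Only valid for ch == -1 or ch in [0, UCHAR_MAX], like the C functions.
inline bool is_digit(int ch)
{
    return (table().flags[ch + 1] & kDigit) != 0;
}

}  // namespace sketch
```

A real table would carry more bits per entry (alpha, space, punct and so on), which is what makes one lookup serve all the classification functions.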

As an aside, applying 'const' at the top level as you've
done above is basically pointless -- since the char is
being passed by value, there's no way this function could
possibly modify the original, whether const qualified or
not.
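
A quick way to see this, assuming a C++11 compiler for static_assert and <type_traits>: the two declarations below declare one and the same function, because top-level const on a by-value parameter is not part of the function type.

```cpp
#include <type_traits>

// Both lines declare the same function.
bool IsDecimalDigit(char c);
bool IsDecimalDigit(char const c);

static_assert(std::is_same<bool(char), bool(char const)>::value,
              "top-level const does not change the function type");

// In the definition, const only stops the body from changing its own copy.
bool IsDecimalDigit(char const c)
{
    return c >= '0' && c <= '9';
}
```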
 
Michiel.Salters

Daz said:
Ok, I don't get it :(

I don't understand what 'Locale' does. On MSDN, a snippet from the
example is:
| > locale loc ( "German_Germany" );
| > bool result1 = isdigit ( 'L', loc );
| > bool result2 = isdigit ( '@', loc );
| > bool result3 = isdigit ( '3', loc );

I don't understand how 'loc' ties in with the rest. I have looked up
'Locale', and the examples I found have simply confused me more. Please
could someone set me straight on this?

In non-western countries, there are more digits than '0'-'9'. These
countries can use a non-standard locale. One facet of a locale determines
whether a char is a digit. The example above will seem fairly stupid, as
the Germans use exactly the same digits. The function isalpha is less
stupid for them; while the Germans use only '0'-'9', they do use more
letters than just 'A'-'Z' and 'a'-'z'. So, clearly there must be an isalpha
that accepts a German locale.
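
As a rough illustration of how 'loc' ties in: the two-argument std::isdigit and std::isalpha from <locale> ask the ctype facet of the locale you pass, instead of the global C locale. The sketch below is an assumption-laden example, not the MSDN code: the helper names are made up, and named locales may simply not exist on a given system, so it falls back to the classic "C" locale.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: tries the named locale and falls back to the
// classic "C" locale if the system does not recognise the name.
std::locale locale_or_classic(const char* name)
{
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Thin wrappers around the locale-aware overloads from <locale>.
bool is_digit_in(char c, const std::locale& loc) { return std::isdigit(c, loc); }
bool is_alpha_in(char c, const std::locale& loc) { return std::isalpha(c, loc); }
```

With the classic locale, is_digit_in('3', loc) is true and is_alpha_in('@', loc) is false; what a German locale reports for accented letters depends on the platform and the character encoding in use.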

HTH,
Michiel Salters
 
kwikius

kwikius said:
Each locale has 'facets' for money, numeric, time and such. Each facet
has its own set of operations. The following is meant to show currency
symbol for three countries. In fact it gets it wrong for U.K (Shows USD
on my system ). Further the strings are OS specific. IOW it aint really
much use IMO .

Just for the record: as my locale is United Kingdom, requesting my
currency using the default locale rather than via an OS-dependent
string does indeed give me my expected currency "GBP", so I had better
retract my statement saying locales ain't much use! Rather, it's my
knowledge of them that is lacking.

Bjarne Stroustrup on locales:
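
For reference, a sketch of what asking the default locale could look like. std::locale("") requests the user's preferred locale from the environment; the function names here are invented for the example, and the fallback to the classic locale is an assumption for systems where the environment's locale is not usable.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: the user's preferred locale, or the classic "C"
// locale if the environment names something the system cannot provide.
std::locale user_locale()
{
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// International currency symbol (e.g. a three-letter code) for a locale.
std::string international_currency_symbol(const std::locale& loc)
{
    return std::use_facet<std::moneypunct<char, true> >(loc).curr_symbol();
}
```

What international_currency_symbol(user_locale()) returns depends entirely on the machine's settings.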

http://public.research.att.com/~bs/3rd_loc.pdf

regards
Andy Little
 
Richard Herring

In non-western countries, there are more digits than '0'-'9'.

?!
There are countries where they use alternative representations for
'0'-'9', but is there anywhere (excluding ancient Rome etc.) that uses
more than ten of them?
 
