Help with Code

David Williams

Hello all,
I am asking for help with the following code:


if($old=~/checksum=(\d+)/)

I think the =~ is not equal to meaning if
$old (which is a filehandler) is not equal to something.

also, checksum, is that a UNIX command? checksum is not a variable
anywhere in the code I am debugging. I did a man page on checksum
and got nothing back.

Lastly, what is \d+ ? The PERL book says \d is digit but I don't
understand.

Thanks for any help!

David
 
Paul Lalli

David said:
I am asking for help with the following code:


if($old=~/checksum=(\d+)/)

I think the =~ is not equal to

No, that's not correct. =~ is the binding operator. It says to look
for a pattern (the right argument) within the string (the left
argument).

The "not equal to" operator is != for numbers, and ne for strings.

David said:
meaning if
$old (which is a filehandler) is not equal to something.

If $old is a filehandle, then you're not going to be able to do any
useful comparisons on it. You must read a string from the filehandle
and do your comparison or pattern matching on that string. You
generally do this with the < > operator, like so:

my $line = <$old>;

David said:
also, checksum, is that a UNIX command? checksum is not a variable
anywhere in the code I am debugging. I did a man page on checksum
and got nothing back.

In the above code, 'checksum' is simply a part of the pattern that is
being searched for in the variable $old. It is not a variable nor a
UNIX command.

David said:
Lastly, what is \d+ ? The PERL book says \d is digit but I don't
understand.

\d+ is a regular expression token that means "1 or more of any digit".


The entirety of the code above says:

"if the string contained in $old contains the string 'checksum',
followed by an equals sign, followed by 1 or more of any digit,
then..."

(it also stores the digits that it found in the variable $1, if the
pattern is actually successful).
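
As a minimal sketch of the above (the helper name extract_checksum and the sample string are made up for illustration, they are not from the original program), this shows the match binding $old to the pattern and $1 picking up the captured digits:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the digits captured after "checksum=",
# or undef (in scalar context) when the pattern does not match.
sub extract_checksum {
    my ($old) = @_;
    if ($old =~ /checksum=(\d+)/) {
        return $1;    # $1 holds what the first pair of parentheses matched
    }
    return;
}

# Sample input invented for this example.
my $sum = extract_checksum('size=42 checksum=3489 mode=0644');
print defined $sum ? "checksum is $sum\n" : "no checksum found\n";
```

Note that $1 is only meaningful after a successful match, which is why the sketch reads it inside the if block.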

You can read more about regular expressions by typing these into your
command window:
perldoc perlretut
perldoc perlre
perldoc perlreref

And you can find all the operators and what they mean by typing:
perldoc perlop

Hope this helps,
Paul Lalli
 
David Squire

David said:
Hello all,
I am asking for help with the following code:


if($old=~/checksum=(\d+)/)

I think the =~ is not equal to meaning if
$old (which is a filehandler) is not equal to something.

No. It is an operator that associates a string with a regular expression
for matching.

David said:
also, checksum, is that a UNIX command? checksum is not a variable
anywhere in the code I am debugging. I did a man page on checksum
and got nothing back.

That's because it is part of the regular expression pattern in the match
in that line. It's a literal pattern of characters.

David said:
Lastly, what is \d+ ? The PERL

There's no such language as "PERL". It's Perl.

David said:
book says \d is digit but I don't
understand.

You need to go back to your book, or to the documentation that comes
with Perl, and learn the basics about regular expressions and the Perl
operators that use them. See, for example:

perldoc perlretut


DS
 
Ian Wilson

David said:
Hello all,
I am asking for help with the following code:

I recommend you read an introductory book such as "Learning Perl".

You may like to review what is available at http://learn.perl.org

Also, at a command prompt you can review the documentation included with
perl - e.g. view the table of contents using the command `perldoc perltoc`

David said:
if($old=~/checksum=(\d+)/)

I think the =~ is not equal to meaning if
$old (which is a filehandler) is not equal to something.

I find it's not worth guessing blindly; reading the documentation works
well for me. Start at `perldoc perlop` and look for "Binding Operators".

David said:
also, checksum, is that a UNIX command? checksum is not a variable
anywhere in the code I am debugging. I did a man page on checksum
and got nothing back.

Your "checksum=" is fixed text in a Perl regular expression. It is
something that is being searched for in the contents of the variable
$old. In your case $old presumably contains something like
"lorem ipsum checksum=3489589713485 dolor sit amet" and your program
needs to extract the checksum value.

David said:
Lastly, what is \d+ ? The PERL book says \d is digit but I don't
understand.

\d matches "0", "1" ... "8" or "9"
+ means match one or more of the previous character
(\d+) 'captures' a sequence of one or more digits, for example an
integer such as "6" or "1238". Capturing means that perl stores the
matched text in a special variable for you to use later.

David said:
Thanks for any help!

Please follow the references before posting more questions of this sort.
 
anno4000

David Williams said:
Hello all,
I am asking for help with the following code:


if($old=~/checksum=(\d+)/)

I think the =~ is not equal to meaning if

No. "Not equal" can be expressed as "!=" or "ne" in Perl. The "=~"
you have here is a binding operator. In this case it means to match
the string $old against the pattern (/checksum=(\d+)/) on the right side.

David Williams said:
$old (which is a filehandler) is not equal to something.

It doesn't make sense for $old to be a filehandle (not filehandler).
The pattern match that happens is only useful with a string.
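
To illustrate the point, here is a rough sketch of reading strings from a filehandle first and matching each one afterwards. The helper name checksums_from_handle and the sample data are assumptions made for this example, and an in-memory filehandle stands in for whatever file the real program opens:

```perl
use strict;
use warnings;

# Hypothetical helper: read every line from an open filehandle and collect
# the number that follows "checksum=" on each line that has one.
sub checksums_from_handle {
    my ($fh) = @_;
    my @found;
    while (my $line = <$fh>) {
        # The match is applied to the string just read, not to the handle.
        push @found, $1 if $line =~ /checksum=(\d+)/;
    }
    return @found;
}

# Invented sample data, opened as an in-memory file.
my $data = "name=a checksum=111\nno match on this line\nname=b checksum=222\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print join(', ', checksums_from_handle($fh)), "\n";
close $fh;
```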

David Williams said:
also, checksum, is that a UNIX command? checksum is not a variable
anywhere in the code I am debugging. I did a man page on checksum
and got nothing back.

It may or may not be a Unix command, that doesn't matter. In your
code it is just part of the pattern to match.

David Williams said:
Lastly, what is \d+ ? The PERL book says \d is digit but I don't
understand.

You need to understand regular expressions, which are Perl's (and many
other languages') way of expressing string patterns. The specific
pattern /checksum=(\d+)/ matches any string that contains the characters
"checksum=" immediately followed by one or more digits.

"hihi haha checksum=123 hoho"

would be an example.

David Williams said:
Thanks for any help!

You won't be able to debug a Perl program (even a short one) with
ad-hoc explanations given on Usenet. You'll either have to learn
enough Perl to understand the program or get someone else to do it.

Anno
 
Dr.Ruud

Ian Wilson wrote:
\d matches "0", "1" ... "8" or "9"

Last time I checked, \d matched 268 different characters. Dear
programmer, if you mean [0-9], then write [0-9].
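
A small sketch of the difference, for anyone who wants to try it. The helper name digit_report is made up for this example, and the /a modifier it uses is an assumption about the Perl in use: it only exists in Perl 5.14 and later, so it will not compile on older versions.

```perl
use strict;
use warnings;

# Hypothetical helper: report which digit patterns match a one-character string.
sub digit_report {
    my ($char) = @_;
    return {
        backslash_d   => ($char =~ /\d/    ? 1 : 0),
        ascii_class   => ($char =~ /[0-9]/ ? 1 : 0),
        backslash_d_a => ($char =~ /\d/a   ? 1 : 0),    # /a needs Perl 5.14+
    };
}

for my $pair ([ 'ASCII 7', '7' ], [ 'ARABIC-INDIC 7', "\x{0667}" ]) {
    my ($label, $char) = @$pair;
    my $r = digit_report($char);
    printf "%-15s \\d:%d  [0-9]:%d  \\d with /a:%d\n",
        $label, @{$r}{qw(backslash_d ascii_class backslash_d_a)};
}
```

The ARABIC-INDIC character should be accepted by plain \d only, while [0-9] and \d under /a stay restricted to the ten ASCII digits.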
 
Paul Lalli

Dr.Ruud said:
Ian Wilson wrote:
\d matches "0", "1" ... "8" or "9"

Last time I checked, \d matched 268 different characters. Dear
programmer, if you mean [0-9], then write [0-9].

Er. Huh? I realize that \w will match not only 'a'..'z', 'A'..'Z',
'0'..'9', and _, and that all the "international" letters such as á
and Ñ are included as well, depending on locale. But other than the
ten characters Ian implied, what else does \d match?

I did take a look at `perldoc perlreref`, which in turn referred me to
`perldoc perllocale`, but I confess that I don't get it - I'm extremely
naïve when it comes to locales...

Paul Lalli
 
Dr.Ruud

Paul Lalli wrote:
Dr.Ruud:
Ian Wilson:
\d matches "0", "1" ... "8" or "9"

Last time I checked, \d matched 268 different characters. Dear
programmer, if you mean [0-9], then write [0-9].

Er. Huh? I realize that \w will match not only 'a'..'z', 'A'..'Z',
'0'..'9', and _, and that all the "international" letters such as á
and Ñ are included as well, depending on locale. But other than the
ten characters Ian implied, what else does \d match?

I did take a look at `perldoc perlreref`, which in turn referred me to
`perldoc perllocale`, but I confess that I don't get it - I'm
extremely naïve when it comes to locales...

The following tries to promote Data::Alias as well:

#!/usr/bin/perl
# Id: unicount.pl
# Subject: show some Unicode statistics

use warnings ;
use strict ;
use Data::Alias ;

binmode STDOUT, ':utf8' ;

my @table =
# +--Name------+---qRegexp--------+-C-+-L-+-U-+
(
[ 'xdigit' , qr/[[:xdigit:]]/ , 0 , 0 , 0 ] ,
[ 'ascii' , qr/[[:ascii:]]/ , 0 , 0 , 0 ] ,
[ '\\d' , qr/\d/ , 0 , 0 , 0 ] ,
[ 'digit' , qr/[[:digit:]]/ , 0 , 0 , 0 ] ,
[ 'IsNumber' , qr/\p{IsNumber}/ , 0 , 0 , 0 ] ,
[ 'alpha' , qr/[[:alpha:]]/ , 0 , 0 , 0 ] ,
[ 'alnum' , qr/[[:alnum:]]/ , 0 , 0 , 0 ] ,
[ 'word' , qr/[[:word:]]/ , 0 , 0 , 0 ] ,
[ 'graph' , qr/[[:graph:]]/ , 0 , 0 , 0 ] ,
[ 'print' , qr/[[:print:]]/ , 0 , 0 , 0 ] ,
[ 'blank' , qr/[[:blank:]]/ , 0 , 0 , 0 ] ,
[ 'space' , qr/[[:space:]]/ , 0 , 0 , 0 ] ,
[ 'punct' , qr/[[:punct:]]/ , 0 , 0 , 0 ] ,
[ 'cntrl' , qr/[[:cntrl:]]/ , 0 , 0 , 0 ] ,
) ;

my @codepoints =
(
0x0000 .. 0xD7FF,
0xE000 .. 0xFDCF,
0xFDF0 .. 0xFFFD,
0x10000 .. 0x1FFFD,
0x20000 .. 0x2FFFD,
# 0x30000 .. 0x3FFFD, # etc.
) ;

for my $row ( @table )
{
    alias my ($name, $qrx, $count, $lower, $upper) = @$row ;

    printf "\n%s\n", $name ;

    my $n = 0 ;

    for ( @codepoints )
    {
        local $_ = chr ; # int-2-char conversion
        $n++ ;

        if ( /$qrx/ )
        {
            $count++ ;
            $lower++ if / [[:lower:]] /x ;
            $upper++ if / [[:upper:]] /x ;
        }
    }

    my $show_lower_upper =
        ($lower || $upper)
        ? sprintf( ' (lower:%6d, upper:%6d)'
                 , $lower
                 , $upper
                 )
        : '' ;

    printf "%6d /%6d =%7.3f%%%s\n"
         , $count
         , $n
         , 100 * $count / $n
         , $show_lower_upper
}

print "\n" ;

__END__


Results (v5.8.6, i386-freebsd-64int)

xdigit
22 /194522 = 0.011% (lower: 6, upper: 6)

ascii
128 /194522 = 0.066% (lower: 26, upper: 26)

\d
268 /194522 = 0.138%

digit
268 /194522 = 0.138%

IsNumber
612 /194522 = 0.315%

alpha
91183 /194522 = 46.875% (lower: 1380, upper: 1160)

alnum
91451 /194522 = 47.013% (lower: 1380, upper: 1160)

word
91801 /194522 = 47.193% (lower: 1380, upper: 1160)

graph
102330 /194522 = 52.606% (lower: 1380, upper: 1160)

print
102349 /194522 = 52.616% (lower: 1380, upper: 1160)

blank
18 /194522 = 0.009%

space
24 /194522 = 0.012%

punct
374 /194522 = 0.192%

cntrl
6473 /194522 = 3.328%
 
Peter J. Holzer

Dr.Ruud said:
Ian Wilson wrote:
\d matches "0", "1" ... "8" or "9"

Last time I checked, \d matched 268 different characters. Dear
programmer, if you mean [0-9], then write [0-9].

Er. Huh? I realize that \w will match not only 'a'..'z', 'A'..'Z',
'0'..'9', and _, and that all the "international" letters such as á
and Ñ are included as well, depending on locale. But other than the
ten characters Ian implied, what else does \d match?

The digits in all the non-latin scripts. Try:


#!/usr/bin/perl
use warnings;
use strict;
use charnames qw();

# Note: the last upper bound, 11_0000, is decimal 110000 (0x1ADB0), not 0x11_0000.
for my $c (0x0000 .. 0xD7FF,
           0xE000 .. 0xFDCF,
           0xFDF0 .. 0xFFFD,
           0x1_0000 .. 11_0000
          ) {
    my $s = pack 'U', $c;
    if ($s =~ /\d/) {
        printf ("%5d %5x %s %s\n", $c, $c, $s, charnames::viacode($c));
    }
}

On my system this prints 218 digits:

48 30 0 DIGIT ZERO
49 31 1 DIGIT ONE
50 32 2 DIGIT TWO
51 33 3 DIGIT THREE
52 34 4 DIGIT FOUR
53 35 5 DIGIT FIVE
54 36 6 DIGIT SIX
55 37 7 DIGIT SEVEN
56 38 8 DIGIT EIGHT
57 39 9 DIGIT NINE
1632 660 ٠ ARABIC-INDIC DIGIT ZERO
1633 661 ١ ARABIC-INDIC DIGIT ONE
1634 662 ٢ ARABIC-INDIC DIGIT TWO
1635 663 ٣ ARABIC-INDIC DIGIT THREE
1636 664 ٤ ARABIC-INDIC DIGIT FOUR
1637 665 ٥ ARABIC-INDIC DIGIT FIVE
1638 666 ٦ ARABIC-INDIC DIGIT SIX
1639 667 ٧ ARABIC-INDIC DIGIT SEVEN
1640 668 ٨ ARABIC-INDIC DIGIT EIGHT
1641 669 ٩ ARABIC-INDIC DIGIT NINE
1776 6f0 ۰ EXTENDED ARABIC-INDIC DIGIT ZERO
1777 6f1 ۱ EXTENDED ARABIC-INDIC DIGIT ONE
1778 6f2 ۲ EXTENDED ARABIC-INDIC DIGIT TWO
1779 6f3 ۳ EXTENDED ARABIC-INDIC DIGIT THREE
1780 6f4 ۴ EXTENDED ARABIC-INDIC DIGIT FOUR
1781 6f5 ۵ EXTENDED ARABIC-INDIC DIGIT FIVE
1782 6f6 ۶ EXTENDED ARABIC-INDIC DIGIT SIX
1783 6f7 ۷ EXTENDED ARABIC-INDIC DIGIT SEVEN
1784 6f8 ۸ EXTENDED ARABIC-INDIC DIGIT EIGHT
1785 6f9 ۹ EXTENDED ARABIC-INDIC DIGIT NINE
2406 966 ० DEVANAGARI DIGIT ZERO
2407 967 १ DEVANAGARI DIGIT ONE
2408 968 २ DEVANAGARI DIGIT TWO
2409 969 ३ DEVANAGARI DIGIT THREE
2410 96a ४ DEVANAGARI DIGIT FOUR
2411 96b ५ DEVANAGARI DIGIT FIVE
2412 96c ६ DEVANAGARI DIGIT SIX
2413 96d ७ DEVANAGARI DIGIT SEVEN
2414 96e ८ DEVANAGARI DIGIT EIGHT
2415 96f ९ DEVANAGARI DIGIT NINE
2534 9e6 ০ BENGALI DIGIT ZERO
2535 9e7 ১ BENGALI DIGIT ONE
2536 9e8 ২ BENGALI DIGIT TWO
2537 9e9 ৩ BENGALI DIGIT THREE
2538 9ea ৪ BENGALI DIGIT FOUR
2539 9eb ৫ BENGALI DIGIT FIVE
2540 9ec ৬ BENGALI DIGIT SIX
2541 9ed ৭ BENGALI DIGIT SEVEN
2542 9ee ৮ BENGALI DIGIT EIGHT
2543 9ef ৯ BENGALI DIGIT NINE
2662 a66 ੦ GURMUKHI DIGIT ZERO
2663 a67 ੧ GURMUKHI DIGIT ONE
2664 a68 ੨ GURMUKHI DIGIT TWO
2665 a69 ੩ GURMUKHI DIGIT THREE
2666 a6a ੪ GURMUKHI DIGIT FOUR
2667 a6b ੫ GURMUKHI DIGIT FIVE
2668 a6c ੬ GURMUKHI DIGIT SIX
2669 a6d ੭ GURMUKHI DIGIT SEVEN
2670 a6e ੮ GURMUKHI DIGIT EIGHT
2671 a6f ੯ GURMUKHI DIGIT NINE
2790 ae6 ૦ GUJARATI DIGIT ZERO
2791 ae7 ૧ GUJARATI DIGIT ONE
2792 ae8 ૨ GUJARATI DIGIT TWO
2793 ae9 ૩ GUJARATI DIGIT THREE
2794 aea ૪ GUJARATI DIGIT FOUR
2795 aeb ૫ GUJARATI DIGIT FIVE
2796 aec ૬ GUJARATI DIGIT SIX
2797 aed ૭ GUJARATI DIGIT SEVEN
2798 aee ૮ GUJARATI DIGIT EIGHT
2799 aef ૯ GUJARATI DIGIT NINE
2918 b66 ୦ ORIYA DIGIT ZERO
2919 b67 ୧ ORIYA DIGIT ONE
2920 b68 ୨ ORIYA DIGIT TWO
2921 b69 ୩ ORIYA DIGIT THREE
2922 b6a ୪ ORIYA DIGIT FOUR
2923 b6b ୫ ORIYA DIGIT FIVE
2924 b6c ୬ ORIYA DIGIT SIX
2925 b6d ୭ ORIYA DIGIT SEVEN
2926 b6e ୮ ORIYA DIGIT EIGHT
2927 b6f ୯ ORIYA DIGIT NINE
3047 be7 ௧ TAMIL DIGIT ONE
3048 be8 ௨ TAMIL DIGIT TWO
3049 be9 ௩ TAMIL DIGIT THREE
3050 bea ௪ TAMIL DIGIT FOUR
3051 beb ௫ TAMIL DIGIT FIVE
3052 bec ௬ TAMIL DIGIT SIX
3053 bed ௭ TAMIL DIGIT SEVEN
3054 bee ௮ TAMIL DIGIT EIGHT
3055 bef ௯ TAMIL DIGIT NINE
3174 c66 ౦ TELUGU DIGIT ZERO
3175 c67 ౧ TELUGU DIGIT ONE
3176 c68 ౨ TELUGU DIGIT TWO
3177 c69 ౩ TELUGU DIGIT THREE
3178 c6a ౪ TELUGU DIGIT FOUR
3179 c6b ౫ TELUGU DIGIT FIVE
3180 c6c ౬ TELUGU DIGIT SIX
3181 c6d ౭ TELUGU DIGIT SEVEN
3182 c6e ౮ TELUGU DIGIT EIGHT
3183 c6f ౯ TELUGU DIGIT NINE
3302 ce6 ೦ KANNADA DIGIT ZERO
3303 ce7 ೧ KANNADA DIGIT ONE
3304 ce8 ೨ KANNADA DIGIT TWO
3305 ce9 ೩ KANNADA DIGIT THREE
3306 cea ೪ KANNADA DIGIT FOUR
3307 ceb ೫ KANNADA DIGIT FIVE
3308 cec ೬ KANNADA DIGIT SIX
3309 ced ೭ KANNADA DIGIT SEVEN
3310 cee ೮ KANNADA DIGIT EIGHT
3311 cef ೯ KANNADA DIGIT NINE
3430 d66 ൦ MALAYALAM DIGIT ZERO
3431 d67 ൧ MALAYALAM DIGIT ONE
3432 d68 ൨ MALAYALAM DIGIT TWO
3433 d69 ൩ MALAYALAM DIGIT THREE
3434 d6a ൪ MALAYALAM DIGIT FOUR
3435 d6b ൫ MALAYALAM DIGIT FIVE
3436 d6c ൬ MALAYALAM DIGIT SIX
3437 d6d ൭ MALAYALAM DIGIT SEVEN
3438 d6e ൮ MALAYALAM DIGIT EIGHT
3439 d6f ൯ MALAYALAM DIGIT NINE
3664 e50 ๐ THAI DIGIT ZERO
3665 e51 ๑ THAI DIGIT ONE
3666 e52 ๒ THAI DIGIT TWO
3667 e53 ๓ THAI DIGIT THREE
3668 e54 ๔ THAI DIGIT FOUR
3669 e55 ๕ THAI DIGIT FIVE
3670 e56 ๖ THAI DIGIT SIX
3671 e57 ๗ THAI DIGIT SEVEN
3672 e58 ๘ THAI DIGIT EIGHT
3673 e59 ๙ THAI DIGIT NINE
3792 ed0 ໐ LAO DIGIT ZERO
3793 ed1 ໑ LAO DIGIT ONE
3794 ed2 ໒ LAO DIGIT TWO
3795 ed3 ໓ LAO DIGIT THREE
3796 ed4 ໔ LAO DIGIT FOUR
3797 ed5 ໕ LAO DIGIT FIVE
3798 ed6 ໖ LAO DIGIT SIX
3799 ed7 ໗ LAO DIGIT SEVEN
3800 ed8 ໘ LAO DIGIT EIGHT
3801 ed9 ໙ LAO DIGIT NINE
3872 f20 ༠ TIBETAN DIGIT ZERO
3873 f21 ༡ TIBETAN DIGIT ONE
3874 f22 ༢ TIBETAN DIGIT TWO
3875 f23 ༣ TIBETAN DIGIT THREE
3876 f24 ༤ TIBETAN DIGIT FOUR
3877 f25 ༥ TIBETAN DIGIT FIVE
3878 f26 ༦ TIBETAN DIGIT SIX
3879 f27 ༧ TIBETAN DIGIT SEVEN
3880 f28 ༨ TIBETAN DIGIT EIGHT
3881 f29 ༩ TIBETAN DIGIT NINE
4160 1040 ၀ MYANMAR DIGIT ZERO
4161 1041 ၁ MYANMAR DIGIT ONE
4162 1042 ၂ MYANMAR DIGIT TWO
4163 1043 ၃ MYANMAR DIGIT THREE
4164 1044 ၄ MYANMAR DIGIT FOUR
4165 1045 ၅ MYANMAR DIGIT FIVE
4166 1046 ၆ MYANMAR DIGIT SIX
4167 1047 ၇ MYANMAR DIGIT SEVEN
4168 1048 ၈ MYANMAR DIGIT EIGHT
4169 1049 ၉ MYANMAR DIGIT NINE
4969 1369 ፩ ETHIOPIC DIGIT ONE
4970 136a ፪ ETHIOPIC DIGIT TWO
4971 136b ፫ ETHIOPIC DIGIT THREE
4972 136c ፬ ETHIOPIC DIGIT FOUR
4973 136d ፭ ETHIOPIC DIGIT FIVE
4974 136e ፮ ETHIOPIC DIGIT SIX
4975 136f ፯ ETHIOPIC DIGIT SEVEN
4976 1370 ፰ ETHIOPIC DIGIT EIGHT
4977 1371 ፱ ETHIOPIC DIGIT NINE
6112 17e0 ០ KHMER DIGIT ZERO
6113 17e1 ១ KHMER DIGIT ONE
6114 17e2 ២ KHMER DIGIT TWO
6115 17e3 ៣ KHMER DIGIT THREE
6116 17e4 ៤ KHMER DIGIT FOUR
6117 17e5 ៥ KHMER DIGIT FIVE
6118 17e6 ៦ KHMER DIGIT SIX
6119 17e7 ៧ KHMER DIGIT SEVEN
6120 17e8 ៨ KHMER DIGIT EIGHT
6121 17e9 ៩ KHMER DIGIT NINE
6160 1810 ᠐ MONGOLIAN DIGIT ZERO
6161 1811 ᠑ MONGOLIAN DIGIT ONE
6162 1812 ᠒ MONGOLIAN DIGIT TWO
6163 1813 ᠓ MONGOLIAN DIGIT THREE
6164 1814 ᠔ MONGOLIAN DIGIT FOUR
6165 1815 ᠕ MONGOLIAN DIGIT FIVE
6166 1816 ᠖ MONGOLIAN DIGIT SIX
6167 1817 ᠗ MONGOLIAN DIGIT SEVEN
6168 1818 ᠘ MONGOLIAN DIGIT EIGHT
6169 1819 ᠙ MONGOLIAN DIGIT NINE
6470 1946 ᥆ LIMBU DIGIT ZERO
6471 1947 ᥇ LIMBU DIGIT ONE
6472 1948 ᥈ LIMBU DIGIT TWO
6473 1949 ᥉ LIMBU DIGIT THREE
6474 194a ᥊ LIMBU DIGIT FOUR
6475 194b ᥋ LIMBU DIGIT FIVE
6476 194c ᥌ LIMBU DIGIT SIX
6477 194d ᥍ LIMBU DIGIT SEVEN
6478 194e ᥎ LIMBU DIGIT EIGHT
6479 194f ᥏ LIMBU DIGIT NINE
65296 ff10 ０ FULLWIDTH DIGIT ZERO
65297 ff11 １ FULLWIDTH DIGIT ONE
65298 ff12 ２ FULLWIDTH DIGIT TWO
65299 ff13 ３ FULLWIDTH DIGIT THREE
65300 ff14 ４ FULLWIDTH DIGIT FOUR
65301 ff15 ５ FULLWIDTH DIGIT FIVE
65302 ff16 ６ FULLWIDTH DIGIT SIX
65303 ff17 ７ FULLWIDTH DIGIT SEVEN
65304 ff18 ８ FULLWIDTH DIGIT EIGHT
65305 ff19 ９ FULLWIDTH DIGIT NINE
66720 104a0 𐒠 OSMANYA DIGIT ZERO
66721 104a1 𐒡 OSMANYA DIGIT ONE
66722 104a2 𐒢 OSMANYA DIGIT TWO
66723 104a3 𐒣 OSMANYA DIGIT THREE
66724 104a4 𐒤 OSMANYA DIGIT FOUR
66725 104a5 𐒥 OSMANYA DIGIT FIVE
66726 104a6 𐒦 OSMANYA DIGIT SIX
66727 104a7 𐒧 OSMANYA DIGIT SEVEN
66728 104a8 𐒨 OSMANYA DIGIT EIGHT
66729 104a9 𐒩 OSMANYA DIGIT NINE

hp
 
Paul Lalli

Peter said:
Dr.Ruud said:
Ian Wilson wrote:

\d matches "0", "1" ... "8" or "9"

Last time I checked, \d matched 268 different characters. Dear
programmer, if you mean [0-9], then write [0-9].

Er. Huh? I realize that \w will match not only 'a'..'z', 'A'..'Z',
'0'..'9', and _, and that all the "international" letters such as á
and Ñ are included as well, depending on locale. But other than the
ten characters Ian implied, what else does \d match?

The digits in all the non-latin scripts. Try:

It absolutely never even occurred to me that other characters would be
considered digits. Like I said, I'm depressingly un-informed about
locales and internationalization. Thanks for the information.

Paul Lalli
 
Ian Wilson

Dr.Ruud said:
Ian Wilson wrote:



Last time I checked, \d matched 268 different characters.

Both the above statements are true :)
All 268 are characters, all are digits, few are numeric!

Dr.Ruud said:
Dear programmer, if you mean [0-9], then write [0-9].

No one has really followed up on this in the context set by the OP.

Assuming that some program writes a decimal checksum to a file and that
checksum contains non-ASCII numerals, would Perl arithmetic do the
right thing?

-----------------------8<-----------------------------
#!/usr/bin/perl
#
use warnings;
use strict;

checksum('foo 1234 bar');
checksum("fie \x{0101} fum");
checksum("baz \x{0661}\x{0662}\x{0663}\x{0664} qux");

sub checksum {
    my $text = shift;
    if ($text =~ /(\d+)/) {
        print "$1 + 1 = ", $1+1, "\n";
    } else {
        print "no numbers in '$text' \n";
    }
}
-----------------------8<-----------------------------
$ perl -v
This is perl, v5.8.0 built for i386-linux-thread-multi

$ perl numbers.pl
1234 + 1 = 1235
Wide character in print at numbers.pl line 15.
no numbers in 'fie Ä fum'
Argument "\x{661}\x{662}..." isn't numeric in addition (+) at numbers.pl
line 13.
Wide character in print at numbers.pl line 13.
١٢٣٤ + 1 = 1

(Actually the last line looked different before I cut & pasted it, it
ended " + 1 = 1")

Why doesn't perl handle any unicode digit named "XXXX DIGIT NINE" as
numerically equivalent to DIGIT NINE?
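
One possible workaround, offered only as a sketch: Perl's string-to-number conversion recognises just the ASCII digits, so the captured text has to be converted explicitly before doing arithmetic. The example below assumes Perl 5.14 or later, where the core module Unicode::UCD provides num(); it is not available on the v5.8.0 used above. The helper name numeric_checksum is made up for illustration.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);    # core module; num() assumes Perl 5.14 or later

# Hypothetical helper: capture a run of \d characters and convert it to a
# number that Perl arithmetic can use. Yields undef if nothing usable is found.
sub numeric_checksum {
    my ($text) = @_;
    return unless $text =~ /(\d+)/;
    return num($1);
}

# Same ARABIC-INDIC test string as in the script above.
my $n = numeric_checksum("baz \x{0661}\x{0662}\x{0663}\x{0664} qux");
print defined $n ? $n + 1 : 'not numeric', "\n";
```

Printing only the converted number also avoids the "Wide character in print" warning, since no non-ASCII text reaches STDOUT.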
 
