How to get all digits, letters and punctuation characters in Perl?

Peng Yu · Dec 3, 2012

Hi,

In python, I can do the following to get a category of characters. But
I don't find a corresponding thing in perl. Could anybody let me know
if there is one? Thanks!

~/linux/test/python/man/library/string/printable$ cat main.py
#!/usr/bin/env python

import string
print string.digits + string.letters + string.punctuation
print string.printable

Regards,
Peng

Jürgen Exner · Dec 3, 2012

Peng Yu said:
Hi,

In python, I can do the following to get a category of characters. But
I don't find a corresponding thing in perl. Could anybody let me know
if there is one? Thanks!

~/linux/test/python/man/library/string/printable$ cat main.py
#!/usr/bin/env python

import string
print string.digits + string.letters + string.punctuation
print string.printable

I am smelling an x-y problem here. Why do you think you need this set of
characters?

If you just want to test if a certain character belongs to a specific
class then you can use POSIX character classes in an RE, e.g.
m/[[:alpha:]]/;
I suppose you could also use this test to enumerate all characters of
this class although this does seem to be somewhat backwards indeed.
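
As a rough sketch of that enumeration idea, assuming only the ASCII range (0 to 127) matters, as it does for Python's `string` constants, one could filter every code point through the POSIX classes. The variable names here are illustrative, not from any module:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: only ASCII is of interest, so test code points 0..127.
my @ascii = map { chr } 0 .. 127;

# Keep the characters that match each POSIX class.
my $digits  = join '', grep { /[[:digit:]]/ } @ascii;
my $letters = join '', grep { /[[:alpha:]]/ } @ascii;
my $punct   = join '', grep { /[[:punct:]]/ } @ascii;

print "$digits$letters$punct\n";
```

The letters come out in code point order, so uppercase precedes lowercase, which may differ from the order Python uses.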

jue

Peter Makholm · Dec 4, 2012

Peng Yu said:
In python, I can do the following to get a category of characters. But
I don't find a corresponding thing in perl. Could anybody let me know
if there is one? Thanks!

It depends on your definitions.

The Unicode standard defines 9293 letters, 350 digits, and 582
punctuation characters - and this is just the Basic Multilingual Plane.
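
A minimal sketch for checking such totals on your own perl, assuming "letters", "digits" and "punctuation" mean the general categories `\p{L}`, `\p{Nd}` and `\p{P}`. That mapping is an assumption, and the results depend on the Unicode version bundled with perl, so they may not match the figures above:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %count = (letters => 0, digits => 0, punctuation => 0);

# Walk the Basic Multilingual Plane only.
for my $cp (0 .. 0xFFFF) {
    next if $cp >= 0xD800 && $cp <= 0xDFFF;    # skip UTF-16 surrogates

    # Non-characters such as U+FFFE can trigger utf8 warnings.
    my $ch = do { no warnings 'utf8'; chr($cp) };
    utf8::upgrade($ch);    # force Unicode semantics for code points below 256

    $count{letters}++     if $ch =~ /\p{L}/;
    $count{digits}++      if $ch =~ /\p{Nd}/;
    $count{punctuation}++ if $ch =~ /\p{P}/;
}

printf "%-12s %d\n", $_, $count{$_} for sort keys %count;
```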

Tom Christiansen has made a tool called `unichars` to list characters
matching a number of conditions (available in the Unicode::Tussle
distribution). His code basically just iterates over all relevant
codepoints excluding a number of special cases:

# Excerpt: assumes "use Encode qw(decode);" and that
# $first_codepoint and $last_codepoint are already set.
for my $codepoint ( $first_codepoint .. $last_codepoint ) {

    # gaggy UTF-16 surrogates are invalid UTF-8 code points
    next if $codepoint >= 0xD800 && $codepoint <= 0xDFFF;

    # from utf8.c in perl src; must avoid fatals in 5.10
    next if $codepoint >= 0xFDD0 && $codepoint <= 0xFDEF;

    next if 0xFFFE == ($codepoint & 0xFFFE); # both FFFE and FFFF

    # debug("testing codepoint $codepoint");

    # see "Unicode non-character %s is illegal for interchange" in perldiag(1)
    $_ = do { no warnings "utf8"; chr($codepoint) };

    # fixes "the Unicode bug"
    unless (utf8::is_utf8($_)) {
        $_ = decode("iso-8859-1", $_);
    }

    # Test the given conditions, e.g. /\p{Digit}/
}

But given your python example, this is probably way overkill for what
you are trying.

//Makholm
