How to get all digits, letters and punctuation characters in Perl?

Discussion in 'Perl Misc' started by Peng Yu, Dec 4, 2012.

  1. Peng Yu

    Peng Yu Guest

    Hi,

    In Python, I can do the following to get a category of characters, but
    I can't find a corresponding feature in Perl. Could anybody let me know
    if there is one? Thanks!

    ~/linux/test/python/man/library/string/printable$ cat main.py
    #!/usr/bin/env python

    import string
    print string.digits + string.letters + string.punctuation
    print string.printable



    Regards,
    Peng
     
    Peng Yu, Dec 4, 2012
    #1

  2. Peng Yu <> wrote:
    >Hi,
    >
    >In Python, I can do the following to get a category of characters, but
    >I can't find a corresponding feature in Perl. Could anybody let me know
    >if there is one? Thanks!
    >
    >~/linux/test/python/man/library/string/printable$ cat main.py
    >#!/usr/bin/env python
    >
    >import string
    >print string.digits + string.letters + string.punctuation
    >print string.printable


    I smell an XY problem here. Why do you think you need this set of
    characters?

    If you just want to test whether a character belongs to a specific
    class, you can use POSIX character classes in a regular expression, e.g.
    m/[[:alpha:]]/;
    I suppose you could also use this test to enumerate all characters of
    the class, although that does seem somewhat backwards.
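
    As a minimal sketch of that enumeration idea, the script below tests
    every 7-bit ASCII character against three POSIX classes and joins the
    matches into strings. Restricting the candidates to ASCII is an
    assumption made to stay close to the Python example, and the variable
    names are only illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Candidate characters: the 7-bit ASCII range only (an assumption; widen
# the range if you need more than ASCII).
my @ascii = map { chr } 0 .. 127;

# Keep the characters that match each POSIX class, in code point order.
my $digits      = join '', grep { /[[:digit:]]/ } @ascii;
my $letters     = join '', grep { /[[:alpha:]]/ } @ascii;
my $punctuation = join '', grep { /[[:punct:]]/ } @ascii;

print "$digits\n$letters\n$punctuation\n";
```

    The letters come out in code point order (uppercase before lowercase),
    so the ordering need not match what Python prints.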

    jue
     
    Jürgen Exner, Dec 4, 2012
    #2

  3. Peng Yu <> writes:

    > In Python, I can do the following to get a category of characters, but
    > I can't find a corresponding feature in Perl. Could anybody let me know
    > if there is one? Thanks!


    It depends on your definitions.

    The Unicode standard defines 9293 letters, 350 digits, and 582
    punctuation characters - and this is just the Basic Multilingual Plane.

    Tom Christiansen has made a tool called `unichars` to list characters
    matching a number of conditions (available in the Unicode::Tussle
    distribution). His code basically just iterates over all relevant
    codepoints, excluding a number of special cases:

    for my $codepoint ( $first_codepoint .. $last_codepoint ) {

        # gaggy UTF-16 surrogates are invalid UTF-8 code points
        next if $codepoint >= 0xD800 && $codepoint <= 0xDFFF;

        # from utf8.c in perl src; must avoid fatals in 5.10
        next if $codepoint >= 0xFDD0 && $codepoint <= 0xFDEF;

        next if 0xFFFE == ($codepoint & 0xFFFE); # both FFFE and FFFF

        # debug("testing codepoint $codepoint");

        # see "Unicode non-character %s is illegal for interchange" in
        # perldiag(1)
        $_ = do { no warnings "utf8"; chr($codepoint) };

        # fixes "the Unicode bug"; decode() comes from the Encode module
        unless (utf8::is_utf8($_)) {
            $_ = decode("iso-8859-1", $_);
        }

        # Test the given conditions, e.g. /\p{Digit}/
    }

    But given your Python example, this is probably overkill for what
    you are trying to do.
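
    For anyone who does want the Unicode view, here is a self-contained
    sketch in the spirit of the loop above. It is not the `unichars` code:
    it simply counts code points in the Basic Multilingual Plane that match
    \p{L}, \p{Digit} and \p{P}. Mapping "letters", "digits" and
    "punctuation" onto those three properties is an assumption, and the
    totals depend on the Unicode version your perl ships with.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %count = (letters => 0, digits => 0, punctuation => 0);

for my $codepoint (0x0000 .. 0xFFFF) {
    # Skip UTF-16 surrogates and the noncharacters, as in the loop above.
    next if $codepoint >= 0xD800 && $codepoint <= 0xDFFF;
    next if $codepoint >= 0xFDD0 && $codepoint <= 0xFDEF;
    next if 0xFFFE == ($codepoint & 0xFFFE);

    my $char = chr $codepoint;
    utf8::upgrade($char);    # force character semantics for 0x80 .. 0xFF

    $count{letters}++     if $char =~ /\p{L}/;
    $count{digits}++      if $char =~ /\p{Digit}/;
    $count{punctuation}++ if $char =~ /\p{P}/;
}

# Print one "name count" line per category.
printf "%-12s %d\n", $_, $count{$_} for sort keys %count;
```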

    //Makholm
     
    Peter Makholm, Dec 4, 2012
    #3
