Advice please on stripping non-printing characters

Discussion in 'Perl Misc' started by J Taylor, Jan 6, 2005.

  1. J Taylor

    J Taylor Guest

    Hi!

    I've seen various examples of stripping non-printing (and non-ascii)
    chars from a string with something like this:

    $string =~ s/[\000-\037]/ /g;

    I've chosen to strip all but the printable ascii chars (using hex):

    $string =~ s/[^\x20-\x7f]/ /g;

    This *seems* to work better for my purposes. I'm wondering, though, if
    there are gotchas I'm missing.
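
To make the difference between the two substitutions concrete, here is a minimal sketch that runs both on the same input. The helper names and the sample string are assumptions made for illustration; they are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper names, used only for this illustration.

# First pattern from the post: blanks the control characters \000-\037 only.
sub blank_controls {
    my ($s) = @_;
    $s =~ s/[\000-\037]/ /g;
    return $s;
}

# Second pattern from the post: blanks everything outside \x20-\x7f.
sub blank_outside_ascii {
    my ($s) = @_;
    $s =~ s/[^\x20-\x7f]/ /g;
    return $s;
}

# Render a string as hex codes so the result is readable on any terminal.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split(//, $s);
}

# Assumed sample: "caf", a Latin-1 e-acute byte, a tab, then "bar".
my $in = "caf\xe9\tbar";
print hex_dump(blank_controls($in)), "\n";
print hex_dump(blank_outside_ascii($in)), "\n";
```

The first pattern leaves the \xe9 byte alone and blanks only the tab, while the second blanks both. Whether losing the high-bit characters is acceptable depends on the data.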

    Thanks.
     
    J Taylor, Jan 6, 2005

  2. Anno Siegel

    Anno Siegel Guest

    J Taylor <> wrote in comp.lang.perl.misc:
    > Hi!
    >
    > I've seen various examples of stripping non-printing (and non-ascii)
    > chars from a string with something like this:
    >
    > $string =~ s/[\000-\037]/ /g;


    If you're going to use explicit character codes, you might as well use

    $string =~ tr/\0-\037/ /; # untested

    > I've chosen to strip all but the printable ascii chars (using hex):
    >
    > $string =~ s/[^\x20-\x7f]/ /g;


    $string =~ tr/\x20-\x7f/ /c;
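
As a rough check that the tr/// form behaves like the s///g form, the sketch below applies both to the same string. The helper names and the sample input are hypothetical. It also relies on tr/// returning the number of characters it matched, which gives a count of replaced characters at no extra cost.

```perl
use strict;
use warnings;

# Hypothetical helper names, used only for this illustration.

# Regex form quoted from the original post.
sub blank_with_s {
    my ($s) = @_;
    $s =~ s/[^\x20-\x7f]/ /g;
    return $s;
}

# Transliteration form: /c complements the search list, and the single
# space in the replacement list is reused for every matched character.
sub blank_with_tr {
    my ($s) = @_;
    $s =~ tr/\x20-\x7f/ /c;
    return $s;
}

# tr/// returns how many characters it matched, so it can double as a
# counter. This works on a private copy of the argument.
sub count_blanked {
    my ($s) = @_;
    my $count = ($s =~ tr/\x20-\x7f/ /c);
    return $count;
}

# Assumed sample input: a tab, a NUL and a high-bit byte among letters.
my $in = "a\tb\x00c\xff";
print blank_with_s($in) eq blank_with_tr($in) ? "same\n" : "different\n";
print count_blanked($in), " character(s) replaced\n";
```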

    > This *seems* to work better for my purposes. I'm wondering, though, if
    > there are gotchas I'm missing.


    What gotchas? The difference is obvious, and if that's what you want
    "nonprinting character" to mean, who's going to stop you?

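    The approach has the disadvantage that it is hard to adapt to non-ASCII
    alphabets. POSIX character classes are locale-sensitive (under "use
    locale"), so

    $string =~ s/[^[:print:]]/ /g;

    has a better chance of working there. Note the negation: the class must
    match the characters that are *not* printable. There is also a Unicode
    equivalent (see perlre).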
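
A minimal sketch of the class-based variants follows. It assumes a Perl in which the \p{Print} property is available, and the helper names and sample string are hypothetical. The sample deliberately contains a character above U+00FF so that Perl treats the string as Unicode text. How bytes in the \x80-\xff range are handled otherwise depends on locale and encoding settings, which this sketch does not cover.

```perl
use strict;
use warnings;

# Hypothetical helper names, used only for this illustration.

# Negated POSIX class: matches anything that is not printable.
sub blank_nonprinting_posix {
    my ($s) = @_;
    $s =~ s/[[:^print:]]/ /g;
    return $s;
}

# Unicode property form: \P{Print} is the complement of \p{Print}.
sub blank_nonprinting_unicode {
    my ($s) = @_;
    $s =~ s/\P{Print}/ /g;
    return $s;
}

# Show code points instead of printing wide characters directly.
sub code_points {
    my ($s) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split(//, $s);
}

# Assumed sample: "a", WHITE SMILING FACE, a tab, "b", then DEL.
my $in = "a\x{263A}\tb\x{7f}";
print code_points(blank_nonprinting_posix($in)), "\n";
print code_points(blank_nonprinting_unicode($in)), "\n";
```

In both cases the smiley survives, while the tab and DEL become spaces.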

    Anno
     
    Anno Siegel, Jan 6, 2005

  3. Ilmari Karonen

    J Taylor <> wrote on 06.01.2005:
    >
    > I've chosen to strip all but the printable ascii chars (using hex):
    >
    > $string =~ s/[^\x20-\x7f]/ /g;
    >
    > This *seems* to work better for my purposes. I'm wondering, though, if
    > there are gotchas I'm missing.


    "\x7f" is not printable.

    --
    Ilmari Karonen
    To reply by e-mail, please replace ".invalid" with ".net" in address.
     
    Ilmari Karonen, Jan 15, 2005
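
Following up on the point about "\x7f": DEL is a control character, so the printable ASCII range ends at \x7e (the tilde). A minimal sketch of the adjusted transliteration, with a hypothetical helper name and sample input:

```perl
use strict;
use warnings;

# Hypothetical helper name, used only for this illustration.
# Keeps space through tilde (\x20-\x7e) and blanks everything else,
# including DEL (\x7f).
sub blank_non_printable_ascii {
    my ($s) = @_;
    $s =~ tr/\x20-\x7e/ /c;
    return $s;
}

# Assumed sample input: DEL and a tab between letters.
my $out = blank_non_printable_ascii("a\x7fb\tc");
print "[$out]\n";
```

The only change from the version quoted in the thread is the upper bound of the range.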
