Advice please on stripping non-printing characters

Discussion in 'Perl Misc' started by J Taylor, Jan 6, 2005.

  1. J Taylor

    J Taylor Guest

    Hi!

    I've seen various examples of stripping non-printing (and non-ascii)
    chars from a string with something like this:

    $string =~ s/[\000-\037]/ /g;

    I've chosen to strip all but the printable ascii chars (using hex):

    $string =~ s/[^\x20-\x7f]/ /g;

    This *seems* to work better for my purposes, I'm wondering though if
    there are gotchas I'm missing.

    Thanks.
     
    J Taylor, Jan 6, 2005
    #1

  2. Anno Siegel

    Anno Siegel Guest

    J Taylor wrote in comp.lang.perl.misc:
    > Hi!
    >
    > I've seen various examples of stripping non-printing (and non-ascii)
    > chars from a string with something like this:
    >
    > $string =~ s/[\000-\037]/ /g;


    If you're going to use explicit character codes, you might as well use

    $string =~ tr/\0-\037/ /; # untested

    > I've chosen to strip all but the printable ascii chars (using hex):
    >
    > $string =~ s/[^\x20-\x7f]/ /g;


    $string =~ tr/\x20-\x7f/ /c;
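
    A minimal sketch checking that the tr/// forms behave like the
    original substitutions. The sample string and variable names are
    made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: a tab, an ESC, a DEL and a Latin-1 e-acute mixed into text.
my $sample = "a\tb\x1bc\x7fd\xe9e";

# Control characters \000-\037 only: s/// and tr/// give the same result.
(my $s_ctrl = $sample) =~ s/[\000-\037]/ /g;
(my $t_ctrl = $sample) =~ tr/\0-\037/ /;

# Everything outside \x20-\x7f: /c complements the search list, and the
# one-character replacement list is reused for every matched character.
(my $s_comp = $sample) =~ s/[^\x20-\x7f]/ /g;
(my $t_comp = $sample) =~ tr/\x20-\x7f/ /c;

print "control forms agree\n"    if $s_ctrl eq $t_ctrl;
print "complement forms agree\n" if $s_comp eq $t_comp;
```

    Since tr/// only maps characters and never runs the regex engine,
    it is usually the cheaper choice for fixed character ranges.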

    > This *seems* to work better for my purposes, I'm wondering though if
    > there are gotchas I'm missing.


    What gotchas? The difference is obvious, and if that's what you want
    "nonprinting character" to mean, who's going to stop you?

    The approach has the disadvantage that it is hard to adapt to non-ASCII
    alphabets. POSIX character classes are locale-sensitive (within the
    scope of "use locale"), so

    $string =~ s/[^[:print:]]/ /g;

    has better chances. There is also a Unicode equivalent (see perlre).
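
    A rough sketch of the difference, assuming a reasonably modern
    perl and a string that already holds decoded characters. The
    sample text and variable names are hypothetical, and \P{Print} is
    one spelling of the Unicode property; locale behaviour is not
    shown because it depends on the system.

```perl
use strict;
use warnings;

# Hypothetical sample: Greek alpha, e-acute, a tab, "ok", and a DEL.
my $text = "\x{3b1}\x{e9}\tok\x{7f}";

# ASCII-only rule: the non-ASCII letters are replaced along with the controls.
(my $ascii = $text) =~ s/[^\x20-\x7e]/ /g;

# Negated POSIX class: on a character string like this one it follows
# Unicode rules, so the letters survive and only tab and DEL are replaced.
(my $posix = $text) =~ s/[^[:print:]]/ /g;

# Unicode property: \P{Print} matches anything that is not printable.
(my $uni = $text) =~ s/\P{Print}/ /g;

print "POSIX class and Unicode property agree\n" if $posix eq $uni;
print "ASCII-only rule removed the letters\n"    if $ascii ne $uni;
```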

    Anno
     
    Anno Siegel, Jan 6, 2005
    #2

  3. Ilmari Karonen

    J Taylor wrote on 06.01.2005:
    >
    > I've chosen to strip all but the printable ascii chars (using hex):
    >
    > $string =~ s/[^\x20-\x7f]/ /g;
    >
    > This *seems* to work better for my purposes, I'm wondering though if
    > there are gotchas I'm missing.


    "\x7f" is not printable.
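
    A minimal sketch of the adjusted range, stopping at \x7e (tilde)
    so that DEL is replaced too. The sample string is made up for
    illustration.

```perl
use strict;
use warnings;

# Hypothetical sample containing a tab and a DEL (\x7f).
my $string = "tab\there, DEL\x7fhere";

# \x7e (tilde) is the last printable ASCII character; \x7f is DEL,
# so the keep-range ends one character earlier than in the original.
(my $clean = $string) =~ tr/\x20-\x7e/ /c;

print $clean eq "tab here, DEL here" ? "ok\n" : "not ok\n";
```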

    --
    Ilmari Karonen
    To reply by e-mail, please replace ".invalid" with ".net" in address.
     
    Ilmari Karonen, Jan 15, 2005
    #3