Advice please on stripping non-printing characters

J Taylor

Hi!

I've seen various examples of stripping non-printing (and non-ascii)
chars from a string with something like this:

$string =~ s/[\000-\037]/ /g;

I've chosen to strip all but the printable ascii chars (using hex):

$string =~ s/[^\x20-\x7f]/ /g;

This *seems* to work better for my purposes. I'm wondering, though, whether
there are gotchas I'm missing.

Thanks.
 
Anno Siegel

J Taylor said:
> Hi!
>
> I've seen various examples of stripping non-printing (and non-ascii)
> chars from a string with something like this:
>
> $string =~ s/[\000-\037]/ /g;

If you're going to use explicit character codes, you might as well use

$string =~ tr/\0-\037/ /; # untested

> I've chosen to strip all but the printable ascii chars (using hex):
>
> $string =~ s/[^\x20-\x7f]/ /g;

$string =~ tr/\x20-\x7f/ /c;

> This *seems* to work better for my purposes. I'm wondering, though, whether
> there are gotchas I'm missing.

What gotchas? The difference is obvious, and if that's what you want
"nonprinting character" to mean, who's going to stop you?
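To make the difference concrete, here is a minimal sketch that wraps the two tr/// forms above in subs and runs them on the same input. The sub names and the sample string are made up for illustration; they are not from the original posts.

```perl
use strict;
use warnings;

# Hypothetical helper names, used only for this illustration.

# Blank only the control characters 0x00-0x1F.
sub blank_controls {
    my ($s) = @_;
    $s =~ tr/\0-\037/ /;
    return $s;
}

# Blank everything outside 0x20-0x7F (the range from the original post).
sub blank_outside_range {
    my ($s) = @_;
    $s =~ tr/\x20-\x7f/ /c;
    return $s;
}

# Sample with a high-bit character (\xe9), a tab, and a newline.
my $sample = "caf\xe9\tau lait\n";

# Both blank the tab and the newline; only the second also blanks \xe9.
my $first  = blank_controls($sample);
my $second = blank_outside_range($sample);
printf "%s\n", $first  eq "caf\xe9 au lait " ? "ok" : "not ok";
printf "%s\n", $second eq "caf  au lait "    ? "ok" : "not ok";
```

Neither form touches "\x7f", since it lies above the first range and inside the second.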

The approach has the disadvantage that it is hard to adapt to non-ASCII
alphabets. POSIX character classes are locale-sensitive, so

$string =~ s/[^[:print:]]/ /g;

has a better chance of working. There is also a Unicode equivalent (see perlre).

Anno
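As a rough sketch of the two class-based variants mentioned above: the first sub uses the negated POSIX class, the second uses the Unicode property \P{Print}, which I am assuming is the "Unicode equivalent" meant here. The sub names are hypothetical. Note that locale-sensitive matching generally needs "use locale" in scope, which this sketch does not enable, so the POSIX version below is only exercised on ASCII input.

```perl
use strict;
use warnings;

# Hypothetical helper names, used only for this illustration.

# Blank anything the POSIX class [:print:] does not match.
sub blank_nonprint_posix {
    my ($s) = @_;
    $s =~ s/[^[:print:]]/ /g;
    return $s;
}

# Blank anything lacking the Unicode Print property.
# Using \P{...} makes the match follow Unicode rules.
sub blank_nonprint_unicode {
    my ($s) = @_;
    $s =~ s/\P{Print}/ /g;
    return $s;
}

# Tab and DEL are control characters, so both versions blank them.
my $ascii = blank_nonprint_posix("a\tb\x7f");
printf "%s\n", $ascii eq "a b " ? "ok" : "not ok";

# U+00E9 (e with acute accent) counts as printable under Unicode rules.
my $latin = blank_nonprint_unicode("caf\x{e9}\x{7f}\t");
printf "%s\n", $latin eq "caf\x{e9}  " ? "ok" : "not ok";
```

How the POSIX class treats characters in the 0x80-0xFF range depends on the Perl version, the locale pragma, and how the string is stored internally, so test with your own data before relying on it.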
 
Ilmari Karonen

J Taylor said:
> I've chosen to strip all but the printable ascii chars (using hex):
>
> $string =~ s/[^\x20-\x7f]/ /g;
>
> This *seems* to work better for my purposes. I'm wondering, though, whether
> there are gotchas I'm missing.

"\x7f" is not printable.
 

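Following up on the "\x7f" point: DEL (0x7F) is a control character, so a range meant to cover only printable ASCII would stop at "\x7e" (the tilde). A minimal sketch of the adjusted substitution, with a hypothetical sub name chosen for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: blank everything outside printable ASCII,
# taken here as 0x20 (space) through 0x7E (tilde).
sub blank_nonprintable_ascii {
    my ($s) = @_;
    $s =~ s/[^\x20-\x7e]/ /g;
    return $s;
}

# DEL is now replaced, while the tilde is kept.
my $cleaned = blank_nonprintable_ascii("ok\x7f~");
printf "%s\n", $cleaned eq "ok ~" ? "ok" : "not ok";
```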