"un-meta" the control characters

Paul Lalli

A coworker just presented me with this task. I came up with two
solutions, but I don't like either of them. He has a text document
and wants to scan it for characters such as newline, tab, form feed,
carriage return, vertical tab. If found, he wants to replace them
with their typical representation (ie, \n, \t, \f, \r, \v).

I first gave him the obvious:
$string =~ s/\n/\\n/;
$string =~ s/\t/\\t/;
$string =~ s/\f/\\f/;
$string =~ s/\r/\\r/;
$string =~ s/\v/\\v/;

which I don't like because of how much copy/paste is involved. Then I
came up with:

for (qw/n t f r v/) {
my $meta = eval("\\$_");
$string =~ s/$meta/\\$_/;
}

which I don't like, because the comment he'd have to put in the code
to explain it would be longer than the code itself, or the first
version.

So can anyone think of a better way? Is there any kind of intrinsic
link between a newline character and the letter 'n' that could be used
to go "backwards" here?

Thanks,
Paul Lalli
 
Uri Guttman

PL> A coworker just presented me with this task. I came up with two
PL> solutions, but I don't like either of them. He has a text document
PL> and wants to scan it for characters such as newline, tab, form feed,
PL> carriage return, vertical tab. If found, he wants to replace them
PL> with their typical representation (ie, \n, \t, \f, \r, \v).

PL> I first gave him the obvious:
PL> $string =~ s/\n/\\n/;
PL> $string =~ s/\t/\\t/;
PL> $string =~ s/\f/\\f/;
PL> $string =~ s/\r/\\r/;
PL> $string =~ s/\v/\\v/;

PL> which I don't like because of how much copy/paste is involved. Then I
PL> came up with:

use a hash table for the conversion:

my %controls = (
"\n" => '\\n',
"\t" => '\\t',
"\r" => '\\r',
"\f" => '\\f',
"\v" => '\\v',
) ;

$string =~ s/([\n\t\r\f\v])/$controls{$1}/g;

and if you want to get anal about dups of the chars do this:

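my @controls = qw( n t r f v ) ;
# eval the double-quoted form so that '\n' becomes a real newline, etc.
# note: "\v" is not a string escape in perl, so the v entry won't
# catch a vertical tab (that one is "\cK")
my %control_to_escape = map { eval( qq{"\\$_"} ) => "\\$_" } @controls ;

my $controls_re = '[' . join( '', map "\\$_", @controls ) . ']' ;

$string =~ s/($controls_re)/$control_to_escape{$1}/g;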

see ma! only one use of the actual control letters!
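
A minimal runnable sketch of the hash-lookup idea above, for anyone who wants to paste and try it. The names %escape_for and escape_controls are hypothetical, not from the thread, and the vertical tab is written as "\cK" on the assumption that a literal 0x0B character is what should become \v:

```perl
use strict;
use warnings;

# Map each control character to its printable escape.
# "\cK" (0x0B) stands in for vertical tab.
my %escape_for = (
    "\n"  => '\n',
    "\t"  => '\t',
    "\r"  => '\r',
    "\f"  => '\f',
    "\cK" => '\v',
);

# Build the character class from the hash keys so the two stay in sync.
my $class = join '', map { sprintf '\\x%02X', ord } keys %escape_for;

# escape_controls is a hypothetical helper name.
sub escape_controls {
    my ($text) = @_;
    $text =~ s/([$class])/$escape_for{$1}/g;
    return $text;
}

print escape_controls("line 1\tsome\nline 2\cK\f\r"), "\n";
# prints: line 1\tsome\nline 2\v\f\r
```

Deriving the class from the keys means a new control character only has to be added in one place.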

uri
 
Randal L. Schwartz

PL> A coworker just presented me with this task. I came up with two
PL> solutions, but I don't like either of them. He has a text document
PL> and wants to scan it for characters such as newline, tab, form feed,
PL> carriage return, vertical tab. If found, he wants to replace them
PL> with their typical representation (ie, \n, \t, \f, \r, \v).

PL> I first gave him the obvious:
PL> $string =~ s/\n/\\n/;
PL> $string =~ s/\t/\\t/;
PL> $string =~ s/\f/\\f/;
PL> $string =~ s/\r/\\r/;
PL> $string =~ s/\v/\\v/;

PL> which I don't like because of how much copy/paste is involved. Then I
PL> came up with:

Uri> use a hash table for the conversion:

Uri> my %controls = (
Uri> "\n" => '\\n',
Uri> "\t" => '\\t',
Uri> "\r" => '\\r',
Uri> "\f" => '\\f',
Uri> "\v" => '\\v',
Uri> ) ;

Just to scare people:

my %controls = (
"\n" => '\n',
"\t" => '\t',
"\r" => '\r',
"\f" => '\f',
"\v" => '\v',
);

Ok, that's downright evil. :)
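
The trick in the table above is that a backslash inside single quotes is only special before another backslash or a quote, so '\n' and '\\n' are the same two-character string. A small sketch to check that (the variable names are made up for illustration):

```perl
use strict;
use warnings;

my $single  = '\n';    # backslash followed by the letter n
my $doubled = '\\n';   # also backslash followed by the letter n
my $newline = "\n";    # one actual newline character

print length($single), "\n";                        # 2
print $single eq $doubled ? "same\n" : "differ\n";  # same
print length($newline), "\n";                       # 1
```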

print "Just another Perl hacker,"; # the original
 
sln

PL> A coworker just presented me with this task. I came up with two
PL> solutions, but I don't like either of them. He has a text document
PL> and wants to scan it for characters such as newline, tab, form feed,
PL> carriage return, vertical tab. If found, he wants to replace them
PL> with their typical representation (ie, \n, \t, \f, \r, \v).

PL> I first gave him the obvious:
PL> $string =~ s/\n/\\n/;
PL> $string =~ s/\t/\\t/;
PL> $string =~ s/\f/\\f/;
PL> $string =~ s/\r/\\r/;
PL> $string =~ s/\v/\\v/;

PL> which I don't like because of how much copy/paste is involved. Then I
PL> came up with:

PL> for (qw/n t f r v/) {
PL> my $meta = eval("\\$_");
PL> $string =~ s/$meta/\\$_/;
PL> }

PL> which I don't like, because the comment he'd have to put in the code
PL> to explain it would be longer than the code itself, or the first
PL> version.

PL> So can anyone think of a better way? Is there any kind of intrinsic
PL> link between a newline character and the letter 'n' that could be used
PL> to go "backwards" here?

Yet another way..

use strict;
use warnings;

my %translation = (
'\n'=>"\n",
'\t'=>"\t",
'\f'=>"\f",
'\r'=>"\r",
# ,'\v'=>"\v" - no 'v' for 'm'e, vt?
);

my $sample = "line 1\tsome\nline 2\t\t\f\n\rline 3\n";

while (my ($literal,$actual) = each %translation) {
$sample =~ s/$actual/$literal/eg;
}

print $sample;

__END__

-sln
 
John W. Krahn

Paul said:
A coworker just presented me with this task. I came up with two
solutions, but I don't like either of them. He has a text document
and wants to scan it for characters such as newline, tab, form feed,
carriage return, vertical tab. If found, he wants to replace them
with their typical representation (ie, \n, \t, \f, \r, \v).

I first gave him the obvious:
$string =~ s/\n/\\n/;
$string =~ s/\t/\\t/;
$string =~ s/\f/\\f/;
$string =~ s/\r/\\r/;
$string =~ s/\v/\\v/;

Perl doesn't have a "\v" character:

$string =~ s/\cK/\\v/;

Or:

$string =~ s/\13/\\v/;

Or:

$string =~ s/\xB/\\v/;
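
A short sketch to check the point about "\v", assuming Perl 5.10 or later for the last line (where \v inside a pattern means "any vertical whitespace"). The variable names are made up for illustration:

```perl
use strict;
use warnings;

my $vt = chr 0x0B;    # vertical tab

# Three string spellings of the same character.
for my $spelling ( "\cK", "\013", "\x0B" ) {
    print $spelling eq $vt ? "vertical tab\n" : "something else\n";
}

my $not_vt;
{
    no warnings;      # perl would otherwise warn about the unrecognized escape
    $not_vt = "\v";   # just the letter v
}
print $not_vt eq 'v' ? "plain v\n" : "something else\n";

# In a pattern, \v is a character class, so it also matches a newline.
print "\n" =~ /\A\v\z/ ? "newline matches \\v\n" : "no match\n";
```

So s/\v/\\v/ on a recent perl would also rewrite newlines, form feeds and carriage returns as \v, which is another reason to spell the vertical tab explicitly.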




John
 
C.DeRykus

PL> A coworker just presented me with this task.  I came up with two
PL> solutions, but I don't like either of them.  He has a text document
PL> and wants to scan it for characters such as newline, tab, form feed,
PL> carriage return, vertical tab.  If found, he wants to replace them
PL> with their typical representation (ie, \n, \t, \f, \r, \v).

PL> I first gave him the obvious:
PL> $string =~ s/\n/\\n/;
PL> $string =~ s/\t/\\t/;
PL> $string =~ s/\f/\\f/;
PL> $string =~ s/\r/\\r/;
PL> $string =~ s/\v/\\v/;

PL> which I don't like because of how much copy/paste is involved.  Then I
PL> came up with:

PL> for (qw/n t f r v/) {
PL>    my $meta = eval("\\$_");
PL>    $string =~ s/$meta/\\$_/;
PL> }
...


Did that work? I don't understand why the eval is needed
at all:

my $string = "1\n 2\t 3\f 4\r 5\cK";
for (qw/n t f r cK/) {
my $meta = "\\$_";
$string =~ s/$meta/\\$_/;
}
print $string; # 1\n 2\t 3\f 4\r 5\cK
 
Randal L. Schwartz

Ben> For extra added evil:

Ben> my $bs = "\\";
Ben> $string =~ s/$bs$_/$bs$_/g for qw/n r t f/;

And I thought *I* was being bad.
 
C.DeRykus

Ben> That's... evil. It relies on the fact that regexes undergo two separate
Ben> expansion phases, and requires that variable expansion happens in the
Ben> first phase but other qqish escapes are expanded in the second. I'm not
Ben> entirely convinced that's documented behaviour: anyone care to dig out
Ben> perlre and prove it one way or the other?

Ben> For extra added evil:

Ben>     my $bs = "\\";
Ben>     $string =~ s/$bs$_/$bs$_/g for qw/n r t f/;

Perl magic is evil? Say it ain't so :)

I didn't spot a full explanation in perlre but I see perlop
steps through the compilation in "gory details of parsing
quoted constructs" and ends with what happens at runtime
in "parsing regular expressions".

This closely mirrors Chapter 7's section - Perl Regular
Expressions in J.Friedl's "Mastering Regular Expressions"
1st ed.
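
The two-phase behaviour discussed here can be poked at with a small sketch. It assumes nothing beyond core Perl, and the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $meta = '\n';      # two characters: backslash and n
my $text = "a\nb";    # contains one real newline

# After interpolation the regex compiler sees \n and treats it as a newline.
print $text =~ /$meta/ ? "matches newline\n" : "no match\n";

# With \Q...\E the backslash is quoted, so the pattern looks for a
# literal backslash followed by n, which $text does not contain.
print $text =~ /\Q$meta\E/ ? "matches\n" : "no literal backslash-n\n";

# On the replacement side the interpolated value is inserted as-is
# and is not parsed again for escapes.
( my $out = $text ) =~ s/$meta/$meta/g;
print $out, "\n";     # a\nb
```

That asymmetry between the pattern side and the replacement side is what makes the s/$bs$_/$bs$_/g version work.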
 
