cut -c18-31,49-51 in perl?

Dan Jacobson

$ perldoc -q cut #no help, so asking here.
How do I emulate UNIX's "cut -c18-31,49-51"?
Certainly there is no one-liner possible unless I express the ranges
differently?
 
usenet

Dan said:
How do I emulate UNIX's "cut -c18-31,49-51"?
Certainly there is no one-liner possible unless I express the ranges differently?

I suppose you could do something like this:

#!/usr/bin/perl
use strict; use warnings;
# use Mode::Kludge;
my $string = "abcdefghijklmnopqrstuvwxyz";
print join '', (split '', " $string")[3..7, 10..11];
__END__

#same as `echo abcdefghijklmnopqrstuvwxyz |cut -c3-7,10-11`

Cheers!
 
Dr.Ruud

Dan Jacobson wrote:
How do I emulate UNIX's "cut -c18-31,49-51"?
Certainly there is no one-liner possible unless I express the ranges
differently?

perldoc perlrun

The options -e -l -n -p.


perldoc perlre

s/^.{17}(.{14}).{17}(.{1,3})/$1$2/
|| s/^.{17}(.{1,14})/$1/;
 
Larry

Dan said:
$ perldoc -q cut #no help, so asking here.
How do I emulate UNIX's "cut -c18-31,49-51"?
Certainly there is no one-liner possible unless I express the ranges
differently?

perl -lne 'print substr($_, 17, 24). substr($_, 48 , 3)'

or (showing my math):

perl -lne 'print substr($_, 18 - 1, (31 - 18) + 1). substr($_, 49 - 1, (51 - 49) + 1)'

If on Windows, use double-quotes instead of single quotes.
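
The arithmetic in the second command generalizes to any closed range: a cut column range first-last becomes a substr offset of first - 1 and a length of last - first + 1. A minimal sketch of that conversion follows; cut_range_to_substr is a hypothetical helper name chosen for illustration, not something from the thread or from perldoc.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 1-based, inclusive "cut -c" range
# into the (offset, length) pair that substr expects.
sub cut_range_to_substr {
    my ($first, $last) = @_;
    return ($first - 1, $last - $first + 1);
}

# Show the conversion for the two ranges in the question.
for my $range ([18, 31], [49, 51]) {
    my ($offset, $length) = cut_range_to_substr(@$range);
    print "cut -c$range->[0]-$range->[1] => substr(\$_, $offset, $length)\n";
}
```

For 18-31 this yields offset 17 and length 14; for 49-51, offset 48 and length 3.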
 
Anno Siegel

Dan Jacobson said:
$ perldoc -q cut #no help, so asking here.
How do I emulate UNIX's "cut -c18-31,49-51"?
Certainly there is no one-liner possible unless I express the ranges
differently?

Here is a way to do it with unpack(). The relation between the cut
specification and the corresponding template for unpack is made explicit.
The code also demonstrates that "cut" with the original specification
and the same data produces the same output.

In a one-off case one would probably create the template '@17a14 @48a3'
once manually (well, as often as it takes to get it right), and make it
a one-liner.

my $template; # for unpack
for ( [ 18, 31], [ 49, 51] ) { # "cut -c" specifications
    my ( $first, $last) = @$_; # "cut" counts from 1
    my $start = $first - 1;
    my $length = $last - $first + 1;
    $template .= ' ' if $template;
    $template .= '@' . $start . 'a' . $length;
}
print "template: $template\n";

my $str; # collect DATA to feed into cut command
while ( <DATA> ) {
    $str .= $_;
    print unpack( $template, $_), "\n";
}
print "\n";

# one cut process receives all of the collected input
open my $cut, '| cut -c18-31,49-51' or die "Can't run cut: $!";
print $cut $str;
close $cut or die "cut failed: $?";

__DATA__
0000000000111111111122222222223333333333444444444455555555556666666666
0123456789012345678901234567890123456789012345678901234567890123456789

Anno
 
Xicheng

or:
who | perl -lne 'print+(split//)[3..6,9..12];'
is the same as:
who | cut -c4-7,10-13

Dan said:
$ perldoc -q cut #no help, so asking here.
How do I emulate UNIX's "cut -c18-31,49-51"?
Certainly there is no one-liner possible unless I express the ranges
differently?

See

perldoc perlrun

and

perldoc perlvar



$ perl -le 'print +(a..z) x 2;' | cut -c18-31,49-51
rstuvwxyzabcdewxy

$ perl -le 'print +(a..z) x 2;' | perl -F'//' -lane '$[ = 1; print @F[18..31,49..51];'
rstuvwxyzabcdewxy
 
Larry

Xicheng said:
or:
who | perl -lne 'print+(split//)[3..6,9..12];'
is the same as:
who | cut -c4-7,10-13

OK, that is definitely the coolest 1 so far. This is the type of thing
clpm was made for! :)
 
xhoster

Abigail said:
Dan Jacobson wrote on MMMMDIX September MCMXCIII:
.. $ perldoc -q cut #no help, so asking here.
.. How do I emulate UNIX's "cut -c18-31,49-51"?
.. Certainly there is no one-liner possible unless I express the ranges
.. differently?

I'd use 'system' to call 'cut', but that's just me I guess.

It's not just you. I would too, in many situations. (Although in practice
I'd be more likely to use a pipe open, rather than system, to do it.)

Xho
 
DJ Stunks

Xicheng said:
or:
who | perl -lne 'print+(split//)[3..6,9..12];'
is the same as:
who | cut -c4-7,10-13

could someone explain what print+<list> is doing above? like &&?

and split // splits $_ on characters? I would have thought that
wouldn't have split the line at all, or split on null character or
something...

TIA
-jp
 
jgraber

Dan said:
$ perldoc -q cut #no help, so asking here.
How do I emulate UNIX's "cut -c18-31,49-51"?

Larry said:
perl -lne 'print substr($_, 17, 24). substr($_, 48 , 3)'
This also works once the typo of 24 is corrected to 14,
and it works fine with either . or , between the substr calls.

Dr.Ruud said:
s/^.{17}(.{14}).{17}(.{1,3})/$1$2/
|| s/^.{17}(.{1,14})/$1/;
This doesn't work, because extra characters at the end of the line
aren't removed (note the z at the end):
% perl -le 'print +(a..z)x2'|\
perl -lpe 's/^.{17}(.{14}).{17}(.{1,3})/$1$2/ || s/^.{17}(.{1,14})/$1/;'
rstuvwxyzabcdewxyz

The two substitutions can be folded into a single regexp,
which also fixes the junk at the end of the line:
% perl -le 'print +(a..z) x 2;' |\
perl -lpe 's/^.{17}(.{1,14})(.{17}(.{1,3}))*.*/$1$3/ '
rstuvwxyzabcdewxy

But it still silently fails to match the cut command
on short lines, and I don't know how to fix it correctly for
both short and medium-length lines.
% perl -le 'print +(a..c)x2'| cut -c18-31,49-51 # short (blank)

% perl -le 'print +(a..j) x 2;' | cut -c18-31,49-51 # medium
hij

% perl -le 'print +(a..j)x2'|\
perl -lpe 's/^(.{17}(.{1,14})(.{17}(.{1,3})))*.*/$2$4/ ' # fails med

% perl -le 'print +(a..c)x2'|\
perl -lpe 's/^.{17}(.{1,14})(.{17}(.{1,3}))*.*/$1$3/ ' # fails sh
abcabc


Xicheng said:
$ perl -le 'print +(a..z) x 2;' |\
perl -F'//' -lane '$[ = 1; print @F[18..31,49..51];'
rstuvwxyzabcdewxy

FWIW, from perldoc perlvar:
As of release 5 of Perl, assignment to "$[" is
treated as a compiler directive, and cannot
influence the behavior of any other file.
Its use is highly discouraged.

usenet previously suggested using split with a prepended blank:
% perl -le 'print +(a..z) x 2;' |\
perl -lne 'print join "", (split "", " $_")[18..31,49..51]'
rstuvwxyzabcdewxy

These golfing variations also work
perl -lne 'print ((split "", " $_")[18..31,49..51])' # no join
perl -lpe '$_= join "",(split "", " $_")[18..31,49..51]' # no print

We can unshift an irrelevant array element to correct the column count,
instead of using the deprecated directive $[.

% perl -le 'print +(a..z) x 2;' |\
perl -F'//' -lane 'unshift @F,0;print @F[18..31,49..51];'
rstuvwxyzabcdewxy

Anno Siegel suggested unpack
with a template of '@17a14 @48a3', which takes the same arguments
as the substr($_,17,14),substr($_,48,3) method:
% perl -le 'print +(a..z) x 2;' |\
perl -lne 'print @x=unpack q(@17a14@48a3),$_'
rstuvwxyzabcdewxy

But it fails noisily (instead of quietly)
to match the cut command on short lines.
% perl -le 'print +(a..c) x 2;' |\
perl -lne 'print @x=unpack q(@17a14@48a3),$_'
@ outside of string at -e line 1, <> line 1.

I tried removing the unwanted substrings instead of keeping the wanted ones,
but the arguments need converting, and it fails on short lines:
% perl -le 'print +(a..z) x 2;' |\
perl -lpe 'substr($_,51)="";substr($_,31,-3)="";substr($_,0,17)=""';
rstuvwxyzabcdewxy

**********
To summarize, these two split-and-slice versions
have the command lines most similar to 'cut -c18-31,49-51',
which helps when writing one-liners, and they cope with lines of any length:

perl -lne 'print ((split "", " $_")[18..31,49..51])'
perl -F'//' -lane 'unshift @F,0;print @F[18..31,49..51];'

Dan Jacobson's original comment:
Certainly there is no one-liner possible unless I express the ranges
differently?

The above solutions are one-liner possibilities for sufficiently loose
definitions of un-different range expressions. s/-/../g;

For exactly the same range expression 18-31,49-51
this might work, but I can't quite get it under one line of 80 chars.
Eval on arbitrary user input is also unsafe, so don't allow that,
but it was a good learning experience.

% perl -e 'print +(a..z) x 2,"\n",+(A..Z) x 2,"\n";' |\
perl -lne 'BEGIN{($c=shift)=~s/-/../g;eval q:sub p{print((split""," $_")[:.$c."])}"}p' 18-31,49-51
rstuvwxyzabcdewxy
RSTUVWXYZABCDEWXY
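
The eval can be avoided by parsing the range list directly. The sketch below is not from the thread: cut_chars is a hypothetical helper that accepts the unmodified spec string, handles only closed N-M ranges and single columns, and emits ranges in the order given (it does not sort or merge them). Skipping any range that starts past the end of the line is what keeps short and medium-length lines quiet.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a "cut -c"-style list such as "18-31,49-51"
# to one line without eval. Open-ended forms like "-5" or "7-" are rejected.
sub cut_chars {
    my ($spec, $line) = @_;
    my $out = '';
    for my $range (split /,/, $spec) {
        my ($first, $last) = $range =~ /^(\d+)(?:-(\d+))?$/
            or die "unsupported range '$range'\n";
        $last = $first unless defined $last;    # single column, e.g. "5"
        die "bad range '$range'\n" if $first < 1 || $last < $first;
        next if $first > length($line);         # range starts past end of line
        $out .= substr($line, $first - 1, $last - $first + 1);
    }
    return $out;
}

# Long, medium and short test lines, as used earlier in this post.
for my $line (join('', ('a'..'z') x 2), join('', ('a'..'j') x 2), 'abcabc') {
    print cut_chars('18-31,49-51', $line), "\n";
}
```

Because the spec is only ever matched against digits and a hyphen, nothing from the command line is executed as code.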
 
Dr.Ruud

jgraber wrote:
Dr.Ruud:
s/^.{17}(.{14}).{17}(.{1,3})/$1$2/
|| s/^.{17}(.{1,14})/$1/;

doesn[']t work because extra characters at end of line
aren[']t removed

Right, it needs some ".*".

s/.{17}(.{14}).{17}(.{1,3}).*/$1$2/
|| s/.{17}(.{1,14}).*/$1/
|| s/.*//;

which can be condensed to

s/.{0,17}(.{0,14}).{0,17}(.{0,3}).*/$1$2/
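
A quick way to check the condensed form against the long, medium and short lines used earlier in the thread is to wrap it in a sub. This is only a sketch; condensed_cut is a hypothetical name, and the three test lines are the ones jgraber piped through cut.

```perl
use strict;
use warnings;

# Apply the condensed substitution to a copy of the line and return it.
sub condensed_cut {
    my ($line) = @_;
    $line =~ s/.{0,17}(.{0,14}).{0,17}(.{0,3}).*/$1$2/;
    return $line;
}

# Test lines: long (52 chars), medium (20 chars), short (6 chars).
for my $line (join('', ('a'..'z') x 2), join('', ('a'..'j') x 2), 'abcabc') {
    printf "%2d chars: '%s'\n", length($line), condensed_cut($line);
}
```

Every quantifier may match zero characters, so the substitution always succeeds and both capture groups are always defined, which is why a short line yields an empty string rather than passing through unchanged.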
 
Larry

Abigail said:
Dan Jacobson wrote on MMMMDIX September MCMXCIII:
.. $ perldoc -q cut #no help, so asking here.
.. How do I emulate UNIX's "cut -c18-31,49-51"?
.. Certainly there is no one-liner possible unless I express the ranges
.. differently?

I'd use 'system' to call 'cut', but that's just me I guess.

Well, I would hope you would do so only in very limited situations,
such as 1-time use with small input. Using 'system' to call 'cut' in a
situation where you've got megs of input and an interactive user
waiting would be pretty bad. It forks a process for every line of
input vs. doing everything in Perl itself. I've seen 100-fold
performance improvements by fixing that type of thing.
 
Anno Siegel

Larry said:
Well, I would hope you would do so only in very limited situations,
such as 1-time use with small input. Using 'system' to call 'cut' in a
situation where you've got megs of input and an interactive user
waiting would be pretty bad. It forks a process for every line of
input vs. doing everything in Perl itself.

Nonsense. There are many ways to send it multiline input.

Larry said:
I've seen 100-fold
performance improvements by fixing that type of thing.

Only if you do something wrong in the first place. If speed is an
issue at all, a specialized binary will usually beat Perl hands down.

Anno
 
Larry

Anno said:
Nonsense. There are many ways to send it multiline input.

So it's better to research little-known options to "cut" to process
multi-line input, so that Perl can fork off a "cut" process to do
things that can be done easily in Perl itself?

Anno said:
Only if you do something wrong in the first place.

What had been done wrong (and not by me, by the way) was calling an
external Unix utility inside a loop. The speedup was achieved by
translating the functionality into Perl to avoid calling the utility.

Anno said:
If speed is an
issue at all, a specialized binary will usually beat Perl hands down.

So instead of fixing 1 line of a perfectly good Perl script, so that it uses
a Perl construct instead of calling out to an external Unix utility, I
should rewrite the whole thing in C? Just because I don't want an
external utility forked off inside a loop, that means "speed is an
issue" and so I shouldn't even be using Perl? That makes no sense.

My point is that you can get the Perl script to run a lot faster by
avoiding calling external utilities like "cut" and instead using the
language features provided by Perl itself, especially inside a loop.
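
For comparison, here is one way the "single process for the whole input" approach mentioned above might look, as opposed to a system call per line. It is a sketch under assumptions: cut_file_once is a hypothetical helper, the input is assumed to be in a file, and a Unix-like system with cut on the PATH is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: run the external cut(1) once over a whole file and
# return its output lines. The list form of open bypasses the shell.
sub cut_file_once {
    my ($spec, $path) = @_;
    open my $cut, '-|', 'cut', "-c$spec", $path
        or die "cannot start cut: $!";
    chomp(my @lines = <$cut>);
    close $cut or die "cut failed: status $?";
    return @lines;
}
```

Called as cut_file_once('18-31,49-51', $file), this starts one child regardless of how many lines the file has; a substr or unpack call inside the read loop starts none.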
 
usenet

Xicheng said:
who | perl -lne 'print+(split//)[3..6,9..12];'
is the same as:
who | cut -c4-7,10-13

I've never seen the "+" operator used that way [as in
"print+(split//)"]. Can someone explain what is happening here?
 
Larry

Xicheng said:
who | perl -lne 'print+(split//)[3..6,9..12];'
is the same as:
who | cut -c4-7,10-13

usenet said:
I've never seen the "+" operator used that way [as in
"print+(split//)"]. Can someone explain what is happening here?

It's a unary plus operator. It has no semantic effect; its only
purpose is to change the way Perl parses the expression. If you left
out the +, Perl would interpret the parentheses as delimiting all the
arguments to "print", so the list slice would be applied only to
the return value of "print". By putting in the "+" you cause the
parenthesised expression to be interpreted as just the beginning of the
arguments to "print".

You can always achieve the same effect by adding an extra set of
parentheses. Here it would be: print((split//)[3..6,9..12]);
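
Both spellings can be compared side by side. In this sketch the sub names are hypothetical, and the output of print is captured by temporarily selecting an in-memory filehandle so the two results can be checked against each other. It also touches the earlier question about split//: with an empty pattern, split breaks $_ into single characters.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with $_ set to $line and return whatever
# it prints to the currently selected filehandle.
sub captured_print {
    my ($line, $code) = @_;
    my $buf = '';
    open my $fh, '>', \$buf or die "in-memory open failed: $!";
    my $old = select $fh;
    $code->() for $line;        # aliases $_ to $line, as -n would
    select $old;
    close $fh;
    return $buf;
}

# Unary plus: (split//) is only the start of print's argument list.
sub slice_with_plus {
    return captured_print($_[0], sub { print+(split//)[3..6, 9..12] });
}

# Extra parentheses: the outer pair belongs to print, the inner to the slice.
sub slice_with_parens {
    return captured_print($_[0], sub { print((split//)[3..6, 9..12]) });
}

print slice_with_plus('abcdefghijklmnop'), "\n";
print slice_with_parens('abcdefghijklmnop'), "\n";
```

Both calls select characters 4-7 and 10-13 of the line, the same columns as cut -c4-7,10-13.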
 
