perl to python

  • Thread starter Olivier Scalbert

Duncan Booth

Duncan Booth said:
import sys
for line in sys.stdin:
    line = line[:-1].split('\t')
    print "%s %s %s %s" % (line[3], line[2], line[1], line[0])
While I agree with you that using the appropriate tool is preferred
over using Python for everything, I don't really see much to choose
between the Python and awk versions here.

1) Python throws an error if you have less than three fields,
requiring more typing to get the same effect.

I would rather have the error when the input isn't formatted as expected.
The alternative would be incorrect output.

If you really want to suppress the error then:

line = (line[:-1]+'\t'*3).split('\t')
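The padding idea can also be wrapped in a small helper. This is only a sketch, written in Python 3 syntax rather than the Python 2 used in this thread; the name `reversed_fields` and the `width` parameter are made up for illustration.

```python
def reversed_fields(line, width=4):
    """Return the first `width` tab-separated fields of `line`, reversed.

    Lines with too few fields are padded with empty strings instead of
    raising IndexError.
    """
    # Strip only the trailing newline so empty trailing fields survive.
    fields = line.rstrip("\n").split("\t")
    # Pad to at least `width` fields, then keep exactly `width` of them.
    fields = (fields + [""] * width)[:width]
    return " ".join(reversed(fields))


# Hypothetical sample input: one complete line and one short line.
for sample in ["a\tb\tc\td\n", "a\tb\n"]:
    print(repr(reversed_fields(sample)))
```

Whether silent padding is preferable to an error depends on the data; as noted above, padding trades a crash for possibly incorrect output.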
2) Python generators on stdin behave strangely. For one thing,
they're not properly line buffered, so you don't get any lines until
eof. But then, eof is handled wrongly, and the loop doesn't exit.

True, if you are trying to reformat interactive input. I had assumed that
the use case here was redirecting input from a file, and in that case the
EOF problem isn't an issue. Buffering may or may not be a problem.
3) There is no efficient RS equivalent, in case you need to read
paragraphs.

In that case I would write a generator to group the lines. Longer than RS,
but also more flexible.
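A generator along those lines might look like the sketch below (Python 3 syntax). It assumes paragraphs are separated by one or more blank lines, roughly what awk's paragraph mode (RS set to the empty string) gives you; the name `paragraphs` is made up for illustration.

```python
def paragraphs(lines):
    """Yield each run of consecutive non-blank lines as a list of strings."""
    block = []
    for line in lines:
        text = line.rstrip("\n")
        if text.strip():
            # Non-blank line: it belongs to the current paragraph.
            block.append(text)
        elif block:
            # Blank line after some content: the paragraph is complete.
            yield block
            block = []
    if block:
        # Input ended without a trailing blank line.
        yield block


# Hypothetical sample input with two paragraphs.
sample = ["first line\n", "second line\n", "\n", "\n", "next paragraph\n"]
for para in paragraphs(sample):
    print(para)
```

Because it accepts any iterable of lines, the same function works on an open file, `sys.stdin`, or a list, and the grouping rule can be changed without touching the callers.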
 

Andrew Dalke

Carl Banks wrote one, convoluted so it can be on the command line.
Kirk Job-Sluder replied
This looks like using the proverbial hammer to drive the screw.

But you asked us to use the hammer to drive in the screw. In real
life I have more tools to use. For this case I would use Perl or awk.

Here's one for you. I had several mailbox files arranged like
Inbox.mbox/mbox
Send.mbox/mbox
OBF&BOSC.mbox/mbox
Work Email.mbox/mbox

I wanted to raise the "*/mbox" files one directory so that
Inbox.mbox/mbox --becomes--> Inbox.mbox

My solution was to use the interactive Python shell. Something
like (untested)

import glob, os
filenames = glob.glob("*.mbox")
for name in filenames:
    os.rename(name + "/mbox", "." + name)  # park the file under a temporary name
    os.rmdir(name)                         # remove the now-empty directory
    os.rename("." + name, name)            # name already ends in ".mbox"

Try doing that sanely with any programming language expressed
all on the command-line. No credit if you can't handle the '&' and space.

Andrew
(e-mail address removed)
 

Andrew Dalke

Scott Schwartz:
1) Python throws an error if you have less than three fields,
requiring more typing to get the same effect.

The spec didn't say how to handle improperly formatted data.
Suppose the code is supposed to complain and stop at that
point - how much code would you need for awk to do the
extra check?
2) Python generators on stdin behave strangely. For one thing,
they're not properly line buffered, so you don't get any lines until
eof. But then, eof is handled wrongly, and the loop doesn't exit.

There's a command-line flag to make stdin/stdout be unbuffered.
Try your test again with 'python -u'.
3) There is no efficient RS equivalent, in case you need to read
paragraphs.

Again, not part of the spec. ;)

Andrew
(e-mail address removed)
 

Andrew Bennetts

Try doing that sanely with any programming language expressed
all on the command-line. No credit if you can't handle the '&' and space.

I'm almost positive you can do that entirely with bash[1], actually. I
don't have time to prove it right now, though... but you ought to be able to
use features like ${parameter%word} expansions, e.g.:

$ x='something.mbox/mbox'
$ echo ${x%/mbox}
something.mbox

Obviously you'd need to be careful about quoting things like & and space,
but that doesn't seem too hard.

-Andrew.

[1] And standard file utilities like mv and rmdir, obviously...
 

Kirk Job-Sluder

Carl Banks wrote one, convoluted so it can be on the command line.
Kirk Job-Sluder replied

But you asked us to use the hammer to drive in the screw. In real
life I have more tools to use. For this case I would use Perl or awk.

Bing, exactly the point.
My solution was to use the interactive Python shell. Something
like (untested)

Certainly, python is the best solution for many problems.
Try doing that sanely with any programming language expressed
all on the command-line. No credit if you can't handle the '&' and space.

Missing the point. The point was not that everything should be done
using awk, sed or perl one-liners. The point was that awk, sed,
or perl one-liners are useful for a subset of tasks where the
explicitness of python gets in the way.
 

Andrew Bennetts

Here's one for you. I had several mailbox files arranged like
Inbox.mbox/mbox
Send.mbox/mbox
OBF&BOSC.mbox/mbox
Work Email.mbox/mbox

I wanted to raise the "*/mbox" files one directory so that
Inbox.mbox/mbox --becomes--> Inbox.mbox
[...]

Try doing that sanely with any programming language expressed
all on the command-line. No credit if you can't handle the '&' and space.

You can do it on one line with sh:

for d in *.mbox ; do mv "${d}/mbox" ".${d}" ; rmdir "${d}" ; mv ".${d}" "${d}" ; done

Or more readably:

for d in *.mbox ; do
    mv "${d}/mbox" ".${d}"
    rmdir "${d}"
    mv ".${d}" "${d}"
done

That doesn't look particularly insane to me. In fact, it looks quite like
the Python version...
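For comparison, here is the same three-step move as a self-contained Python 3 sketch. The function name `flatten_mailboxes` and the layout check are illustrative additions, not something posted in this thread; the rename, rmdir, rename sequence is the one used above.

```python
import os
from pathlib import Path


def flatten_mailboxes(root):
    """Turn each `<name>.mbox/mbox` under `root` into a plain file `<name>.mbox`.

    Returns the names of the mailboxes that were flattened.
    """
    moved = []
    for box in sorted(Path(root).glob("*.mbox")):
        inner = box / "mbox"
        if not (box.is_dir() and inner.is_file()):
            continue  # already flattened, or not the expected layout
        temp = box.with_name("." + box.name)
        os.rename(inner, temp)  # park the file under a temporary dot-name
        box.rmdir()             # raises OSError if the directory is not empty
        os.rename(temp, box)    # give the file the directory's old name
        moved.append(box.name)
    return moved
```

Names containing '&' or spaces need no special handling here, because the paths are passed straight to the operating system and never go through a shell.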

-Andrew.
 

Corey Coughlin

I do a lot of unix hacking type stuff here at work, and I understand
the wish to use the right tool for the right job and all that, and
awk, sed and perl can let you do all that quick command line stuff.
My problem with that approach is that after I do some knock-off hack
for something, the boss will come by and say, "That's nice, but can
you do it with this tweak?" or something, and soon it has snowballed
into a full blown script. So why bother with the one-off hacks, when
I can just write a function or something, put it in my utility object,
and it's done? I can use it again if I need to, use it in other
scripts, call it from the python interpreter, whatever I need. And if
(or when) I have to expand it for a more complicated purpose, nothing
could be easier. If I stick to the traditional unix approach, I'd
probably have tools piping into tools piping into shell scripts piping
into whatever to get things done, and I'll have to check the man page
to make sure the non-standard options for the rarely used tool work.
It just winds up being a pain.

Here's an example from today, for instance. There's a scratch area we
have here where files that are more than a month old are summarily
deleted. I don't often use the area for larger projects, but I had to
for my last one, which is now getting to be around a month old. I
still want to keep the files, so I wrote a little python script to
start from an initial directory, check the date on all the files, and
update them if necessary. Now sure, I could have done this with some
combination of 'find', 'touch' and whatever, but then again, I have
all the function blocks I need in my python utility library, so I just
import that and it's a snap. So I mention it to one of my co-workers,
and he now wants to use it for one of his projects, except it's using
a make-like system that will regenerate files if the dates on them get
changed relative to each other, so he wants it to take the times and
readjust them by an amount relative to the dates on each. That's just
the kind of thing (heavy conditionals) that chains of little unix commands
get really ugly with, but with my script, it shouldn't be
a problem. In fact, he's never written a script to do this because he
typically relies on the unix tool approach, and he figured it would be
too much trouble. But now, he'll have a solution, I'll still have
mine, and if needs change in the future, it's easy to evolve. Seems
like the right way to go to me.
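The script itself was not posted, so the following is only a guess at its shape, in Python 3 syntax. The function name `refresh_old_files`, the 25-day default, and the `shift_seconds` option (standing in for the co-worker's request to move timestamps by a fixed amount so their relative order is preserved) are all assumptions.

```python
import os
import time


def refresh_old_files(root, max_age_days=25, shift_seconds=None, now=None):
    """Update timestamps of files under `root` older than `max_age_days`.

    With shift_seconds=None, old files are stamped with the current time.
    Otherwise each old file's atime and mtime are moved forward by
    `shift_seconds`, which keeps the files' ordering relative to each other.
    Returns the list of paths that were updated.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    touched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if st.st_mtime >= cutoff:
                continue  # recent enough, leave it alone
            if shift_seconds is None:
                os.utime(path, (now, now))
            else:
                os.utime(path, (st.st_atime + shift_seconds,
                                st.st_mtime + shift_seconds))
            touched.append(path)
    return touched
```

Adding the second mode was a matter of one extra parameter and one branch, which is the kind of incremental change being described above.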
 

Scott Schwartz

Andrew Dalke said:
There's a command-line flag to make stdin/stdout be unbuffered.
Try your test again with 'python -u'.

No effect, because that's not the problem. The generator is reading
ahead and doing its own buffering, in addition to whatever the stream
is doing (or not doing). Hence the bug.
 

Nicolas Fleury

Michael said:
Jarek's answer is the correct one, for almost any real situation.

I disagree; I actually think Perl is a better answer. Using grep, sed,
tr, etc. on Windows is error-prone anyway, because of the shell
differences. Perl is very complete as a parsing solution; personally,
that's what I would prefer, and it's more scalable, since when you want
to do more, you can copy your solution into a script and expand it. Using
a Python script also makes a lot of sense, as people have pointed out
two-line solutions.

Regards,
Nicolas
 

Andrew Dalke

Scott Schwartz
No effect, because that's not the problem.

Ahh, you're right. I keyword matched "buffer" and thought you meant
the system buffer and not Python's.

Andrew
(e-mail address removed)
 

Kirk Job-Sluder

Why should your employer pay for the time for all of its employees to
learn all of those other tools, when Python will do the job?

As was brought up earlier, creating custom python scripts to reinvent
the wheel is also loaded with training costs. I would argue the
following:

1: Using an existing idiomatic tool takes advantage of standardization.
"perl -pi -e" exhibits the same behavior on multiple platforms and even
different versions of perl. It is an idiomatic expression that is very
well documented, and for which a quick Google search turns up hundreds
of examples.

2: Memorizing two lines is more difficult than memorizing one line.

3: The alternative proposal, writing a site-specific module, is even
worse when it comes to training requirements.

Or to turn this around, why should anybody learn to use tar, find, less,
rsync or diff when you can do the same thing in python?
 

Josh Gilbert

It's in the python man page:

-u Force stdin, stdout and stderr to be totally unbuffered.
On systems where it matters, also put stdin, stdout and stderr in binary
mode. Note that there is internal buffering in xreadlines(),
readlines() and file-object iterators ("for line in sys.stdin") which
is not influenced by this option. To work around this, you will want to
use "sys.stdin.readline()" inside a "while 1:" loop.

When I use sys.stdin.readline I don't need to use the -u flag.

You can also use raw_input() to get data from stdin. This also appears to
strip out the newline.
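The readline loop suggested by the man page can be sketched as follows, in Python 3 syntax (where raw_input() is called input()). The read-ahead behaviour discussed in this thread concerns Python 2 file iterators; the helper name `process_lines` and its `handle` callback are made up for illustration.

```python
def process_lines(stream, handle):
    """Call `handle` on each line as soon as `stream.readline()` returns it.

    Returns the number of lines processed.
    """
    count = 0
    while True:
        line = stream.readline()
        if not line:
            # readline() returns an empty string only at end of file;
            # a blank input line still contains its newline character.
            break
        handle(line.rstrip("\n"))
        count += 1
    return count
```

For interactive use this would be called as `process_lines(sys.stdin, print)` after `import sys`, so each line is handled when it arrives rather than after the iterator has filled its buffer.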


Josh Gilbert.
 

Russell Wallace

There is more to computer usage than sysadmin tasks, sed is an ideal
tool for processing large sets of large files (I have to handle small
files that are only 130 Mb in size, and I have around 140,000 of them).

Out of curiosity, what are you doing that generates 140,000 files of
130 megabytes each?
 
