Python dos2unix one liner


@ Rocteur CC

Hi,

This morning I am working through Building Skills in Python and was
having problems with string.strip.

Then I found the input file I was using was in DOS format and I
thought it would be best to convert it to UNIX, so I started to type perl
-i -pe 's/ and then I thought, wait, I'm learning Python, I have to
think in Python. As I'm a Python newbie, I fired up Google and typed:

+python convert dos to unix +one +liner

Found perl, sed, awk but no python on the first page

So I tried

+python dos2unix +one +liner -perl

Same thing..

But then I found http://wiki.python.org/moin/Powerful%20Python%20One-Liners
and tried this:

cat file.dos | python -c "import sys,re;
[sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
sys.stdin]" >file.unix

And it works..

[10:31:11 incc-imac-intel ~/python] cat -vet file.dos
one^M$
two^M$
three^M$
[10:32:10 incc-imac-intel ~/python] cat -vet file.unix
one$
two$
three$

But it is long and just like sed does not do it in place.

Is there a better way in Python or is this kind of thing best done in
Perl ?

Thanks,

Jerry
 

Martin P. Hellwig

On 02/27/10 09:36, @ Rocteur CC wrote:
<cut dos2unix oneliners;python vs perl/sed/awk>
Hi, a couple of fragmented thoughts popped into my head while reading your
question. None of them is very constructive with respect to what you
actually want, but here goes anyway.

- A one-line throwaway script with re as built-in syntax: yup, that
sounds like Perl to me.

- What is wrong with making an executable script (not limited to one line)
and calling that? That is even shorter to type.

- ... wait a minute, you are building something in python (problem with
string.strip - why don't you use the built-in string strip method
instead?) which barfs on the input (win/unix line ending), should the
actual solution not be in there, i.e. parsing the line first to check
for line-endings? .. But wait another minute, why are you getting \r\n
in the first place, python by default uses universal new lines?

Hope that helps a bit. Maybe you could post the relevant part of your
code so we can make some better suggestions.
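For reference, the universal-newlines point can be checked directly. This is a minimal sketch that assumes Python 3, where text mode translates line endings on read by default; in the Python 2 of this thread the translation needs the "U" mode shown further down. The file name `sample.dos` and the helper `read_both_ways` are made up for illustration.

```python
import os
import tempfile


def read_both_ways(path):
    """Return the file's lines as read in text mode and in binary mode."""
    # Text mode: Python 3 applies universal newlines while reading.
    with open(path, "r", encoding="ascii") as text_file:
        text_lines = text_file.readlines()
    # Binary mode: the bytes come back exactly as stored on disk.
    with open(path, "rb") as binary_file:
        binary_lines = binary_file.readlines()
    return text_lines, binary_lines


# Build a small DOS-format file to experiment with (hypothetical name).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "sample.dos")
with open(demo_path, "wb") as f:
    f.write(b"one\r\ntwo\r\nthree\r\n")

text_lines, binary_lines = read_both_ways(demo_path)
print(text_lines)    # no "\r" left in text mode
print(binary_lines)  # "\r\n" still present in binary mode
```

If the lines being stripped were read in binary mode, or in Python 2 without "U", the trailing "\r" would still be attached, which may explain the trouble with strip.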
 

Peter Otten

@ Rocteur CC said:
But then I found
http://wiki.python.org/moin/Powerful Python One-Liners
and tried this:

cat file.dos | python -c "import sys,re;
[sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
sys.stdin]" >file.unix

And it works..

- Don't build list comprehensions just to throw them away, use a for-loop
instead.

- You can often use string methods instead of regular expressions. In this
case line.replace("\r\n", "\n").
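Put together, the two suggestions above might look like the sketch below. It assumes Python 3 and works on binary streams so that no newline translation happens behind the scenes; the function name `dos2unix_stream` is hypothetical.

```python
import io


def dos2unix_stream(src, dst):
    """Copy binary stream src to dst, turning each CRLF line ending into LF."""
    # Iterating a binary stream yields lines that end in b"\n",
    # so a CRLF pair is never split across two iterations.
    for line in src:
        dst.write(line.replace(b"\r\n", b"\n"))


# In-memory demonstration; a real filter could pass
# sys.stdin.buffer and sys.stdout.buffer instead.
source = io.BytesIO(b"one\r\ntwo\r\nthree\r\n")
target = io.BytesIO()
dos2unix_stream(source, target)
print(target.getvalue())
```

A plain for-loop keeps only one line in memory at a time and builds no throwaway list.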

But it is long and just like sed does not do it in place.

Is there a better way in Python or is this kind of thing best done in
Perl ?

open(..., "U") ("universal" mode) converts arbitrary line endings to just
"\n"

$ cat -e file.dos
alpha^M$
beta^M$
gamma^M$

$ python -c'open("file.unix", "wb").writelines(open("file.dos", "U"))'

$ cat -e file.unix
alpha$
beta$
gamma$
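The command above is Python 2. A rough Python 3 counterpart, offered as a sketch rather than a drop-in replacement, uses the `newline` argument of `open()` instead of the "U" mode, which recent Python 3 releases no longer accept. The helper name `convert_to_unix` and the UTF-8 default are assumptions.

```python
import os
import tempfile


def convert_to_unix(src_path, dst_path, encoding="utf-8"):
    """Write a copy of src_path to dst_path with every line ending as LF."""
    # newline=None (the default) turns "\r\n" and bare "\r" into "\n" on read.
    with open(src_path, "r", encoding=encoding, newline=None) as src:
        # newline="\n" disables translation on write, so the output
        # uses LF on every platform.
        with open(dst_path, "w", encoding=encoding, newline="\n") as dst:
            dst.writelines(src)


# Demonstration with throwaway files (names are hypothetical).
work_dir = tempfile.mkdtemp()
dos_path = os.path.join(work_dir, "file.dos")
unix_path = os.path.join(work_dir, "file.unix")
with open(dos_path, "wb") as f:
    f.write(b"alpha\r\nbeta\r\ngamma\r\n")

convert_to_unix(dos_path, unix_path)
with open(unix_path, "rb") as f:
    converted = f.read()
print(converted)
```

Like the "U" mode, this also turns a bare "\r" into "\n", which is more than a strict CRLF-to-LF replacement does.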

But still, if you want very short (and often cryptic) code Perl is hard to
beat. I'd say that Python doesn't even try.

Peter
 

Steven D'Aprano

cat file.dos | python -c "import sys,re;
[sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
sys.stdin]" >file.unix

Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
string replacement! You've been infected by too much Perl coding!

*wink*

Regexes are expensive, even in Perl, but more so in Python. When you
don't need the 30 pound sledgehammer of regexes, use lightweight string
methods.

import sys; sys.stdout.write(sys.stdin.read().replace('\r\n', '\n'))

ought to do it. It's not particularly short, but Python doesn't value
extreme brevity -- code golf isn't terribly exciting in Python.
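Anyone who wants to measure the regex versus string-method difference on their own machine could start from a sketch like this one. It assumes Python 3; the sample text and repeat counts are arbitrary, and no particular timings are promised.

```python
import re
import timeit

# Arbitrary sample: 30,000 DOS-style lines held in one string.
text = "one\r\ntwo\r\nthree\r\n" * 10000
pattern = re.compile("\r\n")


def with_regex():
    """Replace CRLF with LF using a precompiled regular expression."""
    return pattern.sub("\n", text)


def with_replace():
    """Replace CRLF with LF using the plain string method."""
    return text.replace("\r\n", "\n")


regex_time = timeit.timeit(with_regex, number=20)
replace_time = timeit.timeit(with_replace, number=20)
print("re.sub:      %.4f s" % regex_time)
print("str.replace: %.4f s" % replace_time)
```

Both functions return the same string, so the only thing being compared is the cost of getting there.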

[steve@sylar ~]$ cat -vet file.dos
one^M$
two^M$
three^M$
[steve@sylar ~]$ cat file.dos | python -c "import sys; sys.stdout.write(sys.stdin.read().replace('\r\n', '\n'))" > file.unix
[steve@sylar ~]$ cat -vet file.unix
one$
two$
three$
[steve@sylar ~]$

Works fine. Unfortunately it still doesn't work in-place, although I
think that's probably a side-effect of the shell, not Python. To do it in
place, I would pass the file name:

# Tested and working in the interactive interpreter.
import sys
filename = sys.argv[1]
text = open(filename, 'rb').read().replace('\r\n', '\n')
open(filename, 'wb').write(text)


Turning that into a one-liner isn't terribly useful or interesting, but
here we go:

python -c "import sys;open(sys.argv[1], 'wb').write(open(sys.argv[1],
'rb').read().replace('\r\n', '\n'))" file

Unfortunately, this does NOT work: I suspect it is because the file gets
opened for writing (and hence emptied) before it gets opened for reading.
Here's another attempt:

python -c "import sys;t=open(sys.argv[1], 'rb').read().replace('\r\n',
'\n');open(sys.argv[1], 'wb').write(t)" file


[steve@sylar ~]$ cp file.dos file.txt
[steve@sylar ~]$ python -c "import sys;t=open(sys.argv[1], 'rb').read().replace('\r\n', '\n');open(sys.argv[1], 'wb').write(t)" file.txt
[steve@sylar ~]$ cat -vet file.txt
one$
two$
three$
[steve@sylar ~]$


Success!

Of course, none of these one-liners are good practice. The best thing to
use is a dedicated utility, or write a proper script that has proper
error testing.
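As an illustration of what "a proper script" might involve, here is a minimal sketch assuming Python 3. The names `dos2unix_file`, `main` and the script name `dos2unix.py` are hypothetical. It writes to a temporary file in the same directory and renames it over the original, so a failure part-way through leaves the original untouched.

```python
import os
import shutil
import sys
import tempfile


def dos2unix_file(path):
    """Replace CRLF with LF in path; the original survives any failure."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        data = src.read()
    # Temporary file in the same directory, so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data.replace(b"\r\n", b"\n"))
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # swap the new file into place
    except BaseException:
        # Clean up the temporary file, then let the error propagate.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def main(argv):
    """Convert every file named in argv; return a shell exit status."""
    if not argv:
        sys.stderr.write("usage: dos2unix.py FILE [FILE ...]\n")
        return 2
    status = 0
    for path in argv:
        try:
            dos2unix_file(path)
        except OSError as exc:
            sys.stderr.write("%s: %s\n" % (path, exc))
            status = 1
    return status
```

Saved as a script, it would end with `sys.exit(main(sys.argv[1:]))`. Note that it still reads the whole file into memory, just as the one-liners do.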

Is there a better way in Python or is this kind of thing best done in
Perl ?

If by "this kind of thing" you mean text processing, then no, Python is
perfectly capable of doing text processing. Regexes aren't as highly
optimized as in Perl, but they're more than good enough for when you
actually need a regex.

If you mean "code golf" and one-liners, then, yes, this is best done in
Perl :)
 

@ Rocteur CC

cat file.dos | python -c "import sys,re;
[sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
sys.stdin]" >file.unix

Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
string replacement! You've been infected by too much Perl coding!

Thanks for the replies, I'm looking at them now. However, for those who
misunderstood, the above cat file.dos pipe python command does not come
from Perl but comes from:

http://wiki.python.org/moin/Powerful%20Python%20One-Liners
Apply regular expression to lines from stdin
[another command] | python -c "import sys,re;
[sys.stdout.write(re.compile('PATTERN').sub('SUBSTITUTION', line))
for line in sys.stdin]"


Nothing to do with Perl. Perl only takes a handful of characters to do
this and certainly does not require the creation of an intermediate file.
I simply found the above example on wiki.python.org whilst searching
Google for a quick conversion solution.

Thanks again for the replies. I've learned a few things and I
appreciate your help.

Jerry
 

ssteinerX

Nothing to do with Perl, Perl only takes a handful of characters to do this and certainly does not require the creation an intermediate file

Perl may be better for you for throw-away code. Use Python for the code you want to keep (and read and understand later).

S
 

Grant Edwards

Nothing to do with Perl, Perl only takes a handful of characters to do
this and certainly does not require the creation an intermediate file,

Are you sure about that?

Or does it just hide the intermediate file from you the way
that sed -i does?
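Python's standard library has a comparable facility that hides the intermediate file in the same spirit: `fileinput` with `inplace=True` moves the original to a backup file, redirects `sys.stdout` to a new file under the original name, and deletes the backup afterwards. A minimal sketch, assuming Python 3 and a POSIX system; the function name `dos2unix_fileinput` is made up for illustration.

```python
import fileinput
import os
import sys
import tempfile


def dos2unix_fileinput(path):
    """Rewrite path through fileinput's in-place mode."""
    # inplace=True: the original becomes a hidden backup, and anything
    # written to sys.stdout inside the loop lands in the new file.
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            # Text mode has already turned "\r\n" into "\n" while reading.
            sys.stdout.write(line)


# Demonstration on a throwaway file (hypothetical name).
scratch_dir = tempfile.mkdtemp()
scratch_path = os.path.join(scratch_dir, "file.txt")
with open(scratch_path, "wb") as f:
    f.write(b"one\r\ntwo\r\nthree\r\n")

dos2unix_fileinput(scratch_path)
with open(scratch_path, "rb") as f:
    rewritten = f.read()
print(rewritten)
```

The output file is written in text mode, so each line ends with the platform's own line separator. On Unix that is "\n", while on Windows the same code would write "\r\n" back again.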
 

John Bokma

Perl may be better for you for throw-away code. Use Python for the
code you want to keep (and read and understand later).

Amusing how long those Python toes can be. In several replies I have
noticed (often clueless) opinions on Perl. When do people learn that a
language is just a tool to do a job?
 

Alf P. Steinbach

* @ Rocteur CC:
cat file.dos | python -c "import sys,re;
[sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
sys.stdin]" >file.unix

Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
string replacement! You've been infected by too much Perl coding!

Thanks for the replies I'm looking at them now, however, for those who
misunderstood, the above cat file.dos pipe pythong does not come from
Perl but comes from:

http://wiki.python.org/moin/Powerful Python One-Liners

Steven is right with the "Holy Cow" and multiple exclamation marks.

For those unfamiliar with that, just google "multiple exclamation marks", I
think that should work... ;-)

Not only is a regular expression overkill & inefficient, but the snippet also
needlessly builds a list whose length is the number of lines.

Consider instead e.g.

<hack>
import sys; sum(int(bool(sys.stdout.write(line.replace('\r\n','\n')))) for line
in sys.stdin)
</hack>

But better, consider that it's less work to save the code in a file than copying
and pasting it in a command interpreter, and then it doesn't need to be 1 line.


Apply regular expression to lines from stdin
[another command] | python -c "import
sys,re;[sys.stdout.write(re.compile('PATTERN').sub('SUBSTITUTION',
line)) for line in sys.stdin]"


Nothing to do with Perl, Perl only takes a handful of characters to do
this and certainly does not require the creation an intermediate file, I
simply found the above example on wiki.python.org whilst searching
Google for a quick conversion solution.

Thanks again for the replies I've learned a few things and I appreciate
your help.

Cheers,

- Alf
 

Steven D'Aprano

cat file.dos | python -c "import sys,re;
[sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
sys.stdin]" >file.unix

Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
string replacement! You've been infected by too much Perl coding!

Thanks for the replies I'm looking at them now, however, for those who
misunderstood, the above cat file.dos pipe pythong does not come from
Perl but comes from:

http://wiki.python.org/moin/Powerful Python One-Liners

Whether it comes from Larry Wall himself, or a Python wiki, using regexes
for a simple string replacement is like using an 80 lb sledgehammer to
crack a peanut.

Apply regular expression to lines from stdin [another command] | python
-c "import sys,re;
[sys.stdout.write(re.compile('PATTERN').sub('SUBSTITUTION', line)) for
line in sys.stdin]"

And if PATTERN is an actual regex, rather than just a simple substring,
that would be worthwhile. But if PATTERN is a literal string, then string
methods are much faster and use much less memory.

Nothing to do with Perl, Perl only takes a handful of characters to do
this

I'm sure it does. If I were interested in code-golf, I'd be impressed.

and certainly does not require the creation an intermediate file,

The solution I gave you doesn't use an intermediate file either.

*slaps head and is enlightened*
Oh, I'm an idiot!

Since you're reading text files, there's no need to call
replace('\r\n','\n'). Since there shouldn't be any bare \r characters in
a DOS-style text file, just use replace('\r', '').

Of course, that's an unsafe assumption in the real world. But for a quick
and dirty one-liner (and all one-liners are quick and dirty), it should
be good enough.
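The difference between the two replacements only shows up when the input contains a bare "\r". A small sketch, assuming Python 3; the sample strings and the function names are invented for illustration.

```python
def strip_crlf(text):
    """Turn CRLF pairs into LF and leave any bare CR alone."""
    return text.replace("\r\n", "\n")


def strip_all_cr(text):
    """Delete every CR, whether or not an LF follows it."""
    return text.replace("\r", "")


clean_dos = "one\r\ntwo\r\n"        # well-formed DOS text
with_bare_cr = "10%\r20%\r\ndone\r\n"  # contains one bare CR

print(repr(strip_crlf(clean_dos)), repr(strip_all_cr(clean_dos)))
print(repr(strip_crlf(with_bare_cr)), repr(strip_all_cr(with_bare_cr)))
```

On well-formed DOS text the two agree. On the second sample, only `strip_crlf` keeps the bare "\r".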
 

ssteinerX

Amusing how long those Python toes can be. In several replies I have
noticed (often clueless) opinions on Perl. When do people learn that a
language is just a tool to do a job?

I'm not sure how "use it for what it's good for" has anything to do with toes.

I've written lots of both Python and Perl and sometimes, for one-off's, Perl is quicker; if you know it.

I sure don't want to maintain Perl applications though; even ones I've written.

When all you have is a nail file, everything looks like a toe; that doesn't mean you want to have to maintain it. Or something.

S
 

Grant Edwards

cat file.dos | python -c "import sys,re;
[sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
sys.stdin]" >file.unix

Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
string replacement! You've been infected by too much Perl coding!

Thanks for the replies I'm looking at them now, however, for those who
misunderstood, the above cat file.dos pipe pythong does not come from
Perl but comes from:

http://wiki.python.org/moin/Powerful Python One-Liners
Apply regular expression to lines from stdin
[another command] | python -c "import sys,re;
[sys.stdout.write(re.compile('PATTERN').sub('SUBSTITUTION', line))
for line in sys.stdin]"

Nothing to do with Perl, Perl only takes a handful of
characters to do this and certainly does not require the
creation an intermediate file,

In _theory_ you can do a simple string-replace in situ as long
as the replacement string is shorter than the original string.
But I have a hard time believing that Perl actually does it
that way. Since I don't speak line-noise, will you please post the
Perl script that you claim does the conversion without creating
an intermediate file?

The only way I can think of to do a general in-situ file
modification is to buffer the entire file's worth of output in
memory and then overwrite the file after all of the processing
has finished. Python can do that too, but it's not generally a
very good approach.
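The "in theory" case can be made concrete for this particular conversion, because "\n" is shorter than "\r\n" and the write position can therefore never overtake the read position. The following is a sketch under that assumption, written for Python 3, with a hypothetical function name. It is not a claim about how Perl or sed work, and a crash part-way through would leave the file half converted.

```python
import os
import tempfile


def dos2unix_in_situ(path, chunk_size=64 * 1024):
    """Rewrite CRLF as LF inside the file itself, without a second file."""
    with open(path, "r+b") as f:
        read_pos = 0
        write_pos = 0
        carry = b""  # a trailing b"\r" held back until the next byte is seen
        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            read_pos += len(chunk)
            data = carry + chunk
            # A CR at the very end may be the first half of a CRLF pair
            # that continues in the next chunk, so postpone it.
            if data.endswith(b"\r"):
                data, carry = data[:-1], b"\r"
            else:
                carry = b""
            data = data.replace(b"\r\n", b"\n")
            # Output is never longer than the input consumed so far,
            # so this write cannot clobber unread bytes.
            f.seek(write_pos)
            f.write(data)
            write_pos += len(data)
        if carry:  # a lone CR at end of file is kept as it was
            f.seek(write_pos)
            f.write(carry)
            write_pos += len(carry)
        f.truncate(write_pos)  # cut off the leftover tail


# Demonstration with a tiny chunk size to exercise the chunk boundaries.
situ_dir = tempfile.mkdtemp()
situ_path = os.path.join(situ_dir, "file.txt")
with open(situ_path, "wb") as f:
    f.write(b"one\r\ntwo\r\nthree\r\n")

dos2unix_in_situ(situ_path, chunk_size=4)
with open(situ_path, "rb") as f:
    in_situ_result = f.read()
print(in_situ_result)
```

Memory use stays bounded by the chunk size, which is the main difference from buffering the whole file and overwriting it afterwards.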
 

John Bokma

I'm not sure how "use it for what it's good for" has anything to do
with toes.

I've the feeling that some people who use Python are easily offended by
everything Perl related. Which is silly; zealotism in general is, for
that matter.

I've written lots of both Python and Perl and sometimes, for
one-off's, Perl is quicker; if you know it.

I sure don't want to maintain Perl applications though; even ones I've
written.

Ouch, I am afraid that that tells a lot about your Perl programming
skills.
 

ssteinerx

Ouch, I am afraid that that tells a lot about your Perl programming
skills.

Nah, it tells you about my preferences.

I can, and have, written maintainable things in many languages, including Perl.

However, I *choose* Python.

S
 

Aahz

Amusing how long those Python toes can be. In several replies I have
noticed (often clueless) opinions on Perl. When do people learn that a
language is just a tool to do a job?

When do people learn that language makes a difference? I used to be a
Perl programmer; these days, you'd have to triple my not-small salary to
get me to even think about programming in Perl.
--
Aahz <*> http://www.pythoncraft.com/

"Many customs in this life persist because they ease friction and promote
productivity as a result of universal agreement, and whether they are
precisely the optimal choices is much less important." --Henry Spencer
 

Steven D'Aprano

When do people learn that a
language is just a tool to do a job?

When do people learn that there are different sorts of tools? A
professional wouldn't use a screwdriver when they need a hammer.

Perl has strengths: it can be *extremely* concise, regexes are optimized
much more than in Python, and you can do some things as a one-liner short
enough to use from the command line easily. Those are values, as seen by
the millions of people who swear by Perl, but they are not Python's
values.

If you want something which can make fine cuts in metal, you would use a
hacksaw, not a keyhole saw or a crosscut saw. If you want to cut through
a three-foot tree trunk, you would use a ripsaw or a chainsaw, and not a
hacksaw. If you want concise one-liners, you would use Perl, not Python,
and if you want readable, self-documenting code, you're more likely to
get it from Python than from Perl.

If every tool is the same, why aren't we all using VB? Or C, or
Javascript, or SmallTalk, or Forth, or ... ? In the real world, all these
languages have distinguishing characteristics and different strengths and
weaknesses, which is why there are still people using PL/I and Cobol as
well as people using Haskell and Lisp and Boo and PHP and D and ...

Languages are not just nebulous interchangeable "tools", they're tools
for a particular job with particular strengths and weaknesses, and
depending on what strengths you value and what weaknesses you dislike,
some tools simply are better than other tools for certain tasks.
 

staticd

Amusing how long those Python toes can be. In several replies I have
When do people learn that language makes a difference?  I used to be a
Perl programmer; these days, you'd have to triple my not-small salary to
get me to even think about programming in Perl.

dude, you nailed it. many times, if not _always_, the correct output
is important. the method used to produce the output is irrelevant.
 

Steven D'Aprano

dude, you nailed it. many times, if not _always_, the correct output is
important. the method used to produce the output is irrelevant.

Oh really?

Then by that logic, you would consider that these two functions are both
equally good. Forget readability, forget maintainability, forget
efficiency, we have no reason for preferring one over the other since the
method is irrelevant.


def greet1(name):
    """Print 'Hello <name>' for any name."""
    print "Hello", name


def greet2(name):
    """Print 'Hello <name>' for any name."""
    count = 0
    for i in range(0, ("Hello", name).__len__(), 1):
        word = ("Hello", name).__getitem__(i)
        for i in range(0, word[:].__len__(), 1):
            c = word.__getitem__(i)
            import sys
            import string
            empty = ''
            maketrans = getattr.__call__(string, 'maketrans')
            chars = maketrans.__call__(empty, empty)
            stdout = getattr.__call__(sys, 'stdout')
            write = getattr.__call__(stdout, 'write')
            write.__call__(c)
        count = count.__add__(1)
        import operator
        eq = getattr.__call__(operator, 'eq')
        ne = getattr.__call__(operator, 'ne')
        if eq.__call__(count, 2):
            pass
        elif not ne.__call__(count, 2):
            continue
        write.__call__(chr.__call__(32))
    write.__call__(chr.__call__(10))
    return None



There ought to be some kind of competition for the least efficient
solution to programming problems-ly y'rs,
 

Stefan Behnel

Steven D'Aprano, 28.02.2010 09:48:
There ought to be some kind of competition for the least efficient
solution to programming problems

That wouldn't be very interesting. You could just write a code generator
that spits out tons of garbage code including a line that solves the
problem, and then let it execute the code afterwards. That beast would
always win.

Stefan
 

Martin P. Hellwig

Steven D'Aprano, 28.02.2010 09:48:

That wouldn't be very interesting. You could just write a code generator
that spits out tons of garbage code including a line that solves the
problem, and then let it execute the code afterwards. That beast would
always win.

Stefan

Well, an obvious rule would be that garbage code which does not
contribute to the end result (i.e. can be taken out without affecting the
end result) is not allowed. Enforcing the rule is another beast
though, but I would leave that to the competition.

The idea of a code generator is solid, though: instead of generating
garbage, it could produce a virtual machine that implements a generator
that produces a virtual machine, etc. etc.
 
