Python dos2unix one liner

Discussion in 'Python' started by @ Rocteur CC, Feb 27, 2010.

  1. @ Rocteur CC

    @ Rocteur CC Guest

    Hi,

    This morning I was working through Building Skills in Python and was
    having problems with string.strip.

    Then I found that the input file I was using was in DOS format, and I
    thought it would be best to convert it to UNIX. I started to type perl
    -i -pe 's/ and then I thought, wait, I'm learning Python, I have to
    think in Python. As I'm a Python newbie, I fired up Google and typed:

    +python convert dos to unix +one +liner

    Found perl, sed, awk but no python on the first page

    So I tried

    +python dos2unix +one +liner -perl

    Same thing..

    But then I found http://wiki.python.org/moin/Powerful Python One-Liners
    and tried this:

    cat file.dos | python -c "import sys,re;
    [sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
    sys.stdin]" >file.unix

    And it works..

    [10:31:11 incc-imac-intel ~/python] cat -vet file.dos
    one^M$
    two^M$
    three^M$
    [10:32:10 incc-imac-intel ~/python] cat -vet file.unix
    one$
    two$
    three$

    But it is long and just like sed does not do it in place.

    Is there a better way in Python or is this kind of thing best done in
    Perl ?

    Thanks,

    Jerry
     
    @ Rocteur CC, Feb 27, 2010
    #1

  2. On 02/27/10 09:36, @ Rocteur CC wrote:
    <cut dos2unix oneliners;python vs perl/sed/awk>
    Hi, a couple of fragmented things popped into my head reading your
    question. None of them is very constructive for what you actually
    want, but here goes anyway.

    - One-line throwaway script with re as built-in syntax: yup, that
    sounds like perl to me.

    - What is wrong with making an executable script (not a one-liner)
    and calling that? This is even shorter.

    - ... wait a minute, you are building something in python (problem with
    string.strip - why don't you use the built-in string strip method
    instead?) which barfs on the input (win/unix line ending), should the
    actual solution not be in there, i.e. parsing the line first to check
    for line-endings? .. But wait another minute, why are you getting \r\n
    in the first place, python by default uses universal new lines?

    Hope that helps a bit. Maybe you could post the part of the code
    you are working on to get some better suggestions.

    --
    mph
     
    Martin P. Hellwig, Feb 27, 2010
    #2

  3. @ Rocteur CC

    Peter Otten Guest

    @ Rocteur CC wrote:

    > But then I found
    > http://wiki.python.org/moin/Powerful Python One-Liners
    > and tried this:
    >
    > cat file.dos | python -c "import sys,re;
    > [sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
    > sys.stdin]" >file.unix
    >
    > And it works..


    - Don't build list comprehensions just to throw them away, use a for-loop
    instead.

    - You can often use string methods instead of regular expressions. In this
    case line.replace("\r\n", "\n").
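
    A minimal sketch of those two suggestions combined: a plain for-loop
    plus a string method, no regex and no throwaway list. It is written for
    Python 3 and works on bytes so that nothing translates the line endings
    behind your back; the function name dos2unix_stream is made up for
    illustration.

```python
import io


def dos2unix_stream(src, dst):
    """Copy the binary stream src to dst, turning each CRLF into LF."""
    for line in src:
        # bytes.replace does the literal substitution; no regex involved
        dst.write(line.replace(b"\r\n", b"\n"))


# Small in-memory demonstration instead of real files.
demo_in = io.BytesIO(b"one\r\ntwo\r\nthree\r\n")
demo_out = io.BytesIO()
dos2unix_stream(demo_in, demo_out)
print(demo_out.getvalue())
```

    As a command-line filter you would pass sys.stdin.buffer and
    sys.stdout.buffer instead of the in-memory streams.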

    > But it is long and just like sed does not do it in place.
    >
    > Is there a better way in Python or is this kind of thing best done in
    > Perl ?


    open(..., "U") ("universal" mode) converts arbitrary line endings to just
    "\n"

    $ cat -e file.dos
    alpha^M$
    beta^M$
    gamma^M$

    $ python -c'open("file.unix", "wb").writelines(open("file.dos", "U"))'

    $ cat -e file.unix
    alpha$
    beta$
    gamma$
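
    For readers on Python 3, where the "U" flag was deprecated and then
    removed in 3.11, roughly the same idea can be sketched with the newline
    argument of open(). The function name convert_newlines and the UTF-8
    default are assumptions for illustration, not part of the one-liner
    above.

```python
def convert_newlines(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path with every line ending normalised to LF."""
    # newline=None (the default) enables universal newlines on read:
    # CRLF and bare CR both arrive in Python as "\n".
    with open(src_path, "r", encoding=encoding, newline=None) as src:
        # newline="\n" writes "\n" untranslated, even on Windows.
        with open(dst_path, "w", encoding=encoding, newline="\n") as dst:
            dst.writelines(src)
```

    Note that universal newlines also turn a bare "\r" into "\n", which is
    more than a strict CRLF-to-LF replacement does.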

    But still, if you want very short (and often cryptic) code Perl is hard to
    beat. I'd say that Python doesn't even try.

    Peter
     
    Peter Otten, Feb 27, 2010
    #3
  4. On Sat, 27 Feb 2010 10:36:41 +0100, @ Rocteur CC wrote:

    > cat file.dos | python -c "import sys,re;
    > [sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
    > sys.stdin]" >file.unix


    Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
    string replacement! You've been infected by too much Perl coding!

    *wink*

    Regexes are expensive, even in Perl, but more so in Python. When you
    don't need the 30 pound sledgehammer of regexes, use lightweight string
    methods.

    import sys; sys.stdout.write(sys.stdin.read().replace('\r\n', '\n'))

    ought to do it. It's not particularly short, but Python doesn't value
    extreme brevity -- code golf isn't terribly exciting in Python.

    [steve@sylar ~]$ cat -vet file.dos
    one^M$
    two^M$
    three^M$
    [steve@sylar ~]$ cat file.dos | python -c "import sys; sys.stdout.write(sys.stdin.read().replace('\r\n', '\n'))" > file.unix
    [steve@sylar ~]$ cat -vet file.unix
    one$
    two$
    three$
    [steve@sylar ~]$

    Works fine. Unfortunately it still doesn't work in-place, although I
    think that's probably a side-effect of the shell, not Python. To do it in
    place, I would pass the file name:

    # Tested and working in the interactive interpreter.
    import sys
    filename = sys.argv[1]
    text = open(filename, 'rb').read().replace('\r\n', '\n')
    open(filename, 'wb').write(text)


    Turning that into a one-liner isn't terribly useful or interesting, but
    here we go:

    python -c "import sys;open(sys.argv[1], 'wb').write(open(sys.argv[1],
    'rb').read().replace('\r\n', '\n'))" file

    Unfortunately, this does NOT work: I suspect it is because the file gets
    opened for writing (and hence emptied) before it gets opened for reading.
    Here's another attempt:

    python -c "import sys;t=open(sys.argv[1], 'rb').read().replace('\r\n',
    '\n');open(sys.argv[1], 'wb').write(t)" file


    [steve@sylar ~]$ cp file.dos file.txt
    [steve@sylar ~]$ python -c "import sys;t=open(sys.argv[1], 'rb').read().replace('\r\n', '\n');open(sys.argv[1], 'wb').write(t)" file.txt
    [steve@sylar ~]$ cat -vet file.txt
    one$
    two$
    three$
    [steve@sylar ~]$


    Success!

    Of course, none of these one-liners are good practice. The best thing to
    use is a dedicated utility, or write a proper script that has proper
    error testing.
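
    As a rough idea of what such a "proper script" might look like in
    Python 3, here is a sketch, not a finished tool. The names
    dos2unix_inplace and main, the usage message, and the choice to write a
    temporary file and swap it in with os.replace are all assumptions made
    for illustration.

```python
import os
import shutil
import sys
import tempfile


def dos2unix_inplace(path):
    """Convert CRLF to LF in path; return True if the file was changed."""
    with open(path, "rb") as src:
        data = src.read()
    converted = data.replace(b"\r\n", b"\n")
    if converted == data:
        return False  # already Unix-style, leave the file untouched

    # Write next to the original so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(converted)
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # swap in the converted copy
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True


def main(argv):
    """Convert every file named in argv[1:]; return a shell exit status."""
    if len(argv) < 2:
        print("usage: dos2unix.py FILE...", file=sys.stderr)
        return 2
    status = 0
    for path in argv[1:]:
        try:
            dos2unix_inplace(path)
        except OSError as exc:
            print(f"{path}: {exc}", file=sys.stderr)
            status = 1
    return status
```

    To use it as a script, call sys.exit(main(sys.argv)) under the usual
    if __name__ == "__main__": guard. The original file is only replaced
    once the converted copy has been written completely.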


    > Is there a better way in Python or is this kind of thing best done in
    > Perl ?


    If by "this kind of thing" you mean text processing, then no, Python is
    perfectly capable of doing text processing. Regexes aren't as highly
    optimized as in Perl, but they're more than good enough for when you
    actually need a regex.

    If you mean "code golf" and one-liners, then, yes, this is best done in
    Perl :)


    --
    Steven
     
    Steven D'Aprano, Feb 27, 2010
    #4
  5. @ Rocteur CC

    @ Rocteur CC Guest

    On 27 Feb 2010, at 12:44, Steven D'Aprano wrote:

    > On Sat, 27 Feb 2010 10:36:41 +0100, @ Rocteur CC wrote:
    >
    >> cat file.dos | python -c "import sys,re;
    >> [sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
    >> sys.stdin]" >file.unix

    >
    > Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
    > string replacement! You've been infected by too much Perl coding!


    Thanks for the replies, I'm looking at them now. However, for those who
    misunderstood, the above cat file.dos pipe python does not come from
    Perl but comes from:

    http://wiki.python.org/moin/Powerful Python One-Liners

    > Apply regular expression to lines from stdin
    > [another command] | python -c "import sys,re;
    > [sys.stdout.write(re.compile('PATTERN').sub('SUBSTITUTION', line))
    > for line in sys.stdin]"



    Nothing to do with Perl. Perl only takes a handful of characters to do
    this and certainly does not require the creation of an intermediate file.
    I simply found the above example on wiki.python.org whilst searching
    Google for a quick conversion solution.

    Thanks again for the replies. I've learned a few things and I
    appreciate your help.

    Jerry
     
    @ Rocteur CC, Feb 27, 2010
    #5
  6. @ Rocteur CC

    Guest

    On Feb 27, 2010, at 10:01 AM, @ Rocteur CC wrote:
    > Nothing to do with Perl, Perl only takes a handful of characters to do this and certainly does not require the creation an intermediate file


    Perl may be better for you for throw-away code. Use Python for the code you want to keep (and read and understand later).

    S
     
    , Feb 27, 2010
    #6
  7. On 2010-02-27, @ Rocteur CC <> wrote:

    > Nothing to do with Perl, Perl only takes a handful of characters to do
    > this and certainly does not require the creation an intermediate file,


    Are you sure about that?

    Or does it just hide the intermediate file from you the way
    that sed -i does?
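
    For comparison, Python's standard library has the same kind of hidden
    intermediate file: fileinput with inplace=True moves the original to a
    backup and redirects standard output into a new file of the same name.
    A small sketch for Python 3, assuming a Unix-like system (where text
    mode writes "\n" as-is); the function name dos2unix_fileinput is
    hypothetical.

```python
import fileinput
import sys


def dos2unix_fileinput(path):
    """Rewrite path "in place"; the original is kept as path + '.bak'."""
    # Text-mode reads use universal newlines, so each line already ends
    # in "\n" by the time we see it.
    with fileinput.input(files=[path], inplace=True, backup=".bak") as lines:
        for line in lines:
            # While inplace is active, sys.stdout points at the new file.
            sys.stdout.write(line)
```

    Leaving out backup=".bak" makes fileinput delete the backup when it
    finishes, which is what makes the edit look as if it happened in place.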

    --
    Grant
     
    Grant Edwards, Feb 27, 2010
    #7
  8. @ Rocteur CC

    John Bokma Guest

    "" <> writes:

    > On Feb 27, 2010, at 10:01 AM, @ Rocteur CC wrote:
    >> Nothing to do with Perl, Perl only takes a handful of characters to
    >> do this and certainly does not require the creation an intermediate
    >> file

    >
    > Perl may be better for you for throw-away code. Use Python for the
    > code you want to keep (and read and understand later).


    Amusing how long those Python toes can be. In several replies I have
    noticed (often clueless) opinions on Perl. When do people learn that a
    language is just a tool to do a job?

    --
    John Bokma j3b

    Hacking & Hiking in Mexico - http://johnbokma.com/
    http://castleamber.com/ - Perl & Python Development
     
    John Bokma, Feb 27, 2010
    #8
  9. * @ Rocteur CC:
    >
    > On 27 Feb 2010, at 12:44, Steven D'Aprano wrote:
    >
    >> On Sat, 27 Feb 2010 10:36:41 +0100, @ Rocteur CC wrote:
    >>
    >>> cat file.dos | python -c "import sys,re;
    >>> [sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
    >>> sys.stdin]" >file.unix

    >>
    >> Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
    >> string replacement! You've been infected by too much Perl coding!

    >
    > Thanks for the replies I'm looking at them now, however, for those who
    > misunderstood, the above cat file.dos pipe pythong does not come from
    > Perl but comes from:
    >
    > http://wiki.python.org/moin/Powerful Python One-Liners


    Steven is right with the "Holy Cow" and multiple exclamation marks.

    For those unfamiliar with that, just google "multiple exclamation marks", I
    think that should work... ;-)

    Not only is a regular expression overkill & inefficient, but the snippet also
    needlessly constructs a list with one element per line of input.

    Consider instead e.g.

    <hack>
    import sys; sum(int(bool(sys.stdout.write(line.replace('\r\n','\n')))) for line
    in sys.stdin)
    </hack>

    But better, consider that it's less work to save the code in a file than copying
    and pasting it in a command interpreter, and then it doesn't need to be 1 line.



    >> Apply regular expression to lines from stdin
    >> [another command] | python -c "import
    >> sys,re;[sys.stdout.write(re.compile('PATTERN').sub('SUBSTITUTION',
    >> line)) for line in sys.stdin]"

    >
    >
    > Nothing to do with Perl, Perl only takes a handful of characters to do
    > this and certainly does not require the creation an intermediate file, I
    > simply found the above example on wiki.python.org whilst searching
    > Google for a quick conversion solution.
    >
    > Thanks again for the replies I've learned a few things and I appreciate
    > your help.


    Cheers,

    - Alf
     
    Alf P. Steinbach, Feb 27, 2010
    #9
  10. On Sat, 27 Feb 2010 16:01:53 +0100, @ Rocteur CC wrote:

    > On 27 Feb 2010, at 12:44, Steven D'Aprano wrote:
    >
    >> On Sat, 27 Feb 2010 10:36:41 +0100, @ Rocteur CC wrote:
    >>
    >>> cat file.dos | python -c "import sys,re;
    >>> [sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
    >>> sys.stdin]" >file.unix

    >>
    >> Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
    >> string replacement! You've been infected by too much Perl coding!

    >
    > Thanks for the replies I'm looking at them now, however, for those who
    > misunderstood, the above cat file.dos pipe pythong does not come from
    > Perl but comes from:
    >
    > http://wiki.python.org/moin/Powerful Python One-Liners


    Whether it comes from Larry Wall himself, or a Python wiki, using regexes
    for a simple string replacement is like using an 80 lb sledgehammer to
    crack a peanut.


    >> Apply regular expression to lines from stdin [another command] | python
    >> -c "import sys,re;
    >> [sys.stdout.write(re.compile('PATTERN').sub('SUBSTITUTION', line)) for
    >> line in sys.stdin]"


    And if PATTERN is an actual regex, rather than just a simple substring,
    that would be worthwhile. But if PATTERN is a literal string, then string
    methods are much faster and use much less memory.
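
    Anyone who wants to check the speed claim on their own machine can do
    so with timeit. This is only a sketch for Python 3: the sample text and
    the repeat count are arbitrary assumptions, and the timings will differ
    between systems and Python versions.

```python
import re
import timeit

# Arbitrary sample data: 30,000 short DOS-style lines.
text = "one\r\ntwo\r\nthree\r\n" * 10_000
pattern = re.compile("\r\n")


def with_regex():
    return pattern.sub("\n", text)


def with_str_replace():
    return text.replace("\r\n", "\n")


regex_seconds = timeit.timeit(with_regex, number=20)
replace_seconds = timeit.timeit(with_str_replace, number=20)
print("re.sub:      %.4f s" % regex_seconds)
print("str.replace: %.4f s" % replace_seconds)
```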

    > Nothing to do with Perl, Perl only takes a handful of characters to do
    > this


    I'm sure it does. If I were interested in code-golf, I'd be impressed.


    > and certainly does not require the creation an intermediate file,


    The solution I gave you doesn't use an intermediate file either.

    *slaps head and is enlightened*
    Oh, I'm an idiot!

    Since you're reading text files, there's no need to call
    replace('\r\n','\n'). Since there shouldn't be any bare \r characters in
    a DOS-style text file, just use replace('\r', '').

    Of course, that's an unsafe assumption in the real world. But for a quick
    and dirty one-liner (and all one-liners are quick and dirty), it should
    be good enough.
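
    To see where that assumption bites, here is a small illustration in
    Python 3 with made-up sample data containing a bare \r (the sort of
    thing a progress indicator writes). The two replacements agree on pure
    DOS text and differ as soon as a lone \r appears.

```python
# Made-up sample: one bare "\r" followed by ordinary CRLF line endings.
mixed = b"progress 10%\rprogress 100%\r\ndone\r\n"

safe = mixed.replace(b"\r\n", b"\n")   # touches CRLF pairs only
lossy = mixed.replace(b"\r", b"")      # also deletes the bare "\r"

print(safe)
print(lossy)

# On well-formed DOS text the two approaches give the same result.
pure_dos = b"one\r\ntwo\r\n"
print(pure_dos.replace(b"\r\n", b"\n") == pure_dos.replace(b"\r", b""))
```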



    --
    Steven
     
    Steven D'Aprano, Feb 27, 2010
    #10
  11. @ Rocteur CC

    Guest

    On Feb 27, 2010, at 12:27 PM, John Bokma wrote:

    > "" <> writes:
    >
    >> On Feb 27, 2010, at 10:01 AM, @ Rocteur CC wrote:
    >>> Nothing to do with Perl, Perl only takes a handful of characters to
    >>> do this and certainly does not require the creation an intermediate
    >>> file

    >>
    >> Perl may be better for you for throw-away code. Use Python for the
    >> code you want to keep (and read and understand later).

    >
    > Amusing how long those Python toes can be. In several replies I have
    > noticed (often clueless) opinions on Perl. When do people learn that a
    > language is just a tool to do a job?


    I'm not sure how "use it for what it's good for" has anything to do with toes.

    I've written lots of both Python and Perl and sometimes, for one-offs, Perl is quicker, if you know it.

    I sure don't want to maintain Perl applications though; even ones I've written.

    When all you have is a nail file, everything looks like a toe; that doesn't mean you want to have to maintain it. Or something.

    S
     
    , Feb 27, 2010
    #11
  12. On 2010-02-27, @ Rocteur CC <> wrote:
    >
    > On 27 Feb 2010, at 12:44, Steven D'Aprano wrote:
    >
    >> On Sat, 27 Feb 2010 10:36:41 +0100, @ Rocteur CC wrote:
    >>
    >>> cat file.dos | python -c "import sys,re;
    >>> [sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
    >>> sys.stdin]" >file.unix

    >>
    >> Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
    >> string replacement! You've been infected by too much Perl coding!

    >
    > Thanks for the replies I'm looking at them now, however, for those who
    > misunderstood, the above cat file.dos pipe pythong does not come from
    > Perl but comes from:
    >
    > http://wiki.python.org/moin/Powerful Python One-Liners
    >
    >> Apply regular expression to lines from stdin
    >> [another command] | python -c "import sys,re;
    >> [sys.stdout.write(re.compile('PATTERN').sub('SUBSTITUTION', line))
    >> for line in sys.stdin]"

    >
    > Nothing to do with Perl, Perl only takes a handful of
    > characters to do this and certainly does not require the
    > creation an intermediate file,


    In _theory_ you can do a simple string-replace in situ as long
    as the replacement string is shorter than the original string.
    But I have a hard time believing that Perl actually does it
    that way. Since I don't speak line-noise, will you please post the
    Perl script that you claim does the conversion without creating
    an intermediate file?

    The only way I can think of to do a general in-situ file
    modification is to buffer the entire file's worth of output in
    memory and then overwrite the file after all of the processing
    has finished. Python can do that too, but it's not generally a
    very good approach.
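
    The "in theory" case can be sketched in Python 3 as follows. Because
    "\n" is never longer than "\r\n", the write position can never overtake
    the read position, so the file can be compacted chunk by chunk and then
    truncated. The function name dos2unix_in_situ and the chunk size are
    assumptions; there is no temporary file, so an interruption halfway
    through leaves the file damaged.

```python
def dos2unix_in_situ(path, chunk_size=64 * 1024):
    """Rewrite CRLF as LF inside the file itself, without a second file."""
    with open(path, "r+b") as f:
        read_pos = 0
        write_pos = 0
        carry = b""  # a trailing "\r" whose "\n" may be in the next chunk
        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            read_pos += len(chunk)
            data = carry + chunk
            if data.endswith(b"\r"):
                # Hold the "\r" back until we know what follows it.
                data, carry = data[:-1], b"\r"
            else:
                carry = b""
            data = data.replace(b"\r\n", b"\n")
            # write_pos never exceeds read_pos, so unread bytes are safe.
            f.seek(write_pos)
            f.write(data)
            write_pos += len(data)
        if carry:
            # The file ended with a bare "\r"; keep it.
            f.seek(write_pos)
            f.write(carry)
            write_pos += len(carry)
        f.truncate(write_pos)
```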

    --
    Grant
     
    Grant Edwards, Feb 27, 2010
    #12
  13. @ Rocteur CC

    John Bokma Guest

    "" <> writes:

    > I'm not sure how "use it for what it's good for" has anything to do
    > with toes.


    I have the feeling that some people who use Python are easily offended by
    everything Perl related. Which is silly; zealotry in general is, for
    that matter.

    > I've written lots of both Python and Perl and sometimes, for
    > one-off's, Perl is quicker; if you know it.
    >
    > I sure don't want to maintain Perl applications though; even ones I've
    > written.


    Ouch, I am afraid that that tells a lot about your Perl programming
    skills.

    --
    John Bokma j3b

    Hacking & Hiking in Mexico - http://johnbokma.com/
    http://castleamber.com/ - Perl & Python Development
     
    John Bokma, Feb 27, 2010
    #13
  14. @ Rocteur CC

    Guest

    On Feb 27, 2010, at 1:15 PM, John Bokma wrote:

    >> I sure don't want to maintain Perl applications though; even ones I've
    >> written.

    >
    > Ouch, I am afraid that that tells a lot about your Perl programming
    > skills.


    Nah, it tells you about my preferences.

    I can, and have, written maintainable things in many languages, including Perl.

    However, I *choose* Python.

    S
     
    , Feb 27, 2010
    #14
  15. @ Rocteur CC

    Aahz Guest

    In article <>,
    John Bokma <> wrote:
    >
    >Amusing how long those Python toes can be. In several replies I have
    >noticed (often clueless) opinions on Perl. When do people learn that a
    >language is just a tool to do a job?


    When do people learn that language makes a difference? I used to be a
    Perl programmer; these days, you'd have to triple my not-small salary to
    get me to even think about programming in Perl.
    --
    Aahz () <*> http://www.pythoncraft.com/

    "Many customs in this life persist because they ease friction and promote
    productivity as a result of universal agreement, and whether they are
    precisely the optimal choices is much less important." --Henry Spencer
     
    Aahz, Feb 27, 2010
    #15
  16. On Sat, 27 Feb 2010 11:27:04 -0600, John Bokma wrote:

    > When do people learn that a
    > language is just a tool to do a job?


    When do people learn that there are different sorts of tools? A
    professional wouldn't use a screwdriver when they need a hammer.

    Perl has strengths: it can be *extremely* concise, regexes are optimized
    much more than in Python, and you can do some things as a one-liner short
    enough to use from the command line easily. Those are values, as seen by
    the millions of people who swear by Perl, but they are not Python's
    values.

    If you want something which can make fine cuts in metal, you would use a
    hacksaw, not a keyhole saw or a crosscut saw. If you want to cut through
    a three-foot tree trunk, you would use a ripsaw or a chainsaw, and not a
    hacksaw. If you want concise one-liners, you would use Perl, not Python,
    and if you want readable, self-documenting code, you're more likely to
    get it from Python than from Perl.

    If every tool is the same, why aren't we all using VB? Or C, or
    Javascript, or SmallTalk, or Forth, or ... ? In the real world, all these
    languages have distinguishing characteristics and different strengths and
    weaknesses, which is why there are still people using PL/I and Cobol as
    well as people using Haskell and Lisp and Boo and PHP and D and ...

    Languages are not just nebulous interchangeable "tools", they're tools
    for a particular job with particular strengths and weaknesses, and
    depending on what strengths you value and what weaknesses you dislike,
    some tools simply are better than other tools for certain tasks.



    --
    Steven
     
    Steven D'Aprano, Feb 28, 2010
    #16
  17. @ Rocteur CC

    staticd Guest

    > >Amusing how long those Python toes can be. In several replies I have
    > >noticed (often clueless) opinions on Perl. When do people learn that a
    > >language is just a tool to do a job?

    >
    > When do people learn that language makes a difference?  I used to be a
    > Perl programmer; these days, you'd have to triple my not-small salary to
    > get me to even think about programming in Perl.


    dude, you nailed it. many times, if not _always_, the correct output
    is important. the method used to produce the output is irrelevant.
     
    staticd, Feb 28, 2010
    #17
  18. On Sat, 27 Feb 2010 19:37:50 -0800, staticd wrote:

    >> >Amusing how long those Python toes can be. In several replies I have
    >> >noticed (often clueless) opinions on Perl. When do people learn that a
    >> >language is just a tool to do a job?

    >>
    >> When do people learn that language makes a difference?  I used to be a
    >> Perl programmer; these days, you'd have to triple my not-small salary
    >> to get me to even think about programming in Perl.

    >
    > dude, you nailed it. many times, if not _always_, the correct output is
    > important. the method used to produce the output is irrelevant.


    Oh really?

    Then by that logic, you would consider that these two functions are both
    equally good. Forget readability, forget maintainability, forget
    efficiency, we have no reason for preferring one over the other since the
    method is irrelevant.


    def greet1(name):
        """Print 'Hello <name>' for any name."""
        print "Hello", name


    def greet2(name):
        """Print 'Hello <name>' for any name."""
        count = 0
        for i in range(0, ("Hello", name).__len__(), 1):
            word = ("Hello", name).__getitem__(i)
            for i in range(0, word[:].__len__(), 1):
                c = word.__getitem__(i)
                import sys
                import string
                empty = ''
                maketrans = getattr.__call__(string, 'maketrans')
                chars = maketrans.__call__(empty, empty)
                stdout = getattr.__call__(sys, 'stdout')
                write = getattr.__call__(stdout, 'write')
                write.__call__(c)
            count = count.__add__(1)
            import operator
            eq = getattr.__call__(operator, 'eq')
            ne = getattr.__call__(operator, 'ne')
            if eq.__call__(count, 2):
                pass
            elif not ne.__call__(count, 2):
                continue
            write.__call__(chr.__call__(32))
        write.__call__(chr.__call__(10))
        return None



    There ought to be some kind of competition for the least efficient
    solution to programming problems-ly y'rs,


    --
    Steven
     
    Steven D'Aprano, Feb 28, 2010
    #18
  19. Steven D'Aprano, 28.02.2010 09:48:
    > There ought to be some kind of competition for the least efficient
    > solution to programming problems


    That wouldn't be very interesting. You could just write a code generator
    that spits out tons of garbage code including a line that solves the
    problem, and then let it execute the code afterwards. That beast would
    always win.

    Stefan
     
    Stefan Behnel, Feb 28, 2010
    #19
  20. On 02/28/10 11:05, Stefan Behnel wrote:
    > Steven D'Aprano, 28.02.2010 09:48:
    >> There ought to be some kind of competition for the least efficient
    >> solution to programming problems

    >
    > That wouldn't be very interesting. You could just write a code generator
    > that spits out tons of garbage code including a line that solves the
    > problem, and then let it execute the code afterwards. That beast would
    > always win.
    >
    > Stefan
    >

    Well, an obvious rule would be that garbage code that does not
    contribute to the end result (i.e. can be taken out without affecting the
    end result) is not allowed. Enforcing the rule is another beast
    though, but I would leave that to the competition.

    The idea of a code generator is solid, though: instead of generating
    garbage, it could produce a virtual machine that implements a generator
    that produces a virtual machine, etc. etc.

    --
    mph
     
    Martin P. Hellwig, Feb 28, 2010
    #20
