Python Equivalent for dd & fold

Discussion in 'Python' started by seldan24, Jul 15, 2009.

  1. seldan24

    seldan24 Guest

    Hello,

    I have a shell script that I'm attempting to convert to Python. It
    FTPs files down from an AS/400 machine. That part is working fine.
    Once the files arrive, the script converts them from EBCDIC to ASCII
    and then formats their line width based on a pre-determined size.

    For example, if I have the file TESTFILE and I know it should be
    formatted to 32 characters/line, I run:

    dd if=TESTFILE,EBCDIC conv=ASCII | fold -w 32 > TESTFILE.ASCII

    Luckily, the packed decimal fields common in COBOL programs are
    converted before the files reach me, so all I have to do
    is the conversion and line width formatting. The above works fine,
    and I know I could (and may) just embed it using subprocess.Popen.
    This, I've figured out how to do. But, ideally, I'd like to convert
    as much shell to native Python as possible, so that I can learn more.

    I think I can figure out the first part by using the codecs module...
    found some information on Google related to that. My question is,
    what can I use as the equivalent for the Unix 'fold' command? I don't
    really need to know how to do it, just a push in the right direction.

    Thanks.
     
    seldan24, Jul 15, 2009
    #1

  2. seldan24 wrote:

    > what can I use as the equivalent for the Unix 'fold' command?


    def fold(s,len):
        while s:
            print s[:len]
            s=s[len:]

    s="A very long string indeed. Really that long? Indeed."
    fold(s,10)

    Output:

    A very lon
    g string i
    ndeed. Rea
    lly that l
    ong? Indee
    d.
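For readers on Python 3, where print is a function, here is a sketch of the same loop that returns the pieces instead of printing them. Naming the parameter `width` rather than `len` also avoids shadowing the built-in len(). Both changes are editorial assumptions, not part of the post.

```python
def fold(s, width):
    """Split s into consecutive pieces of at most `width` characters.

    Python 3 port of the loop above; returns a list instead of printing.
    """
    pieces = []
    while s:
        pieces.append(s[:width])  # take the next `width` characters
        s = s[width:]             # keep the remainder (a new copy each pass)
    return pieces


for line in fold("A very long string indeed. Really that long? Indeed.", 10):
    print(line)
```

This prints the same six lines shown above.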

    Greetings,

    --
    "The ability of the OSS process to collect and harness
    the collective IQ of thousands of individuals across
    the Internet is simply amazing." - Vinod Valloppillil
    http://www.catb.org/~esr/halloween/halloween4.html
     
    Michiel Overtoom, Jul 15, 2009
    #2

  3. seldan24

    seldan24 Guest

    On Jul 15, 12:47 pm, Michiel Overtoom <> wrote:
    > seldan24 wrote:
    > > what can I use as the equivalent for the Unix 'fold' command?

    >
    > def fold(s,len):
    >      while s:
    >          print s[:len]
    >          s=s[len:]
    >
    > s="A very long string indeed. Really that long? Indeed."
    > fold(s,10)
    >
    > Output:
    >
    > A very lon
    > g string i
    > ndeed. Rea
    > lly that l
    > ong? Indee
    > d.
    >


    Wow, I feel like a dork. I should have done more research prior to
    posting. Anyway, thanks for the advice. The trouble with Python is
    that things make 'too much' sense. Loving this language.
     
    seldan24, Jul 15, 2009
    #3
  4. seldan24

    MRAB Guest

    seldan24 wrote:
    > On Jul 15, 12:47 pm, Michiel Overtoom <> wrote:
    >> seldan24 wrote:
    >>> what can I use as the equivalent for the Unix 'fold' command?

    >> def fold(s,len):
    >>     while s:
    >>         print s[:len]
    >>         s=s[len:]
    >>
    >> s="A very long string indeed. Really that long? Indeed."
    >> fold(s,10)
    >>
    >> Output:
    >>
    >> A very lon
    >> g string i
    >> ndeed. Rea
    >> lly that l
    >> ong? Indee
    >> d.
    >>

    >
    > Wow, I feel like a dork. I should have done more research prior to
    > posting. Anyway, thanks for the advice. The trouble with Python is
    > that things make 'too much' sense. Loving this language.


    You might still need to tweak the above code as regards how line endings
    are handled.
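One way to act on that, sketched in Python 3 as an assumption rather than anything posted here: fold each existing line on its own, so newlines already in the data reset the column count the way fold does. `fold_lines` is a hypothetical name; the sketch treats every character as one column (real fold treats tabs and backspaces specially unless given -b) and assumes Unix "\n" line endings.

```python
def fold_lines(text, width):
    """Fold every line of `text` to at most `width` characters.

    Existing newlines are kept, so a short line is never run together
    with the next one. Empty lines and a trailing newline survive.
    """
    folded = []
    for line in text.split("\n"):
        # Cut this one line into width-sized pieces (none for an empty line).
        pieces = [line[i:i + width] for i in range(0, len(line), width)]
        folded.append("\n".join(pieces))
    return "\n".join(folded)


print(repr(fold_lines("abcdefg\nhi\n", 3)))  # 'abc\ndef\ng\nhi\n'
```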
     
    MRAB, Jul 15, 2009
    #4
  5. On 7/15/2009 10:23 AM MRAB said...
    >> On Jul 15, 12:47 pm, Michiel Overtoom <> wrote:
    >>> seldan24 wrote:
    >>>> what can I use as the equivalent for the Unix 'fold' command?
    >>> def fold(s,len):
    >>>     while s:
    >>>         print s[:len]
    >>>         s=s[len:]
    >>>

    <snip>
    > You might still need to tweak the above code as regards how line endings
    > are handled.


    You might also want to tweak it if the strings are _really_ long to
    simply slice out the substrings as opposed to reassigning the balance to
    a newly created s on each iteration.
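A related approach, not something posted in the thread: if the data is still in a file, read it `width` characters at a time so no long remainder string is ever built or copied. This is a Python 3 sketch; `iter_chunks` and `width` are hypothetical names, and the "" sentinel assumes a file opened in text mode (a binary file would need b"").

```python
import io
from functools import partial


def iter_chunks(fileobj, width):
    """Yield successive `width`-character pieces read straight from a file."""
    # iter(callable, sentinel) keeps calling fileobj.read(width) until it
    # returns "" at end of file; each call returns only the next piece.
    return iter(partial(fileobj.read, width), "")


demo = io.StringIO("A very long string indeed.")
print(list(iter_chunks(demo, 10)))  # ['A very lon', 'g string i', 'ndeed.']
```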

    Emile
     
    Emile van Sebille, Jul 15, 2009
    #5
  6. seldan24

    seldan24 Guest

    On Jul 15, 1:48 pm, Emile van Sebille <> wrote:
    > On 7/15/2009 10:23 AM MRAB said...
    >
    > >> On Jul 15, 12:47 pm, Michiel Overtoom <> wrote:
    > >>> seldan24 wrote:
    > >>>> what can I use as the equivalent for the Unix 'fold' command?
    > >>> def fold(s,len):
    > >>>      while s:
    > >>>          print s[:len]
    > >>>          s=s[len:]

    >
    > <snip>
    > > You might still need to tweak the above code as regards how line endings
    > > are handled.

    >
    > You might also want to tweak it if the strings are _really_ long to
    > simply slice out the substrings as opposed to reassigning the balance to
    > a newly created s on each iteration.
    >
    > Emile


    Thanks for all of the help. I'm almost there. I have it working now,
    but the 'fold' piece is very slow. When I use the 'fold' command in
    shell it is almost instantaneous. I was able to do the EBCDIC->ASCII
    conversion using the decode method in the built-in str type. I didn't
    have to import the codecs module. I just decoded the data to cp037
    which works fine.
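A Python 3 sketch of that decode step, for reference. Strictly, `.decode('cp037')` reads the bytes *as* code page 037 (EBCDIC) and produces text, so the data is decoded from cp037 rather than to it. In Python 3 the file has to be opened in binary mode ('rb') to get bytes to decode. The helper name is hypothetical, and the sketch assumes the AS/400 data really is CP037; other EBCDIC code pages such as cp500 map a few characters differently.

```python
def ebcdic_to_text(raw):
    """Decode EBCDIC bytes (assumed to be code page 037) into a str."""
    return raw.decode("cp037")


# 0xC8 0x85 0x93 0x93 0x96 spell "Hello" in CP037; 0x40 is a space and
# 0xF1..0xF3 are the digits 1..3.
print(ebcdic_to_text(b"\xc8\x85\x93\x93\x96\x40\xf1\xf2\xf3"))  # Hello 123
```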

    So now, I'm left with a large file, consisting of one extremely long
    line of ASCII data that needs to be sliced up into 35 character
    lines. I did the following, which works but takes a very long time:

    f = open(ascii_file, 'w')
    while ascii_data:
        f.write(ascii_data[:len])
        ascii_data = ascii_data[len:]
    f.close()

    I know that Emile suggested that I can slice out the substrings rather
    than do the gradual trimming of the string variable as is being done
    by moving around the length. So, I'm going to give that a try... I'm
    a bit confused by what that means; I'm guessing that slicing can break up
    a string based on characters; will research. Thanks for the help thus
    far. I'll post again when all is working fine.
     
    seldan24, Jul 16, 2009
    #6
  7. seldan24 wrote:

    > I know that Emile suggested that I can slice out the substrings rather
    > than do the gradual trimming of the string variable as is being done
    > by moving around the length.


    An excellent idea.

    def fold(s,chunklength):
        offset=0
        while offset<len(s):
            print s[offset:offset+chunklength]
            offset+=chunklength

    s="A very long string indeed. Really that long? Indeed."
    fold(s,10)


    --
    "The ability of the OSS process to collect and harness
    the collective IQ of thousands of individuals across
    the Internet is simply amazing." - Vinod Valloppillil
    http://www.catb.org/~esr/halloween/halloween4.html
     
    Michiel Overtoom, Jul 16, 2009
    #7
  8. On Jul 16, 10:12 am, seldan24 <> wrote:
    > On Jul 15, 1:48 pm, Emile van Sebille <> wrote:
    >
    >
    >
    > > On 7/15/2009 10:23 AM MRAB said...

    >
    > > >> On Jul 15, 12:47 pm, Michiel Overtoom <> wrote:
    > > >>> seldan24 wrote:
    > > >>>> what can I use as the equivalent for the Unix 'fold' command?
    > > >>> def fold(s,len):
    > > >>>      while s:
    > > >>>          print s[:len]
    > > >>>          s=s[len:]

    >
    > > <snip>
    > > > You might still need to tweak the above code as regards how line endings
    > > > are handled.

    >
    > > You might also want to tweak it if the strings are _really_ long to
    > > simply slice out the substrings as opposed to reassigning the balance to
    > > a newly created s on each iteration.

    >
    > > Emile

    >
    > Thanks for all of the help.  I'm almost there.  I have it working now,
    > but the 'fold' piece is very slow.  When I use the 'fold' command in
    > shell it is almost instantaneous.  I was able to do the EBCDIC->ASCII
    > conversion using the decode method in the built-in str type.  I didn't
    > have to import the codecs module.  I just decoded the data to cp037
    > which works fine.
    >
    > So now, I'm left with a large file, consisting of one extremely long
    > line of ASCII data that needs to be sliced up into 35 character
    > lines.  I did the following, which works but takes a very long time:
    >
    > f = open(ascii_file, 'w')
    > while ascii_data:
    >     f.write(ascii_data[:len])
    >     ascii_data = ascii_data[len:]
    > f.close()
    >
    > I know that Emile suggested that I can slice out the substrings rather
    > than do the gradual trimming of the string variable as is being done
    > by moving around the length.  So, I'm going to give that a try... I'm
    > a bit confused by what that means, am guessing that slice can break up
    > a string based on characters; will research.  Thanks for the help thus
    > far.  I'll post again when all is working fine.


    The problem is that "ascii_data = ascii_data[len:]" creates a new
    string on every pass through the loop. I believe Emile was
    suggesting that you just keep moving the starting index through the
    same string, something like (warning - untested code!):

    i = 0
    str_len = len(ascii_data)
    while i < str_len:
        j = min(i + length, str_len)
        print ascii_data[i:j]
        i = j
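A note on the min() call, with a Python 3 version of the same idea: a slice whose end runs past the end of the string is clamped, so `ascii_data[i:i + length]` is already safe on the last, short piece. `fold_by_index` is a hypothetical name; like Casey's loop, it walks an index through one string instead of copying the remainder.

```python
def fold_by_index(data, length):
    """Collect successive `length`-character slices by advancing an index."""
    pieces = []
    i = 0
    while i < len(data):
        pieces.append(data[i:i + length])  # the last slice may be shorter
        i += length
    return pieces


print("abcde"[3:100])                # de (the slice end is clamped)
print(fold_by_index("abcdefgh", 3))  # ['abc', 'def', 'gh']
```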
     
    Casey Webster, Jul 16, 2009
    #8
  9. seldan24

    pdpi Guest

    On Jul 16, 3:12 pm, seldan24 <> wrote:
    > On Jul 15, 1:48 pm, Emile van Sebille <> wrote:
    >
    >
    >
    >
    >
    > > On 7/15/2009 10:23 AM MRAB said...

    >
    > > >> On Jul 15, 12:47 pm, Michiel Overtoom <> wrote:
    > > >>> seldan24 wrote:
    > > >>>> what can I use as the equivalent for the Unix 'fold' command?
    > > >>> def fold(s,len):
    > > >>>      while s:
    > > >>>          print s[:len]
    > > >>>          s=s[len:]

    >
    > > <snip>
    > > > You might still need to tweak the above code as regards how line endings
    > > > are handled.

    >
    > > You might also want to tweak it if the strings are _really_ long to
    > > simply slice out the substrings as opposed to reassigning the balance to
    > > a newly created s on each iteration.

    >
    > > Emile

    >
    > Thanks for all of the help.  I'm almost there.  I have it working now,
    > but the 'fold' piece is very slow.  When I use the 'fold' command in
    > shell it is almost instantaneous.  I was able to do the EBCDIC->ASCII
    > conversion using the decode method in the built-in str type.  I didn't
    > have to import the codecs module.  I just decoded the data to cp037
    > which works fine.
    >
    > So now, I'm left with a large file, consisting of one extremely long
    > line of ASCII data that needs to be sliced up into 35 character
    > lines.  I did the following, which works but takes a very long time:
    >
    > f = open(ascii_file, 'w')
    > while ascii_data:
    >     f.write(ascii_data[:len])
    >     ascii_data = ascii_data[len:]
    > f.close()
    >
    > I know that Emile suggested that I can slice out the substrings rather
    > than do the gradual trimming of the string variable as is being done
    > by moving around the length.  So, I'm going to give that a try... I'm
    > a bit confused by what that means, am guessing that slice can break up
    > a string based on characters; will research.  Thanks for the help thus
    > far.  I'll post again when all is working fine.


    Assuming your rather large text file is 1 meg long, you have 1 million
    characters in there. 1000000/35 = ~29k lines. The size of the remaining
    string decreases linearly, so the average size is (1000000 + 0) / 2 or
    500k. All said and done, you're allocating and copying a 500K string
    -- not once, but 29 thousand times. That's where your slowdown resides.
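pdpi's estimate can be checked directly. Each `ascii_data = ascii_data[len:]` copies whatever is left, so for n characters folded at width w the copies total (n - w) + (n - 2w) + ..., roughly n²/(2w). A small Python 3 sketch for the 1,000,000-character, 35-wide case; the function name is hypothetical, and it counts only the remainder copies, not the 35-character slices themselves.

```python
def chars_copied_by_trimming(n, width):
    """Characters copied by `s = s[width:]` while folding an n-char string.

    Each pass copies the remainder: n - width, n - 2*width, ... down to
    the last non-empty leftover.
    """
    return sum(range(n - width, 0, -width))


n, width = 1_000_000, 35
print(-(-n // width))                      # lines produced: 28572
print(chars_copied_by_trimming(n, width))  # 14285214290
```

That is about 14.3 billion characters copied, in line with pdpi's 500K times 29k figure, versus roughly one million when each piece is sliced out by index.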
     
    pdpi, Jul 16, 2009
    #9
  10. seldan24

    ryles Guest

    On Jul 15, 1:14 pm, seldan24 <> wrote:
    > On Jul 15, 12:47 pm, Michiel Overtoom <> wrote:
    >
    >
    >
    > > seldan24 wrote:
    > > > what can I use as the equivalent for the Unix 'fold' command?

    >
    > > def fold(s,len):
    > >      while s:
    > >          print s[:len]
    > >          s=s[len:]

    >
    > > s="A very long string indeed. Really that long? Indeed."
    > > fold(s,10)

    >
    > > Output:

    >
    > > A very lon
    > > g string i
    > > ndeed. Rea
    > > lly that l
    > > ong? Indee
    > > d.

    >

    >
    > Wow, I feel like a dork.  I should have done more research prior to
    > posting.  Anyway, thanks for the advice.  The trouble with Python is
    > that things make 'too much' sense.  Loving this language.


    You might also find the textwrap module useful:

    http://docs.python.org/library/textwrap.html
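One caveat, sketched in Python 3: textwrap wraps at whitespace, somewhat like `fold -s` rather than plain `fold -w`, and by default it drops the whitespace at each break and turns tabs and newlines into spaces. That suits prose but would shift columns in fixed-width records. Runs without spaces are still cut at exactly `width` characters.

```python
import textwrap

# Breaks at spaces and drops them, so pieces are not fixed-width.
print(textwrap.wrap("A very long string indeed.", 10))
# ['A very', 'long', 'string', 'indeed.']

# A "word" longer than the width is split at exactly `width` characters.
print(textwrap.wrap("ABCDEFGHIJ", 4))
# ['ABCD', 'EFGH', 'IJ']
```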
     
    ryles, Jul 16, 2009
    #10
  11. seldan24

    MRAB Guest

    seldan24 wrote:
    > On Jul 15, 1:48 pm, Emile van Sebille <> wrote:
    >> On 7/15/2009 10:23 AM MRAB said...
    >>
    >>>> On Jul 15, 12:47 pm, Michiel Overtoom <> wrote:
    >>>>> seldan24 wrote:
    >>>>>> what can I use as the equivalent for the Unix 'fold' command?
    >>>>> def fold(s,len):
    >>>>>     while s:
    >>>>>         print s[:len]
    >>>>>         s=s[len:]

    >> <snip>
    >>> You might still need to tweak the above code as regards how line endings
    >>> are handled.

    >> You might also want to tweak it if the strings are _really_ long to
    >> simply slice out the substrings as opposed to reassigning the balance to
    >> a newly created s on each iteration.
    >>
    >> Emile

    >
    > Thanks for all of the help. I'm almost there. I have it working now,
    > but the 'fold' piece is very slow. When I use the 'fold' command in
    > shell it is almost instantaneous. I was able to do the EBCDIC->ASCII
    > conversion using the decode method in the built-in str type. I didn't
    > have to import the codecs module. I just decoded the data to cp037
    > which works fine.
    >
    > So now, I'm left with a large file, consisting of one extremely long
    > line of ASCII data that needs to be sliced up into 35 character
    > lines. I did the following, which works but takes a very long time:
    >
    > f = open(ascii_file, 'w')
    > while ascii_data:
    >     f.write(ascii_data[:len])
    >     ascii_data = ascii_data[len:]
    > f.close()
    >

    The 'write' method doesn't append any line ending, so that code gives
    the same output as f.write(ascii_data).

    > I know that Emile suggested that I can slice out the substrings rather
    > than do the gradual trimming of the string variable as is being done
    > by moving around the length. So, I'm going to give that a try... I'm
    > a bit confused by what that means, am guessing that slice can break up
    > a string based on characters; will research. Thanks for the help thus
    > far. I'll post again when all is working fine.
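Putting the thread's pieces together with MRAB's point about write(), here is a Python 3 sketch of the whole dd-plus-fold step. Assumptions: the input is CP037 EBCDIC with no record separators of its own, `convert_ebcdic_file` and its parameters are hypothetical names, and the output is strict ASCII, which raises UnicodeEncodeError if the data decodes to a character such as '¢' that has no ASCII equivalent.

```python
def convert_ebcdic_file(src_path, dst_path, width=35):
    """Rough Python 3 stand-in for an EBCDIC-to-ASCII dd piped into fold -w."""
    with open(src_path, "rb") as src:  # raw EBCDIC bytes, not text
        text = src.read().decode("cp037")
    # newline="\n" writes "\n" as-is on every platform, like fold's output.
    with open(dst_path, "w", encoding="ascii", newline="\n") as dst:
        for offset in range(0, len(text), width):
            # write() adds no line ending by itself, so append one here.
            dst.write(text[offset:offset + width] + "\n")
```

Reading the whole file into memory mirrors the original script; for very large inputs, reading `width` bytes per line instead (CP037 is one byte per character) would avoid holding everything at once.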
     
    MRAB, Jul 16, 2009
    #11
  12. seldan24

    MRAB Guest

    Michiel Overtoom wrote:
    > seldan24 wrote:
    >
    >> I know that Emile suggested that I can slice out the substrings rather
    >> than do the gradual trimming of the string variable as is being done
    >> by moving around the length.

    >
    > An excellent idea.
    >
    > def fold(s,chunklength):
    >     offset=0
    >     while offset<len(s):
    >         print s[offset:offset+chunklength]
    >         offset+=chunklength
    >

    More Pythonic:

    for offset in range(0, len(s), chunklength):
        print s[offset : offset + chunklength]

    > s="A very long string indeed. Really that long? Indeed."
    > fold(s,10)
    >
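A Python 3 sketch of the same range()-based loop written as a generator, so the caller decides whether to print the pieces, write them to a file, or collect them. The generator form is an editorial variation, not MRAB's code.

```python
def fold(s, chunklength):
    """Yield successive `chunklength`-character slices of s."""
    for offset in range(0, len(s), chunklength):
        yield s[offset:offset + chunklength]


print("\n".join(fold("A very long string indeed. Really that long? Indeed.", 10)))
```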
     
    MRAB, Jul 16, 2009
    #12

Similar Threads
  1. dean

    Unix fold command in python

    dean, Jun 14, 2004, in forum: Python
    Replies:
    3
    Views:
    341
    Dennis Lee Bieber
    Jun 23, 2004
  2. dean

    Unix fold command in python

    dean, Jun 14, 2004, in forum: Python
    Replies:
    2
    Views:
    388
    Kannan Vijayan
    Jun 14, 2004
  3. Li Daobing
    Replies:
    1
    Views:
    1,095
    Sean Richards
    Oct 13, 2004
  4. Mark Dickinson
    Replies:
    23
    Views:
    683
    Raymond Hettinger
    Aug 26, 2005
  5. JohnE

    a two fold question

    JohnE, Feb 7, 2010, in forum: ASP .Net
    Replies:
    0
    Views:
    335
    JohnE
    Feb 7, 2010