Python Equivalent for dd & fold

seldan24 · Jul 15, 2009

Hello,

I have a shell script that I'm attempting to convert to Python. It
FTPs files down from an AS/400 machine. That part is working fine.
Once the files arrive, the script converts them from EBCDIC to ASCII
and then formats their line width based on a pre-determined size.

For example, if I have the file TESTFILE and I know it should be
formatted to 32 characters/line, I run:

dd if=TESTFILE,EBCDIC conv=ASCII | fold -w 32 > TESTFILE.ASCII

Luckily, the packed decimal fields common in COBOL programs have
already been converted before the files reach me, so all I have to do
is the conversion and line width formatting. The above works fine,
and I know I could (and may) just embed it using subprocess.Popen.
This, I've figured out how to do. But, ideally, I'd like to convert
as much shell to native Python as possible, so that I can learn more.

I think I can figure out the first part by using the codecs module...
found some information on Google related to that. My question is,
what can I use as the equivalent for the Unix 'fold' command? I don't
really need to know how to do it, just a push in the right direction.

Thanks.

Michiel Overtoom · Jul 15, 2009

seldan24 said:
what can I use as the equivalent for the Unix 'fold' command?

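def fold(s, len):  # note: the parameter name shadows the built-in len()
    while s:
        print(s[:len])  # emit the next chunk of at most len characters
        s = s[len:]     # keep the remainder for the next pass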

s="A very long string indeed. Really that long? Indeed."
fold(s,10)

Output:

A very lon
g string i
ndeed. Rea
lly that l
ong? Indee
d.

Greetings,

seldan24 · Jul 15, 2009

Wow, I feel like a dork. I should have done more research prior to
posting. Anyway, thanks for the advice. The trouble with Python is
that things make 'too much' sense. Loving this language.

MRAB · Jul 15, 2009

You might still need to tweak the above code as regards how line endings
are handled.

Emile van Sebille · Jul 15, 2009

You might also want to tweak it if the strings are _really_ long to
simply slice out the substrings as opposed to reassigning the balance to
a newly created s on each iteration.

Emile

seldan24 · Jul 16, 2009

Thanks for all of the help. I'm almost there. I have it working now,
but the 'fold' piece is very slow. When I use the 'fold' command in
shell it is almost instantaneous. I was able to do the EBCDIC->ASCII
conversion using the decode method of the built-in str type. I didn't
have to import the codecs module. I just decoded the data as cp037,
which works fine.

So now, I'm left with a large file, consisting of one extremely long
line of ASCII data that needs to be sliced up into 35 character
lines. I did the following, which works but takes a very long time:

f = open(ascii_file, 'w')
while ascii_data:
    f.write(ascii_data[:len])
    ascii_data = ascii_data[len:]
f.close()

I know that Emile suggested that I can slice out the substrings rather
than do the gradual trimming of the string variable as is being done
by moving around the length. So, I'm going to give that a try... I'm
a bit confused by what that means, am guessing that slice can break up
a string based on characters; will research. Thanks for the help thus
far. I'll post again when all is working fine.
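
For reference, the decode step described above might look like the following in Python 3, where the raw file contents are a bytes object rather than a Python 2 str. This is only a minimal sketch: the helper name ebcdic_to_text and the sample bytes are illustrative assumptions, and cp037 is just one of several EBCDIC code pages, so its result may not match what dd conv=ascii produces for every byte.

```python
def ebcdic_to_text(raw, codepage="cp037"):
    """Decode EBCDIC bytes into text (hypothetical helper; cp037 is assumed)."""
    return raw.decode(codepage)

# Illustrative sample: the cp037 bytes for the letters H, E, L, L, O.
sample = b"\xc8\xc5\xd3\xd3\xd6"
print(ebcdic_to_text(sample))  # prints: HELLO
```

No import is needed because cp037 is one of the codecs that ships with Python; opening the input file in binary mode ('rb') would supply the bytes.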

Michiel Overtoom · Jul 16, 2009

seldan24 said:
I know that Emile suggested that I can slice out the substrings rather
than do the gradual trimming of the string variable as is being done
by moving around the length.

An excellent idea.

def fold(s, chunklength):
    offset = 0
    while offset < len(s):
        print(s[offset:offset+chunklength])
        offset += chunklength

s="A very long string indeed. Really that long? Indeed."
fold(s,10)

Casey Webster · Jul 16, 2009

The problem is that it creates a new string every time you iterate
through the "ascii_data = ascii_data[len:]". I believe Emile was
suggesting that you just keep moving the starting index through the
same string, something like (warning - untested code!):

i = 0
str_len = len(ascii_data)
while i < str_len:
    j = min(i + length, str_len)
    print(ascii_data[i:j])
    i = j

pdpi · Jul 16, 2009

Assuming your rather large text file is 1 meg long, you have 1 million
characters in there. 1000000/35 = ~29k lines. The size of the remaining
string decreases linearly, so the average size is (1000000 + 0) / 2 or
500k. All said and done, you're allocating and copying a 500K string
-- not once, but 29 thousand times. That's where your slowdown resides.
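
The estimate above can be checked with a little arithmetic in code. The sketch below is an illustration, not a benchmark: it only counts characters, assuming every `s = s[width:]` copies the whole remaining tail, and it ignores the small per-line slice that both approaches make. The function names are made up for this example.

```python
def chars_copied_by_trimming(total, width):
    """Characters copied by repeating s = s[width:] until s is empty."""
    copied = 0
    remaining = total
    while remaining > 0:
        remaining = max(remaining - width, 0)
        copied += remaining  # each trim copies the entire remaining tail
    return copied

def chars_copied_by_offsets(total, width):
    """With a moving offset, each character lands in exactly one chunk."""
    return total

total, width = 1000000, 35
lines = -(-total // width)  # ceiling division: number of output lines
print(lines)
print(chars_copied_by_trimming(total, width) // chars_copied_by_offsets(total, width))
```

The second number printed is how many times more copying the trimming loop does than the offset loop for the same input, which is why the cost grows quadratically with file size.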

ryles · Jul 16, 2009

You might also find the textwrap module useful:

http://docs.python.org/library/textwrap.html
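
One caveat worth illustrating: textwrap breaks lines at whitespace and discards the whitespace at the break, whereas fold -w cuts at a fixed column and keeps every character. The comparison below is a small sketch (the fold helper here is a stand-in written for this example, not a library function), so for fixed-width records textwrap may not be the right fit.

```python
import textwrap

def fold(s, width):
    """Fixed-width split, like fold -w: cuts anywhere, keeps every character."""
    return [s[i:i + width] for i in range(0, len(s), width)]

s = "A very long string indeed. Really that long? Indeed."
print(fold(s, 10)[0])           # prints: A very lon
print(textwrap.wrap(s, 10)[0])  # prints: A very
```

Joining the fold result reproduces the input exactly, which is not true of the textwrap result.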

MRAB · Jul 16, 2009

seldan24 said:
So now, I'm left with a large file, consisting of one extremely long
line of ASCII data that needs to be sliced up into 35 character
lines. I did the following, which works but takes a very long time:

f = open(ascii_file, 'w')
while ascii_data:
    f.write(ascii_data[:len])
    ascii_data = ascii_data[len:]
f.close()

The 'write' method doesn't append any line ending, so that code gives
the same output as f.write(ascii_data).
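
A possible fix, sketched under the assumption that each chunk should end with a newline the way fold's output does, is to write the line ending explicitly. The helper name write_folded and the in-memory buffer are illustrative only; a real file opened with open(..., 'w') would work the same way.

```python
import io

def write_folded(text, out, width):
    """Write text to the file-like object out, one width-character chunk per line."""
    for offset in range(0, len(text), width):
        out.write(text[offset:offset + width])
        out.write("\n")  # write() adds no line ending on its own

buf = io.StringIO()
write_folded("A very long string indeed.", buf, 10)
print(buf.getvalue())
```

This version also steps an offset through the string instead of trimming it, so it avoids the repeated copying discussed earlier in the thread.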

MRAB · Jul 16, 2009

Michiel said:
def fold(s, chunklength):
    offset = 0
    while offset < len(s):
        print(s[offset:offset+chunklength])
        offset += chunklength

More Pythonic:

for offset in range(0, len(s), chunklength):
    print(s[offset : offset + chunklength])
