Python Equivalent for dd & fold

seldan24

Hello,

I have a shell script, that I'm attempting to convert to Python. It
FTP's files down from an AS/400 machine. That part is working fine.
Once the files arrive, the script converts them from EBCDIC to ASCII
and then formats their line width based on a pre-determined size.

For example, if I have the file TESTFILE and I know it should be
formatted to 32 characters/line, I run:

dd if=TESTFILE,EBCDIC conv=ASCII | fold -w 32 > TESTFILE.ASCII

Luckily, the files have the packed decimal format common in COBOL
programs converted already prior to reaching me, so all I have to do
is the conversion and line width formatting. The above works fine,
and I know I could (and may) just embed it using subprocess.Popen.
This, I've figured out how to do. But, ideally, I'd like to convert
as much shell to native Python as possible, so that I can learn more.

I think I can figure out the first part by using the codecs module...
found some information on Google related to that. My question is,
what can I use as the equivalent for the Unix 'fold' command? I don't
really need to know how to do it, just a push in the right direction.

Thanks.
 
Michiel Overtoom

seldan24 said:
what can I use as the equivalent for the Unix 'fold' command?

def fold(s,len):
    while s:
        print s[:len]
        s=s[len:]

s="A very long string indeed. Really that long? Indeed."
fold(s,10)

Output:

A very lon
g string i
ndeed. Rea
lly that l
ong? Indee
d.

Greetings,
 
seldan24

Wow, I feel like a dork. I should have done more research prior to
posting. Anyway, thanks for the advice. The trouble with Python is
that things make 'too much' sense. Loving this language.
 
MRAB

You might still need to tweak the above code as regards how line endings
are handled.
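
One possible tweak, sketched in Python 3 syntax rather than the Python 2 used in this thread: fold each existing line on its own, so that a newline already present in the data restarts the column count, roughly as fold -w does. The name fold_lines is hypothetical, and tabs and multi-byte characters are not treated specially.

```python
def fold_lines(text, width):
    """Break every line of `text` into pieces of at most `width` characters."""
    pieces = []
    for line in text.splitlines():
        if not line:
            pieces.append("")  # keep empty lines as they are
            continue
        # Step through the line by index; no shrinking copies of the line are made.
        for offset in range(0, len(line), width):
            pieces.append(line[offset:offset + width])
    # Terminate every piece with a newline; empty input yields empty output.
    return "".join(piece + "\n" for piece in pieces)


print(fold_lines("A very long string indeed.\nShort.\n", 10), end="")
```

For input that is one long line with no newlines at all, this behaves the same as the simple fold above.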
 
Emile van Sebille

On 7/15/2009 10:23 AM MRAB said...
You might still need to tweak the above code as regards how line endings
are handled.

You might also want to tweak it if the strings are _really_ long to
simply slice out the substrings as opposed to reassigning the balance to
a newly created s on each iteration.

Emile
 
seldan24

Thanks for all of the help. I'm almost there. I have it working now,
but the 'fold' piece is very slow. When I use the 'fold' command in
shell it is almost instantaneous. I was able to do the EBCDIC->ASCII
conversion using the decode method of the built-in str type. I didn't
have to import the codecs module. I just decoded the data from cp037,
which works fine.
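
A minimal sketch of that decoding step, written in Python 3 syntax (where decode lives on bytes rather than str). The helper name ebcdic_to_text is hypothetical, and cp037 is assumed only because it is the code page reported to work here; other EBCDIC machines may need a different one.

```python
def ebcdic_to_text(raw):
    """Decode EBCDIC bytes (code page 037 assumed) into a text string."""
    return raw.decode("cp037")


# Build a small EBCDIC sample by encoding known text, then decode it again.
sample = "HELLO 123".encode("cp037")
print(ebcdic_to_text(sample))
```

With a real file, the bytes would come from reading it in binary mode, for example open(path, 'rb').read().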

So now, I'm left with a large file, consisting of one extremely long
line of ASCII data that needs to be sliced up into 35 character
lines. I did the following, which works but takes a very long time:

f = open(ascii_file, 'w')
while ascii_data:
    f.write(ascii_data[:len])
    ascii_data = ascii_data[len:]
f.close()

I know that Emile suggested that I can slice out the substrings rather
than do the gradual trimming of the string variable as is being done
by moving around the length. So, I'm going to give that a try... I'm
a bit confused by what that means, am guessing that slice can break up
a string based on characters; will research. Thanks for the help thus
far. I'll post again when all is working fine.
 
Michiel Overtoom

seldan24 said:
I know that Emile suggested that I can slice out the substrings rather
than do the gradual trimming of the string variable as is being done
by moving around the length.

An excellent idea.

def fold(s,chunklength):
    offset=0
    while offset<len(s):
        print s[offset:offset+chunklength]
        offset+=chunklength

s="A very long string indeed. Really that long? Indeed."
fold(s,10)
 
Casey Webster

The problem is that it creates a new string every time you iterate
through the "ascii_data = ascii_data[len:]". I believe Emile was
suggesting that you just keep moving the starting index through the
same string, something like (warning - untested code!):

i = 0
str_len = len(ascii_data)
while i < str_len:
    j = min(i + length, str_len)
    print ascii_data[i:j]
    i = j
 
pdpi

Assuming your rather large text file is 1 MB long, you have 1 million
characters in there. 1000000/35 = ~29k lines. The size of the remaining
string decreases linearly, so its average size is (1000000 + 0) / 2, or
500k. All said and done, you're allocating and copying a 500K string
on average -- not once, but 29 thousand times. That's where your
slowdown resides.
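
The estimate above can be checked with a small counting sketch (Python 3 syntax). The function name chars_copied_by_trimming is hypothetical; it models only the copy made by each s = s[width:] step and ignores the small s[:width] slices and the cost of writing.

```python
def chars_copied_by_trimming(total, width):
    """Count characters copied by repeating s = s[width:] on a string of length `total`."""
    copied = 0
    remaining = total
    while remaining > 0:
        remaining = max(remaining - width, 0)
        copied += remaining  # s[width:] copies everything that is left
    return copied


total, width = 1000000, 35
print(total // width)                          # number of full 35-character lines
print(chars_copied_by_trimming(total, width))  # on the order of 1.4e10 characters
```

Stepping an offset through the original string instead copies each character only once, about one million characters in total for the same input.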
 
ryles

You might also find the textwrap module useful:

http://docs.python.org/library/textwrap.html
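
A short illustration of how textwrap behaves on the sample string from earlier in the thread (Python 3 syntax). Note that textwrap.wrap breaks at whitespace and drops it, so it is not a character-exact replacement for fold -w; for fixed-width records like the ones described here, plain slicing is probably the closer match.

```python
import textwrap

s = "A very long string indeed. Really that long? Indeed."

# wrap() returns a list of lines, each at most `width` characters long,
# broken between words rather than at a fixed column.
wrapped = textwrap.wrap(s, width=10)
for line in wrapped:
    print(line)
```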
 
MRAB

seldan24 said:
f = open(ascii_file, 'w')
while ascii_data:
    f.write(ascii_data[:len])
    ascii_data = ascii_data[len:]
f.close()

The 'write' method doesn't append any line ending, so that code gives
the same output as f.write(ascii_data).
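
A sketch of a version that does add the line endings, in Python 3 syntax. The name write_folded is hypothetical, width is used instead of len so the built-in is not shadowed, and an io.StringIO buffer stands in for the real output file. It also steps an offset through the data rather than trimming it, as suggested elsewhere in the thread.

```python
import io


def write_folded(data, width, out):
    """Write `data` to the file-like object `out`, `width` characters per line."""
    for offset in range(0, len(data), width):
        # write() adds nothing by itself, so append the newline explicitly.
        out.write(data[offset:offset + width] + "\n")


buf = io.StringIO()  # stands in for an open text file
write_folded("A very long string indeed.", 10, buf)
print(buf.getvalue(), end="")
```

With a real file, the same call would take the object returned by open(ascii_file, 'w') in place of buf.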
 
MRAB

Michiel said:
def fold(s,chunklength):
    offset=0
    while offset<len(s):
        print s[offset:offset+chunklength]
        offset+=chunklength

More Pythonic:

for offset in range(0, len(s), chunklength):
    print s[offset : offset + chunklength]
 
