Text processing and file creation

malibuster · Sep 5, 2007

I have a text source file of about 20,000 lines.

From this file, I would like to write the first 5 lines to a new file, close
that file, grab the next 5 lines and write these to a new file, and so on:
grabbing 5 lines and creating new files until all 20,000 lines have been
processed.
Is there an efficient way to do this in Python?
Thanks in advance for your help.

kyosohma · Sep 5, 2007

I would use a counter in a for loop, using the readline method to
iterate over the 20,000-line file. Reset the counter every 5
lines/iterations and close the file. To give the files unique names, use
the time module. Something like this:

import time
x = 'filename-%s.txt' % time.time()
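
One caveat with timestamp-based names: on a system with a coarse clock, two calls to time.time() made in quick succession can return the same value, and the second file would then overwrite the first. A minimal sketch of a counter-based alternative follows; the `make_namer` helper and the `filename-%05d.txt` pattern are illustrative assumptions, not something posted in this thread.

```python
import itertools


def make_namer(pattern="filename-%05d.txt", start=1):
    """Return a function that produces a new, sequential file name per call.

    Both the helper and the default pattern are hypothetical examples.
    """
    counter = itertools.count(start)

    def next_name():
        return pattern % next(counter)

    return next_name


next_name = make_namer()
first = next_name()   # 'filename-00001.txt'
second = next_name()  # 'filename-00002.txt'
```

Sequential names also sort in the same order as the chunks appear in the source file, which timestamps only do approximately.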

Have fun!

Mike

Arnau Sanchez · Sep 5, 2007

Perhaps you could provide some code to see how you approached it?

Bjoern Schliessmann · Sep 5, 2007

> I would use a counter in a for loop using the readline method to
> iterate over the 20,000 line file.

File objects are iterable themselves, so there's no need to do that
by calling a method.

> Reset the counter every 5 lines/iterations and close the file.

I'd use a generator that fetches five lines of the file per
iteration and iterate over it instead of the file directly.
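
A minimal sketch of that generator idea, written for a current Python 3 interpreter. The names `chunks` and `split_file` and the `chunk_%05d.txt` naming pattern are assumptions made for illustration; nothing here is code posted by the author of this reply.

```python
from itertools import islice


def chunks(lines, size):
    """Yield successive lists of up to `size` items from any iterable."""
    iterator = iter(lines)
    while True:
        block = list(islice(iterator, size))
        if not block:  # the source is exhausted
            return
        yield block


def split_file(path, size=5, pattern="chunk_%05d.txt"):
    """Write each block of `size` lines from `path` to its own file.

    Returns the number of files written. `pattern` is a hypothetical
    naming scheme.
    """
    count = 0
    with open(path) as source:
        # Iterate over the generator instead of over the file directly.
        for count, block in enumerate(chunks(source, size), start=1):
            with open(pattern % count, "w") as out:
                out.writelines(block)
    return count
```

Because `chunks` accepts any iterable, it can be tried on a plain list before pointing it at a real file.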

> Have fun!

Definitely -- and also do your homework yourself

Regards,

Björn

Shawn Milochik · Sep 5, 2007

I have written a working test of this. Here's the basic setup:

open the input file

function newFileName:
    generate a filename (starting with 00001.tmp).
    If filename exists, increment and test again (00002.tmp and so on).
    return fileName

read a line until input file is empty:

    test to see whether I have written five lines. If so, get a new
    file name, close file, and open new file

    write line to file

close output file final time

Once you get some code running, feel free to post it and we'll help.

kyosohma · Sep 5, 2007

> file objects are iterables themselves, so there's no need to do that
> by using a method.

Very true! Darn it!

> I'd use a generator that fetches five lines of the file per
> iteration and iterate over it instead of the file directly.

I still haven't figured out how to use generators, so this didn't even
come to mind. I usually see something like this example for reading a
file:

f = open(somefile)
for line in f:
    pass  # do something with each line

http://docs.python.org/tut/node9.html

Okay, so they didn't use readline. I wonder where I saw that.

Mike

James Stroud · Sep 5, 2007

> I have a text source file of about 20,000 lines.
> From this file, I would like to write the first 5 lines to a new file, close
> that file, grab the next 5 lines and write these to a new file... grabbing
> 5 lines and creating new files until processing of all 20,000 lines is
> done.
> Is there an efficient way to do this in Python?

You should use a nested loop.

> In advance, thanks for your help.

You're welcome.

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

Paddy · Sep 5, 2007

If it's on Unix: use split.
If it's your homework: show us what you have so far...

- Paddy.

malibuster · Sep 5, 2007

> If it's on Unix: use split.
> If it's your homework: show us what you have so far...
>
> - Paddy.

Paddy,

Thanks for making me aware of the (UNIX) split command (split -l 5
inFile.txt), it's short, it's fast, it's beautiful.

I am still wondering how to do this efficiently in Python (being kind
of new to it... and it's not for homework).

-- Martin.


Arnaud Delobelle · Sep 5, 2007

> I have a text source file of about 20,000 lines.
> From this file, I would like to write the first 5 lines to a new file, close
> that file, grab the next 5 lines and write these to a new file... grabbing
> 5 lines and creating new files until processing of all 20,000 lines is
> done.
> Is there an efficient way to do this in Python?

Sure!

> In advance, thanks for your help.

from my_useful_functions import (new_file, write_first_5_lines,
    done_processing_file, grab_next_5_lines, another_new_file, write_these)

in_f = open('myfile')
out_f = new_file()
write_first_5_lines(in_f, out_f) # write first 5 lines
out_f.close()
while not done_processing_file(in_f): # until done processing
    lines = grab_next_5_lines(in_f) # grab next 5 lines
    out_f = another_new_file()
    write_these(lines, out_f) # write these
    out_f.close()
print "all done!" # All done
print "Now there are 4000 files in this directory..."

Python 3.0 - ready (I've used open() instead of file())

HTH

Steve Holden · Sep 6, 2007

Arnaud Delobelle wrote:
[...]

> print "all done!" # All done
> print "Now there are 4000 files in this directory..."
>
> Python 3.0 - ready (I've used open() instead of file())

bzzzzzzzzzzt!

Python 3.0a1 (py3k:57844, Aug 31 2007, 16:54:27) ...
Type "help", "copyright", "credits" or "license" for more information.
  File "<stdin>", line 1
    print "all done!" # All done
                    ^
SyntaxError: invalid syntax
Close, but no cigar ;-)
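
For reference, a sketch of the form that does run under Python 3, where print is a function rather than a statement. The `report` helper is a hypothetical wrapper added only so the example has something to call. On Python 2.6 and later, the same call syntax can be enabled by putting `from __future__ import print_function` at the top of the module.

```python
def report(file_count):
    """Print the closing messages and return them for inspection."""
    messages = [
        "all done!",
        "Now there are %d files in this directory..." % file_count,
    ]
    for message in messages:
        print(message)  # print is called as a function, with parentheses
    return messages


lines_printed = report(4000)
```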

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------

Arnaud Delobelle · Sep 6, 2007

> Arnaud Delobelle wrote: [...]
>
>> print "all done!" # All done
>> print "Now there are 4000 files in this directory..."
>>
>> Python 3.0 - ready (I've used open() instead of file())
>
> bzzzzzzzzzzt!
>
> Python 3.0a1 (py3k:57844, Aug 31 2007, 16:54:27) ...
> Type "help", "copyright", "credits" or "license" for more information.
>   File "<stdin>", line 1
>     print "all done!" # All done
>                     ^
> SyntaxError: invalid syntax

Damn! That'll teach me to make such bold claims.
At least I'm unlikely to forget again now...

Alberto Griggio · Sep 6, 2007

> Thanks for making me aware of the (UNIX) split command (split -l 5
> inFile.txt), it's short, it's fast, it's beautiful.
>
> I am still wondering how to do this efficiently in Python (being kind
> of new to it... and it's not for homework).

Something like this should do the job:

def nlines(num, fileobj):
    done = [False]
    def doit():
        for i in xrange(num):
            l = fileobj.readline()
            if not l:
                done[0] = True
                return
            yield l
    while not done[0]:
        yield doit()

# Note: if the line count is an exact multiple of num, a final empty
# chunk file is created.
for i, group in enumerate(nlines(5, open('bigfile.txt'))):
    out = open('chunk_%d.txt' % i, 'w')  # 'w' is needed to write
    for line in group:
        out.write(line)
    out.close()

> I am still wondering how to do this in Python (being new to Python)

This is just one way of doing it, but not as concise as using split...

Alberto

Arnau Sanchez · Sep 6, 2007

(e-mail address removed) wrote:

> I am still wondering how to do this efficiently in Python (being kind
> of new to it... and it's not for homework).

You should post some code anyway; it would be easier to give useful advice (and it
would also show that you put some effort into it).

Anyway, here is an option. Text-file objects are line-iterable, so you could use
itertools (perhaps a somewhat difficult module for a newbie...):

from itertools import islice, takewhile, repeat

def take(it, n):
    return list(islice(it, n))

def readnlines(fd, n):
    # Keep taking n-line blocks until an empty block signals end of file.
    return takewhile(bool, (take(fd, n) for _ in repeat(None)))

def splitfile(path, prefix, nlines, suffix_digits):
    sformat = "%%0%dd" % suffix_digits
    for index, lines in enumerate(readnlines(open(path), nlines)):
        open("%s_%s" % (prefix, sformat % index), "w").writelines(lines)

splitfile("/etc/services", "out", 5, 4)

arnau

Shawn Milochik · Sep 6, 2007

Here's my solution, for what it's worth:

#!/usr/bin/env python

import os

input = open("test.txt", "r")

counter = 0
fileNum = 0
fileName = ""

def newFileName():
    global fileNum, fileName

    # Skip over names that already exist on disk.
    while os.path.exists(fileName) or fileName == "":
        fileNum += 1
        x = "%0.5d" % fileNum
        fileName = "%s.tmp" % x

    return fileName

for line in input:

    # Start a new output file on the first line and after every five lines.
    if (fileName == "") or (counter == 5):
        if fileName:
            output.close()
        fileName = newFileName()
        counter = 0
        output = open(fileName, "w")

    output.write(line)
    counter += 1

output.close()

Ricardo Aráoz · Sep 6, 2007

Maybe (untested):

def read5Lines(f):
    L = f.readline()
    while L:
        yield (L, f.readline(), f.readline(), f.readline(), f.readline())
        L = f.readline()

# 'in' is a reserved word, so the input file needs another name
infile = open(r'C:\YourFile', 'rb')
for fileNo, fiveLines in enumerate(read5Lines(infile)):
    out = open(r'c:\OutFile' + str(fileNo), 'wb')
    out.writelines(fiveLines)
    out.close()

or something similar? (Note that readline() returns an empty string at
end of file, so when the line count is not a multiple of 5 the last
output file is simply shorter; no blank lines are written.)

George Sakkis · Sep 7, 2007

> Thanks for making me aware of the (UNIX) split command (split -l 5
> inFile.txt), it's short, it's fast, it's beautiful.
>
> I am still wondering how to do this efficiently in Python (being kind
> of new to it... and it's not for homework).

If this was a code golf challenge, a decent entry (146 chars) could
be:

import itertools as it
for i,g in it.groupby(enumerate(open('input.txt')),lambda(i,_):i/5):
 open("output.%d.txt"%i,'w').writelines(s for _,s in g)

or a bit less cryptically:

import itertools as it
for chunk, enum_lines in it.groupby(enumerate(open('input.txt')),
                                    lambda (i, line): i // 5):
    open("output.%d.txt" % chunk, 'w').writelines(
        line for _, line in enum_lines)
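
The `lambda (i, line): ...` form relies on tuple parameter unpacking, which Python 3 removed. A sketch of the same groupby idea that runs on Python 3 follows; the function name `split_by_groupby` and its parameters are assumptions made for illustration.

```python
from itertools import groupby


def split_by_groupby(path, size=5, pattern="output.%d.txt"):
    """Group numbered lines by index // size and write each group to a file.

    Returns the list of file names written. The name pattern mirrors the
    one used above; the function itself is a hypothetical wrapper.
    """
    written = []
    with open(path) as source:
        # enumerate() pairs each line with its index; consecutive lines whose
        # index // size is equal fall into the same group.
        numbered = enumerate(source)
        for chunk, pairs in groupby(numbered, key=lambda pair: pair[0] // size):
            name = pattern % chunk
            with open(name, "w") as out:
                out.writelines(line for _, line in pairs)
            written.append(name)
    return written
```

The `with` blocks close each output file explicitly rather than relying on the interpreter to do so when the file object is discarded.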

George

Paddy · Sep 7, 2007

> If this was a code golf challenge,

I'd choose the Unix split solution and be both maintainable and
concise.

- Paddy.
