randomly write to a file

R

rohit

Hi,
I am developing a desktop search. For the index of the files I have
developed an algorithm with which I should be able to read and write
to a line if I know its line number.
I can read a specified line by using the module linecache, but I am
stuck as to how to implement writing to the n(th) line in a file
EFFICIENTLY, which means I don't want to traverse the file sequentially
to reach the n(th) line.

Please help.
Regards
Rohit
 
K

kyosohma

Hi,
I am developing a desktop search. For the index of the files I have
developed an algorithm with which I should be able to read and write
to a line if I know its line number.
I can read a specified line by using the module linecache, but I am
stuck as to how to implement writing to the n(th) line in a file
EFFICIENTLY, which means I don't want to traverse the file sequentially
to reach the n(th) line.

Please help.
Regards
Rohit

Hi,

Looking through the archives, it looks like some recommend reading the
file into a list and doing it that way. If the file is too big, then
use a database. See the links below:

http://mail.python.org/pipermail/tutor/2006-March/045571.html
http://mail.python.org/pipermail/tutor/2006-March/045572.html

I also found this interesting idea that explains what would be needed
to accomplish this task:

http://mail.python.org/pipermail/python-list/2001-April/076890.html

Have fun!

Mike
 
G

Gabriel Genellina

I am developing a desktop search. For the index of the files I have
developed an algorithm with which I should be able to read and write
to a line if I know its line number.
I can read a specified line by using the module linecache, but I am
stuck as to how to implement writing to the n(th) line in a file
EFFICIENTLY, which means I don't want to traverse the file sequentially
to reach the n(th) line.

You can only replace a line in-place with another of exactly the same
length. If the lengths differ, you have to write the modified line and all
the following ones.
If all your lines are of fixed length, you have a "record". To read record
N (counting from 0):
def read_record(a_file, n, record_length):
    a_file.seek(n * record_length)
    return a_file.read(record_length)
And then you are reinventing ISAM.
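
For what it's worth, a minimal sketch of the matching write under the
same fixed-length assumption (the record width and the space padding
are illustrative, not something the thread fixes):

RECORD_LENGTH = 256  # assumed fixed record width in bytes

def write_record(a_file, n, text):
    # Pad or truncate so the record keeps exactly its fixed width,
    # then overwrite it in place; no other bytes move.
    record = text.encode("ascii").ljust(RECORD_LENGTH)[:RECORD_LENGTH]
    a_file.seek(n * RECORD_LENGTH)
    a_file.write(record)

# usage: with open("index.dat", "rb+") as f: write_record(f, 3, "new entry")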
 
N

Nick Vatamaniuc

Rohit,

Consider using an SQLite database. It comes with Python 2.5 and
higher. SQLite will do a nice job keeping track of the index. You can
easily find the line you need with an SQL query, and you can write to
it as well. When you have a file and you write to one line of the
file, all of the rest of the lines will have to be shifted to
accommodate the potentially larger new line.
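
A minimal sketch of that, assuming one row per line (the file name,
table, and column names here are illustrative):

import sqlite3

conn = sqlite3.connect("index.db")
conn.execute("CREATE TABLE IF NOT EXISTS lines "
             "(line_no INTEGER PRIMARY KEY, content TEXT)")

def write_line(n, text):
    # INSERT OR REPLACE rewrites just this row; nothing shifts.
    conn.execute("INSERT OR REPLACE INTO lines VALUES (?, ?)", (n, text))
    conn.commit()

def read_line(n):
    row = conn.execute("SELECT content FROM lines WHERE line_no = ?",
                       (n,)).fetchone()
    return row[0] if row else None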

-Nick Vatamaniuc
 
R

rohit

Nick,
I just wanted to ask: for time-constrained applications like searching,
won't SQLite be an expensive approach? I mean, searching and editing
the files directly takes less time. So I need an approach which will
allow me to write randomly to a line in a file without using a
database.
 
R

rohit

Hi Gabriel,
I am storing file names and their paths, which are written to a file
one per line. Now if I use fixed-length records, that would waste too
much space, as there is no limit on the number of characters in a
path. The next best approach I can think of is reading the file into
memory, editing it, and writing back only the portion that has just
been altered and the following lines. But is there a better approach
you can highlight?
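
A minimal sketch of that read-edit-rewrite-the-tail approach, assuming
'\n' line endings (the helper name is mine):

def replace_line(path, n, new_text):
    with open(path, "rb") as f:
        lines = f.readlines()
    offset = sum(len(line) for line in lines[:n])  # bytes before line n
    lines[n] = new_text.encode() + b"\n"
    with open(path, "rb+") as f:
        f.seek(offset)
        f.writelines(lines[n:])  # rewrite line n and everything after it
        f.truncate()             # drop leftover bytes if the file shrank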
 
S

Steven D'Aprano

I can read a specified line by using the module linecache, but I am
stuck as to how to implement writing to the n(th) line in a file
EFFICIENTLY, which means I don't want to traverse the file sequentially
to reach the n(th) line.

Unless you are lucky enough to be using an OS that supports random-access
line access to text files natively, if such a thing even exists, you
can't because you don't know how long each line will be.

If you can guarantee fixed-length lines, then you can use file.seek() to
jump to the appropriate byte position.

If the lines are random lengths, but you can control access to the files
so other applications can't write to them, you can keep an index table,
which you update as needed.

Otherwise, if the files are small enough, say up to 20 or 40MB each, just
read them entirely into memory.

Otherwise, you're out of luck.
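
For the index-table idea above, a minimal sketch (built once, and
updated whenever you rewrite the file):

def build_index(path):
    # Map line number -> byte offset where that line starts.
    index = []
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            index.append(offset)
            offset += len(line)
    return index

def read_line(path, index, n):
    with open(path, "rb") as f:
        f.seek(index[n])
        return f.readline()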
 
S

Steven D'Aprano

Rohit,

Consider using an SQLite database. It comes with Python 2.5 and higher.
SQLite will do a nice job keeping track of the index. You can easily
find the line you need with an SQL query, and you can write to it as
well. When you have a file and you write to one line of the file, all of
the rest of the lines will have to be shifted to accommodate the
potentially larger new line.


Using a database for tracking line number and byte position -- isn't
that a bit overkill?

I would have thought something as simple as a list of line lengths would
do:

offsets = [35,  # first line is 35 bytes long
           19,  # second line is 19 bytes long...
           45, 12, 108, 67]


To get to the nth line, you have to seek to byte position:

sum(offsets[:n])
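
A minimal sketch of using that table (the running-total list is my
addition; it makes each lookup constant-time instead of re-summing):

offsets = [35, 19, 45, 12, 108, 67]
starts = [0]
for length in offsets:                 # running totals of the lengths
    starts.append(starts[-1] + length)

with open("data.txt", "rb") as f:      # hypothetical file
    n = 3
    f.seek(starts[n])                  # == sum(offsets[:n])
    print(f.read(offsets[n]))          # exactly the nth line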
 
A

Alex Martelli

Steven D'Aprano said:
Rohit,

Consider using an SQLite database. It comes with Python 2.5 and higher.
SQLite will do a nice job keeping track of the index. You can easily
find the line you need with an SQL query, and you can write to it as
well. When you have a file and you write to one line of the file, all of
the rest of the lines will have to be shifted to accommodate the
potentially larger new line.


Using a database for tracking line number and byte position -- isn't
that a bit overkill?

I would have thought something as simple as a list of line lengths would
do:

offsets = [35,  # first line is 35 bytes long
           19,  # second line is 19 bytes long...
           45, 12, 108, 67]


To get to the nth line, you have to seek to byte position:

sum(offsets[:n])

...and then you STILL can't write there (without reading and rewriting
all the succeeding part of the file) unless the line you're writing is
always the same length as the one you're overwriting, which doesn't seem
to be part of the constraints in the OP's original application. I'm
with Nick in recommending SQLite for the purpose -- it _IS_ quite
"lite", as its name suggests. BSD-DB (a DB that's much more complicated
to use, being far lower-level, but by the same token affords you
extremely fine-grained control of operations) might be an alternative
IF, after first having coded the application with SQLite, you can indeed
prove, profiler in hand, that it's a serious bottleneck. However,
premature optimization is the root of all evil in programming.


Alex
 
S

Steven D'Aprano

Steven D'Aprano said:
Rohit,

Consider using an SQLite database. It comes with Python 2.5 and
higher. SQLite will do a nice job keeping track of the index. You can
easily find the line you need with an SQL query, and you can write to
it as well. When you have a file and you write to one line of the
file, all of the rest of the lines will have to be shifted to
accommodate the potentially larger new line.


Using a database for tracking line number and byte position -- isn't
that a bit overkill?

I would have thought something as simple as a list of line lengths
would do:

offsets = [35,  # first line is 35 bytes long
           19,  # second line is 19 bytes long...
           45, 12, 108, 67]


To get to the nth line, you have to seek to byte position:

sum(offsets[:n])

...and then you STILL can't write there (without reading and rewriting
all the succeeding part of the file) unless the line you're writing is
always the same length as the one you're overwriting, which doesn't seem
to be part of the constraints in the OP's original application. I'm
with Nick in recommending SQLite for the purpose -- it _IS_ quite
"lite", as its name suggests.


Hang on, as I understand it, Nick was just suggesting using SQLite for
holding indexes into the file! That's why I said it was overkill. So
whether the indexes are in a list or a database, you've _still_ got to
deal with writing to the file.

If I've misunderstood Nick's suggestion and he actually meant to read
the entire text file into the database, well, that's just a heavier
version of reading the file into a list of strings, isn't it? If the
database gives you more and/or better functionality than
file.readlines(), then I have no problem with using the right tool for
the job.
 
A

Alex Martelli

Steven D'Aprano said:
Hang on, as I understand it, Nick was just suggesting using SQLite for
holding indexes into the file! That's why I said it was overkill. So
whether the indexes are in a list or a database, you've _still_ got to
deal with writing to the file.

If I've misunderstood Nick's suggestion and he actually meant to read
the entire text file into the database, well, that's just a heavier
version of reading the file into a list of strings, isn't it? If the
database gives you more and/or better functionality than
file.readlines(), then I have no problem with using the right tool for
the job.

Ah well, I may have misunderstood myself. I'd keep the whole thing in
an SQLite table, definitely NOT a table + an external file -- no,
that's not going to be heavier than reading things into memory; SQLite
is smarter than one might think :). Obviously, I'm assuming that one's
dealing with an amount of data that doesn't just comfortably and
easily fit in memory, or at least one that gives pause at the thought
of sucking it all into memory and writing it back out again at every
program run.


Alex
 
D

Dennis Lee Bieber

Unless you are lucky enough to be using an OS that supports random-access
line access to text files natively, if such a thing even exists, you
can't because you don't know how long each line will be.

Xerox CP/V (mid-1970s)... The default format for text files that
have been passed through the text editor was "keyed", wherein the editor
line number (scaled by 10^3 or so, as the editor supported line numbers
such as 10.123 if one had inserted lines between others) became an ISAM key
for the record (those keys were also directly usable in FORTRAN-IV for
direct access I/O). One had to use a separate command line utility to
convert the file from "keyed" to "consecutive" (equivalent to a Unix
text stream -- no structure, just a stream of bytes, with I/O utilities
considering lines by the line-ending character(s)). CP/V also had
"random" files -- which were a set of contiguous disk blocks
(consecutive and keyed could be scattered across the disk sectors, but
not so for random), and the OS did nothing about the contents; the
application basically maintained all control.

Of course, CP/V also had four open modes: input (read only), output
(write only), scratch (two I/O pointers, must write 1 or more records
before performing a read), update (two pointers, must read 1 or more
records before performing a write).

--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
 
