randomly write to a file

Discussion in 'Python' started by rohit, May 7, 2007.

  1. rohit

    rohit Guest

    Hi,
    I am developing a desktop search tool. For the index of the files I have
    developed an algorithm that needs to read and write a line given its
    line number.
    I can read a specified line using the linecache module, but I am stuck
    on how to write to the nth line of a file EFFICIENTLY, meaning I don't
    want to traverse the file sequentially to reach the nth line.

    Please help.
    Regards
    Rohit
     
    rohit, May 7, 2007
    #1

  2. rohit

    Guest

    On May 7, 2:51 pm, rohit <> wrote:
    > hi,
    > i am developing a desktop search.For the index of the files i have
    > developed an algorithm with which
    > i should be able to read and write to a line if i know its line
    > number.
    > i can read a specified line by using the module linecache
    > but i am struck as to how to implement writing to the n(th) line in a
    > file EFFICIENTLY
    > which means i don't want to traverse the file sequentially to reach
    > the n(th) line
    >
    > Please help.
    > Regards
    > Rohit


    Hi,

    Looking through the archives, it looks like some recommend reading the
    file into a list and doing it that way. And if the file is too big,
    then use a database. See links below:

    http://mail.python.org/pipermail/tutor/2006-March/045571.html
    http://mail.python.org/pipermail/tutor/2006-March/045572.html

    I also found this interesting idea that explains what would be needed
    to accomplish this task:

    http://mail.python.org/pipermail/python-list/2001-April/076890.html

    Have fun!

    Mike
     
    , May 7, 2007
    #2

  3. En Mon, 07 May 2007 16:51:37 -0300, rohit <>
    escribió:

    > i am developing a desktop search.For the index of the files i have
    > developed an algorithm with which
    > i should be able to read and write to a line if i know its line
    > number.
    > i can read a specified line by using the module linecache
    > but i am struck as to how to implement writing to the n(th) line in a
    > file EFFICIENTLY
    > which means i don't want to traverse the file sequentially to reach
    > the n(th) line


    You can only replace a line in-place with another of exactly the same
    length. If the lengths differ, you have to write the modified line and all
    the following ones.
    If all your lines are of fixed length, you have a "record". To read record
    N (counting from 0):
        def read_record(a_file, N, record_length):
            a_file.seek(N * record_length)
            return a_file.read(record_length)
    And then you are reinventing ISAM.
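
A minimal sketch of the fixed-length record idea, covering in-place writes as well as reads. The 32-byte record size, the space padding, and the helper names (`pack`, `read_record`, `write_record`, `demo`) are assumptions made for illustration, not something from the thread.

```python
import os
import tempfile

RECORD_LENGTH = 32  # assumed fixed record size in bytes, newline included


def pack(text, record_length=RECORD_LENGTH):
    """Pad text with spaces to exactly record_length bytes, ending in a newline."""
    data = text.encode("utf-8")
    if len(data) > record_length - 1:
        raise ValueError("text does not fit in one record")
    return data.ljust(record_length - 1) + b"\n"


def read_record(f, n, record_length=RECORD_LENGTH):
    """Return record n (counting from 0) with the padding stripped."""
    f.seek(n * record_length)
    return f.read(record_length).rstrip().decode("utf-8")


def write_record(f, n, text, record_length=RECORD_LENGTH):
    """Overwrite record n in place; no other record moves."""
    f.seek(n * record_length)
    f.write(pack(text, record_length))


def demo():
    path = os.path.join(tempfile.mkdtemp(), "records.dat")
    with open(path, "wb") as f:
        for name in ("/tmp/a.txt", "/tmp/b.txt", "/tmp/c.txt"):
            f.write(pack(name))
    with open(path, "r+b") as f:
        write_record(f, 1, "/home/user/new.txt")
        records = [read_record(f, i) for i in range(3)]
    return records, os.path.getsize(path)
```

The file size stays at three records after the write, which is the point: the cost of an update does not depend on where the record sits. The price is wasted padding and a hard cap on record length.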

    --
    Gabriel Genellina
     
    Gabriel Genellina, May 7, 2007
    #3
  4. Rohit,

    Consider using an SQLite database. It comes with Python 2.5 and
    higher. SQLite will do a nice job keeping track of the index. You can
    easily find the line you need with a SQL query and you can write to
    it as well. When you have a file and you write to one line of the
    file, all of the rest of the lines will have to be shifted to
    accommodate the potentially larger new line.
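
As a rough sketch of what that could look like with the standard `sqlite3` module: one row per line, keyed by line number. The table name `lines`, its columns, and the helper functions are hypothetical choices for this example, not part of the suggestion above.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a table mapping a line number to its text."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lines ("
        "lineno INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    return conn


def write_line(conn, lineno, text):
    """Insert or replace one line; the new text may be any length."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO lines (lineno, text) VALUES (?, ?)",
            (lineno, text),
        )


def read_line(conn, lineno):
    """Return the text stored for lineno, or None if there is none."""
    row = conn.execute(
        "SELECT text FROM lines WHERE lineno = ?", (lineno,)
    ).fetchone()
    return row[0] if row else None


def demo():
    conn = open_index()  # in-memory database, for the demo only
    for i, p in enumerate(["/tmp/a.txt", "/tmp/b.txt", "/tmp/c.txt"]):
        write_line(conn, i, p)
    # Replacing a line with a longer one needs no shifting on our side.
    write_line(conn, 1, "/a/much/longer/path/than/before/b.txt")
    return [read_line(conn, i) for i in range(3)], read_line(conn, 99)
```

Because `lineno` is the primary key, lookups and updates go through SQLite's own index rather than a sequential scan. Passing a file path instead of `":memory:"` would make the data persistent.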

    -Nick Vatamaniuc


     
    Nick Vatamaniuc, May 7, 2007
    #4
  5. rohit

    rohit Guest

    Nick,
    I just wanted to ask: for time-constrained applications like searching,
    won't SQLite be an expensive approach?
    I mean, searching and editing the files directly is less expensive in
    terms of time taken.
    So I need an approach that will allow me to write randomly to a line
    in a file without using a database.
     
    rohit, May 7, 2007
    #5
  6. rohit

    rohit Guest

    Hi Gabriel,
    I am storing file names and their paths, each written to the file
    on a single line.
    If I use records, that would waste too much space, as there is
    no upper limit on the number of characters in a path.
    The next best approach I can think of is reading the file into memory,
    editing it, and writing back the portion that has just been altered and
    the following lines.
    But is there a better approach you can highlight?
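
The approach described here (rewrite the changed line and everything after it) might be sketched as follows. The function name `replace_line` and the demo file are made up for illustration, and the sketch assumes the tail of the file fits in memory.

```python
import os
import tempfile


def replace_line(path, n, new_text):
    """Replace line n (counting from 0) by rewriting it and everything after it."""
    with open(path, "r+b") as f:
        # Sequential scan to find where line n starts.
        for _ in range(n):
            if not f.readline():
                raise IndexError("line number out of range")
        start = f.tell()
        if not f.readline():  # skip the old line n
            raise IndexError("line number out of range")
        tail = f.read()  # everything after line n
        f.seek(start)
        f.write(new_text.encode("utf-8") + b"\n" + tail)
        f.truncate()  # drop leftover bytes if the file got shorter


def demo():
    path = os.path.join(tempfile.mkdtemp(), "index.txt")
    with open(path, "wb") as f:
        f.write(b"/tmp/a.txt\n/tmp/b.txt\n/tmp/c.txt\n")
    replace_line(path, 1, "/home/user/longer/b.txt")
    with open(path, "rb") as f:
        longer = f.read()
    replace_line(path, 0, "/a")
    with open(path, "rb") as f:
        shorter = f.read()
    return longer, shorter
```

The scan at the top is exactly the sequential traversal the original question wanted to avoid; it could be replaced by a stored byte offset if one is kept elsewhere. The rewrite of the tail cannot be avoided with a plain text file when the line length changes.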

     
    rohit, May 7, 2007
    #6
  7. On Mon, 07 May 2007 12:51:37 -0700, rohit wrote:

    > i can read a specified line by using the module linecache but i am
    > struck as to how to implement writing to the n(th) line in a file
    > EFFICIENTLY
    > which means i don't want to traverse the file sequentially to reach the
    > n(th) line


    Unless you are lucky enough to be using an OS that natively supports
    random access to individual lines of text files, if such a thing even
    exists, you can't, because you don't know how long each line will be.

    If you can guarantee fixed-length lines, then you can use file.seek() to
    jump to the appropriate byte position.

    If the lines are random lengths, but you can control access to the files
    so other applications can't write to them, you can keep an index table,
    which you update as needed.

    Otherwise, if the files are small enough, say up to 20 or 40MB each, just
    read them entirely into memory.

    Otherwise, you're out of luck.
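
One possible reading of the "index table" option, as a sketch: scan the file once, remember the byte offset where each line starts, and seek straight to it afterwards. The names `build_index` and `read_line` are hypothetical, and the index has to be rebuilt or adjusted whenever a line changes length.

```python
import os
import tempfile


def build_index(path):
    """Scan the file once and return the byte offset at which each line starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def read_line(path, offsets, n):
    """Seek straight to line n (counting from 0) using the index."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline().rstrip(b"\r\n").decode("utf-8")


def demo():
    path = os.path.join(tempfile.mkdtemp(), "index.txt")
    with open(path, "wb") as f:
        f.write(b"/tmp/a.txt\n/home/user/b.txt\n/c\n")
    offsets = build_index(path)
    return offsets, read_line(path, offsets, 2)
```

This makes reads cheap, but it only locates a line. Writing a line of a different length still means rewriting what follows and shifting every later offset.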


    --
    Steven.
     
    Steven D'Aprano, May 8, 2007
    #7
  8. On Mon, 07 May 2007 14:41:02 -0700, Nick Vatamaniuc wrote:

    > Rohit,
    >
    > Consider using an SQLite database. It comes with Python 2.5 and higher.
    > SQLite will do a nice job keeping track of the index. You can easily
    > find the line you need with a SQL query and your can write to it as
    > well. When you have a file and you write to one line of the file, all of
    > the rest of the lines will have to be shifted to accommodate, the
    > potentially larger new line.



    Using a database for tracking line number and byte position -- isn't
    that a bit overkill?

    I would have thought something as simple as a list of line lengths would
    do:

    offsets = [35,  # first line is 35 bytes long
               19,  # second line is 19 bytes long...
               45, 12, 108, 67]


    To get to the nth line, you have to seek to byte position:

    sum(offsets[:n])
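
Since the list above holds line lengths, `sum(offsets[:n])` re-adds the first n entries on every lookup. A small variation, assuming the same six lengths, is to precompute the running totals once with `itertools.accumulate`; the names `lengths`, `starts`, and `line_start` are introduced here for illustration only.

```python
from itertools import accumulate

lengths = [35, 19, 45, 12, 108, 67]  # line lengths in bytes, from the post above

# starts[n] is the byte position where line n (counting from 0) begins.
starts = [0] + list(accumulate(lengths))[:-1]


def line_start(n):
    """Return the byte position of line n without re-summing the lengths."""
    return starts[n]
```

Each lookup is then a single list access. If a line changes length, every later entry in `starts` has to be shifted by the difference.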



    --
    Steven.
     
    Steven D'Aprano, May 8, 2007
    #8
  9. Steven D'Aprano <> wrote:

    > On Mon, 07 May 2007 14:41:02 -0700, Nick Vatamaniuc wrote:
    >
    > > Rohit,
    > >
    > > Consider using an SQLite database. It comes with Python 2.5 and higher.
    > > SQLite will do a nice job keeping track of the index. You can easily
    > > find the line you need with a SQL query and your can write to it as
    > > well. When you have a file and you write to one line of the file, all of
    > > the rest of the lines will have to be shifted to accommodate, the
    > > potentially larger new line.

    >
    >
    > Using an database for tracking line number and byte position -- isn't
    > that a bit overkill?
    >
    > I would have thought something as simple as a list of line lengths would
    > do:
    >
    > offsets = [35, # first line is 35 bytes long
    > 19, # second line is 19 bytes long...
    > 45, 12, 108, 67]
    >
    >
    > To get to the nth line, you have to seek to byte position:
    >
    > sum(offsets[:n])


    ...and then you STILL can't write there (without reading and rewriting
    all the succeeding part of the file) unless the line you're writing is
    always the same length as the one you're overwriting, which doesn't seem
    to be part of the constraints in the OP's original application. I'm
    with Nick in recommending SQLite for the purpose -- it _IS_ quite
    "lite", as its name suggests. BSD-DB (a DB that's much more complicated
    to use, being far lower-level, but by the same token affords you
    extremely fine-grained control of operations) might be an alternative
    IF, after first having coded the application with SQLite, you can indeed
    prove, profiler in hand, that it's a serious bottleneck. However,
    premature optimization is the root of all evil in programming.


    Alex
     
    Alex Martelli, May 8, 2007
    #9
  10. On Mon, 07 May 2007 20:00:57 -0700, Alex Martelli wrote:

    > Steven D'Aprano <> wrote:
    >
    >> On Mon, 07 May 2007 14:41:02 -0700, Nick Vatamaniuc wrote:
    >>
    >> > Rohit,
    >> >
    >> > Consider using an SQLite database. It comes with Python 2.5 and
    >> > higher. SQLite will do a nice job keeping track of the index. You can
    >> > easily find the line you need with a SQL query and your can write to
    >> > it as well. When you have a file and you write to one line of the
    >> > file, all of the rest of the lines will have to be shifted to
    >> > accommodate, the potentially larger new line.

    >>
    >>
    >> Using an database for tracking line number and byte position -- isn't
    >> that a bit overkill?
    >>
    >> I would have thought something as simple as a list of line lengths
    >> would do:
    >>
    >> offsets = [35, # first line is 35 bytes long
    >> 19, # second line is 19 bytes long...
    >> 45, 12, 108, 67]
    >>
    >>
    >> To get to the nth line, you have to seek to byte position:
    >>
    >> sum(offsets[:n])

    >
    > ...and then you STILL can't write there (without reading and rewriting
    > all the succeeding part of the file) unless the line you're writing is
    > always the same length as the one you're overwriting, which doesn't seem
    > to be part of the constraints in the OP's original application. I'm
    > with Nick in recommending SQlite for the purpose -- it _IS_ quite
    > "lite", as its name suggests.



    Hang on, as I understand it, Nick was just suggesting using SQLite for
    holding indexes into the file! That's why I said it was overkill. So
    whether the indexes are in a list or a database, you've _still_ got to
    deal with writing to the file.

    If I've misunderstood Nick's suggestion, if he actually meant to read the
    entire text file into the database, well, that's just a heavier version
    of reading the file into a list of strings, isn't it? If the database
    gives you more and/or better functionality than file.readlines(), then I
    have no problem with using the right tool for the job.


    --
    Steven.
     
    Steven D'Aprano, May 8, 2007
    #10
  11. Steven D'Aprano <> wrote:
    ...
    > Hang on, as I understand it, Nick just suggesting using SQlite for
    > holding indexes into the file! That's why I said it was overkill. So
    > whether the indexes are in a list or a database, you've _still_ got to
    > deal with writing to the file.
    >
    > If I've misunderstood Nick's suggestion, if he actually meant to read the
    > entire text file into the database, well, that's just a heavier version
    > of reading the file into a list of strings, isn't it? If the database
    > gives you more and/or better functionality than file.readlines(), then I
    > have no problem with using the right tool for the job.


    Ah well, I may have misunderstood myself. I'd keep the whole thing in
    an SQLite table, definitely NOT a table + an external file -- no, that's
    not going to be heavier than reading things in memory, SQLite is smarter
    than one might think :). Obviously, I'm assuming that one's dealing
    with an amount of data that doesn't just comfortably and easily fit in
    memory, or at least one that gives pause at the thought of sucking it
    all into memory and writing it back out again at every program run.


    Alex
     
    Alex Martelli, May 8, 2007
    #11
  12. On Tue, 08 May 2007 00:56:59 -0000, Steven D'Aprano
    <> declaimed the following in
    comp.lang.python:

    > Unless you are lucky enough to be using an OS that supports random-access
    > line access to text files natively, if such a thing even exists, you
    > can't because you don't know how long each line will be.
    >

    Xerox CP/V (mid 1970s)... The default format for text files that
    have been passed through the text editor was "keyed", wherein the editor
    line number (scaled by 10^3 or so as the editor supported line numbers
    of 10.123 if one had inserted lines between others) became an ISAM key
    for the record (those keys were also directly usable in FORTRAN-IV for
    direct access I/O). One had to use a separate command line utility to
    convert the file from "keyed" to "consecutive" (equivalent to a Unix
    text stream -- no structure, just a stream of bytes, with I/O utilities
    considering lines by the line-ending character(s)). CP/V also had
    "random" files -- which were a set of contiguous disk blocks
    (consecutive and keyed could be scattered across the disk sectors, but
    not so for random), and the OS did nothing about the contents; the
    application basically maintained all control.

    Of course, CP/V also had four open modes: input (read only), output
    (write only), scratch (two I/O pointers, must write 1 or more records
    before performing a read), update (two pointers, must read 1 or more
    records before performing a write).

    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
     
    Dennis Lee Bieber, May 8, 2007
    #12
