Removing Duplicate entries in a file...

Discussion in 'Python' started by sri2097, Jan 6, 2006.

  1. sri2097

    sri2097 Guest

    Hi all, I'm storing a number of dictionaries in a file using the
    'cPickle' module and then retrieving them. The following is the code
    for it:

    # Code for storing the values in the file
    import cPickle

    book = {raw_input("Name: "): [int(raw_input("Phone: ")),
    raw_input("Address: ")] }
    file_object = file(database, 'w+')
    cPickle.dump(book, file_object)
    file_object.close()

    # Code for retrieving values and modifying them.
    tobe_modified_name = raw_input("Enter name to be modified: ")
    file_object = file(database)

    while file_object.tell() != EOFError:
        try:
            stored_dict = cPickle.load(file_object)
            if stored_dict.has_key(tobe_modified_name):
                print ("Entry found !")
                # I want to modify the values retrieved from the file and
                # then put it back to the file without duplicate entry.
                file_object = file(database, 'a+')
        except EOFError:
            break
    file_object.close()


    Now, my problem is this: after finding the entry in the file, I want to
    change the 'values' under the searched 'key' and then insert the entry
    back into the file. But doing so gives me duplicate entries for the
    same key. I want to remove the previous key and value entry from the
    file and keep the latest one. How can I solve this problem?

    I've thought of two ways:

    1) In Java there is a 'file pointer' concept: after you find the entry
    you are looking for, you move all the entries below it up, so the
    searched entry ends up at the bottom of the file. After this, you
    truncate the file by a certain number of bytes to remove the old entry.
    Can we do this in Python using the file.truncate([size]) method?

    2) This is a really crappy way, but nevertheless I'll put it across.
    After finding the entry you are looking for, make a copy of the file
    without that entry. Make the changes to the 'values' under this key
    and insert the entry into the new file you have created. Before
    exiting, delete the first file.

    Are there any other ways to solve my problem? Any criticism is
    welcome....
    sri2097, Jan 6, 2006

  2. Fuzzyman

    Fuzzyman Guest

    sri2097 wrote:
    > Hi all, I'm storing a number of dictionaries in a file using the
    > 'cPickle' module and then retrieving them. The following is the code
    > for it:
    >
    > # Code for storing the values in the file
    > import cPickle
    >
    > book = {raw_input("Name: "): [int(raw_input("Phone: ")),
    > raw_input("Address: ")] }
    > file_object = file(database, 'w+')
    > cPickle.dump(book, file_object)
    > file_object.close()
    >


    I may be misunderstanding you, but it seems you just want to read a
    pickle, modify it, and then write it back?

    What you're doing is appending the modified pickle to the original one
    - which is more complicated than what you want to achieve.

    file_object = open(filename, 'rb')
    stored_dict = cPickle.load(file_object)
    file_object.close()

    # ... code that modifies stored_dict

    file_object = open(filename, 'wb')
    cPickle.dump(stored_dict, file_object)
    file_object.close()

    Any reason why that shouldn't do what you want?
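    For comparison, here is the same read-modify-write cycle as a minimal, self-contained Python 3 sketch (in Python 3, cPickle was folded into the standard pickle module). It assumes the whole phone book lives in one dictionary stored as a single pickle; the helper names and the demo file are hypothetical.

```python
import os
import pickle
import tempfile

def save_book(path, book):
    # Overwrite the file with the complete, current dictionary.
    with open(path, "wb") as f:
        pickle.dump(book, f)

def load_book(path):
    # Read back the single pickled dictionary.
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical file in a temporary directory, for demonstration only.
path = os.path.join(tempfile.mkdtemp(), "phonebook.pickle")
save_book(path, {"alice": [5551234, "1 Main St"], "bob": [5555678, "2 Oak Ave"]})

book = load_book(path)
book["alice"] = [5559999, "3 Elm Rd"]  # modify one entry in memory
save_book(path, book)                  # write the whole dictionary back

print(load_book(path))
# {'alice': [5559999, '3 Elm Rd'], 'bob': [5555678, '2 Oak Ave']}
```

    Because the dictionary already holds every entry, overwriting the file in 'wb' mode keeps the untouched entries and leaves exactly one value per key.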

    All the best,

    Fuzzyman
    http://www.voidspace.org.uk/python/index.shtml
    Fuzzyman, Jan 6, 2006

  3. sri2097

    sri2097 Guest

    Hi there, I'm curious how the changes you have suggested will solve
    the problem. Instead of appending (which is what I was doing), we are
    now opening and writing the file in binary mode. All the other
    entries in my file will be gone when I write to the file again.

    What I actually need is this:

    I have some dictionary values stored in a file. I retrieve these
    entries based on the key specified by the user. If I want to modify
    the values under a particular key, I first check whether that key
    exists in the file, and if it does, I retrieve the values associated
    with the key and modify them. But when I re-insert this modified
    key-value pair back into the file, I now have two entries (the old
    entry and the new modified one). So the next time I search for that
    key, I'll get two entries for it. That's not what we want. So how do I
    remove the old entry without the other values getting deleted? In
    other words, I want to update a particular key-value pair while
    keeping the other entries as they are.

    Let me know in case any bright idea strikes...
    sri2097, Jan 6, 2006
  4. Mike Meyer

    Mike Meyer Guest

    "sri2097" <> writes:
    > Hi all, I'm storing a number of dictionaries in a file using the
    > 'cPickle' module and then retrieving them. The following is the code
    > for it:
    >
    > # Code for storing the values in the file
    > import cPickle
    >
    > book = {raw_input("Name: "): [int(raw_input("Phone: ")),
    > raw_input("Address: ")] }
    > file_object = file(database, 'w+')
    > cPickle.dump(book, file_object)
    > file_object.close()
    >
    > # Code for retrieving values and modifying them.
    > tobe_modified_name = raw_input("Enter name to be modified: ")
    > file_object = file(database)
    >
    > while file_object.tell() != EOFError:
    >     try:
    >         stored_dict = cPickle.load(file_object)
    >         if stored_dict.has_key(tobe_modified_name):
    >             print ("Entry found !")
    >             # I want to modify the values retrieved from the file and
    >             # then put it back to the file without duplicate entry.
    >             file_object = file(database, 'a+')
    >     except EOFError:
    >         break
    > file_object.close()
    >
    >
    > Now, my problem is this: after finding the entry in the file, I want to
    > change the 'values' under the searched 'key' and then insert the entry
    > back into the file. But doing so gives me duplicate entries for the
    > same key. I want to remove the previous key and value entry from the
    > file and keep the latest one. How can I solve this problem?


    First, file_object.tell() won't return EOFError. Nothing should return
    EOFError; it's an exception, and it gets raised.

    As you noticed, cPickle.load will raise EOFError when called on a file
    that you've reached the end of. However, you want to narrow the
    try clause as much as possible:

    while True:
        try:
            stored_dict = cPickle.load(file_object)
        except EOFError:
            break

        # Work with stored_dict here.

    If you weren't doing a break in the except clause, you'd work with the
    dictionary in an else clause.
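    A minimal Python 3 sketch of that pattern, wrapped in a generator so the caller never sees EOFError (the name load_all is hypothetical, not a pickle API). Here the except clause returns instead of breaking, so the loaded object is handled in the else clause, which only runs when pickle.load succeeded.

```python
import io
import pickle

def load_all(file_object):
    """Yield each object pickled into file_object, stopping at end of file."""
    while True:
        try:
            obj = pickle.load(file_object)
        except EOFError:
            # pickle.load raises EOFError at end of file; it never returns it.
            return
        else:
            # Only reached when the load succeeded: hand the object to the caller.
            yield obj

# An in-memory file holding three consecutive pickles, for demonstration.
buf = io.BytesIO()
for record in ({"alice": 1}, {"bob": 2}, {"carol": 3}):
    pickle.dump(record, buf)
buf.seek(0)

print(list(load_all(buf)))  # [{'alice': 1}, {'bob': 2}, {'carol': 3}]
```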

    > I've thought of two ways:
    >
    > 1) In Java there is a 'file pointer' concept: after you find the entry
    > you are looking for, you move all the entries below it up, so the
    > searched entry ends up at the bottom of the file. After this, you
    > truncate the file by a certain number of bytes to remove the old entry.
    > Can we do this in Python using the file.truncate([size]) method?


    Yup, this would work. You'd have to save the value from
    file_object.tell() before calling cPickle.load, so you could go back
    to that point to write the next object. You'd either have to load all
    the following objects into memory, or shuttle back and forth between
    the read and write positions. The latter sounds "really crappy" to me.
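    A minimal Python 3 sketch of the first approach, taking the "load all the following objects into memory" route: remember each record's offset with tell(), load everything after the match, seek() back to the old record's offset, rewrite the later records followed by the updated one, then truncate(). It assumes one-entry dictionaries pickled back to back; the helper name and the demo file are hypothetical.

```python
import os
import pickle
import tempfile

def update_in_place(path, name, new_value):
    """Move the record holding `name` to the end of the file with a new value,
    shifting the later records up and truncating the leftover bytes."""
    with open(path, "r+b") as f:
        while True:
            start = f.tell()  # remember where this record begins
            try:
                record = pickle.load(f)
            except EOFError:
                return False  # name not found; file left unchanged
            if name in record:
                break
        # Load every record that follows the match into memory.
        rest = []
        while True:
            try:
                rest.append(pickle.load(f))
            except EOFError:
                break
        record[name] = new_value
        f.seek(start)  # go back to where the old record began
        for later in rest:
            pickle.dump(later, f)
        pickle.dump(record, f)  # the updated record now sits at the end
        f.truncate()  # drop any bytes left past the new end of data
        return True

path = os.path.join(tempfile.mkdtemp(), "phonebook.pickle")  # hypothetical file
with open(path, "wb") as f:
    pickle.dump({"alice": [5551234, "1 Main St"]}, f)
    pickle.dump({"bob": [5555678, "2 Oak Ave"]}, f)

update_in_place(path, "alice", [5559999, "3 Elm Rd"])
with open(path, "rb") as f:
    after = [pickle.load(f), pickle.load(f)]
print(after)
# [{'bob': [5555678, '2 Oak Ave']}, {'alice': [5559999, '3 Elm Rd']}]
```

    The truncate() call matters when the updated record pickles to fewer bytes than the old one; without it, stale bytes would remain at the end of the file.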

    > 2) This is a really crappy way, but nevertheless I'll put it across.
    > After finding the entry you are looking for, make a copy of the file
    > without that entry. Make the changes to the 'values' under this key
    > and insert the entry into the new file you have created. Before
    > exiting, delete the first file.


    Actually, there's a good reason for doing it that way. But first,
    another alternative.

    Unless your file is huge (more than a few hundred megabytes), you
    might consider loading the entire thing into memory. Instead of
    calling cPickle.dump multiple times, put all the dictionaries in a
    list, then call cPickle.dump on the list. When you want to update the
    list, cPickle.load will load the entire list, so you can use Python to
    work on it.
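    A minimal Python 3 sketch of the load-everything approach, assuming the whole file is one pickled list of dictionaries (helper names and the demo file are hypothetical). Updating an entry in the list and dumping the list again leaves exactly one copy of each key.

```python
import os
import pickle
import tempfile

def load_entries(path):
    # The whole file is a single pickled list of dictionaries.
    with open(path, "rb") as f:
        return pickle.load(f)

def save_entries(path, entries):
    with open(path, "wb") as f:
        pickle.dump(entries, f)

path = os.path.join(tempfile.mkdtemp(), "phonebook.pickle")  # hypothetical file
save_entries(path, [{"alice": [5551234, "1 Main St"]}, {"bob": [5555678, "2 Oak Ave"]}])

entries = load_entries(path)
for entry in entries:
    if "alice" in entry:
        entry["alice"] = [5559999, "3 Elm Rd"]  # update in memory; no duplicate
save_entries(path, entries)

print(load_entries(path))
# [{'alice': [5559999, '3 Elm Rd']}, {'bob': [5555678, '2 Oak Ave']}]
```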

    As for saving the file, best practice for updating a file is to write
    it to a temporary file, and then rename the new file to the old name
    after the write has successfully finished. This way, if the write
    fails for some reason, your working file isn't corrupted. Doing it
    this way also makes it easy to deal with the case of the list being
    too big to load into memory:

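    # Warning, untested code
    import os
    import cPickle

    database_temp = database + '.tmp'  # hypothetical temp-file name
    input_file = open(database, 'rb')
    output_file = open(database_temp, 'wb')

    while 1:
        try:
            stored_dict = cPickle.load(input_file)
        except EOFError:
            break
        if stored_dict.has_key(tobe_modified_name):
            print "Entry found !"
            # Modify stored_dict here
        cPickle.dump(stored_dict, output_file)

    input_file.close()
    output_file.close()
    os.unlink(database) # May not be required; depends on your os
    os.rename(database_temp, database)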


    You'll probably want to handle exceptions from cPickle.dump and
    output_file.close cleanly as well.
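    A minimal Python 3 sketch of the temporary-file approach with that cleanup added. It swaps in os.replace (available since Python 3.3), which overwrites an existing target on both POSIX and Windows, so the separate unlink step isn't needed. The helper name and the demo file are hypothetical; records are assumed to be dictionaries pickled back to back.

```python
import os
import pickle
import tempfile

def rewrite_with(path, name, new_value):
    """Copy every pickled record from `path` to a temporary file, updating the
    one that holds `name`, then swap the temporary file into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the rename stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as output_file, open(path, "rb") as input_file:
            while True:
                try:
                    record = pickle.load(input_file)
                except EOFError:
                    break
                if name in record:
                    record[name] = new_value
                pickle.dump(record, output_file)
        # Only reached after both files closed cleanly.
        os.replace(temp_path, path)
    except BaseException:
        # Leave the original untouched and discard the partial copy.
        os.remove(temp_path)
        raise

path = os.path.join(tempfile.mkdtemp(), "phonebook.pickle")  # hypothetical file
with open(path, "wb") as f:
    pickle.dump({"alice": [5551234, "1 Main St"]}, f)
    pickle.dump({"bob": [5555678, "2 Oak Ave"]}, f)

rewrite_with(path, "alice", [5559999, "3 Elm Rd"])
with open(path, "rb") as f:
    after = [pickle.load(f), pickle.load(f)]
print(after)
# [{'alice': [5559999, '3 Elm Rd']}, {'bob': [5555678, '2 Oak Ave']}]
```

    Because records are streamed one at a time, only one record is in memory at once, which is the "too big to load into memory" case mentioned above.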

    <mike
    --
    Mike Meyer <> http://www.mired.org/home/mwm/
    Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
    Mike Meyer, Jan 6, 2006
  5. sri2097

    sri2097 Guest

    Thanks Mike, my problem is solved! I loaded the entire file contents
    into a list and my job was a piece of cake after that.

    Srikar
    sri2097, Jan 10, 2006
