Re: How to safely maintain a status file

Discussion in 'Python' started by Plumo, Jul 9, 2012.

  1. Plumo

    Plumo Guest

    > What are you keeping in this status file that needs to be saved
    > several times per second?  Depending on what type of state you're
    > storing and how persistent it needs to be, there may be a better way
    > to store it.
    >
    > Michael


    This is for a threaded web crawler. I want to cache which URLs are
    currently in the queue so that, if the crawler is terminated, it can
    resume from the same point next time.
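
    For illustration, a minimal sketch of one common way to save such a
    queue safely: write to a temporary file, then rename it over the old
    one. The function names and the JSON format are assumptions, not
    code from the crawler itself. The point is that a crash mid-write
    leaves the previous status file intact.

```python
import json
import os
import tempfile


def save_queue(urls, path):
    """Write the pending URLs to `path` without ever exposing a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(list(urls), f)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        # Swap the new file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, drop the temp file and keep the old status file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_queue(path):
    """Return the saved URLs, or an empty list if nothing was saved yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []
```

    On startup the crawler would call load_queue() once and refill its
    in-memory queue from the result.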
    Plumo, Jul 9, 2012
    #1

  2. Please consider batching this data and doing larger writes. Thrashing
    the hard drive is not a good plan for performance or hardware
    longevity. For example, crawl an entire FQDN and then write out the
    results in one operation. If your job fails in the middle and you
    have to start that FQDN over, no big deal. If that's too big of a
    chunk for your purposes, perhaps break each FQDN up into top-level
    directories and crawl each of those in one operation before writing to
    disk.
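
    A rough sketch of the batching idea, assuming results are grouped
    by hostname and appended to a JSON-lines file (the class name,
    record layout, and file format are all hypothetical): results
    accumulate in memory and reach the disk in one write per host.

```python
import json
from collections import defaultdict
from urllib.parse import urlparse


class BatchedResults:
    """Buffer crawl results in memory and write each host's results in one go."""

    def __init__(self, path):
        self.path = path
        self.pending = defaultdict(list)  # hostname -> buffered records

    def add(self, url, result):
        """Buffer one result under the URL's hostname; nothing touches the disk."""
        host = urlparse(url).netloc
        self.pending[host].append({"url": url, "result": result})

    def flush_host(self, host):
        """Append every buffered record for `host` with a single write call."""
        records = self.pending.pop(host, [])
        if not records:
            return 0
        lines = "".join(json.dumps(record) + "\n" for record in records)
        with open(self.path, "a") as f:
            f.write(lines)
        return len(records)
```

    Call flush_host() once a host has been fully crawled. If the job
    dies before that, only that host's buffered results are lost and
    the host is crawled again.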

    There are existing solutions for managing job queues, so you can
    choose whichever you like. If you're unfamiliar with them, Celery is
    a reasonable place to start.

    Michael

    On Mon, Jul 9, 2012 at 1:52 AM, Plumo <> wrote:
    >> What are you keeping in this status file that needs to be saved
    >> several times per second? Depending on what type of state you're
    >> storing and how persistent it needs to be, there may be a better way
    >> to store it.
    >>
    >> Michael

    >
    > This is for a threaded web crawler. I want to cache what URL's are
    > currently in the queue so if terminated the crawler can continue next
    > time from the same point.
    Michael Hrivnak, Jul 9, 2012
    #2
