Speeding up network access: threading?

Discussion in 'Python' started by Jens Müller, Jan 4, 2010.

  1. Jens Müller

    Jens Müller Guest

    Hello,

    what would be best practice for speeding up a large number of HTTP GET
    requests done via urllib? Until now they are made in sequence, each request
    taking up to one second. The results must be merged into a list, but the
    original order need not be kept.

    I think speed could be improved by parallelizing. One could use multiple
    threads.
    Are there any Python best practices, or even existing modules, for creating
    and handling a task queue with a fixed number of concurrent threads?

    Thanks and regards!
    Jens Müller, Jan 4, 2010
    #1

  2. Jens Müller

    Guest

    On 04:22 pm, wrote:
    >Hello,
    >
    >what would be best practice for speeding up a large number of HTTP GET
    >requests done via urllib? Until now they are made in sequence, each
    >request taking up to one second. The results must be merged into a
    >list, but the original order need not be kept.
    >
    >I think speed could be improved by parallelizing. One could use multiple
    >threads.
    >Are there any Python best practices, or even existing modules, for
    >creating and handling a task queue with a fixed number of concurrent
    >threads?


    Using multiple threads is one approach. There are a few thread pool
    implementations lying about; one is part of Twisted,
    <http://twistedmatrix.com/documents/current/api/twisted.python.threadpool.ThreadPool.html>.

    Another approach is to use non-blocking or asynchronous I/O to make
    multiple requests without using multiple threads. Twisted can help you
    out with this, too. There are two async HTTP client APIs available. The
    older one:

    http://twistedmatrix.com/documents/current/api/twisted.web.client.getPage.html
    http://twistedmatrix.com/documents/current/api/twisted.web.client.HTTPClientFactory.html

    And the newer one, introduced in 9.0:

    http://twistedmatrix.com/documents/current/api/twisted.web.client.Agent.html

    Jean-Paul
    , Jan 4, 2010
    #2
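As a rough illustration of the thread-pool approach mentioned above, here is a minimal sketch. It does not use Twisted's ThreadPool; it swaps in the standard library's `concurrent.futures.ThreadPoolExecutor` (available from Python 3.2, so newer than this thread). The names `http_get`, `fetch_unordered`, `max_workers` and `fetch_func` are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request


def http_get(url):
    # Hypothetical helper: perform one HTTP GET and return the body as bytes.
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def fetch_unordered(urls, max_workers=8, fetch_func=http_get):
    # Run fetch_func on every URL using at most max_workers threads.
    # Results are collected in completion order, not input order.
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_func, url) for url in urls]
        for future in as_completed(futures):
            # result() re-raises any exception raised inside fetch_func.
            results.append(future.result())
    return results
```

Calling `fetch_unordered(list_of_urls)` would return the response bodies in whatever order they finish, which matches the requirement that the original order need not be kept. The `fetch_func` parameter exists only so the function can be tried without network access.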

  3. Jens Müller

    Terry Reedy Guest

    On 1/4/2010 11:22 AM, Jens Müller wrote:
    > Hello,
    >
    > what would be best practice for speeding up a large number of HTTP GET
    > requests done via urllib? Until now they are made in sequence, each
    > request taking up to one second. The results must be merged into a list,
    > but the original order need not be kept.
    >
    > I think speed could be improved by parallelizing. One could use multiple
    > threads.
    > Are there any Python best practices, or even existing modules, for
    > creating and handling a task queue with a fixed number of concurrent
    > threads?


    I believe code of this type has been published here in various threads.
    The fairly obvious thing to do is use a queue.Queue for tasks and
    another for results, and a pool of threads that read a task, fetch, and
    write the result.
    Terry Reedy, Jan 4, 2010
    #3
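The two-queue design described in the post above could be sketched roughly as follows, assuming Python 3 (`queue`, `urllib.request`). All function names (`fetch`, `worker`, `fetch_all`) and the `None` sentinel convention are assumptions made for this example, not code from the thread.

```python
import queue
import threading
import urllib.request


def fetch(url):
    # Hypothetical fetcher: return the body of an HTTP GET as bytes.
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def worker(tasks, results, fetch_func):
    # Read URLs from the task queue until a None sentinel arrives,
    # writing one (url, outcome) pair to the result queue per URL.
    while True:
        url = tasks.get()
        if url is None:
            break
        try:
            results.put((url, fetch_func(url)))
        except Exception as exc:
            # Report the failure instead of silently losing the task.
            results.put((url, exc))


def fetch_all(urls, num_threads=4, fetch_func=fetch):
    urls = list(urls)
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, fetch_func))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker thread
    # Exactly one result is produced per URL, so this loop terminates.
    collected = [results.get() for _ in urls]
    for thread in threads:
        thread.join()
    return collected
```

`fetch_all(urls, num_threads=10)` would then return a list of `(url, body_or_exception)` pairs in arbitrary order. Only the main thread touches the final list, so no explicit locking is needed.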
  4. Jens Müller

    Jens Müller Guest

    Hello,

    > The fairly obvious thing to do is use a queue.Queue for tasks and another
    > for results and a pool of threads that read, fetch, and write.


    Thanks, indeed.

    Is a list thread-safe or do I need to lock when adding the results of my
    worker threads to a list? The order of the elements in the list does not
    matter.

    Jens
    Jens Müller, Jan 5, 2010
    #4
  6. Jens Müller

    MRAB Guest

    Jens Müller wrote:
    > Hello,
    >
    >> The fairly obvious thing to do is use a queue.Queue for tasks and another
    >> for results and a pool of threads that read, fetch, and write.

    >
    > Thanks, indeed.
    >
    > Is a list thread-safe or do I need to lock when adding the results of my
    > worker threads to a list? The order of the elements in the list does not
    > matter.
    >

    Terry said "queue", not "list". Use the Queue class (it's thread-safe)
    in the "Queue" module (assuming you're using Python 2.x; in Python 3.x
    it's called the "queue" module).
    MRAB, Jan 5, 2010
    #6
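The module rename mentioned above can be handled with an import fallback. This is a small sketch; the variable names are only illustrative.

```python
try:
    import queue  # Python 3.x module name
except ImportError:
    import Queue as queue  # Python 2.x module name

# The Queue class itself has the same name in both versions.
results = queue.Queue()
results.put("payload")
item = results.get()
```

After this, the rest of the code can refer to `queue.Queue` regardless of the Python version.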
  7. On Tue, 05 Jan 2010 15:04:56 +0100, Jens Müller wrote:
    >
    > Is a list thread-safe or do I need to lock when adding the results of my
    > worker threads to a list? The order of the elements in the list does not
    > matter.


    The built-in list type is thread-safe, but it doesn't provide the waiting
    features that queue.Queue provides.

    Regards

    Antoine.
    Antoine Pitrou, Jan 5, 2010
    #7
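To make the difference concrete, here is a small sketch, assuming CPython, where a single `list.append` call is atomic. With a plain list the main thread has to `join()` the workers before reading, whereas `queue.Queue.get()` can block until an item arrives. The names `append_many`, `shared` and `producer` are invented for the example.

```python
import queue
import threading


def append_many(shared, count):
    for i in range(count):
        # A single append is atomic in CPython, so no lock is used here.
        shared.append(i)


shared = []
threads = [
    threading.Thread(target=append_many, args=(shared, 1000)) for _ in range(4)
]
for thread in threads:
    thread.start()
# A list cannot signal "new item ready", so wait for the threads to finish.
for thread in threads:
    thread.join()
total = len(shared)

# A Queue provides the waiting: get() blocks until something is put.
q = queue.Queue()
producer = threading.Thread(target=q.put, args=("done",))
producer.start()
item = q.get(timeout=5)
producer.join()
```

Note that only the individual `append` is safe here. Compound operations such as "check the length, then pop" would still need a lock.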
  8. Jens Müller

    Jens Müller Guest

    Hi and sorry for double posting - had mailer problems,

    > Terry said "queue", not "list". Use the Queue class (it's thread-safe)
    > in the "Queue" module (assuming you're using Python 2.x; in Python 3.x
    > it's called the "queue" module).


    Yes yes, I know. I use a queue to implement the thread pool's task queue,
    and that works all right.

    But each worker thread calculates a result and needs to make it available
    to the application in the main thread again. Therefore, it appends its
    result to a common list. This seems to work as well, but I was thinking of
    possible conflicts that could happen when two threads append their results
    to that same result list at the same moment.

    Regards,
    Jens
    Jens Müller, Jan 5, 2010
    #8
  9. Jens Müller

    Steve Holden Guest

    Jens Müller wrote:
    > Hi and sorry for double posting - had mailer problems,
    >
    >> Terry said "queue", not "list". Use the Queue class (it's thread-safe)
    >> in the "Queue" module (assuming you're using Python 2.x; in Python 3.x
    >> it's called the "queue" module).

    >
    > Yes yes, I know. I use a queue to implement the thread pool's task queue,
    > and that works all right.
    >
    > But each worker thread calculates a result and needs to make it
    > available to the application in the main thread again. Therefore, it
    > appends its result to a common list. This seems to work as well, but I
    > was thinking of possible conflicts that could happen when two threads
    > append their results to that same result list at the same moment.
    >

    If you don't need to take anything off the list ever, just create a
    separate thread that reads items from an output Queue and appends them
    to the list.

    If you *do* take them off, then use a Queue.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    PyCon is coming! Atlanta, Feb 2010 http://us.pycon.org/
    Holden Web LLC http://www.holdenweb.com/
    UPCOMING EVENTS: http://holdenweb.eventbrite.com/
    Steve Holden, Jan 5, 2010
    #9
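The collector-thread idea from the post above might look roughly like this. It is a sketch only: the `_DONE` sentinel, `collect`, and the toy `compute` worker are assumptions made for illustration.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel that tells the collector to stop


def collect(output, results):
    # The only thread that ever touches the results list.
    while True:
        item = output.get()
        if item is _DONE:
            break
        results.append(item)


def compute(n, output):
    # Stand-in for a worker that fetches something and reports its result.
    output.put(n * n)


output = queue.Queue()
results = []
collector = threading.Thread(target=collect, args=(output, results))
collector.start()

workers = [threading.Thread(target=compute, args=(n, output)) for n in range(10)]
for w in workers:
    w.start()
for w in workers:
    w.join()

output.put(_DONE)  # all workers are finished, so nothing follows the sentinel
collector.join()
```

Because the list has a single writer, the question of concurrent appends does not arise. The main thread should read `results` only after `collector.join()` returns.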
