how to start threads by group?

Discussion in 'Python' started by oyster, Oct 6, 2008.

  1. oyster

    oyster Guest

    my code is not right, can somebody give me a hand? Thanks

    for example, I have 1000 urls to be downloaded, but only 5 threads at a time

    <code>
    def threadTask(url):
        download(url)

    threadsAll = []
    for url in all_url:
        task = threading.Thread(target=threadTask, args=)
        threadsAll.append(task)
    for ...
    while everytask.isAlive():
        pass
    </code>
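    A minimal corrected sketch of the batching idea above (hedged: `download` here is a stand-in that only records the url, and the example URLs are invented; real code would actually fetch them):

```python
import threading

done = []

def download(url):
    done.append(url)  # stand-in for the real download

def threadTask(url):
    download(url)

# hypothetical example URLs
all_url = ["http://example.com/%d" % i for i in range(13)]

# run at most 5 downloads at a time by working in batches of 5
for i in range(0, len(all_url), 5):
    batch = [threading.Thread(target=threadTask, args=(u,))
             for u in all_url[i:i + 5]]
    for t in batch:
        t.start()
    for t in batch:
        t.join()  # wait for the whole batch instead of busy-waiting
```

    Note that `args` must be a tuple like `(url,)`, and `join()` replaces the busy `while everytask.isAlive(): pass` loop.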
    oyster, Oct 6, 2008
    #1

  2. FB

    Guest

    On 6 Oct, 15:24, oyster <> wrote:
    > my code is not right, can sb give me a hand? thanx
    >
    > for example, I have 1000 urls to be downloaded, but only 5 thread at one time
    > def threadTask(ulr):
    >   download(url)
    >
    > threadsAll=[]
    > for url in all_url:
    >     task=threading.Thread(target=threadTask, args=)
    >     threadsAll.append(task)
    > ...

    I would restructure my code with something like this ( WARNING: the
    following code is ABSOLUTELY UNTESTED and shall be considered only as
    pseudo-code to express my idea of the algorithm (which, also, could be
    wrong:) ):

    [...]
    time.sleep( DELAY )
    [...]

    HTH
    Ciao
    -----
    FB
    , Oct 6, 2008
    #2

  3. On Mon, 06 Oct 2008 11:24:51 -0300, <> wrote:

    > On 6 Oct, 15:24, oyster <> wrote:
    >> my code is not right, can sb give me a hand? thanx
    >>
    >> for example, I have 1000 urls to be downloaded, but only 5 thread at
    >> one time


    > I would restructure my code with someting like this ( WARNING: the
    > following code is
    > ABSOLUTELY UNTESTED and shall be considered only as pseudo-code to
    > express my idea of
    > the algorithm (which, also, could be wrong:) ):


    Your code creates one thread per url (but never more than MAX_THREADS
    alive at the same time). Usually it's more efficient to create all the
    MAX_THREADS at once, and continuously feed them with tasks to be done. A
    Queue object is the way to synchronize them; from the documentation:

    <code>
    from Queue import Queue
    from threading import Thread

    num_worker_threads = 3
    list_of_urls = ["http://foo.com", "http://bar.com",
                    "http://baz.com", "http://spam.com",
                    "http://egg.com",
                   ]

    def do_work(url):
        from time import sleep
        from random import randrange
        from threading import currentThread
        print "%s downloading %s" % (currentThread().getName(), url)
        sleep(randrange(5))
        print "%s done" % currentThread().getName()

    # from this point on, copied almost verbatim from the Queue example
    # at the end of http://docs.python.org/library/queue.html

    def worker():
        while True:
            item = q.get()
            do_work(item)
            q.task_done()

    q = Queue()
    for i in range(num_worker_threads):
        t = Thread(target=worker)
        t.setDaemon(True)
        t.start()

    for item in list_of_urls:
        q.put(item)

    q.join()    # block until all tasks are done
    print "Finished"
    </code>
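    (Editorial aside: under Python 3 the same pattern still works with a few renamed pieces: the module is `queue`, `currentThread().getName()` becomes `current_thread().name`, and `setDaemon(True)` becomes the `daemon` argument. A sketch, with the sleeps removed and the download replaced by a stand-in that records which thread handled which url:)

```python
import queue
import threading

num_worker_threads = 3
list_of_urls = ["http://foo.com", "http://bar.com",
                "http://baz.com", "http://spam.com",
                "http://egg.com",
               ]

results = []

def do_work(url):
    # stand-in for the real download; records (thread name, url)
    results.append((threading.current_thread().name, url))

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = queue.Queue()
for i in range(num_worker_threads):
    t = threading.Thread(target=worker, daemon=True)
    t.start()

for item in list_of_urls:
    q.put(item)

q.join()  # block until all tasks are done
print("Finished")
```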


    --
    Gabriel Genellina
    Gabriel Genellina, Oct 7, 2008
    #3
  4. In message <>, Gabriel
    Genellina wrote:

    > Usually it's more efficient to create all the MAX_THREADS at once, and
    > continuously feed them with tasks to be done.


    Given that the bottleneck is most likely to be the internet connection, I'd
    say the "premature optimization is the root of all evil" adage applies
    here.
    Lawrence D'Oliveiro, Oct 7, 2008
    #4
  5. Terry Reedy

    Terry Reedy Guest

    Lawrence D'Oliveiro wrote:
    > In message <>, Gabriel
    > Genellina wrote:
    >
    >> Usually it's more efficient to create all the MAX_THREADS at once, and
    >> continuously feed them with tasks to be done.

    >
    > Given that the bottleneck is most likely to be the internet connection, I'd
    > say the "premature optimization is the root of all evil" adage applies
    > here.


    There is also the bottleneck of programmer time to understand, write,
    and maintain. In this case, 'more efficient' is simpler, and to me,
    more efficient of programmer time. Feeding a fixed pool of worker
    threads with a Queue() is a standard design that is easy to understand
    and one the OP should learn. Re-using tested code is certainly
    efficient of programmer time. Managing a variable pool of workers that
    die and need to be replaced is more complex (two loops nested within a
    loop) and error prone (though learning that alternative is probably not
    a bad idea also).
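    The variable-pool alternative mentioned above can be sketched with a bounded semaphore (an untested illustration, not FB's original code; `download` is a stand-in and the URLs are invented):

```python
import threading

MAX_THREADS = 5
slots = threading.BoundedSemaphore(MAX_THREADS)
done = []

def download(url):
    done.append(url)  # stand-in for the real download

def task(url):
    try:
        download(url)
    finally:
        slots.release()  # free a slot so the main loop can start another thread

threads = []
for url in ["http://example.com/%d" % i for i in range(12)]:
    slots.acquire()  # blocks while MAX_THREADS downloads are in flight
    t = threading.Thread(target=task, args=(url,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()
```

    The semaphore does the counting, so no nested polling loops are needed, but each new task still pays the cost of creating a fresh thread.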

    tjr
    Terry Reedy, Oct 7, 2008
    #5
  6. On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy <>
     wrote:
    > Lawrence D'Oliveiro wrote:
    >> In message <>,
    >> Gabriel Genellina wrote:
    >>
    >>> Usually it's more efficient to create all the MAX_THREADS at once, and
    >>> continuously feed them with tasks to be done.

    >> Given that the bottleneck is most likely to be the internet
    >> connection, I'd
    >> say the "premature optimization is the root of all evil" adage applies
    >> here.

    >
    > There is also the bottleneck of programmer time to understand, write,
    > and maintain. In this case, 'more efficient' is simpler, and to me,
    > more efficient of programmer time. Feeding a fixed pool of worker
    > threads with a Queue() is a standard design that is easy to understand
    > and one the OP should learn. Re-using tested code is certainly
    > efficient of programmer time. Managing a variable pool of workers that
    > die and need to be replaced is more complex (two loops nested within a
    > loop) and error prone (though learning that alternative is probably not
    > a bad idea also).


    I'd like to add that debugging a program that continuously creates and
    destroys threads is a real PITA.

    --
    Gabriel Genellina
    Gabriel Genellina, Oct 7, 2008
    #6
  7. FB

    Guest

    On 7 Oct, 06:37, "Gabriel Genellina" <> wrote:
    > Your code creates one thread per url (but never more than MAX_THREADS
    > alive at the same time). Usually it's more efficient to create all the
    > MAX_THREADS at once, and continuously feed them with tasks to be done. A
    > Queue object is the way to synchronize them; from the documentation:
    > [code snipped]
    >
    > --
    > Gabriel Genellina


    Agreed.
    I was trying to do what the OP was trying to do, but in a way that
    works.
    But keeping the threads alive and feeding them the URLs is a better
    design, definitely.
    And no, I don't think it's 'premature optimization': it is just
    cleaner.

    Ciao
    ------
    FB
    , Oct 8, 2008
    #7
  8. In message <>, Gabriel
    Genellina wrote:

    > On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy <>
    > wrote:
    >
    >> Lawrence D'Oliveiro wrote:
    >>
    >>> In message <>,
    >>> Gabriel Genellina wrote:
    >>>
    >>>> Usually it's more efficient to create all the MAX_THREADS at once, and
    >>>> continuously feed them with tasks to be done.
    >>>
    >>> Given that the bottleneck is most likely to be the internet
    >>> connection, I'd say the "premature optimization is the root of all evil"
    >>> adage applies here.

    >>
    >> Feeding a fixed pool of worker threads with a Queue() is a standard
    >> design that is easy to understand and one the OP should learn. Re-using
    >> tested code is certainly efficient of programmer time.

    >
    > I'd like to add that debugging a program that continuously creates and
    > destroys threads is a real PITA.


    That's God trying to tell you to avoid threads altogether.
    Lawrence D'Oliveiro, Oct 13, 2008
    #8
  9. Guest

    On Oct 13, 6:54 am, Lawrence D'Oliveiro <l...@geek-
    central.gen.new_zealand> wrote:
    > In message <>, Gabriel Genellina wrote:
    >
    > > I'd like to add that debugging a program that continuously creates and
    > > destroys threads is a real PITA.
    >
    > That's God trying to tell you to avoid threads altogether.


    Especially in a case like this, which is tailor-made for a trivial
    state-machine solution if you really want multiple connections.
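    A toy illustration of that state-machine idea, using generators in place of real non-blocking sockets (the three states in `fetch` are invented; a real version would yield whenever the socket would block):

```python
from collections import deque

def fetch(url):
    # hypothetical download states; a real state machine would yield
    # whenever the underlying non-blocking socket had no data ready
    yield "connecting"
    yield "receiving"
    yield "done"

MAX_ACTIVE = 5                     # at most five "connections" in flight
pending = deque("http://example.com/%d" % i for i in range(12))
active = deque()
finished = []

while pending or active:
    # top up the active set from the pending queue
    while pending and len(active) < MAX_ACTIVE:
        active.append(fetch(pending.popleft()))
    machine = active.popleft()
    try:
        next(machine)              # advance this connection one step
    except StopIteration:
        finished.append(machine)   # this download is complete
    else:
        active.append(machine)     # still in progress; requeue it
```

    A single thread drives all the machines round-robin, so no locks or queues are needed at all.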
    , Oct 13, 2008
    #9
