LWP and threads: anything to look out for?

John Bokma

I want to move from Parallel UserAgent to using threads. Are there things
I should be aware of, or is this a piece of cake?

(Another option might be forking since IIRC Windows does this using
threads).
 
zentara

The only good reason to use threads is if you want an easy way to share
real-time data between threads: things such as progress data for a
progress indicator, or cases where you want to run a regex over the
downloaded results and do something in your main program based on the match.
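
To illustrate the kind of data sharing described above, here is a minimal sketch using the core `threads` and `threads::shared` modules. The `fake_download` and `run_workers` names are hypothetical and the download is a stand-in, so this only shows how a shared progress counter might be updated from several threads; it assumes a Perl built with thread support.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical stand-in for a real download; replace with actual fetching code.
sub fake_download {
    my ($url) = @_;
    return "content of $url";
}

# Start one thread per URL and count completed downloads in a shared variable.
sub run_workers {
    my (@urls) = @_;
    my $done :shared = 0;    # visible to every thread, e.g. for a progress bar

    my @threads = map {
        my $url = $_;
        threads->create(sub {
            my $body = fake_download($url);
            { lock($done); $done++; }    # lock before changing shared data
            return length $body;
        });
    } @urls;

    my @sizes = map { $_->join } @threads;    # wait for all threads to finish
    return ($done, @sizes);
}

my ($count, @sizes) = run_workers(map { "http://example.invalid/$_" } 1 .. 4);
print "finished $count downloads\n";
```

The main thread could equally read `$done` while the workers are still running, which is what a progress indicator would do.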

Otherwise it is probably more efficient and easier to just fork. There
is Parallel::ForkManager.
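
As a rough sketch of the forking approach, the following uses only Perl's built-in `fork` and `wait`, so it runs without extra modules; Parallel::ForkManager wraps this kind of bookkeeping in a friendlier interface. The `handle_url` and `fork_all` names are hypothetical, and the per-URL work is a stand-in.

```perl
use strict;
use warnings;

$| = 1;    # flush STDOUT so children do not inherit buffered output

# Hypothetical stand-in for fetching and storing one page.
sub handle_url {
    my ($url) = @_;
    return length($url) > 0;    # pretend the fetch succeeded
}

# Fork one child per URL, keeping at most $max children alive at a time.
# Returns the number of children that exited with status 0.
sub fork_all {
    my ($max, @urls) = @_;
    my ($running, $ok) = (0, 0);

    for my $url (@urls) {
        if ($running >= $max) {
            my $reaped = wait();             # block until one child exits
            $ok++ if $reaped > 0 && $? == 0;
            $running--;
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                     # child process
            exit(handle_url($url) ? 0 : 1);
        }
        $running++;                          # parent process
    }

    while ((my $reaped = wait()) > 0) {      # reap the remaining children
        $ok++ if $? == 0;
    }
    return $ok;
}

my $ok = fork_all(2, map { "http://example.invalid/$_" } 1 .. 5);
print "$ok children exited cleanly\n";
```

Each child has its own copy of the program's memory, so results have to be passed back through exit codes, files, pipes, or a database rather than through shared variables.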

But threads are pretty easy once you get the hang of the little details.
One thing to watch out for: if you run "exec" from any thread, it replaces
the whole process, so all running threads are killed and replaced by the
exec'd code.
 
xhoster

John Bokma said:
I want to move from Parallel UserAgent to using threads.

Why?

John Bokma said:
Are there things I should be aware of, or is this a piece of cake?

I don't think you will have a problem specific to LWP as long as you don't try
to share/clone LWP objects across threads. Of course, threaded programming
in general is more difficult than non-threaded, so I wouldn't say it will
be a piece of cake, unless you are already experienced in threaded
programming.

John Bokma said:
(Another option might be forking since IIRC Windows does this using
threads).

Maybe forking is better. I prefer it over threading in most circumstances.
(It seems to be a theme this week). But it is hard to tell without knowing
more about what you are trying to do.

Xho
 
John Bokma


xhoster said:
Why?

Good question :) I think that ParallelUA = UA + threading, and doesn't
add anything, and since UA is more a core module, I prefer the latter.

Also, with ParallelUA the documentation was a bit unclear to me.

xhoster said:
I don't think you will have a problem specific to LWP as long as you
don't try to share/clone LWP objects across threads. Of course,
threaded programming in general is more difficult than non-threaded,
so I wouldn't say it will be a piece of cake, unless you are already
experienced in threaded programming.

Java, and enough CGI experience :) (sharing resources, locking, etc).

xhoster said:
Maybe forking is better. I prefer it over threading in most
circumstances. (It seems to be a theme this week). But it is hard to
tell without knowing more about what you are trying to do.

I want to have n workers in parallel, each getting a request from a
Queue, fetching the page, storing the result, and moving on to the next.
Sleeping (not wasting CPU cycles) in between each fetch.
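
The setup described above could be sketched roughly as follows with the core `threads` and `Thread::Queue` modules. `fetch_page` and `run_pool` are hypothetical names, and the fetch is a stand-in so the sketch runs without network access; in real code each worker would create its own LWP::UserAgent inside the thread instead of sharing one. It assumes a Perl built with thread support.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for the real fetch. A real worker would build its
# own user agent inside the thread and call it here.
sub fetch_page {
    my ($url) = @_;
    return "body of $url";
}

# Run $n_workers threads that pull URLs from a queue, "fetch" them, store
# the result size, and sleep $delay whole seconds between fetches.
sub run_pool {
    my ($n_workers, $delay, @urls) = @_;
    my $jobs    = Thread::Queue->new;
    my $results = Thread::Queue->new;

    my @workers = map {
        threads->create(sub {
            # dequeue blocks without burning CPU; undef is the stop marker
            while (defined(my $url = $jobs->dequeue)) {
                $results->enqueue("$url\t" . length(fetch_page($url)));
                sleep $delay if $delay;    # pause between fetches
            }
        });
    } 1 .. $n_workers;

    $jobs->enqueue(@urls);
    $jobs->enqueue(undef) for @workers;    # one stop marker per worker
    $_->join for @workers;

    # All workers are done, so drain the results without blocking.
    my %size;
    while (defined(my $line = $results->dequeue_nb)) {
        my ($url, $len) = split /\t/, $line;
        $size{$url} = $len;
    }
    return \%size;
}

my $sizes = run_pool(3, 0, map { "http://example.invalid/$_" } 1 .. 6);
printf "%s => %d bytes\n", $_, $sizes->{$_} for sort keys %$sizes;
```

Because the queue can be added to while workers are running, the same structure would also cover a queue that grows based on earlier results, as long as the stop markers are sent only once no more work can appear.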
 
xhoster

John Bokma said:
Good question :) I think that ParallelUA = UA + threading, and doesn't
add anything, and since UA is more a core module, I prefer the latter.

I think ParallelUA = UA + nonblocking IO, rather than threading. Assuming
it is well implemented (I haven't used ParallelUA enough to know), I think
non-blocking IO is better than threads for this task.

John Bokma said:
Also, with ParallelUA the documentation was a bit unclear to me.

OK, fair enough. Anything in particular you found unclear?

John Bokma said:
I want to have n workers in parallel, each getting a request from a
Queue, fetching the page, storing the result, and moving on to the next.

Store the results on the filesystem or DB, or in Perl memory?

Is the queue dynamically added to (based on the results returned from
earlier tasks in the queue) or is it built in a start-up phase and then
only consumed from then on?

If the queue is dynamically added to, that argues for threads. If each
page-fetch takes less than 1/20 of a second or so (and there are tens of
thousands of them), that argues for threads (although I might instead
just batch them up into chunks of several page fetches). Otherwise, I'd
go with forking with Parallel::ForkManager (or ParallelUA :) ).
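
The batching idea mentioned above might look something like this: split the list of URLs into chunks so that each forked child handles several fetches and the cost of a fork is spread over the whole chunk. `make_batches` is a hypothetical helper, and the batch size of 3 is only an illustration.

```perl
use strict;
use warnings;

# Split a list of URLs into array refs of at most $size items each.
sub make_batches {
    my ($size, @urls) = @_;
    die "batch size must be positive" unless $size > 0;
    my @batches;
    # splice removes up to $size items from the front on every pass
    push @batches, [ splice @urls, 0, $size ] while @urls;
    return @batches;
}

my @batches = make_batches(3, map { "http://example.invalid/$_" } 1 .. 7);
print scalar(@batches), " batches\n";
```

Each batch would then be handed to one child process, which loops over its URLs before exiting.
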
John Bokma said:
Sleeping (not wasting CPU cycles) in between each fetch.

This part I'm not sure of. Why sleep rather than just fetch the next
item from the queue? Are you sleeping only in the case of an empty queue
(which of course only makes sense if the queue is dynamic)? Or to avoid
overloading the remote server(s) you are fetching from?

Xho
 
