Advice/Help with Multithreading

Discussion in 'Java' started by DyslexicAnaboko, Jan 17, 2007.

  1. I wrote a method that will take a URL, and return its page in String
    form.

    How long a download takes depends on which page is being visited.
    Getting the contents of Google takes a different amount of time than
    getting Yahoo, since the page sizes differ.

    Since I have many pages to download, downloading them one at a time
    takes forever. I just want to speed things up. I figured that
    multithreading would be my answer, since I could create several threads
    to download pages simultaneously. I am inexperienced with
    multithreading, though, so I was hoping that someone could give me
    some pointers or advice on where to begin.

    Basically I want to do the following:

    1. I want to create X threads, let's just say 10 for argument's sake.

    2. I want each thread to get its own assigned URL. Will there be a
    problem with more than one thread accessing the same method?

    3. After downloading the contents of the page I intend to put the
    strings into a list. Will there be a problem with more than one thread
    accessing the same object? If so, should I use semaphores?

    I'm not asking anyone to write this for me, I just don't know where to
    begin. If anyone can spare an example or any advice I am all ears.

    Thanks,

    Eli
    DyslexicAnaboko, Jan 17, 2007
    #1

  2. DyslexicAnaboko wrote:
    > 2. I want each thread to get its own assigned URL. Will there be a
    > problem with more than one thread accessing the same method?
    >
    > 3. After downloading the contents of the page I intend to put the
    > strings into a list. Will there be a problem with more than one thread
    > accessing the same object? If so, should I use semaphores?

    You can run the same method in multiple threads, provided that you
    synchronize access to any variables that are shared between the
    threads. So if you write a method getString(URL url), you can
    create a thread to run that method as follows:

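    // url must be final (or effectively final on Java 8+) to be used here
    Runnable r = new Runnable() {
        public void run() {
            String page = getString(url);
            // store page somewhere shared, with synchronization
        }
    };
    new Thread(r).start();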

    You will need some code after the call to getString() to store the
    returned string somewhere, but that is really all there is to it.
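
    As a rough illustration of the thread-per-URL idea above, here is a
    minimal sketch. It assumes a stand-in getString(String) that does no
    real network I/O (the real method takes a URL), and the class name
    ThreadPerUrlSketch and method downloadAll are made up for this example.
    It stores results in a list wrapped by Collections.synchronizedList, so
    no semaphore is needed for the shared list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ThreadPerUrlSketch {

    // Hypothetical stand-in for the poster's getString(URL) method.
    // It takes a plain String and does no network I/O so the sketch runs offline.
    static String getString(String url) {
        return "<html>contents of " + url + "</html>";
    }

    // Starts one thread per URL, waits for all of them, and returns the pages.
    static List<String> downloadAll(List<String> urls) {
        // synchronizedList makes each add() call atomic across threads.
        final List<String> pages =
                Collections.synchronizedList(new ArrayList<String>());
        List<Thread> threads = new ArrayList<Thread>();

        for (final String url : urls) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    pages.add(getString(url));
                }
            });
            threads.add(t);
            t.start();
        }

        // join() blocks until a thread has finished, so the list is complete
        // once every thread has been joined.
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<String>();
        for (int i = 0; i < 10; i++) {
            urls.add("http://example.com/page" + i);
        }
        List<String> pages = downloadAll(urls);
        System.out.println("Downloaded " + pages.size() + " pages");
    }
}
```

    The order of the pages in the list depends on which thread finishes
    first, so it will usually differ from the order of the URLs.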

    Start writing the program and post your progress.

    --

    Knute Johnson
    email s/nospam/knute/
    Knute Johnson, Jan 17, 2007
    #2

  3. Will do, thank you. That was very helpful; it is exactly what I needed
    to get started.

    Eli

    DyslexicAnaboko, Jan 17, 2007
    #3
  4. DyslexicAnaboko wrote:
    > 1. I want to create X threads, let's just say 10 for argument's sake.
    >
    > 2. I want each thread to get its own assigned URL. Will there be a
    > problem with more than one thread accessing the same method?
    >
    > 3. After downloading the contents of the page I intend to put the
    > strings into a list. Will there be a problem with more than one thread
    > accessing the same object? If so, should I use semaphores?

    Look at the java.util.concurrent package. It has helpful classes for
    almost everything you're asking about.
    <http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/package-summary.html>

    Specifically, ThreadPoolExecutor and BlockingQueue.

    You can submit download requests to the executor, and have them stuff
    the results into the blocking queue. You would have one or more
    separate threads reading from the blocking queue and processing the
    results. If you want all the results to end up in one List, then you
    either need to synchronize on that list, or have only one thread reading
    from the BlockingQueue and writing to the list.
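
    The executor-plus-queue arrangement described above could look roughly
    like the sketch below. It is only an illustration under assumptions:
    getString(String) is a stand-in that does no network I/O, the names
    PoolAndQueueSketch and downloadAll are invented here, and the pool comes
    from Executors.newFixedThreadPool rather than a hand-configured
    ThreadPoolExecutor. A single thread drains the BlockingQueue into the
    list, so the list itself needs no synchronization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

class PoolAndQueueSketch {

    // Hypothetical stand-in for the real download method; no network I/O.
    static String getString(String url) {
        return "<html>contents of " + url + "</html>";
    }

    // Downloads every URL on a fixed-size pool and collects the pages.
    static List<String> downloadAll(List<String> urls, int threadCount) {
        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        // Created without a capacity limit, so add() will not reject a result.
        final BlockingQueue<String> results = new LinkedBlockingQueue<String>();

        for (final String url : urls) {
            executor.execute(new Runnable() {
                public void run() {
                    results.add(getString(url));
                }
            });
        }
        // Accept no new tasks; tasks already submitted still run to completion.
        executor.shutdown();

        // Only the calling thread writes to this list.
        List<String> pages = new ArrayList<String>();
        try {
            for (int i = 0; i < urls.size(); i++) {
                // take() blocks until some worker has delivered a page.
                pages.add(results.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while collecting", e);
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<String>();
        for (int i = 0; i < 25; i++) {
            urls.add("http://example.com/page" + i);
        }
        List<String> pages = downloadAll(urls, 10);
        System.out.println("Collected " + pages.size() + " pages");
    }
}
```

    One caveat with this sketch: the collecting loop expects exactly one
    result per URL. A real download can fail, so the task would need to
    catch the exception and still put some marker into the queue, otherwise
    take() would wait forever.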

    If you are writing a spider (or robot, or whatever), be sure to
    follow good netiquette and respect robots.txt.
    <http://www.robotstxt.org/>
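
    To give an idea of what respecting robots.txt involves, here is a
    deliberately simplified sketch. It assumes the robots.txt text has
    already been downloaded into a String, it only looks at Disallow rules
    in groups addressed to all robots ("User-agent: *"), and it ignores
    Allow lines, wildcards and agent-specific groups. The class and method
    names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class RobotsTxtSketch {

    // Collects the Disallow path prefixes from groups that apply to all robots.
    static List<String> disallowedPrefixes(String robotsTxt) {
        List<String> prefixes = new ArrayList<String>();
        boolean appliesToAll = false;   // are we inside a "User-agent: *" group?
        boolean lastWasAgent = false;   // consecutive User-agent lines share a group

        for (String rawLine : robotsTxt.split("\n")) {
            String line = rawLine;
            int hash = line.indexOf('#');          // strip trailing comments
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                          // blank or malformed line
            }
            String field = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();

            if (field.equalsIgnoreCase("user-agent")) {
                appliesToAll = (lastWasAgent && appliesToAll) || value.equals("*");
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means "nothing is disallowed".
                if (field.equalsIgnoreCase("disallow") && appliesToAll
                        && value.length() > 0) {
                    prefixes.add(value);
                }
            }
        }
        return prefixes;
    }

    // A path is allowed unless it starts with one of the disallowed prefixes.
    static boolean isAllowed(String path, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robotsTxt = "User-agent: *\n"
                + "Disallow: /private/\n"
                + "Disallow: /tmp # scratch space\n";
        List<String> prefixes = disallowedPrefixes(robotsTxt);
        System.out.println(isAllowed("/index.html", prefixes));
        System.out.println(isAllowed("/private/profile", prefixes));
    }
}
```

    A crawler would fetch /robots.txt once per host, keep the parsed
    prefixes, and call isAllowed before queueing each download.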
    Daniel Pitts, Jan 17, 2007
    #4
  5. I never thought of my program as a robot, but I guess it could be
    called that.

    I was also worried about servers thinking that I may be attacking them
    (DoS attacks), which is not my intention at all.
    I will look through the link you provided. It never even crossed my
    mind, so thanks for the heads up.

    I am collecting anonymous information about random people on MySpace,
    and my friend is using the information for statistics. Everything is
    nameless and faceless; we are using people's MySpace IDs only. It is
    really neat stuff. That is why I am trying to speed up the program:
    it is really painful to sit and wait for one page to be
    downloaded at a time, especially when you are waiting on a sample of
    10,000 people or more. There are roughly 149,142,765 accounts.

    I will try working with the java.util.concurrent package as suggested.

    Thank you,

    Eli
    DyslexicAnaboko, Jan 18, 2007
    #5
  6. I wanted to apologize for not posting a follow-up. The semester
    started for me and I couldn't even think about the program after that.
    I did, however, purchase a book on Java threads. Thanks again to
    everyone for their help.
    DyslexicAnaboko, Feb 16, 2007
    #6