Is it better to use threads or fork in the following case

Discussion in 'Python' started by grocery_stocker, May 3, 2009.

  1. Let's say there is a new zip file with updated information every 30
    minutes on a remote website. Now, I wanna connect to this website
    every 30 minutes, download the file, extract the information, and then
    have the program search the file for certain items.

    Would it be better to use threads to break this up? I'd have one thread
    download the data and then have another actually process the data.
    Or would it be better to use fork?
    grocery_stocker, May 3, 2009
    #1

  2. grocery_stocker wrote:
    > Let's say there is a new zip file with updated information every 30
    > minutes on a remote website. Now, I wanna connect to this website
    > every 30 minutes, download the file, extract the information, and then
    > have the program search the file for certain items.
    >
    > Would it be better to use threads to break this up? I'd have one thread
    > download the data and then have another actually process the data.
    > Or would it be better to use fork?


    Neither. Why do you think you need concurrency at all?

    Diez
    Diez B. Roggisch, May 3, 2009
    #2

  3. On May 3, 1:16 pm, "Diez B. Roggisch" wrote:
    > grocery_stocker wrote:
    >
    > > Let's say there is a new zip file with updated information every 30
    > > minutes on a remote website. Now, I wanna connect to this website
    > > every 30 minutes, download the file, extract the information, and then
    > > have the program search the file for certain items.
    > >
    > > Would it be better to use threads to break this up? I'd have one thread
    > > download the data and then have another actually process the data.
    > > Or would it be better to use fork?
    >
    > Neither. Why do you think you need concurrency at all?


    Okay, here is what was going through my mind. I'm on a 56k dialup modem.
    What happens if it takes me 15 minutes to download the file? Now let's
    say during those 15 minutes, the program needs to parse the data in
    the existing file.
    grocery_stocker, May 3, 2009
    #3
  4. grocery_stocker wrote:
    > On May 3, 1:16 pm, "Diez B. Roggisch" wrote:
    >> grocery_stocker wrote:
    >>
    >>> Let's say there is a new zip file with updated information every 30
    >>> minutes on a remote website. Now, I wanna connect to this website
    >>> every 30 minutes, download the file, extract the information, and then
    >>> have the program search the file for certain items.
    >>> Would it be better to use threads to break this up? I'd have one thread
    >>> download the data and then have another actually process the data.
    >>> Or would it be better to use fork?
    >>
    >> Neither. Why do you think you need concurrency at all?
    >
    > Okay, here is what was going through my mind. I'm on a 56k dialup modem.
    > What happens if it takes me 15 minutes to download the file? Now let's
    > say during those 15 minutes, the program needs to parse the data in
    > the existing file.


    Is this an exercise in asking 20 hypothetical questions?

    Getting concurrency right isn't trivial, so if you absolutely don't
    need it, don't do it.

    Diez
    Diez B. Roggisch, May 3, 2009
    #4
  5. CTO (Guest)

    Probably better just to check HEAD (an HTTP HEAD request) and see if
    the file has been updated within the time you're looking at before any
    unpack. Even on a 56k that's going to be pretty fast, and you don't
    risk unpacking an old file while a new version is on the way.

    If you still want to be able to unpack the old file if there's an
    update, then you're probably right about needing to run it
    concurrently, and personally I'd just fork it for ease of use. It
    doesn't sound like you're trying to run 100,000 of these at the same
    time, and you're saving the file anyway.
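
    A minimal sketch of the HEAD check described above, in Python 3. The
    function names (is_newer, remote_last_modified) are made up for
    illustration, and it assumes the server sends a Last-Modified header,
    which nothing in this thread confirms.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_newer(last_modified_header, last_seen):
    """Return True if the Last-Modified value is later than last_seen.

    last_seen is a timezone-aware datetime, or None if nothing has been
    downloaded yet. A missing header is treated as "possibly newer".
    """
    if last_seen is None or not last_modified_header:
        return True
    modified = parsedate_to_datetime(last_modified_header)
    if modified.tzinfo is None:
        # No usable zone in the header: assume UTC (an assumption).
        modified = modified.replace(tzinfo=timezone.utc)
    return modified > last_seen


def remote_last_modified(url, timeout=30):
    """Send a HEAD request and return the raw Last-Modified header, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Last-Modified")
```

    The comparison is kept separate from the network call so it can be
    checked without a connection; only remote_last_modified touches the
    network, and it transfers headers only.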

    Geremy Condra
    CTO, May 3, 2009
    #5
  6. Paul Hankin (Guest)

    On May 3, 10:29 pm, grocery_stocker wrote:
    > On May 3, 1:16 pm, "Diez B. Roggisch" wrote:
    >
    > > grocery_stocker wrote:
    > >
    > > > Let's say there is a new zip file with updated information every 30
    > > > minutes on a remote website. Now, I wanna connect to this website
    > > > every 30 minutes, download the file, extract the information, and then
    > > > have the program search the file for certain items.
    > > >
    > > > Would it be better to use threads to break this up? I'd have one thread
    > > > download the data and then have another actually process the data.
    > > > Or would it be better to use fork?
    > >
    > > Neither. Why do you think you need concurrency at all?
    >
    > Okay, here is what was going through my mind. I'm on a 56k dialup modem.
    > What happens if it takes me 15 minutes to download the file? Now let's
    > say during those 15 minutes, the program needs to parse the data in
    > the existing file.


    If your modem is going at full speed for those 15 minutes, you'll have
    around 6.3MB of data. Even after decompressing, and unless the data is
    in some format that is quite difficult to parse, it'll take seconds to
    process.
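
    The arithmetic behind that figure, as a small Python sketch. The
    helper name max_download_bytes is hypothetical, and the calculation
    ignores protocol overhead and line noise.

```python
def max_download_bytes(bits_per_second, minutes):
    """Upper bound on bytes transferred, ignoring protocol overhead."""
    return bits_per_second * minutes * 60 // 8


# 56,000 bits/s for 15 minutes
print(max_download_bytes(56_000, 15))  # 6300000 bytes, about 6.3 MB
```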

    --
    Paul Hankin
    Paul Hankin, May 3, 2009
    #6
  7. On May 3, 1:40 pm, "Diez B. Roggisch" wrote:
    > grocery_stocker wrote:
    >
    > > On May 3, 1:16 pm, "Diez B. Roggisch" wrote:
    > >> grocery_stocker wrote:
    > >>
    > >>> Let's say there is a new zip file with updated information every 30
    > >>> minutes on a remote website. Now, I wanna connect to this website
    > >>> every 30 minutes, download the file, extract the information, and then
    > >>> have the program search the file for certain items.
    > >>> Would it be better to use threads to break this up? I'd have one thread
    > >>> download the data and then have another actually process the data.
    > >>> Or would it be better to use fork?
    > >> Neither. Why do you think you need concurrency at all?
    > >
    > > Okay, here is what was going through my mind. I'm on a 56k dialup modem.
    > > What happens if it takes me 15 minutes to download the file? Now let's
    > > say during those 15 minutes, the program needs to parse the data in
    > > the existing file.
    >
    > Is this an exercise in asking 20 hypothetical questions?


    No. This is the prelude to me writing a real-life Python program.
    grocery_stocker, May 3, 2009
    #7
  8. On Sun, 03 May 2009 17:45:36 -0300, Paul Hankin wrote:
    > On May 3, 10:29 pm, grocery_stocker wrote:
    >> On May 3, 1:16 pm, "Diez B. Roggisch" wrote:
    >> > grocery_stocker wrote:
    >> > > Would it be better to use threads to break this up? I'd have one thread
    >> > > download the data and then have another actually process the data.
    >> > > Or would it be better to use fork?
    >> >
    >> > Neither. Why do you think you need concurrency at all?
    >>
    >> Okay, here is what was going through my mind. I'm on a 56k dialup modem.
    >> What happens if it takes me 15 minutes to download the file? Now let's
    >> say during those 15 minutes, the program needs to parse the data in
    >> the existing file.
    >
    > If your modem is going at full speed for those 15 minutes, you'll have
    > around 6.3MB of data. Even after decompressing, and unless the data is
    > in some format that is quite difficult to parse, it'll take seconds to
    > process.


    In addition, the zip file format stores the directory at the end of the
    file. So you can't process it until it's completely downloaded.
    Concurrency doesn't help here.
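
    A short Python sketch of that point, using the standard zipfile
    module, which finds members through the directory at the end of the
    archive. The in-memory archive and the helper name can_open_zip are
    stand-ins for illustration, not anything from the OP's setup.

```python
import io
import zipfile


def can_open_zip(data):
    """Return True if data (bytes) can be opened as a zip archive."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            archive.namelist()
        return True
    except zipfile.BadZipFile:
        return False


# Build a small archive in memory to stand in for the downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
    archive.writestr("items.txt", "apples\nbread\nmilk\n" * 100)
complete = buffer.getvalue()

print(can_open_zip(complete))                        # True: whole file present
print(can_open_zip(complete[: len(complete) // 2]))  # False: directory missing
```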

    --
    Gabriel Genellina
    Gabriel Genellina, May 3, 2009
    #8
  9. CTO (Guest)

    > In addition, the zip file format stores the directory at the end of the  
    > file. So you can't process it until it's completely downloaded.  
    > Concurrency doesn't help here.


    Don't think that's relevant, if I'm understanding the OP correctly.
    Let's say you've downloaded the file once and you're doing whatever
    the app does with it. Now, while that's happening, the half-hour
    time limit comes up. Now you want to start another download, but
    you also want to continue to work with the old version. Voila,
    concurrency.
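
    One way this could look with fork, as a rough Unix-only sketch in
    Python. The helpers run_in_child and wait_for_child are invented for
    this example, and fetch_update / process_file in the usage comment
    are placeholders for whatever the real program does.

```python
import os


def run_in_child(task, *args):
    """Fork and run task(*args) in the child; return the child's pid.

    Unix only: os.fork() is not available on Windows.
    """
    pid = os.fork()
    if pid == 0:
        # Child: do the work, then exit without returning into the
        # parent's code path.
        status = 0
        try:
            task(*args)
        except BaseException:
            status = 1
        finally:
            os._exit(status)
    return pid  # parent: carries on immediately


def wait_for_child(pid):
    """Block until the child exits and return its exit code."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


# Hypothetical usage (fetch_update and process_file are placeholders):
#   pid = run_in_child(fetch_update, "new.zip")
#   process_file("old.zip")      # parent keeps working on the old version
#   wait_for_child(pid)          # then pick up the new file
```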
    CTO, May 3, 2009
    #9
  10. On Sun, 3 May 2009 13:59:11 -0700 (PDT), grocery_stocker
    declaimed the following in gmane.comp.python.general:

    > No. This is the prelude to me writing a real-life Python program.


    Lots of "real life python programs" don't need threading or other
    spawned processes...

    Your 56K dial-up is probably only running around 44kbps (no "56K"
    modem, in the US, ever reaches that speed; FCC power limits on phone
    lines cap the bit-rate at around 53kbps, and since the actual speed is
    affected by the cleanliness of the signal on the lines, it rarely hits
    even 50kbps). Assuming 44,000bps and no handshake/protocol overhead,
    that comes to 5,500 bytes/sec => 330,000 bytes/min => 4,950,000 bytes
    in 15 minutes... call it 5MB. What type of processing are you planning
    that would take any fairly recent computer 15 minutes to handle 5MB of
    data? 5MB is about 6 minutes of MP3 audio, or 3-4 3.5MP JPEGs.

    Presuming your processing really does have the risk of running over
    into the next download interval, I'd suggest at most two threads
    (pseudo-code):

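    import queue        # this module is named Queue in Python 2
    import threading
    import time

    INTERVAL = 30 * 60  # seconds between download starts
    worklist = queue.Queue()

    def downloader():
        while True:
            startTime = time.time()
            # timestamped name, so each download gets its own file
            filename = "%s%d" % (BASEFILENAME, startTime)
            doDownload(filename)
            worklist.put(filename)
            # compute next download time taking into account elapsed time;
            # clamp at zero because time.sleep() rejects negative values
            time.sleep(max(0, startTime + INTERVAL - time.time()))

    def processor():
        while True:
            filename = worklist.get()   # blocks until a file is queued
            doFileProcessing(filename)

    # BASEFILENAME, doDownload() and doFileProcessing() are placeholders
    # that the real program has to supply.
    threading.Thread(target=downloader).start()
    threading.Thread(target=processor).start()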


    This ensures that downloads start every 30 minutes regardless of the
    processing duration (unless a download runs over 30 minutes, in which
    case the computed sleep is negative and has to be clamped to zero,
    because time.sleep() rejects negative values in Python 3). It also
    ensures that the files are processed IN ORDER OF DOWNLOAD with NO
    OVERLAPS.

    Threading is probably suitable here, since the downloader spends its
    time blocked on network I/O or on a sleep call, letting the processor
    run at full speed; and if the processor is fast, it will block waiting
    for the next file to be available, meaning the downloader gets full
    CPU usage.

    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
    Dennis Lee Bieber, May 3, 2009
    #10
  11. CTO wrote:

    >> In addition, the zip file format stores the directory at the end of the
    >> file. So you can't process it until it's completely downloaded.
    >> Concurrency doesn't help here.

    >
    > Don't think that's relevant, if I'm understanding the OP correctly.
    > Let's say you've downloaded the file once and you're doing whatever
    > the app does with it. Now, while that's happening, the half-hour
    > time limit comes up. Now you want to start another download, but
    > you also want to continue to work with the old version. Voila,
    > concurrency.


    Which brings us back to the "20 questions" part of my earlier post. It
    could be, but it could also be that processing takes seconds. Or it takes
    so long that even concurrency won't help. Who knows?

    Diez
    Diez B. Roggisch, May 4, 2009
    #11
  12. CTO (Guest)

    > Which brings us back to the "20 questions" part of my earlier post. It
    > could be, but it could also be that processing takes seconds. Or it takes
    > so long that even concurrency won't help. Who knows?


    Probably the OP ;)

    Geremy Condra
    CTO, May 4, 2009
    #12
  13. JanC (Guest)

    Gabriel Genellina wrote:

    > In addition, the zip file format stores the directory at the end of the
    > file. So you can't process it until it's completely downloaded.


    Well, you *can* download the directory part first (if the HTTP server
    supports range requests), and if you only need some files, you could
    then download only those files out of the .zip, saving a lot of
    download time...
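
    A hedged Python sketch of the first step: fetch only the tail of the
    archive with an HTTP Range header, then read the end-of-central-
    directory record to learn where the directory sits. The function names
    are invented for illustration, fetch_tail assumes a server that honours
    Range, and zip64 archives are not handled.

```python
import io
import struct
import urllib.request
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FORMAT = "<4sHHHHIIH"    # fixed 22-byte part of the record
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)
MAX_TAIL = EOCD_SIZE + 65535  # record plus the longest possible comment


def fetch_tail(url, length=MAX_TAIL):
    """Ask the server for only the last `length` bytes of the file.

    A server that ignores Range sends the whole file, which still works.
    """
    request = urllib.request.Request(
        url, headers={"Range": "bytes=-%d" % length})
    with urllib.request.urlopen(request) as response:
        return response.read()


def locate_central_directory(tail):
    """Find the end-of-central-directory record in the tail of a zip.

    Returns (offset, size, entry_count) of the central directory, with
    offset counted from the start of the archive, or None if not found.
    """
    position = tail.rfind(EOCD_SIGNATURE)
    if position < 0 or position + EOCD_SIZE > len(tail):
        return None
    fields = struct.unpack_from(EOCD_FORMAT, tail, position)
    _, _, _, _, entry_count, size, offset, _ = fields
    return offset, size, entry_count


# Offline demonstration with an in-memory archive, not a real download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.txt", "first file")
    archive.writestr("b.txt", "second file")
data = buffer.getvalue()
offset, size, count = locate_central_directory(data[-1024:])
print(count, offset + size + EOCD_SIZE == len(data))  # 2 True
```

    With offset and size known, a second range request for exactly those
    bytes would retrieve the directory itself, and each entry there gives
    the position of one member's data.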


    --
    JanC
    JanC, May 8, 2009
    #13