web crawler in python or C?

Discussion in 'C Programming' started by abhinav, Feb 16, 2006.

  1. abhinav

    abhinav Guest

    Hi guys. I have to implement a topical crawler as part of my
    project. Which language should I implement it in,
    C or Python? Python has a fast development cycle, but my concern is
    speed as well. I want to strike a balance between development speed and
    crawler speed. Since Python is an interpreted language, it is rather
    slow. The crawler, which will be working on a huge set of pages, should be
    as fast as possible. One possible implementation would be writing it
    partly in C and partly in Python so that I can have the best of both
    worlds, but I don't know how to go about it. Can anyone guide me on
    which parts I should implement in C and which in Python?
     
    abhinav, Feb 16, 2006
    #1

  2. Paul Hsieh

    Paul Hsieh Guest

    abhinav wrote:
    > Hi guys. I have to implement a topical crawler as part of my
    > project. Which language should I implement it in,
    > C or Python? Python has a fast development cycle, but my concern is
    > speed as well. I want to strike a balance between development speed and
    > crawler speed.


    Web crawling is an inherently network-limited activity. The way to
    speed up crawling is through parallel downloading. Language
    performance is not going to have a relevant effect. Python threads
    cannot execute Python code in parallel because of the global
    interpreter lock, but the lock is released during blocking network
    I/O, so threads still overlap downloads; Python also supports weak
    coroutines (generators). (Of course, standard C (C89/C99) does not
    support any kind of multithreading, except through platform-specific
    extensions -- but these extensions are widespread.)
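
    A minimal sketch of parallel downloading in Python, assuming a thread
    pool from the standard library is acceptable. Threads blocked on
    network I/O do not hold the interpreter lock, so the downloads
    overlap. The names fetch, fetch_all, fetch_one, and max_workers are
    made up for this illustration and are not from the original post.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch_one=fetch, max_workers=16):
    """Download many URLs concurrently.

    Returns a dict mapping each URL to its body, or to the exception
    raised while fetching it, so one bad URL does not abort the batch.
    """
    urls = list(urls)

    def safe(url):
        try:
            return fetch_one(url)
        except Exception as exc:  # keep going; record the failure
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL
        # with its own result.
        return dict(zip(urls, pool.map(safe, urls)))
```

    Passing the per-URL function in as fetch_one keeps the pool logic
    separate from the HTTP details, so a different client or a
    rate-limited wrapper can be swapped in without touching fetch_all.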

    For the problem of parsing and handling data structures for this
    activity, however, Python is *FAR* superior to C in terms of
    development speed.

    > [...] Since Python is an interpreted language, it is rather
    > slow. The crawler, which will be working on a huge set of pages, should be
    > as fast as possible. One possible implementation would be writing it
    > partly in C and partly in Python so that I can have the best of both
    > worlds, but I don't know how to go about it. Can anyone guide me on
    > which parts I should implement in C and which in Python?


    I have, in fact, done it this way myself in the past (before
    Python had weak coroutines). I wrote a command-line tool in C that
    pulls down a collection of URLs listed in a control file (the URLs
    are downloaded in a multithreaded manner), then I drove this tool
    from a Python program. Asymptotically, this pegs my download
    bandwidth for the majority of the runtime, putting it within
    striking distance of theoretically optimal.
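
    The driving side of that arrangement could look roughly like the
    sketch below. The control-file format (one URL per line), the
    function name run_downloader, and the way the control-file path is
    passed as the last argument are all assumptions made for this
    illustration; the actual tool and its interface are not described
    in this post. A small Python one-liner stands in for the C
    downloader so the example runs on its own.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_downloader(urls, command):
    """Write `urls` to a control file and run an external tool on it.

    `command` is the tool's argv as a list of strings; the path of the
    control file is appended as the final argument (an assumed
    convention). Returns whatever the tool printed to standard output.
    """
    with tempfile.TemporaryDirectory() as tmp:
        control = Path(tmp) / "urls.txt"
        # Assumed format: one URL per line.
        control.write_text("\n".join(urls) + "\n", encoding="utf-8")
        result = subprocess.run(
            command + [str(control)],
            capture_output=True,
            text=True,
            check=True,  # raise if the tool exits with an error
        )
    return result.stdout


# Stand-in for the multithreaded C tool: it only counts the lines in
# the control file. A real run would pass the tool's own path instead.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; print(sum(1 for _ in open(sys.argv[1])))",
]

if __name__ == "__main__":
    print(run_downloader(["http://a.example/", "http://b.example/"], STAND_IN))
```

    With this split, Python keeps the crawl frontier, parsing, and
    topic filtering, while the external process does nothing but move
    bytes.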

    The problem is that you've picked completely the wrong newsgroup to ask
    this question. Unfortunately, there is no clue to this fact in the
    name of this newsgroup. This is actually a newsgroup that discusses
    only the ANSI/ISO C standard as it exists, and none of the
    platform-specific extensions (including sockets and multithreading). Nor is
    the discussion of the development of real applications considered
    on-topic in this newsgroup. Neither is performance considered on-topic
    -- by the standard, apparently you can't know even the *relative* speed
    of anything in C. comp.programming would probably have been a better
    place to post this.

    --
    Paul Hsieh
    http://www.pobox.com/~qed/
    http://bstring.sf.net/
     
    Paul Hsieh, Feb 16, 2006
    #2
