Web crawler in Python or C?

Discussion in 'C Programming' started by abhinav, Feb 16, 2006.

  1. abhinav

    abhinav Guest

    Hi guys. I have to implement a topical crawler as part of my
    project. Which language should I implement it in,
    C or Python? Python has a fast development cycle, but my concern is
    also speed. I want to strike a balance between development speed and
    crawler speed. Since Python is an interpreted language, it is rather
    slow. The crawler, which will be working on a huge set of pages, should be
    as fast as possible. One possible approach would be to implement it
    partly in C and partly in Python so that I can have the best of both
    worlds. But I don't know how to go about it. Can anyone guide me on
    which parts I should implement in C and which in Python?
     
    abhinav, Feb 16, 2006

  2. Paul Hsieh

    Paul Hsieh Guest

    abhinav wrote:
    > Hi guys. I have to implement a topical crawler as part of my
    > project. Which language should I implement it in,
    > C or Python? Python has a fast development cycle, but my concern is
    > also speed. I want to strike a balance between development speed and
    > crawler speed.


    Web crawling is an inherently network-limited activity. The way to
    speed up crawling is through parallel downloading; the performance of
    the language is not going to have a relevant effect. Python's threads
    cannot run Python code in parallel (the interpreter lock serializes
    them), though they still overlap well while blocked on network I/O,
    and Python also supports weak coroutines. (Of course, C does not
    support any kind of multithreading, except through platform-specific
    extensions -- but these extensions are widespread.)
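    As a rough illustration of parallel downloading, here is a minimal sketch using a thread pool from the modern Python 3 standard library (`concurrent.futures`, which postdates this 2006 thread). It assumes CPython, where a thread blocked on a socket releases the global interpreter lock, so downloads overlap even though Python bytecode does not run in parallel. `fetch_all` and `fetch_url` are hypothetical helper names, and the default of 16 workers is an arbitrary assumption to be tuned against your bandwidth.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple
import urllib.request


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Download one URL. The calling thread blocks on the network here,
    and CPython releases the GIL while it waits, so other downloads run."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls: Iterable[str],
              fetch: Callable[[str], bytes] = fetch_url,
              max_workers: int = 16
              ) -> Tuple[Dict[str, bytes], Dict[str, Exception]]:
    """Download URLs concurrently with a thread pool (hypothetical helper).

    Returns (bodies, errors): successful downloads keyed by URL, and the
    exception raised for each URL that failed.
    """
    bodies: Dict[str, bytes] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every download up front; at most max_workers run at once.
        futures = {pool.submit(fetch, url): url for url in urls}
        # Collect results in completion order, not submission order.
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                bodies[url] = fut.result()
            except Exception as exc:  # a real crawler would log and maybe retry
                errors[url] = exc
    return bodies, errors
```

    Calling `fetch_all(urls)` returns `(bodies, errors)`. The `fetch` parameter is there so a different fetcher, or a test stub, can be swapped in without touching the pool logic.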

    For the problem of parsing and handling data structures for this
    activity, however, Python is *FAR* superior to C in terms of
    development speed.
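    To illustrate the parsing side, a minimal sketch of link extraction using only the Python 3 standard library (`html.parser` and `urllib.parse`). It assumes reasonably well-formed HTML and only looks at `<a href>` links; `LinkExtractor` and `extract_links` are hypothetical names. Doing the same in C would require a hand-written tokenizer or a third-party HTML library.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, drop #fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html: str, base_url: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(parser.links))
```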

    > [...] Since Python is an interpreted language, it is rather
    > slow. The crawler, which will be working on a huge set of pages, should be
    > as fast as possible. One possible approach would be to implement it
    > partly in C and partly in Python so that I can have the best of both
    > worlds. But I don't know how to go about it. Can anyone guide me on
    > which parts I should implement in C and which in Python?


    Actually, I have done it this way myself in the past (before Python
    had weak coroutines). I wrote a command-line tool in C that pulled
    down a collection of URLs listed in a control file (the URLs were
    downloaded in a multithreaded manner), and then I drove this tool from
    a Python program. Asymptotically, this pegs my download bandwidth for
    the majority of the runtime, which puts it basically within striking
    distance of the theoretical optimum.
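    The split described above can be sketched from the Python side as follows. This is a sketch under assumptions, not the original code: it assumes the C tool takes the path of a control file listing one URL per line as its last argument and does the multithreaded downloading itself. `run_downloader` and the tool name `./fetchurls` are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Iterable, Sequence


def run_downloader(urls: Iterable[str], command: Sequence[str],
                   timeout: float = 3600.0) -> subprocess.CompletedProcess:
    """Write URLs to a control file (one per line) and run an external
    downloader on it (hypothetical helper).

    `command` is the tool's argv; the control-file path is appended as the
    last argument, which is an assumed convention for the C tool.
    """
    with tempfile.TemporaryDirectory() as tmp:
        control = Path(tmp) / "urls.txt"
        control.write_text("".join(u + "\n" for u in urls), encoding="utf-8")
        # check=True raises CalledProcessError if the tool exits non-zero;
        # stdout/stderr are captured as text for the Python side to inspect.
        return subprocess.run([*command, str(control)], capture_output=True,
                              text=True, timeout=timeout, check=True)
```

    For example, `run_downloader(batch, ["./fetchurls"])` would hand one batch of URLs to the (hypothetical) C downloader; the Python program would then parse whatever the tool wrote out, extract new links, and build the next batch. All the network concurrency lives in C, while Python keeps the parsing and bookkeeping.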

    The problem is that you've picked completely the wrong newsgroup to ask
    this question. Unfortunately, there is no clue to this fact from the
    name of this newsgroup. This newsgroup actually discusses only the
    ANSI/ISO C standard as it exists, and none of the platform-specific
    extensions (including sockets and multithreading). Nor is the
    discussion of the development of real applications considered
    on-topic in this newsgroup. Neither is performance considered on-topic
    -- by the standard, apparently you can't know even the *relative* speed
    of anything in C. comp.programming would probably have been a better
    place to post this.

    --
    Paul Hsieh
    http://www.pobox.com/~qed/
    http://bstring.sf.net/
     
    Paul Hsieh, Feb 16, 2006
