Not necessarily related to Python: Web Crawlers

Discussion in 'Python' started by disappearedng@gmail.com, Jul 5, 2008.

  1. Guest

    Hi
    Does anyone here have a good recommendation for an open-source crawler
    that I could get my hands on? It doesn't have to be Python-based. I am
    interested in learning how crawling works. I think Python-based
    crawlers would offer a high degree of flexibility, but at the same time
    I am torn between open-source crawlers in Python and C++, because the
    latter is much more efficient (or so I've heard; I will be crawling on
    very cheap hardware).

    I am definitely open to suggestions.

    Thx
     
    , Jul 5, 2008
    #1

  2. defn noob Guest

    Just crawling is super easy; it's how to index and search that is hard.
    Just start at yahoo.com, scrape out all the links, and then for every
    site, visit every link.
    I wrote a crawler in 15 lines of code, but all it did was visit the
    sites, not index them or anything.
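    Below is a minimal sketch of that "start at a seed page, scrape out the links, visit every link" loop as a breadth-first crawl using only the Python standard library. The names `crawl`, `LinkExtractor` and `fetch_html` are hypothetical, and the `max_pages` cap is an assumption added so the loop terminates. Like the crawler described above, it only visits pages and does no indexing. It also does not check robots.txt or rate-limit requests, which a real crawler should do.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links and drop any #fragment part.
                    absolute, _ = urldefrag(urljoin(self.base_url, value))
                    self.links.append(absolute)


def fetch_html(url):
    """Download a page and decode it as text (no robots.txt handling)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(seed_url, max_pages=50, fetch=fetch_html):
    """Breadth-first crawl from seed_url; return visited URLs in order.

    max_pages is an assumed cap so the crawl terminates; fetch is
    injectable so the logic can be exercised without network access.
    """
    queue = deque([seed_url])  # frontier of URLs still to visit
    seen = {seed_url}          # every URL ever queued, to avoid revisits
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that cannot be fetched
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            # Follow only web links (skips mailto:, javascript:, etc.).
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

    Usage, with the seed from the post: `crawl("https://www.yahoo.com/", max_pages=10)`. The queue makes the crawl breadth-first (pages close to the seed are visited first), and the `seen` set keeps it from looping on pages that link back to each other. The standard library's `urllib.robotparser` module is one way to add robots.txt checks.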

    You could probably write a faster one in C++, but if you are new to
    it, doing it in Python will let you experiment and learn faster.

    Some links:
    http://infolab.stanford.edu/~backrub/google.html
    http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html

    http://www.example-code.com/python/pythonspider.asp
    http://www.example-code.com/python/spider_simpleCrawler.asp
     
    defn noob, Jul 5, 2008
    #2

  3. subeen Guest

    On Jul 5, 2:31 pm, wrote:
    > Hi
    > Does anyone here have a good recommendation for an open source crawler
    > that I could get my hands on? It doesn't have to be python based. I am
    > interested in learning how crawling works. I think python based
    > crawlers will ensure a high degree of flexibility but at the same time
    > I am also torn between looking for open source crawlers in python vs C
    > ++ because the latter is much more efficient(or so I heard. I will be
    > crawling on very cheap hardware.)
    >
    > I am definitely open to suggestions.
    >
    > Thx


    You can check my Python blog; there are some tips and code samples on
    crawlers.
    http://love-python.blogspot.com/

    regards,
    Subeen
    http://love-python.blogspot.com/
     
    subeen, Jul 6, 2008
    #3

Similar Threads
  1. Duncan Smith
    Replies:
    4
    Views:
    334
  2. Tempo
    Replies:
    18
    Views:
    766
    gene tani
    Feb 10, 2006
  3. Wardie
    Replies:
    4
    Views:
    514
    Wardie
    Nov 8, 2006
  4. viza
    Replies:
    17
    Views:
    698
    santosh
    Jul 9, 2008
  5. InertEmployer
    Replies:
    9
    Views:
    1,423
    Jonathan N. Little
    Aug 31, 2011