C Web crawler code

Discussion in 'C Programming' started by Sanjay Patra, Nov 18, 2004.

  1. Sanjay Patra

    Sanjay Patra Guest

    Hi All,

    I am looking for simple C/C++ web crawler code.

    It should be very simple, with minimal functionality.
    I am particularly interested in the code to grab the content of a URL
    and the code to search this content for other URLs.

    Thanks.
    Sanjay Patra, Nov 18, 2004
    #1

  2. On 17 Nov 2004 16:01:44 -0800, (Sanjay Patra) wrote:

    >Hi All,
    >
    >I am looking for a simple C/ C++ web crawler code.
    >
    >It should be very simple with minimal functionality.


    You will have to consult the newsgroup of the platform you wish the web
    crawler to run on.

    For example:

    news:comp.os.msdos.programmer             DOS, BIOS, Memory Models,
                                              interrupts, screen handling,
                                              hardware
    news:comp.os.ms-windows.programmer.misc   MS/Windows: Mice, DLLs, hardware
    news:comp.os.ms-windows.programmer.win32  MS 32-bit API
    news:comp.os.os2.programmer.misc          OS/2 Programming
    news:comp.sys.mac.programmer.misc         Macintosh Programming
    news:comp.unix.programmer                 General Unix: processes, pipes,
                                              POSIX, curses, sockets
    news:comp.unix.[vendor]                   Various Unix vendors
    news:comp.os.linux.development.apps       Linux application programming


    >I am particularly interested in the code to grab the content of a url
    >and the code to search this content for other urls.


    Fetching the content of a URL is not possible in standard C alone and
    needs a third-party library (or a set of wrapper functions), but it is
    on-topic to discuss how to extract URLs from the retrieved content.

    The simplest way to detect a URL is to call char *strstr( const char
    *string, const char *strCharSet ), with the first argument being a line of
    the file being scanned and the second the literal "http://". If you want a
    case-insensitive search, you will have to roll your own function: standard
    C has no case-insensitive strstr, and the non-standard ones that some
    implementations provide (such as strcasestr) are not portable.
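
    A minimal sketch of the search described above, using only standard C.
    The helper name find_nocase is hypothetical (it is not a library
    function); it behaves like strstr but compares characters with tolower,
    so it also matches "HTTP://" or "Http://".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of any standard library): a
   case-insensitive strstr. Returns a pointer to the first occurrence of
   needle in haystack, ignoring case, or NULL if there is none. */
static const char *find_nocase(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);

    if (n == 0)
        return haystack;            /* same convention as strstr */

    for (; *haystack != '\0'; haystack++) {
        size_t i = 0;

        /* Compare needle against the text starting at this position;
           stop early at a mismatch or at the end of the haystack. */
        while (i < n && haystack[i] != '\0' &&
               tolower((unsigned char)haystack[i]) ==
               tolower((unsigned char)needle[i]))
            i++;

        if (i == n)
            return haystack;        /* all n characters matched */
    }
    return NULL;
}
```

    For a line held in a buffer named line, strstr(line, "http://") gives the
    case-sensitive match and find_nocase(line, "http://") the case-insensitive
    one. To find further URLs on the same line, search again starting one
    character past the previous match.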

    I can't tell you how to detect the end-point of the URL - you will have to
    take a look at the RFCs for that.
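
    Short of implementing the RFC grammar, a rough heuristic is to stop at the
    first character that usually delimits a URL in HTML or plain text. The
    sketch below assumes that whitespace, quotes, and angle brackets end the
    URL; that set is an assumption, not the RFC rule, and the helper names
    copy_url and print_urls are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Assumed terminators: whitespace, quotes, and angle brackets. This is a
   heuristic for HTML-like text, not the URL grammar from the RFCs; for
   instance, an apostrophe can legally appear inside a URL. */
#define URL_STOP_CHARS " \t\r\n\"'<>"

/* Hypothetical helper: copy the URL that begins at start into out, which
   has room for outsize bytes including the terminating '\0'. Returns the
   length of the URL in the source text, even if the copy was truncated. */
static size_t copy_url(const char *start, char *out, size_t outsize)
{
    size_t len = strcspn(start, URL_STOP_CHARS);

    if (outsize > 0) {
        size_t n = len < outsize - 1 ? len : outsize - 1;
        memcpy(out, start, n);
        out[n] = '\0';
    }
    return len;
}

/* Hypothetical helper: print every "http://" URL found on one line,
   one per output line, and return how many were found. */
static int print_urls(const char *line)
{
    char url[1024];
    int count = 0;
    const char *p = line;

    while ((p = strstr(p, "http://")) != NULL) {
        size_t len = copy_url(p, url, sizeof url);
        puts(url);
        count++;
        p += len;   /* len is at least 7, so the loop always advances */
    }
    return count;
}
```

    Calling print_urls on each line read with fgets would list the links in a
    downloaded page. URLs longer than the 1024-byte buffer are truncated in
    the output but still skipped over correctly.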
    Raymond Martineau, Nov 18, 2004
    #2
