extract character strings from displayed web page.

Discussion in 'C Programming' started by A Causal, Oct 15, 2003.

  1. A Causal

    A Causal Guest

    I'm an experienced C programmer, but I have never worked with any sort
    of internet programming. I would like to write a program to search for
    certain character strings in a currently displayed web page, and then
    get the string that immediately follows the one that I searched for. It
    seems like an easy thing to do; after all, the stuff that I want is
    staring me right in the face, but I have no idea where that stuff is
    stored or how to access it.


    Thanks

    Ron
     
    A Causal, Oct 15, 2003
    #1

  2. (A Causal) wrote:

    >I'm an experienced C programmer, but I have never worked with any sort
    >of internet programming. I would like to write a program to search for
    >certain character strings in a currently displayed web page, and then
    >get the string that immediately follows the one that I searched for. It
    >seems like an easy thing to do; after all, the stuff that I want is
    >staring me right in the face, but I have no idea where that stuff is
    >stored or how to access it.


    As this is highly OS/application dependent, you should ask this in a
    newsgroup dedicated to that, as comp.lang.c is only about portable
    ISO-C. See http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html


    Regards
    --
    Irrwahn
     
    Irrwahn Grausewitz, Oct 15, 2003
    #2

  3. Greetings.

    A Causal wrote:
    > I'm an experienced C programmer, but I have never worked with any sort
    > of internet programming. I would like to write a program to search for
    > certain character strings in a currently displayed web page, and then
    > get the string that immediately follows the one that I searched for. It
    > seems like an easy thing to do; after all, the stuff that I want is
    > staring me right in the face, but I have no idea where that stuff is
    > stored or how to access it.


    Interfacing with web browsers or with HTTP is not something that is
    built into C, so there is no standard answer to your query. It will depend
    on your particular compiler, operating system, and/or whatever third-party
    libraries you use. If you assume that the user has already saved the HTML
    file to disk, however, then it's just a regular text file which C can
    process.
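
    As a minimal sketch of that "saved to disk" case, assuming the page has
    already been saved as a local file: the helper below reads the whole file
    into memory using only standard C. The name read_whole_file is made up
    for illustration, and the code makes no attempt to interpret the HTML.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a NUL-terminated heap buffer.
 * Returns NULL on failure; the caller must free() the result.
 * read_whole_file is a hypothetical helper name, not a library call. */
char *read_whole_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t n;
    /* Always keep one byte spare for the terminating NUL. */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (cap - len - 1 == 0) {          /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (ferror(fp)) {                      /* read error, not just EOF */
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    buf[len] = '\0';
    return buf;
}
```

    Once the text is in memory, the ordinary <string.h> functions can be used
    on it like on any other string. Note that a page containing embedded NUL
    bytes would appear truncated to those functions.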

    Note that C isn't particularly well-suited for intensive text processing,
    though; unless it's being integrated in a much larger C program, it would
    be better and faster to write the sort of application you describe using
    some regexp-based tool such as sed or perl.

    Regards,
    Tristan

    --
    _
    _V.-o Tristan Miller [en,(fr,de,ia)] >< Space is limited
    / |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-= <> In a haiku, so it's hard
    (7_\\ http://www.nothingisreal.com/ >< To finish what you
     
    Tristan Miller, Oct 15, 2003
    #3
  4. Tristan Miller spoke thus:

    > Note that C isn't particularly well-suited for intensive text processing,
    > though; unless it's being integrated in a much larger C program, it would
    > be better and faster to write the sort of application you describe using
    > some regexp-based tool such as sed or perl.


    Why do you say that? (note that this is an honest question, not a challenge)

    --
    Christopher Benson-Manica | I *should* know what I'm talking about - if I
    ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
     
    Christopher Benson-Manica, Oct 15, 2003
    #4
  5. Greetings.

    In article <bmk98r$atb$>, Christopher Benson-Manica wrote:
    >> Note that C isn't particularly well-suited for intensive text processing,
    >> though; unless it's being integrated in a much larger C program, it would
    >> be better and faster to write the sort of application you describe using
    >> some regexp-based tool such as sed or perl.

    >
    > Why do you say that? (note that this is an honest question, not a
    > challenge)


    It's simply a question of specialization of tools. It's certainly possible
    to drive in a nail using the blunt end of a screwdriver, though it would be
    faster and less accident-prone to use a hammer. Likewise, building
    applications (even small ones) which deal almost exclusively with text
    processing is usually more efficient (with respect to development time and
    ease of debugging, not necessarily execution speed) when using a language
    specifically devoted to that task. A program to do regular expression
    search-and-replacement on multiple files is literally four characters long
    in sed (not counting the filenames and regular expressions themselves); the
    corresponding program in C would necessarily be several lines long, even if
    one used a third-party regexp library. You would need to include the
    regexp and stdio headers, define the main function, declare a file pointer,
    open each file in argv[] for reading (including error checking), loop
    through each line of the file, do the regexp replacement, write out the new
    line, close the file, and finally return from main. Sure, the compiled C
    program might run a hundred times faster than the corresponding interpreted
    sed or perl code, but if it's just a one-off program, you've just wasted
    five minutes to write the C program plus 0.00001 seconds to run it versus
    spending five seconds to write the sed program plus 0.001 seconds to run
    it.
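
    To give a feel for the C side of that comparison, here is a rough sketch
    of the per-line replacement step. It assumes a POSIX system, where
    <regex.h> provides regcomp(), regexec() and regfree(); that header is not
    part of ISO C. The function names replace_line and replace_stream are
    invented for this example, and the main function that opens each file in
    argv[] is left out.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Write `line` to `out`, replacing every non-empty match of `re`
 * with the literal string `repl` (no back-references). */
void replace_line(const regex_t *re, const char *line,
                  const char *repl, FILE *out)
{
    regmatch_t m;
    int flags = 0;

    /* Stop on an empty match so the loop cannot spin forever. */
    while (regexec(re, line, 1, &m, flags) == 0 && m.rm_eo > m.rm_so) {
        fwrite(line, 1, (size_t)m.rm_so, out);   /* text before the match */
        fputs(repl, out);
        line += m.rm_eo;                         /* continue after the match */
        flags = REG_NOTBOL;                      /* no longer at line start */
    }
    fputs(line, out);                            /* whatever is left over */
}

/* Copy `in` to `out` line by line, applying replace_line to each line.
 * Lines longer than the buffer are handled in pieces (a simplification). */
void replace_stream(const regex_t *re, const char *repl,
                    FILE *in, FILE *out)
{
    char buf[4096];

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);
        int had_newline = (len > 0 && buf[len - 1] == '\n');

        if (had_newline)
            buf[len - 1] = '\0';   /* lets $ match at the end of the line */
        replace_line(re, buf, repl, out);
        if (had_newline)
            fputc('\n', out);
    }
}
```

    A complete program would still need to compile the pattern with
    regcomp(), fopen() each file with error checking, call replace_stream()
    with stdout as the output, and finish with fclose() and regfree(), which
    is where the extra lines of C come from.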

    Regards,
    Tristan

     
    Tristan Miller, Oct 16, 2003
    #5
  6. Greetings.

    In article <bmm2nc$grp$>, Christopher Benson-Manica wrote:
    > You've clearly never hit your thumb with a hammer ;) Seriously, on
    > reading your original post, I thought you were speaking of execution
    > efficiency, which you were not. No complaints from me in that case...
    > Would you say, then, that C is pretty good for text processing as far as
    > execution efficiency is concerned?


    Optimization for speed and memory use is compiler-dependent, but generally
    speaking, yes, a well-written algorithm in compiled C will be faster at
    running text processing applications than the same application executed in
    interpreted sed. With C, you're simply "closer to the hardware", plus
    there's no need to load in a potentially huge interpreter every time you
    want to run your application.

    In this day and age, however, you aren't going to gain that much in text
    processing even if you do use C. The bottleneck in the application is more
    likely to be the inherent sloth of I/O rather than inefficient code. I
    work in natural-language processing, and I can attest that even those
    researchers who routinely process text corpora ranging into the gigabytes
    don't flinch at using high-level text- or logic-oriented languages like
    Perl or Prolog to munge the data. We tend to use C more for plain old
    number-crunching, as with the large co-occurrence matrices the
    aforementioned mungers may produce.

    Regards,
    Tristan

     
    Tristan Miller, Oct 16, 2003
    #6
  7. Derk Gwen

    Derk Gwen Guest

    (A Causal) wrote:
    # I'm an experienced C programmer, but I have never worked with any sort
    # of internet programming. I would like to write a program to search for
    # certain character strings in a currently displayed web page, and then
    # get the string that immediately follows the one that I searched for. It
    # seems like an easy thing to do; after all, the stuff that I want is
    # staring me right in the face, but I have no idea where that stuff is
    # stored or how to access it.

    You'll need some library to open the socket and fetch the page; this is not
    part of standard C. You'll also need to decide exactly what you mean by 'string'
    and 'after' if you are fetching (as normal) an HTML page; you can also find
    libraries to parse HTML if you need that. If you don't need to parse the HTML,
    you can just read the socket stream with stdio and use a state table,
    strstr(), or other such techniques to scan the input.
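
    A small sketch of the strstr() approach, assuming the page text is already
    in memory as a NUL-terminated string, and assuming 'string' means the next
    run of non-whitespace characters after the search key (only one of several
    possible definitions). The function name token_after is made up for this
    example; it works on the raw text and does not strip HTML tags.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Find `key` in `text` and copy the whitespace-delimited token that
 * follows it into `out` (at most outsize-1 characters, NUL-terminated).
 * Returns 1 if the key was found, 0 otherwise.
 * token_after is a hypothetical helper name. */
int token_after(const char *text, const char *key, char *out, size_t outsize)
{
    if (outsize == 0)
        return 0;

    const char *p = strstr(text, key);    /* first occurrence of the key */
    if (p == NULL)
        return 0;

    p += strlen(key);
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;                              /* skip whitespace after the key */

    size_t n = 0;
    while (*p != '\0' && !isspace((unsigned char)*p) && n + 1 < outsize)
        out[n++] = *p++;                  /* copy until whitespace or full */
    out[n] = '\0';
    return 1;
}
```

    For example, searching the text "Price: 42 USD" for the key "Price:"
    leaves "42" in the output buffer. If the key is the last thing in the
    text, the function still returns 1 but the output is an empty string.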

    You can also use something other than C. Scripting languages can do this kind
    of stuff in half a dozen lines.

    --
    Derk Gwen http://derkgwen.250free.com/html/index.html
    What kind of convenience store do you run here?
     
    Derk Gwen, Oct 16, 2003
    #7