HTML-Parser / SGML-Parser

Discussion in 'Ruby' started by Zach Dennis, Oct 1, 2003.

  1. Zach Dennis

    Zach Dennis Guest

    Ok, silly question.

    I am writing a script to determine my router's WAN IP address and then to
    email me once an hour in case it changes. Currently I am running a web
    server at work that returns a page with the client's IP address. I need to
    parse out the info on the page so I can extract the IP address of my router
    when my script/program connects.

    I am using the html-parser, sgml-parser and formatter Ruby libraries
    from RAA, and I have made the changes to the regexp regarding image
    width and height. So I'm good there.

    In my test.rb file I say:
    ------------------------------------------------
    h = Net::HTTP.new('www.zachstestip.com' , 80 )
    resp,data = h.get('/index.php' , nil )

    w = DumbWriter.new
    f = AbstractFormatter.new(w)
    p = HTMLParser.new(f)
    p.feed(data)
    p.close
    ------------------------------------------------

    Here comes the silly part. The function "feed" is inherited by html-parser
    from sgml-parser. It passes "data" along to the sgml-parser function
    "goahead". Everything gets printed to stdout or stderr (I don't know which,
    but it makes it to my screen =), yet there is no print, puts, etc. call to
    send it there! I can't for the life of me determine where the feed or
    goahead functions output my parsed results from data. This is damn
    silly of me to ask, I know, but how is it getting to my CLI?

    In the "goahead" function there is a giant while loop. If I place a print or
    puts statement right before the loop and right after the loop, then
    nothing is output (except for my explicit print/puts statements).

    Am I losing it?

    Zach
    Zach Dennis, Oct 1, 2003
    #1

  2. Zach Dennis

    Sean O'Dell Guest

    Why not just wrap your IP address on the page in markers, something like
    >>>>IP<<<<, and then you can regex for it like this:

    match = />>>>(.+)<<<</.match(HTML)

    match[1] => your IP address
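
    As a minimal sketch of that idea: the page body below is made up, and the
    >>>> / <<<< markers are simply whatever the server is set up to emit. The
    pattern is tightened to a dotted quad so that a greedy .+ cannot swallow
    other text if the markers ever appear twice.

```ruby
# Hypothetical page body; a real one would come from the HTTP response.
html = "<html><body>Your address: >>>>203.0.113.7<<<<</body></html>"

# Capture four dot-separated groups of 1-3 digits between the markers.
match = />>>>(\d{1,3}(?:\.\d{1,3}){3})<<<</.match(html)

# match is nil when the markers are missing, so guard before indexing.
ip = match && match[1]
```

    Here ip holds the captured address as a String, or nil if nothing matched.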

    Sean O'Dell
    Sean O'Dell, Oct 1, 2003
    #2

  3. Zach Dennis

    Ara.T.Howard Guest

    On Wed, 1 Oct 2003, Zach Dennis wrote:

    > Ok, silly question.
    >
    > I am writing a script to determine my router's WAN ip address and then to
    > email me once an hour in case it changes. Currently I am running a web
    > server at work that returns a page with the client's ip address. I need to
    > parse out the info on the page so I can extract the ip address of my router
    > when my script/program connects.


    check out dyndns.org - they have scripts for just about every router that does
    this.

    > I am using the html-parser, sgml-parser and formatter ruby libraries
    > provided from raa and I have made the changes to the regexp regarding image
    > width and height. So I'm good there.
    >
    > In my test.rb file I say:
    > ------------------------------------------------
    > h = Net::HTTP.new('www.zachstestip.com' , 80 )
    > resp,data = h.get('/index.php' , nil )
    >
    > w = DumbWriter.new
    > f = AbstractFormatter.new(w)
    > p = HTMLParser.new(f)
    > p.feed(data)
    > p.close
    > ------------------------------------------------


    one thing i might point out here - i myself have spent hours trying to figure
    out weird bugs after naming a variable 'p'. worth a check...
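
    A small illustration of the kind of surprise meant here (the values are
    made up). Once a local variable named p is in scope, Ruby prefers the
    variable over Kernel#p in ambiguous spots. Calls like p.feed(data) and
    p.close in the script above are unambiguous, so this is probably not the
    cause in this case.

```ruby
p = 10                 # a local variable that shares its name with Kernel#p
difference = p -1      # with the local in scope this is subtraction: 10 - 1
same_object = p        # a bare `p` now reads the variable, it does not call the method
```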

    > Here comes the silly part. The function "feed" is inherited by sgml-parser
    > to html-parser. It passes "data" along to the sgml-parser function
    > "goahead". It prints everything to stdout or stderr( i dont know, but it
    > makes it to my screen =), but there is no print, put, etc... etc... call to
    > send it there!!! I cant for the life of me determine where in the feed or
    > goahead functions are outputting my parsed results from data! This is damn
    > silly of me to ask I know, but how is it getting to my CLI?
    >
    > In the "goahead" function there is a giant while loop. If i place a print or
    > puts statement at the right before the loop and right after the loop, then
    > nothing is outputted( except for my explicit print/puts statements).


    you could also try something like this to track the problem:

    alias __p p
    alias __print print
    alias __puts puts

    def p(*args);STDERR.puts(caller.join("\n")); __p(*args);end
    def print(*args);STDERR.print(caller.join("\n")); __print(*args);end
    def puts(*args);STDERR.puts(caller.join("\n")); __puts(*args);end

    i'm not sure you'd need all three but... you get the picture.
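
    One caveat: wrappers like these only intercept p, print and puts called
    without an explicit receiver. Output sent by calling write directly on an
    IO object bypasses them. A rough sketch of tracing that path instead,
    assuming you can get hold of the IO object in question (WriteTracer is a
    hypothetical helper, and a StringIO stands in for the real stream):

```ruby
require 'stringio'

# Hypothetical helper: records who called #write, then passes the call on.
module WriteTracer
  def write(*args)
    (@write_callers ||= []) << caller(1, 1).first
    super
  end

  # Call sites recorded so far, one "file:line" string per write.
  def write_callers
    @write_callers || []
  end
end

buffer = StringIO.new
buffer.extend(WriteTracer)   # only this one object is affected
buffer.write("hello")

recorded = buffer.write_callers
```

    Each entry in recorded names the file and line that called write.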

    -a
    ====================================
    | Ara Howard
    | NOAA Forecast Systems Laboratory
    | Information and Technology Services
    | Data Systems Group
    | R/FST 325 Broadway
    | Boulder, CO 80305-3328
    | Email:
    | Phone: 303-497-7238
    | Fax: 303-497-7259
    | The difference between art and science is that science is what we understand
    | well enough to explain to a computer. Art is everything else.
    | -- Donald Knuth, "Discover"
    | ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
    ====================================
    Ara.T.Howard, Oct 1, 2003
    #3
  4. This doesn't answer your questions about Ruby, but most of what you want
    exists already.

    Look at http://www.dyndns.org. I've been using them for a year or so.
    Every 5 minutes, a Perl daemon (ddclient) on my system wakes up, grabs
    the WAN address from my Linksys box, and if it's changed, updates
    dyndns. I can ssh into my system at home using the name
    'tidal.dyndns.org', even though the address actually belongs to my ISP.
    It works great, and it's free.
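
    The "only act when the address has changed" part is easy to sketch in Ruby
    as well. This is not how ddclient does it. The method name and the state
    file are made up for illustration, and the addresses are documentation
    examples.

```ruby
require 'tmpdir'

# Returns true (and stores the new address) only when current_ip differs
# from the address saved by the previous call.
def address_changed?(state_file, current_ip)
  previous = File.exist?(state_file) ? File.read(state_file).strip : nil
  return false if previous == current_ip

  File.write(state_file, current_ip)
  true
end

# Exercise it against a throwaway directory.
results = Dir.mktmpdir do |dir|
  state = File.join(dir, 'last_ip')
  [address_changed?(state, '203.0.113.7'),   # first sighting
   address_changed?(state, '203.0.113.7'),   # unchanged
   address_changed?(state, '198.51.100.2')]  # changed
end
```

    A cron job or loop would call this once per interval and send the email
    (or update the DNS record) only when it returns true.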

    Steve
    Steven Jenkins, Oct 1, 2003
    #4
  5. Zach Dennis

    Ben Giddings Guest

    Zach Dennis wrote:
    > I am using the html-parser, sgml-parser and formatter ruby libraries
    > provided from raa and I have made the changes to the regexp regarding image
    > width and height. So I'm good there.


    I think the HTML parser might be abandoned (RAA says the last update was
    2001-07-10 13:35:40 GMT).

    You might have better luck using (my) htmltokenizer. It has a really
    simple interface, and it might be more what you need:

    http://raa.ruby-lang.org/list.rhtml?name=htmltokenizer

    If you really want to use the html-parser, sorry, I can't help you. I
    never managed to understand how to work it, which is why I ported the
    htmltokenizer.

    Ben
    Ben Giddings, Oct 1, 2003
    #5
  6. > ------------------------------------------------
    > h = Net::HTTP.new('www.zachstestip.com' , 80 )
    > resp,data = h.get('/index.php' , nil )
    >
    > w = DumbWriter.new
    > f = AbstractFormatter.new(w)
    > p = HTMLParser.new(f)
    > p.feed(data)
    > p.close
    > ------------------------------------------------
    >
    > Here comes the silly part. The function "feed" is inherited by
    > sgml-parser to html-parser. It passes "data" along to the sgml-parser
    > function "goahead". It prints everything to stdout or stderr( i dont
    > know, but it makes it to my screen =), but there is no print, put,
    > etc... etc... call to send it there!!! I cant for the life of me
    > determine where in the feed or goahead functions are outputting my
    > parsed results from data! This is damn silly of me to ask I know, but
    > how is it getting to my CLI?


    Through the DumbWriter. Check its implementation in
    ....\Ruby\lib\ruby\site_ruby\formatter.rb
    that's where the "write" statements live.
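
    To make the flow concrete, here is a stripped-down sketch of the same
    parser -> formatter -> writer chain. The Sketch* classes are hypothetical
    stand-ins, not the classes from the RAA package, and the tag stripping is
    deliberately crude. The point is only that the parser never prints: the
    writer at the end of the chain does, and it defaults to $stdout.

```ruby
require 'stringio'

# Hypothetical writer: the only place where output actually happens.
class SketchWriter
  def initialize(out = $stdout)
    @out = out
  end

  def send_text(text)
    @out.write(text)
  end
end

# Hypothetical formatter: normalises whitespace, then hands text to the writer.
class SketchFormatter
  def initialize(writer)
    @writer = writer
  end

  def add_text(text)
    @writer.send_text(text.gsub(/\s+/, ' '))
  end
end

# Hypothetical parser: splits on tags and forwards the text in between.
class SketchParser
  def initialize(formatter)
    @formatter = formatter
  end

  def feed(html)
    html.split(/<[^>]*>/).each do |chunk|
      @formatter.add_text(chunk) unless chunk.strip.empty?
    end
  end
end

# Pass a StringIO instead of the default $stdout to capture the text.
buffer = StringIO.new
parser = SketchParser.new(SketchFormatter.new(SketchWriter.new(buffer)))
parser.feed("<html><body><p>Your IP is 203.0.113.7</p></body></html>")
captured = buffer.string
```

    With the default argument the same text would land on the terminal, which
    matches what you are seeing.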

    Often times when you want to parse HTML, it is simpler to use
    the (misleadingly named) SGMLParser. Anyway these libraries are
    direct ports of python modules, and can only be understood by
    checking the documentation of the originals.
    See eg: http://www.python.org/doc/1.5.2/lib/module-sgmllib.html
    And usage examples (in python ;-)
    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52281
    http://www.oreilly.com/catalog/pythonsl/chapter/ch05.html#t4

    Cheers,

    Bernard.
    Bernard Delmée, Oct 1, 2003
    #6
