Problem with a regular expression

Discussion in 'Ruby' started by charles.nadeau@gmail.com, Oct 13, 2006.

  1. Guest

    I have the following code snippet:

    require 'net/http'
    begin
    hdoc =
    Net::HTTP.get(URI.parse('http://finance.yahoo.com/lookup?s=Dupont&t=S&m=US'))

    re = /<TD>(.*)</TD>/
    if hdoc =~ re
    print "#{$&}\n"
    else
    print "Nothing\n"
    end
    end

    The regular expression is never matched when I use the code as shown
    above (the expression for re is just a simple one for my testing).
    However, if I replace the variable name hdoc by a string like
    "<TD>Test</TD>Test1", the regular expression is matched. The type of
    hdoc is String. What is wrong with the snippet above? I even tried to
    replace hdoc by hdoc.to_s and it still doesn't work.
    Thanks for your help!

    Charles
    ------
    http://radio.weblogs.com/0111823/
    http://charlesnadeau.blogspot.com/
    , Oct 13, 2006
    #1

  2. Hi,

    On Sat, Oct 14, 2006 at 02:55:10AM +0900, wrote:
    > I have the following code snippet:
    >
    > require 'net/http'
    > begin
    > hdoc =
    > Net::HTTP.get(URI.parse('http://finance.yahoo.com/lookup?s=Dupont&t=S&m=US'))
    >
    > re = /<TD>(.*)</TD>/
    > if hdoc =~ re
    > print "#{$&}\n"
    > else
    > print "Nothing\n"
    > end
    > end
    >
    > The regular expression is never matched when I use the code as shown
    > above (the expression for re is just a simple one for my testing).
    > However, if I replace the variable name hdoc by a string like
    > "<TD>Test</TD>Test1", the regular expression is matched. The type of
    > hdoc is String. What is wrong with the snippet above? I even tried to
    > replace hdoc by hdoc.to_s and it still doesn't work.
    > Thanks for your help!


    It looks like there are no upper case "TD" tags in the page that you are
    fetching. Try this instead:

    require 'net/http'

    begin
      hdoc = Net::HTTP.get(URI.parse('http://finance.yahoo.com/lookup?s=Dupont&t=S&m=US'))

      # Escape the slash in the closing tag and ignore case with the "i" switch.
      re = /<TD>(.*)<\/TD>/i
      if hdoc =~ re
        print "#{$&}\n"
      else
        print "Nothing\n"
      end
    end

    Your regular expression was case sensitive, I changed it to be case
    insensitive by adding the "i" switch.
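
A minimal sketch of the difference the "i" switch makes, run against a short sample string instead of the live page. The `html` value is a hypothetical stand-in for the fetched document, not the actual Yahoo markup.

```ruby
# Hypothetical sample markup with lower-case tags, standing in for the page.
html = '<tr><td>Name:</td></tr>'

case_sensitive   = /<TD>(.*)<\/TD>/   # only matches upper-case tags
case_insensitive = /<TD>(.*)<\/TD>/i  # "i" ignores case

# =~ returns the index of the match, or nil when nothing matches.
sensitive_result   = (html =~ case_sensitive)
insensitive_result = (html =~ case_insensitive)

# $1 holds the first capture group of the most recent successful match.
captured = $1

puts sensitive_result.inspect    # nil
puts insensitive_result.inspect  # 4
puts captured                    # Name:
```

The slash in the closing tag is escaped as `\/` because an unescaped `/` would end the regex literal early.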

    >
    > Charles
    > ------
    > http://radio.weblogs.com/0111823/
    > http://charlesnadeau.blogspot.com/
    >
    >


    --
    Aaron Patterson
    http://tenderlovemaking.com/
    Aaron Patterson, Oct 13, 2006
    #2

  3. On Oct 13, 2006, at 1:55 PM, wrote:

    > require 'net/http'
    > begin
    > hdoc =
    > Net::HTTP.get(URI.parse('http://finance.yahoo.com/lookup?
    > s=Dupont&t=S&m=US'))
    >
    > re = /<TD>(.*)</TD>/
    > if hdoc =~ re
    > print "#{$&}\n"
    > else
    > print "Nothing\n"
    > end
    > end


    When I substitute '\/TD' for '/TD' and make the regex case
    insensitive, I get a match. See below:

    <code>
    #!/usr/bin/env ruby -w
    require 'net/http'
    hdoc = Net::HTTP.get(URI.parse('http://finance.yahoo.com/lookup?s=Dupont&t=S&m=US'))
    re = /<TD>(.*)<\/TD>/i ### note changes
    if hdoc =~ re
      puts "#{$&}\n"
    else
      puts "Nothing\n"
    end
    </code>

    <result>
    <td><table border="0" cellpadding="6" width="100%"
    cellspacing="0"><tr><td bgcolor="#556f93"><big><b
    style="color:#ffffff">Symbol Lookup </b></big></td></tr></table></
    td></tr><tr><td></td></tr></table></td></tr><tr><td><table
    cellpadding="0" border="0" cellspacing="0"><tr><td></td></tr></
    table></td></tr><tr><td valign="top"><form><table border="0"
    cellpadding="4" bgcolor="a0b8c8" cellspacing="1"><tr><td
    bgcolor="eeeeee"><table cellpadding="1" width="100%"
    cellspacing="0"><tr><td>Name:</td><td>Type:</td><td>Market:</td><td></
    td></tr><tr><td><input size="30" name="s"></td><td><select
    name="t"><option selected value="S"> Stocks </option><option
    value="E"> ETFs </option><option value="I"> Indices </option><option
    value="M"> Mutual Funds </option><option value="F"> Futures </
    option></select></td><td><select name="m"><option selected
    value="US">U.S. & Canada</option><option value="ALL">World Market</
    option></select></td><td><input value="Look Up" type="submit"></td></
    tr><tr><td valign="bottom" colspan="4"><small><a href="http://
    finance.yahoo.com/exchanges">View supported exchanges</a></small></
    td></tr></table></td></tr></table></form><table><tr><td
    align="left">2 results for <b>'Dupont'</b> (type=<b>Stocks</b>,
    market=<b>U.S. &amp; Canada</b>)</td>
    </result>
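
The result above runs from the first `<td>` on the line to the last `</td>` because `.*` is greedy. If the goal is the content of individual cells, a non-greedy `.*?` combined with `String#scan` is one option. This is only a sketch on a hypothetical sample string, and it does not cope with nested tables or attributes on the `<td>` tag.

```ruby
# Hypothetical sample row, modelled on a fragment of the output above.
html = '<tr><td>Name:</td><td>Type:</td><td>Market:</td></tr>'

# Greedy: .* grabs as much as possible, so the capture spans all three cells.
greedy = html[/<td>(.*)<\/td>/i, 1]

# Non-greedy: .*? stops at the first closing tag; scan collects every match.
cells = html.scan(/<td>(.*?)<\/td>/i).flatten

puts greedy         # Name:</td><td>Type:</td><td>Market:
puts cells.inspect  # ["Name:", "Type:", "Market:"]
```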

    Regards, Morton
    Morton Goldberg, Oct 13, 2006
    #3
  4. > re = /<TD>(.*)</TD>/


    Why not use an HTML parser? Hpricot has been considered sexy recently.

    David Vallner


    David Vallner, Oct 13, 2006
    #4
  5. Guest

    Morton Goldberg wrote:
    > On Oct 13, 2006, at 1:55 PM, wrote:
    >
    > > require 'net/http'
    > > begin
    > > hdoc =
    > > Net::HTTP.get(URI.parse('http://finance.yahoo.com/lookup?
    > > s=Dupont&t=S&m=US'))
    > >
    > > re = /<TD>(.*)</TD>/
    > > if hdoc =~ re
    > > print "#{$&}\n"
    > > else
    > > print "Nothing\n"
    > > end
    > > end

    >
    > When I substitute '\/TD' for '/TD' and make the regex case
    > insensitive, I get a match. See below:
    >
    > <code>
    > #!/usr/bin/env ruby -w
    > require 'net/http'
    > hdoc = Net::HTTP.get(URI.parse('http://finance.yahoo.com/lookup?
    > s=Dupont&t=S&m=US'))
    > re = /<TD>(.*)<\/TD>/i ### note changes
    > if hdoc =~ re
    > puts "#{$&}\n"
    > else
    > puts "Nothing\n"
    > end
    > </code>
    >
    > <result>
    > <td><table border="0" cellpadding="6" width="100%"
    > cellspacing="0"><tr><td bgcolor="#556f93"><big><b
    > style="color:#ffffff">Symbol Lookup </b></big></td></tr></table></
    > td></tr><tr><td></td></tr></table></td></tr><tr><td><table
    > cellpadding="0" border="0" cellspacing="0"><tr><td></td></tr></
    > table></td></tr><tr><td valign="top"><form><table border="0"
    > cellpadding="4" bgcolor="a0b8c8" cellspacing="1"><tr><td
    > bgcolor="eeeeee"><table cellpadding="1" width="100%"
    > cellspacing="0"><tr><td>Name:</td><td>Type:</td><td>Market:</td><td></
    > td></tr><tr><td><input size="30" name="s"></td><td><select
    > name="t"><option selected value="S"> Stocks </option><option
    > value="E"> ETFs </option><option value="I"> Indices </option><option
    > value="M"> Mutual Funds </option><option value="F"> Futures </
    > option></select></td><td><select name="m"><option selected
    > value="US">U.S. & Canada</option><option value="ALL">World Market</
    > option></select></td><td><input value="Look Up" type="submit"></td></
    > tr><tr><td valign="bottom" colspan="4"><small><a href="http://
    > finance.yahoo.com/exchanges">View supported exchanges</a></small></
    > td></tr></table></td></tr></table></form><table><tr><td
    > align="left">2 results for <b>'Dupont'</b> (type=<b>Stocks</b>,
    > market=<b>U.S. &amp; Canada</b>)</td></result>
    >
    > Regards, Morton


    Morton, Aaron,

    You are both right, thanks a lot! I also added "m" at the end of the
    regular expression to match whatever might span two lines.
    Cheers!
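
For reference, a small sketch of what the "m" switch changes in Ruby: without it `.` does not match a newline, with it `.` does. The sample string is hypothetical and only illustrates a cell whose content is split over two lines.

```ruby
# Hypothetical cell whose content contains a line break.
html = "<td>Du Pont\n(E.I.) de Nemours</td>"

single_line = /<td>(.*)<\/td>/i   # "." stops at the newline
multi_line  = /<td>(.*)<\/td>/im  # "m" lets "." match the newline too

# String#[] with a regex and a group index returns that capture, or nil.
no_match = html[single_line, 1]
spanning = html[multi_line, 1]

puts no_match.inspect  # nil
puts spanning.inspect  # "Du Pont\n(E.I.) de Nemours"
```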

    Charles
    ------
    http://radio.weblogs.com/0111823/
    http://charlesnadeau.blogspot.com/
    , Oct 13, 2006
    #5
