search question

Discussion in 'Perl Misc' started by hgwoss@gmx.de, Sep 18, 2005.

  1. Guest

    Hi,

    I would like to extract a certain link URL and link title from an HTML
    document, which is stored in a text file.

    It may look like this:

    "A lot of text. <a href="linkurl.html">Link Title</a> Even more Text."

    My question is: What is the most efficient way of doing that?
     
    hgwoss@gmx.de, Sep 18, 2005
    #1

  2. Matija Papec Guest

    wrote:
    >I would like to extract a certain link url and link title from an html
    >document, which is stored in a text file.
    >
    >it may look like this:
    >
    >"A lot of text. <a href="linkurl.html">Link Title</a> Even more Text."
    >
    >My question is: What is the most efficient way of doing that?


    from perldoc,
    perldoc -q extract
    =========
    How do I extract URLs?
    You can easily extract all sorts of URLs from HTML with
    "HTML::SimpleLinkExtor" which handles anchors, images, objects,
    frames, and many other tags that can contain a URL. If you need
    anything more complex, you can create your own subclass of
    "HTML::LinkExtor" or "HTML::Parser". You might even use
    "HTML::SimpleLinkExtor" as an example for something specifically
    suited to your needs.

    You can use URI::Find to extract URLs from an arbitrary text
    document.


    --
    Matija
     
    Matija Papec, Sep 18, 2005
    #2
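A minimal sketch of the FAQ's suggestion, assuming the HTML::Parser distribution (which provides HTML::LinkExtor) is installed from CPAN. The sample string is the one from the question, and the helper name `link_urls` is made up for illustration. Note that HTML::LinkExtor reports URLs only, not the link text.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return the href of every <a> tag in an HTML string.
sub link_urls {
    my ($html) = @_;
    my @urls;

    # The callback receives the lowercased tag name followed by
    # attribute => URL pairs for each link-carrying tag.
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @urls, $attr{href} if $tag eq 'a' && defined $attr{href};
    });

    $parser->parse($html);
    $parser->eof;
    return @urls;
}

my $html = 'A lot of text. <a href="linkurl.html">Link Title</a> Even more Text.';
print "$_\n" for link_urls($html);
```

For the sample string this should print the single URL linkurl.html. Getting the link title as well needs a token-level parser rather than a link extractor.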

  3. William James Guest

    wrote:
    > Hi,
    >
    > I would like to extract a certain link url and link title from an html
    > document, which is stored in a text file.
    >
    > it may look like this:
    >
    > "A lot of text. <a href="linkurl.html">Link Title</a> Even more Text."
    >
    > My question is: What is the most efficient way of doing that?


    text = <<HERE
    A lot of text. <a href="linkurl.html">Link Title</a>
    Even more Text.
    HERE

    if text =~ /<a href="(.*?)">(.*?)<\/a>/m
      printf "%s links to %s.\n", $2, $1
    end
     
    William James, Sep 19, 2005
    #3
  4. Jürgen Exner Guest

    William James wrote:
    > wrote:
    >> Hi,
    >>
    >> I would like to extract a certain link url and link title from an
    >> html document, which is stored in a text file.
    >>
    >> it may look like this:
    >>
    >> "A lot of text. <a href="linkurl.html">Link Title</a> Even more
    >> Text."
    >>
    >> My question is: What is the most efficient way of doing that?

    >
    > text = <<HERE
    > A lot of text. <a href="linkurl.html">Link Title</a>
    > Even more Text.
    > HERE
    >
    > if text =~ /<a href="(.*?)">(.*?)<\/a>/m


    Which works for the given example but of course fails for a myriad of other,
    probably legitimate examples. See the FAQ and Google about why using simple
    REs for parsing HTML is not a good idea at all.

    jue
     
    Jürgen Exner, Sep 19, 2005
    #4
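To make the objection concrete, here is a small sketch that ports the posted pattern to Perl and runs it against a few variations of the sample link. The variations and the helper name `naive_link` are invented for illustration; Ruby's /m modifier (dot matches newline) corresponds to /s in Perl.

```perl
use strict;
use warnings;

# Perl port of the posted pattern. Returns (url, text) or an empty list.
sub naive_link {
    my ($html) = @_;
    return $html =~ m{<a href="(.*?)">(.*?)</a>}s ? ($1, $2) : ();
}

my @samples = (
    '<a href="linkurl.html">Link Title</a>',            # the original example
    '<A HREF="linkurl.html">Link Title</A>',            # uppercase tag and attribute
    q{<a href='linkurl.html'>Link Title</a>},           # single-quoted attribute
    '<a class="x" href="linkurl.html">Link Title</a>',  # another attribute first
    qq{<a\nhref="linkurl.html">Link Title</a>},         # newline inside the tag
    '<a href="linkurl.html" class="x">Link Title</a>',  # attribute after href
);

for my $html (@samples) {
    my @hit = naive_link($html);
    (my $shown = $html) =~ s/\n/\\n/g;   # keep each sample on one output line
    printf "%-52s => %s\n", $shown,
        @hit ? "url=[$hit[0]] text=[$hit[1]]" : 'no match';
}
```

Only the first sample is handled correctly. The next four are missed, and the last one matches but captures the following attribute as part of the URL.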
  5. John Bokma Guest

    "Bill Segraves" <> wrote:

    > "William James" <> wrote in message


    > In Perl, the above code has numerous errors, and as such, is
    > undeserving of the implied superlatives you assigned to it. Perhaps
    > you intended to post the code to a different Usenet newsgroup.


    Please ignore the Ruby troll.

    --
    John
    Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Sep 19, 2005
    #5
  6. John Bokma Guest

    "Bill Segraves" <> wrote:

    > "John Bokma" <> wrote in message
    > news:Xns96D68FB6434E8castleamber@130.133.1.4...
    >> "Bill Segraves" <> wrote:
    >>
    >> > "William James" <> wrote in message

    >>
    >> > In Perl, the above code has numerous errors, and as such, is
    >> > undeserving of the implied superlatives you assigned to it. Perhaps
    >> > you intended to post the code to a different Usenet newsgroup.

    >>
    >> Please ignore the Ruby troll.

    >
    > Normally, I do.
    >
    > In this case, however, the Ruby troll neglected to mention his code
    > was written in Ruby, which might have been misleading to the OP,
    > especially re: "jue" Exner's response. My response was intended for
    > the benefit of the OP.


    Ah, ok, I understand, apologies :)

    > For the further benefit of the OP, what could be simpler than the
    > first example given in the documentation for HTML::TokeParser? This
    > "correct" code parses HTML with <A> tags and textual information
    > spread across multiple lines, while the code the Ruby troll posted
    > fails miserably on similarly-mangled HTML.


    Yup, it's a troll. I mean, why is it hanging out in a Perl-related group?
    There must be a Ruby group.

    --
    John
    Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Sep 20, 2005
    #6
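For reference, a sketch along the lines of the HTML::TokeParser approach mentioned above, adapted to return both the URL and the link title. It assumes HTML::TokeParser (part of the HTML::Parser distribution on CPAN) is installed; the helper name `extract_links` and the deliberately mangled sample markup are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return a list of [url, title] pairs for every <a href>.
sub extract_links {
    my ($doc) = @_;

    # Passing a scalar reference makes the parser read from the string.
    my $p = HTML::TokeParser->new(\$doc)
        or die "Cannot create parser";

    my @links;
    while (my $token = $p->get_tag('a')) {
        my $url = $token->[1]{href};      # attribute names are lowercased
        next unless defined $url;         # skip anchors without an href
        my $text = $p->get_trimmed_text('/a');  # text up to </a>, whitespace collapsed
        push @links, [$url, $text];
    }
    return @links;
}

# The tag and the link text are spread over several lines on purpose.
my $html = <<'HTML';
A lot of text. <A
  HREF="linkurl.html">Link
Title</A> Even more Text.
HTML

printf "%s links to %s.\n", $_->[1], $_->[0] for extract_links($html);
```

Because the parser tokenizes the markup rather than pattern-matching it, tag case, attribute order, quoting style, and line breaks inside the tag should not matter here.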