regex problem

Discussion in 'Ruby' started by K. R., Nov 27, 2007.

  1. K. R.

    K. R. Guest

    hi @all

    I would like to scan a string of HTML tags. I need to extract all
    links (a tags) from the string, but I get only the last one. What is
    wrong? See the code below...

    response = '<a href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'
    response.scan(/<a.*href="(.*?)"/) do |line|
      puts line
    end

    thanks for helping!
    --
    Posted via http://www.ruby-forum.com/.
     
    K. R., Nov 27, 2007
    #1

  2. franco

    franco Guest

    The first Kleene star probably needs to be non-greedy, so that it stops
    at the first href it reaches rather than the last:
    /<a.*?href="(.*?)"/
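
    A minimal sketch of the difference, using the string from the original
    post (the variable names greedy and lazy are only for illustration):

    ```ruby
    response = '<a href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'

    # Greedy: `.*` runs to the end of the string and backtracks to the LAST
    # href, so the whole string is consumed by a single match.
    greedy = response.scan(/<a.*href="(.*?)"/).flatten

    # Non-greedy: `.*?` stops at the first href after each `<a`.
    lazy = response.scan(/<a.*?href="(.*?)"/).flatten

    puts greedy.inspect  # ["hello2.html"]
    puts lazy.inspect    # ["hello1.html", "hello2.html"]
    ```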

     
    franco, Nov 27, 2007
    #2

  3. franco

    franco Guest

    On Nov 27, 12:00 pm, Christian von Kleist wrote:
    > On Nov 27, 2007 11:28 AM, K. R. wrote:
    > > response.scan(/<a.*href="(.*?)"/) do |line|
    > >   puts line
    > > end

    But what if href is not the first attribute of the <a> tag?

    > Franco is right. You could fix it by doing "a.*?href". However, I
    > would change "a.*href" to "a\s+href" since you're looking for any
    > amount of whitespace after the "a" and before the "href".
    >
    > response = '<a href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'
    > response.scan(/<a\s+href="(.*?)"/m) do |line|
    >   puts line
    > end
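
    To make the concern concrete, here is a rough sketch with a made-up
    input in which the first link carries a class attribute before href.
    The [^>]*? variant is just one possible workaround, not something
    proposed in the thread:

    ```ruby
    # Hypothetical input: the first link has another attribute before href.
    html = '<a class="nav" href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'

    # `\s+` allows only whitespace between `<a` and `href`,
    # so the first link is skipped.
    strict = html.scan(/<a\s+href="(.*?)"/).flatten

    # `[^>]*?` allows other attributes but cannot run past the end of the tag.
    tolerant = html.scan(/<a\s[^>]*?href="(.*?)"/).flatten

    puts strict.inspect    # ["hello2.html"]
    puts tolerant.inspect  # ["hello1.html", "hello2.html"]
    ```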
     
    franco, Dec 1, 2007
    #3
  4. K. R.

    K. R. Guest

    >> response.scan(/<a.*href="(.*?)"/) do |line|
    > but what if href is not the first attribute of <a/>?


    The order of the attributes doesn't matter, because (.*) allows any
    sequence of characters between the <a tag and href.
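
    One caveat, sketched below with an invented input: because (.*) allows
    any sequence, it can also run past the end of the <a> tag and pick up
    an href that belongs to a different element. The bounded pattern is an
    assumption for comparison only; a real HTML parser is usually more
    robust than either regex.

    ```ruby
    # Hypothetical input: an anchor without href, followed by a <link> tag.
    html = '<a name="top"></a><link href="style.css"> <a title="t" href="real.html">link</a>'

    # `.*?` may cross tag boundaries, so the first match starts at
    # `<a name=...` and ends inside the <link> tag.
    loose = html.scan(/<a.*?href="(.*?)"/).flatten

    # `[^>]*?` keeps the match inside a single tag.
    bounded = html.scan(/<a\s[^>]*?href="(.*?)"/).flatten

    puts loose.inspect    # ["style.css", "real.html"]
    puts bounded.inspect  # ["real.html"]
    ```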
     
    K. R., Dec 2, 2007
    #4