regex problem

K. R. · Nov 27, 2007

hi @all

I would like to scan a string of HTML tags. I need to extract all
links (a tags) from the string, but I only get the last one. What is
wrong? See the code below...

response = '<a href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'
response.scan(/<a.*href="(.*?)"/) do |line|
  puts line
end

thanks for helping!

franco · Nov 27, 2007

The first Kleene star probably needs to be non-greedy. In other words,
stop at the first href encountered, not the last:
/<a.*?href="(.*?)"/
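To see the difference side by side, here is a minimal sketch that runs both patterns against the string from the question. The variable names greedy and lazy are only for illustration.

```ruby
response = '<a href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'

# Greedy: .* runs to the end of the string, then backtracks to the LAST href,
# so the whole string is swallowed by a single match.
greedy = response.scan(/<a.*href="(.*?)"/)

# Non-greedy: .*? stops at the first href after each <a, so scan finds both links.
lazy = response.scan(/<a.*?href="(.*?)"/)

p greedy  # [["hello2.html"]]
p lazy    # [["hello1.html"], ["hello2.html"]]
```

Since the pattern has one capture group, scan returns an array of one-element arrays; calling flatten on the result gives a plain list of URLs.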

franco · Dec 1, 2007

Franco is right. You could fix it by using "a.*?href". However, I
would change "a.*href" to "a\s+href", since what you expect between
the "a" and the "href" is one or more whitespace characters.

response = '<a href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'
# No /s modifier: in Ruby that letter selects an encoding; dot-matches-newline is /m.
response.scan(/<a\s+href="(.*?)"/) do |line|
  puts line
end

K. R. · Dec 2, 2007

response.scan(/<a.*href="(.*?)"/) do |line|

but what if href is not the first attribute of <a/>?

The order of the attributes does not matter, because the pattern allows
any sequence (.*) between the <a tag and href.
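As a rough check of the attribute-order question, the sketch below uses a made-up sample string in which href is not the first attribute. It compares the \s+ pattern, the non-greedy .*? pattern, and a third variant with [^>]*? that is not from this thread; it is added here on the assumption that you want the match to stay inside a single tag.

```ruby
# Hypothetical sample: href is not the first attribute of either tag.
html = '<a class="x" href="hello1.html">test1</a> - <a id="y" href="hello2.html">test2</a>'

strict  = html.scan(/<a\s+href="(.*?)"/)       # requires href directly after <a
lazy    = html.scan(/<a.*?href="(.*?)"/)       # allows any characters before href
bounded = html.scan(/<a\s[^>]*?href="(.*?)"/)  # allows anything except the tag's closing >

p strict   # []
p lazy     # [["hello1.html"], ["hello2.html"]]
p bounded  # [["hello1.html"], ["hello2.html"]]
```

The bounded variant behaves the same on this input, but it cannot run past the end of an a tag that has no href and pick up the href of a later tag.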
