Return first line of parsing

Haze Noc

mysite.each {|line|
  if line =~ /<p><a href="(.+)"><b>(.+)<\/b>/
    puts "#{$2} found at: #{$1}"
  end
}

OK guys, let's say the website has 50+ lines and I only want to return
the first one. Any ideas?
 
Tim Pease

%r/^(.*)$/.match(mysite)[1]
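
For reference, a minimal sketch of what this one-liner does, assuming `mysite` is a String holding the whole page (the sample markup below is made up for illustration). In Ruby, `^` and `$` anchor at line boundaries and `.` does not match a newline, so the first match is simply the first line of the string, whatever it contains.

```ruby
# Hypothetical sample page; the real `mysite` would hold the fetched HTML.
mysite = "<html>\n" \
         "<p><a href=\"http://example.com/one\"><b>One</b></a></p>\n" \
         "<p><a href=\"http://example.com/two\"><b>Two</b></a></p>\n"

# ^ and $ match at line boundaries and . stops at "\n",
# so capture group 1 is the first line without its newline.
first_line = %r/^(.*)$/.match(mysite)[1]

# An equivalent without a regex: take the first line and drop the newline.
same_line = mysite.each_line.first.chomp

puts first_line
```

Note that this yields the first line of the text itself, which is not necessarily the first line that matches the link pattern.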
 
John Joyce

Careful.
What if the site's whitespace has been stripped (no CR or LF at all)?
Or what if the HTML/XHTML is screwy (old HTML without closed elements,
or just poorly formed or badly nested)?
 
Konrad Meyer

On Aug 17, 2007, at 8:59 AM, Tim Pease wrote:

mysite.each {|line|
  if line =~ /<p><a href="(.+)"><b>(.+)<\/b>/
    puts "#{$2} found at: #{$1}"
  end
}

Ok guys, Lets say the website has 50+ lines.. and i only want to
return the first one, any ideas?

%r/^(.*)$/.match(mysite)[1]
Careful,
What if the site's white space has been stripped? (no CR or LF at all)
or if the html/xhtml is screwy? (old html without closed elements,
or just poorly formed or badly nested)

The first line of any file doesn't depend on the HTML in it. His code
should return the first line of any file -- HTML or not.

--
Konrad Meyer http://konrad.sobertillnoon.com/

 
yermej

If you want to use essentially the same block as above, but just take
the first matching line:

mysite.each {|line|
  if line =~ /<p><a href="(.+)"><b>(.+)<\/b>/
    puts "#{$2} found at: #{$1}"
    break
  end
}
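
On Ruby 1.9 and later, `String#each` no longer exists, so the same idea would probably be written with `each_line`. The sketch below assumes `mysite` is a String; the sample markup and the variable names `link`, `result` and `first_hit` are invented for illustration. It also shows `find`, which stops at the first line for which the block is true.

```ruby
# Hypothetical sample page; the real `mysite` would hold the fetched HTML.
mysite = "<html>\n" \
         "<p><a href=\"http://example.com/one\"><b>One</b></a></p>\n" \
         "<p><a href=\"http://example.com/two\"><b>Two</b></a></p>\n"

link = /<p><a href="(.+)"><b>(.+)<\/b>/

# Same loop as above, using each_line; break leaves after the first match.
result = nil
mysite.each_line do |line|
  if line =~ link
    result = "#{$2} found at: #{$1}"
    break
  end
end
puts result

# Alternative: find returns the first matching line (or nil if none match).
first_hit = mysite.each_line.find { |line| line =~ link }
```

With `find`, the matching line comes back as a value, which may be handier if the caller needs the line rather than printed output.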

Tim's solution would give you the first line of the actual html file
and, as John mentions, that could be the entire web page if there are
no CR/LF characters in the file.
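
To illustrate the point about missing line breaks: if the page comes back as one long line, line-by-line matching sees the whole page at once, and the greedy `.+` can run across several links. Below is a rough sketch with made-up sample markup. It matches against the whole string and uses narrower patterns (`[^"]+` and `[^<]+`, a substitution that is not from the thread) so that the first link is picked up whether or not the page has line breaks.

```ruby
# Hypothetical page with all line breaks stripped.
mysite = '<html><p><a href="http://example.com/one"><b>One</b></a></p>' \
         '<p><a href="http://example.com/two"><b>Two</b></a></p></html>'

# The original greedy pattern: .+ backtracks only as far as the last
# '"><b>' in the string, so the captures end up spanning both links.
greedy = /<p><a href="(.+)"><b>(.+)<\/b>/.match(mysite)

# Narrower character classes stop at the first closing quote and tag.
narrow = /<p><a href="([^"]+)"><b>([^<]+)<\/b>/.match(mysite)

puts "#{narrow[2]} found at: #{narrow[1]}" if narrow
```

This is still regex matching on markup, so badly nested or unusual HTML can defeat it.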

Jeremy
 
