ruby regex on html file

eggie5 · Sep 26, 2007

I'm trying to write a rake task to extract all the script tags out of
my html file and save them to an array. How can I do this?

Below is a snippet from my file management.rhtml. I would like to get the
paths to the script files from all the script tags inside the HTML comment tags.

Expected results are:

/javascripts/prototype.js,
management/javascripts/management.js,
/javascripts/scriptaculous.js,
/javascripts/effects.js,
/javascripts/controls.js

Snippet:



<script type="text/javascript" src="/javascripts/prototype.js"></script>

<script type="text/javascript" src="management/javascripts/management.js"></script>

<script src="/javascripts/scriptaculous.js" type="text/javascript"></script>

<script src="/javascripts/effects.js" type="text/javascript"></script>

<script src="/javascripts/controls.js" type="text/javascript"></script>
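The question asks for a rake task specifically. A minimal sketch of how a scan could be wired into one; the task name `list_scripts`, the helper `script_srcs`, and the layout path are assumptions for illustration, and the regex is just one workable pattern, not the only one:

```ruby
require 'rake'

# Hypothetical helper: the src value of every <script> tag, in document order.
# [^>]*? keeps the search inside a single tag, so src may come before or
# after type, and the tag may be split over several lines.
def script_srcs(text)
  text.scan(/<script\b[^>]*?\bsrc="([^"]*)"/).flatten
end

# Hypothetical task; in a Rails app it could live in lib/tasks/*.rake.
desc 'List the JavaScript files referenced by the management layout'
task :list_scripts do
  layout = 'app/views/layouts/management.rhtml' # assumed path
  script_srcs(File.read(layout)).each { |src| puts src }
end
```

Running `rake list_scripts` would then print one path per line; collecting them into an array instead is just `script_srcs(File.read(layout))`.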

Ari Brown · Sep 26, 2007

<sigh>
I have no shame....

For something as large and (maybe) complex as this, you might want to
try generating your regexp through TextualRegexp.

gem install TextualRegexp

Good luck
ari


---------------------------------------------------------------|
~Ari
"I don't suffer from insanity. I enjoy every minute of it" --1337est man alive

Une Bévue · Sep 26, 2007

eggie5 said:
I'm trying to write a rake task to extract all the script tags out of
my html file and save them to an array. How can I do this?

Is this a solution for you?

#! /usr/bin/env ruby

html = '
<script type="text/javascript" src="/javascripts/prototype.js"></script>
<script type="text/javascript" src="management/javascripts/management.js"></script>
<script src="/javascripts/scriptaculous.js" type="text/javascript"></script>
<script src="/javascripts/effects.js" type="text/javascript"></script>
<script src="/javascripts/controls.js" type="text/javascript"></script>
'
js = []
# each_line works on Ruby 1.8 and later (String#each was removed in 1.9).
html.each_line {|l|
  # 1st gsub: keep what follows src=" up to the last quote before a space or '>'
  # 2nd gsub: if type= came after src, cut it off again
  js << l.chomp.gsub(/.* src="(.*[^ ])"[ >].*/, '\1').gsub(/(.*)" type=.*/, '\1') if /<script / === l
}
p js

gives:
RubyMate r6354 running Ruby r1.8.6 (/opt/local/bin/ruby)
["/javascripts/prototype.js", "management/javascripts/management.js",
"/javascripts/scriptaculous.js", "/javascripts/effects.js",
"/javascripts/controls.js"]

on Mac OS X 10.4.10

I didn't find a solution with only one gsub...
I'm sure one exists :[
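A single regex is enough if the capture group is read out directly with `String#[]` (a regexp plus a group index) instead of being substituted with gsub. A minimal sketch, assuming one script tag per line as in the test string above:

```ruby
html = <<-HTML
  <script type="text/javascript" src="/javascripts/prototype.js"></script>
  <script type="text/javascript" src="management/javascripts/management.js"></script>
  <script src="/javascripts/scriptaculous.js" type="text/javascript"></script>
  <script src="/javascripts/effects.js" type="text/javascript"></script>
  <script src="/javascripts/controls.js" type="text/javascript"></script>
HTML

# Keep only the lines that contain a script tag, then take capture group 1
# of the src pattern. String#[] returns nil when the pattern does not match,
# so compact drops script lines without a src attribute.
js = html.each_line.grep(/<script /).map { |line| line[/src="([^"]*)"/, 1] }.compact
p js
```

Because `[^"]*` cannot run past the closing quote, it does not matter whether `type` comes before or after `src`.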

eggie5 · Sep 26, 2007


Thank you so much for your effort. This is much more succinct than
what I came up with!

File.open("app/views/layouts/management.rhtml", "r") do |infile|
  file_text = ""
  while (line = infile.gets)
    file_text << line
  end

  script_block = file_text.match("[\\S\\s]*?")

  script_block = script_block.to_s
  script_refs = script_block.scan(/[^\"]+.js/)

  script_refs.length

  script_refs.each do |ref|
    base_path = "public/"
    puts "#{base_path}#{ref}"
  end
end
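The original question only wants the tags that sit inside HTML comment blocks. A minimal sketch of that two-step idea (isolate each `<!-- ... -->` block, then scan inside it), assuming plain `<!--`/`-->` delimiters; the helper name `commented_script_srcs` and the sample page are hypothetical:

```ruby
# Hypothetical helper: src paths of script tags that sit inside <!-- ... -->.
def commented_script_srcs(text)
  # Lazy (.*?) plus /m (dot matches newline) so each block ends at its own -->.
  blocks = text.scan(/<!--(.*?)-->/m).flatten
  blocks.flat_map { |block| block.scan(/<script\b[^>]*?\bsrc="([^"]*)"/).flatten }
end

page = <<-HTML
<script src="/javascripts/application.js"></script>
<!--
<script type="text/javascript" src="/javascripts/prototype.js"></script>
<script src="/javascripts/effects.js" type="text/javascript"></script>
-->
HTML

# Prefix with public/ as in eggie5's code; File.join avoids "public//javascripts".
commented_script_srcs(page).each { |ref| puts File.join("public", ref) }
```

The tag outside the comment (`application.js`) is skipped; only the two commented-out paths are printed.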

William James · Sep 26, 2007


puts DATA.read.scan( /<script\s+[^>]*src="(.*?)"/m ).flatten

__END__


<script type="text/javascript" src="/javascripts/prototype.js">
</script>

<script type="text/javascript" src="management/javascripts/management.js">
</script>

<script src="/javascripts/scriptaculous.js" type="text/javascript">
</script>

<script src="/javascripts/effects.js" type="text/javascript"></script>

<script src="/javascripts/controls.js" type="text/javascript"></script>
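Why the `.flatten`: when the pattern contains a capture group, `String#scan` returns one array of captures per match instead of the matched text. A small illustration with a made-up input string:

```ruby
text = '<script src="/a.js"></script> <script type="text/javascript" src="/b.js"></script>'

with_group    = text.scan(/<script\s+[^>]*src="(.*?)"/)  # pattern has a group
without_group = text.scan(/src="[^"]*"/)                 # pattern has no group

p with_group           # [["/a.js"], ["/b.js"]]  (one inner array per match)
p with_group.flatten   # ["/a.js", "/b.js"]
p without_group        # ["src=\"/a.js\"", "src=\"/b.js\""]  (whole matches)
```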

Une Bévue · Sep 26, 2007

eggie5 said:
Thank you so must for your effort. This is much more succinct than
what I came up with!

I found a way with only one gsub:

#! /usr/bin/env ruby

html = '
<script type="text/javascript" src="/javascripts/prototype.js"></script>
<script type="text/javascript" src="management/javascripts/management.js"></script>
<script src="/javascripts/scriptaculous.js" type="text/javascript"></script>
<script src="/javascripts/effects.js" type="text/javascript"></script>
<script src="/javascripts/controls.js" type="text/javascript"></script>
'
js = []
html.each_line {|l|
  # Replace the whole line with the src value; [^ "]* stops at the closing quote.
  # ^\s* allows optional indentation before the tag.
  js << l.chomp.gsub(/^\s*<script\s+[^>]*src="([^ "]*).*/, '\1') if /<script / === l
}
p js

gives:

["/javascripts/prototype.js", "management/javascripts/management.js",
"/javascripts/scriptaculous.js", "/javascripts/effects.js",
"/javascripts/controls.js"]

best,

Une Bévue · Sep 26, 2007

William James said:
puts DATA.read.scan( /<script\s+[^>]*src="(.*?)"/m ).flatten

I don't understand the "?" in your "(.*?)".

What does it mean right after the *?

Konrad Meyer · Sep 26, 2007


Quoth Une Bévue:

William James said:
puts DATA.read.scan( /<script\s+[^>]*src="(.*?)"/m ).flatten

I don't understand the "?" in your "(.*?)".

What does it mean right after the *?

Non-greedy match. It matches as few characters as possible, which in this
case means the capture stops at the first quote character instead of
running on to the last one.
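The difference shows up on a tag where another quoted attribute follows `src`, like the scriptaculous line from the question. A small sketch comparing greedy `.*`, lazy `.*?`, and an explicit `[^"]*` class:

```ruby
tag = '<script src="/javascripts/scriptaculous.js" type="text/javascript"></script>'

greedy   = tag[/src="(.*)"/, 1]     # .* runs to the LAST quote on the line
lazy     = tag[/src="(.*?)"/, 1]    # .*? stops at the FIRST closing quote
explicit = tag[/src="([^"]*)"/, 1]  # [^"]* cannot cross a quote at all

p greedy    # "/javascripts/scriptaculous.js\" type=\"text/javascript"
p lazy      # "/javascripts/scriptaculous.js"
p explicit  # "/javascripts/scriptaculous.js"
```

The negated class gives the same result as the lazy quantifier here and does not depend on backtracking behaviour.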

HTH,
--
Konrad Meyer http://konrad.sobertillnoon.com/


Daniel Sheppard · Sep 26, 2007

I'm trying to write a rake task to extract all the script tags out of
my html file and save them to an array. How can I do this?

Your subject says regex, but your request says Hpricot:

require 'hpricot'
doc = Hpricot(input)  # input: the page's HTML (e.g. the .rhtml file) as a String
scripts = (doc/'script').map {|x| x['src']}.compact

eggie5 · Sep 26, 2007

I'm trying to write a rake task to extract all the script tags out of
my html file and save them to an array. How can I do this?

Your subject says regex, but your request says Hpricot:

require 'hpricot'
doc = Hpricot(input)
scripts = (doc/'script').map {|x| x['src']}.compact

Ahh, that looks beautiful right there! But will hpricot work on
a .rhtml file?

Daniel Sheppard · Sep 26, 2007

Your subject says regex, but your request says Hpricot:

require 'hpricot'
doc = Hpricot(input)
scripts = (doc/'script').map {|x| x['src']}.compact

Ahh, that looks beautiful right there! But will hpricot work on
a .rhtml file?

Probably. Hpricot should treat all the rhtml guff as if you're just
really, really bad at writing HTML, and treat the rhtml bits as raw text.

Hpricot('<%= <script src="monkey"> %>').at('script')['src']
=> "monkey"

The rhtml will get in the way of Hpricot seeing your tree correctly, so
finding script tags only within the head section or something like that
might not work, but for simple finds it should be fine.
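If the ERB does get in the way, one workaround (not one suggested in the thread) is to blank out the `<% ... %>` sections before scanning or parsing. A minimal sketch, assuming no script tags you care about are produced inside ERB tags (a case like the `monkey` example above would be dropped); `strip_erb` is a hypothetical helper:

```ruby
# Hypothetical helper: remove ERB sections (<% %>, <%= %>, <%# %>) so that
# quotes or '>' characters inside the Ruby code cannot confuse a later scan.
# Lazy .*? with /m ends each section at its own %>, even across lines.
def strip_erb(text)
  text.gsub(/<%.*?%>/m, '')
end

rhtml = <<-RHTML
<% if admin? %>
<script src="/javascripts/admin.js" type="text/javascript"></script>
<% end %>
<%= link_to "Home", root_path %>
<script type="text/javascript" src="/javascripts/effects.js"></script>
RHTML

srcs = strip_erb(rhtml).scan(/<script\b[^>]*?\bsrc="([^"]*)"/).flatten
p srcs  # ["/javascripts/admin.js", "/javascripts/effects.js"]
```

Note that stripping ignores the `if`, so tags inside conditional blocks are still listed, which is usually what a "which files might be loaded" task wants.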

Dan.

Une Bévue · Sep 26, 2007

Konrad Meyer said:
Non-greedy match. Find as few characters as possible to match, which in this
case means don't match quote characters.

OK, fine, thanks a lot for reminding me...

How to have two html audio players on one page?	0	May 3, 2022
Image upload not working in browser	4	Sep 9, 2022
multiline regex expression	4	Jul 21, 2007
regex replace on a file	2	Oct 23, 2007
I want to Display Excel As HTML In js	2	Feb 24, 2023
CORS/Express: Getting data from server from domain html	2	Sep 3, 2022
Script stops working when using variables to save time typing...	4	Oct 31, 2022
How to save JSON Data to a file using fetch() api?	2	Apr 28, 2022
