Can't control regular expressions

Guillermo.Acilu

[Note: parts of this message were removed to make it a legal post.]

Hello guys,

I need to extract all the scripts from an HTML file, so I have written the
following regular expression as a first test:

%r|<script(.+)script>|m

The problem I am having is that the expression takes the first <script and
the very last script>. So it matches the beginning of the first script in
the document and the end of the last script in the document with
everything in the middle. I want to extract just the scripts one by one.
How do I do it?

Thanks for your help,

Guillermo
 
Lars Christensen

You can put '?' after the quantifier to make the match lazy rather than
greedy, so that it stops at the first script> it finds instead of the last:

%r|<script(.+?)script>|m
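
To see the difference, here is a small sketch that runs both patterns over the same input with String#scan. The sample HTML and the variable names are made up for illustration; only the two patterns come from this thread.

```ruby
# Invented sample document with three script elements.
html = <<~HTML
  <html><head>
  <script type="text/javascript">var a = 1;</script>
  <script src="lib.js"></script>
  </head><body>
  <script>
    document.write("hello");
  </script>
  </body></html>
HTML

greedy = %r|<script(.+)script>|m   # original pattern from the question
lazy   = %r|<script(.+?)script>|m  # same pattern with a lazy quantifier

# With one capture group, scan returns an array of one-element arrays.
greedy_matches = html.scan(greedy)
lazy_matches   = html.scan(lazy)

puts "greedy: #{greedy_matches.length} match(es)"
puts "lazy:   #{lazy_matches.length} match(es)"

# Print each script on its own, re-adding the text outside the group.
lazy_matches.each do |(inner)|
  puts "<script#{inner}script>"
  puts "-" * 20
end
```

The greedy pattern should produce a single match spanning everything from the first <script to the last script>, while the lazy one should produce one match per script.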

However, I suggest trying Hpricot for more robust HTML parsing.

Lars
 
Florian Gilcher

Hi, regexps are not the right tool for this. You can find some
explanation of why that is in this topic:

http://groups.google.com/group/ruby-talk-google/browse_thread/thread/2d86d106b5c8797a

Regards,
Florian Gilcher
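
As a rough illustration of the kind of input that trips up a regexp here, the sketch below feeds the lazy pattern from earlier in the thread a document with a commented-out script and a script whose body mentions <noscript>. The sample HTML is invented for this example and is not taken from the linked topic.

```ruby
# Invented sample: one script inside an HTML comment, and one live script
# whose JavaScript comment happens to contain the text "script>".
html = <<~HTML
  <!-- <script>disabled();</script> -->
  <script>
    // fall back to <noscript> content when this fails
    init();
  </script>
HTML

lazy = %r|<script(.+?)script>|m

# Collect the captured text of every match.
matches = html.scan(lazy).map(&:first)

matches.each_with_index do |inner, i|
  puts "match #{i + 1}: #{inner.inspect}"
end
```

The pattern reports the commented-out script as if it were live, and it cuts the second script short at <noscript>, so the call to init() never shows up in any match. A real HTML parser knows about comments and about where a script element actually ends, which is why it is the more robust choice.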