Regexp with Ruby

Ajay Vijey · Nov 15, 2006

Hello all,

I need to replace the image tags in a file with different ones.

File Data:
-------------
<td bordercolor="#FFFFFF">
<table border="0" id="table2" bgcolor="#FFFFFF" width="100%">
<tr>
<td align="left" valign="top" width="25%"><a href="../personal/po.htm">
<img src="images/animationpundo_schwarz_kl.gif"
alt="Personal und Organisation" border="0" width="84"
height="64"></a></td>
<td align="left" valign="top" width="25%"><font face="Arial"><a
href="../personal/po.htm">

I want to scan for this image tag:
--------------------------
<img src="images/animationpundo_schwarz_kl.gif"
alt="Personal und Organisation" border="0" width="84" height="64">

I have already tried this with the following code:
-----------------------------------------------
...scan(/<img.*>/m)
and with
...scan(/<img.*?>/m)

But the result was always:
----------------------------
<img src="images/animationpundo_schwarz_kl.gif"
alt="Personal und Organisation" border="0" width="84"
height="64"></a></td>
<td align="left" valign="top" width="25%"><font face="Arial"><a
href="../personal/po.htm">

I hope someone can help me! Thanks a lot!

Kind Regards
Ajay

Hugh Sasse · Nov 15, 2006

Ajay said:
I have already tried this with the following code:
-----------------------------------------------
...scan(/<img.*>/m)
and with
...scan(/<img.*?>/m)

But the result was always:
---------------------------- [trimmed]

I'd agree with your choice of regexp. I think we need to see more of
the surrounding code to fix this.

Hugh

Ajay Vijey · Nov 15, 2006

Hugh said:
I'd agree with your choice of regexp. I think we need to see more of
the surrounding code to fix this.

rubyscript
--------------
datei_new = IO.read("index.htm")
datei_regexp = datei_new.scan(/(<img.*>)/m)

puts datei_regexp

index.htm
------------

<html>
<head><title>test</title></head>
<body>
<table>
<tr>
<td bordercolor="#FFFFFF">

<table border="0" id="table2" bgcolor="#FFFFFF" width="100%">
<tr>

<td align="left" valign="top" width="25%"><a href="../personal/po.htm">
<img src="images/animationpundo_schwarz_kl.gif"
alt="Personal und Organisation" border="0" width="84"
height="64"></a></td>

<td align="left" valign="top" width="25%"><font face="Arial"><a
href="../personal/po.htm"></font></td>

</tr>
</table>

</td>
</tr>
</table>
</body>
</html>

hemant · Nov 15, 2006

As another poster has pointed out, you aren't showing enough code for an
analysis, and, while you are replacing tags, please reformat your IMG tags
thus:

<img src="..."/>

Note the self-closing form. This won't bother older browsers, and it will
allow you to meet the newer (X)HTML standards as well.
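
A minimal sketch of that reformatting step, assuming the tags can be handled with a regexp and that no attribute value contains a literal ">". The sample markup and the variable names (html, converted) are made up for illustration:

```ruby
# Hypothetical sample input: one old-style tag spanning two lines and one
# tag that is already self-closing.
html = <<~HTML
  <td><img src="images/a.gif"
  alt="A" border="0"></td>
  <td><img src="images/b.gif" alt="B" /></td>
HTML

# Match each tag lazily up to its first ">" and capture the attribute text.
# The /m flag lets "." match newlines, so multi-line tags are handled.
converted = html.gsub(/<img\b(.*?)>/m) do
  attrs = $1.sub(%r{\s*/\z}, "")   # drop a trailing "/" if one is present
  "<img#{attrs} />"                # re-emit the tag in self-closing form
end

puts converted
```

Tags that are already self-closing come out unchanged, so the substitution can be run more than once on the same file.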

Here is a sample program that extracts all the IMG tags from a Web page (of
both the old and new varieties):

----------------------------------------
#!/usr/bin/ruby -w

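# Read the whole page into a single string.
data = File.read("sample.html")

# Lazily match each tag up to its first ">"; this covers both <img ...>
# and <img ... />. The /m flag lets "." match newlines.
extract = data.scan(%r{<img.*?>}m)

puts extract.join("\n")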
----------------------------------------

This outputs from my sample page:

<img src="../images/leftarrow.png" border="0" alt="" />
<img src="../images/rightarrow.png" border="0" alt="" />
<img src="rock_ptarmigan_chick_small.jpg" width="300" height="289" alt=""/>
<img src="pws_naked_island003_cropped_small.jpg" width="300" height="232"
alt=""/>
<img src="pws_naked_island011_cropped_small.jpg" width="300" height="225"
alt=""/>
<img src="pws_naked_island007_cropped_small.jpg" width="300" height="236"
alt=""/>
<img src="pws_naked_island012_small.jpg" width="300" height="200" alt=""/>
<img src="pws_naked_island013_cropped_small.jpg" width="300" height="236"
alt=""/>
<img src="../images/leftarrow.png" border="0" alt="" />
<img src="../images/rightarrow.png" border="0" alt="" />

If I were to do this, I would use Hpricot.

kbloom · Nov 15, 2006

Ajay said:
rubyscript
--------------
datei_new = IO.read("index.htm")
datei_regexp = datei_new.scan(/(<img.*>)/m)

puts datei_regexp

index.htm
------------ [trimmed]

Works for me with datei_new.scan(/(<img.*?>)/m). The .*? performs a
non-greedy match, so it stops at the smallest match it can make
rather than the longest.
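
To make the difference concrete, here is a small sketch that runs both patterns over a shortened stand-in for index.htm. The markup and the names greedy and lazy are assumptions for illustration only:

```ruby
# Shortened, hypothetical stand-in for index.htm.
html = <<~HTML
  <td><a href="po.htm">
  <img src="a.gif"
  height="64"></a></td>
  <td><a href="po.htm"></a></td>
HTML

greedy = html.scan(/<img.*>/m)    # ".*" grabs as much as possible
lazy   = html.scan(/<img.*?>/m)   # ".*?" grabs as little as possible

puts greedy.first   # runs from "<img" to the last ">" in the string
puts "----"
puts lazy.first     # stops at the first ">" after "<img"
```

With /m the dot also matches newlines, which is why the greedy version swallows everything up to the final closing tag.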

The parentheses you have around the text of the regexp are unnecessary;
with a capture group, scan returns each result nested one level deeper
in arrays. You should use /<img.*?>/m
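
A short sketch of the nesting, using a made-up one-line snippet (the names with_group and without_group are illustrative):

```ruby
# Hypothetical input with two image tags.
html = %(<p><img src="a.gif"><img src="b.gif" /></p>)

with_group    = html.scan(/(<img.*?>)/m)   # capture group present
without_group = html.scan(/<img.*?>/m)     # no capture group

p with_group     # => [["<img src=\"a.gif\">"], ["<img src=\"b.gif\" />"]]
p without_group  # => ["<img src=\"a.gif\">", "<img src=\"b.gif\" />"]
```

Since puts flattens nested arrays when printing, the two forms look the same in the script's output; the difference only matters once the result is indexed or iterated.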

--Ken Bloom
