Need help with a regexp

rpheath · Dec 8, 2006

I'm trying to write a regular expression to replace a <pre>...</pre>
block or a <blockquote><p>...</p></blockquote> block with a blank ('').
I can only get the <pre>...</pre> to work correctly. Here's what I
have:

text.gsub(/^<pre>[^<]*<\/pre>$|^<blockquote><p>(.*?)<\/p><\/blockquote>$/,'')

Can someone help me figure out why the blockquote is still showing
up??? Thanks in advance.

Daniel Finnie · Dec 8, 2006

Why are you doing a gsub but then anchoring the Regexp to the start &
ends? Use a normal sub or take out all the ^s and $s (except for the
character class definitions, i.e., the ones in square brackets).
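
A minimal sketch of why the anchored pattern can miss the blockquote, assuming the block is spread over several lines: in Ruby, ^ and $ match at the start and end of each line, and '.' does not cross a newline unless the 'm' flag is given. The variable names are made up for illustration.

```ruby
# The blockquote alternative from the original pattern, on its own.
pattern = /^<blockquote><p>(.*?)<\/p><\/blockquote>$/

one_line   = "<blockquote><p>This is a quote</p></blockquote>"
multi_line = "<blockquote>\n<p>This is a quote</p>\n</blockquote>"

p one_line.gsub(pattern, '')    # => "" (whole block sits on one line)
p multi_line.gsub(pattern, '')  # unchanged: the newlines between tags block the match
```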

Please post some sample text, not of what you would like to remove but
of what you would like to remove it from.

Dan

rpheath · Dec 8, 2006

Thanks for the reply. I'm relatively new to regular expressions, and
misinterpreted the ^s and $s. I was thinking they were for that
specific check, so it was either the first string "|" (or) the second
string.

Here's sample text that would be passed into it.
-----------------------
<p>This is the first sentence. Now I'll post a code snippet:</p>

<pre>
def strip_blocks(text)
text.gsub([regex],'')
end
</pre>

<p>This is another sentence before the block quote.</p>

<blockquote>
<p>This is a quote</p>
</blockquote>

<p>This is one more sentence</p>
----------------------

What I would like to have left is this:

----------------------
<p>This is the first sentence. Now I'll post a code snippet:</p>

<p>This is another sentence before the block quote.</p>

<p>This is one more sentence</p>

Edwin Fine · Dec 8, 2006

Try this. It uses the "non-greedy" modifier '?' together with the
multiline ('m') and case-insensitive ('i') flags. Without the
non-greedy modifier, '.*' would gobble up everything from the first
opening tag to the last closing tag of the same name, including any
text between two separate blocks. This is probably not what you want.

def remove_tag_block(tag, text)
  text.gsub(/<#{tag}>.*?<\/#{tag}>/im, '')
end

irb(main):054:0> text
=> "<p>This is the first sentence. Now I'll post a code
snippet:</p>\n\n<pre>\ndef strip_blocks(text)\n
text.gsub([regex],'')\nend\n</pre>\n\n<p>This is another sentence before
the block quote.</p>\n\n<blockquote>\n <p>This is a
quote</p>\n</blockquote>\n\n<p>This is one more sentence</p>"

irb(main):055:0> t=remove_tag_block("pre", text)

=> "<p>This is the first sentence. Now I'll post a code
snippet:</p>\n\n\n\n<p>This is another sentence before the block
quote.</p>\n\n<blockquote>\n <p>This is a
quote</p>\n</blockquote>\n\n<p>This is one more sentence</p>"

irb(main):056:0> remove_tag_block("blockquote", t)

=> "<p>This is the first sentence. Now I'll post a code
snippet:</p>\n\n\n\n<p>This is another sentence before the block
quote.</p>\n\n\n\n<p>This is one more sentence</p>"
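
To see what the non-greedy '?' buys here, a small sketch with two separate <pre> blocks and a paragraph between them. The sample string is invented for illustration and is not from the original post.

```ruby
html = "<pre>a</pre><p>keep me</p><pre>b</pre>"

# Greedy: .* runs to the LAST </pre>, so the paragraph in between is lost.
greedy = html.gsub(/<pre>.*<\/pre>/im, '')

# Non-greedy: .*? stops at the FIRST </pre>, so each block is removed separately.
lazy = html.gsub(/<pre>.*?<\/pre>/im, '')

p greedy  # => ""
p lazy    # => "<p>keep me</p>"
```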

The problem is that this won't work with nested tags, e.g.

<table><tr><td><table>stuff</table></td></tr></table>

irb(main):065:0> x="<table><tr><td><table>stuff</table></td></tr></table>"
=> "<table><tr><td><table>stuff</table></td></tr></table>"
irb(main):066:0> remove_tag_block("table", x)
=> "</td></tr></table>"

This is because *regular* regular expressions can't match nested
pairs, such as "((()(())()))". I think I read somewhere that regexps
can't count. You have to use *recursive* regular expressions, which
are found in PCRE (Perl-compatible REs), but AFAIK not in the current
Ruby regexp engine. Maybe Oniguruma has it - I dunno. I saw a PCRE
extension for Ruby somewhere, but I don't know anything about it.

The Perl RE for matching nested parentheses is apparently as follows
(from
http://www.sitepoint.com/blogs/2006/09/26/the-joy-of-regular-expressions-1/)

\(((?>[^()]+)|(?R))*\)

I believe that to do this correctly without PCRE, you have to resort to
some text parsing or use a SAX parser or similar. Maybe some Ruby guru
(i.e. not me) will be able to pull out an RE or some easy way to do
this.
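
For readers on a newer Ruby, a hedged sketch: the Oniguruma-based engine (assumed here: Ruby 1.9 or later) lets a named group call itself with \g<name>, which covers the nested case. remove_nested_tag_block is a hypothetical name, and the pattern only handles bare tags without attributes.

```ruby
# Hypothetical helper; assumes a Ruby whose regexp engine supports
# \g<name> subexpression calls (Ruby 1.9+).
def remove_nested_tag_block(tag, text)
  t = Regexp.escape(tag)
  # <block> matches an opening tag, then any mix of ordinary characters
  # (anything that does not start an opening or closing tag of this name)
  # and nested <block>s, then the matching closing tag.
  pattern = /(?<block><#{t}>(?:(?!<\/?#{t}>).|\g<block>)*<\/#{t}>)/im
  text.gsub(pattern, '')
end

x = "<table><tr><td><table>stuff</table></td></tr></table>"
p remove_nested_tag_block("table", x)   # => ""

# The same idea applied to nested parentheses.
# (Regexp#match? needs Ruby 2.4 or later.)
balanced = /\A(?<paren>\((?:[^()]|\g<paren>)*\))\z/
p balanced.match?("((()(())()))")   # => true
p balanced.match?("((())")          # => false
```

For real-world HTML with attributes, comments, or malformed markup, a parser is still the safer route.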

greg · Dec 8, 2006

You are missing the 'm' flag, which allows '.' to match newlines.

pre_match = /<pre>.*?<\/pre>/m
block_match = /<blockquote>.*?<p>.*?<\/p>.*?<\/blockquote>/m
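
Patterns of this shape can be applied in a single gsub call. The sketch below assumes the closing tags are written with an escaped slash (<\/pre>) and uses an invented sample string. Regexp.union keeps each pattern's own 'm' flag. The second gsub, which collapses leftover blank lines, is an addition for illustration and not part of the patterns themselves.

```ruby
pre_match   = /<pre>.*?<\/pre>/m
block_match = /<blockquote>.*?<p>.*?<\/p>.*?<\/blockquote>/m

text = "<p>First</p>\n\n<pre>\ncode\n</pre>\n\n" \
       "<p>Second</p>\n\n<blockquote>\n<p>Quote</p>\n</blockquote>\n\n" \
       "<p>Third</p>"

# Remove either kind of block in one pass.
stripped = text.gsub(Regexp.union(pre_match, block_match), '')

# Removing a block leaves its surrounding blank lines behind; collapse
# runs of three or more newlines down to a single blank line.
cleaned = stripped.gsub(/\n{3,}/, "\n\n")

puts cleaned
```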

Rob Biedenharn · Dec 8, 2006

rpheath said:
Thanks for the reply. I'm relatively new to regular expressions, and
misinterpreted the ^s and $s. I was thinking they were for that
specific check, so it was either the first string "|" (or) the second
string.

http://groups.google.com/group/rubyonrails-talk/browse_frm/thread/6c75d5d4df368186/2743494eb303014c#2743494eb303014c

And might I suggest picking ONE mailing list on which to ask your
questions (Ruby is actually the better one for this question about
regular expressions), and then JUST ASK ONCE.

-Rob

Rob Biedenharn http://agileconsultingllc.com
