Detect any "<a href=mailto:...>...</a>" string in a string?

Joshua Muheim

Hi all

I have a very long string of HTML. It may contain some unprotected
email links like this:

<a href="mailto:[email protected]">Some Email</a>

or

<a href="mailto:[email protected]?subject=Something">Some Email</a>

or...

I need to detect these email links and substitute them with something
different, like a JavaScript function to obfuscate them:

obfuscate("some","email.xx","Something","Some Email")

or something like that. Sadly, I have no idea how to extract the needed
parts of the string. I stumbled upon the TMail gem and guess it could
help me a lot, but I haven't gotten any further.

Any help is appreciated! Thanks!
Josh
 
Greg Donald

Here's how I do it in PHP, if you wanna rework it into Ruby:

$GLOBALS[ 'EMAIL_LINK_REGEX' ] = "#<a[^>]*mailto:([^'\" ]*)['\" ]>([^<]*)</a>#i";

$html = preg_replace_callback( $GLOBALS[ 'EMAIL_LINK_REGEX' ], 'fubarEmail', $html );

function fubarEmail( $matches )
{
  $strNewAddress = replaceEntities( $matches[ 1 ] );

  $strText = replaceEntities( $matches[ 2 ] );

  $arrEmail = explode( '@', $strNewAddress );

  $strTag = "<script language='Javascript' type='text/javascript'>\r";
  $strTag .= "<!--\r";
  $strTag .= "document.write('<a href=\"mai');\r";
  $strTag .= "document.write('lto');\r";
  $strTag .= "document.write(':$arrEmail[0]');\r";
  $strTag .= "document.write('@');\r";
  $strTag .= "document.write('$arrEmail[1]\">');\r";
  $strTag .= "document.write('$strText<\/a>');\r";
  $strTag .= "// -->\r";
  $strTag .= "</script><noscript>$arrEmail[0] at \r";
  $strTag .= str_replace( '.', ' dot ', $arrEmail[ 1 ] ) . '</noscript>';

  return $strTag;
}
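
For anyone who wants the Ruby rework, here is a rough sketch of the same idea. The names fubar_email and obfuscate_links are made up for illustration. The replaceEntities helper from the PHP is not shown in the post, so it is left out here. The sketch also assumes the link text contains no single quotes, which would break the generated JavaScript.

```ruby
# Same pattern as the PHP version: group 1 is the address, group 2 the link text.
EMAIL_LINK_REGEX = /<a[^>]*mailto:([^'" ]*)['" ]>([^<]*)<\/a>/i

# Builds the <script>/<noscript> replacement for one matched link.
# (fubar_email is a hypothetical name mirroring fubarEmail in the PHP.)
def fubar_email(address, text)
  user, domain = address.split("@", 2)
  [
    "<script type='text/javascript'>",
    "<!--",
    "document.write('<a href=\"mai');",
    "document.write('lto');",
    "document.write(':#{user}');",
    "document.write('@');",
    "document.write('#{domain}\">');",
    "document.write('#{text}<\\/a>');",
    "// -->",
    "</script><noscript>#{user} at #{domain.to_s.gsub('.', ' dot ')}</noscript>"
  ].join("\n")
end

# Replaces every mailto link in the HTML string; other links are left alone.
def obfuscate_links(html)
  html.gsub(EMAIL_LINK_REGEX) { fubar_email($1, $2) }
end
```

As in the PHP, a query string such as ?subject=Something stays attached to the domain part, so it would also show up in the noscript text.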
 
7stud --

Joshua said:
Sadly I have no idea how to find the needed string parts.

You can find the parts with:

1) regexes
2) gsub()
3) split()


html = <<ENDOFHTML
<html>
<head>
<title>html page</title>
</head>
<body>
<a href="mailto:[email protected]">Some Email</a>
<div>hello</div>
<div>world</div>
<div>goodbye</div>
<a href="mailto:[email protected]?subject=Something&cost=10">Some Email</a>
</body>
</html>
ENDOFHTML

# gsub with a block: the block's return value replaces each match
new_html = html.gsub(/<a href="(.+?)">(.+?)<\/a>/) do |match|
  p match
  addy = $1   # the href value
  link = $2   # the link text
  p addy, link

  # split off the query string, if there is one
  pieces = addy.split("?")
  if pieces.length == 2
    puts "there is a query string to parse"
    name_vals = pieces[1].split("&")
    p name_vals
  end

  puts

  "the replacement string cobbled together from the pieces above"
end

puts new_html


--output:--
"<a href=\"mailto:[email protected]\">Some Email</a>"
"mailto:[email protected]"
"Some Email"

"<a href=\"mailto:[email protected]?subject=Something&cost=10\">Some Email</a>"
"mailto:[email protected]?subject=Something&cost=10"
"Some Email"
there is a query string to parse
["subject=Something", "cost=10"]

<html>
<head>
<title>html page</title>
</head>
<body>
the replacement string cobbled together from the pieces above
<div>hello</div>
<div>world</div>
<div>goodbye</div>
the replacement string cobbled together from the pieces above
</body>
</html>
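
To go from that skeleton to an actual replacement string, something along these lines might work. It is only a sketch: mailto_parts and obfuscate_mailto_links are made-up names, obfuscate() is the JavaScript function Joshua proposes and is assumed to exist on the page, and the pattern assumes href is the only attribute on the tag. The arguments are quoted with to_json from the standard json library, which gives valid JavaScript string literals in the common case.

```ruby
require "json"

# Matches only mailto: links; group 1 is the address part, group 2 the link text.
MAILTO_LINK = /<a href="mailto:([^"]+)">(.+?)<\/a>/m

# Splits "user@host?subject=..." into [user, host, subject].
# The subject is an empty string when there is no subject parameter.
def mailto_parts(address)
  mailbox, query = address.split("?", 2)
  user, host = mailbox.split("@", 2)
  params = {}
  query.to_s.split("&").each do |pair|
    name, value = pair.split("=", 2)
    params[name] = value.to_s
  end
  [user, host, params.fetch("subject", "")]
end

# Replaces each mailto link with a <script> element that calls the
# (assumed) obfuscate() JavaScript function with the extracted pieces.
def obfuscate_mailto_links(html)
  html.gsub(MAILTO_LINK) do
    address, text = $1, $2
    user, host, subject = mailto_parts(address)
    args = [user, host, subject, text].map { |arg| arg.to_s.to_json }.join(",")
    %(<script type="text/javascript">obfuscate(#{args})</script>)
  end
end
```

Other query parameters (like cost=10 in the example above) are parsed but dropped, since the proposed obfuscate() call only has a slot for the subject.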
 
7stud --

Joshua said:
I need to detect these email links and substitute them with something
different, like a JavaScript function to obfuscate them:

obfuscate("some","email.xx","Something","Some Email")

By the way, you can't substitute a bare js function call for an <a> tag:
outside a <script> element the browser just displays it as text.
 
Daniel Danopia

You could also use a library such as Hpricot or Nokogiri to search and
replace all the <a> tags.

And you should have \r\n, not \r, if you are writing HTML.
 
Joshua Muheim

Daniel said:
You could also use a library such as Hpricot or Nokogiri to search and
replace all the <a> tags.

And you should have \r\n, not \r, if you are writing HTML.

Thank you guys. Nokogiri looks really useful, I will take a look at it.
:)
 
