Which RegEx Testing Tool Do You Prefer?

Kevin Spencer

Regex Buddy is very good. It costs around $30.00 and has quite a few nice
features, including the ability to copy regular expressions in the string
syntax of various languages, C# among them. It can also create libraries of
regular expressions, and it has a nice visual builder, color-coding, and
quite a bit more. It's a good testing environment, and it includes some nice
reference material.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.
 
clintonG

I was looking at PowerGrep from the same dev group, but as with Regex Buddy,
I don't like the buy-before-you-try business model, so that choice has to
stay on the shelf for the moment. Thanks for bringing it up, though. I
assume you've used Regex Buddy?

<%= Clinton Gallagher

clintonG said:
I'm using an .aspx tool I found at [1], but as nice as the interface is, I
think I need to consider others. I understand some of them can generate C#.
Your preferences, please...

<%= Clinton Gallagher

[1] http://forta.com/books/0672325667/
 
Kevin Spencer

Hi Clinton,

Yes, I have it. I previously used the freeware Regex Coach utility, but it
is nowhere near as complete in its support for newer Regular Expression
syntax and for programming languages in general. It did have one nice
feature: you could split a Regular Expression across multiple lines, which
often made it easier to analyze. However, Regex Buddy has a graphical tree
view that is synchronized with the Regular Expression itself, which more
than makes up for not being able to break a Regular Expression across
multiple lines.

BTW, it also has a GREP utility built in.

In short, it is well worth the 30 bucks.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.
 
Kevin Spencer

I saw a response to this question in the CSharp group about a product
named "Expresso":

http://www.ultrapico.com/Expresso.htm

Expresso is .Net freeware. Having downloaded, installed, and played with
it, I'd say give it a try! So far I have found it excellent: it has
capabilities that Regex Buddy does not, and a much more intuitive GUI.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.
 
clintonG

Thanks, Kevin. I saw that post too and am going to download Expresso in a few
minutes. I know you don't need to be psychic to figure out what I'm likely
to be asking next :)

<%= Clinton Gallagher
 
Juan T. Llibre

The kind of RegEx tool I'd like is one which can take a string
I write, and create a RegEx expression which matches it.

*That* will be the RegEx tool that will corner the market.
 
Kevin Spencer

Hi Juan,

re:
The kind of RegEx tool I'd like is one which can take a string
I write, and create a RegEx expression which matches it.

The problem with that is that you can write a Regular Expression that
matches a literal string quite easily. For example:

literal string

The above is a regular expression which will match the substring "literal
string" in my first sentence. Of course, the real power of regular
expressions is the ability to match *patterns* in a string, perform grouping,
etc. So, like any programming language (which it is, in a sense), Regular
Expressions have a shorthand syntax that allows one to create patterns of a
large variety of types. A simple example of this would be:

(literal) (string)

This captures the same match as the first, but puts the string "literal"
into a group, and the string "string" into a second group. But of course, we
have already exceeded your desired requirement. On the other hand, we have
made a regular expression that is perhaps more useful (in some situations)
than the first.
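A minimal C# sketch of the two patterns above, using System.Text.RegularExpressions. The class name, method names, and sample sentence are illustrative only. Regex.Escape is an addition not mentioned in the thread: it turns arbitrary literal text into a pattern that matches just that text (escaping metacharacters such as '.', '(' and '$'), which covers the plain-literal case of the tool Juan describes.

```csharp
using System.Text.RegularExpressions;

// Illustrative helper class; names are hypothetical.
public static class LiteralPatternDemo
{
    // Plain literal pattern: matches the exact characters "literal string".
    public static string MatchLiteral(string text) =>
        Regex.Match(text, "literal string").Value;

    // Same overall match, but each word is also captured into groups 1 and 2.
    public static (string First, string Second) MatchGrouped(string text)
    {
        Match m = Regex.Match(text, "(literal) (string)");
        return (m.Groups[1].Value, m.Groups[2].Value);
    }

    // Regex.Escape builds a pattern that matches only the given literal text.
    public static bool ContainsLiteral(string text, string literal) =>
        Regex.IsMatch(text, Regex.Escape(literal));
}
```

For example, MatchGrouped("a literal string here") returns ("literal", "string"), while MatchLiteral on the same text returns only the whole "literal string".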

And of course, the possible types and combinations of patterns are almost
endless, including wildcard patterns, special characters, boolean rules, and
so on.

Yeah, it's like reading some kind of incredibly concise shorthand code,
without even line breaks or brackets to help. That's why I was so pleased to
see that Expresso allows you to break your regular expression across
multiple lines while building it. That helps a good bit!
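Outside the tools, .NET code can get a similar effect with RegexOptions.IgnorePatternWhitespace: unescaped whitespace in the pattern is ignored and # starts a comment, so a pattern can be laid out over several lines. A minimal sketch reusing the (literal) (string) example; the class and field names are illustrative.

```csharp
using System.Text.RegularExpressions;

// Illustrative class; names are hypothetical.
public static class VerbosePatternDemo
{
    // With IgnorePatternWhitespace, plain whitespace and newlines in the
    // pattern are layout only, and '#' starts a comment to end of line.
    private const string Pattern = @"
        (literal)   # group 1: the first word
        \           # a backslash-escaped space matches one literal space
        (string)    # group 2: the second word
    ";

    public static readonly Regex LiteralString =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);
}
```

The catch is that a space you actually want to match must be escaped (or written as [ ]), since unescaped whitespace is now treated as layout.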

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.
 
Juan T. Llibre

re:
That's why I was so pleased to see that Expresso allows you to break your regular
expression across multiple lines while building it. That helps a good bit!

I really like its "Analyze" feature. The "Builder" is quite good, too!
 
clintonG

Kevin, have you ever heard the expression "preaching to the choir"? :)

I've got the basic pattern-matching theory understood, but it's the use of
expressions to disallow or replace certain characters and/or strings that
I'm trying to really understand thoroughly. The following example
illustrates...

// Example
Lawn Mowers, Repairs & Services - lawnmowers.com

A typical page title like this, when entered into a TextBox meant to capture
string data for an RSS 2.0 title element, should use &amp; instead of & to
represent the ampersand. I've got an expression that works well for the
example, but I can't figure out (with the expression I have) how to match
the & and replace it with &amp; (yet), or how to use that expression to make
the ASP.NET 2.0 RegularExpressionValidator fail when an & is present in the
string.
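One way to make the validator fail on a bare & is a pattern that accepts the whole string only if every & is the start of "&amp;". A minimal C# sketch, assuming the same pattern string is assigned to the validator's ValidationExpression; the class and member names are illustrative. The explicit ^ and $ anchors make Regex.IsMatch test the entire string, and the pattern only knows about &amp;, so other entities such as &lt; would also be rejected.

```csharp
using System.Text.RegularExpressions;

// Illustrative class; names are hypothetical.
public static class RssTitleCheck
{
    // The whole string must consist of non-'&' characters and complete
    // "&amp;" entities; any other '&' makes the match fail.
    public const string NoBareAmpersand = @"^(?:[^&]|&amp;)*$";

    public static bool IsValidTitle(string title) =>
        Regex.IsMatch(title, NoBareAmpersand);
}
```

With this pattern the example title above fails validation, while the same title written with &amp; passes.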

// Expression
[a-z]+([a-z0-9-]*[a-z0-9]+)?(\.([a-z]+([a-z0-9-]*[a-z0-9]+)?)+)*

I also really appreciate Expresso's Analyzer. It is outstanding that
Expresso seems to make it easy for us to pick expressions apart piece by
piece and explain them in English.


<%= Clinton Gallagher
 
Kevin Spencer

Hi Clinton,

The following Regular Expression will let you do a Regex.Replace on a
string containing both single "&" characters and "&amp;" strings. It
captures each "&amp;" string as a separate match, and each single "&" as its
own match, with the "&" also captured in Group 1. It is also
case-insensitive:

(?i)[^&]+|&amp;|(&(?!amp;))

Here's some sample code for replacing the single "&" characters with
"&amp;":

using System.Text.RegularExpressions;

// Containing class name is illustrative; the two methods are the sample.
public static class AmpersandHelper
{
    /// <summary>
    /// Replaces Ampersand in a Match with "&amp;"
    /// </summary>
    /// <param name="m">Match</param>
    /// <returns>Replaced Match value</returns>
    public static string ampReplacer(Match m)
    {
        // Only a lone "&" populates Group 1; "&amp;" entities and
        // ordinary text come back unchanged.
        if (m.Groups[1].Captures.Count == 0) return m.Value;
        return m.Value.Replace("&", "&amp;");
    }

    /// <summary>
    /// Replaces all single Ampersand characters in a string with "&amp;"
    /// </summary>
    /// <param name="s">String to process</param>
    /// <returns>Processed String</returns>
    public static string ReplaceAmpersand(string s)
    {
        // "&amp;" is tried before the lone "&" alternative, so an existing
        // entity is consumed whole and never escaped twice.
        return Regex.Replace(s, @"(?i)[^&]+|&amp;|(&(?!amp;))",
            new MatchEvaluator(ampReplacer));
    }
}

The "ampReplacer" function is passed as the MatchEvaluator delegate to
Regex.Replace() in the "ReplaceAmpersand" method. "ReplaceAmpersand" takes a
string as an argument and uses Regex.Replace to replace every match that has
a value in Groups[1] with "&amp;"; all other matches are returned unchanged.

As a side note, I used both Expresso and Regex Buddy to come up with this.
It was indeed a challenge, as I'm not quite a master of Regular Expressions.
But I enjoy learning, so it was a good exercise for me! :)

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.
 
clintonG

Hello Kevin,

Well, I'm bright-eyed but not so bushy-tailed this morning. Thanks for
working this out. It's one of those 'must know' issues one needs to be
concerned with when generating valid XML from an application. I'll be
working with it later today, and I'm starting to get a feel for Expresso,
which I have a question about. I've almost come to understand how
expressions are actually processed, which (for me) means I will understand
how I need to think to put them together. You've been a real help again,
and your source is an inspiration that shows how elegant self-documenting
code can be.

As for the Expresso question, what is 1:? supposed to indicate? (That's
the closest I could come to reproducing the rectangular 'non-printable'
character Expresso uses to mark something it has matched.) In the following
simple example it seems to match a white space, although in a confusing way,
as I will point out. But in other examples, with many more characters and
white space in the string to be matched, I have counted to the position
where the ? is said to be matched, and the reported position does not fall
on a white space at all.

// Expression
\w*@\w*\.\w*((\.\w*)*)?

// String to match
An example (e-mail address removed) of an email address.

Expresso reports 1:? at Position 32, Length 0, which implies a white space
in the simple example given. But there were white-space characters before
the matched characters, which makes one ask why Expresso would ignore those
earlier white-space characters. It then reports 2:? at Position 0, Length 0,
which suggests the parser returned to the beginning of the string to be
matched and found... what?

Is this clear as mud or what :)

<%= Clinton Gallagher


Kevin Spencer said:
Hi Clinton,

The following Regular Expression will give you the ability to do a
Regex.Replace on a string containing both single "&" characters and
"&amp;" strings. Runs of other text are matched by the first alternative,
the "&amp;" strings get their own separate matches from the second, and
each single "&" character gets its own match from the third, which puts it
into a Group. It is also case-insensitive:

(?i)[^&]+|&amp;|(&(?!amp;))

Here's some sample code for replacing the single "&" characters with
"&amp;":

using System.Text.RegularExpressions;

/// <summary>
/// Replaces Ampersand in a Match with "&amp;"
/// </summary>
/// <param name="m">Match</param>
/// <returns>Replaced Match value</returns>
public static string ampReplacer(Match m)
{
    // Only a single "&" (the third alternative) populates Group 1;
    // runs of other text and existing "&amp;" strings pass through unchanged.
    if (m.Groups[1].Captures.Count == 0) return m.Value;
    return m.Value.Replace("&", "&amp;");
}

/// <summary>
/// Replaces all single Ampersand characters in a string with "&amp;"
/// </summary>
/// <param name="s">String to process</param>
/// <returns>Processed String</returns>
public static string ReplaceAmpersand(string s)
{
    return Regex.Replace(s, @"(?i)[^&]+|&amp;|(&(?!amp;))",
        new MatchEvaluator(ampReplacer));
}

The "ampReplacer" function is passed as the MatchEvaluator delegate to
Regex.Replace() in the "ReplaceAmpersand" method. "ReplaceAmpersand" takes
a string as an argument and uses Regex.Replace to replace the "&" with
"&amp;" in every match that has a value in Groups[1].
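Regex.Replace copies any unmatched text through unchanged, so a single negative lookahead may be enough for this particular job. Below is a minimal sketch of that alternative. It is a swapped-in technique, not the code above, and the helper name EscapeLoneAmpersands is hypothetical. Like the code above, it would also turn other entities such as "&lt;" into "&amp;lt;".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replace every '&' that does not already begin an
// "&amp;" string. (?i) makes the lookahead case-insensitive, so "&AMP;"
// is left alone too. Text that is not matched is copied unchanged.
static string EscapeLoneAmpersands(string s) =>
    Regex.Replace(s, "(?i)&(?!amp;)", "&amp;");

string title = "Lawn Mowers, Repairs & Services - lawnmowers.com";
Console.WriteLine(EscapeLoneAmpersands(title));
// Lawn Mowers, Repairs &amp; Services - lawnmowers.com
Console.WriteLine(EscapeLoneAmpersands("Tom &amp; Jerry & Co"));
// Tom &amp; Jerry &amp; Co
```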

As a side note, I used both Expresso and Regex Buddy to come up with this.
It was indeed a challenge, as I'm not quite a master of Regular
Expressions. But I enjoy learning, so it was a good exercise for me! :)

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

clintonG said:
Kevin, have you ever heard the expression "preaching to the choir?" :)

I've got the basic pattern-matching theory understood, but it's the use of
expressions to disallow or replace certain characters and/or strings that
I'm trying to really understand thoroughly. The following example
illustrates...

// Example
Lawn Mowers, Repairs & Services - lawnmowers.com

This is a typical page title. When it is entered into a TextBox meant to
capture string data for an RSS 2.0 title element, it should use &amp;
instead of & to represent the ampersand. I've got an expression that works
well for the example, but I can't figure out (with the expression I have)
how to match the & and replace it with &amp; (yet), or how to use the
expression I have to force the ASP.NET 2.0 Regular Expression Validator to
fail when the & is present in the string.

// Expression
[a-z]+([a-z0-9-]*[a-z0-9]+)?(\.([a-z]+([a-z0-9-]*[a-z0-9]+)?)+)*
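On forcing the validator to fail: my understanding is that ASP.NET's RegularExpressionValidator only passes when the expression matches the entire input, not just part of it. If so, a pattern that excludes '&' at every position does the job. This is a minimal sketch that imitates that whole-string check in plain C#; IsWholeMatch is a hypothetical helper, not part of ASP.NET.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that imitates a whole-string check: the match must
// start at index 0 and cover the entire input.
static bool IsWholeMatch(string input, string pattern)
{
    Match m = Regex.Match(input, pattern);
    return m.Success && m.Index == 0 && m.Length == input.Length;
}

// "One or more characters other than '&'", which must span the whole input.
const string NoAmpersand = "[^&]+";

Console.WriteLine(IsWholeMatch("Lawn Mowers, Repairs and Services", NoAmpersand)); // True
Console.WriteLine(IsWholeMatch("Lawn Mowers, Repairs & Services", NoAmpersand));   // False
```

With "&" present, [^&]+ can only match the text before the ampersand, so the match no longer covers the whole input and the check fails.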

I also really appreciate Expresso's Analyzer. It's outstanding how easy
Expresso makes it to pick expressions apart piece by piece and explain them
in English.


<%= Clinton Gallagher






Kevin Spencer said:
Hi Juan,

The kind of RegEx tool I'd like is one which can take a string
I write, and create a RegEx expression which matches it.

The problem with that is that a Regular Expression which matches a literal
string is trivial to write: for a string without special characters, the
string itself is the expression. For example:

literal string

The above is a regular expression which will match the substring
"literal string" in my first sentence. Of course, the real power of
regular expressions is the ability to match *patterns* in a string,
perform grouping, etc. So, like any programming language (which it is,
in a sense), Regular Expressions have a shorthand syntax that allows one
to create patterns of a large variety of types. A simple example of this
would be:

(literal) (string)

This captures the same match as the first, but puts the string "literal"
into a group, and the string "string" into a second group. But of
course, we have already exceeded your desired requirement. On the other
hand, we have made a regular expression that is perhaps more useful (in
some situations) than the first.
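Here is how the two patterns above behave in .NET. This is a small sketch; the sample sentence is the opening sentence of this reply.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "The problem with that is that you can write a Regular Expression " +
              "that matches a literal string quite easily.";

// The plain pattern matches the text but captures nothing extra.
Match plain = Regex.Match(text, "literal string");

// The grouped pattern matches the same text and also captures two groups.
Match grouped = Regex.Match(text, "(literal) (string)");

Console.WriteLine(plain.Value == grouped.Value);  // True
Console.WriteLine(grouped.Groups[1].Value);       // literal
Console.WriteLine(grouped.Groups[2].Value);       // string
```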

And of course, the possible types and combinations of patterns are
almost endless, including wildcard patterns, special characters, boolean
rules, and so on.

Yeah, it's like reading some kind of incredibly concise shorthand code,
without even line breaks or brackets to help. That's why I was so
pleased to see that Expresso allows you to break your regular expression
across multiple lines while building it. That helps a good bit!

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

The kind of RegEx tool I'd like is one which can take a string
I write, and create a RegEx expression which matches it.

*That* will be the RegEx tool that will corner the market.
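For the purely literal case, .NET already provides part of such a tool. Regex.Escape turns any string into a pattern that matches exactly that string, escaping characters such as '.', '(' and '$'. Generalizing from one example string to a *pattern* is the hard part such a tool would have to solve. This is a small sketch with an arbitrary sample string.

```csharp
using System;
using System.Text.RegularExpressions;

string sample = "Total (incl. tax): $9.99";

// Escape every regex metacharacter so the pattern matches the text literally.
string pattern = Regex.Escape(sample);
Console.WriteLine(pattern);

// The escaped pattern matches the original text inside a longer string...
Console.WriteLine(Regex.IsMatch("Invoice - Total (incl. tax): $9.99", pattern)); // True
// ...but not a near-miss where an unescaped '.' would have matched any character.
Console.WriteLine(Regex.IsMatch("Total (inclX tax): $9.99", pattern));           // False
```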





Thanks Kevin. I saw that post too and am going to download Expresso in
a few minutes. I know you don't need to be psychic to figure out what
I'm likely to be asking next :)

<%= Clinton Gallagher


I saw a response to this question in the CSharp group, regarding a
product named "Expresso"

http://www.ultrapico.com/Expresso.htm

Expresso is .Net freeware, and after downloading, installing, and
playing with it, I'd recommend giving it a try! So far I have found it to
be excellent, having capabilities that Regex Buddy does not have, and a
much more intuitive GUI.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

 

Kevin Spencer

Hi Clinton,

Regular Expressions are a bear to learn, even if you have good tools to
work with them. I've spent hours working out a relatively "simple" one (at
least it seemed simple at first), learning a bit more with each hour.
Still, I'm a long way from being an expert. I can read most of it fairly
well by now, but certain concepts are still a bit difficult to deal with. I
still struggle some with Lookarounds in particular. One thing to keep in
mind is that Regular Expressions consume a string as they move through it,
with a few exceptions (like Lookarounds). They are basically sequential in
nature.
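Here is a quick way to see the "consuming" behavior and the Lookaround exception. A lookahead checks what follows without consuming it, so the checked text is not part of the match. This is a small sketch with made-up sample input.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Price: 100USD";

// Consuming: "USD" is matched, so it becomes part of the match value.
Match consuming = Regex.Match(input, @"\d+USD");

// Lookahead (?=USD): "USD" must follow the digits, but it is not consumed.
Match lookahead = Regex.Match(input, @"\d+(?=USD)");

Console.WriteLine(consuming.Value);  // 100USD
Console.WriteLine(lookahead.Value);  // 100
```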

You may find the "Analyze" tool helpful with this sort of thing.
Fortunately, I have not 2 but THREE Regular Expression tools to work with (2
of them are Freeware), which lets me use whichever one is best for the
particular Regular Expression or problem I'm working on.

The expression you posted,

\w*@\w*\.\w*((\.\w*)*)?

can be analyzed in words as follows (showing the part of the e-mail address
each step consumes, starting where the match begins):

Match any word character, zero or more times (\w*): someone
Next, match the '@' character once (@): @
Next, match any word character, zero or more times (\w*): somewhere
Next, match the '.' character once (\.): .
Next, match any word character, zero or more times (\w*): com
Next, put the following into Group 1, zero or one time: (...)?
    Match the following into Group 2, zero or more times: (...)*
        Match the '.' character once: \.
        Match any word character, zero or more times: \w*
Result of Group 1, (\.\w*)*: Group 2 repeated zero times (nothing)
Result of Group 2, \.\w*: nothing

Basically, neither group captures any text, as the only '.' in the domain
has already been consumed by the earlier \. in the expression. However, as
both Groups specify a minimum of zero occurrences, they don't disqualify
the Match.

Why does Expresso report Group 1 at position 32? That is where the match
ends, so that's where Group 1's null (empty) match begins. Why does Expresso
report Group 2 at position 0? Well, I'm not that good with it!
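To see what Expresso is reporting, you can inspect the groups directly in .NET. This sketch uses a hypothetical sample string, with user@example.com standing in for the address removed above. As far as I know, a .NET group that did not take part in the match has Success == false, Index 0 and Length 0. That would explain the "Position 0, Length 0" report for Group 2. Group 1 is left with an empty capture where the match ends.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample: "An example " is 11 characters long, so the
// address starts at index 11 and the 16-character match ends at index 27.
string input = "An example user@example.com of an email address.";
Match m = Regex.Match(input, @"\w*@\w*\.\w*((\.\w*)*)?");

Console.WriteLine($"Match: '{m.Value}' at {m.Index}, length {m.Length}");

// Report every group the way a tester tool would: success, index, length.
for (int i = 1; i < m.Groups.Count; i++)
{
    Group g = m.Groups[i];
    Console.WriteLine($"Group {i}: Success={g.Success} Index={g.Index} Length={g.Length}");
}
```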

Still, your regular expression is a bit lax in terms of standards. We worked
one up for valid email addresses the other day, and you may want to borrow
it:

(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))

It is case-insensitive, and matches both domain name and IP domain email
addresses. It puts the results into 4 possible groups:

1-User Name, 2-Domain IP Address, 3-Domain Name, 4-Root Domain.

Note that group 2 and groups 3 and 4 are mutually exclusive. The email
address can contain either an IP address or a named domain, but not both. It
supports 2-letter country suffixes and multiple-dot domain addresses.

I'm not sure we covered all the possible permutations, but it's pretty
strong.
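Here is a minimal sketch of reading the four groups in C#. The sample addresses are made up, and which groups are filled depends on which alternative matched.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the post above, in a verbatim string.
const string EmailPattern =
    @"(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))";

foreach (string address in new[] { "someone@mail.example.co.uk", "admin@192.168.0.1" })
{
    Match m = Regex.Match(address, EmailPattern);
    // Group 2 and groups 3/4 are alternatives, so only one side is filled;
    // a group that did not participate has an empty Value.
    Console.WriteLine($"{address}: user={m.Groups[1].Value} ip={m.Groups[2].Value} " +
                      $"domain={m.Groups[3].Value} root={m.Groups[4].Value}");
}
```

Because (?:\.[-a-z0-9]+)* is greedy and then backtracks, a multiple-dot domain puts everything except the last label into group 3. For example, "mail.example.co" lands in group 3 and "uk" in group 4.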

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.


 

clintonG

Hello Kevin,

With the exception of those bass-ackward Lookarounds, when you say
expressions are sequential in nature, that is exactly what I am trying to
nail down.

What confuses me is that they appear to be constructed like algebraic
expressions out of set theory, whose terms are resolved from the innermost
set working outwards, yet when a tool processes them they appear to be
parsed linearly, as if serialized. Does that make sense to you?
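Here is a small illustration of the "linear" side, using made-up input. As I understand it, the .NET engine walks the pattern and the string from left to right. When a later piece fails, it backtracks into earlier choices rather than resolving nested groups inside-out; the parentheses mainly mark what to capture. An atomic group, which disables backtracking, makes the difference visible.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "aaab";

// The engine works left to right: (a*) first grabs all three a's, then
// (ab) fails on "b", so the engine backtracks and a* gives one 'a' back.
Match greedy = Regex.Match(input, "(a*)(ab)");
Console.WriteLine($"{greedy.Groups[1].Value} + {greedy.Groups[2].Value}"); // aa + ab

// An atomic group (?>...) forbids backtracking into it, so a* keeps every
// 'a' it took and "ab" can never match; the whole match fails.
bool atomic = Regex.IsMatch(input, "(?>a*)ab");
Console.WriteLine(atomic); // False
```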

I sure wish Microsoft would publish a document explaining how expressions
are parsed, the same way they have explained how the page life cycle is
processed. The control tree is easy to understand theoretically. I can only
wonder why regular expressions can't be mapped and explained in a similar
way.

I've tried the following:

* RegexDesigner.Net
* Rad Regex Designer
* Regular Expression Tester
* Regulator RegEx
* Expresso

Downloading this afternoon...

* Regex Coach [1] donationware
which is also said to have an English-language 'analyzer' like Expresso's.
I consider Expresso to have the best interface and functionality, but it
still can't make a genius out of a dummy :).

I'm going to delve into some lists and forums [2] for the next week to see
what I can learn.
This book [3] sounds interesting. I'll take a trip to Barnes and Noble to
see if it's on the shelves.
I also have this book [4], which is very good despite what the "10 Minutes"
in its title implies, and of course I'll be looking for [5] from O'Reilly.

Let's see what that does for now.

<%= Clinton Gallagher


[1] http://www.weitz.de/regex-coach/
[2] http://regexadvice.com/
[3] http://books.slashdot.org/article.pl?sid=05/03/22/0810243&tid=156&tid=192&tid=6
[4] http://forta.com/books/0672325667/
[5] http://www.oreilly.com/catalog/regex2/index.html?CMP=ILL-4GV796923290

I appreciate your interest in this context. I've discovered a regular
expression mailing list that emanates from

Kevin Spencer said:
Hi Clinton,

Regular Expressions are a bear to learn, ieven if you have good tools to
work with them. I've spent hours working out a relatively "simple" one (at
least it seemed simple at first), but learning a bit more with each hour.
Still, I'm a long way from an expert. I can read most of it fairly well by
now, but certain concepts are still a bit difficult to deal with. I still
struggle some with Lookarounds in particular. One thing to keep in mind is
that Regular Expressions consume a string as they move through it, with a
few exceptions (like Lookarounds). They are basically sequential in
nature.

You may find the "Analyze" tool helpful with this sort of thing.
Fortunately, I have not 2 but THREE Regular Expression tools to work with
(2 of them are Freeware), which enables me to use the one(s) that are best
for the particular type of work I need regarding any individual Regular
Expression and/or problem with one.

The expression you posted,

\w*@\w*\.\w*((\.\w*)*)?

Can be analyzed in so many words as (with the parsing of the email address
where the match begins):

Match any word character, zero or more times. \w* someone
Next, Match the '@' character once. @ @
Next match any word character zero or more times \w* somewhere
Next, Match the '.' character once \.
.
Next, Match any word character zero or more times \w* com
Next, put the following into Group 1 zero or 1 time: (......)?
Match the following into Group 2 zero or more times: (......)*
Match the '.' character once \.
Match any word character zero or more times \w*
Result of Group 1: (\.\w*)* Group 2 (Nothing)
Result of Group 2 \.\w* Nothing

Basically, there is no match for either Group 1 or Group 2, as the '.' has
been consumed by the previous Match. However, as both Groups specify a
minimum of Zero times, they don't disqualify the Match, as they appear
zero times each.

Why does Expresso report Group 1 at position 32 (end of string)? Well, no
match has been returned prior to the end of the string. So, that's where
the null match begins. Why does Expressio begin at position 0? Well, I'm
not that good with it!

Still, your regular expression is a bit lax in terms of standards. We
worked one up for valid email addresses the other day, and you may want to
borrow it:

(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))

It is case-insensitive, and matches both domain name and IP domain email
addresses. It puts the results into 4 possible groups:

1-User Name, 2-Domain IP Address, 3-Domain Name, 4-Root Domain.

Note that groups 2 and (3,4) are exclusive of one another. The email
address can either be an IP address, or a named domain, but not both. It
supports 2-letter country suffixes, and multiple-dot domain addresses. And
it's case-sensitive.

I'm not sure we covered all the possible permutations, but it's pretty
strong.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.


Basically, the whole string has been consumed by the
clintonG said:
Hello Kevin,

Well I'm bright eyed but not so bushy-tailed this morning. Thanks for
working this out. Its one of those 'must know' issues one needs to be
concerned with when generating valid XML from an application. I'll be
working with it later today and I'm starting to get a feel for Expresso
which I have a question about. I'm at the point where I've almost come to
understand how expressions are actually processed which -- for me --
means I will understand how I need to think to put them together. You've
been a real help again and your source is an inspiration which shows how
elegant self-documenting code can be.

As for the Expresso question, what is 1:? supposed to indicate? (noting
that's the closest I could come at the moment to replicate the
rectangular 'non-printable' character Expresso uses to indicate some
'thing' it has matched) In the following simple example it seems to match
a white space although in a manner that is confusing as I will point out
but in other examples with many more characters and white space in the
string to be matched I have counted the position where the ? is said to
be matched and the position reported does not fall on a white space at
all.

// Expression
\w*@\w*\.\w*((\.\w*)*)?

// String to match
An example (e-mail address removed) of an email address.

Expresso reports 1:? at Postion 32 Length 0 which infers white space in
the simple example as given noting there was white space characters
before the matched characters and motivating one to ask why Expresso
would ignore those previous white space characters and then report 2:? at
Position 0 Length 0 which suggests the parser returned to the beginning
of the string to be matched and found what?

Is this clear as mud or what :)

<%= Clinton Gallagher


Kevin Spencer said:
Hi Clinton,

The following Regular Expression will give you the ability to do a
Regex.Replace on a string containing both single "&" characters and
"&amp;" strings. It captures the "&amp;" strings into their own separate
matches, and the "&" characters into their own matches, putting the "&"
characters into a Group. It is also case-insensitive:

(?i)[^&amp;][^&]*|&amp;|(&(?!=amp))

Here's some sample code for reeplacing the single "&" characters with
&amp; -

/// <summary>
/// Replaces Ampersand in a Match with "&amp;"
/// </summary>
/// <param name="m">Match</param>
/// <returns>Replaced Match value</returns>
public static string ampReplacer(Match m)
{
if (m.Groups[1].Captures.Count == 0) return m.Value;
return m.Value.Replace("&", "&amp;");
}

/// <summary>
/// Replaces all single Ampersand characters in a string with "&amp;"
/// </summary>
/// <param name="s">String to process</param>
/// <returns>Processed String</returns>
public static string ReplaceAmpersand(string s)
{
return Regex.Replace(s, @"(?i)[^&amp;][^&]*|&amp;|(&(?!=amp))",
new MatchEvaluator(ampReplacer));
}

The "ampReplacer function is the function passed as the MatchEvaluator
delegate in the Regex.Replace() method used in the "ReplaceAmpersand"
method. The "ReplaceAmpersand" method takes a string as an argument, and
uses Regex.Replace to replace all matches in the string that contain a
value in Groups[1] with "&amp;".

As a side note, I used both Expresso and Regex Buddy to come up with
this. It was indeed a challenge, as I'm not quite a master of Regular
Expressions. But I enjoy learning, so it was a good exercise for me! :)

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

message Kevin, have you ever heard the expression "preaching to the choir?" :)

I've got the basic pattern matching theory understood but its the use
of expressions to disallow or replace certain characters and/or strings
that I'm trying to really understand thoroughly. The following example
illustrates...

// Example
Lawn Mowers, Repairs & Services - lawnmowers.com

A typical page title that when entered into a TextBox meant to capture
string data for an RSS 2.0 title element should use &amp; instead of
the & to represent the ampersand. I've got an expression that works
well for the example but can't figure out (with the expression I have)
how to match the & and replace it with &amp; (yet) -- or -- how to use
the expression I have to force the 2.0 Regular Expression Validator to
fail when the & is present in the string.

// Expression
[a-z]+([a-z0-9-]*[a-z0-9]+)?(\.([a-z]+([a-z0-9-]*[a-z0-9]+)?)+)*

I also really appreciate Expresso's Analyzer. It is outstanding that
Expresso seems to make it easy for us to pick expressions apart piece
by piece and explain them in English.


<%= Clinton Gallagher






Hi Juan,

The kind of RegEx tool I'd like is one which can take a string
I write, and create a RegEx expression which matches it.

The problem with that is that you can write a Regular Expression that
matches a literal string quite easily. For example:

literal string

The above is a regular expression which will match the substring
"literal string" in my first sentence. Of course, the real power of
regular expressions is the abilty to match *patterns* in a string,
perform grouping, etc. So, like any programming language (which it is,
in a sense), Regular Expressions have a shorthand syntax that allows
one to create patterns of a large variety of types. A simple example
of this would be:

(literal) (string)

This captures the same match as the first, but puts the string
"literal" into a group, and the string "string" into a second group.
But of course, we have already exceeded your desired requirement. On
the other hand, we have made a regular expression that is perhaps more
useful (in some situations) than the first.

And of course, the possible types and combinations of patterns are
almost endless, including wildcard patterns, special characters,
boolean rules, and so on.

Yeah, it's like reading some kind of incredibly concise shorthand
code, without even line breaks or brackets to help. That's why I was
so pleased to see that Expresso allows you to break your regular
expression across multiple lines while building it. That helps a good
bit!

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

The kind of RegEx tool I'd like is one which can take a string
I write, and create a RegEx expression which matches it.

*That* will be the RegEx tool that will corner the market.





message Thanks Kevin. I saw that post too and am going to download Expresso
in a few minutes. I know you don't need to be psychic to figure out
what I'm likely to be asking next :)

<%= Clinton Gallagher


I saw a response to this question in the CSharp group, regarding a
product named "Expresso"

http://www.ultrapico.com/Expresso.htm

Expresso is .Net freeware, and after downloading, installing, and
playing with it, I'd say give it a try! So far I have found it to be
excellent, having capabilities that Regex Buddy does not have, and
a much more intuitive GUI.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

Hi Clinton,

Yes, I have it. I previously used the freeware Regex Coach
Utility, but it is nowhere near as complete in its support for
various newer Regular Expression syntax and programming languages
in general. It did have one nice feature about it. You could split
a Regular Expression across multiple lines, which often made it
easier to analyze. However, Regex Buddy has the graphical tree
view, and it is synchronized with the Regular Expression itself,
which more than makes up for the omission of breaking a Regular
Expression across multiple lines.

BTW, it also has a GREP utility built in.

In short, it is well worth the 30 bucks.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

I was looking at PowerGrep from the same dev group, but as with Regex
Buddy, I don't like the buy-before-you-try business model, so that
choice has to stay on the shelf for the moment. Thanks for
bringing it up, though. I assume you've used Regex Buddy?

<%= Clinton Gallagher



Regex Buddy is very good. It costs around $30.00 and has quite
a few nice features, including the ability to copy regular
expressions in the string syntaxes of various languages, C# among them.
It has the ability to create libraries of regular expressions, a
nice visual builder, color-coding, and quite a bit more. It's a good
testing environment, and it has some nice reference material
included.
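Here is a rough illustration of what "language string syntaxes" means for C#. The same pattern can be written as a verbatim string or as a regular string literal with every backslash doubled. This is a hand-written sketch, not output copied from Regex Buddy, and the sample text is made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim string: the backslashes reach the regex engine unchanged.
string verbatim = @"\w+\.\w+";

// Regular string literal: each regex backslash has to be escaped as "\\".
string escaped = "\\w+\\.\\w+";

Console.WriteLine(verbatim == escaped);                                // True
Console.WriteLine(Regex.Match("see lawnmowers.com", verbatim).Value); // lawnmowers.com
```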

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

I'm using an .aspx tool I found at [1], but as nice as the
interface is, I think I need to consider others. Some can
generate C#, I understand. Your preferences, please...

<%= Clinton Gallagher

[1] http://forta.com/books/0672325667/

clintonG

Regex Coach is very interesting. It has a unique tree that graphically
represents each part of the expression as well as an English 'Analyzer.'

<%= Clinton Gallagher


clintonG said:
Hello Kevin,

With the exception of bass-ackward Lookarounds, when you say expressions are
sequential in nature, that is exactly what I am trying to nail down.

What confuses me is that they appear to be constructed like set-theory or
algebraic expressions, whose terms are resolved from the innermost set
working outwards, yet when a tool processes them they appear to be parsed
linearly, as if serialized. Does that make sense to you?

I sure wish Microsoft would publish a document explaining how
expressions are parsed, the same way they have explained how the life cycle
of a page is processed. The control tree is easy to understand
theoretically. I can only wonder why regular expressions cannot be
similarly mapped and explained.

I've tried the following:

* RegexDesigner.Net
* Rad Regex Designer
* Regular Expression Tester
* Regulator RegEx
* Expresso

Downloading this afternoon...

* Regex Coach [1] donationware
Which is also said to have an English-language 'analyzer' like Expresso,
which I consider to have the best interface and functionality, but which
still can't make a genius out of a dummy :).

I'm going to delve into some lists and forums [2] for the next week to
see what I can learn.
This book [3] sounds interesting. I'll take a trip to Barnes and Noble to
see if it's on the shelves.
I also have this book [4], which is very good despite what the "10 Minutes"
in its title implies, and of course I'll be looking for [5] from O'Reilly.

Let's see what that does for now.

<%= Clinton Gallagher


[1] http://www.weitz.de/regex-coach/
[2] http://regexadvice.com/
[3] http://books.slashdot.org/article.pl?sid=05/03/22/0810243&tid=156&tid=192&tid=6
[4] http://forta.com/books/0672325667/
[5] http://www.oreilly.com/catalog/regex2/index.html?CMP=ILL-4GV796923290

I appreciate your interest in this context. I've discovered a regular
expression mailing list that emanates from

Kevin Spencer said:
Hi Clinton,

Regular Expressions are a bear to learn, even if you have good tools to
work with them. I've spent hours working out a relatively "simple" one
(at least it seemed simple at first), learning a bit more with each
hour. Still, I'm a long way from being an expert. I can read most of it
fairly well by now, but certain concepts are still a bit difficult to deal
with. I still struggle some with Lookarounds in particular. One thing to
keep in mind is that Regular Expressions consume a string as they move
through it, with a few exceptions (like Lookarounds, which check what comes
before or after the current position without consuming it). They are
basically sequential in nature.
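Here is a minimal C# sketch of the "consume" point, using made-up sample text. The first pattern consumes the USD suffix. The lookahead version only checks that the suffix follows and leaves it unconsumed, so it is not part of the match.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Total: 100USD";

// Consumes the digits and then the letters "USD", so both end up in the match.
Match consuming = Regex.Match(text, @"\d+USD");

// The lookahead (?=USD) only checks that "USD" comes next; it consumes nothing,
// so the match ends right after the digits.
Match lookahead = Regex.Match(text, @"\d+(?=USD)");

Console.WriteLine(consuming.Value); // 100USD
Console.WriteLine(lookahead.Value); // 100
```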

You may find the "Analyze" tool helpful with this sort of thing.
Fortunately, I have not 2 but THREE Regular Expression tools to work with
(2 of them are Freeware), which enables me to use the one(s) that are
best for the particular type of work I need regarding any individual
Regular Expression and/or problem with one.

The expression you posted,

\w*@\w*\.\w*((\.\w*)*)?

can be analyzed, in so many words, as follows (alongside the part of the
email address each step matches, starting where the match begins):

Step                                                     Token       Matches
Match any word character, zero or more times.            \w*         someone
Next, match the '@' character once.                      @           @
Next, match any word character, zero or more times.      \w*         somewhere
Next, match the '.' character once.                      \.          .
Next, match any word character, zero or more times.      \w*         com
Next, put the following into Group 1, zero or 1 time:    (......)?
  Match the following into Group 2, zero or more times:  (......)*
    Match the '.' character once.                        \.
    Match any word character, zero or more times.        \w*
Result of Group 1:                                       (\.\w*)*    Group 2 (Nothing)
Result of Group 2:                                       \.\w*       Nothing

Basically, neither Group 1 nor Group 2 has anything to match, as the only
'.' has already been consumed by the earlier \. in the pattern. However, as
both Groups specify a minimum of zero times, they don't disqualify the
Match; they simply end up matching nothing.

Why does Expresso report Group 1 at position 32 (the end of the match)?
Well, nothing is left to match at that point, so that's where its empty
(null) match begins. Why does Expresso report Group 2 at position 0? Well,
I'm not that good with it!
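The following minimal C# sketch checks those positions outside Expresso. It assumes the address behind "(e-mail address removed)" was someone@somewhere.com, which is the address the walk-through above parses. With that string, the match runs from position 11 to position 32.

In .NET, the optional outer group is attempted once and captures an empty string at the end of the match. That would explain Group 1 at position 32, Length 0. Group 2 never captures anything. .NET reports a group that did not take part in the match with Success false, Index 0 and Length 0, which would explain the 2:? entry at position 0.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed test string: the address is reconstructed from the walk-through above.
string input = "An example someone@somewhere.com of an email address.";
Match m = Regex.Match(input, @"\w*@\w*\.\w*((\.\w*)*)?");

// The overall match starts after "An example " and ends after "com".
Console.WriteLine($"Match:   '{m.Value}' at {m.Index}, length {m.Length}");

// Group 1: the optional outer group is tried once and captures an empty string.
Group g1 = m.Groups[1];
Console.WriteLine($"Group 1: success={g1.Success} at {g1.Index}, length {g1.Length}");

// Group 2: (\.\w*)* matched zero times, so it never captured anything,
// and a group that did not participate reports Index 0 and Length 0.
Group g2 = m.Groups[2];
Console.WriteLine($"Group 2: success={g2.Success} at {g2.Index}, length {g2.Length}");
```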

Still, your regular expression is a bit lax in terms of standards. We
worked one up for valid email addresses the other day, and you may want
to borrow it:

(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))

It is case-insensitive, and matches email addresses whose domain is
either a domain name or an IP address. It puts the results into 4 possible
groups:

1-User Name, 2-Domain IP Address, 3-Domain Name, 4-Root Domain.

Note that groups 2 and (3,4) are exclusive of one another. The email
address can have either an IP address or a named domain, but not both. It
supports 2-letter country suffixes and multiple-dot domain addresses.

I'm not sure we covered all the possible permutations, but it's pretty
strong.
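Here is a minimal C# sketch of reading the four groups described above. The sample addresses are made up, and the helper name Show is hypothetical. A group that did not take part in the match has Success == false, which is how the IP-address and named-domain branches can be told apart.

```csharp
using System;
using System.Text.RegularExpressions;

// The email pattern quoted above, unchanged.
const string EmailPattern =
    @"(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))";

// Hypothetical helper: prints groups 1-4, showing "(none)" for a group that did not take part.
void Show(string address)
{
    Match m = Regex.Match(address, EmailPattern);
    string G(int i) => m.Groups[i].Success ? m.Groups[i].Value : "(none)";
    Console.WriteLine($"{address}: user={G(1)}, ip={G(2)}, domain={G(3)}, root={G(4)}");
}

Show("someone@somewhere.com");      // named domain: groups 1, 3 and 4
Show("Someone@Mail.Example.CO.UK"); // case-insensitive, several dots, 2-letter suffix
Show("someone@192.168.0.1");        // IP address: groups 1 and 2
```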

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.


Basically, the whole string has been consumed by the
clintonG said:
Hello Kevin,

Well, I'm bright-eyed but not so bushy-tailed this morning. Thanks for
working this out. It's one of those 'must know' issues one needs to be
concerned with when generating valid XML from an application. I'll be
working with it later today, and I'm starting to get a feel for Expresso,
which I have a question about. I'm at the point where I've almost come
to understand how expressions are actually processed, which -- for me --
means I will understand how I need to think to put them together. You've
been a real help again, and your source is an inspiration that shows how
elegant self-documenting code can be.

As for the Expresso question, what is 1:? supposed to indicate? (That's
the closest I could come at the moment to replicating the rectangular
'non-printable' character Expresso uses to indicate some 'thing' it has
matched.) In the following simple example it seems to match a white
space, although in a confusing manner, as I will point out. But in other
examples, with many more characters and white space in the string to be
matched, I have counted to the position where the ? is said to be
matched, and the position reported does not fall on a white space at all.

// Expression
\w*@\w*\.\w*((\.\w*)*)?

// String to match
An example (e-mail address removed) of an email address.

Expresso reports 1:? at Position 32, Length 0, which implies white space in
the simple example as given. Since there were white space characters
before the matched characters, one is motivated to ask why Expresso
would ignore those earlier white space characters, and then report 2:?
at Position 0, Length 0, which suggests the parser returned to the
beginning of the string to be matched and found what?

Is this clear as mud or what :)

<%= Clinton Gallagher


Hi Clinton,

The following Regular Expression will give you the ability to do a
Regex.Replace on a string containing both single "&" characters and
"&amp;" strings. It captures the "&amp;" strings into their own
separate matches, and the "&" characters into their own matches,
putting the "&" characters into a Group. It is also case-insensitive:

(?i)[^&]+|&amp;|(&(?!amp;))

Here's some sample code for replacing the single "&" characters with
&amp; -

using System.Text.RegularExpressions;

/// <summary>
/// Replaces Ampersand in a Match with "&amp;"
/// </summary>
/// <param name="m">Match</param>
/// <returns>Replaced Match value</returns>
public static string ampReplacer(Match m)
{
    // Only a bare "&" (the third alternative) populates Groups[1];
    // plain text runs and existing "&amp;" entities are returned unchanged.
    if (!m.Groups[1].Success) return m.Value;
    return m.Value.Replace("&", "&amp;");
}

/// <summary>
/// Replaces all single Ampersand characters in a string with "&amp;"
/// </summary>
/// <param name="s">String to process</param>
/// <returns>Processed String</returns>
public static string ReplaceAmpersand(string s)
{
    return Regex.Replace(s, @"(?i)[^&]+|&amp;|(&(?!amp;))",
        new MatchEvaluator(ampReplacer));
}

The "ampReplacer" function is the function passed as the MatchEvaluator
delegate in the Regex.Replace() call used in the "ReplaceAmpersand"
method. The "ReplaceAmpersand" method takes a string as an argument,
and uses Regex.Replace to replace the "&" in every match that has a
value in Groups[1] with "&amp;".

As a side note, I used both Expresso and Regex Buddy to come up with
this. It was indeed a challenge, as I'm not quite a master of Regular
Expressions. But I enjoy learning, so it was a good exercise for me! :)
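Below is a self-contained C# sketch of the same Replace-with-evaluator technique. It is written as a top-level program, with a lambda in place of the named MatchEvaluator, and is applied to the "Lawn Mowers" title from this thread. It assumes the pattern (?i)[^&]+|&amp;|(&(?!amp;)). With that pattern, runs of non-& text and existing &amp; entities are matched and returned unchanged, and only a bare & lands in Group 1.

```csharp
using System;
using System.Text.RegularExpressions;

// Runs of non-'&' text and existing "&amp;" entities are matched and returned as-is;
// only a bare '&' (not followed by "amp;") is captured in group 1 and rewritten.
string ReplaceAmpersand(string s) =>
    Regex.Replace(s, @"(?i)[^&]+|&amp;|(&(?!amp;))",
        m => m.Groups[1].Success ? "&amp;" : m.Value);

string title = "Lawn Mowers, Repairs & Services - lawnmowers.com";
Console.WriteLine(ReplaceAmpersand(title));
// Lawn Mowers, Repairs &amp; Services - lawnmowers.com

// Text that already contains an entity keeps it; only the bare '&' changes.
Console.WriteLine(ReplaceAmpersand("Fish &amp; Chips & Ale"));
// Fish &amp; Chips &amp; Ale
```

Because the existing entity is matched by its own alternative before the bare-& alternative is tried, running the method twice on the same text does not double-escape it.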

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

Kevin, have you ever heard the expression "preaching to the choir?"
:)

I've got the basic pattern matching theory understood, but it's the use
of expressions to disallow or replace certain characters and/or
strings that I'm trying to really understand thoroughly. The following
example illustrates...

// Example
Lawn Mowers, Repairs & Services - lawnmowers.com

This is a typical page title. When it is entered into a TextBox meant to
capture string data for an RSS 2.0 title element, it should use &amp;
instead of & to represent the ampersand. I've got an expression that
works well for the example, but I can't figure out (yet) how to use it to
match the & and replace it with &amp;, or how to use the expression I
have to force the ASP.NET 2.0 Regular Expression Validator to fail when
the & is present in the string.

// Expression
[a-z]+([a-z0-9-]*[a-z0-9]+)?(\.([a-z]+([a-z0-9-]*[a-z0-9]+)?)+)*
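On the validator half of the question, one possible approach is to accept a title only when every & is the start of an &amp; entity. Below is a minimal C# sketch with a hypothetical helper name, IsSafeTitle. A pattern of this shape might serve as the ValidationExpression of an ASP.NET RegularExpressionValidator, though that use is untested here. The sketch anchors with ^ and $ so that Regex.IsMatch checks the whole string.

```csharp
using System;
using System.Text.RegularExpressions;

// The whole input must consist of non-'&' characters and complete "&amp;" entities;
// a bare '&' anywhere makes the match fail.
const string NoBareAmpersand = @"^(?:[^&]|&amp;)*$";

// Hypothetical helper name.
bool IsSafeTitle(string s) => Regex.IsMatch(s, NoBareAmpersand);

Console.WriteLine(IsSafeTitle("Lawn Mowers, Repairs & Services - lawnmowers.com"));     // False
Console.WriteLine(IsSafeTitle("Lawn Mowers, Repairs &amp; Services - lawnmowers.com")); // True
```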

I also really appreciate Expresso's Analyzer. It is outstanding that
Expresso seems to make it easy for us to pick expressions apart piece
by piece and explain them in English.


<%= Clinton Gallagher

Kevin Spencer

Yes, actually that is my third Regex Software package, along with Regex
Buddy and Expresso. I find it helpful to use them concurrently.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

clintonG said:
Regex Coach is very interesting. It has a unique tree that graphically
represents each part of the expression as well as an English 'Analyzer.'

<%= Clinton Gallagher

Kevin Spencer

Hi Clinton,

Your remarks piqued my curiosity. I have found a few technical articles on
the inner workings of regular expressions:

http://www.cs.rochester.edu/u/leblanc/csc173/fa/
http://perldoc.perl.org/perlre.html#Version-8-Regular-Expressions
http://research.microsoft.com/projects/greta/
http://en.wikipedia.org/wiki/Regular_expressions#In_formal_language_theory

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

clintonG said:
Hello Kevin,

With the exception of bass-ackward Lookarounds, when you say expressions are
sequential in nature, that is exactly what I am trying to nail down.

What confuses me is that they appear to be constructed like set-theory or
algebraic expressions, whose terms are resolved from the innermost set
working outwards, yet when a tool processes them they appear to be parsed
linearly, as if serialized. Does that make sense to you?

I sure wish Microsoft would publish a document explaining how
expressions are parsed, the same way they have explained how the life cycle
of a page is processed. The control tree is easy to understand
theoretically. I can only wonder why regular expressions cannot be
similarly mapped and explained.

I've tried the following:

* RegexDesigner.Net
* Rad Regex Designer
* Regular Expression Tester
* Regulator RegEx
* Expresso

Downloading this afternoon...

* Regex Coach [1] donationware
Which is also said to have an English-language 'analyzer' like Expresso,
which I consider to have the best interface and functionality, but which
still can't make a genius out of a dummy :).

I'm going to delve into some lists and forums [2] for the next week to
see what I can learn.
This book [3] sounds interesting. I'll take a trip to Barnes and Noble to
see if it's on the shelves.
I also have this book [4], which is very good despite what the "10 Minutes"
in its title implies, and of course I'll be looking for [5] from O'Reilly.

Let's see what that does for now.

<%= Clinton Gallagher


[1] http://www.weitz.de/regex-coach/
[2] http://regexadvice.com/
[3] http://books.slashdot.org/article.pl?sid=05/03/22/0810243&tid=156&tid=192&tid=6
[4] http://forta.com/books/0672325667/
[5] http://www.oreilly.com/catalog/regex2/index.html?CMP=ILL-4GV796923290

I appreciate your interest in this context. I've discovered that a regular
expression mailing list emanates from

Kevin Spencer said:
Hi Clinton,

Regular Expressions are a bear to learn, even if you have good tools to
work with them. I've spent hours working out a relatively "simple" one
(at least it seemed simple at first), learning a bit more with each
hour. Still, I'm a long way from being an expert. I can read most of it
fairly well by now, but certain concepts are still a bit difficult to
deal with. I still struggle some with Lookarounds in particular. One
thing to keep in mind is that Regular Expressions consume a string as
they move through it, with a few exceptions (like Lookarounds, which
check the text ahead of or behind the current position without
consuming it). They are basically sequential in nature.
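The difference between consuming text and merely looking at it can be seen directly. A minimal C# sketch, with the sample title borrowed from the page-title example elsewhere in this thread and the patterns chosen purely for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

string title = "Lawn Mowers, Repairs & Services";

// \w+ consumes what it matches, so each new match starts where the last one ended.
MatchCollection words = Regex.Matches(title, @"\w+");

// \w+\s& consumes the space and the "&" along with the word.
Match consumed = Regex.Match(title, @"\w+\s&");

// (?=\s&) is a lookahead: it checks that " &" comes next but does not consume it,
// so the match value is only the word itself.
Match lookedAhead = Regex.Match(title, @"\w+(?=\s&)");

Console.WriteLine(words.Count);        // 4
Console.WriteLine(consumed.Value);     // Repairs &
Console.WriteLine(lookedAhead.Value);  // Repairs
```

Both patterns find the same spot in the string; only the consuming one includes " &" in its result.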

You may find the "Analyze" tool helpful with this sort of thing.
Fortunately, I have not 2 but THREE Regular Expression tools to work with
(2 of them are Freeware), which lets me use whichever one(s) are best
suited to the particular Regular Expression or problem at hand.

The expression you posted,

\w*@\w*\.\w*((\.\w*)*)?

can be analyzed step by step as follows (the right-hand column shows the
part of the email address each step consumes, starting where the match
begins):

Match any word character, zero or more times.          \w*         someone
Next, match the '@' character once.                    @           @
Next, match any word character, zero or more times.    \w*         somewhere
Next, match the '.' character once.                    \.          .
Next, match any word character, zero or more times.    \w*         com
Next, put the following into Group 1, zero or 1 time:  (......)?
    Match the following into Group 2, zero or more times:  (......)*
        Match the '.' character once.                  \.
        Match any word character, zero or more times.  \w*
Result of Group 1:  (\.\w*)*    (nothing)
Result of Group 2:  \.\w*       (nothing)

Basically, neither group captures any text, because the only '.' in the
address has already been consumed by the earlier \. token. However, as
both groups specify a minimum of zero occurrences, they don't disqualify
the match.

Why does Expresso report Group 1 at position 32 (the end of the matched
address)? Well, nothing more has been matched after "com", so that's
where the null match begins. Why does Expresso report Group 2 at
position 0? Well, I'm not that good with it!
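The two group positions can be checked from .NET code by looking at each Group's Success, Index and Length. A minimal C# sketch; the forum removed the real address, so the input below is a hypothetical one rebuilt from the walk-through above (someone / somewhere / com). With this 21-character address after an 11-character prefix, the match ends at index 32, the same position Expresso reported.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: the real address was removed by the forum, so this one is
// rebuilt from the walk-through above.
string text = "An example someone@somewhere.com of an email address.";
Match m = Regex.Match(text, @"\w*@\w*\.\w*((\.\w*)*)?");

Group g1 = m.Groups[1];
Group g2 = m.Groups[2];

Console.WriteLine($"Match: '{m.Value}' at {m.Index}");
// Group 1 takes part in the match, but only as an empty string right after "com".
Console.WriteLine($"Group 1: Success={g1.Success}, Index={g1.Index}, Length={g1.Length}");
// Group 2 never takes part; .NET then reports Success=False, Index 0 and Length 0.
Console.WriteLine($"Group 2: Success={g2.Success}, Index={g2.Index}, Length={g2.Length}");
```

If this holds for Expresso as well, the "position 0" for Group 2 would simply be how .NET describes a group that did not participate in the match, rather than the parser returning to the start of the string. Checking Group.Success, not Index, is the reliable test.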

Still, your regular expression is a bit lax in terms of standards. We
worked one up for valid email addresses the other day, and you may want
to borrow it:

(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))

It is case-insensitive, and matches both domain name and IP domain email
addresses. It puts the results into 4 possible groups:

1-User Name, 2-Domain IP Address, 3-Domain Name, 4-Root Domain.

Note that groups 2 and (3,4) are exclusive of one another. The email
address can either be an IP address, or a named domain, but not both. It
supports 2-letter country suffixes, and multiple-dot domain addresses.

I'm not sure we covered all the possible permutations, but it's pretty
strong.
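Here is one way the four groups might be read back in C#. This is a sketch: the pattern is copied verbatim from above, while the helper name ParseEmail and the sample addresses are hypothetical. Because the pattern is not anchored, it finds an address inside longer text; to validate a whole field you would likely wrap it in ^ and $.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern exactly as posted above.
const string EmailPattern =
    @"(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))";

// Hypothetical helper: returns groups 1-4 (user, IP, domain, root);
// groups that did not take part in the match come back as "".
static (string User, string Ip, string Domain, string Root) ParseEmail(string text)
{
    Match m = Regex.Match(text, EmailPattern);
    if (!m.Success) return ("", "", "", "");
    return (m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value, m.Groups[4].Value);
}

var named = ParseEmail("Someone@Mail.Somewhere.CO.UK");
var byIp = ParseEmail("someone@192.168.0.1");
var none = ParseEmail("no address here");

Console.WriteLine(named); // (Someone, , Mail.Somewhere.CO, UK)
Console.WriteLine(byIp);  // (someone, 192.168.0.1, , )
Console.WriteLine(none);  // (, , , )
```

Note that for a multiple-dot domain such as ".co.uk", group 4 receives only the last label ("UK") and the "CO" stays in the domain-name group.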

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.


Basically, the whole string has been consumed by the
clintonG said:
Hello Kevin,

Well, I'm bright-eyed but not so bushy-tailed this morning. Thanks for
working this out. It's one of those 'must know' issues one needs to be
concerned with when generating valid XML from an application. I'll be
working with it later today, and I'm starting to get a feel for Expresso,
which I have a question about. I'm at the point where I've almost come
to understand how expressions are actually processed, which, for me,
means I will understand how I need to think to put them together. You've
been a real help again, and your source is an inspiration which shows how
elegant self-documenting code can be.

As for the Expresso question, what is 1:? supposed to indicate? (That's
the closest I could come at the moment to replicating the rectangular
'non-printable' character Expresso uses to indicate some 'thing' it has
matched.) In the following simple example it seems to match a white
space, although in a confusing manner, as I will point out. In other
examples, with many more characters and white space in the string to be
matched, I have counted to the position where the ? is said to be
matched, and the position reported does not fall on a white space at all.

// Expression
\w*@\w*\.\w*((\.\w*)*)?

// String to match
An example (e-mail address removed) of an email address.

Expresso reports 1:? at Position 32, Length 0, which implies white space
in the simple example as given. But there were white space characters
before the matched characters, which makes one ask why Expresso would
ignore those earlier white space characters, and then report 2:? at
Position 0, Length 0, which suggests the parser returned to the
beginning of the string to be matched and found... what?

Is this clear as mud or what :)

<%= Clinton Gallagher


Hi Clinton,

The following Regular Expression will let you do a Regex.Replace on a
string containing both single "&" characters and "&amp;" strings. It
captures the "&amp;" strings into their own separate matches, and the
single "&" characters into their own matches, putting those "&"
characters into a Group. It is also case-insensitive:

(?i)[^&]+|&amp;|(&(?!amp;))

The first alternative matches any run of characters other than "&", the
second matches an existing "&amp;", and the third captures a lone "&"
that does not start "&amp;". Note that a character class such as
[^&amp;] matches any single character except &, a, m, p and ; rather
than "anything except the string &amp;".

Here's some sample code for replacing the single "&" characters with
"&amp;":

using System.Text.RegularExpressions;

public static class AmpersandHelper
{
    /// <summary>
    /// Replaces the Ampersand in a Match with "&amp;"
    /// </summary>
    /// <param name="m">Match</param>
    /// <returns>Replaced Match value</returns>
    public static string ampReplacer(Match m)
    {
        // Only the third alternative (a lone "&") fills Group 1;
        // ordinary text and existing "&amp;" entities are returned unchanged.
        if (m.Groups[1].Captures.Count == 0) return m.Value;
        return m.Value.Replace("&", "&amp;");
    }

    /// <summary>
    /// Replaces all single Ampersand characters in a string with "&amp;"
    /// </summary>
    /// <param name="s">String to process</param>
    /// <returns>Processed String</returns>
    public static string ReplaceAmpersand(string s)
    {
        return Regex.Replace(s, @"(?i)[^&]+|&amp;|(&(?!amp;))",
            new MatchEvaluator(ampReplacer));
    }
}

The "ampReplacer" function is the MatchEvaluator delegate passed to
Regex.Replace() in the "ReplaceAmpersand" method. "ReplaceAmpersand"
takes a string as an argument and uses Regex.Replace to replace every
match that has a value in Groups[1] with "&amp;"; all other matches are
returned unchanged.
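When the only goal is the replacement itself, a shorter variant may be enough: match just the lone "&" with a negative lookahead and use a plain replacement string instead of a MatchEvaluator. This is a sketch of that alternative technique, not the code above; the helper name EscapeLoneAmpersands is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: matches an "&" only when it does not already begin "&amp;".
// The negative lookahead (?!amp;) inspects the next characters without consuming them.
static string EscapeLoneAmpersands(string s) =>
    Regex.Replace(s, "&(?!amp;)", "&amp;", RegexOptions.IgnoreCase);

string title = "Lawn Mowers, Repairs & Services - lawnmowers.com";
Console.WriteLine(EscapeLoneAmpersands(title));
// Lawn Mowers, Repairs &amp; Services - lawnmowers.com
Console.WriteLine(EscapeLoneAmpersands("Tom &amp; Jerry & Friends"));
// Tom &amp; Jerry &amp; Friends
```

Like the MatchEvaluator version, this protects only "&amp;"; another entity such as "&lt;" would still become "&amp;lt;".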

As a side note, I used both Expresso and Regex Buddy to come up with
this. It was indeed a challenge, as I'm not quite a master of Regular
Expressions. But I enjoy learning, so it was a good exercise for me! :)

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

Kevin, have you ever heard the expression "preaching to the choir"?
:)

I've got the basic pattern matching theory understood, but it's the use
of expressions to disallow or replace certain characters and/or
strings that I'm trying to really understand thoroughly. The following
example illustrates...

// Example
Lawn Mowers, Repairs & Services - lawnmowers.com

That's a typical page title. When it is entered into a TextBox meant to
capture string data for an RSS 2.0 title element, it should use &amp;
instead of & to represent the ampersand. I've got an expression that
works well for the example, but I can't figure out (yet), with the
expression I have, how to match the & and replace it with &amp;, or how
to use that expression to force the 2.0 Regular Expression Validator to
fail when the & is present in the string.

// Expression
[a-z]+([a-z0-9-]*[a-z0-9]+)?(\.([a-z]+([a-z0-9-]*[a-z0-9]+)?)+)*
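For the validator half of the question, what matters is that (as far as I know) ASP.NET's RegularExpressionValidator passes only when the pattern matches the entire input, not just part of it. The sketch below imitates that rule with a hypothetical helper, PassesValidator; both patterns are illustrative assumptions, not Clinton's expression.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper. Assumption: this mirrors the validator's whole-string rule,
// i.e. the match must start at 0 and cover the entire input.
static bool PassesValidator(string value, string pattern)
{
    Match m = Regex.Match(value, pattern);
    return m.Success && m.Index == 0 && m.Length == value.Length;
}

// Rejects any '&' at all.
const string NoAmpersand = @"[^&]*";
// Allows '&' only as part of the "&amp;" entity.
const string AmpOnlyAsEntity = @"(?:[^&]|&amp;)*";

Console.WriteLine(PassesValidator("Repairs and Services", NoAmpersand));       // True
Console.WriteLine(PassesValidator("Repairs & Services", NoAmpersand));         // False
Console.WriteLine(PassesValidator("Repairs &amp; Services", AmpOnlyAsEntity)); // True
Console.WriteLine(PassesValidator("Repairs & Services", AmpOnlyAsEntity));     // False
```

With "Repairs & Services", each pattern still matches "Repairs " at the start, but that partial match stops short of the end of the input, so the whole-string check fails.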

I also really appreciate Expresso's Analyzer. It is outstanding that
Expresso seems to make it easy for us to pick expressions apart piece
by piece and explain them in English.


<%= Clinton Gallagher






Hi Juan,

The kind of RegEx tool I'd like is one which can take a string
I write, and create a RegEx expression which matches it.

The problem with that is that you can write a Regular Expression that
matches a literal string quite easily. For example:

literal string

The above is a regular expression which will match the substring
"literal string" in my first sentence. Of course, the real power of
regular expressions is the ability to match *patterns* in a string,
perform grouping, etc. So, like any programming language (which it
is, in a sense), Regular Expressions have a shorthand syntax that
allows one to create patterns of a large variety of types. A simple
example of this would be:

(literal) (string)

This captures the same match as the first, but puts the string
"literal" into a group, and the string "string" into a second group.
But of course, we have already exceeded your desired requirement. On
the other hand, we have made a regular expression that is perhaps
more useful (in some situations) than the first.
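The two expressions can be compared side by side. A minimal C# sketch, using the first sentence of this reply as input:

```csharp
using System;
using System.Text.RegularExpressions;

string sentence = "The problem with that is that you can write a Regular Expression " +
                  "that matches a literal string quite easily.";

// A plain literal pattern: one match, no capturing groups beyond Group 0.
Match plain = Regex.Match(sentence, "literal string");

// The same text, with each word captured into its own numbered group.
Match grouped = Regex.Match(sentence, "(literal) (string)");

Console.WriteLine(plain.Value);             // literal string
Console.WriteLine(grouped.Value);           // literal string
Console.WriteLine(grouped.Groups[1].Value); // literal
Console.WriteLine(grouped.Groups[2].Value); // string
```

Both find the same text at the same position; the grouped version simply makes the pieces available separately.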

And of course, the possible types and combinations of patterns are
almost endless, including wildcard patterns, special characters,
boolean rules, and so on.

Yeah, it's like reading some kind of incredibly concise shorthand
code, without even line breaks or brackets to help. That's why I was
so pleased to see that Expresso allows you to break your regular
expression across multiple lines while building it. That helps a good
bit!
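In .NET code itself, a similar multi-line layout is available through RegexOptions.IgnorePatternWhitespace (or the inline (?x) flag), which ignores unescaped whitespace in the pattern and treats "#" as the start of a comment. A minimal sketch; the simplified email pattern here is an illustrative assumption, not the one posted earlier in the thread.

```csharp
using System;
using System.Text.RegularExpressions;

// IgnorePatternWhitespace ignores unescaped whitespace in the pattern
// and treats '#' as the start of a comment, so the pattern can span several lines.
Regex spread = new Regex(@"
    ([-.\w]+)               # 1: user name
    @                       # the at sign
    ([-a-z0-9]+             # 2: domain name, first label ...
      (?:\.[-a-z0-9]+)*)    #    ... plus any further dot-separated labels
    \.                      # the final dot
    ([a-z]{2,})             # 3: root domain
    ", RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

// The same pattern on one line, without the option.
Regex compact = new Regex(@"([-.\w]+)@([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.([a-z]{2,})",
    RegexOptions.IgnoreCase);

Match a = spread.Match("someone@mail.somewhere.com");
Match b = compact.Match("someone@mail.somewhere.com");

Console.WriteLine($"{a.Groups[1].Value} | {a.Groups[2].Value} | {a.Groups[3].Value}");
// someone | mail.somewhere | com
Console.WriteLine(a.Value == b.Value); // True
```

The commented layout and the one-liner behave identically; only readability differs.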

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

The kind of RegEx tool I'd like is one which can take a string
I write, and create a RegEx expression which matches it.

*That* will be the RegEx tool that will corner the market.
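The literal half of that wish already exists in .NET: Regex.Escape turns a sample string into a pattern that matches exactly that text. Generalizing the sample into a *pattern* is the hard part, and this sketch does not attempt that. The sample string is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Regex.Escape turns a sample string into a pattern that matches that exact text,
// escaping metacharacters such as '.', '(', ')', '$' and also white space.
string sample = "lawnmowers.com ($30.00)";
string pattern = Regex.Escape(sample);

Console.WriteLine(pattern);
Console.WriteLine(Regex.IsMatch("Visit lawnmowers.com ($30.00) today", pattern)); // True
Console.WriteLine(Regex.IsMatch("lawnmowersXcom ($30.00)", pattern));             // False
```

Without escaping, the "." would match any character and the parentheses and "$" would be read as regex syntax.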





Thanks Kevin. I saw that post too and am going to download Expresso
in a few minutes. I know you don't need to be psychic to figure out
what I'm likely to be asking next :)

<%= Clinton Gallagher


I saw a response to this question in the CSharp group, regarding a
product named "Expresso"

http://www.ultrapico.com/Expresso.htm

Expresso is .Net freeware, and after downloading, installing, and
playing with it, I'd give it a try! So far I have found it to be
excellent, having capabilities that Regex Buddy does not have, and
a much more intuitive GUI.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

Hi Clinton,

Yes, I have it. I previously used the freeware Regex Coach
Utility, but it is nowhere near as complete in its support for
various newer Regular Expression syntax and programming languages
in general. It did have one nice feature about it. You could
split a Regular Expression across multiple lines, which often
made it easier to analyze. However, Regex Buddy has the graphical
tree view, and it is synchronized with the Regular Expression
itself, which more than makes up for the omission of breaking a
Regular Expression across multiple lines.

BTW, it also has a GREP utility built in.

In short, it is well worth the 30 bucks.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

I was looking at PowerGrep from the same dev group, but as with Regex
Buddy I don't like the buy-before-you-try business model, so that
choice has to stay on the shelf for the moment. Thanks for bringing
it up, though. I assume you've used Regex Buddy?

<%= Clinton Gallagher



Regex Buddy is very good. It costs around $30.00, includes
quite a few nice features, including the ability to copy
regular expressions in various language string syntaxes,
including C#. It has the ability to create libraries of regular
expressions, a nice visual builder, color-coding, and quite a
bit more. Good testing environment. And it has some nice
reference material included.
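For readers wondering what "copying a regular expression in C# string syntax" involves: C# offers two literal spellings, and a pattern has to be escaped differently for each. What exactly Regex Buddy emits is not shown here; this is only a sketch of the two forms, using the pattern discussed in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim string: backslashes are taken literally; a double quote is written as "".
string verbatim = @"""(\w*@\w*\.\w*)""";
// Regular string: every backslash is doubled and a double quote is written as \".
string regular = "\"(\\w*@\\w*\\.\\w*)\"";

Console.WriteLine(verbatim);            // "(\w*@\w*\.\w*)"
Console.WriteLine(verbatim == regular); // True
Console.WriteLine(Regex.Match("mail \"someone@somewhere.com\" now", verbatim).Groups[1].Value);
// someone@somewhere.com
```

The verbatim form is usually easier to compare against what a regex tool displays, since the backslashes appear exactly once.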

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

I'm using an .aspx tool I found at [1], but as nice as the
interface is I think I need to consider using others. Some can
generate C# I understand. Your preferences please...

<%= Clinton Gallagher

[1] http://forta.com/books/0672325667/
 

clintonG

The RegEx Coach [1] treeview is interesting and insightful. It's worth the
download, but note that resizing, insertion points and related Windows I/O
events are clumsy. This software appears to be developed by an academic who
is a pure Perl advocate. I'll review the resources you provided. Thanks.

<%= Clinton Gallagher

[1] http://www.weitz.de/regex-coach/


