Iain Barnett
I'm trying to emulate something I did in .NET many moons ago, which
is to capture a named group, but not just once: get all its repetitions
and then be able to see all those repetitions. I think they call them
GroupCollections in C#. This is the kind of code I'm trying to emulate
with Ruby (1.9.1):
using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main ()
    {
        // Define a regular expression for repeated words.
        Regex rx = new Regex(@"\b(?<word>\w+)\s+(\k<word>)\b",
            RegexOptions.Compiled | RegexOptions.IgnoreCase);

        // Define a test string.
        string text = "The the quick brown fox fox jumped over the lazy dog dog.";

        // Find matches.
        MatchCollection matches = rx.Matches(text);

        // Report the number of matches found.
        Console.WriteLine("{0} matches found in:\n {1}",
            matches.Count,
            text);

        // Report on each match.
        foreach (Match match in matches)
        {
            GroupCollection groups = match.Groups;
            Console.WriteLine("'{0}' repeated at positions {1} and {2}",
                groups["word"].Value,
                groups[0].Index,
                groups[1].Index);
        }
    }
}
// The example produces the following output to the console:
// 3 matches found in:
// The the quick brown fox fox jumped over the lazy dog dog.
// 'The' repeated at positions 0 and 4
// 'fox' repeated at positions 20 and 25
// 'dog' repeated at positions 50 and 54
For example, if I had the string "11 12" I could have a regex like
/
  (?<first> \d+ ) \s \g<first>
/x
that captured "11" and then the repetition "12" and put them in an
array (or some kind of collection) referenced by the name.
I think my attempts to get this to work are better explanations. What I
want is the result
#<MatchData "11 12" first:["11", "12"]> or something like it. At the
moment all my attempts end with the named capture only keeping the last
match it made, i.e. "12", with no mention of "11".
I know I could do this a different way, perhaps with split or something,
but I'd like to know if it's possible with just a regex. I understand the
Oniguruma engine is used now, but I can't find any good docs for it.
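For reference, the "different way" I have in mind would look roughly like the sketch below. It is only a workaround, not the single-regex answer I'm after. The helper names all_numbers and captures_by_name are made up for illustration; the only library calls are String#scan and Regexp.last_match.

```ruby
# Hypothetical helper (not part of Ruby): collect every run of digits.
def all_numbers(text)
  # String#scan returns every non-overlapping match, not only the last one.
  text.scan(/\d+/)
end

# Hypothetical helper: run a pattern containing a named group repeatedly
# over the text and file every value of that group under the group's name.
def captures_by_name(text, pattern, name)
  values = []
  # Inside the scan block, Regexp.last_match is the MatchData for the
  # current iteration, so the named group can be read on each pass.
  text.scan(pattern) { values << Regexp.last_match[name] }
  { name => values }
end

p all_numbers("11 12")
p captures_by_name("11 12", /(?<first> \d+ )/x, :first)
```

This gets the repetitions by matching the group several times rather than by asking one MatchData for everything the group captured, which is why it doesn't really answer my question.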
These are my attempts, $ is my prompt.
$ md1 = /
  (?<first> \d+ )
  \s \g<first>
/x.match( "11 12" )
#<MatchData "11 12" first:"12">
$ md1[:first]
"12"

$ md1 = /
  (?<first> \d+ )
  (?: \s \g<first> )?
/x.match( "11 12" )
#<MatchData "11 12" first:"12">
$ md1[:first]
"12"

$ md1 = /
  (?<first> \d+ )
  (?: \s
    (?<second> \g<first> )
  )?
/x.match( "11 12" )
#<MatchData "11 12" first:"12" second:"12">
$ md1[:first]
"12"
$ md1[:second]
"12"

$ md1 = /
  (?: (?<first> \d+ )\s* )+
/x.match( "11 12" )
#<MatchData "11 12" first:"12">
$ md1[:first]
"12"
Iain