W. Citoan
I need to reformat portions of text. The patterns I am matching and
their replacements are as follows:
word1 [note:word1] --> [word1]
word1 word2 [note:word1_word2] --> [word1 word2]
word1 word2 word3 [note:word1_word2_word3] --> [word1 word2 word3]
This pattern continues (word4, word5, ... wordN) with an indeterminate
maximum number of words. I know how to replace each individual case.
For example:
my @data = (
    "word1 [note:word1]",
    "word1 word2 [note:word1_word2]",
    "word1 word2 word3 [note:word1_word2_word3]",
);
for (@data) {
    print "$_\n";
    s/(\w+)\s\[note:(\1)\]/[$1]/g;
    s/(\w+)\s(\w+)\s\[note:(\1)_(\2)\]/[$1 $2]/g;
    s/(\w+)\s(\w+)\s(\w+)\s\[note:(\1)_(\2)_(\3)\]/[$1 $2 $3]/g;
    print "$_\n\n";
}
produces:
word1 [note:word1]
[word1]
word1 word2 [note:word1_word2]
[word1 word2]
word1 word2 word3 [note:word1_word2_word3]
[word1 word2 word3]
I can obviously keep adding additional substitution lines until I have
up to the wordN case. I would have to guess at N and keep adding if I
find larger cases.
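To at least avoid typing each line by hand, the per-N substitutions could be generated in a loop. This is only a rough sketch under a few assumptions: $max_words is a made-up upper bound (so it is still a guess at N), reformat_notes is just an illustrative helper name, and \g{N} backreferences need Perl 5.10 or later. Each generated pattern wraps the whole word run in an outer group, so the replacement is always [$1] no matter how many words there are.

```perl
use strict;
use warnings;

# Hypothetical upper bound on the number of words; this is still a guess at N.
my $max_words = 5;

# Build one pattern per word count, for example for two words:
#   ((\w+)\s(\w+))\s\[note:\g{2}_\g{3}\]
# Group 1 is the whole run of words; groups 2..N+1 are the individual words.
my @patterns;
for my $n (1 .. $max_words) {
    my $words = join '\s', ('(\w+)') x $n;
    my $refs  = join '_', map { "\\g{$_}" } 2 .. $n + 1;
    push @patterns, qr/($words)\s\[note:$refs\]/;
}

# reformat_notes is an illustrative name, not from any module.
# It applies every generated substitution in turn and returns the result.
sub reformat_notes {
    my ($text) = @_;
    $text =~ s/$_/[$1]/g for @patterns;
    return $text;
}

print reformat_notes($_), "\n" for (
    "word1 [note:word1]",
    "word1 word2 [note:word1_word2]",
    "word1 word2 word3 [note:word1_word2_word3]",
    "word1 word2 [note:other_text]",    # no repeated words, left untouched
);
```

This only removes the repetitive typing. It still needs one substitution per word count and a guessed maximum.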
However, I was wondering if there is any way to simplify this so that I
can use a single substitution (or a smaller set of them). If it weren't
for the underscores, I could use a larger grouping (example:
s/((\w+\s)*\w+)\s\[note:(\1)\]/[$1]/g;
), but I don't see how to do it with the underscores. I cannot simply
strip the underscores out prior to doing the substitution as any cases
without the repeating words need to be left untouched.
Any ideas? Am I missing something obvious?
Thanks,
- W. Citoan