pete
I have input strings where some words start with an underscore. The plan
is to remove all words that do NOT start with an underscore and simply
keep the rest. So for example starting with
"word1 word2 _word3 word4 word5 _word6 _word7 word8"
I'm trying to end up with
"_word3 _word6 _word7"
The expression I have got so far is s/.*?(_[a-z0-9]+).*?/ $1/gi;
and my understanding is as follows:
The first ".*?" part removes everything up to the first matching RE
The "(_[a-z0-9]+)" matches any letter/number combination that starts
with an underscore [sidenote: yes, I know: \w+]
The final ".*?" removes everything up to the next match, or up to
the end of the string.
Here's how I have the RE in a program
$_=(<>);
s/.*?(_[a-z0-9]+).*?/ $1/gi;
print "Have: $_";
and here's how I run it:
echo "word1 word2 _word3 word4 word5 _word6 _word7 word8" | perl s.pl
and here's the output I get:
Have: _word3 _word6 _word7 word8
Question: Why didn't "word8" get eaten like all its predecessors, and
what do I have to do to match it for removal?
If you have time, I'm looking for enlightenment more than solutions. I
am obviously missing something crucial, but all the online tutorials
I've found stop short of explaining this sort of thing.
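In case it helps anyone dig into this, here is a small diagnostic sketch that collects the exact chunk of the string each pass of the /g match consumes, using the same pattern as above. The helper name matched_chunks is something I made up for illustration, and the input is hard-coded rather than read from STDIN.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any tutorial): returns the full text
# consumed by each successive match of the pattern under /g.
sub matched_chunks {
    my ($input) = @_;
    my @chunks;
    # In scalar context, /g resumes where the previous match ended.
    while ($input =~ /.*?(_[a-z0-9]+).*?/gi) {
        push @chunks, $&;    # $& holds the entire matched substring
    }
    return @chunks;
}

my $line = "word1 word2 _word3 word4 word5 _word6 _word7 word8";
# Brackets make leading and trailing spaces in each chunk visible.
print "[$_]\n" for matched_chunks($line);
```

Anything in the input that never shows up inside one of the bracketed chunks was not matched, so s///g leaves it untouched.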