regexpr question is w2 taken

Richard Bell · Apr 15, 2004

I'm a bit new to perl and am trying to emulate the behavior of a free
text search engine that has a feature

is w2 taken

taken to mean the word 'is' within 2 words of the word 'taken' where
the distance (2) and the words ('is', 'taken') are arbitrary.

I've a variable that looks like this

'one two three four and so on words seperated by spaces that goes on
and on and on and on for a very long way'

that I'm trying to process.

I'm having a problem finding a regular expression that handles this
case. Something like

"\bis\b(what goes here){0,2}\btaken\b"

Can someone point me in the right direction?

I assume that $pos will point to the last character matched. Is this
correct? How can I know the index of the first character matched? Can
I know what '(what goes here)' matched? How? As part of this
process, I'm trying to track what characters in the string were
matched by a number of regular expressions by getting $pos and keeping
a bit map of the characters matched.

Thanks.

Richard

Paul Lalli · Apr 15, 2004

I'm a bit new to perl and am trying to emulate the behavior of a free
text search engine that has a feature

is w2 taken

taken to mean the word 'is' within 2 words of the word 'taken' where
the distance (2) and the words ('is', 'taken') are arbitrary.

I've a variable that looks like this

'one two three four and so on words seperated by spaces that goes on
and on and on and on for a very long way'

that I'm trying to process.

I'm having a problem finding a regular expression that handles this
case. Something like

"\bis\b(what goes here){0,2}\btaken\b"

Can someone point me in the right direction?

I assume that $pos will point to the last character matched. Is this
correct? How can I know the index of the first character matched? Can
I know what '(what goes here)' matched? How? As part of this
process, I'm trying to track what characters in the string were
matched by a number of regular expressions by getting $pos and keeping
a bit map of the characters matched.

[untested]

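# \Q...\E matches the search words literally; the outer parentheses
# make $1 hold everything between the two words (possibly nothing).
if ($string =~ /\b\Q$first\E\s+((?:\w+\s+){0,2})\Q$second\E\b/) {
    print "Found $first within two words of $second\n";
    print "Separated by '$1'\n";
}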

This is assuming, of course, that Perl's definition of 'word' is
acceptable to you. If not, you might want to replace the \w+\s+ above
with something like

[a-zA-Z[:punct:]]+\s+

or, to just say "0, 1, or 2 of any sequences of non-whitespace followed by
whitespace":

\S+\s+
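
The three definitions of "word" above can be tried side by side. What follows is only a sketch: the helper name within_words and the sample string are made up for illustration and are not from the thread. It assumes the search words should be matched literally (hence \Q...\E) and that the distance should be a parameter rather than a hard-coded 2.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the text between
# $first and $second if at most $n "words" separate them, else undef.
# $word is the pattern that defines one intervening word.
sub within_words {
    my ($string, $first, $second, $n, $word) = @_;
    return $string =~ /\b\Q$first\E\s+((?:$word\s+){0,$n})\Q$second\E\b/
        ? $1
        : undef;
}

my $text = 'this is well-nigh taken';   # made-up sample string

# Try each definition of "word" suggested above.
for my $word (qr/\w+/, qr/[a-zA-Z[:punct:]]+/, qr/\S+/) {
    my $between = within_words($text, 'is', 'taken', 2, $word);
    print "$word: ",
        defined $between ? "separated by '$between'" : "no match", "\n";
}
```

With this sample, the hyphenated "well-nigh" is not a single \w+ word, so the first definition should fail where the other two succeed.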

Hope this helps
Paul Lalli

Anno Siegel · Apr 16, 2004

Richard Bell said:
I'm a bit new to perl and am trying to emulate the behavior of a free
text search engine that has a feature

is w2 taken

taken to mean the word 'is' within 2 words of the word 'taken' where
the distance (2) and the words ('is', 'taken') are arbitrary.

I've a variable that looks like this

'one two three four and so on words seperated by spaces that goes on
                                    ^^^^^^^^^
                                    "separated"
and on and on and on for a very long way'

that I'm trying to process.

I'm having a problem finding a regular expression that handles this
case. Something like

"\bis\b(what goes here){0,2}\btaken\b"

Can someone point me in the right direction?

An approach:

my ( $first, $last, $n) = ( 'words', 'spaces', 2);

my $any_word = qr/\s*\b\S+/;
print "$1\n" if /($first${any_word}{0,$n}\s*\b$last)/;

There are at least two non-trivial problems left. One is the simplistic
definition of "word" as a maximal sequence of non-spaces. A better
definition of $any_word would be needed. Another is that texts come
in lines, but you will want to match across line boundaries. Slurping
the whole text ?????????????????????????????????

I assume that $pos will point to the last character matched. Is this
correct?

$pos? If you mean the pos() function, it is not correct. perldoc -f pos.
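
For what it's worth, a small sketch of what pos() reports, using a made-up sample string. As far as perldoc -f pos goes, it is only set by a successful match with the /g modifier, and it gives the offset just past the last matched character, not the last character itself.

```perl
use strict;
use warnings;

my $text = 'one two three';   # made-up sample string

# Without /g, pos() stays undefined even after a successful match.
$text =~ /two/;
print defined pos($text) ? "pos is set\n" : "pos is undef\n";

# With /g in scalar context, pos() is the offset just past the match.
if ($text =~ /two/g) {
    print "pos = ", pos($text), "\n";   # "two" occupies offsets 4..6
}
```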

How can I know the index of the first character matched? Can
I know what '(what goes here)' matched? How? As part of this
process,

You need to read up on regular expressions. These are very elementary
questions. Look for capturing parentheses in perlre and for the
arrays @+ and @- in perlvar.
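
A short sketch of how those pieces fit together, with a made-up sample string and the search words hard-coded for brevity. $-[0] and $+[0] bound the whole match, while $1, $-[1] and $+[1] describe the first pair of capturing parentheses.

```perl
use strict;
use warnings;

my $text = 'what is already taken, is not for sale';   # made-up sample

if ($text =~ /\bis\s+((?:\w+\s+){0,2})taken\b/) {
    # $-[0] is the offset of the first matched character;
    # $+[0] is the offset just past the last matched character.
    print "whole match: offsets ", $-[0], " to ", $+[0] - 1, "\n";

    # The same arrays, at index 1, locate what the parentheses captured.
    print "group 1: '$1' at offsets ", $-[1], " to ", $+[1] - 1, "\n";
}
```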

I'm trying to track what characters in the string were
matched by a number of regular expressions by getting $pos and keeping
a bit map of the characters matched.

A bit map of the characters matched? I'm not sure what you mean, but
you may want vec() and ord(). Watch out for unicode.
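
If "bit map" means one bit per character position, set whenever some pattern matched that position, then vec() combined with @- and @+ might be all that is needed. A sketch under that assumption; the sample string and patterns are made up. The offsets in @- and @+ count characters, so the bitmap is indexed by character position.

```perl
use strict;
use warnings;

my $text     = 'one two three four';            # made-up sample string
my @patterns = (qr/two\s+three/, qr/\bo\w+/);   # made-up patterns

my $bitmap = '';   # vec() grows the string as needed
for my $re (@patterns) {
    while ($text =~ /$re/g) {
        # Mark every character offset covered by this match.
        vec($bitmap, $_, 1) = 1 for $-[0] .. $+[0] - 1;
    }
}

# Render the bitmap as a row of 0/1 flags, one per character of $text.
my $flags = join '', map { vec($bitmap, $_, 1) } 0 .. length($text) - 1;
print "$text\n$flags\n";
```

Patterns that can match the empty string would need extra care in a loop like this; the sketch assumes every match consumes at least one character.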

Anno
