regexpr question is w2 taken

Richard Bell

I'm a bit new to perl and am trying to emulate the behavior of a free
text search engine that has a feature

is w2 taken

taken to mean the word 'is' within 2 words of the word 'taken' where
the distance (2) and the words ('is', 'taken') are arbitrary.

I've a variable that looks like this

'one two three four and so on words seperated by spaces that goes on
and on and on and on for a very long way'

that I'm trying to process.

I'm having a problem finding a regular expression that handles this
case. Something like

"\bis\b(what goes here){0,2}\btaken\b"

Can someone point me in the right direction?

I assume that $pos will point to the last character matched. Is this
correct? How can I know the index of the first character matched? Can
I know what '(what goes here)' matched? How? As part of this
process, I'm trying to track what characters in the string were
matched by a number of regular expressions by getting $pos and keeping
a bit map of the characters matched.

Thanks.

Richard
 
Paul Lalli

Richard Bell said:
I'm a bit new to perl and am trying to emulate the behavior of a free
text search engine that has a feature

is w2 taken

taken to mean the word 'is' within 2 words of the word 'taken' where
the distance (2) and the words ('is', 'taken') are arbitrary.

I've a variable that looks like this

'one two three four and so on words seperated by spaces that goes on
and on and on and on for a very long way'

that I'm trying to process.

I'm having a problem finding a regular expression that handles this
case. Something like

"\bis\b(what goes here){0,2}\btaken\b"

Can someone point me in the right direction?

I assume that $pos will point to the last character matched. Is this
correct? How can I know the index of the first character matched? Can
I know what '(what goes here)' matched? How? As part of this
process, I'm trying to track what characters in the string were
matched by a number of regular expressions by getting $pos and keeping
a bit map of the characters matched.


[untested]

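if ($string =~ /\b$first\s+((?:\w+\s+){0,2})$second\b/) {
    # The outer parentheses capture all intervening words; a quantified
    # capture group on its own would keep only its last repetition.
    print "Found $first within two words of $second\n";
    print "Separated by '$1'\n";
}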

This is assuming, of course, that Perl's definition of 'word' is
acceptable to you. If not, you might want to replace the \w+\s+ above
with something like

[a-zA-Z[:punct:]]+\s+

or, to just say "0, 1, or 2 sequences of non-whitespace, each followed by
whitespace":

\S+\s+
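
To make both the words and the distance arbitrary, the pattern can be built at run time. This is only a sketch: the helper name within_re is made up for illustration, a "word" is assumed to be any run of non-whitespace, "within N words" is assumed to mean at most N intervening words, and only the order first-then-second is checked. \Q...\E quotes the search words so that any regex metacharacters in them are taken literally.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern for "$first followed by $second with
# at most $n words in between". Group 1 captures the intervening words.
sub within_re {
    my ($first, $second, $n) = @_;
    return qr/\b\Q$first\E\s+((?:\S+\s+){0,$n})\Q$second\E\b/;
}

my $string = 'this is most likely taken from somewhere';
my $re     = within_re('is', 'taken', 2);

if ($string =~ $re) {
    print "Separated by '$1'\n";
}
```

The leading \b keeps the "is" inside "this" from matching, and {0,$n} is interpolated when the qr// is compiled, so each call produces a pattern for its own distance.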


Hope this helps
Paul Lalli
 
Anno Siegel

Richard Bell said:
I'm a bit new to perl and am trying to emulate the behavior of a free
text search engine that has a feature

is w2 taken

taken to mean the word 'is' within 2 words of the word 'taken' where
the distance (2) and the words ('is', 'taken') are arbitrary.

I've a variable that looks like this

'one two three four and so on words seperated by spaces that goes on
                                    ^^^^^^^^^
                                    "separated"

and on and on and on for a very long way'

that I'm trying to process.

I'm having a problem finding a regular expression that handles this
case. Something like

"\bis\b(what goes here){0,2}\btaken\b"

Can someone point me in the right direction?

An approach:

my ( $first, $last, $n) = ( 'words', 'spaces', 2);

my $any_word = qr/\s*\b\S+/;
print "$1\n" if /($first${any_word}{0,$n}\s*\b$last)/;

There are at least two non-trivial problems left. One is the simplistic
definition of "word" as a maximal sequence of non-spaces. A better
definition of $any_word would be needed. Another is that texts come
in lines, but you will want to match across line boundaries. Slurping
the whole text [...]
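
On the line-boundary point, a rough sketch of the slurping idea: once the whole text is in one string, \s matches the newlines too, so the pattern from the approach above can run across a line break. The sample text and the in-memory filehandle are stand-ins for a real file, and the quantified part is written as (?:$any_word){0,$n}, which is meant to be the same grouping spelled differently.

```perl
use strict;
use warnings;

# Sample text held in a string; an in-memory filehandle stands in for a file.
my $file_contents = "these are words\nseparated by spaces\n";
open my $fh, '<', \$file_contents or die "cannot open in-memory file: $!";

# Slurp: with $/ undefined, a single read returns everything up to end of file.
my $text = do { local $/; <$fh> };
close $fh;

my ($first, $last, $n) = ('words', 'spaces', 2);
my $any_word = qr/\s*\b\S+/;

# \s also matches "\n", so the match can continue past the line break.
my ($found) = $text =~ /($first(?:$any_word){0,$n}\s*\b$last)/;
print "$found\n" if defined $found;
```

Slurping keeps the whole text in memory at once, which may or may not be acceptable for a very long input.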

I assume that $pos will point to the last character matched. Is this
correct?

$pos? If you mean the pos() function, it is not correct. See perldoc -f pos.

How can I know the index of the first character matched? Can
I know what '(what goes here)' matched? How? As part of this
process,

You need to read up on regular expressions. These are very elementary
questions. Look for capturing parentheses in perlre and for the
arrays @+ and @- in perlvar.

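
A small sketch of what those pointers amount to, with a made-up helper name (match_span) and a made-up sample string: after a successful match, $-[0] is the offset of the first matched character, $+[0] is the offset just past the last one, and $1 holds whatever the first pair of capturing parentheses matched.

```perl
use strict;
use warnings;

# Hypothetical helper: return the start offset, the end offset (one past the
# last matched character), and the text captured by group 1.
sub match_span {
    my ($string, $re) = @_;
    return unless $string =~ $re;
    my ($start, $end, $between) = ($-[0], $+[0], $1);
    return ($start, $end, $between);
}

my $string = 'so it is not yet taken by anyone';
my ($start, $end, $between)
    = match_span($string, qr/\bis\s+((?:\S+\s+){0,2})taken\b/);
printf "matched offsets %d to %d, separated by '%s'\n", $start, $end - 1, $between;

# pos() is set only by a successful m//g match in scalar context; it holds the
# offset where that match ended, the same value as $+[0].
if ($string =~ /\btaken\b/g) {
    printf "pos() is %d, \$+[0] is %d\n", pos($string), $+[0];
}
```

So pos() never points at the last character matched; it points one past it, and only when /g is in play.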
I'm trying to track what characters in the string were
matched by a number of regular expressions by getting $pos and keeping
a bit map of the characters matched.

A bit map of the characters matched? I'm not sure what you mean, but
you may want vec() and ord(). Watch out for unicode.
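
If the bit map is meant as one bit per character offset, something like the following might do. It is a sketch under that assumption; mark_matches and the sample string are invented for illustration. Each regex is run with /g and the bits between $-[0] and $+[0] are set with vec(). The offsets count characters when the string has been decoded and bytes otherwise, which is where the unicode caveat comes in.

```perl
use strict;
use warnings;

# Hypothetical helper: set bit $i of $$mask_ref for every offset $i that is
# covered by some match of $re in $string.
sub mark_matches {
    my ($string, $re, $mask_ref) = @_;
    while ($string =~ /$re/g) {
        vec($$mask_ref, $_, 1) = 1 for $-[0] .. $+[0] - 1;
    }
    return;
}

my $string = 'so it is not yet taken by anyone';
my $mask   = '';
mark_matches($string, qr/\bis\b/,    \$mask);
mark_matches($string, qr/\btaken\b/, \$mask);

# Show the string with a caret under every matched character.
print "$string\n";
print join('', map { vec($mask, $_, 1) ? '^' : ' ' } 0 .. length($string) - 1), "\n";
```

Reading a bit beyond the end of the mask with vec() simply returns 0, so the mask does not have to be pre-sized to the length of the string.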

Anno
 
