help with regular expression

doug

I am making a page to do a search.

The syntax is, you may group words with double quotes.

You may also add a + to signify a word must be present, or a - to
signify it shouldn't be present. Each "phrase" is separated by spaces
or commas.

So for example, all of these would be valid and can be combined by
separating them with spaces:

+"red blue"
+orange
purple
-green
"yellow red"

Is there a way to split this or match this using a regular expression?
 
sln

I am making a page to do a search.

The syntax is, you may group words with double quotes.

You may also add a + to signify a word must be present, or a - to
signify it shouldn't be present. Each "phrase" is separated by spaces
or commas.

Those are search criteria. You are using characters that are also Perl regex
metacharacters to define your needs, and their meanings don't match.
That's fine, though, as long as you define your own parser to process
the commands that make up your private criteria. For example, the criteria
could be passed as a string to a function that parses it and applies a regex
to another string (a file).

So for example, all of these would be valid and can be combined by
separating them with spaces:

+"red blue"
+orange
purple
-green
"yellow red"

Is there a way to split this or match this using a regular expression?

In essence, the criteria would have to be parsed according to your rules, and
a regex constructed and then applied to the object being searched. There are a
few ways to do this, most of them involving a lot of ORs (alternations).
By and large, the more content there is to parse, the longer the processing
takes.

It can be done with a single regex or within a loop that processes each term
individually. The difference is that if you have to process the terms
individually, you will still need a case test even with a single regex.

If, however, you just want to qualify the object, a single regex for the
combined terms is fine.

The advantage of constructing the regex from rules is that you can easily add
more control characters dynamically.
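
A minimal sketch of that "single regex to qualify the object" idea, under some loud assumptions: the helper name build_filter is made up for illustration, + terms become positive lookaheads, - terms become negative lookaheads, and unsigned terms are treated as alternatives of which at least one must occur (the original post does not say how unsigned terms should behave). Matching is case-sensitive, and the \b anchors assume each term begins and ends with a word character.

```perl
use warnings;
use strict;

# Hypothetical helper: compiles a query string into one regex that
# accepts or rejects a whole text.
sub build_filter {
    my ($query) = @_;
    my ( @checks, @optional );

    while ( $query =~ /([+-]?)(?:"([^"]*)"|(\w+))/g ) {
        my $sign = $1;
        my $term = defined $2 ? $2 : $3;
        next unless length $term;     # skip empty "" phrases
        $term = quotemeta $term;      # treat the term as literal text

        if    ( $sign eq '+' ) { push @checks,   "(?=.*\\b$term\\b)" }
        elsif ( $sign eq '-' ) { push @checks,   "(?!.*\\b$term\\b)" }
        else                   { push @optional, $term }
    }

    # Assumption: unsigned terms are alternatives, one of which must occur.
    push @checks, '(?=.*\b(?:' . join( '|', @optional ) . ')\b)' if @optional;

    my $pattern = join '', @checks;
    return qr/^$pattern/s;            # /s lets . cross line breaks
}

my $filter = build_filter('+"red blue" +orange purple -green "yellow red"');

for my $text ( 'orange, purple and red blue', 'orange, green and red blue' ) {
    printf "%-30s %s\n", $text, $text =~ $filter ? 'accepted' : 'rejected';
}
```

Adding another control character would then only mean adding another branch that pushes a different lookahead.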

sln
 
Tad J McClellan

doug said:
I am making a page to do a search.

The syntax is, you may group words with double quotes.


What is your definition of a "word"?

I will assume \w+ is a "word".

You may also add a + to signify a word must be present, or a - to
signify it shouldn't be present. Each "phrase" is separated by spaces
or commas.

So for example, all of these would be valid and can be combined by
separating them with spaces:

+"red blue"
+orange
purple
-green
"yellow red"

Is there a way to split this or match this using a regular expression?


----------------------------------
#!/usr/bin/perl
use warnings;
use strict;

$_ = '+"red blue" +orange purple -green "yellow red"';

my @phrases = /([+-]?(?:"[^"]*"|\w+))/g; # written more prettily below

print "$_\n" for @phrases;
----------------------------------



Also works even if they are NOT separated by spaces:

----------------------------------
#!/usr/bin/perl
use warnings;
use strict;

$_ = '+"red blue"+orange purple-green"yellow red"';

my @phrases = /([+-]?           # optional sign
                (?:             # groups together either...
                    "[^"]*"     # ... a quoted string
                    |           # or
                    \w+         # a word
                )
              )/gx;  # eXtended regular expressions are wonderful

print "$_\n" for @phrases;
----------------------------------
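
If the sign and the term are wanted separately, the same pattern can be given extra capture groups. The following is only a sketch: the helper name split_phrases is invented for illustration, and it assumes commas are plain separators, which works because the pattern simply skips anything it does not match.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: returns a list of [sign, term] pairs, with the
# surrounding double quotes already stripped from quoted phrases.
sub split_phrases {
    my ($query) = @_;
    my @pairs;
    while ( $query =~ /([+-]?)(?:"([^"]*)"|(\w+))/g ) {
        my $sign = $1;                        # '+', '-' or ''
        my $term = defined $2 ? $2 : $3;      # quoted phrase or bare word
        push @pairs, [ $sign, $term ];
    }
    return @pairs;
}

for my $pair ( split_phrases('+"red blue", +orange, purple -green,"yellow red"') ) {
    my ( $sign, $term ) = @$pair;
    print "sign=[$sign] term=[$term]\n";
}
```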
 
Ted Zlatanov

d> The syntax is, you may group words with double quotes.

d> You may also add a + to signify a word must be present, or a - to
d> signify it shouldn't be present. Each "phrase" is separated by spaces
d> or commas.

d> So for example, all of these would be valid and can be combined by
d> separating them with spaces:

d> +"red blue"
d> +orange
d> purple
d> -green
d> "yellow red"

d> Is there a way to split this or match this using a regular expression?

Yes, but it will be nasty. You're better off parsing the text in a
stateful way, with knowledge of "I'm inside/outside quotes", "this is a
required word," etc. flags. You'll end up with some kind of parse tree
like this

use constant MUST     => 1;
use constant MUST_NOT => 2;

my $search = [
    { term => 'red blue',   required => MUST() },
    { term => 'orange',     required => MUST() },
    { term => 'green',      required => MUST_NOT() },
    { term => 'purple' },
    { term => 'yellow red' },
];

If Parse::RecDescent is fast enough for your needs, you may want to
consider it. The way it structures grammars is (IMO) intuitive and Perl
6 will use a very similar system.

If not, you'll have to write some logic to find terms and set the flags.
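
As one possible shape for that logic, here is a minimal sketch of a stateful scan that walks the query one character at a time. The name parse_query is made up for illustration, and it assumes whitespace and commas separate terms, that a sign only counts at the start of a term, and that an unterminated quote runs to the end of the input.

```perl
use warnings;
use strict;

use constant MUST     => 1;
use constant MUST_NOT => 2;

# Hypothetical helper: returns an array ref of
# { term => ..., required => ... } hashes, in input order.
sub parse_query {
    my ($query) = @_;
    my @tree;
    my ( $in_quotes, $required, $term ) = ( 0, undef, '' );

    # Stores the term collected so far (if any) and resets the flags.
    my $flush = sub {
        if ( length $term ) {
            my %node = ( term => $term );
            $node{required} = $required if defined $required;
            push @tree, \%node;
        }
        ( $required, $term ) = ( undef, '' );
    };

    for my $ch ( split //, $query ) {
        if ( $ch eq '"' ) {
            # A closing quote ends the phrase; an opening quote ends
            # any bare word that was still being collected.
            $flush->() if $in_quotes || length $term;
            $in_quotes = !$in_quotes;
        }
        elsif ($in_quotes) {
            $term .= $ch;             # anything goes inside quotes
        }
        elsif ( $ch =~ /[\s,]/ ) {
            $flush->();               # a separator ends a bare word
        }
        elsif ( !length $term && ( $ch eq '+' || $ch eq '-' ) ) {
            $required = $ch eq '+' ? MUST : MUST_NOT;
        }
        else {
            $term .= $ch;
        }
    }
    $flush->();                       # last term, or an unterminated quote
    return \@tree;
}

my $tree = parse_query('+"red blue" +orange purple -green "yellow red"');
for my $node (@$tree) {
    my $flag = defined $node->{required} ? $node->{required} : 'none';
    print "$node->{term} (required: $flag)\n";
}
```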

Once you have a parse tree, you can construct a DB query or whatever
backend query will handle your search.

Ted
 
Gordon Corbin Etly

What is your definition of a "word"?

I will assume \w+ is a "word".

Is it not safer to assume something like \b\w+(?:'\w+)\b as a word? This
will match "don't" or "joe's", for example, as whole words.
 
John W. Krahn

Gordon said:
Is it not safer to assume something like \b\w+(?:'\w+)\b as a word? This
will match "don't" or "joe's", for example, as whole words.

\b\w+(?:'\w+)\b

Can be more simply written as:

\w+'\w+

Perhaps you meant:

\w+(?:'\w+)?
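
A quick way to see the difference between the two patterns is to run both over the same text. This is only an illustrative sketch, and the sample sentence is made up:

```perl
use warnings;
use strict;

my $text = "don't touch joe's red car";

my @strict  = $text =~ /\w+'\w+/g;         # apostrophe part is required
my @relaxed = $text =~ /\w+(?:'\w+)?/g;    # apostrophe part is optional

print "required: @strict\n";     # required: don't joe's
print "optional: @relaxed\n";    # optional: don't touch joe's red car
```

Without the trailing ?, plain words such as "touch" are not matched at all.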



John
 
Dave B

Gordon said:
Is it not safer to assume something like \b\w+(?:'\w+)\b as a word? This
will match "don't" or "joe's", for example, as whole words.

As I understand it,

\b\w+\b

and

\w+

mean exactly the same thing. So what you mean is really

\w+(?:'\w+)

or, more likely,

\w+(?:'\w+)?

which might or might not be appropriate.
 
Gordon Corbin Etly

\b\w+(?:'\w+)\b

Can be more simply written as:

\w+'\w+

Actually I meant to type:

\b\w+(?:'\w+)?\b

But you are write, the \b's aren't needed here. They would be needed,
however, if you are matching something addition to just word characters,
so it's not a bad idea to have them in there if there is any chance what
is to be matched may be altered.

Perhaps you meant:

\w+(?:'\w+)?

Yes this is exactly what I mean to write (I forgot the conditional `?`
that you added.) Thanks for catching that.
 
Dave B

Gordon said:
Actually I meant to type:

\b\w+(?:'\w+)?\b

But you are write, the \b's aren't needed here. They would be needed,
however, if you are matching something addition to just word characters,

For example?
 
John W. Krahn

Gordon Corbin Etly wrote:

And to help you with your english, as well as your perl:

Gordon said:
But you are write, the \b's aren't needed here. They would be needed,
however, if you are matching something addition to just word characters,
so it's not a bad idea to have them in there if there is any chance what
is to be matched may be altered.

That should be:

But you are right, the \b's aren't needed here. They would be needed,
however, if you are matching something in addition to just word
characters, so it's not a bad idea to have them in there if there is any
chance that what is to be matched may be altered.

Gordon said:
Yes this is exactly what I mean to write (I forgot the conditional `?`
that you added.) Thanks for catching that.

And again:

Yes this is exactly what I meant to write (I forgot the conditional `?`
that you added.) Thanks for catching that.



John
 
Gordon Corbin Etly

John said:
Gordon Corbin Etly wrote:
And to help you with your english, as well as your perl:

Not really, I just wrote it in haste. I think the meaning can still be
deciphered easily enough, though I generally write more proficiently. We
all have our occasional lapses, especially as the mercury rises.
....

That should be:

But you are right,
....

Yes, that is correct. My apologies.

....

And again:

Yes this is exactly what I meant to write
....

While I realize this wasn't the best display of typing skill, I hardly
think it was necessary to correct two small mistakes; many people make
the same sort of typos. Thanks anyway, though.
 
John W. Krahn

Gordon said:
While I realize this wasn't the best display of typing skill, I hardly
think it was necessary to correct two small mistakes;

Then I guess you missed the other two corrections?


John
 
Gordon Corbin Etly

John said:
Gordon Corbin Etly wrote:
Then I guess you missed the other two corrections?

I see them now after reading it again. It was nice of you, but it really
wasn't necessary, as they were just typing mistakes (as opposed to
genuinely having an insufficient understanding of the language)
resulting from composing a post in haste.
 
