regex: match at least one of two expressions


Vincent Mouton

Hi,

I am trying to build a regex expression that holds two expressions, and
I'd like the main expression to match if at least one of those two
sub-expressions match, or if they both match. I am using this to parse
URL's.

Example URL's:
/brand/hpe/category/wi (catch this)
/brand/hpe (catch this)
/category/wi/brand/hpe (catch this)
/category/wi (catch this)
/otherpage (do not catch this)

The only thing I came up with is this huge thing:

(\/brand\/(\w)+)|(\/category\/(\w)+)|(\/category\/(\w)+\/brand\/(\w)+)|(\/brand\/(\w)+\/category\/(\w)+)

So a URL is built up of a brand section, a category section or both. Is
there a better way to capture those 4 possible URL's than the expression
I came up with?

thanks a lot,
Vincent
 

Anno Siegel

Vincent Mouton said:
Hi,

I am trying to build a regex expression that holds two expressions, and
I'd like the main expression to match if at least one of those two
sub-expressions match, or if they both match. I am using this to parse
URL's.

Example URL's:
/brand/hpe/category/wi (catch this)
/brand/hpe (catch this)
/category/wi/brand/hpe (catch this)
/category/wi (catch this)
/otherpage (do not catch this)

the only thing i came up with is this huge thing:

(\/brand\/(\w)+)|(\/category\/(\w)+)|(\/category\/(\w)+\/brand\/(\w)+)|(\/brand\/(\w)+\/category\/(\w)+)

You don't have to escape "/" when you use alternative delimiters for
the regex.
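
As a rough sketch of what that looks like, here are the same four alternatives written with m{...} delimiters. The anchors, the /x layout and the names $url_re, @urls and %caught are assumptions added for illustration; they are not part of the original pattern.

```perl
use strict;
use warnings;

# The four alternatives from the original pattern, written with {} as
# delimiters so that "/" needs no backslash. The anchors (^ and \z) are
# an added assumption: they make the whole URL fit one alternative.
my $url_re = qr{
    ^(?: /brand/\w+/category/\w+
       | /category/\w+/brand/\w+
       | /brand/\w+
       | /category/\w+
    )\z
}x;

my @urls = qw(
    /brand/hpe/category/wi
    /brand/hpe
    /category/wi/brand/hpe
    /category/wi
    /otherpage
);

# Record which URLs the pattern accepts.
my %caught;
$caught{$_} = ($_ =~ $url_re) ? 1 : 0 for @urls;

printf "%-26s %s\n", $_, $caught{$_} ? 'catch' : 'skip' for @urls;
```

With these assumptions the first four example URLs are accepted and /otherpage is not.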

So a URL is built up of a brand section, a category section or both. Is
there a better way to capture those 4 possible URL's than the expression
I came up with?

It really depends what you want out of the match. I see capturing
parentheses, but it isn't clear how you are going to use the captures.

I'd break the regex down to its elementary parts and combine matches
in an if/else structure. That way, each regex is simple, and you
get to deal with the four different cases in four different code
branches.

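# The ^ anchors make the outer test decide which section comes first;
# the inner //g match then continues from where the outer one stopped.
if ( m{^/brand/\w+}g ) {
    if ( m{/category/\w+}g ) {
        print "$_: brand & category\n";
    } else {
        print "$_: brand only\n";
    }
} elsif ( m{^/category/\w+}g ) {
    if ( m{/brand/\w+}g ) {
        print "$_: category and brand\n";
    } else {
        print "$_: category only\n";
    }
}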

Note how //g is used to let each secondary match start where the
first one has left off.
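
To make the //g remark concrete, here is a small sketch of how pos() moves during scalar-context //g matches. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $url = '/brand/hpe/category/wi';

# A successful scalar-context match with /g records where it stopped.
$url =~ m{/brand/\w+}g or die "no brand";
my $after_brand = pos($url);    # offset just past "/brand/hpe"

# The next /g match on the same string resumes from that offset, so it
# can only find a category that comes after the brand.
my $category_follows = $url =~ m{/category/\w+}g ? 1 : 0;

# A failed /g match resets pos(), so a later search starts from the
# beginning of the string again.
my $brand_again       = $url =~ m{/brand/\w+}g ? 1 : 0;
my $pos_after_failure = pos($url);

print "brand ended at offset $after_brand\n";
print "category follows brand: $category_follows\n";
print "pos() after the failed match is ",
    defined $pos_after_failure ? $pos_after_failure : 'undef', "\n";
```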

Anno
 

Brian McCauley

Vincent said:
I am trying to build a regex expression that holds two expressions, and
I'd like the main expression to match if at least one of those two
sub-expressions match, or if they both match.

/This|That/

Example URL's:
/brand/hpe/category/wi (catch this)
/brand/hpe (catch this)
/category/wi/brand/hpe (catch this)
/category/wi (catch this)
/otherpage (do not catch this)

the only thing i came up with is this huge thing:

(\/brand\/(\w)+)|(\/category\/(\w)+)|(\/category\/(\w)+\/brand\/(\w)+)|(\/brand\/(\w)+\/category\/(\w)+)

I suspect there's more to what you are wanting than you originally told
us. I suspect when you say 'catch' you mean 'capture'.

So a URL is built up of a brand section, a category section or both. Is
there a better way to capture those 4 possible URL's than the expression
I came up with?

If you don't mind also capturing a repeated brand or category, then it
simplifies a lot.

/((\/(brand|category)\/\w+)+)/
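
A quick way to try this against the example URLs is sketched below. The ^ and \z anchors and the non-capturing groups are assumptions added here, not part of the pattern above; without anchors, a URL such as /otherpage/brand/hpe/category would also match, because the pattern can start anywhere in the string.

```perl
use strict;
use warnings;

# The repeated-section pattern with m{}-style delimiters. Anchoring it
# is an added assumption so that the whole URL has to consist of
# brand/category sections.
my $re = qr{^((?:/(?:brand|category)/\w+)+)\z};

my @urls = qw(
    /brand/hpe/category/wi
    /brand/hpe
    /category/wi/brand/hpe
    /category/wi
    /otherpage
    /otherpage/brand/hpe/category
);

# Keep only the URLs the anchored pattern accepts.
my @caught = grep { $_ =~ $re } @urls;

print "$_\n" for @caught;
```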
 

Ben Morrow

Quoth Vincent Mouton said:
Hi,

I am trying to build a regex expression that holds two expressions, and
I'd like the main expression to match if at least one of those two
sub-expressions match, or if they both match. I am using this to parse
URL's.

Example URL's:
/brand/hpe/category/wi (catch this)
/brand/hpe (catch this)
/category/wi/brand/hpe (catch this)
/category/wi (catch this)
/otherpage (do not catch this)

the only thing i came up with is this huge thing:

(\/brand\/(\w)+)|(\/category\/(\w)+)| # regex wrapped
(\/category\/(\w)+\/brand\/(\w)+)|
(\/brand\/(\w)+\/category\/(\w)+)

Perl supports other delimiters so's you don't need all those \/s.

So a URL is built up of a brand section, a category section or both. Is
there a better way to capture those 4 possible URL's than the expression
I came up with?

I'd use Perl instead of regexen.

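# Run both matches before testing, so that $category is still filled in
# when $brand has already matched.
my ($brand)    = $uri =~ m{/brand/(\w+)};
my ($category) = $uri =~ m{/category/(\w+)};

if (defined $brand or defined $category) {
    # do stuff with $brand, $category
}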

Ben
 

Vincent Mouton

Brian said:
I suspect there's more to what you are wanting than you originally told
us. I suspect when you say 'catch' you mean 'capture'.



If you don't mind also capturing a repeated brand or category, then it
simplifies a lot.

/((\/(brand|category)\/\w+)+)/

Hi,

Thank you. It seems obvious, but I didn't think of it this way. And yes,
that does help me a great deal. It's much cleaner. I don't mind catching
a repeated brand or category. I know the URL I will check using this regex
will always be one of the 5 examples I gave.

Thank you and thanks Anno for taking time to look into this.
vincent
 

A. Sinan Unur

Vincent Mouton said:
I am trying to build a regex expression that holds two expressions,
and I'd like the main expression to match if at least one of those two
sub-expressions match, or if they both match. I am using this to parse
URL's.

Example URL's:
/brand/hpe/category/wi (catch this)
/brand/hpe (catch this)
/category/wi/brand/hpe (catch this)
/category/wi (catch this)
/otherpage (do not catch this)

Do you want:

/otherpage/brand/hpe/category

as well?

Sinan
 

A. Sinan Unur

Vincent Mouton said:
...

No. Only the first 4 matches.

I am a little confused, so the following might be completely irrelevant. It
seems to me that one would like to know, upon a successful match, whether
one matched a category or a brand or both, and which one is which. In that
case, I would be inclined to write that out in full:

#! perl

use strict;
use warnings;

use Data::Dumper;

my @s = qw(
    /brand/hpe/category/wi
    /brand/hpe
    /category/wi/brand/hpe
    /category/wi
    /otherpage
);

for my $s (@s) {
    if (my $r = extract_brand_category($s)) {
        print Dumper $r;
    }
}

# Returns a hash ref with the brand and/or category value,
# or nothing if the URL starts with neither section.
sub extract_brand_category {
    my $s = shift;
    my %r;

    # Capture the value after the keyword, not the keyword itself.
    if ($s =~ m{^/brand/(\w+)}) {
        $r{brand} = $1;
        if ($s =~ m{/category/(\w+)}) {
            $r{category} = $1;
        }
    } elsif ($s =~ m{^/category/(\w+)}) {
        $r{category} = $1;
        if ($s =~ m{/brand/(\w+)}) {
            $r{brand} = $1;
        }
    }

    return \%r if %r;
    return;
}
__END__

I know it does not look as neat as a mind-blowing one-line regex, but I
leave that to experts.

Sinan.
 

Gerhard M

Vincent Mouton said:
Example URL's:
/brand/hpe/category/wi (catch this)
/brand/hpe (catch this)
/category/wi/brand/hpe (catch this)
/category/wi (catch this)
/otherpage (do not catch this)

What about
/brand/xx/category/yy/yy1
and
/brand/xx/category
and
/category/xx/

Ignoring those cases, this would be one possible way:

my $p = "brand|category";
if (m#^/($p)/(\w+)($|/($p)/(\w+)$)#) {
    # you may want to check $1 to $5 here
    your_code($_);
}


gerhard
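
A variation on the same idea, sketched as a small function that returns the sections as a hash. The name parse_sections is hypothetical, and rejecting a URL that repeats the same section is an extra assumption, not something the pattern above does.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref such as
# { brand => 'hpe', category => 'wi' }, or nothing when the URL does
# not consist of one or two brand/category sections.
sub parse_sections {
    my ($url) = @_;
    my $key = qr{brand|category};

    # The first section is required, the second is optional.
    my ($k1, $v1, $k2, $v2) =
        $url =~ m{^/($key)/(\w+)(?:/($key)/(\w+))?\z}
        or return;

    # Assumption: the same section twice is treated as a non-match.
    return if defined $k2 and $k2 eq $k1;

    my %r = ($k1 => $v1);
    $r{$k2} = $v2 if defined $k2;
    return \%r;
}

for my $url (qw(/brand/hpe/category/wi /category/xx/ /brand/xx/category)) {
    my $r = parse_sections($url);
    my $summary = $r
        ? join(', ', map { $_ . '=' . $r->{$_} } sort keys %$r)
        : 'no match';
    print "$url: $summary\n";
}
```

Under these assumptions the three URLs asked about above (/brand/xx/category/yy/yy1, /brand/xx/category and /category/xx/) are all rejected.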
 
