[regex] grep for chars in any order

Discussion in 'Perl Misc' started by viki, Jun 18, 2008.

  1. viki

    viki Guest

    How can I build regex that matches all characters of the string $STR
    in any order with .* added between any two characters: ?
    And without generating all N! transpositions (where N is length of
    $STR) ?
    Example.
    For $STR "abc", I want to match equivalent to:
    /(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/

    Generating all transpositions is not feasible for larger legths of
    $STR.
    /[abc].*[abc].*[abc]/ is easy and fast but gives false positives.
    What is good solution ?

    Thanks
    vkm
    viki, Jun 18, 2008
    #1
    1. Advertising

  2. viki

    Ben Bullock Guest

    On Wed, 18 Jun 2008 00:06:12 -0700, viki wrote:

    > How can I build regex that matches all characters of the string $STR
    > in any order with .* added between any two characters: ?
    > And without generating all N! transpositions (where N is length of
    > $STR) ?
    > Example.
    > For $STR "abc", I want to match equivalent to:
    > /(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/
    >
    > Generating all transpositions is not feasible for larger legths of
    > $STR.
    > /[abc].*[abc].*[abc]/ is easy and fast but gives false positives.
    > What is good solution ?


    I don't think a single regular expression can do that, because there is
    some logic involved which doesn't fit the regular expression mentality -
    you have to work out what the first character matched was, then change
    the second character to match depending on that, and so on.

    I would use the easy and fast method and then remove the false positives
    by checking that the matched string contained all n characters, perhaps
    using s or tr n times and dropping out of the loop if one of my
    substitutions failed.

    The following seems to work, although I haven't tested it extensively:

    #!/usr/local/bin/perl
    use warnings;
    use strict;

    sub matches
    {
    my ($s, $STR) = @_;
    my %chars = map {$_ => 1} split ('', $STR);
    my @chars = sort keys %chars;
    my $anychar = join '', @chars;
    my $matchany = join '.*',map "[$anychar]", @chars; # there's a better
    way
    if ($s =~ /$matchany/) {
    my $copy = $s;
    for my $c (@chars) {
    return unless $copy =~ s/$c//g;
    }
    return 1;
    }
    return;
    }

    print "OK\n" if (matches('naninuneno','aeiou'));
    print "OK\n" if (matches('naninunene','aeiou'));
    Ben Bullock, Jun 18, 2008
    #2
    1. Advertising

  3. viki

    Ben Bullock Guest

    On Wed, 18 Jun 2008 11:04:21 +0000, Glenn Jackman wrote:

    > print 'matched' if $STR =~ /a(?=.*b)(?=.*c)|b(?=.*a)(?=.*c)|c(?=.*a)

    (?=.*b)/

    Great! That is better than the solution I posted. But I have an
    improvement:

    /(?=.*a)(?=.*b)(?=.*c)/

    without any actual matching string also works, reducing the regex length
    from O(n^2) to O(n), where n is the number of characters.

    > So you don't have to create all the combinations but you do need all the
    > permutations (if I have my terminology correct)


    You mean that you need all the combinations of initial characters, but
    not all the permutations (which would be O(n!)).
    Ben Bullock, Jun 18, 2008
    #3
  4. Re: grep for chars in any order

    On Jun 18, 3:06 am, viki <> wrote:
    > How can I build regex that matches all characters of the string $STR
    > in any order with  .* added between any two characters: ?
    > And without generating all N! transpositions (where N is length of
    > $STR) ?
    > Example.
    > For $STR "abc", I want to match equivalent to:
    > /(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/
    >
    > Generating all transpositions is not feasible for larger legths of
    > $STR.
    > /[abc].*[abc].*[abc]/ is easy and fast but gives false positives.
    > What is good solution ?
    >


    use a math module to get permutations:
    http://search.cpan.org/~allenday/Math-Combinatorics-0.09/lib/Math/Combinatorics.pm

    then from those, build your regexes.
    nolo contendere, Jun 18, 2008
    #4
  5. viki

    Paul Lalli Guest

    Re: grep for chars in any order

    On Jun 18, 3:06 am, viki <> wrote:
    > How can I build regex that matches all characters of the string $STR
    > in any order with  .* added between any two characters: ?
    > And without generating all N! transpositions (where N is length of
    > $STR) ?
    > Example.
    > For $STR "abc", I want to match equivalent to:
    > /(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/
    >
    > Generating all transpositions is not feasible for larger legths of
    > $STR.
    > /[abc].*[abc].*[abc]/ is easy and fast but gives false positives.
    > What is good solution ?


    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::MoreUtils qw/all/;

    $STR = "whatever";

    if (all { $STR =~ /$_/ } qw/a b c/) {
    print "Matched all of a, b, c\n";
    }

    __END__

    Paul Lalli
    Paul Lalli, Jun 18, 2008
    #5
  6. Re: grep for chars in any order

    nolo contendere <> wrote:
    >On Jun 18, 3:06 am, viki <> wrote:
    >> How can I build regex that matches all characters of the string $STR
    >> in any order with  .* added between any two characters:


    Maybe I am missing something, but isn't that the same as the text begins
    and ends with a character from $str and all the other characters of $str
    are included somewhere in the text?
    It should be fairly easy to find an algorithm to check for that. You
    just need to be careful about how to handle duplicate characters in $STR
    and/or the text.

    jue
    Jürgen Exner, Jun 18, 2008
    #6
  7. Re: grep for chars in any order

    On Jun 18, 10:45 am, Jürgen Exner <> wrote:
    > nolo contendere <> wrote:
    > >On Jun 18, 3:06 am, viki <> wrote:
    > >> How can I build regex that matches all characters of the string $STR
    > >> in any order with  .* added between any two characters:

    >
    > Maybe I am missing something, but isn't that the same as the text begins
    > and ends with a character from $str and all the other characters of $str
    > are included somewhere in the text?
    > It should be fairly easy to find an algorithm to check for that. You
    > just need to be careful about how to handle duplicate characters in $STR
    > and/or the text.
    >


    Those are both great points. Perhaps the OP could further refine the
    requirements, or state the larger goal.
    nolo contendere, Jun 18, 2008
    #7
  8. Re: grep for chars in any order

    On Jun 18, 12:23 pm, Glenn Jackman <> wrote:
    > At 2008-06-18 03:06AM, "viki" wrote:
    >
    > >  How can I build regex that matches all characters of the string $STR
    > >  in any order with  .* added between any two characters: ?
    > >  And without generating all N! transpositions (where N is length of
    > >  $STR) ?
    > >  Example.
    > >  For $STR "abc", I want to match equivalent to:
    > >  /(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/

    >
    > >  Generating all transpositions is not feasible for larger legths of
    > >  $STR.
    > >  /[abc].*[abc].*[abc]/ is easy and fast but gives false positives.
    > >  What is good solution ?

    >
    > If you're not stuck on creating one regular expression:
    >
    >     sub has_all_chars {
    >         my ($string, $chars) = @_;
    >         my $matched = 1;
    >         foreach my $char (split //, $chars) {
    >             if (index($string, $char) == -1) {
    >                 $matched = 0;
    >                 last;
    >             }
    >         }
    >         matched;


    what is this? ^^^

    >     }
    >     has_all_chars("foobar","rb"); # ==> 1
    >     has_all_chars("foobar","abc"); # ==> 0
    >


    This is pretty much what Paul suggested--this amounts to about the
    same thing as List::MoreUtils's all() function.

    sub all (&@) {
    my $f = shift;
    return if ! @_;
    for (@_) {
    return 0 if ! $f->();
    }
    return 1;
    }
    nolo contendere, Jun 18, 2008
    #8
  9. viki

    smallpond Guest

    Re: grep for chars in any order

    On Jun 18, 12:23 pm, Glenn Jackman <> wrote:
    > At 2008-06-18 03:06AM, "viki" wrote:
    >
    > > How can I build regex that matches all characters of the string $STR
    > > in any order with .* added between any two characters: ?
    > > And without generating all N! transpositions (where N is length of
    > > $STR) ?
    > > Example.
    > > For $STR "abc", I want to match equivalent to:
    > > /(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/

    >
    > > Generating all transpositions is not feasible for larger legths of
    > > $STR.
    > > /[abc].*[abc].*[abc]/ is easy and fast but gives false positives.
    > > What is good solution ?

    >
    > If you're not stuck on creating one regular expression:
    >
    > sub has_all_chars {
    > my ($string, $chars) = @_;
    > my $matched = 1;
    > foreach my $char (split //, $chars) {
    > if (index($string, $char) == -1) {
    > $matched = 0;
    > last;
    > }
    > }
    > matched;
    > }
    > has_all_chars("foobar","rb"); # ==> 1
    > has_all_chars("foobar","abc"); # ==> 0
    >
    > --
    > Glenn Jackman
    > Write a wise saying and your name will live forever. -- Anonymous



    This is a really clever solution. The only thing I would
    do differently is to use chop instead of split. Why create
    a list unless you need the list?

    while (my $char = chop $chars) {

    --S
    smallpond, Jun 18, 2008
    #9
  10. "viki" <> wrote in message
    news:...
    > How can I build regex that matches all characters of the string $STR
    > in any order with .* added between any two characters: ?
    > And without generating all N! transpositions (where N is length of
    > $STR) ?
    > Example.
    > For $STR "abc", I want to match equivalent to:
    > /(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/
    >
    > Generating all transpositions is not feasible for larger legths of
    > $STR.
    > /[abc].*[abc].*[abc]/ is easy and fast but gives false positives.
    > What is good solution ?
    >
    > Thanks
    > vkm


    The way I see the solution, you can have any of the $STR characters,
    followed by .*, followed by another of any of the $STR characters:

    /[$STR].*[$STR]/

    Or am I missing something?

    Mario
    Mario D'Alessio, Jun 18, 2008
    #10
  11. Ignore my post. I realize my mistake. I missed the
    part about the regex matching ALL of the characters.

    Mario

    "Mario D'Alessio" <> wrote in message
    news:g3bk6a$i20$...
    >
    > "viki" <> wrote in message
    > news:...
    >> How can I build regex that matches all characters of the string $STR
    >> in any order with .* added between any two characters: ?
    >> And without generating all N! transpositions (where N is length of
    >> $STR) ?
    >> Example.
    >> For $STR "abc", I want to match equivalent to:
    >> /(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/
    >>
    >> Generating all transpositions is not feasible for larger legths of
    >> $STR.
    >> /[abc].*[abc].*[abc]/ is easy and fast but gives false positives.
    >> What is good solution ?
    >>
    >> Thanks
    >> vkm

    >
    > The way I see the solution, you can have any of the $STR characters,
    > followed by .*, followed by another of any of the $STR characters:
    >
    > /[$STR].*[$STR]/
    >
    > Or am I missing something?
    >
    > Mario
    >
    >
    Mario D'Alessio, Jun 18, 2008
    #11
  12. viki

    Bart Lateur Guest

    Glenn Jackman wrote:

    >At 2008-06-18 09:01AM, "Ben Bullock" wrote:
    >> Great! That is better than the solution I posted. But I have an
    >> improvement:
    >>
    >> /(?=.*a)(?=.*b)(?=.*c)/

    >
    >Nice. I wonder (without bothering to benchmark it) if anchoring the
    >expression would be an optimization:
    >
    > /^(?=.*a)(?=.*b)(?=.*c)/


    It would, in case it doesn't match. The latter will only try a match at
    the start of the string, the former will try again at every character
    position, which is is dead stupid, of course.

    Be aware of the possibility of the string containing newlines.

    /^(?=.*a)(?=.*b)(?=.*c)/s


    --
    Bart.
    Bart Lateur, Jun 18, 2008
    #12
  13. viki

    Guest

    Re: grep for chars in any order

    On Jun 18, 1:06 am, viki <> wrote:
    > How can I build regex that matches all characters of the string $STR
    > in any order with .* added between any two characters: ?
    > And without generating all N! transpositions (where N is length of
    > $STR) ?
    > Example.
    > For $STR "abc", I want to match equivalent to:
    > /(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/



    Dear Viki,

    If you don't mind using several regular expressions (one for each
    letter), you can easily write:

    /a/ and /b/ and /c/

    You can even put it in a Perl grep() statement (which I presume is
    what you intend to use it for) like this:

    my @firstList = ('cab', 'back', 'cat', 'crab', 'dog', 'baby');
    my @secondList = grep { /a/ and /b/ and /c/ } @firstList;

    In this way, @secondList would contain 'cab', 'back', and 'crab',
    but not 'baby' (which would have been a false positive in your
    previous example).

    Of course, this approach uses one regular expression for each
    letter that you're looking for (instead of just one last regular
    expression), but depending on how you're writing your code, that may
    be acceptable.

    I hope this helps, Viki.

    -- Jean-Luc
    , Jun 19, 2008
    #13
  14. viki wrote:
    > How can I build regex that matches all characters of the string $STR
    > in any order with .* added between any two characters: ?
    > And without generating all N! transpositions (where N is length of
    > $STR) ?
    > Example.
    > For $STR "abc", I want to match equivalent to:
    > /(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/
    >
    > Generating all transpositions is not feasible for larger legths of
    > $STR.
    > /[abc].*[abc].*[abc]/ is easy and fast but gives false positives.
    > What is good solution ?


    I haven't tested this but this may do what you want:

    ( Assuming the data you are searching is in $data )

    $data =~ s/[^\Q$STR\E]+//g;
    print "matched!\n" if join( '', sort split //, $data ) eq join( '', sort
    split //, $STR );



    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
    John W. Krahn, Jun 19, 2008
    #14
  15. viki

    Ben Bullock Guest

    On Thu, 19 Jun 2008 16:57:15 +0000, John W. Krahn wrote:

    > ( Assuming the data you are searching is in $data )
    >
    > $data =~ s/[^\Q$STR\E]+//g;
    > print "matched!\n" if join( '', sort split //, $data ) eq join( '',

    sort
    > split //, $STR );


    This fails (gives a false negative) if $data = "abcabc" and $STR = "ab",
    because the result of the first "join" is "aabb" and the second "join" is
    "ab". You need to do some kind of unique sort.
    Ben Bullock, Jun 20, 2008
    #15
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Kosio

    Floats to chars and chars to floats

    Kosio, Sep 16, 2005, in forum: C Programming
    Replies:
    44
    Views:
    1,285
    Tim Rentsch
    Sep 23, 2005
  2. Hongyu
    Replies:
    9
    Views:
    902
    James Kanze
    Aug 8, 2008
  3. M.Posseth

    receiving ??? chars instead of "special" chars

    M.Posseth, Nov 15, 2004, in forum: ASP .Net Web Services
    Replies:
    3
    Views:
    226
    Dan Rogers
    Nov 16, 2004
  4. User
    Replies:
    5
    Views:
    461
    Tad McClellan
    Jun 13, 2004
  5. Gerhard M
    Replies:
    6
    Views:
    132
Loading...

Share This Page