matching '?' in a string ending with digits

Discussion in 'Perl Misc' started by ReMo..., Feb 26, 2011.

  1. ReMo...

    ReMo... Guest

    #!/usr/bin/perl

    use strict;
    use warnings;

    my @arr = ('third1000', 'third1000', 'third?1000', '1000third?', 'third{}1000');
    for my $item (@arr) {
        my $targ = $item;
        print "$targ and $item ";
        print "do not " if ($item !~ /$targ/);
        print "match\n";
    }

    The output is:
    third1000 and third1000 match
    third1000 and third1000 match
    third?1000 and third?1000 do not match << I don't understand this
    1000third? and 1000third? match
    third{}1000 and third{}1000 match

    In the above, the nondigits represent arbitrary text that digits are
    added to for a multi-array sort in a module I'm making, because there
    may be otherwise-identical text items.

    /\Q...\E/ seems to make it go away, but then two characters ('$' and '@')
    would apparently need to be accounted for.

    So my question is, what other characters will fail to match in a string
    ending with digits? I assume there are more clues in perlre and perlops,
    but I can't find them. I've got to be missing something really elementary
    here.

    --

    "Learning is what most adults will do for a living in the 21st century."
    -- S.J. Perelman
     
    ReMo..., Feb 26, 2011
    #1

  2. ReMo... <> wrote:
    > #!/usr/bin/perl


    > use strict;
    > use warnings;


    > my @arr = ('third1000', 'third1000', 'third?1000', '1000third?', 'third{}1000');
    > for my $item (@arr) {
    > my $targ = $item;
    > print "$targ and $item ";
    > print "do not " if ($item !~ /$targ/);
    > print "match\n"
    > }


    > The output is:
    > third1000 and third1000 match
    > third1000 and third1000 match
    > third?1000 and third?1000 do not match << I don't understand this
    > 1000third? and 1000third? match
    > third{}1000 and third{}1000 match


    > In the above, the nondigits represent arbitrary text that digits are
    > added to for a multi-array sort in a module I'm making, because there
    > may be otherwise-identical text items.


    > /\Q...\E/ seems to make it go away, but then two characters ('$' and '@')
    > would apparently need to be accounted for.


    > So my question is, what other characters will fail to match in a string
    > ending with digits?


    It's not about the digits in the end, it's about the presence
    of characters that have special meanings in a regexp. Take

    'third?1000'

    As a regexp it says "match anything that contains the 4
    chars 'thir', optionally followed by a 'd', and then '1000'."
    That obviously doesn't describe the string itself, which
    contains a question mark.

    And the list of strings that you will get problems with can
    easily be extended. Take, for example

    'third()1000'
    'thir\d1000'
    'thir{2,}d1000'
    'third*1000'

    And it gets worse: try

    'th(ird1000'

    which will end in a complaint about an unmatched '(' in a
    regular expression.
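
    A minimal sketch of how that complaint could be trapped instead of
    killing the script. The helper name try_match() is hypothetical (it
    is not from the thread); it relies only on eval catching the error
    raised when an interpolated pattern fails to compile:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $text matches $pattern, 0 if it
# does not, and undef if $pattern is not a valid regular expression.
sub try_match {
    my ($text, $pattern) = @_;
    # The pattern is compiled at run time, so a bad pattern dies here
    # and eval turns that into an undef return value.
    my $result = eval { $text =~ /$pattern/ ? 1 : 0 };
    return $result;
}

for my $item ('third1000', 'third?1000', 'th(ird1000') {
    my $result = try_match($item, $item);
    my $verdict = !defined $result ? 'invalid pattern'
                : $result          ? 'match'
                :                    'no match';
    print "$item: $verdict\n";
}
```

    The three strings should come out as "match", "no match" and
    "invalid pattern" respectively.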

    > I assume there are more clues in perlre and perlops,
    > but I can't find them. I've got to be missing something really elementary
    > here.


    Using '/\Q$targ\E/' will help, since the special meaning of
    every character enclosed by '\Q' and '\E' is removed. I don't
    see what problems you foresee with '$' and '@', but then I
    don't understand the explanation of what you're planning to
    do with all this.
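
    As a rough illustration of the '\Q' and '\E' approach, here is a
    sketch that runs the problem strings listed above against
    themselves. The helper name contains_literal() is made up for this
    example; quotemeta() is the built-in function form of '\Q...\E':

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $needle occurs literally in $haystack.
# \Q...\E escapes every regex metacharacter in the interpolated value.
sub contains_literal {
    my ($haystack, $needle) = @_;
    return $haystack =~ /\Q$needle\E/ ? 1 : 0;
}

for my $item ('third?1000', 'third()1000', 'thir\d1000', 'third*1000', 'th(ird1000') {
    printf "%-12s %s\n", $item,
        contains_literal($item, $item) ? 'match' : 'no match';
}

# quotemeta() shows what the escaped pattern looks like.
print quotemeta('third?1000'), "\n";    # prints third\?1000
```

    With the escaping in place every string matches itself, including
    the one with the unbalanced '('.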

    Regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
     
    Jens Thoms Toerring, Feb 26, 2011
    #2

  3. "ReMo..." <> writes:

    > So my question is, what other characters will fail to match in a string
    > ending with digits? I assume there are more clues in perlre and perlops,
    > but I can't find them. I've got to be missing something really elementary
    > here.


    Digits are not your problem. The problem is characters that have special
    meaning in regexps, and this includes '?'. The regexp 'third?1000' matches
    either 'third1000' or 'thir1000', neither of which is a substring of
    'third?1000'.

    The regexp 'third1000?' matches 'third1000' and 'third100', both of which
    are substrings of 'third1000?', so you get a match.
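
    The two cases can be checked with a short sketch. The helper name
    matches() is an assumption made for this example; the strings are
    the ones discussed above:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the regexp in $pattern matches somewhere in $text.
sub matches {
    my ($text, $pattern) = @_;
    return $text =~ /$pattern/ ? 1 : 0;
}

# Each string is used both as the pattern and as the text,
# as in the original loop.
for my $item ('third?1000', 'third1000?') {
    my $verdict = matches($item, $item) ? 'matches' : 'does not match';
    print "$item as a regexp $verdict the string $item\n";
}

# What the first pattern does match:
for my $text ('third1000', 'thir1000') {
    print "third?1000 as a regexp matches $text\n"
        if matches($text, 'third?1000');
}
```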

    //Makholm
     
    Peter Makholm, Feb 26, 2011
    #3
  4. ReMo...

    Guest

    On Sat, 26 Feb 2011 08:42:29 +0000 (UTC), "ReMo..." <> wrote:

    >
    >#!/usr/bin/perl
    >
    >use strict;
    >use warnings;
    >
    >my @arr = ('third1000', 'third1000', 'third?1000', '1000third?', 'third{}1000');
    >for my $item (@arr) {
    > my $targ = $item;
    > print "$targ and $item ";
    > print "do not " if ($item !~ /$targ/);
    > print "match\n"
    >}
    >
    >The output is:
    >third1000 and third1000 match
    >third1000 and third1000 match
    >third?1000 and third?1000 do not match << I don't understand this


    You're using the wrong operator.
    Change the conditional to
    if ($item ne $targ)

    Otherwise it's just dumb to do something you know nothing about.
    The !~ is a regular expression operator. Read any document on regular
    expressions before actually trying them.

    -sln
     
    , Feb 26, 2011
    #4
  5. ReMo...

    Guest

    On 26 Feb 2011 12:54:16 GMT, (Jens Thoms Toerring) wrote:

    >ReMo... <> wrote:
    >> #!/usr/bin/perl

    >
    >> use strict;
    >> use warnings;

    >
    >> my @arr = ('third1000', 'third1000', 'third?1000', '1000third?', 'third{}1000');
    >> for my $item (@arr) {
    >> my $targ = $item;
    >> print "$targ and $item ";
    >> print "do not " if ($item !~ /$targ/);
    >> print "match\n"
    >> }

    >
    >> The output is:
    >> third1000 and third1000 match
    >> third1000 and third1000 match
    >> third?1000 and third?1000 do not match << I don't understand this
    >> 1000third? and 1000third? match
    >> third{}1000 and third{}1000 match

    >
    >> In the above, the nondigits represent arbitrary text that digits are
    >> added to for a multi-array sort in a module I'm making, because there
    >> may be otherwise-identical text items.

    >
    >> /\Q...\E/ seems to make it go away, but then two characters ('$' and '@')
    >> would apparently need to be accounted for.

    >
    >> So my question is, what other characters will fail to match in a string
    >> ending with digits?

    >
    >It's not about the digits in the end, it's about the presence
    >of characters that have special meanings in a regexp. Take
    >
    > 'third?1000'
    >
    >As a regexp it says "match everything that starts with the 4
    >chars 'thri', optionally followed by a 'd', and then by '1000'."
    >That's obviously something that doesn't describe the string
    >itself, which contains a question mark.
    >
    >And the list of strings that you will get problems with can
    >easily be extended. Take, for example
    >
    > 'third()1000'
    > 'thir\d1000'
    > 'thir{2,}d1000'
    > 'third*1000'
    >
    >And it gets worse: try
    >
    > 'th(ird1000'
    >
    >which will end in a complaint about an unmatched '(' in a
    >regular expression.
    >
    >> I assume there are more clues in perlre and perlops,
    >> but I can't find them. I've got to be missing something really elementary
    >> here.

    >
    >Using '/\Q$trag\E/' will help with since for all what's en-
    >closed by '\Q' and '\E' the special meaning of the charac-
    >ters is removed. I don't see what problems you forsee with
    >'$' and '@', but then I don't understand the explanation of
    >what you're planing to do with all this.
    >


    This is awesome, but shouldn't you recommend that he
    first look up regular expressions on Wikipedia to find
    out what they are?

    -sln
     
    , Feb 26, 2011
    #5
  6. "ReMo..." <> wrote:
    [...]
    > my $targ = $item;
    > print "$targ and $item ";
    > print "do not " if ($item !~ /$targ/);
    > print "match\n"

    [...]
    >So my question is, what other characters will fail to match in a string


    Almost all that are special in REs. The only(?) exception being '.',
    which of course will still match itself, too.

    >ending with digits?


    This part of the question is a red herring.

    >I assume there are more clues in perlre and perlops,
    >but I can't find them. I've got to be missing something really elementary
    >here.


    The most elementary is: don't use REs unless you need RE behaviour. If
    you simply want to check if one string is part of another string then
    just use a plain "index()".
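
    A sketch of the original loop rewritten around index(), assuming
    the goal really is a plain substring test. The helper name
    contains() is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $needle occurs anywhere in $haystack.
# index() compares character for character and returns -1 when the
# substring is absent, so no character is treated as special.
sub contains {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) >= 0 ? 1 : 0;
}

for my $item ('third1000', 'third?1000', '1000third?', 'third{}1000', 'th(ird1000') {
    my $targ = $item;
    print "$targ and $item ";
    print "do not " unless contains($item, $targ);
    print "match\n";
}
```

    Every line of output should end in plain "match", with no regex
    compilation involved at all.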

    jue
     
    Jürgen Exner, Feb 26, 2011
    #6
  7. ReMo...

    ReMo... Guest

    On 2011-02-26, Jens Thoms Toerring <> wrote:
    > ReMo... <> wrote:
    >> #!/usr/bin/perl

    >
    >> use strict;
    >> use warnings;

    >
    >> my @arr = ('third1000','third1000','third?1000','1000third?','third{}1000');
    >> for my $item (@arr) {
    >> my $targ = $item;
    >> print "$targ and $item ";
    >> print "do not " if ($item !~ /$targ/);
    >> print "match\n"
    >> }

    >
    >> The output is:
    >> third1000 and third1000 match
    >> third1000 and third1000 match
    >> third?1000 and third?1000 do not match << I don't understand this
    >> 1000third? and 1000third? match
    >> third{}1000 and third{}1000 match

    >
    >> In the above, the nondigits represent arbitrary text that digits are
    >> added to for a multi-array sort in a module I'm making, because there
    >> may be otherwise-identical text items.

    >
    >> /\Q...\E/ seems to make it go away, but then two characters ('$' and '@')
    >> would apparently need to be accounted for.

    >
    >> So my question is, what other characters will fail to match in a string
    >> ending with digits?

    >
    > It's not about the digits in the end, it's about the presence
    > of characters that have special meanings in a regexp. Take
    >
    > 'third?1000'
    >
    > As a regexp it says "match everything that starts with the 4
    > chars 'thri', optionally followed by a 'd', and then by '1000'."
    > That's obviously something that doesn't describe the string
    > itself, which contains a question mark.
    >
    > And the list of strings that you will get problems with can
    > easily be extended. Take, for example
    >
    > 'third()1000'
    > 'thir\d1000'
    > 'thir{2,}d1000'
    > 'third*1000'
    >
    > And it gets worse: try
    >
    > 'th(ird1000'
    >
    > which will end in a complaint about an unmatched '(' in a
    > regular expression.
    >
    >> I assume there are more clues in perlre and perlops,
    >> but I can't find them. I've got to be missing something really elementary
    >> here.

    >
    > Using '/\Q$trag\E/' will help with since for all what's en-
    > closed by '\Q' and '\E' the special meaning of the charac-
    > ters is removed. I don't see what problems you forsee with
    > '$' and '@', but then I don't understand the explanation of
    > what you're planing to do with all this.


    I knew it had to be that simple, but since '{' worked I assumed
    without much reflection that a match should then work like a
    comparison. Thank you!

    For reference, a test for the sub that the above represents is
    something like:

    $outee = modelsort (
    ['3third','2second','1first','1first'],
    ['alpha','beta','gamma','delta'],
    ['apple','banana','cherry','donut']
    );
    $complex1out = [
    ['1first','1first','2second','3third'],
    ['gamma','delta','beta','alpha'],
    ['cherry','donut','banana','apple']
    ];
    is_deeply ($outee, $complex1out, "modelsort: duplicate sort items");
    ....

    perlre goes on to say: 'You cannot include a literal "$" or "@"
    within a "\Q" sequence...' But of course a variable isn't a literal,
    which explains why I couldn't make a match on those characters
    fail. So that should work just dandy for what I'm doing.
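
    A small sketch of that observation: when '$' and '@' arrive inside
    a variable, '\Q' escapes them like any other character, because the
    variable's contents are not interpolated a second time. The helper
    name literal_match() and the sample strings are invented for
    illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $target occurs literally in $text.
sub literal_match {
    my ($text, $target) = @_;
    return $text =~ /\Q$target\E/ ? 1 : 0;
}

# Single quotes keep '$' and '@' as ordinary characters in the data.
for my $item ('cost$1000', 'user@1000', 'a$b@c?1000') {
    my $targ = $item;
    print "$targ and $item ";
    print "do not " unless literal_match($item, $targ);
    print "match\n";
}
```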
     
    ReMo..., Feb 26, 2011
    #7
  8. ReMo...

    ReMo... Guest

    On 2011-02-26, Peter Makholm <> wrote:
    > "ReMo..." <> writes:
    >
    >> So my question is, what other characters will fail to match in a string
    >> ending with digits? I assume there are more clues in perlre and perlops,
    >> but I can't find them. I've got to be missing something really elementary
    >> here.

    >
    > Digits are not your problem. The problem is characters that have special
    > meaning in regexpes, this includes '?'. The regexp 'third?1000' matches
    > either 'third1000' or 'thir1000' which of course isn't a substring of
    > 'third?1000'.
    >
    > The regexp 'third1000?' matches 'third1000' and 'third100' which in both
    > cases are substrings of 'third1000?' and so you get a match.
    >
    > //Makholm


    Thank you. I really and truly thought (without thinking) that
    since '{' matched, the other quote operators would, also, and that
    a solution was elsewhere than somehow accounting for RE metacharacters.
     
    ReMo..., Feb 26, 2011
    #8
  9. ReMo...

    ReMo... Guest

    On 2011-02-26, Jürgen Exner <> wrote:
    > "ReMo..." <> wrote:
    > [...]
    >> my $targ = $item;
    >> print "$targ and $item ";
    >> print "do not " if ($item !~ /$targ/);
    >> print "match\n"

    > [...]
    >>So my question is, what other characters will fail to match in a string

    >
    > Almost all that are special in REs. The only(?) exception being '.',
    > which of course will still match itself, too.
    >
    >>ending with digits?

    >
    > This part of the question is a red herring.
    >
    >>I assume there are more clues in perlre and perlops,
    >>but I can't find them. I've got to be missing something really elementary
    >>here.

    >
    > The most elementary is: don't use REs unless you need RE behaviour. If
    > you simply want to check if one string is part of another string then
    > just use a plain "index()".
    >
    > jue


    That principle is exactly relevant here: the only reason I'd like
    to stick with an RE is that I have the vaguest of ideas that in
    the future I might want to use a different sort strategy. But in
    fact, I don't actually need to use an RE there right now.

    Thanks!
     
    ReMo..., Feb 26, 2011
    #9
  10. ReMo...

    C.DeRykus Guest

    On Feb 26, 12:42 am, "ReMo..." <> wrote:
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > my @arr = ('third1000', 'third1000', 'third?1000', '1000third?', 'third{}1000');
    > for my $item (@arr) {
    >     my $targ = $item;
    >     print "$targ and $item ";
    >     print "do not " if ($item !~ /$targ/);
    >     print "match\n"
    >
    > }
    >
    > The output is:
    > third1000 and third1000 match
    > third1000 and third1000 match
    > third?1000 and third?1000 do not match << I don't understand this
    > 1000third? and 1000third? match
    > third{}1000 and third{}1000 match
    >
    > In the above, the nondigits represent arbitrary text that digits are
    > added to for a multi-array sort in a module I'm making, because there
    > may be otherwise-identical text items.
    >
    > /\Q...\E/ seems to make it go away, but then two characters ('$' and '@')
    > would apparently need to be accounted for.
    >
    > So my question is, what other characters will fail to match in a string
    > ending with digits?  I assume there are more clues in perlre and perlops,
    > but I can't find them.  I've got to be missing something really elementary
    > here.
    >



    See perldoc perlretut for a quick intro about meta-
    characters. Various metacharacters will cause the
    regex to fail as mentioned.

    One problem is that there must be a literal '?' in
    the regex in order to match the '?' in the string
    being matched. Since '?' is a regex metacharacter
    with special meaning to the regex compilation and
    not a literal '?', the match would fail.

    The 're' pragma can be helpful in seeing what
    happens:

    perl -Mre=debug -wle "print 'not'
    if 'third?1000' !~ /third?1000/"

    Compiling REx "third?1000"
    Final program:
    1: EXACT <thir> (3)
    3: CURLY {0,1} (7)
    5: EXACT <d> (0)
    7: EXACT <1000> (9)
    9: END (0)
    anchored "thir" at 0 floating "1000" at 4..5 (checking
    floating) minlen 8
    Guessing start of match in sv for REx "third?1000"
    against "third?1000"
    Found floating substr "1000" at offset 6...
    Contradicts anchored substr "thir", giving up...
    Match rejected by optimizer

    As it turns out though, the debug output looks to me as
    if the match fails for another reason: the optimizer
    determines that "1000" can only occur at offset 4 or 5
    of a matching string, which doesn't agree with its
    position at offset 6 in the string being matched.

    --
    Charles DeRykus
     
    C.DeRykus, Feb 26, 2011
    #10
  11. ReMo...

    ccc31807 Guest

    On Feb 26, 3:42 am, "ReMo..." <> wrote:
    > In the above, the nondigits represent arbitrary text that digits are
    > added to for a multi-array sort in a module I'm making, because there
    > may be otherwise-identical text items.


    There's a difference between testing for a match, and testing for
    equality. If you want to test for equality, use 'eq' for strings and
    '==' for numerical values. REs are great, but they aren't the
    universal tool to solve every problem.

    If you want to test for the equality of two strings, do that -- don't
    try to match them. If you want to test whether a string contains the
    exact copy of a substring, use the appropriate functions, like
    index().
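
    To make the distinction concrete, a minimal sketch of string versus
    numeric equality. The helper names same_string() and same_number()
    are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two equality operators.
sub same_string { my ($left, $right) = @_; return $left eq $right ? 1 : 0 }
sub same_number { my ($left, $right) = @_; return $left == $right ? 1 : 0 }

# String equality: no character is special to eq.
print "third?1000 eq third?1000\n" if same_string('third?1000', 'third?1000');
print "third?1000 ne third1000\n"  unless same_string('third?1000', 'third1000');

# Numeric equality compares values, not spellings.
print "1000 == 1000.0\n" if same_number('1000', '1000.0');
print "1000 ne 1000.0\n" unless same_string('1000', '1000.0');
```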

    Obviously, what you do in your script depends on what you want done in
    your logic. One thing you might want to consider is substituting
    something for every non-word character, or everything that doesn't
    match [0-9a-zA-Z]. In my job, I have a problem with extraneous
    apostrophes (using CSV, which uses REs indirectly) and have learned to
    replace the apostrophes like this:

    while (<INPUT>)
    {
    next unless /\w/;
    chomp;
    s/'/\\/'g;
    # continue processing
    }

    CC
     
    ccc31807, Feb 28, 2011
    #11
  12. ReMo...

    ccc31807 Guest

    On Feb 28, 3:54 pm, Tad McClellan <> wrote:
    >     perl -we "s/'/\\/'g;"
    >
    >     Substitution replacement not terminated at -e line 1.


    Yes, thanks, my mistake. CC.
     
    ccc31807, Feb 28, 2011
    #12
  13. ReMo...

    ReMo... Guest

    On 2011-02-26, C.DeRykus <> wrote:
    > On Feb 26, 12:42 am, "ReMo..." <> wrote:
    >> #!/usr/bin/perl
    >>
    >> use strict;
    >> use warnings;
    >>
    >> my @arr = ('third1000','third1000','third?1000','1000third?','third{}1000');
    >> for my $item (@arr) {
    >>     my $targ = $item;
    >>     print "$targ and $item ";
    >>     print "do not " if ($item !~ /$targ/);
    >>     print "match\n"
    >>
    >> }
    >>
    >> The output is:
    >> third1000 and third1000 match
    >> third1000 and third1000 match
    >> third?1000 and third?1000 do not match << I don't understand this
    >> 1000third? and 1000third? match
    >> third{}1000 and third{}1000 match
    >>
    >> In the above, the nondigits represent arbitrary text that digits are
    >> added to for a multi-array sort in a module I'm making, because there
    >> may be otherwise-identical text items.
    >>
    >> /\Q...\E/ seems to make it go away, but then two characters ('$' and '@')
    >> would apparently need to be accounted for.
    >>
    >> So my question is, what other characters will fail to match in a string
    >> ending with digits?  I assume there are more clues in perlre and perlops,
    >> but I can't find them.  I've got to be missing something really elementary
    >> here.
    >>

    >
    >
    > See perldoc perlretut for a quick intro about meta-
    > characters. Various metacharacters will cause the
    > regex to fail as mentioned.


    I wish I'd done that before.

    > One problem is that there must be a literal '?' in
    > the regex in order to match the '?' in the string
    > being matched. Since '?' is a regex metacharacter
    > with special meaning to the regex compilation and
    > not a literal '?', the match would fail.
    >
    > The 're' pragma can be helpful in seeing what
    > happens:
    >
    > perl -Mre=debug -wle "print 'not'
    > if 'third?1000' !~ /third?1000/"
    >
    > Compiling REx "third?1000"
    > Final program:
    > 1: EXACT <thir> (3)
    > 3: CURLY {0,1} (7)
    > 5: EXACT <d> (0)
    > 7: EXACT <1000> (9)
    > 9: END (0)
    > anchored "thir" at 0 floating "1000" at 4..5 (checking
    > floating) minlen 8
    > Guessing start of match in sv for REx "third?1000"
    > against "third?1000"
    > Found floating substr "1000" at offset 6...
    > Contradicts anchored substr "thir", giving up...
    > Match rejected by optimizer
    >
    > As it turns out though, the debug looks to me as
    > if the compilation fails for another reason when
    > the optimizer determines "1000" will occurs at
    > offset 4 or 5 in the pattern which won't match its
    > position at offset 6 in the string being matched.


    Using debugging would have definitely pointed me in the right
    direction.

    It starts giving an exception one character previous to the
    metacharacter... I think it's checking "d". Then 4..5 may refer
    to the boundary between "d" and "?".
     
    ReMo..., Mar 1, 2011
    #13
  14. ReMo...

    ReMo... Guest

    On 2011-02-28, ccc31807 <> wrote:
    > On Feb 26, 3:42 am, "ReMo..." <> wrote:
    >> In the above, the nondigits represent arbitrary text that digits are
    >> added to for a multi-array sort in a module I'm making, because there
    >> may be otherwise-identical text items.

    >
    > There's a difference between testing for a match, and testing for
    > equality. If you want to test for equality, use 'eq' for strings and
    > '==' for numerical values. REs are great, but they aren't the
    > universal tool to solve every problem.
    >
    > If you want to test for the equality of two strings, do that -- don't
    > try to match them. If you want to test whether a string contains the
    > exact copy of a substring, use the appropriate functions, like
    > index().
    >
    > Obviously, what you do in your script depends on what you want done in
    > your logic. One thing you might want to consider is substituting
    > something for every non-word character, or everything that doesn't
    > match [0-9a-zA-Z]. In my job, I have a problem with extraneous
    > apostrophes (using CSV, which uses REs indirectly) and have learned to
    > replace the apostrophes like this:
    >
    > while (<INPUT>)
    > {
    > next unless /\w/;
    > chomp;
    > s/'/\\/'g;
    > # continue processing
    > }
    >
    > CC


    Something like that was my backup plan, tho I would have used
    3-character strings.
     
    ReMo..., Mar 1, 2011
    #14
  15. ReMo...

    Jim Gibson Guest

    In article <ikhra4$k87$-september.org>, ReMo...
    <> wrote:

    > On 2011-02-26, C.DeRykus <> wrote:
    > > On Feb 26, 12:42 am, "ReMo..." <> wrote:


    > > The 're' pragma can be helpful in seeing what
    > > happens:
    > >
    > > perl -Mre=debug -wle "print 'not'
    > > if 'third?1000' !~ /third?1000/"
    > >
    > > Compiling REx "third?1000"
    > > Final program:
    > > 1: EXACT <thir> (3)
    > > 3: CURLY {0,1} (7)
    > > 5: EXACT <d> (0)
    > > 7: EXACT <1000> (9)
    > > 9: END (0)
    > > anchored "thir" at 0 floating "1000" at 4..5 (checking
    > > floating) minlen 8
    > > Guessing start of match in sv for REx "third?1000"
    > > against "third?1000"
    > > Found floating substr "1000" at offset 6...
    > > Contradicts anchored substr "thir", giving up...
    > > Match rejected by optimizer
    > >
    > > As it turns out though, the debug looks to me as
    > > if the compilation fails for another reason when
    > > the optimizer determines "1000" will occurs at
    > > offset 4 or 5 in the pattern which won't match its
    > > position at offset 6 in the string being matched.

    >
    > Using debugging would have definitely pointed me in the right
    > direction.
    >
    > It starts giving an exception one character previous to the
    > metacharacter... I think it's checking "d". Then 4..5 may refer
    > to the boundary between "d" and "?".


    The '?' modifies the character preceding it and means "zero or one", so
    yes, it is looking at the 'd'. The regular expression 'third?1000' can
    match either 'third1000' or 'thir1000' but does not match the string
    'third?1000' because the '?' in the string is not matched by anything
    in the regular expression (I hope that is clear).

    --
    Jim Gibson
     
    Jim Gibson, Mar 1, 2011
    #15
  16. ReMo...

    Guest

    On Tue, 1 Mar 2011 04:05:24 +0000 (UTC), "ReMo..." <> wrote:

    >On 2011-02-26, C.DeRykus <> wrote:
    >> On Feb 26, 12:42 am, "ReMo..." <> wrote:

    [snip]
    >> As it turns out though, the debug looks to me as
    >> if the compilation fails for another reason when
    >> the optimizer determines "1000" will occurs at
    >> offset 4 or 5 in the pattern which won't match its
    >> position at offset 6 in the string being matched.

    >
    >Using debugging would have definitely pointed me in the right
    >direction.
    >
    >It starts giving an exception one character previous to the
    >metacharacter... I think it's checking "d". Then 4..5 may refer
    >to the boundary between "d" and "?".


    There are ways to physically visualize regular expressions such
    that it is a lot easier to read. The easier it is to read, the better.

    Think of a regular expression as a 2-dimensional object,
    with literals being one dimension X and metacharacters being the other
    dimension Y.

    --------
    This third?1000
    is really this:

    third 1000
         ?

    where a space is left as a placeholder where ? goes.

    Quantifier metachars like +*? affect the thing to their immediate left.
    Here, ? affects only the character 'd'; it says match 'd'
    once or not at all.
    So, third1000 or thir1000 will match.

    -----
    This third\?1000
    is really this:

    third?1000

    where ? is now literally a ?, not a metacharacter.
    It will only match third?1000.

    ------
    This thi(rd)?1000
    is really this:

    thi rd  1000
       (  )?

    where the quantifier ? has the same meaning but affects
    the group of characters enclosed by the parentheses ( ).
    In this case the parentheses are grouping metachars.
    In this case the parenthesis are grouping metachars.

    ------
    After you get the hang of it, you can structure a
    regular expression into a pseudo dimensioned object so
    that the quantifiers and other metachars are distinguishable
    from the literal text.

    / # regex delimiter

    ^ # metachar, beginning of string
    ( # metachar, start of grouping 1
    third # literal text 'third'
    ){2} # metachar, end of grouping 1, {range}, match group 1 exactly 2 times

    ( # metachar, start of grouping 2
    1000 # literal text '1000'
    ){3} # metachar, end of grouping 2, match group 2 exactly 3 times
    $ # metachar, end of string

    /x # regex delimiter and x modifier (ignore literal whitespace in expression)

    This will only match 'thirdthird100010001000'.
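
    A runnable sketch of the same structured pattern, stored with qr//
    so it can be tried against a few strings. The variable name $re and
    the helper name is_full_match() are invented for the example:

```perl
use strict;
use warnings;

# The same pattern, written with the x modifier so that whitespace and
# comments inside it are ignored.
my $re = qr/
    ^            # beginning of string
    (third){2}   # group 1: literal 'third', exactly 2 times
    (1000){3}    # group 2: literal '1000', exactly 3 times
    $            # end of string
/x;

# Hypothetical helper: 1 if the whole string fits the pattern.
sub is_full_match {
    my ($text) = @_;
    return $text =~ $re ? 1 : 0;
}

for my $text ('thirdthird100010001000', 'third1000', 'thirdthird1000') {
    printf "%-24s %s\n", $text, is_full_match($text) ? 'match' : 'no match';
}
```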

    When you look at it this way, regular expressions are not confusing at all.
    Some herky jerky's jam it all together in a single line to feel superior, just
    ignore them.

    The first thing you do when trying to decipher one is to convert it into a structure
    like the one above. When it's broken down like this, it's easier.

    good luck.
    -sln
     
    , Mar 2, 2011
    #16
