Regular Expressions -- Backtracking?

Discussion in 'Java' started by Vincent, Oct 2, 2010.

  1. Vincent

    Vincent Guest

    I'm not sure if there is an easy way to do this short of resetting the
    matcher index, but I thought I would ask. Is it possible to construct
    a regular expression so that it will return all consecutive, 2-letter
    pairs? So, for example, if I had "hello world", successive calls to
    matcher.find() would return:

    he
    el
    ll
    lo

    wo
    or
    rl
    ld

    Thanks for any and all help.

    Vincent
    Vincent, Oct 2, 2010
    #1
    1. Advertising

  2. Vincent wrote:
    > I'm not sure if there is an easy way to do this short of resetting the
    > matcher index, but I thought I would ask. Is it possible to construct
    > a regular expression so that it will return all consecutive, 2-letter
    > pairs? So, for example, if I had "hello world", successive calls to
    > matcher.find() would return:
    >
    > he
    > el
    > ll
    > lo
    >
    > wo
    > or
    > rl
    > ld
    >
    > Thanks for any and all help.
    >
    > Vincent


    Matcher matcher = Pattern.compile("(?=(..))").matcher(args[0]);
    while (matcher.find()) {
    System.out.println(matcher.group(1));
    }

    AHS
    --
    The difference between a moral man and a man of honor is that the
    latter regrets a discreditable act, even when it has worked and he has
    not been caught. -- H.L. Mencken
    Arved Sandstrom, Oct 2, 2010
    #2
    1. Advertising

  3. Vincent

    Eric Sosman Guest

    On 10/2/2010 9:14 AM, Arved Sandstrom wrote:
    > Vincent wrote:
    >> I'm not sure if there is an easy way to do this short of resetting the
    >> matcher index, but I thought I would ask. Is it possible to construct
    >> a regular expression so that it will return all consecutive, 2-letter
    >> pairs? So, for example, if I had "hello world", successive calls to
    >> matcher.find() would return:
    >>
    >> he
    >> el
    >> ll
    >> lo
    >>
    >> wo
    >> or
    >> rl
    >> ld
    >>
    >> Thanks for any and all help.
    >>
    >> Vincent

    >
    > Matcher matcher = Pattern.compile("(?=(..))").matcher(args[0]);
    > while (matcher.find()) {
    > System.out.println(matcher.group(1));
    > }


    Matches non-letters, too. "(?=(\\w\\w))" comes closer, but
    still doesn't get the blank (empty?) line between "lo" and "wo"
    that the O.P. wants.

    Personally, I think Peter Duniho is on the right track: This
    is a job for a simple-minded loop, not for ponderous regexes.

    --
    Eric Sosman
    lid
    Eric Sosman, Oct 2, 2010
    #3
  4. Eric Sosman wrote:
    > On 10/2/2010 9:14 AM, Arved Sandstrom wrote:
    >> Vincent wrote:
    >>> I'm not sure if there is an easy way to do this short of resetting
    >>> the matcher index, but I thought I would ask. Is it possible to
    >>> construct a regular expression so that it will return all
    >>> consecutive, 2-letter pairs? So, for example, if I had "hello
    >>> world", successive calls to matcher.find() would return:
    >>>
    >>> he
    >>> el
    >>> ll
    >>> lo
    >>>
    >>> wo
    >>> or
    >>> rl
    >>> ld
    >>>
    >>> Thanks for any and all help.
    >>>
    >>> Vincent

    >>
    >> Matcher matcher = Pattern.compile("(?=(..))").matcher(args[0]);
    >> while (matcher.find()) {
    >> System.out.println(matcher.group(1));
    >> }

    >
    > Matches non-letters, too. "(?=(\\w\\w))" comes closer, but
    > still doesn't get the blank (empty?) line between "lo" and "wo"
    > that the O.P. wants.


    Ahhh, I never twigged to that stipulation...even given his example. Still,
    for a single "word", where you're not worried about WS, the provided regex
    does the trick.

    > Personally, I think Peter Duniho is on the right track: This
    > is a job for a simple-minded loop, not for ponderous regexes.


    Depends on the spirit in which the question was asked - does the OP really
    want to use a regex to solve that problem, or is he just interested in
    regular expressions? The RE I provided isn't what I'd call ponderous
    (leaving aside the WS issue which I wouldn't solve with regular expressions
    anyway; I'd just split into words first, then process each word), but the
    real reason for preferring a "simple-minded loop" in production code, to
    solve this problem, is that it would take less time to *write and test* the
    loop than it would take to explain the regular expression to 99% of coders.

    I was mainly motivated to throw this out there because of Peter's
    speculation. Given that I didn't account for the word boundaries (yet :)),
    an RE *can* be constructed that does produce overlapping consecutive
    pairs...albeit without any matching at all.

    AHS
    --
    The difference between a moral man and a man of honor is that the
    latter regrets a discreditable act, even when it has worked and he has
    not been caught. -- H.L. Mencken
    Arved Sandstrom, Oct 2, 2010
    #4
  5. Vincent

    Vincent Guest

    On Oct 2, 10:13 am, "Arved Sandstrom" <> wrote:
    > Eric Sosman wrote:
    > > On 10/2/2010 9:14 AM, Arved Sandstrom wrote:
    > >> Vincent wrote:
    > >>> I'm not sure if there is an easy way to do this short of resetting
    > >>> the matcher index, but I thought I would ask.  Is it possible to
    > >>> construct a regular expression so that it will return all
    > >>> consecutive, 2-letter pairs?  So, for example, if I had "hello
    > >>> world", successive calls to matcher.find() would return:

    >
    > >>> he
    > >>> el
    > >>> ll
    > >>> lo

    >
    > >>> wo
    > >>> or
    > >>> rl
    > >>> ld

    >
    > >>> Thanks for any and all help.

    >
    > >>> Vincent

    >
    > >> Matcher matcher = Pattern.compile("(?=(..))").matcher(args[0]);
    > >> while (matcher.find()) {
    > >>      System.out.println(matcher.group(1));
    > >> }

    >
    > >     Matches non-letters, too.  "(?=(\\w\\w))" comes closer, but
    > > still doesn't get the blank (empty?) line between "lo" and "wo"
    > > that the O.P. wants.

    >
    > Ahhh, I never twigged to that stipulation...even given his example. Still,
    > for a single "word", where you're not worried about WS, the provided regex
    > does the trick.
    >
    > >     Personally, I think Peter Duniho is on the right track: This
    > > is a job for a simple-minded loop, not for ponderous regexes.

    >
    > Depends on the spirit in which the question was asked - does the OP really
    > want to use a regex to solve that problem, or is he just interested in
    > regular expressions? The RE I provided isn't what I'd call ponderous
    > (leaving aside the WS issue which I wouldn't solve with regular expressions
    > anyway; I'd just split into words first, then process each word), but the
    > real reason for preferring a "simple-minded loop" in production code, to
    > solve this problem, is that it would take less time to *write and test* the
    > loop than it would take to explain the regular expression to 99% of coders.
    >
    > I was mainly motivated to throw this out there because of Peter's
    > speculation. Given that I didn't account for the word boundaries (yet :)),
    > an RE *can* be constructed that does produce overlapping consecutive
    > pairs...albeit without any matching at all.
    >
    > AHS
    > --
    > The difference between a moral man and a man of honor is that the
    > latter regrets a discreditable act, even when it has worked and he has
    > not been caught. -- H.L. Mencken


    Well, thank you to everyone who replied. Yes, you are right that a
    loop would solve this problem, but, for whatever reason, it didn't
    seem like an elegant approach when Java provides the ability to use
    regular expressions.

    Also, I didn't meant to include the WS in my example. I am strictly
    looking for adjacent two character pairs. After looking at the
    suggestions, I came up with the following expression that seems to be
    working properly:

    (?=([a-zA-Z]{2}))

    Again, thank you for your assistance.

    Vincent
    Vincent, Oct 2, 2010
    #5
  6. Vincent

    Lew Guest

    Vincent wrote:
    >> AHS
    >> --
    >> The difference between a moral man and a man of honor is that the
    >> latter regrets a discreditable act, even when it has worked and he has
    >> not been caught. -- H.L. Mencken


    Don't quote sigs.

    > Well, thank you to everyone who replied. Yes, you are right that a
    > loop would solve this problem, but, for whatever reason, it didn't
    > seem like an elegant approach when Java provides the ability to use
    > regular expressions.


    I am having a very hard time wrapping my mind around associating "elegant" and
    "regular expressions".

    --
    Lew
    Lew, Oct 2, 2010
    #6
  7. Vincent

    Eric Sosman Guest

    On 10/2/2010 10:47 AM, Vincent wrote:
    > [...]
    > Well, thank you to everyone who replied. Yes, you are right that a
    > loop would solve this problem, but, for whatever reason, it didn't
    > seem like an elegant approach when Java provides the ability to use
    > regular expressions.


    Java also provides the ability to establish an encrypted network
    connection to a String-splitting server, get back all pairs of
    consecutive characters, send them in turn over the network to a
    letters-only filtering server, get the filtered pairs back, load
    them all into a relational database, and retrieve them with SQL.

    ... which isn't elegance, just needless complexity. Canaricide
    by cannon. Like regexes in this case, IMHO -- but what the heck,
    it's your code and you're the boss; do as you please.

    --
    Eric Sosman
    lid
    Eric Sosman, Oct 2, 2010
    #7
  8. Vincent

    Lew Guest

    Eric Sosman wrote:
    > ... Canaricide by cannon.


    Less elegant than by coal mine.

    --
    Lew
    Lew, Oct 2, 2010
    #8
  9. "Lew" <> wrote in message
    news:i88619$2t8$...
    > Eric Sosman wrote:
    >> ... Canaricide by cannon.

    >
    > Less elegant than by coal mine.


    Or by curiosity (first you need to lure it into the catbird seat.)
    Mike Schilling, Oct 2, 2010
    #9
  10. Eric Sosman wrote:
    > On 10/2/2010 10:47 AM, Vincent wrote:
    >> [...]
    >> Well, thank you to everyone who replied. Yes, you are right that a
    >> loop would solve this problem, but, for whatever reason, it didn't
    >> seem like an elegant approach when Java provides the ability to use
    >> regular expressions.

    >
    > Java also provides the ability to establish an encrypted network
    > connection to a String-splitting server, get back all pairs of
    > consecutive characters, send them in turn over the network to a
    > letters-only filtering server, get the filtered pairs back, load
    > them all into a relational database, and retrieve them with SQL.


    Please God, don't crack wise like that around some of the managers and
    architects I've worked with. :) Phrased just a bit differently this is
    exactly the kind of system they'd go for.
    [ SNIP ]

    AHS
    --
    The difference between a moral man and a man of honor is that the
    latter regrets a discreditable act, even when it has worked and he has
    not been caught. -- H.L. Mencken
    Arved Sandstrom, Oct 2, 2010
    #10
  11. On 02.10.2010 19:37, Lew wrote:
    > Vincent wrote:


    >> Well, thank you to everyone who replied. Yes, you are right that a
    >> loop would solve this problem, but, for whatever reason, it didn't
    >> seem like an elegant approach when Java provides the ability to use
    >> regular expressions.

    >
    > I am having a very hard time wrapping my mind around associating
    > "elegant" and "regular expressions".


    :) If they are first class data types in a language like in many
    dynamic languages they can make for awful elegant solutions.

    Just a silly Ruby example:

    irb(main):011:0> "hello world".scan(/\w(?=\w)/) {|m| p(m+$'[0])}
    "he"
    "el"
    "ll"
    "lo"
    "wo"
    "or"
    "rl"
    "ld"
    => "hello world"

    Well, "elegance" of course lies in the eye of the beholder and I would
    not claim that this is the most elegant code you can get with regexps... :)

    Cheers

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Oct 2, 2010
    #11
  12. "Arved Sandstrom" <> wrote in message
    news:H1Npo.13943$...
    > Eric Sosman wrote:
    >> On 10/2/2010 10:47 AM, Vincent wrote:
    >>> [...]
    >>> Well, thank you to everyone who replied. Yes, you are right that a
    >>> loop would solve this problem, but, for whatever reason, it didn't
    >>> seem like an elegant approach when Java provides the ability to use
    >>> regular expressions.

    >>
    >> Java also provides the ability to establish an encrypted network
    >> connection to a String-splitting server, get back all pairs of
    >> consecutive characters, send them in turn over the network to a
    >> letters-only filtering server, get the filtered pairs back, load
    >> them all into a relational database, and retrieve them with SQL.

    >
    > Please God, don't crack wise like that around some of the managers and
    > architects I've worked with. :) Phrased just a bit differently this is
    > exactly the kind of system they'd go for.
    > [ SNIP ]


    And you could pull in some developers I've worked with by putting
    string-splitting and letters-only-filtering in different classes (to reduce
    coupling) and wiring the two together with a Spring XML configuration
    Mike Schilling, Oct 2, 2010
    #12
  13. Lew wrote:
    > Eric Sosman wrote:
    >>>> Java also provides the ability to establish an encrypted network
    >>>> connection to a String-splitting server, get back all pairs of
    >>>> consecutive characters, send them in turn over the network to a
    >>>> letters-only filtering server, get the filtered pairs back, load
    >>>> them all into a relational database, and retrieve them with SQL.

    >
    > Arved Sandstrom wrote
    >>> Please God, don't crack wise like that around some of the managers
    >>> and architects I've worked with. :) Phrased just a bit differently
    >>> this is exactly the kind of system they'd go for.
    >>> [ SNIP ]

    >
    > Mike Schilling wrote:
    >> And you could pull in some developers I've worked with by putting
    >> string-splitting and letters-only-filtering in different classes (to
    >> reduce coupling) and wiring the two together with a Spring XML
    >> configuration

    >
    > Half of me wants to burn you all at the stake, half of me wants
    > shower you with awards, and half of me wants to implement that system
    > and foist it onto one of those managers or architects.
    >
    > It'll be RESTful web services, of course, based on that Ruby code
    > that Robert Klemme posted a while back.


    And it goes without saying that it should really be NoSQL in the cloud.

    For the benefit of the marketing types I would simply describe all this as a
    "best-of-breed content enrichment service".

    AHS
    --
    The difference between a moral man and a man of honor is that the
    latter regrets a discreditable act, even when it has worked and he has
    not been caught. -- H.L. Mencken
    Arved Sandstrom, Oct 3, 2010
    #13
  14. On 02-10-2010 13:37, Lew wrote:
    > I am having a very hard time wrapping my mind around associating
    > "elegant" and "regular expressions".


    In some cases regex allows for short, easily readable (for those
    that know regex) and easily modifiable code.

    Like XPath for getting info out of XML.

    Arne
    Arne Vajhøj, Oct 15, 2010
    #14
  15. Vincent

    Arne Vajhøj Guest

    On 02-10-2010 17:06, Robert Klemme wrote:
    > On 02.10.2010 19:37, Lew wrote:
    >> Vincent wrote:

    >
    >>> Well, thank you to everyone who replied. Yes, you are right that a
    >>> loop would solve this problem, but, for whatever reason, it didn't
    >>> seem like an elegant approach when Java provides the ability to use
    >>> regular expressions.

    >>
    >> I am having a very hard time wrapping my mind around associating
    >> "elegant" and "regular expressions".

    >
    > :) If they are first class data types in a language like in many
    > dynamic languages they can make for awful elegant solutions.
    >
    > Just a silly Ruby example:
    >
    > irb(main):011:0> "hello world".scan(/\w(?=\w)/) {|m| p(m+$'[0])}
    > "he"
    > "el"
    > "ll"
    > "lo"
    > "wo"
    > "or"
    > "rl"
    > "ld"
    > => "hello world"
    >
    > Well, "elegance" of course lies in the eye of the beholder and I would
    > not claim that this is the most elegant code you can get with regexps...
    > :)


    I would consider it pretty bad, because it is very difficult
    to read for those not familiar with Ruby.

    In contrast with Arveds Java code, which should be understandable
    by anyone:
    - knowing regex
    - knowing any mainstream programming language

    Arne
    Arne Vajhøj, Oct 15, 2010
    #15
  16. On 15.10.2010 23:54, Arne Vajhøj wrote:
    > On 02-10-2010 17:06, Robert Klemme wrote:
    >> On 02.10.2010 19:37, Lew wrote:
    >>> Vincent wrote:

    >>
    >>>> Well, thank you to everyone who replied. Yes, you are right that a
    >>>> loop would solve this problem, but, for whatever reason, it didn't
    >>>> seem like an elegant approach when Java provides the ability to use
    >>>> regular expressions.
    >>>
    >>> I am having a very hard time wrapping my mind around associating
    >>> "elegant" and "regular expressions".

    >>
    >> :) If they are first class data types in a language like in many
    >> dynamic languages they can make for awful elegant solutions.
    >>
    >> Just a silly Ruby example:
    >>
    >> irb(main):011:0> "hello world".scan(/\w(?=\w)/) {|m| p(m+$'[0])}
    >> "he"
    >> "el"
    >> "ll"
    >> "lo"
    >> "wo"
    >> "or"
    >> "rl"
    >> "ld"
    >> => "hello world"
    >>
    >> Well, "elegance" of course lies in the eye of the beholder and I would
    >> not claim that this is the most elegant code you can get with regexps...
    >> :)

    >
    > I would consider it pretty bad, because it is very difficult
    > to read for those not familiar with Ruby.


    Hmmm... I don't think this is an important criterion. After all, if
    someone needs to read code then it is usually because he needs to modify
    it. For that knowledge of the programming language at hand is mandatory
    anyway. And I don't think the code is so esoteric for someone with at
    least a working knowledge of Ruby.

    If the code would be hard to read for the general Ruby programmer that
    would qualify as a valid argument IMHO. I'm probably the wrong person
    to judge that since I brought the example up in the first place. :)

    Cheers

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Oct 16, 2010
    #16
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jay Douglas
    Replies:
    0
    Views:
    592
    Jay Douglas
    Aug 15, 2003
  2. Steven Spear

    Backtracking Search Function Trouble

    Steven Spear, Nov 23, 2004, in forum: C++
    Replies:
    2
    Views:
    366
    Smitsky
    Nov 25, 2004
  3. shoo
    Replies:
    3
    Views:
    439
    Howard
    Mar 8, 2005
  4. ccwork
    Replies:
    2
    Views:
    92
    Thens
    Sep 16, 2003
  5. Noman Shapiro
    Replies:
    0
    Views:
    219
    Noman Shapiro
    Jul 17, 2013
Loading...

Share This Page