Multiple matching with ()*

Discussion in 'Ruby' started by Alessandro Re, Jul 31, 2007.

  1. Hi there!
    I'm Alessandro from Italy and I started using ruby some days ago,
    so... Hello, Community! :)

    Well, I was trying to match a pattern multiple times. I tried both
    with normal match() and scan(), but i can't get the desired result.

    The subject string is something like:
    "1a2bend" or "beg1a2b3c4dend"
    more generally, it should match /^beg(\d\w)*end$/ : always a beginning and
    an ending pattern, with an unspecified number of central patterns in between.
    The problem is that the central pattern must be extracted every
    time it's encountered.
    For example, trying with
    "x1A2B3C4Dz".scan /^(x)(\d\w)*(z)$/
    returns
    [["x", "4D", "z"]]
    while i need something like
    [["x", "1A", "2B", "3C", "4D", "z"]]

    Why does ()* match just the last one? How can i get all the ()* that it matches?

    Probably i'm doing something wrong, but can't understand where :\

    Thanks!
    --
    ~Ale
    Alessandro Re, Jul 31, 2007
    #1
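
A note on why this happens: a capture group owns a single slot in the match data, so when a quantifier repeats the group, every iteration overwrites the previous one and only the last survives. The following is a minimal sketch of that behaviour; `captures_for` is a made-up helper name, not something from the thread.

```ruby
# Hypothetical helper: return the captures of a single match, or nil.
def captures_for(str, regex)
  m = regex.match(str)
  m && m.captures
end

# The repeated group (\d\w)* has one capture slot, so only the
# last iteration ("4D") is kept.
p captures_for("x1A2B3C4Dz", /^(x)(\d\w)*(z)$/)      # ["x", "4D", "z"]

# Moving the quantifier inside the group captures the whole run instead.
p captures_for("x1A2B3C4Dz", /^(x)((?:\d\w)*)(z)$/)  # ["x", "1A2B3C4D", "z"]
```

Getting each repetition separately therefore seems to need a second step over the captured run, which is what the replies below do.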

  3. On 7/31/07, Alessandro Re <> wrote:
    > Hi there!
    > I'm Alessandro from Italy and I started using ruby some days ago,
    > so... Hello, Community! :)
    >
    > Well, I was trying to match a pattern multiple times. I tried both
    > with normal match() and scan(), but i can't get the desired result.
    >
    > The subject string is something like:
    > "1a2bend" or "beg1a2b3c4dend"
    > more generally, it should match /^beg(\d\w)*end$/ : always a beginning and
    > an ending pattern, with an unspecified number of central patterns in between.
    > The problem is that the central pattern must be extracted every
    > time it's encountered.
    > For example, trying with
    > "x1A2B3C4Dz".scan /^(x)(\d\w)*(z)$/
    > returns
    > [["x", "4D", "z"]]
    > while i need something like
    > [["x", "1A", "2B", "3C", "4D", "z"]]
    >
    > Why does ()* match just the last one? How can i get all the ()* that it matches?
    >
    > Probably i'm doing something wrong, but can't understand where :\


    Try:

    if "x1A2B3C4Dz" =~ /^(x)((?:\d\w)*)(z)$/
      a, b = $1, $3 # save $1 and $3 before the nested scan overwrites them
      return [a] + $2.scan(/\d\w/).flatten + [b]
    end

    I don't know if it's possible to do it in one run though, maybe you
    could use split as well...
    Take care when doing nested searches as they will overwrite $1..9
    (that's why I used a and b)

    J.
    Jano Svitok, Jul 31, 2007
    #3
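
The same two-step idea can be sketched without the `$1..$9` globals by holding on to the MatchData object, which keeps its captures even after a nested search. This is only an illustration under the thread's example assumptions (literal `x` and `z` delimiters); `split_parts` is a hypothetical name.

```ruby
# Hypothetical helper: split "x<digit+word pairs>z" into its parts,
# or return nil when the whole string does not fit the pattern.
def split_parts(str)
  m = /^(x)((?:\d\w)*)(z)$/.match(str)
  return nil unless m
  # m keeps its own captures, so the nested scan cannot clobber them.
  [m[1]] + m[2].scan(/\d\w/) + [m[3]]
end

p split_parts("x1A2B3C4Dz")  # ["x", "1A", "2B", "3C", "4D", "z"]
p split_parts("lol1a2vasd")  # nil
```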
  4. On 7/31/07, Alessandro Re <> wrote:
    > For example, trying with
    > "x1A2B3C4Dz".scan /^(x)(\d\w)*(z)$/
    > returns
    > [["x", "4D", "z"]]
    > while i need something like
    > [["x", "1A", "2B", "3C", "4D", "z"]]
    >

    Hi,

    Try this.

    str = "x1A2B3C4Dz"
    p str.scan(/\d?\w/) #>["x", "1A", "2B", "3C", "4D", "z"]

    Harry

    --
    A Look into Japanese Ruby List in English
    http://www.kakueki.com/
    Harry Kakueki, Jul 31, 2007
    #4
  5. Thanks, but i need to match the pattern OR not match anything.
    "lol1a2vasd".scan(/\d?\w/) => ["l", "o", "l", "1a", "2v", "a", "s", "d"]
    while i need to be sure that the pattern begins with a regex "x" and
    ends with "z"

    (of course, x 1 a 2 b 3 c should be regexes not just chars)

    thanks, your help is appreciated :)

    --
    ~Ale
    Alessandro Re, Jul 31, 2007
    #5
  6. Mh well, to me this seems like normal regex processing (i mean, it *should*
    require only one instruction, since this pattern can be read with just
    one regex, even if ruby doesn't allow it... but that would be really
    bad).
    Anyway, by splitting it up there are different ways to do it - thanks
    for your suggestion.
    But if ruby makes it possible with one call, i'd prefer to use that.

    Thanks!

    --
    ~Ale
    Alessandro Re, Jul 31, 2007
    #6
  7. On 7/31/07, Alessandro Re <> wrote:
    > Thanks, but i need to match the pattern OR not match anything.
    > "lol1a2vasd".scan(/\d?\w/) => ["l", "o", "l", "1a", "2v", "a", "s", "d"]
    > while i need to be sure that the pattern begins with a regex "x" and
    > ends with "z"


    str = "lol1a2vasd"
    p str.scan(/\d\w|\w{3}/)

    Harry

    --
    A Look into Japanese Ruby List in English
    http://www.kakueki.com/
    Harry Kakueki, Jul 31, 2007
    #7
  8. 2007/7/31, Alessandro Re <>:
    > Mh well, to me this seems like normal regex processing (i mean, it *should*
    > require only one instruction, since this pattern can be read with just
    > one regex, even if ruby doesn't allow it... but that would be really
    > bad).
    > Anyway, by splitting it up there are different ways to do it - thanks
    > for your suggestion.
    > But if ruby makes it possible with one call, i'd prefer to use that.


    irb(main):006:0> s="x1A2B3C4Dz"
    => "x1A2B3C4Dz"
    irb(main):007:0> s.scan /x(\d\w)*z/
    => [["4D"]]
    irb(main):008:0> s.scan /x((?:\d\w)*?)z/
    => [["1A2B3C4D"]]
    irb(main):009:0> s.scan(/x((?:\d\w)*?)z/).map {|a| a[0].scan(/\d\w/)}
    => [["1A", "2B", "3C", "4D"]]

    Kind regards

    robert
    Robert Klemme, Jul 31, 2007
    #8
  9. Thanks, this is an interesting solution!

    --
    ~Ale
    Alessandro Re, Jul 31, 2007
    #9
  10. On 7/31/07, Alessandro Re <> wrote:
    > Mh well, to me this seems like normal regex processing (i mean, it *should*
    > require only one instruction, since this pattern can be read with just
    > one regex, even if ruby doesn't allow it... but that would be really bad).


    seems like you have a pattern within a pattern.
    it may be easy to unwrap the outer pattern first, then work on the inner
    pattern. something like,

    irb(main):096:0> "lol1a2vasd".scan(/lol(.+)asd/).to_s.scan(/\d\w/)
    => ["1a", "2v"]
    irb(main):097:0> "beg1a2vend".scan(/beg(.+)end/).to_s.scan(/\d\w/)
    => ["1a", "2v"]
    irb(main):098:0> "beg1a2vendxbeg3c4dend".scan(/beg(.+)end/).to_s.scan(/\d\w/)
    => ["1a", "2v", "3c", "4d"]

    is that ok?
    kind regards -botp
    botp, Jul 31, 2007
    #10
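
One caveat when reusing the `.to_s` trick above: in Ruby 1.8 `Array#to_s` joined the elements, but from 1.9 on it is the same as `inspect`, so the intermediate string then contains brackets and quotes. The inner scan happens to ignore those, but an explicit `flatten.join` avoids depending on it. A small sketch, with `inner_pairs` as a hypothetical helper name:

```ruby
# Hypothetical helper: unwrap the outer beg...end pattern first,
# then pull the digit+word pairs out of whatever was inside.
def inner_pairs(str)
  # Explicit join, so the result does not depend on Array#to_s.
  inner = str.scan(/beg(.+)end/).flatten.join
  inner.scan(/\d\w/)
end

p inner_pairs("beg1a2vend")             # ["1a", "2v"]
p inner_pairs("beg1a2vendxbeg3c4dend")  # ["1a", "2v", "3c", "4d"]
p inner_pairs("lol1a2vasd")             # []
```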
  11. Alessandro Re wrote:
    > For example, trying with
    > "x1A2B3C4Dz".scan /^(x)(\d\w)*(z)$/
    > returns
    > [["x", "4D", "z"]]
    > while i need something like
    > [["x", "1A", "2B", "3C", "4D", "z"]]


    Does this go more in the direction you wanted:

    irb(main):001:0> "x1A2B3C4Dz".scan(/(?:^(?:x)|\G)(\d\w)(?=(?:\d\w)*(?:z)$)/)
    => [["1A"], ["2B"], ["3C"], ["4D"]]

    ???

    Wolfgang Nádasi-Donner
    --
    Posted via http://www.ruby-forum.com/.
    Wolfgang Nádasi-donner, Jul 31, 2007
    #11
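
For readers who have not met `\G` before: inside `scan` it anchors each match to the position where the previous one ended, so the matches must form one unbroken chain, and the lookahead checks that only further pairs and the closing `z` remain. A rough sketch of how the pattern above behaves on good and bad input; `chained_pairs` is a hypothetical wrapper name.

```ruby
# Hypothetical wrapper around the \G-anchored pattern from the post above.
def chained_pairs(str)
  # ^(?:x) starts the chain at the leading x; \G continues it exactly where
  # the previous match ended. The lookahead demands that only further pairs
  # and the closing z remain before the end of the line.
  str.scan(/(?:^(?:x)|\G)(\d\w)(?=(?:\d\w)*(?:z)$)/).flatten
end

p chained_pairs("x1A2B3C4Dz")    # ["1A", "2B", "3C", "4D"]
p chained_pairs("x1A2B3C4Dz1a")  # []
p chained_pairs("x1A--2Bz")      # []
```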
  12. On 7/31/07, Alessandro Re <> wrote:
    > while i need to be sure that the pattern begins with a regex "x" and
    > ends with "z"
    >
    > (of course, x 1 a 2 b 3 c should be regexes not just chars)
    >

    Sorry, I misunderstood what you wanted.
    Is this more like it?

    str = "lol1a2vasd"
    m = /^(\w{3})(.*)(\w{3})$/.match(str).captures
    m[1] = m[1].scan(/\d\w/)
    p m.flatten #> ["lol","1a","2v","asd"]

    Harry

    --
    A Look into Japanese Ruby List in English
    http://www.kakueki.com/
    Harry Kakueki, Aug 1, 2007
    #12
  13. On 31.07.2007 17:18, Alessandro Re wrote:
    > Thanks, this is an interesting solution!
    >
    > On 7/31/07, Robert Klemme <> wrote:
    >> 2007/7/31, Alessandro Re <>:
    >>> Mh well, to me this seems like normal regex processing (i mean, it *should*
    >>> require only one instruction, since this pattern can be read with just
    >>> one regex, even if ruby doesn't allow it... but that would be really
    >>> bad).
    >>> Anyway, by splitting it up there are different ways to do it - thanks
    >>> for your suggestion.
    >>> But if ruby makes it possible with one call, i'd prefer to use that.

    >> irb(main):006:0> s="x1A2B3C4Dz"
    >> => "x1A2B3C4Dz"
    >> irb(main):007:0> s.scan /x(\d\w)*z/
    >> => [["4D"]]
    >> irb(main):008:0> s.scan /x((?:\d\w)*?)z/
    >> => [["1A2B3C4D"]]
    >> irb(main):009:0> s.scan(/x((?:\d\w)*?)z/).map {|a| a[0].scan(/\d\w/)}
    >> => [["1A", "2B", "3C", "4D"]]


    Pay special attention to my use of the reluctant quantifier, which is
    mandatory if your input contains multiple begin/end pairs.

    Kind regards

    robert


    PS: please do not top post.
    Robert Klemme, Aug 1, 2007
    #13
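
The effect of the reluctant quantifier is easiest to see with a looser inner pattern such as `.+`, as used in the `beg...end` examples earlier in the thread. The sketch below compares the two forms; the helper names are made up for illustration.

```ruby
# Hypothetical helpers comparing greedy and reluctant matching of the
# section body; "beg"/"end" are the delimiters used earlier in the thread.
def greedy_sections(str)
  str.scan(/beg(.+)end/).flatten
end

def reluctant_sections(str)
  str.scan(/beg(.+?)end/).flatten
end

s = "beg1a2vendxbeg3c4dend"

# Greedy: .+ runs to the last "end", so both sections merge into one capture.
p greedy_sections(s)     # ["1a2vendxbeg3c4d"]

# Reluctant: .+? stops at the first "end", giving one capture per section.
p reluctant_sections(s)  # ["1a2v", "3c4d"]

# Second level: split each section into its digit+word pairs.
p reluctant_sections(s).map { |body| body.scan(/\d\w/) }
# [["1a", "2v"], ["3c", "4d"]]
```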
  14. On 8/1/07, Harry Kakueki <> wrote:
    > On 7/31/07, Alessandro Re <> wrote:
    > > while i need to be sure that the pattern begins with a regex "x" and
    > > ends with "z"
    > >
    > > (of course, x 1 a 2 b 3 c should be regexes not just chars)
    > >

    > Sorry, I misunderstood what you wanted.
    > Is this more like it?
    >
    > str = "lol1a2vasd"
    > m = /^(\w{3})(.*)(\w{3})$/.match(str).captures
    > m[1] = m[1].scan(/\d\w/)
    > p m.flatten #> ["lol","1a","2v","asd"]
    >
    > Harry
    >
    > --
    > A Look into Japanese Ruby List in English
    > http://www.kakueki.com/
    >
    >


    Yep, it's like this.
    I solved it using 2 instructions as you did: first matching the outer words,
    then the middle ones, but i still think that one regex would have been
    nicer :)

    Thanks guys

    --
    ~Ale
    Alessandro Re, Aug 2, 2007
    #14
  15. Alessandro Re wrote:
    > ...but i still think that one regex would have been nicer :)


    I don't think that this will be "nice"...

    irb(main):001:0> "x1A2B3C4Dz".scan(/(?:\G|^(?:x))(x|\d\w|z)(?=(?:\d\w)*(?:z|)$)/)
    => [["x"], ["1A"], ["2B"], ["3C"], ["4D"], ["z"]]

    ..., and I didn't test it against wrong lines, but after a "flatten" it
    ends up with the required result.

    Wolfgang Nádasi-Donner

    --
    Posted via http://www.ruby-forum.com/.
    Wolfgang Nádasi-donner, Aug 2, 2007
    #15
  16. On 8/2/07, Wolfgang Nádasi-donner <> wrote:
    > irb(main):001:0>
    > "x1A2B3C4Dz".scan(/(?:\G|^(?:x))(x|\d\w|z)(?=(?:\d\w)*(?:z|)$)/)
    > => [["x"], ["1A"], ["2B"], ["3C"], ["4D"], ["z"]]

    Wonderful :)
    Thanks!

    --
    ~Ale
    Alessandro Re, Aug 4, 2007
    #16
  17. 2007/8/4, Alessandro Re <>:
    > On 8/2/07, Wolfgang Nádasi-donner <> wrote:
    > > irb(main):001:0>
    > > "x1A2B3C4Dz".scan(/(?:\G|^(?:x))(x|\d\w|z)(?=(?:\d\w)*(?:z|)$)/)
    > > => [["x"], ["1A"], ["2B"], ["3C"], ["4D"], ["z"]]

    >
    > Wonderful :)
    > Thanks!


    But this does not seem to work with strings that contain multiple sections:

    irb(main):002:0> "x1A2B3C4Dz1a".scan(/(?:\G|^(?:x))(x|\d\w|z)(?=(?:\d\w)*(?:z|)$)/)
    => []

    So it's not suited to a one-RX approach and you still need two levels of
    RX. If that's the case then we have seen simpler solutions for that.
    (Btw, one reason why it's so awkward is that there is no lookbehind in
    Ruby 1.8 - but this will change.)

    Kind regards

    robert
    Robert Klemme, Aug 6, 2007
    #17
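
To make the two-level idea concrete for input with several sections, one possible sketch is below. It assumes literal `x` and `z` delimiters as in the examples above and drops the `^`/`$` anchors so that more than one section can be found; `all_sections` is a hypothetical helper name, not code from the thread.

```ruby
# Hypothetical helper: first level finds each x...z section,
# second level splits the section body into digit+word pairs.
def all_sections(str)
  str.scan(/(x)((?:\d\w)*)(z)/).map do |head, body, tail|
    [head, *body.scan(/\d\w/), tail]
  end
end

p all_sections("x1A2B3C4Dz")    # [["x", "1A", "2B", "3C", "4D", "z"]]
p all_sections("x1A2Bz--x3Cz")  # [["x", "1A", "2B", "z"], ["x", "3C", "z"]]
p all_sections("lol1a2vasd")    # []
```

If the whole string has to match or nothing at all, the anchors would need to go back in, at the cost of handling only one section per line.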
  18. Robert Klemme wrote:
    > (Btw, one reason why it's so awkward is that there is no lookbehind in
    > Ruby 1.8 - but this will change.)


    I am waiting for this Christmas gift too...

    Wolfgang Nádasi-Donner
    --
    Posted via http://www.ruby-forum.com/.
    Wolfgang Nádasi-donner, Aug 6, 2007
    #18