Do You Understand Regular Expressions?

Discussion in 'Ruby' started by growlatoe@yahoo.co.uk, Jun 20, 2007.

  1. Guest

    Hi all.

    I'm pretty new to Ruby and that sort of thing, and I'm having a few
    problems understanding regular expressions. I'm hoping one of you can
    point me in the right direction.

    I want to replace an entire string with another string. I know you
    don't need regular expressions for that, but it's part of a more
    generic approach. Anyway, the problem I'm having is that my regular
    expressions are finding two matches instead of one, and I don't
    understand why. I've narrowed down my confusion to the following code,
    which shows some output from irb:

    irb(main):001:0> "hello".scan(/.*/)
    => ["hello", ""]

    I was expecting one match, not two, because .* matches everything,
    right? Can someone explain why an empty string is also matched?

    The same thing can be seen when substituting - this is closer to how
    I'm using regular expressions in my code:

    irb(main):001:0> "hello".gsub(/.*/, "P")
    => "PP"

    Two substitutions are made and I expected one. So am I right or wrong
    to expect one substitution?

    Please help - this is driving me nuts!

    And in case it helps...

    $ ruby --version
    ruby 1.8.5 (2006-08-25) [i486-linux]


    Thanks in advance.
     
    , Jun 20, 2007
    #1

  2. Tim Hunter Guest

    Tim Hunter, Jun 20, 2007
    #2

  3. Axel Etzold Guest

    > irb(main):001:0> "hello".scan(/.*/)
    > => ["hello", ""]
    >
    > I was expecting one match, not two, because .* matches everything,
    > right? Can someone explain why an empty string is also matched?


    String.scan searches for all occurrences of (any number of any
    character) here. So zero occurrences is one match.

    You can search for at least one occurrence like this:

    "hello".scan(/.+/)

    "hello".gsub(/.+/, "P") => 'P'
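A minimal sketch for checking the difference between `*` and `+` in a script. The helper names are made up for illustration and are not from the thread; the expected results are the ones reported in it.

```ruby
# Hypothetical helpers (not from the thread) comparing * with +.
def star_matches(text)
  text.scan(/.*/)   # * also accepts zero characters
end

def plus_matches(text)
  text.scan(/.+/)   # + needs at least one character
end

p star_matches("hello")      # => ["hello", ""]
p plus_matches("hello")      # => ["hello"]
p "hello".gsub(/.*/, "P")    # => "PP"
p "hello".gsub(/.+/, "P")    # => "P"
```

One thing to keep in mind if the goal is to replace a whole string: `.+` finds nothing in an empty string, so `"".gsub(/.+/, "P")` leaves it empty.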

    As an introduction, I find

    http://www.regular-expressions.info/ruby.html

    quite instructive for the use of regexps in Ruby.

    Best regards,

    Axel
     
    Axel Etzold, Jun 20, 2007
    #3
  4. Axel Etzold wrote:
    >> irb(main):001:0> "hello".scan(/.*/)
    >> => ["hello", ""]
    >>
    >> I was expecting one match, not two, because .* matches everything,
    >> right? Can someone explain why an empty string is also matched?

    >
    > String.scan searches for all occurrences of (any number of any
    > character) here. So zero occurrences is one match.


    That doesn't really explain why the regexp finds an extra empty string.
    I know that zero occurrences is one match but after a greedy match that
    matches everything, there should be (logically?) no other match. I am no
    stranger to regexps and the result is counter-intuitive to me; I would
    consider it a bug. Or at least a very very peculiar behavior.

    Daniel
     
    Daniel DeLorme, Jun 21, 2007
    #4
  5. Daniel DeLorme wrote:
    > Axel Etzold wrote:
    >>> irb(main):001:0> "hello".scan(/.*/)
    >>> => ["hello", ""]
    >>>
    >>> I was expecting one match, not two, because .* matches everything,
    >>> right? Can someone explain why an empty string is also matched?

    >>
    >> String.scan searches for all occurrences of (any number of any
    >> character) here. So zero occurrences is one match.

    >
    > That doesn't really explain why the regexp finds an extra empty string.
    > I know that zero occurrences is one match but after a greedy match that
    > matches everything, there should be (logically?) no other match. I am no
    > stranger to regexps and the result is counter-intuitive to me; I would
    > consider it a bug. Or at least a very very peculiar behavior.
    >
    > Daniel


    I agree. Can someone explain why gsub, sub or scan matches with * are
    different than =~ matches with *

    puts "hello".gsub(/[aeiou]/, '<\1>') # h<>ll<>
    puts "hello".gsub(/.*/, '<\1>') # <><>
    print "before: #{$`}\n" # before: hello
    print "match: #{$&}\n" # match:
    print "after: #{$'}\n" # after:

    puts "hello" =~ (/.*/) # 0
    print "before: #{$`}\n" # before:
    print "match: #{$&}\n" # match: hello
    print "after: #{$'}\n" # after:
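One way to see where the difference comes from is to record every match that gsub makes. The sketch below uses the block form of gsub; `gsub_trace` is a hypothetical helper name, not part of Ruby. It suggests that gsub keeps matching after "hello", so the last match it sees is the empty one, which is consistent with the `$&` output above, while `=~` stops at the first match.

```ruby
# Hypothetical helper: collect [pre_match, match, post_match] for every
# match that gsub makes on the given text.
def gsub_trace(text, pattern)
  trace = []
  text.gsub(pattern) do
    m = Regexp.last_match
    trace << [m.pre_match, m[0], m.post_match]
    "P"   # the replacement itself does not matter here
  end
  trace
end

p gsub_trace("hello", /.*/)
# => [["", "hello", ""], ["hello", "", ""]]

# =~ performs a single match, so only "hello" is ever seen.
"hello" =~ /.*/
p [$`, $&, $']
# => ["", "hello", ""]
```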


    thanks!
     
    Ryan Mcdonald, Jun 21, 2007
    #5
  6. Hello Ryan

    In message "Do You Understand Regular Expressions?"
    on 21.06.2007, Ryan Mcdonald <> writes:

    RM> I agree. Can someone explain why gsub, sub or scan matches with * are
    RM> different than =~ matches with *

    RM> puts "hello".gsub(/[aeiou]/, '<\1>') # h<>ll<>

    irb(main):024:0> "hello".gsub( /([aeiou])/, "<\\1>" )

    Please note the () around the expression.
    After that you can refer with \\1 to the found
    letters.


    RM> puts "hello".gsub(/.*/, '<\1>') # <><>

    irb(main):029:0> "hello".gsub(/(.*)/, '<\1>')
    => "<hello><>"
    irb(main):030:0> "hello".gsub(/(.+)/, '<\1>')
    => "<hello>"

    RM> print "before: #{$`}\n" # before: hello

    irb(main):031:0> $`
    => ""

    RM> print "match: #{$&}\n" # match:

    irb(main):032:0> $&
    => "hello"

    RM> print "after: #{$'}\n" # after:

    irb(main):033:0> $'
    => ""


    hope this helps.

    regards.
    Karl-Heinz
     
    Wild Karl-Heinz, Jun 21, 2007
    #6
  7. Stephen Ball Guest

    On 6/20/07, Daniel DeLorme <> wrote:
    > That doesn't really explain why the regexp finds an extra empty string.
    > I know that zero occurrences is one match but after a greedy match that
    > matches everything, there should be (logically?) no other match. I am no
    > stranger to regexps and the result is counter-intuitive to me; I would
    > consider it a bug. Or at least a very very peculiar behavior.
    >
    > Daniel
    >


    It's because the pattern /.*/ matches everything, including the
    absence of everything. Yes, with the proper regexs you can indeed have
    tea and no tea at the same time. Certainly peculiar, but occasionally
    useful.

    So: since * matches "zero or more" characters when it starts the
    search for .* it matches the absence (the 'zero') and then matches the
    string (the 'or more').

    To prevent this you need to indicate to your regular expression that
    you only want the subset of 'everything' that is actually something.
    Here are a couple ways to do this:

    /.+/ will match 1 or more of something, so doesn't return the absence

    /^.*/ will start the search at the start of the pattern, in a way
    bypassing the match of zero (the pattern /^.*$/ makes this more
    clear).

    /..*/ will match everything after something, this is a modified form
    of the above that isn't tied to the start of the string
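The three suggested patterns can be compared side by side. This is only a sketch; `scan_with_suggestions` is a made-up helper name, and the empty-string case is added here to show where the patterns differ.

```ruby
# Hypothetical helper: run each suggested pattern over the same text
# and return the scan results in the same order as the patterns.
def scan_with_suggestions(text)
  [/.+/, /^.*/, /..*/].map { |pattern| text.scan(pattern) }
end

p scan_with_suggestions("hello")   # => [["hello"], ["hello"], ["hello"]]
p scan_with_suggestions("")        # => [[], [""], []]
```

All three give a single match for "hello", but only /^.*/ still reports a match for an empty string.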

    -- Stephen
     
    Stephen Ball, Jun 21, 2007
    #7
  8. On Jun 21, 2007, at 9:47 AM, Stephen Ball wrote:

    > On 6/20/07, Daniel DeLorme <> wrote:
    >> That doesn't really explain why the regexp finds an extra empty
    >> string.
    >> I know that zero occurrences is one match but after a greedy match
    >> that
    >> matches everything, there should be (logically?) no other match. I
    >> am no
    >> stranger to regexps and the result is counter-intuitive to me; I
    >> would
    >> consider it a bug. Or at least a very very peculiar behavior.
    >>
    >> Daniel

    >
    > It's because the pattern /.*/ matches everything, including the
    > absence of everything. Yes, with the proper regexs you can indeed have
    > tea and no tea at the same time. Certainly peculiar, but occasionally
    > useful.
    > ...
    > -- Stephen


    That still doesn't really explain why "hello".scan(/.*/) => ["hello",
    ""]

    Why wouldn't it be ["hello", "", "", "", "", "", "", "", "", "", "",
    "", ... ] since I (or rather the OP) could continue to match zero
    characters (bytes) at the end of the string forever? It does seem
    that it might be that a termination condition is checked a bit later
    than it should be in this case.

    -Rob

    Rob Biedenharn http://agileconsultingllc.com
     
    Rob Biedenharn, Jun 21, 2007
    #8
  9. Guest

    Hi --

    On Thu, 21 Jun 2007, Stephen Ball wrote:

    > On 6/20/07, Daniel DeLorme <> wrote:
    >> That doesn't really explain why the regexp finds an extra empty string.
    >> I know that zero occurrences is one match but after a greedy match that
    >> matches everything, there should be (logically?) no other match. I am no
    >> stranger to regexps and the result is counter-intuitive to me; I would
    >> consider it a bug. Or at least a very very peculiar behavior.
    >>
    >> Daniel
    >>

    >
    > It's because the pattern /.*/ matches everything, including the
    > absence of everything. Yes, with the proper regexs you can indeed have
    > tea and no tea at the same time. Certainly peculiar, but occasionally
    > useful.
    >
    > So: since * matches "zero or more" characters when it starts the
    > search for .* it matches the absence (the 'zero') and then matches the
    > string (the 'or more').


    It's the other way around, though; it matches "hello" *first*, and
    then "". So the zero-matching (which I admit I'm among those who find
    unexpected) is happening at the end.

    > To prevent this you need to indicate to your regular expression that
    > you only want the subset of 'everything' that is actually something.
    > Here are a couple ways to do this:
    >
    > /.+/ will match 1 or more of something, so doesn't return the absence
    >
    > /^.*/ will start the search at the start of the pattern, in a way
    > bypassing the match of zero (the pattern /^.*$/ makes this more
    > clear).


    Here, again, "hello" is first, so /^.*/ matches it but doesn't match
    the second time ("") because the "" isn't anchored to ^.


    David

    --
    * Books:
    RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
    RUBY FOR RAILS (http://www.manning.com/black)
    * Ruby/Rails training
    & consulting: Ruby Power and Light, LLC (http://www.rubypal.com)
     
    , Jun 21, 2007
    #9
  10. Brian Adkins Guest

    On Jun 21, 4:43 am, Wild Karl-Heinz <> wrote:
    > Hello Ryan
    >
    > In message "Do You Understand Regular Expressions?"
    > on 21.06.2007, Ryan Mcdonald <> writes:
    >
    > RM> I agree. Can someone explain why gsub, sub or scan matches with * are
    > RM> different than =~ matches with *
    >
    > RM> puts "hello".gsub(/[aeiou]/, '<\1>') # h<>ll<>
    >
    > irb(main):024:0> "hello".gsub( /([aeiou])/, "<\\1>" )
    >
    > Please note the () around the expression.
    > After that you can refer with \\1 to the found
    > letters.


    Why not simply change the 1 to a 0 ?

    irb(main):001:0> puts "hello".gsub(/[aeiou]/, '<\0>')
    h<e>ll<o>
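For comparison, here are three equivalent ways to refer to the matched text in a replacement, as a small sketch. The method names are invented for this example.

```ruby
# \0 stands for the whole match, so no capture group is needed.
def wrap_vowels_whole(text)
  text.gsub(/[aeiou]/, '<\0>')
end

# \1 stands for the first capture group, so the pattern needs ().
def wrap_vowels_group(text)
  text.gsub(/([aeiou])/, '<\1>')
end

# The block form receives the matched text as its argument.
def wrap_vowels_block(text)
  text.gsub(/[aeiou]/) { |vowel| "<#{vowel}>" }
end

p wrap_vowels_whole("hello")   # => "h<e>ll<o>"
p wrap_vowels_group("hello")   # => "h<e>ll<o>"
p wrap_vowels_block("hello")   # => "h<e>ll<o>"

# Without a group, \1 is empty, which gives the h<>ll<> seen earlier.
p "hello".gsub(/[aeiou]/, '<\1>')   # => "h<>ll<>"
```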
     
    Brian Adkins, Jun 21, 2007
    #10
  11. Stephen Ball Guest

    On 6/21/07, <> wrote:
    [snip]
    > > So: since * matches "zero or more" characters when it starts the
    > > search for .* it matches the absence (the 'zero') and then matches the
    > > string (the 'or more').

    >
    > It's the other way around, though; it matches "hello" *first*, and
    > then "". So the zero-matching (which I admit I'm among those who find
    > unexpected) is happening at the end.
    >


    Ah, but notice:

    "hello".scan(/.*$/)
    => ["hello", ""]

    "hello".scan(/^.*/)
    => ["hello"]

    Strange indeed, but it seems that's how it's working. Although I
    suspect I'm not fully grasping the subtleties introduced by the *
    character.

    Hmm, the more I think on it I think I have an answer:

    The /^.*/ pattern specifies that the string must start with anything
    (e.g. it must have at least one character) and then zero or more
    characters following.

    The /.*$/ pattern has no restriction since the anchor is on the side
    with the * character. So it's parsed as "zero or more of anything
    before the end of the string".

    So, if that's correct, you are right that the absence is matched last.
    Verified by the fact that the absence follows the string in the
    pattern match.

    -- Stephen
     
    Stephen Ball, Jun 21, 2007
    #11
  12. Guest

    Hi --

    On Fri, 22 Jun 2007, Stephen Ball wrote:

    > On 6/21/07, <> wrote:
    > [snip]
    >> > So: since * matches "zero or more" characters when it starts the
    >> > search for .* it matches the absence (the 'zero') and then matches the
    >> > string (the 'or more').

    >>
    >> It's the other way around, though; it matches "hello" *first*, and
    >> then "". So the zero-matching (which I admit I'm among those who find
    >> unexpected) is happening at the end.
    >>

    >
    > Ah, but notice:
    >
    > "hello".scan(/.*$/)
    > => ["hello", ""]
    >
    > "hello".scan(/^.*/)
    > => ["hello"]
    >
    > Strange indeed, but it seems that's how it's working. Although I
    > suspect I'm not fully grasping the subtleties introduced by the *
    > character.
    >
    > Hmm, the more I think on it I think I have an answer:
    >
    > The /^.*/ pattern specifies that the string must start with anything
    > (e.g. it must have at least one character) and then zero or more
    > characters following.
    >
    > The /.*$/ pattern has no restriction since the anchor is on the side
    > with the * character. So it's parsed as "zero or more of anything
    > before the end of the string".
    >
    > So, if that's correct, you are right that the absence is matched last.
    > Verified by the fact that the absence follows the string in the
    > pattern match.


    Yes, that was what I was mostly going by :)


    David

    --
    * Books:
    RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
    RUBY FOR RAILS (http://www.manning.com/black)
    * Ruby/Rails training
    & consulting: Ruby Power and Light, LLC (http://www.rubypal.com)
     
    , Jun 21, 2007
    #12
  13. Sami Samhuri Guest

    On 6/21/07, Stephen Ball <> wrote:
    [...]
    > The /^.*/ pattern specifies that the string must start with anything
    > (e.g. it must have at least one character) and then zero or more
    > characters following.


    ^ anchors the match to beginning of a line or the beginning of the
    string. The second match fails because it's starting from the first
    point after "hello", where it left off. It says nothing about the
    content that follows.

    "".scan /^.*/ => [""]

    > The /.*$/ pattern has no restriction since the anchor is on the side
    > with the * character. So it's parsed as "zero or more of anything
    > before the end of the string".


    This is correct. First it finds the longest match it can in "hello".
    Then it finds nothing, but still anchored at the end of the line. Note
    that $ does not anchor the end of the string, but the end of each line
    within the string or the very end. \z matches the actual end of
    string, while \A does the same for the beginning.
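A short sketch of the difference between the line anchors and the string anchors, using a two-line string. `replace_whole` is a hypothetical helper, not something from the thread; it assumes the /m flag is wanted so that . also matches newlines.

```ruby
# ^ and $ work per line, so each line is matched separately.
p "one\ntwo".scan(/^.*$/)     # => ["one", "two"]

# \A and \z refer to the whole string; without /m the dot cannot
# cross the newline, so nothing matches here.
p "one\ntwo".scan(/\A.*\z/)   # => []

# Hypothetical helper: replace the entire string exactly once.
# \A cannot match again after the first match, so there is no
# second, empty substitution.
def replace_whole(text, replacement)
  text.gsub(/\A.*\z/m, replacement)
end

p replace_whole("hello", "P")      # => "P"
p replace_whole("one\ntwo", "P")   # => "P"
p replace_whole("", "P")           # => "P"
```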

    Hope this helps.

    --
    Sami Samhuri
     
    Sami Samhuri, Jun 21, 2007
    #13
  14. On 2007-06-21 23:12:32 +0900 (Thu, Jun), Rob Biedenharn wrote:
    > On Jun 21, 2007, at 9:47 AM, Stephen Ball wrote:
    > >It's because the pattern /.*/ matches everything, including the
    > >absence of everything. Yes, with the proper regexs you can indeed have
    > >tea and no tea at the same time. Certainly peculiar, but occasionally
    > >useful.

    >
    > That still doesn't really explain why "hello".scan(/.*/) => ["hello", ""]
    >
    > Why wouldn't it be ["hello", "", "", "", "", "", "", "", "", "", "",
    > "", ... ] since I (or rather the OP) could continue to match zero
    > characters (bytes) at the end of the string forever? It does seem
    > that it might be that a termination condition is checked a bit later
    > than it should be in this case.


    I would say the condition is checked at the right time; it's just that
    the condition is different: it allows an empty match at the end of the
    just-matched string, but it does not allow a second empty match at the
    same position.

    The interesting behaviour is:

    irb(main):035:0> "hello".scan /.*?/
    => ["", "", "", "", "", ""]

    The /.*?/ matches 'zero or more characters, preferring the shortest
    match'. One could ask - where have the actual characters gone?
    Note that it's not an infinite loop of empty strings.
    After matching 'nothing', the start-position for next match is
    increased, skipping one character, to prevent infinite loop of matching
    nothing again.
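To see where those empty matches sit, the offsets of each match can be recorded. This is a sketch only; `match_offsets` is a made-up helper that uses the block form of scan together with Regexp.last_match.

```ruby
# Hypothetical helper: return [start, end] offsets for every match
# that scan finds in the text.
def match_offsets(text, pattern)
  offsets = []
  text.scan(pattern) do
    m = Regexp.last_match
    offsets << [m.begin(0), m.end(0)]
  end
  offsets
end

p match_offsets("hello", /.*?/)
# => [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]
p match_offsets("hello", /.*/)
# => [[0, 5], [5, 5]]
```

With /.*?/ every match is empty and each one starts one character further along, so the characters themselves are stepped over rather than matched.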

    *This* behaviour may be considered weird, or buggy, and the results
    are probably not what was expected.

    But look at:

    irb(main):038:0> "hello".scan /h(.*)e/
    => [[""]]
    irb(main):039:0> "hello".scan /h(.*)(.*)(.*)(.*)(.*)e/
    => [["", "", "", "", ""]]

    Here 'nothing' matches many times, and definitely this *is* the expected
    behaviour.



    --
    No virus found in this outgoing message.
    Checked by 'grep -i virus $MESSAGE'
    Trust me.

     
    Mariusz Pękala, Jun 22, 2007
    #14
  15. Guest

    > > It's because the pattern /.*/ matches everything, including the
    > > absence of everything.

    >
    > > So: since * matches "zero or more" characters when it starts the
    > > search for .* it matches the absence (the 'zero') and then matches the
    > > string (the 'or more').

    >
    > It's the other way around, though; it matches "hello" *first*, and
    > then "". So the zero-matching (which I admit I'm among those who find
    > unexpected) is happening at the end.


    Oh right, I think I get it now. If you try to match anything with *
    then a match is guaranteed, because if there's nothing to match, then
    you'll just match nothing?

    Like this:

    irb(main):001:0> "hello".scan(/h*/)
    => ["h", "", "", "", "", ""]

    And this:

    irb(main):002:0> "hello".scan(/P*/)
    => ["", "", "", "", "", ""]


    I've always assumed, and used, .* to match everything before,
    but I suppose .+ does make more sense. Although I have to say
    I still find it a bit odd...
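The point that a starred pattern always finds a match can be checked directly. A small sketch; `first_match_index` is a hypothetical helper name.

```ruby
# Hypothetical helper: index of the first match, or nil if none.
def first_match_index(text, pattern)
  text =~ pattern
end

p first_match_index("hello", /P*/)   # => 0   (matches nothing at index 0)
p first_match_index("hello", /P+/)   # => nil (needs at least one "P")
p "hello"[/P*/]                      # => ""
p "hello"[/P+/]                      # => nil
```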

    Thanks everyone for your help.
     
    , Jun 22, 2007
    #15
  16. On Jun 22, 2007, at 6:55 AM, Mariusz Pękala wrote:
    > On 2007-06-21 23:12:32 +0900 (Thu, Jun), Rob Biedenharn wrote:
    >> On Jun 21, 2007, at 9:47 AM, Stephen Ball wrote:
    >>> It's because the pattern /.*/ matches everything, including the
    >>> absence of everything. Yes, with the proper regexs you can indeed have
    >>> tea and no tea at the same time. Certainly peculiar, but occasionally
    >>> useful.

    >>
    >> That still doesn't really explain why "hello".scan(/.*/) => ["hello",
    >> ""]
    >>
    >> Why wouldn't it be ["hello", "", "", "", "", "", "", "", "", "", "",
    >> "", ... ] since I (or rather the OP) could continue to match zero
    >> characters (bytes) at the end of the string forever? It does seem
    >> that it might be that a termination condition is checked a bit later
    >> than it should be in this case.

    >
    > I would say the condition is checked at the right time, it's just the
    > condition is different: it allows checking a match for empty string
    > at the end of just-matched string, it does not allow checking empty
    > string after empty string.
    >
    > The interesting behaviour is:
    >
    > irb(main):035:0> "hello".scan /.*?/
    > => ["", "", "", "", "", ""]
    >
    > The /.*?/ matches 'zero or more characters, preferring the shortest
    > match'. One could ask - where have the actual characters gone?
    > Note that it's not an infinite loop of empty strings.
    > After matching 'nothing', the start-position for next match is
    > increased, skipping one character, to prevent infinite loop of
    > matching nothing again.
    >
    > *This* behaviour may be considered weird, or buggy, and probably
    > results are not what was expected.


    A great example which I *do* consider to be buggy. The similar
    example from perl is something like:
    $ perl -e '$h = "hello"; $h =~ s/.*?/[$&]/g; print "$h\n";'
    [][h][][e][][l][][l][][o][]

    It matches the empty string at the beginning, between each character,
    and at the end, but it does consume the actual characters of the
    string. Even if not what one would anticipate, it's not too hard to
    justify the result. (Something that can't be said for ruby's
    ["","","","","",""].)

    The other versions from perl are enlightening:
    $ perl -e '$h = "hello"; $h =~ s/.?/[$&]/g; print "$h\n";'
    [h][e][l][l][o][]

    $ perl -e '$h = "hello"; $h =~ s/.*/[$&]/g; print "$h\n";'
    [hello][]

    Both succeed in a zero-character match at the end. These are
    equivalent in ruby (1.8.5):

    $ ruby -e 'puts "hello".scan(/.?/).inspect'
    ["h", "e", "l", "l", "o", ""]

    $ ruby -e 'puts "hello".scan(/.*/).inspect'
    ["hello", ""]

    I thought I'd see what Oniguruma (5.8.0; with 1.1.0 gem) had to say:

    irb> require 'oniguruma'
    => true
    irb> reluctant = Oniguruma::ORegexp.new('.*?')
    => /.*?/
    irb> greedy = Oniguruma::ORegexp.new('.*')
    => /.*/
    irb> greedyq = Oniguruma::ORegexp.new('.?')
    => /.?/
    irb> reluctant.scan("hello")
    => [#<MatchData:0x10b9aa4>, #<MatchData:0x10b9a7c>,
    #<MatchData:0x10b9a68>, #<MatchData:0x10b9a40>, #<MatchData:0x10b9a18>,
    #<MatchData:0x10b99f0>]
    irb> reluctant.scan("hello").map{|md|md[0]}
    => ["", "", "", "", "", ""]
    irb> greedy.scan("hello").map{|md|md[0]}
    => ["hello", ""]
    irb> greedyq.scan("hello").map{|md|md[0]}
    => ["h", "e", "l", "l", "o", ""]

    OK, the same result as the ruby Regexp. Including, that .*? produces
    [""]*6 which is the "before each character and at the end" locations
    of the zero-length matches from perl, but the individual single-byte
    matches are missing.

    I presume that there's some justification for these behaviors, but I
    can't figure out what it might be.

    -Rob

    > But look at:
    >
    > irb(main):038:0> "hello".scan /h(.*)e/
    > => [[""]]
    > irb(main):039:0> "hello".scan /h(.*)(.*)(.*)(.*)(.*)e/
    > => [["", "", "", "", ""]]
    >
    > Here 'nothing' matches many times, and definitely this *is* the
    > expected behaviour.


    I agree that those results are exactly what I'd expect.

    > --
    > No virus found in this outgoing message.
    > Checked by 'grep -i virus $MESSAGE'
    > Trust me.


    Rob Biedenharn http://agileconsultingllc.com
     
    Rob Biedenharn, Jun 22, 2007
    #16
  17. On 21.06.2007 16:12, Rob Biedenharn wrote:
    > On Jun 21, 2007, at 9:47 AM, Stephen Ball wrote:
    >
    >> On 6/20/07, Daniel DeLorme <> wrote:
    >>> That doesn't really explain why the regexp finds an extra empty string.
    >>> I know that zero occurrences is one match but after a greedy match that
    >>> matches everything, there should be (logically?) no other match. I am no
    >>> stranger to regexps and the result is counter-intuitive to me; I would
    >>> consider it a bug. Or at least a very very peculiar behavior.
    >>>
    >>> Daniel

    >>
    >> It's because the pattern /.*/ matches everything, including the
    >> absence of everything. Yes, with the proper regexs you can indeed have
    >> tea and no tea at the same time. Certainly peculiar, but occasionally
    >> useful.
    >> ...
    >> -- Stephen

    >
    > That still doesn't really explain why "hello".scan(/.*/) => ["hello", ""]
    >
    > Why wouldn't it be ["hello", "", "", "", "", "", "", "", "", "", "", "",
    > ... ] since I (or rather the OP) could continue to match zero characters
    > (bytes) at the end of the string forever? It does seem that it might be
    > that a termination condition is checked a bit later than it should be in
    > this case.


    As far as I remember it works like this: first .* matches the whole
    sequence. Then the "cursor" is placed behind the match, i.e. after the
    last char of the match and matching starts again. At this place the
    empty sequence matches because we're at the end of the match. After
    that match the cursor is advanced one step (to avoid endless
    repetitions) and - alas! - we're at the end of the string and matching
    stops.
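The cursor description above can be written out as a loop. This is a rough sketch of the idea, assuming the behaviour described in this thread; it is not how String#scan is actually implemented, it ignores capture groups, and `scan_by_hand` is an invented name.

```ruby
# Hypothetical re-implementation of the scanning loop described above.
def scan_by_hand(text, pattern)
  results = []
  cursor = 0
  while cursor <= text.length
    m = pattern.match(text, cursor)   # search from the cursor onwards
    break unless m
    results << m[0]
    if m.end(0) == m.begin(0)
      cursor = m.end(0) + 1   # empty match: step one character forward
    else
      cursor = m.end(0)       # otherwise continue right after the match
    end
  end
  results
end

p scan_by_hand("hello", /.*/)    # => ["hello", ""]
p scan_by_hand("hello", /.*?/)   # => ["", "", "", "", "", ""]
p scan_by_hand("hello", /h*/)    # => ["h", "", "", "", "", ""]
p scan_by_hand("hello", /^.*/)   # => ["hello"]
```

The forced step after an empty match is what stops the loop from returning empty strings forever, and it is also why /.*?/ steps over the characters instead of matching them.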

    For learning regular expressions this is a great program; it lets you
    step through the matching process graphically:
    http://weitz.de/regex-coach/

    See also this thread:
    http://groups.google.de/group/comp....2390ff905f?lnk=st&q=&rnum=10#f759612390ff905f

    Btw, for replacing the whole string this is much better:

    irb(main):001:0> s = "foo"
    => "foo"
    irb(main):002:0> s.object_id
    => 1073540760
    irb(main):003:0> s.replace "bar"
    => "bar"
    irb(main):004:0> s.object_id
    => 1073540760
    irb(main):005:0> s
    => "bar"
    irb(main):006:0>
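The practical difference from gsub is that replace changes the existing object, so every other reference to it sees the new content. A sketch with made-up method names; String.new is used here only to make sure the string is not frozen.

```ruby
# replace mutates the string in place: the second reference sees "bar".
def demo_replace
  original = String.new("foo")
  other_ref = original
  original.replace("bar")
  [other_ref, other_ref.equal?(original)]
end

# gsub returns a new string: the second reference still holds "foo".
def demo_gsub
  original = String.new("foo")
  other_ref = original
  original = original.gsub(/\A.*\z/m, "bar")
  [other_ref, other_ref.equal?(original)]
end

p demo_replace   # => ["bar", true]
p demo_gsub      # => ["foo", false]
```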

    Kind regards

    robert
     
    Robert Klemme, Jun 22, 2007
    #17
  18. On 22.06.2007 14:15, wrote:
    >>> It's because the pattern /.*/ matches everything, including the
    >>> absence of everything.
    >>> So: since * matches "zero or more" characters when it starts the
    >>> search for .* it matches the absence (the 'zero') and then matches the
    >>> string (the 'or more').

    >> It's the other way around, though; it matches "hello" *first*, and
    >> then "". So the zero-matching (which I admit I'm among those who find
    >> unexpected) is happening at the end.

    >
    > Oh right, I think I get it now. If you try to match anything with *
    > then a match is guaranteed, because if there's nothing to match, then
    > you'll just match nothing?
    >
    > Like this:
    >
    > irb(main):001:0> "hello".scan(/h*/)
    > => ["h", "", "", "", "", ""]
    >
    > And this:
    >
    > irb(main):002:0> "hello".scan(/P*/)
    > => ["", "", "", "", "", ""]
    >
    >
    > I've always assumed, and used, .* to match everything before,
    > but I suppose .+ does make more sense. Although I have to say
    > I still find it a bit odd...


    ".*" has its use but it's generally overrated, i.e. more often used than
    needed / wanted. If you show a more concrete example of what you are
    doing we might be able to come up with better suggestions. If you are
    really interested to dive into the matter then I suggest "Mastering
    Regular Expressions" which is an excellent book for the money.

    Kind regards

    robert
     
    Robert Klemme, Jun 22, 2007
    #18
