Count substrings from a string

Discussion in 'Ruby' started by Zangief Ief, Nov 18, 2007.

  1. Zangief Ief

    Zangief Ief Guest

    Hi,

    I am able to know if there is a substring in a string with:

    if string =~ /#{word}/:
    puts "match"
    else
    puts "do not match"
    end

    But I would like to count all the substring from a string...

    Thanks for help :)
    --
    Posted via http://www.ruby-forum.com/.
     
    Zangief Ief, Nov 18, 2007
    #1
    1. Advertising

  2. Zangief Ief wrote:
    > Hi,
    >
    > I am able to know if there is a substring in a string with:
    >
    > if string =~ /#{word}/:
    > puts "match"
    > else
    > puts "do not match"
    > end
    >
    > But I would like to count all the substring from a string...
    >
    > Thanks for help :)


    irb(main):001:0> "one word is one word, never be three
    words".scan(/word/).length
    => 3

    :), Wolfgang Nádasi-Donner
    --
    Posted via http://www.ruby-forum.com/.
     
    Wolfgang Nádasi-Donner, Nov 18, 2007
    #2
    1. Advertising

  3. Zangief Ief

    Robert Dober Guest

    On Nov 18, 2007 3:56 PM, Wolfgang N=E1dasi-Donner <> wro=
    te:
    > Zangief Ief wrote:
    > > Hi,
    > >
    > > I am able to know if there is a substring in a string with:
    > >
    > > if string =3D~ /#{word}/:
    > > puts "match"
    > > else
    > > puts "do not match"
    > > end
    > >
    > > But I would like to count all the substring from a string...
    > >
    > > Thanks for help :)

    >
    > irb(main):001:0> "one word is one word, never be three
    > words".scan(/word/).length
    > =3D> 3
    >
    > :), Wolfgang N=E1dasi-Donner
    >
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >

    It is not clear what you want, I use an idiom like the following to
    allow for overlap

    class String
    def all_subs substr, overlap =3D false
    return scan(substr) unless overlap
    (0...size).inject([]){ |r, i|
    r <<
    ( substr.match(self[i..-1])[0] rescue nil )
    }.compact
    end
    end


    p "aaaa".all_subs( /aaa/ )
    p "aaaa".all_subs( /aaa/, true )

    too bad scan does not have an overlap parameter

    HTH
    Robert


    --=20
    what do I think about Ruby?
    http://ruby-smalltalk.blogspot.com/
     
    Robert Dober, Nov 18, 2007
    #3
  4. Robert Dober wrote:
    > too bad scan does not have an overlap parameter


    It has the possibility :)


    irb(main):001:0> "aaaaaa".scan(/(?=aa)/) do
    irb(main):002:1* puts "#{$~.pre_match}-#{$~[0]}-#{$~.post_match}"
    irb(main):003:1> end
    --aaaaaa
    a--aaaaa
    aa--aaaa
    aaa--aaa
    aaaa--aa
    => "aaaaaa"
    irb(main):004:0> "aaaaaa".scan(/(?=aa)/).length
    => 5

    With the look ahead features severals things can be done.

    Wolfgang Nádasi-Donner
    --
    Posted via http://www.ruby-forum.com/.
     
    Wolfgang Nádasi-Donner, Nov 18, 2007
    #4
  5. Zangief Ief

    Robert Dober Guest

    On Nov 18, 2007 4:16 PM, Wolfgang N=E1dasi-Donner <> wro=
    te:
    > Robert Dober wrote:
    > > too bad scan does not have an overlap parameter

    >
    > It has the possibility :)

    I am not willing to pay that price;)
    but let me see, in case of sparse overlaps ( and that is often the
    case ) the lookahead might be less costly than my
    brute force approach, hmm, not sure, I will try

    38/38 > cat substr.rb && ruby substr.rb
    # vim: sw=3D2 ts=3D2 ft=3Druby expandtab tw=3D0 nu syn:
    #
    class String
    def bf_sub substr
    (0...size).inject([]){ |r, i|
    r <<
    ( substr.match(self[i..-1])[0] rescue nil )
    }.compact
    end

    def la_sub substr
    scan /(?=3D#{substr})/
    end
    end

    require 'benchmark'

    N =3D 1_000
    A =3D "a" * 100
    S =3D /aa/
    B =3D "b " * 20
    T =3D /b\s/
    Benchmark::bmbm do |bm|
    bm.report "look ahead on many occurrances" do
    N.times do
    A.la_sub S
    end
    end
    bm.report "brute force on many occurrances" do
    N.times do
    A.bf_sub S
    end
    end

    bm.report "look ahead on sparse occurrances" do
    N.times do
    B.la_sub T
    end
    end
    bm.report "brute force on sparse occurrances" do
    N.times do
    B.bf_sub T
    end
    end
    end

    Rehearsal -----------------------------------------------------------------=
    ----
    look ahead on many occurrances 0.150000 0.000000 0.150000 ( 0.240=
    576)
    brute force on many occurrances 1.020000 0.030000 1.050000 ( 1.327=
    992)
    look ahead on sparse occurrances 0.050000 0.000000 0.050000 ( 0.125=
    577)
    brute force on sparse occurrances 2.720000 0.090000 2.810000 ( 4.508=
    353)
    ------------------------------------------------------------ total: 4.06000=
    0sec

    user system total r=
    eal
    look ahead on many occurrances 0.150000 0.000000 0.150000 ( 0.184=
    002)
    brute force on many occurrances 0.890000 0.050000 0.940000 ( 1.243=
    384)
    look ahead on sparse occurrances 0.050000 0.000000 0.050000 ( 0.051=
    307)
    brute force on sparse occurrances 2.620000 0.090000 2.710000 ( 3.080=
    540)
    ---------------------------------------------------------------------------=
    --------------------------------
    $stdout.sub("urranc","urrenc")

    Turns out lookahead is quite ok, brute force hopeless even in the case
    suiting it best :(, wonder how this would scale if there
    were a Regexp#match allowing for an offset.

    Cheers
    Robert


    --=20
    what do I think about Ruby?
    http://ruby-smalltalk.blogspot.com/
     
    Robert Dober, Nov 18, 2007
    #5
  6. Robert Dober wrote:
    > ...wonder how this would scale if there were a Regexp#match allowing for an offset.



    ...which will come in Ruby 1.9:

    C:\Dokumente und Einstellungen\wolfgang>irb19
    irb(main):001:0> "abcdef".match(/./,3)
    => #<MatchData "d">

    Wolfgang Nádasi-Donner
    --
    Posted via http://www.ruby-forum.com/.
     
    Wolfgang Nádasi-Donner, Nov 18, 2007
    #6
  7. Zangief Ief

    Zangief Ief Guest

    Zangief Ief, Nov 18, 2007
    #7
  8. Zangief Ief

    Robert Dober Guest

    On Nov 18, 2007 6:29 PM, Wolfgang N=E1dasi-Donner <> wro=
    te:
    > Robert Dober wrote:
    > > ...wonder how this would scale if there were a Regexp#match allowing fo=

    r an offset.
    >
    >
    > ...which will come in Ruby 1.9:
    >
    > C:\Dokumente und Einstellungen\wolfgang>irb19
    > irb(main):001:0> "abcdef".match(/./,3)
    > =3D> #<MatchData "d">
    >

    Great :)

    I have written a 1.9 test and still lookahead is much better, I am
    surpprised as I got rid of all fancy stuff in the
    bf_sub method going back to my Regexp stuff happily again.

    def bf_sub substr
    r =3D []
    size.times do |i|
    s =3D
    substr.match(self,i)
    r << ( s && s[0] )
    end
    r.compact
    end

    512/12 > ruby1.9 substr-1.9.rb
    Rehearsal -----------------------------------------------------------------=
    ----
    look ahead on many occurrances 0.200000 0.000000 0.200000 ( 0.308=
    822)
    brute force on many occurrances 0.530000 0.000000 0.530000 ( 0.649=
    798)
    look ahead on sparse occurrances 0.070000 0.000000 0.070000 ( 0.171=
    571)
    brute force on sparse occurrances 1.260000 0.000000 1.260000 ( 1.517=
    635)
    ------------------------------------------------------------ total: 2.06000=
    0sec

    user system total r=
    eal
    look ahead on many occurrances 0.200000 0.000000 0.200000 ( 0.531=
    080)
    brute force on many occurrances 0.530000 0.000000 0.530000 ( 0.641=
    892)
    look ahead on sparse occurrances 0.070000 0.000000 0.070000 ( 0.158=
    519)
    brute force on sparse occurrances 1.240000 0.000000 1.240000 ( 1.389=
    619)



    Original test
    507/7 > ruby substr.rb
    Rehearsal -----------------------------------------------------------------=
    ----
    look ahead on many occurrances 0.140000 0.000000 0.140000 ( 0.148=
    194)
    brute force on many occurrances 0.770000 0.020000 0.790000 ( 1.049=
    865)
    look ahead on sparse occurrances 0.040000 0.000000 0.040000 ( 0.059=
    984)
    brute force on sparse occurrances 2.130000 0.050000 2.180000 ( 2.316=
    258)
    ------------------------------------------------------------ total: 3.15000=
    0sec

    user system total r=
    eal
    look ahead on many occurrances 0.120000 0.000000 0.120000 ( 0.147=
    802)
    brute force on many occurrances 0.770000 0.030000 0.800000 ( 0.838=
    135)
    look ahead on sparse occurrances 0.040000 0.000000 0.040000 ( 0.057=
    320)
    brute force on sparse occurrances 2.070000 0.070000 2.140000 ( 2.398=
    068)

    >
    > Wolfgang N=E1dasi-Donner
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >



    R.
    --=20
    what do I think about Ruby?
    http://ruby-smalltalk.blogspot.com/
     
    Robert Dober, Nov 18, 2007
    #8
  9. On 18.11.2007 19:57, Robert Dober wrote:
    > On Nov 18, 2007 6:29 PM, Wolfgang Nádasi-Donner <>wrote:
    >> Robert Dober wrote:
    >>> ...wonder how this would scale if there were a Regexp#match allowing for an offset.

    >>
    >> ...which will come in Ruby 1.9:
    >>
    >> C:\Dokumente und Einstellungen\wolfgang>irb19
    >> irb(main):001:0> "abcdef".match(/./,3)
    >> => #<MatchData "d">
    >>

    > Great :)
    >
    > I have written a 1.9 test and still lookahead is much better, I am
    > surpprised as I got rid of all fancy stuff in the
    > bf_sub method going back to my Regexp stuff happily again.


    IMHO it is not surprising because the lookahead method uses one method
    (scan) which is implemented in C while you create a lot of temporary
    objects and also start scanning at *every* position in the string.
    Usually regexp engines are implemented much smarter than we can do in
    short method. I am sure you have heard of Boyer Moore and Knuth Morris
    Pratt algorithms... :)

    > def bf_sub substr
    > r = []
    > size.times do |i|
    > s =
    > substr.match(self,i)
    > r << ( s && s[0] )
    > end
    > r.compact
    > end


    Kind regards

    robert
     
    Robert Klemme, Nov 18, 2007
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Tung Chau
    Replies:
    1
    Views:
    481
    SM Ryan
    Aug 6, 2004
  2. Tung Chau
    Replies:
    0
    Views:
    388
    Tung Chau
    Aug 6, 2004
  3. JD
    Replies:
    1
    Views:
    343
  4. Danny Challis
    Replies:
    17
    Views:
    392
  5. Michele Dondi
    Replies:
    0
    Views:
    196
    Michele Dondi
    Nov 15, 2007
Loading...

Share This Page