Search string for occurrences of words stored in array

Discussion in 'Ruby' started by John Butler, Apr 30, 2008.

  1. John Butler

    John Butler Guest

    Hi,

    I have a sentence "This is my test sentence" and an array ["is", "the",
    "my"], and what I need to do is find the occurrence of any of the array
    words in the sentence.

    I have this working in a loop, but I was wondering whether there is a way
    to do it using one of Ruby's string methods.

    It's similar to the include? method, but searching for multiple words, not
    just one.

    "This is my test sentence".include?("This") returns true

    but I want something like

    "This is my test sentence".include?("This", "is", "my")

    Anyone got a nice way to do this? I only need to find out whether one of
    the words occurs, and then I exit.

    JB
    --
    Posted via http://www.ruby-forum.com/.
     
    John Butler, Apr 30, 2008
    #1

  2. John Butler wrote:
    | "This is my test sentence".include?("This") returns true
    |
    | but i want something like
    |
    | "This is my test sentence".include?("This", "is", "my")

    How about '["is", "the", "my"].each'?

    I.e.:

    ["is", "the", "my"].each do |word|
      break if "the test sentence".include? word
    end
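
The loop above stops at the first hit but discards which word matched. A minimal sketch of one alternative, assuming substring matching via String#include? is acceptable; `first_listed_word` is a hypothetical helper name, not something from the thread:

```ruby
# Hypothetical helper: returns the first word from `words` that occurs
# in `sentence` as a substring, or nil when none does.
def first_listed_word(sentence, words)
  # Enumerable#find stops iterating at the first truthy block result.
  words.find { |word| sentence.include?(word) }
end

p first_listed_word("This is my test sentence", ["is", "the", "my"])  # => "is"
p first_listed_word("Hi", ["is", "the", "my"])                        # => nil
```

Note that "is" is reported because it occurs inside "This"; this is still a substring test, not a whole-word test.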

    --
    Phillip Gawlowski
    Twitter: twitter.com/cynicalryan
    Blog: http://justarubyist.blogspot.com

    - You know you've been hacking too long when...
    ...you dream that your SO and yourself are icons in a GUI and you can't
    get close to each other because the window manager demands minimum space
    between icons...
     
    Phillip Gawlowski, Apr 30, 2008
    #2

  3. Hi --

    On Wed, 30 Apr 2008, John Butler wrote:

    > "This is my test sentence".include?("This") returns true
    >
    > but i want something like
    >
    > "This is my test sentence".include?("This", "is", "my")



    You could use any?

    irb(main):001:0> words = %w{ This is my }
    => ["This", "is", "my"]
    irb(main):002:0> sentence = "This is my test sentence"
    => "This is my test sentence"
    irb(main):003:0> words.any? {|word| sentence.include?(word) }
    => true
    irb(main):004:0> sentence = "Hi"
    => "Hi"
    irb(main):005:0> words.any? {|word| sentence.include?(word) }
    => false

    Another possibility:

    irb(main):009:0> sentence = "This is my test sentence"
    => "This is my test sentence"
    irb(main):010:0> re = Regexp.new(words.join('|'))
    => /This|is|my/
    irb(main):011:0> sentence =~ re
    => 0


    David

    --
    Rails training from David A. Black and Ruby Power and Light:
    INTRO TO RAILS June 9-12 Berlin
    ADVANCING WITH RAILS June 16-19 Berlin
    INTRO TO RAILS June 24-27 London (Skills Matter)
    See http://www.rubypal.com for details and updates!
     
    David A. Black, Apr 30, 2008
    #3
  4. Jens Wille

    Jens Wille Guest

    Phillip Gawlowski [2008-04-30 16:09]:
    > How about '["is", "the", "my"].each'?
    >
    > I.e.:
    >
    > ["is", "the", "my"].each do |word|
    >   break if "the test sentence".include? word
    > end

    i'd prefer Enumerable#any?:

    sentence, words = "This is my test sentence", ["This", "is", "my"]
    words.any? { |word| sentence.include?(word) }

    or Regexp:

    sentence =~ Regexp.union(*words)

    cheers
    jens

    --
    Jens Wille, Dipl.-Bibl. (FH)
    prometheus - Das verteilte digitale Bildarchiv für Forschung & Lehre
    Kunsthistorisches Institut der Universität zu Köln
    Albertus-Magnus-Platz, D-50923 Köln
    Tel.: +49 (0)221 470-6668, E-Mail:
    http://www.prometheus-bildarchiv.de/
     
    Jens Wille, Apr 30, 2008
    #4
  5. Jens Wille

    Jens Wille Guest

    ok, i withdraw my post. david's just quicker... ;-)

    Jens Wille [2008-04-30 16:18]:
    > sentence =~ Regexp.union(*words)

    one addition regarding the regexp, though. in case words may contain
    special characters, it's safer to escape them first:

    sentence =~ Regexp.union(*words.map { |word| Regexp.escape(word) })

    cheers
    jens
     
    Jens Wille, Apr 30, 2008
    #5
  6. Ken Bloom

    Ken Bloom Guest

    On Wed, 30 Apr 2008 09:01:11 -0500, John Butler wrote:

    > "This is my test sentence".include?("This") returns true
    >
    > but i want something like
    >
    > "This is my test sentence".include?("This", "is", "my")


    Ruby quiz #103: the DictionaryMatcher
    http://www.rubyquiz.com/quiz103.html

    You may need to do "This is my test sentence".split.any?{...} if it has
    to specifically be on words. Note that
    "I am running home".include? "run"
    returns true, as does "abc def".include? "c d"
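
To make the difference concrete, here is a minimal sketch of the split-based check described above. `any_word_listed?` is a hypothetical helper name, and splitting on whitespace is an assumption: punctuation stays attached to the tokens.

```ruby
# Hypothetical helper: true if any whitespace-separated token of
# `sentence` is exactly equal to one of `words`.
def any_word_listed?(sentence, words)
  sentence.split.any? { |token| words.include?(token) }
end

p "I am running home".include?("run")               # => true  (substring hit)
p any_word_listed?("I am running home", ["run"])    # => false (no such token)
p any_word_listed?("abc def", ["c d"])              # => false
p any_word_listed?("This is it.", ["it"])           # => false (token is "it.")
```

The last line shows the limit of plain `split`: text with punctuation probably needs something like `scan(/\w+/)` instead.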

    --Ken

    --
    Ken (Chanoch) Bloom. PhD candidate. Linguistic Cognition Laboratory.
    Department of Computer Science. Illinois Institute of Technology.
    http://www.iit.edu/~kbloom1/
     
    Ken Bloom, Apr 30, 2008
    #6
  7. Hi --

    On Wed, 30 Apr 2008, Jens Wille wrote:

    > ok, i withdraw my post. david's just quicker... ;-)


    Yeah, but yours is cooler because you remembered Regexp.union :)

    > Jens Wille [2008-04-30 16:18]:
    >> sentence =~ Regexp.union(*words)

    > one addition regarding the regexp, though. in case words may contain
    > special characters, it's safer to escape them first:
    >
    > sentence =~ Regexp.union(*words.map { |word| Regexp.escape(word) })


    It actually does it for you:

    Regexp.union("a",".b")
    => /a|\.b/
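
A small sketch of why this matters, comparing Regexp.union with the Regexp.new(words.join('|')) form used earlier in the thread. The helper names `literal_pattern` and `naive_pattern` are hypothetical, and String#match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper: Regexp.union escapes metacharacters in each string.
def literal_pattern(words)
  Regexp.union(words)
end

# Hypothetical helper: joining by hand leaves metacharacters active.
def naive_pattern(words)
  Regexp.new(words.join('|'))
end

words = ["a", ".b"]
p literal_pattern(words)               # => /a|\.b/
p naive_pattern(words)                 # => /a|.b/
p "xb".match?(literal_pattern(words))  # => false
p "xb".match?(naive_pattern(words))    # => true ("." matched "x")
```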


    David

     
    David A. Black, Apr 30, 2008
    #7
  8. Jens Wille

    Jens Wille Guest

    David A. Black [2008-04-30 16:29]:
    >> Jens Wille [2008-04-30 16:18]:
    >>> sentence =~ Regexp.union(*words)

    >> one addition regarding the regexp, though. in case words may
    >> contain special characters, it's safer to escape them first:
    >>
    >> sentence =~ Regexp.union(*words.map { |word| Regexp.escape(word) })

    > It actually does it for you:
    >
    > Regexp.union("a",".b") => /a|\.b/

    ha, didn't know that ;-) thank you!
     
    Jens Wille, Apr 30, 2008
    #8
  9. Roger Pack

    Roger Pack Guest

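    I'd write my own:

    class String
      # One possible body: true only if every word occurs as a substring.
      def includes_all?(*words)
        words.all? { |word| include?(word) }
      end
    end

    "This is my test sentence".includes_all?("This", "is", "my") # => true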

    --
    Posted via http://www.ruby-forum.com/.
     
    Roger Pack, Apr 30, 2008
    #9
  10. On 30.04.2008 16:18, Jens Wille wrote:

    > i'd prefer Enumerable#any?:
    >
    > sentence, words = "This is my test sentence", ["This", "is", "my"]
    > words.any? { |word| sentence.include?(word) }


    I'd rather do it the other way round, i.e. iterate over the sentence and
    test words since the sentence is potentially longer:

    irb(main):001:0> require 'enumerator'
    => true
    irb(main):002:0> require 'set'
    => true
    irb(main):003:0> words = %w{This is my}.to_set
    => #<Set: {"my", "This", "is"}>
    irb(main):004:0> "This is my test sentence".to_enum(:scan, /\w+/).any? {|w| words.include? w}
    => true
    irb(main):005:0>

    Kind regards

    robert
     
    Robert Klemme, Apr 30, 2008
    #10
  11. Hi --

    On Thu, 1 May 2008, Robert Klemme wrote:

    > I'd rather do it the other way round, i.e. iterate over the sentence and test
    > words since the sentence is potentially longer:
    >
    > irb(main):001:0> require 'enumerator'
    > => true
    > irb(main):002:0> require 'set'
    > => true
    > irb(main):003:0> words = %w{This is my}.to_set
    > => #<Set: {"my", "This", "is"}>
    > irb(main):004:0> "This is my test sentence".to_enum(:scan, /\w+/).any? {|w| words.include? w}
    > => true
    > irb(main):005:0>


    Is there any reason not to just do:

    "This is my test sentence".scan(/\w+/).any? {|w| words.include? w }


    David

     
    David A. Black, Apr 30, 2008
    #11
  12. On 30.04.2008 23:40, David A. Black wrote:
    > Is there any reason not to just do:
    >
    > "This is my test sentence".scan(/\w+/).any? {|w| words.include? w }


    Yes. I used to_enum(:scan, /\w+/) because in this class of problems the
    text (sentence) tends to be large. The to_enum approach runs the test
    while traversing, whereas the plain scan approach first converts the whole
    text into an array of words and only then applies the test. It therefore
    iterates over the whole text twice, does more conversions (to words), and
    needs more temporary memory (for the whole sequence of words, although the
    overhead might be small because of internal String memory sharing).

    The Set approach scales better for larger sets of words because the Set
    lookup is O(1) while an Array based lookup is O(n).

    I am not saying that my approach is faster under all circumstances. But
    it surely scales better.
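
The difference between the two traversals can be sketched by counting how many words each form touches. The helper names `lazy_any_word` and `eager_any_word` are hypothetical, and this only illustrates the short-circuit behaviour; it says nothing about actual speed.

```ruby
require 'set'

# Hypothetical helper: lazy form. scan yields one match at a time and
# any? stops at the first hit. Returns [found, words_visited].
def lazy_any_word(text, words)
  visited = 0
  found = text.to_enum(:scan, /\w+/).any? do |w|
    visited += 1
    words.include?(w)
  end
  [found, visited]
end

# Hypothetical helper: eager form. scan builds the full array of words
# before any? looks at the first one. Returns [found, words_extracted].
def eager_any_word(text, words)
  tokens = text.scan(/\w+/)
  [tokens.any? { |w| words.include?(w) }, tokens.size]
end

word_set = Set["This", "is", "my"]
p lazy_any_word("This is my test sentence", word_set)   # => [true, 1]
p eager_any_word("This is my test sentence", word_set)  # => [true, 5]
```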

    Kind regards

    robert
     
    Robert Klemme, Apr 30, 2008
    #12
  13. Hi --

    On Wed, 30 Apr 2008, David A. Black wrote:

    > You could use any?
    >
    > irb(main):001:0> words = %w{ This is my }
    > => ["This", "is", "my"]
    > irb(main):002:0> sentence = "This is my test sentence"
    > => "This is my test sentence"
    > irb(main):003:0> words.any? {|word| sentence.include?(word) }
    > => true
    > irb(main):004:0> sentence = "Hi"
    > => "Hi"
    > irb(main):005:0> words.any? {|word| sentence.include?(word) }
    > => false


    Actually, sentence.include?(word) isn't good, because it will give
    false positives (for substrings).
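
One way to avoid those false positives, sketched under the assumption that the listed words consist of word characters only (otherwise \b will not behave as intended). `any_whole_word?` is a hypothetical helper name, and String#match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper: true if any of `words` occurs in `sentence`
# as a whole word rather than as part of a longer word.
def any_whole_word?(sentence, words)
  # Regexp.union escapes metacharacters in each word; \b anchors the
  # alternation to word boundaries on both sides.
  pattern = /\b(?:#{Regexp.union(words).source})\b/
  sentence.match?(pattern)
end

p "I am running home".include?("run")                               # => true
p any_whole_word?("I am running home", ["run"])                     # => false
p any_whole_word?("This is my test sentence", ["is", "the", "my"])  # => true
```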


    David

     
    David A. Black, Apr 30, 2008
    #13
  14. On 30.04.2008 23:48, Robert Klemme wrote:
    > Yes. I used to_enum(:scan, /\w+/) because in this class of problems the
    > text (sentence) tends to be large. The approach using to_enum does
    > the test while traversing while scan approach first converts the whole
    > text into words and then applies the test thus iterating twice over the
    > whole text plus doing more conversions (to words) and needs more
    > temporary memory (i.e. for the whole sequence of words, although the
    > overhead might be small because of internal String memory sharing).
    >
    > The Set approach scales better for larger sets of words because the Set
    > lookup is O(1) while an Array based lookup is O(n).
    >
    > I am not saying that my approach is faster under all circumstances. But
    > it surely scales better.


    Well, I did a little benchmarking, and it turns out that I probably spoke
    too soon. As so often, assumptions should be verified against measurable
    reality.

    Here are the numbers. I leave the analysis to the reader, but keep in
    mind that the situation might change significantly if the input text
    needs to be read via IO (from a file etc.). :)

    Kind regards

    robert



    robert@fussel /cygdrive/c/Temp
    $ ./scan.rb
    Rehearsal -------------------------------------------------------
    head arr std          7.578000   0.063000   7.641000 (  7.628000)
    head arr enum         0.000000   0.000000   0.000000 (  0.000000)
    head set std          8.016000   0.031000   8.047000 (  8.043000)
    head set enum         0.000000   0.000000   0.000000 (  0.000000)
    head rarr std         7.968000   0.016000   7.984000 (  8.041000)
    head rarr enum        0.000000   0.000000   0.000000 (  0.002000)
    head rx               0.000000   0.000000   0.000000 (  0.000000)
    tail arr std         20.203000   0.000000  20.203000 ( 20.390000)
    tail arr enum        32.079000   0.000000  32.079000 ( 33.039000)
    tail set std         15.421000   0.031000  15.452000 ( 15.616000)
    tail set enum        26.672000   0.016000  26.688000 ( 26.721000)
    tail rarr std        19.782000   0.031000  19.813000 ( 19.811000)
    tail rarr enum       31.281000   0.000000  31.281000 ( 31.360000)
    tail rx               0.078000   0.000000   0.078000 (  0.080000)
    mid arr std          13.828000   0.031000  13.859000 ( 13.853000)
    mid arr enum         15.781000   0.000000  15.781000 ( 15.814000)
    mid set std          11.485000   0.063000  11.548000 ( 11.559000)
    mid set enum         12.953000   0.000000  12.953000 ( 12.961000)
    mid rarr std         14.156000   0.062000  14.218000 ( 14.231000)
    mid rarr enum        15.375000   0.016000  15.391000 ( 15.412000)
    mid rx                0.031000   0.000000   0.031000 (  0.039000)
    -------------------------------------------- total: 253.047000sec

                              user     system      total        real
    head arr std          7.031000   0.062000   7.093000 (  7.086000)
    head arr enum         0.000000   0.000000   0.000000 (  0.000000)
    head set std          7.078000   0.063000   7.141000 (  7.131000)
    head set enum         0.000000   0.000000   0.000000 (  0.000000)
    head rarr std         7.000000   0.125000   7.125000 (  7.129000)
    head rarr enum        0.000000   0.000000   0.000000 (  0.000000)
    head rx               0.000000   0.000000   0.000000 (  0.000000)
    tail arr std         19.282000   0.031000  19.313000 ( 19.341000)
    tail arr enum        30.328000   0.078000  30.406000 ( 30.658000)
    tail set std         14.594000   0.000000  14.594000 ( 14.600000)
    tail set enum        25.360000   0.000000  25.360000 ( 25.403000)
    tail rarr std        19.047000   0.016000  19.063000 ( 19.076000)
    tail rarr enum       29.922000   0.000000  29.922000 ( 29.984000)
    tail rx               0.078000   0.000000   0.078000 (  0.082000)
    mid arr std          13.297000   0.000000  13.297000 ( 13.312000)
    mid arr enum         14.453000   0.000000  14.453000 ( 14.451000)
    mid set std          10.954000   0.031000  10.985000 ( 11.012000)
    mid set enum         12.093000   0.000000  12.093000 ( 12.155000)
    mid rarr std         13.312000   0.000000  13.312000 ( 13.346000)
    mid rarr enum        14.375000   0.000000  14.375000 ( 14.389000)
    mid rx                0.031000   0.000000   0.031000 (  0.037000)

    robert@fussel /cygdrive/c/Temp
    $ cat scan.rb
    #!/bin/env ruby

    require 'set'
    require 'enumerator'

    require 'benchmark'

    TEXT_FRONT = ("a" << (" x" * 1_000_000)).freeze
    TEXT_TAIL = (("x " * 1_000_000) << "a").freeze
    TEXT_MID = (("x " * 500_000) << "a" << (" x" * 500_000)).freeze
    WORDS = %w{a b c d e f}.freeze
    REV_WORDS = WORDS.reverse.freeze
    SET_WORDS = WORDS.to_set.freeze
    RX = Regexp.new("\\b#{Regexp.union(*WORDS)}\\b")

    TEXTS = {
      "head" => TEXT_FRONT,
      "mid" => TEXT_MID,
      "tail" => TEXT_TAIL,
    }

    TESTER = {
      "arr" => WORDS,
      "rarr" => REV_WORDS,
      "set" => SET_WORDS,
    }

    REPEAT = 5

    Benchmark.bmbm 20 do |b|
      TEXTS.each do |tlabel, text|
        TESTER.each do |lab, enum|
          b.report "#{tlabel} #{lab} std" do
            REPEAT.times do
              text.scan(/\w+/).any? {|w| enum.include? w}
            end
          end

          b.report "#{tlabel} #{lab} enum" do
            REPEAT.times do
              text.to_enum(:scan, /\w+/).any? {|w| enum.include? w}
            end
          end
        end

        b.report "#{tlabel} rx" do
          REPEAT.times do
            RX =~ text
          end
        end
      end
    end

    robert@fussel /cygdrive/c/Temp
    $
     
    Robert Klemme, May 1, 2008
    #14
  15. | "This is my test sentence".include?("This") returns true
    |
    | but i want something like
    |
    | "This is my test sentence".include?("This", "is", "my")


    Yet another solution:

    "This is my test sentence".split & ["This", "is", "my"]
    --
    Posted via http://www.ruby-forum.com/.
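
One caveat with the intersection above: it returns an array, and in Ruby an empty array is still truthy, so the result needs an explicit emptiness check before it can serve as a yes/no answer. A minimal sketch, with `shared_words` as a hypothetical helper name:

```ruby
# Hypothetical helper: the words that appear both as whitespace-separated
# tokens of `sentence` and in `words`.
def shared_words(sentence, words)
  sentence.split & words
end

hits = shared_words("This is my test sentence", ["This", "is", "my"])
p hits            # => ["This", "is", "my"]
p !hits.empty?    # => true

none = shared_words("Hi", ["This", "is", "my"])
p none            # => []
p !none.empty?    # => false
puts "an empty array is still truthy" if none
```

Unlike `any?`, the intersection always processes the whole sentence, so it does not stop at the first match.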
     
    Albert Schlef, May 1, 2008
    #15