whitespace string only

Discussion in 'Ruby' started by Henrik Horneber, Sep 23, 2004.

  1. Hi!

    What's the best way to test if a string only consists of whitespaces and
    newlines?

    best I could come up with is


    class String
      def is_whitespace_only?
        strings_to_test = split("\n")
        whitespace = /^\s+$/
        is_whitespace_only = true
        strings_to_test.each { |str|
          unless whitespace.match(str) or str.empty?
            is_whitespace_only = false
            break
          end
        }
        is_whitespace_only
      end
    end

    But somehow I think there should be a better way to do it. Any ideas?
    Is it okay to add such methods to class String itself?

    Any advice appreciated.

    regards,
    Henrik
    Henrik Horneber, Sep 23, 2004
    #1
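
On the question of whether to reopen String: one alternative is to keep the check in a small module, so the core class stays untouched. A minimal sketch, assuming a hypothetical module name WhitespaceCheck (not from the thread) and borrowing the negated \S test that comes up later in the discussion:

```ruby
# Hypothetical helper module; the name is an assumption for illustration.
module WhitespaceCheck
  # True when str is empty or contains nothing but whitespace.
  def self.whitespace_only?(str)
    str !~ /\S/
  end
end

puts WhitespaceCheck.whitespace_only?(" \n\t ")  # true
puts WhitespaceCheck.whitespace_only?(" a\n")    # false
```

Reopening String gives the nicer call syntax; the module form trades that for not touching a class every other library shares.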

  2. Evan Webb Guest

    def only_whitespace?
      each_byte { |b| return false if b != 32 }
      true
    end


    On Thu, 2004-09-23 at 00:54, Henrik Horneber wrote:
    > Hi!
    >
    > What's the best way to test if a string only consists of whitespaces and
    > newlines?
    >
    > best I could come up with is
    >
    >
    > class String
    >
    > def is_whitespace_only?
    > strings_to_test = split("\n")
    > whitespace = /^\s+$/
    > is_whitespace_only = true
    > strings_to_test.each{ |str|
    > unless whitespace.match(str) or str.empty?
    > is_whitespace_only = false
    > break
    > end
    > }
    > is_whitespace_only
    > end
    >
    > end
    >
    > But somehow I think there should be a better way to do it. Any ideas?
    > Is it okay to add such methods to class String itself?
    >
    > Any advices appreciated.
    >
    > regards,
    > Henrik
    >
    Evan Webb, Sep 23, 2004
    #2

  3. MiG Guest

    I think a regexp should be faster than each_byte. What about this?

    class String
      # Checks the receiver itself, so no argument is needed.
      def whitespace_only?
        split(/\n/).each { |x|
          return false unless x =~ /^\s*$/
        }
        true
      end
    end


    MiG
    MiG, Sep 23, 2004
    #3
  4. Evan Webb Guest

    I highly doubt a regex is faster than each_byte. each_byte has very
    little code and is very fast (looping over the array in C and casting
    the chars to fixnums), whereas a regex has to pass through the regex
    parser, get pulled back out as an object, and be pushed back into split,
    which in turn returns a potentially huge array that you pull back
    again to run over with each. Then you've done another comparison with a
    regex within the block, which I guarantee is much slower than comparing 2
    Fixnums.

    My initial version didn't handle \n, only spaces, so here's my updated
    version that even does tabs.

    class String
      def only_ws?
        each_byte { |b| return false unless [9,10,32].include?(b) }
        true
      end
    end

    Evan Webb //


    On Thu, 2004-09-23 at 01:11, MiG wrote:
    > I think a regexp should be faster than each_byte. What about this?
    >
    > class String
    >   def whitespace_only?
    >     split(/\n/).each { |x|
    >       return false unless x =~ /^\s*$/
    >     }
    >     true
    >   end
    > end
    >
    >
    > MiG
    >
    Evan Webb, Sep 23, 2004
    #4
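
The byte list [9,10,32] covers tab, newline, and space, but not carriage return, vertical tab, or form feed. A sketch of the same each_byte idea with those added, assuming the goal is to cover the usual ASCII whitespace characters (the constant and method names are made up for this example):

```ruby
# ASCII codes for tab, LF, VT, FF, CR, and space (names are illustrative).
WHITESPACE_BYTES = [9, 10, 11, 12, 13, 32].freeze

# True when every byte of str is one of the codes above (or str is empty).
def only_ws_bytes?(str)
  str.each_byte { |b| return false unless WHITESPACE_BYTES.include?(b) }
  true
end

puts only_ws_bytes?(" \t\r\n")  # true
puts only_ws_bytes?(" x ")      # false
```

Hoisting the list into a constant also avoids building a new array for every byte, which the inline literal in the block does.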
  5. On Thursday 23 September 2004 03:54 am, Henrik Horneber wrote:
    > Hi!
    >
    > What's the best way to test if a string only consists of whitespaces and
    > newlines?


    Unless you're being more specific:

    str.strip.length == 0

    Also matching against something like

    /\A\s*\z/m

    but I'm no Regexp expert by a long shot ;)

    T.

    --
    ( o _
    // trans.
    / \

    I don't give a damn for a man that can only spell a word one way.
    -Mark Twain
    trans. (T. Onoma), Sep 23, 2004
    #5
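
The choice of anchors matters here: ^ and $ match at line boundaries, while \A and \z match only at the start and end of the whole string. In Ruby the /m flag only changes how . treats newlines, so it has no effect on a pattern without a dot. A short sketch of the difference, with variable names chosen for this example:

```ruby
mixed = "abc\n   \n"

# ^ and $ anchor at line boundaries, so the blank second line matches.
line_anchored = mixed =~ /^\s*$/

# \A and \z anchor at the start and end of the whole string.
string_anchored = mixed =~ /\A\s*\z/

puts line_anchored.inspect    # an index, not nil: one line is blank
puts string_anchored.inspect  # nil: the string as a whole is not blank
puts "   \n\t".strip.empty?   # true
```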
  6. "Henrik Horneber" <> wrote in message
    news:...
    > Hi!
    >
    > What's the best way to test if a string only consists of whitespaces and
    > newlines?
    >
    > best I could come up with is
    >
    >
    > class String
    >
    > def is_whitespace_only?
    > strings_to_test = split("\n")
    > whitespace = /^\s+$/
    > is_whitespace_only = true
    > strings_to_test.each{ |str|
    > unless whitespace.match(str) or str.empty?
    > is_whitespace_only = false
    > break
    > end
    > }
    > is_whitespace_only
    > end
    >
    > end
    >
    > But somehow I think there should be a better way to do it. Any ideas?
    > Is it okay to add such methods to class String itself?
    >
    > Any advices appreciated.
    >
    > regards,
    > Henrik
    >
    >


    >> rx = %r{\A\s*\z}
    => /\A\s*\z/
    >> rx =~ ""
    => 0
    >> rx =~ " "
    => 0
    >> rx =~ " a"
    => nil
    >> rx =~ " \n a"
    => nil
    >> rx =~ " \n "
    => 0
    >> rx =~ " \n \n"
    => 0

    Regards

    robert
    Robert Klemme, Sep 23, 2004
    #6
  7. Hi!


    > if "#{s}".chomp.strip.length == 0


    ...

    > rx = %r{\A\s*\z}



    Obviously there is more than one way to do it ...and all are better than
    mine. :D

    Thanks everybody!
    Henrik Horneber, Sep 23, 2004
    #7
  8. Henrik Horneber <> writes:

    > Hi!
    >
    > What's the best way to test if a string only consists of whitespaces
    > and newlines?


    class String
      def is_whitespace_only?
        self !~ /[\s\n]/m
      end
    end
    Mikael Brockman, Sep 23, 2004
    #8
  9. ts Guest

    >>>>> "M" == Mikael Brockman <> writes:

    M> self !~ /[\s\n]/m

    1) \n is already included in \s, and with a character class /m is useless
    2) you are testing that there is no whitespace character in the string


    Guy Decoux
    ts, Sep 23, 2004
    #9
  10. ts <> writes:

    > >>>>> "M" == Mikael Brockman <> writes:

    >
    > M> self !~ /[\s\n]/m
    >
    > 1) \n is already included in \s, and with a character class /m is useless
    > 2) you are testing that there is no whitespace character in the string


    self !~ /[^\s]/
    Mikael Brockman, Sep 23, 2004
    #10
  11. ts Guest

    >>>>> "M" == Mikael Brockman <> writes:

    M> self !~ /[^\s]/

    or

    self !~ /[\S]/ # one less character :)


    Guy Decoux
    ts, Sep 23, 2004
    #11
  12. "Mikael Brockman" <> wrote in message
    news:...
    > ts <> writes:
    >
    > > >>>>> "M" == Mikael Brockman <> writes:

    > >
    > > M> self !~ /[\s\n]/m
    > >
    > > 1) \n is already included in \s, and with a character class /m is useless
    > > 2) you are testing that there is no whitespace character in the string
    >
    > self !~ /[^\s]/


    self !~ /\S/

    :)

    robert
    Robert Klemme, Sep 23, 2004
    #12
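
The two patterns in this exchange ask opposite questions: the first attempt is true when the string contains no whitespace at all, while the corrected one is true when it contains no non-whitespace. A quick side-by-side, with sample strings chosen for this example:

```ruby
samples = ["", "   \n\t", "abc", " a "]

samples.each do |s|
  no_whitespace_at_all = s !~ /[\s\n]/  # first attempt: no whitespace occurs anywhere
  only_whitespace      = s !~ /\S/      # corrected: no non-whitespace occurs anywhere
  puts "#{s.inspect}: #{no_whitespace_at_all} / #{only_whitespace}"
end
```

Only the empty string gets the same answer from both, since it contains neither kind of character.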
  13. So which method is fastest?

    Considering how common this can be, one would think it were a built-in String
    method (encoded in c) already.

    T.
    trans. (T. Onoma), Sep 23, 2004
    #13
  14. Hi --

    On Thu, 23 Sep 2004, trans. (T. Onoma) wrote:

    > So which method is fastest?
    >
    > Considering how common this can be, one would think it were a built-in String
    > method (encoded in c) already.


    Not *everything* can be a core method :) Also, the regex engine is
    written in C.


    David

    --
    David A. Black
    David A. Black, Sep 23, 2004
    #14
  15. "trans. (T. Onoma)" <> writes:

    > So which method is fastest?
    >
    > Considering how common this can be, one would think it were a built-in String
    > method (encoded in c) already.
    >
    > T.


    $ ruby whitespace.rb
                  user     system      total        real
    henrik    0.860000   0.110000   0.970000 (  0.977667)
    evan      8.240000   2.220000  10.460000 ( 10.524390)
    mikael    0.010000   0.000000   0.010000 (  0.014141)
    tonoma    0.040000   0.000000   0.040000 (  0.041485)

    Here's the benchmark:

    | require 'benchmark'
    |
    | n = 50
    |
    | $whitespace = " \n" * 1000
    |
    | $nonwhitespace = $whitespace
    | $nonwhitespace[-2] = 'a'
    |
    | class String
    |   def henrik
    |     strings_to_test = split("\n")
    |     whitespace = /^\s+$/
    |     is_whitespace_only = true
    |     strings_to_test.each{ |str|
    |       unless whitespace.match(str) or str.empty?
    |         is_whitespace_only = false
    |         break
    |       end
    |     }
    |     is_whitespace_only
    |   end
    |
    |   def evan
    |     each_byte { |b| return false unless [9,10,32].include?(b) }
    |     true
    |   end
    |
    |   def mikael
    |     self !~ /[^\s]/
    |   end
    |
    |   def tonoma
    |     strip.length == 0
    |   end
    | end
    |
    | Benchmark::bm do |x|
    |   test_algorithm = lambda do |id|
    |     x.report id.to_s do
    |       whitespace_tester = $whitespace.method id
    |       nonwhitespace_tester = $nonwhitespace.method id
    |       n.times { whitespace_tester.call }
    |       n.times { nonwhitespace_tester.call }
    |     end
    |   end
    |
    |   test_algorithm.call :henrik
    |   test_algorithm.call :evan
    |   test_algorithm.call :mikael
    |   test_algorithm.call :tonoma
    | end
    Mikael Brockman, Sep 23, 2004
    #15
  16. Jani Monoses Guest

    > | $whitespace = " \n" * 1000
    > |
    > | $nonwhitespace = $whitespace
    > | $nonwhitespace[-2] = 'a'


    Doesn't this make the two strings the same object? Isn't $nonwhitespace
    just a reference to $whitespace?
    Jani Monoses, Sep 23, 2004
    #16
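
The aliasing in question is easy to reproduce on a small scale. Plain assignment copies the reference, not the string, so a change made through either name is visible through both; dup produces an independent copy. A minimal sketch with local variables standing in for the globals (the names are assumptions for this example):

```ruby
original = " \n" * 3
aliased = original             # same object, no copy is made
aliased[-2] = 'a'
puts original.equal?(aliased)  # true
puts original.include?('a')    # true: the "whitespace" string changed too

fresh = " \n" * 3
copy = fresh.dup               # independent copy
copy[-2] = 'a'
puts fresh.include?('a')       # false
```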
  17. Jani Monoses <> writes:

    > > | $whitespace = " \n" * 1000
    > > |
    > > | $nonwhitespace = $whitespace
    > > | $nonwhitespace[-2] = 'a'
    >
    > Doesn't this make the two strings the same object? Isn't $nonwhitespace
    > just a reference to $whitespace?


    Er, yes. Duh. I haven't really awoken yet. Duping whitespace doesn't
    make any real difference, though. More interesting results are found
    when whitespace[0] = 'a'. With n = 10000 and $nonwhitespace ignored
    entirely:

                  user     system      total        real
    henrik   14.140000   0.060000  14.200000 ( 14.684509)
    evan      0.110000   0.030000   0.140000 (  0.146974)
    mikael    0.040000   0.020000   0.060000 (  0.060418)
    tonoma    3.840000   0.040000   3.880000 (  4.163754)
    Mikael Brockman, Sep 23, 2004
    #17
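
A variant of the benchmark that keeps the inputs independent might look like the following. It is only a sketch: the constant names and lambdas are assumptions for this example, not the code that produced the numbers above, and timings vary by machine and Ruby version, so none are shown.

```ruby
require 'benchmark'

ITERATIONS = 50

# Three independent inputs: all whitespace, a letter near the end,
# and a letter at the very start.
WHITESPACE = (" \n" * 1000).freeze
NON_WHITESPACE_END = WHITESPACE.dup.tap { |s| s[-2] = 'a' }.freeze
NON_WHITESPACE_START = WHITESPACE.dup.tap { |s| s[0] = 'a' }.freeze
INPUTS = [WHITESPACE, NON_WHITESPACE_END, NON_WHITESPACE_START].freeze

# Three of the approaches from the thread, written as lambdas so that
# String does not have to be reopened.
CANDIDATES = {
  'regex' => ->(s) { s !~ /\S/ },
  'strip' => ->(s) { s.strip.empty? },
  'bytes' => ->(s) { s.each_byte.all? { |b| [9, 10, 32].include?(b) } }
}.freeze

Benchmark.bm(8) do |x|
  CANDIDATES.each do |name, check|
    x.report(name) do
      ITERATIONS.times { INPUTS.each { |s| check.call(s) } }
    end
  end
end
```

Testing both an early and a late mismatch matters because the approaches differ in when they can stop: the byte loop and the regex can bail out at the first non-whitespace character, while strip has to scan in from both ends of the string before the length can be checked.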
  18. Hi --

    On Thu, 23 Sep 2004, Mikael Brockman wrote:

    > | def mikael
    > | self !~ /[^\s]/
    > | end


    And you can even shave a few characters off:

    self !~ /\S/

    with (by my measure) no ill effects.


    David

    --
    David A. Black
    David A. Black, Sep 23, 2004
    #18
  19. ts Guest

    >>>>> "D" == David A Black <> writes:

    >> | self !~ /[^\s]/


    svg% ruby -rjj -e '/[^\s]/.dump'
    Regexp /[^\s]/
    0 charset_not \011-\015 (0)
    1 end
    svg%

    D> self !~ /\S/

    svg% ruby -rjj -e '/\S/.dump'
    Regexp /\S/
    0 charset_not \011-\012\014-\015 (0)
    1 end
    svg%



    Guy Decoux
    ts, Sep 23, 2004
    #19
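
The two dumps list different byte ranges (\013 appears only in the first). Whether a given interpreter still treats the two patterns differently can be checked directly instead of assumed. A small probe over the ASCII range, with variable names chosen for this example; the output depends on the Ruby version in use:

```ruby
ascii = (0..127).map(&:chr)

# Character codes that \s treats as whitespace on this interpreter.
whitespace_codes = ascii.select { |c| c =~ /\s/ }.map(&:ord)

# Characters on which /[^\s]/ and /\S/ disagree, if any.
disagreements = ascii.reject { |c| (c =~ /[^\s]/) == (c =~ /\S/) }

puts whitespace_codes.inspect
puts disagreements.inspect
```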
  20. In message "whitespace string only"
    on 23.09.2004, ts <> writes:

    t> svg% ruby -rjj -e '/\S/.dump'
    t> Regexp /\S/
    t> 0 charset_not \011-\012\014-\015 (0)
    t> 1 end
    t> svg%

    t> Guy Decoux

    Maybe I've missed something most important :)
    Where can I find jj.rb?

    regards
    Karl-Heinz
    Wild Karl-Heinz, Sep 23, 2004
    #20
