whitespace string only

Henrik Horneber

Hi!

What's the best way to test if a string consists only of whitespace and
newlines?

The best I could come up with is:


class String

  def is_whitespace_only?
    strings_to_test = split("\n")
    whitespace = /^\s+$/
    is_whitespace_only = true
    strings_to_test.each{ |str|
      unless whitespace.match(str) or str.empty?
        is_whitespace_only = false
        break
      end
    }
    is_whitespace_only
  end

end

But somehow I think there should be a better way to do it. Any ideas?
Is it okay to add such methods to class String itself?

Any advice appreciated.

regards,
Henrik
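On the second question (whether it is okay to add such methods to String): reopening String works, but the method then exists on every string in the program. A minimal sketch of one alternative, assuming Ruby 2.4 or later for `String#match?`, keeps the check in a module instead; the module name `WhitespaceCheck` is a hypothetical choice, not something from this thread.

```ruby
# Hypothetical helper module: keeps the check out of String itself.
module WhitespaceCheck
  # \A and \z anchor to the start and end of the whole string, so every
  # character in between must be whitespace (\s covers space, tab,
  # newline and the other ASCII whitespace characters).
  def self.whitespace_only?(str)
    str.match?(/\A\s*\z/)
  end
end

puts WhitespaceCheck.whitespace_only?(" \t\n  \n")  # prints "true"
puts WhitespaceCheck.whitespace_only?(" \n a \n")   # prints "false"
```

Callers then write `WhitespaceCheck.whitespace_only?(s)` rather than `s.whitespace_only?`, trading a little brevity for not touching a core class.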
 
MiG

I think a regexp should be faster than each_byte. What about this?

class String

  # Checks self (the original took an unused-in-spirit `str` argument
  # instead of looking at the receiver).
  def whitespace_only?
    split(/\n/).each { |x|
      return false unless x =~ /^\s*$/
    }
    true
  end

end


MiG
 
Evan Webb

I highly doubt a regex is faster than each_byte. each_byte has very
little code and is very fast (it loops over the string's bytes in C and
hands each one to the block as a Fixnum), whereas with a regex the
pattern has to pass through the regex parser, get pulled back out as an
object, and be pushed back into split, which in turn returns a
potentially huge array that you then iterate over again with each. On
top of that, you do another regex comparison inside the block, which I
guarantee is much slower than comparing two Fixnums.

My initial version didn't handle \n, only spaces, so here's my updated
version that also handles tabs.

class String
  def only_ws?
    # 9 = tab, 10 = newline, 32 = space
    each_byte { |b| return false unless [9,10,32].include?(b) }
    true
  end
end
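Note that the byte list [9,10,32] covers only tab, newline and space, so a string containing a carriage return (for example Windows-style "\r\n" line endings), form feed or vertical tab is reported as not whitespace-only. A sketch extending the same each_byte idea to the other ASCII whitespace bytes; the method name `only_ascii_ws?` and the constant `ASCII_WS_BYTES` are hypothetical, chosen so they don't clash with only_ws?. Non-ASCII whitespace such as U+00A0 is still not treated as whitespace by this version.

```ruby
# ASCII whitespace byte values: 9 tab, 10 newline, 11 vertical tab,
# 12 form feed, 13 carriage return, 32 space.
ASCII_WS_BYTES = [9, 10, 11, 12, 13, 32].freeze

class String
  # Same early-exit loop as only_ws?, just with the wider byte list.
  def only_ascii_ws?
    each_byte { |b| return false unless ASCII_WS_BYTES.include?(b) }
    true
  end
end

p "\r\n".only_ascii_ws?         # true
p " \t\r\n\f\v".only_ascii_ws?  # true
p " a ".only_ascii_ws?          # false
```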

Evan Webb
 
trans. (T. Onoma)

> Hi!
>
> What's the best way to test if a string only consists of whitespaces and
> newlines?

Unless you're being more specific:

str.strip.length == 0

You could also match against something like

/\A\s*\z/m

but I'm no Regexp expert by a long shot ;)
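The \A and \z anchors are what make that regexp check the whole string. In Ruby, ^ and $ always match at line boundaries, so a ^...$ pattern succeeds as soon as any single line is blank, and /m only lets `.` match newlines without changing the anchors (so the `m` flag above is harmless but not needed). A small sketch of the difference, assuming Ruby 2.4+ for `Regexp#match?`:

```ruby
# ^ and $ match at the start/end of any line.
line_anchored   = /^\s*$/
# \A and \z match only at the very start/end of the whole string.
string_anchored = /\A\s*\z/

text = "foo\n   \nbar"
p line_anchored.match?(text)    # true: the middle line is blank
p string_anchored.match?(text)  # false: "foo" and "bar" are not whitespace

# The strip-based check gives the same answer as the \A...\z regexp here.
p " \n\t ".strip.empty?         # true
```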

T.

 
Robert Klemme

=> 0

Regards

robert
 
Henrik Horneber

Hi!

if "#{s}".chomp.strip.length == 0
...

rx = %r{\A\s*\z}


Obviously there is more than one way to do it... and all of them are better
than mine. :D

Thanks everybody!
 
Mikael Brockman

Henrik Horneber said:
> Hi!
>
> What's the best way to test if a string only consists of whitespaces
> and newlines?

class String
  def is_whitespace_only?
    self !~ /[\s\n]/m
  end
end
 
ts

M> self !~ /[\s\n]/m

1) \n is already in \s, and with only a character class, /m is useless
2) you are testing that the string contains no whitespace character at all
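Point 2 can be seen directly. Since \n is already part of \s, the class [\s\n] reduces to \s, and `!~` is true only when no whitespace character occurs anywhere in the string. A sketch using a hypothetical helper name:

```ruby
# Behaves like `self !~ /[\s\n]/m`: \n is already in \s, and /m has no
# effect on a pattern that contains no `.`.
def contains_no_whitespace?(s)
  s !~ /\s/
end

p contains_no_whitespace?("abc")    # true
p contains_no_whitespace?("   \n")  # false, even though it is all whitespace
```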


Guy Decoux
 
Mikael Brockman

ts said:
> M> self !~ /[\s\n]/m
>
> 1) \n is in \s with a character class, /m is useless
> 2) you are testing that it don't exist a whitespace character in the string

self !~ /[^\s]/
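A quick check of the corrected method, keeping the name from the earlier post. [^\s] matches any character that is not whitespace, so `!~` is true exactly when there is no such character, which also makes the empty string count as whitespace-only:

```ruby
class String
  # True when the string contains no non-whitespace character.
  def is_whitespace_only?
    self !~ /[^\s]/
  end
end

p " \t\n ".is_whitespace_only?  # true
p "".is_whitespace_only?        # true
p " a ".is_whitespace_only?     # false
```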
 
trans. (T. Onoma)

So which method is fastest?

Considering how common this check is, one would think it would already be a
built-in String method (implemented in C).

T.
 
David A. Black

Hi --

> So which method is fastest?
>
> Considering how common this can be, one would think it were a built-in String
> method (encoded in c) already.

Not *everything* can be a core method :) Also, the regex engine is
written in C.


David
 
Mikael Brockman

trans. (T. Onoma) said:
> So which method is fastest?
>
> Considering how common this can be, one would think it were a built-in String
> method (encoded in c) already.
>
> T.

$ ruby whitespace.rb
             user     system      total        real
henrik   0.860000   0.110000   0.970000 (  0.977667)
evan     8.240000   2.220000  10.460000 ( 10.524390)
mikael   0.010000   0.000000   0.010000 (  0.014141)
tonoma   0.040000   0.000000   0.040000 (  0.041485)

Here's the benchmark:

| require 'benchmark'
|
| n = 50
|
| $whitespace = " \n" * 1000
|
| $nonwhitespace = $whitespace
| $nonwhitespace[-2] = 'a'
|
| class String
|   def henrik
|     strings_to_test = split("\n")
|     whitespace = /^\s+$/
|     is_whitespace_only = true
|     strings_to_test.each{ |str|
|       unless whitespace.match(str) or str.empty?
|         is_whitespace_only = false
|         break
|       end
|     }
|     is_whitespace_only
|   end
|
|   def evan
|     each_byte { |b| return false unless [9,10,32].include?(b) }
|     true
|   end
|
|   def mikael
|     self !~ /[^\s]/
|   end
|
|   def tonoma
|     strip.length == 0
|   end
| end
|
| Benchmark::bm do |x|
|   test_algorithm = lambda do |id|
|     x.report id.to_s do
|       whitespace_tester = $whitespace.method id
|       nonwhitespace_tester = $nonwhitespace.method id
|       n.times { whitespace_tester.call }
|       n.times { nonwhitespace_tester.call }
|     end
|   end
|
|   test_algorithm.call :henrik
|   test_algorithm.call :evan
|   test_algorithm.call :mikael
|   test_algorithm.call :tonoma
| end
 
Jani Monoses

| $whitespace = " \n" * 1000
|
| $nonwhitespace = $whitespace
| $nonwhitespace[-2] = 'a'

Doesn't this make the two strings the same object? Isn't $nonwhitespace just a reference to $whitespace?
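Jani's point in miniature: plain assignment copies the reference, not the string, so an in-place change made through one variable is visible through the other, while `dup` gives an independent copy. A small sketch with shorter strings; the variable names are illustrative only.

```ruby
ws = " \n" * 3                 # " \n \n \n"
alias_ref = ws                 # same object: no copy is made
alias_ref[-2] = 'a'            # in-place change through the alias
p ws                           # " \n \na\n": the original changed too
p ws.equal?(alias_ref)         # true: both names refer to one String

original = " \n" * 3
copy = original.dup            # independent String object
copy[-2] = 'a'
p original                     # " \n \n \n": unchanged
p copy                         # " \n \na\n"
```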
 
Mikael Brockman

Jani Monoses said:
> | $whitespace = " \n" * 1000
> |
> | $nonwhitespace = $whitespace
> | $nonwhitespace[-2] = 'a'
>
> doesn't this make the arrays the same? Isn't $nonwhitespace just a
> reference to $whitespace?

Er, yes. Duh. I haven't really woken up yet. Duping $whitespace doesn't
make any real difference to the timings, though. More interesting results
show up when $whitespace[0] = 'a'. With n = 10000 and $nonwhitespace
ignored entirely:

             user     system      total        real
henrik  14.140000   0.060000  14.200000 ( 14.684509)
evan     0.110000   0.030000   0.140000 (  0.146974)
mikael   0.040000   0.020000   0.060000 (  0.060418)
tonoma   3.840000   0.040000   3.880000 (  4.163754)
 
ts

"D" == David A Black said:

svg% ruby -rjj -e '/[^\s]/.dump'
Regexp /[^\s]/
0 charset_not \011-\015 (0)
1 end
svg%

D> self !~ /\S/

svg% ruby -rjj -e '/\S/.dump'
Regexp /\S/
0 charset_not \011-\012\014-\015 (0)
1 end
svg%



Guy Decoux
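The two dumps differ in one byte: \011-\015 includes \013 (vertical tab), while \011-\012\014-\015 skips it, so in the Ruby version used here /[^\s]/ and /\S/ would classify that character differently. Whether the two spellings agree on a particular Ruby can be checked without jj; a sketch, assuming Ruby 2.4+ for `Regexp#match?` and restricted to 7-bit ASCII:

```ruby
negated_class = /[^\s]/   # character class: "not whitespace"
non_space     = /\S/      # shorthand: "not whitespace"

# Collect every 7-bit ASCII character the two patterns classify differently.
disagreements = (0..127).map(&:chr).reject do |ch|
  negated_class.match?(ch) == non_space.match?(ch)
end

p disagreements
```

An empty array means that, for ASCII input, `self !~ /[^\s]/` and `self !~ /\S/` give the same answer on that interpreter.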
 
Wild Karl-Heinz

In message "whitespace string only"

t> svg% ruby -rjj -e '/\S/.dump'
t> Regexp /\S/
t> 0 charset_not \011-\012\014-\015 (0)
t> 1 end
t> svg%

t> Guy Decoux

Maybe I've missed something important :)
Where can I find jj.rb?

regards
Karl-Heinz
 
