Doing an AND in regexp char class

Todd Benson

This question arises out of a couple of recent threads and may or may
not be a Ruby-specific question.

I can check with a character class if one of the characters in the
class exists or does not exist, but can I use a regexp to check if a
string absolutely contains all of the characters in the class?

Using a set perspective, I can do it like this in irb...

s1 = "hello there"
s2 = "ohi"
(s2.unpack('c*') & s1.unpack('c*')).size == s2.size

=> false

I use unpack to avoid creating a bunch of String objects, one for each
element in the array, which would happen if I used #split. What I'm
wondering is if there is a way to do this with a simple regexp.

Thanks,
Todd
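
A minimal sketch of the set-based check above, with one caveat: comparing sizes gives the wrong answer when s2 repeats a character (for example "oho"), because Array#& removes duplicates. Subtracting the arrays avoids the size comparison. The method name all_chars_in? is hypothetical, and String#bytes stands in for unpack('c*'), which assumes single-byte characters.

```ruby
# Hypothetical helper (name not from the thread): true when every byte of
# `needles` also occurs somewhere in `haystack`. Counts are ignored.
def all_chars_in?(needles, haystack)
  # Array#- drops every element that also appears in the other array,
  # so an empty result means nothing in `needles` is missing.
  (needles.bytes - haystack.bytes).empty?
end

p all_chars_in?("ohi", "hello there") # => false ("i" is missing)
p all_chars_in?("oho", "hello there") # => true, despite the repeated "o"
```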
 
ara.t.howard

cfp:~ > cat a.rb
class String
  def all_chars? chars
    tr(chars, '').empty?
  end
end

p 'foobar'.all_chars?('rabof')
p 'foobar'.all_chars?('abc')
p 'foobar'.all_chars?('')



cfp:~ > ruby a.rb
true
false
false
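
One caveat with the tr version, sketched here under assumptions: String#tr reads its first argument as a character specification, so a-c denotes a range and a leading ^ negates the set. If the characters to test may include -, ^ or \, a literal comparison avoids surprises. The method name only_chars? is hypothetical.

```ruby
class String
  # Hypothetical literal variant: true when every character of the
  # receiver occurs in `chars`. Like the tr version, it returns false
  # for an empty `chars` on a non-empty receiver.
  def only_chars?(chars)
    each_char.all? { |c| chars.include?(c) }
  end
end

p 'b'.tr('a-c', '').empty?      # => true: tr reads "a-c" as the range a..c
p 'b'.only_chars?('a-c')        # => false: "b" is not one of "a", "-", "c"
p 'foobar'.only_chars?('rabof') # => true
```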


a @ http://codeforpeople.com/
 
Todd Benson

Cool :) #tr is one of those useful methods I somehow consistently forget about.

tkx fur realizashuns,
Todd
 
Rick DeNatale

REs can do this, but may not be the best way. The way that comes to
mind is to see if the string matches the characters in any order,
i.e. for "ohi" either ohi, oih, hio, hoi, iho, or ioh
so something like

/o([^h]*h[^i]*i|[^i]*i[^h]*h)|h([^i]*i[^o]*o|[^o]*o[^i]*i)|i([^h]*h[^o]*o|[^o]*o[^h]*h)/

meaning

o followed by either
  zero or more non-h's followed by an h, followed by zero or more
  non-i's followed by an i
or
  zero or more non-i's followed by an i, followed by zero or more
  non-h's followed by an h
or
h followed by either
...
...

It would be possible to generate such an RE from the string.

But maybe someone cleverer with REs has a better approach.
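
Generating such an RE from the string could look like the sketch below. Both helper names are hypothetical, and each branch here starts with a "zero or more non-x" prefix, which is equivalent for an unanchored match. The number of branches grows factorially with the number of distinct characters, so this is only practical for very short strings.

```ruby
# Hypothetical helper: builds the source of an alternation that matches
# the given characters in any order, one branch per possible first character.
def any_order_source(chars)
  return "" if chars.empty?
  branches = chars.map do |c|
    e = Regexp.escape(c)
    # "zero or more non-c's followed by c", then the remaining characters.
    "[^#{e}]*#{e}" + any_order_source(chars - [c])
  end
  "(?:#{branches.join('|')})"
end

# Hypothetical wrapper: duplicates are removed before building the RE.
def any_order_regexp(str)
  Regexp.new(any_order_source(str.chars.uniq))
end

p any_order_regexp("ohi").match?("hi, oh")      # => true
p any_order_regexp("ohi").match?("hello there") # => false (no "i")
```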
 
Joel VanderWerf

Todd said:
Cool :) #tr is one of those useful methods I somehow consistently forget about.

But it can be done with regex, right? It's just more elegant with tr.

class String
  def all_chars? chars
    if chars.empty?
      empty?
    else
      /\A[#{chars}]*\z/ === self
    end
  end
end

p 'foobar'.all_chars?('rabof') # => true
p 'foobar'.all_chars?('abc') # => false
p 'foobar'.all_chars?('') # => false
 
Todd Benson

Todd said:
Cool :) #tr is one of those useful methods I somehow consistently forget
about.

But it can be done with regex, right? It's just more elegant with tr.

class String
  def all_chars? chars
    if chars.empty?
      empty?
    else
      /\A[#{chars}]*\z/ === self
    end
  end
end

p 'foobar'.all_chars?('rabof') # => true
p 'foobar'.all_chars?('abc') # => false
p 'foobar'.all_chars?('') # => false

I'm drawing a blank here with this one. Why doesn't this work then...

irb(main):006:0> r = /\A[oh]*\z/
=> /\A[oh]*\z/
irb(main):007:0> s = "hello, there"
=> "hello, there"
irb(main):008:0> r === s
=> false

Todd
 
Joel VanderWerf

Todd said:
I'm drawing a blank here with this one. Why doesn't this work then...

irb(main):006:0> r = /\A[oh]*\z/
=> /\A[oh]*\z/
irb(main):007:0> s = "hello, there"
=> "hello, there"
irb(main):008:0> r === s
=> false

Maybe I'm confused about what was wanted originally. The above tests the
following condition:

(set of chars occurring in given string)
is_a_subset_of
(given set of chars).

irb(main):007:0> /\A[oh]*\z/ === "hohoho"
=> true
irb(main):008:0> /\A[oh]*\z/ === "ho ho"
=> false

If you want superset instead of subset, this works:

irb(main):013:0> /(?=.*h)(?=.*o)/ === "h o"
=> true
 
David A. Black

Hi --

Todd said:
On May 8, 2008, at 3:40 PM, Todd Benson wrote:

This question arises out of a couple of recent threads and may or may
not be a Ruby-specific question.

I can check with a character class if one of the characters in the
class exists or does not exist, but can I use a regexp to check if a
string absolutely contains all of the characters in the class?

Using a set perspective, I can do it like this in irb...

s1 = "hello there"
s2 = "ohi"
(s2.unpack('c*') & s1.unpack('c*')).size == s2.size

=> false

I use unpack to avoid creating a bunch of String objects, one for each
element in the array, which would happen if I used #split. What I'm
wondering is if there is a way to do this with a simple regexp.

Thanks,
Todd

cfp:~ > cat a.rb
class String
  def all_chars? chars
    tr(chars, '').empty?
  end
end

p 'foobar'.all_chars?('rabof')
p 'foobar'.all_chars?('abc')
p 'foobar'.all_chars?('')



cfp:~ > ruby a.rb
true
false
false

Cool :) #tr is one of those useful methods I somehow consistently forget
about.

But it can be done with regex, right? It's just more elegant with tr.

class String
  def all_chars? chars
    if chars.empty?
      empty?
    else
      /\A[#{chars}]*\z/ === self
    end
  end
end

p 'foobar'.all_chars?('rabof') # => true
p 'foobar'.all_chars?('abc') # => false
p 'foobar'.all_chars?('') # => false

I'm drawing a blank here with this one. Why doesn't this work then...

irb(main):006:0> r = /\A[oh]*\z/
=> /\A[oh]*\z/
irb(main):007:0> s = "hello, there"
=> "hello, there"
irb(main):008:0> r === s
=> false

"hello, there" contains letters other than o and h, but your regex
calls for a string consisting of zero or more o's or h's and nothing
else.

I think there might be some confusion between determining that a
string contains certain characters, and determining that a string
contains *only* certain characters. My understanding was that you
wanted the first, which you could do with tr, but I think you'd
probably want the character cluster to be doing the tr'ing:

"oh".tr("hello, there","").empty? # true; all letters in "oh"
# are also in "hello, there"
"hello, there".tr("ho","").empty? # false

They're both strings, of course, so you can do either with Ara's
or Joel's methods:

"oh".all_chars?("hello, there") # true
"hello, there".all_chars?("oh") # false

though if it's really the former you want you might want to name it
all_present_in? or something.


David

--
Rails training from David A. Black and Ruby Power and Light:
INTRO TO RAILS June 9-12 Berlin
ADVANCING WITH RAILS June 16-19 Berlin
INTRO TO RAILS June 24-27 London (Skills Matter)
See http://www.rubypal.com for details and updates!
 
David A. Black

Hi --

Todd said:
I'm drawing a blank here with this one. Why doesn't this work then...

irb(main):006:0> r = /\A[oh]*\z/
=> /\A[oh]*\z/
irb(main):007:0> s = "hello, there"
=> "hello, there"
irb(main):008:0> r === s
=> false

Maybe I'm confused about what was wanted originally. The above tests the following
condition:

(set of chars occurring in given string)
is_a_subset_of
(given set of chars).

irb(main):007:0> /\A[oh]*\z/ === "hohoho"
=> true
irb(main):008:0> /\A[oh]*\z/ === "ho ho"
=> false

If you want superset instead of subset, this works:

irb(main):013:0> /(?=.*h)(?=.*o)/ === "h o"
=> true

That depends on the order, though. To do the superset test, you could
just do the subset, but in the other direction: check that the
character class, as a string, doesn't contain anything that isn't in
the main string:

str = "h o"
chars = "ho"

/\A[#{str}]*\z/ === chars # true

(though probably best to uniquify the string first).
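
A sketch of that reversed test with the string uniquified first. It also escapes each character, on the assumption that the main string may contain characters that are special inside a character class, such as - or ]. The function name all_present_in? is hypothetical.

```ruby
# Hypothetical helper: true when every character of `chars` occurs in `str`.
def all_present_in?(chars, str)
  return true if chars.empty?  # vacuously true
  return false if str.empty?   # "[]" would be an invalid character class
  # One escaped entry per distinct character of the main string.
  klass = str.chars.uniq.map { |c| Regexp.escape(c) }.join
  /\A[#{klass}]*\z/.match?(chars)
end

p all_present_in?("ho", "h o")           # => true
p all_present_in?("ohi", "hello, there") # => false
p all_present_in?("b", "a-c")            # => false; "-" is taken literally
```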


David

 
Joel VanderWerf

David said:
That depends on the order, though.

Yes, it's buggy. Should use //m:

irb(main):003:0> /(?=.*h)(?=.*o)/ === "o \nh"
=> false
irb(main):004:0> /(?=.*h)(?=.*o)/m === "o \nh"
=> true

Does that fix the order problem you were thinking of?
 
David A. Black

Hi --

Yes, it's buggy. Should use //m:

irb(main):003:0> /(?=.*h)(?=.*o)/ === "o \nh"
=> false
irb(main):004:0> /(?=.*h)(?=.*o)/m === "o \nh"
=> true

Does that fix the order problem you were thinking of?

Actually I think I was wrong about the order mattering (since they're
zero-width). But /m helps anyway. I still think you could just change
the roles of the two strings and dissect "the string" as a character
class and "the characters" as a string, and use your original
technique.
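
For reference, the lookahead form can be generated from a string of required characters. This is a sketch only: the helper name contains_all_regexp is hypothetical, and the \A anchor is an addition not discussed in the thread.

```ruby
# Hypothetical helper: one zero-width lookahead per distinct character.
def contains_all_regexp(chars)
  lookaheads = chars.chars.uniq.map { |c| "(?=.*#{Regexp.escape(c)})" }
  # \A means the match is only attempted at the start of the string;
  # Regexp::MULTILINE is Ruby's /m, which lets "." match "\n".
  Regexp.new("\\A" + lookaheads.join, Regexp::MULTILINE)
end

p contains_all_regexp("ho").match?("o \nh")        # => true
p contains_all_regexp("ohi").match?("hello there") # => false (no "i")
```

Because each lookahead is zero-width, the order of the characters in the argument does not affect the result.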


David

 
Todd Benson

Todd said:
I'm drawing a blank here with this one. Why doesn't this work then...

irb(main):006:0> r = /\A[oh]*\z/
=> /\A[oh]*\z/
irb(main):007:0> s = "hello, there"
=> "hello, there"
irb(main):008:0> r === s
=> false

Maybe I'm confused about what was wanted originally. The above tests the
following condition:

(set of chars occurring in given string)
is_a_subset_of
(given set of chars).

Yep. The subject title is misleading, because the AND is already
there: [^ho] means not h _and_ also not o.

I was looking to find if given a string A, can I say whether or not
all of the characters in string A exist in string B (count doesn't
matter, just existence). All of you gave me some good answers that I
hadn't thought of. Good brain food :)

Todd
 
7stud --

Joel said:
Todd said:
Cool :) #tr is one of those useful methods I somehow consistently forget about.

But it can be done with regex, right? It's just more elegant with tr.

class String
  def all_chars? chars
    if chars.empty?
      empty?
    else
      /\A[#{chars}]*\z/ === self
    end
  end
end

p 'foobar'.all_chars?('rabof') # => true
p 'foobar'.all_chars?('abc') # => false
p 'foobar'.all_chars?('') # => false

Your method doesn't work, which can clearly be seen in these examples:

strs = ["aaa", "bbb", "ccc"]
chars = "abc"

strs.each do |str|
  if /\A[#{chars}]*/ =~ str
    print str, " - yes"
    puts
  else
    print str, " - no"
    puts
  end
end

--output:--
aaa - yes
bbb - yes
ccc - yes

It should be clear from the output that even though the string "aaa"
passes your test, it is not true that all the characters in the string
"abc" appear in the string "aaa".
 
David A. Black

Hi --

7stud said:

Your method doesn't work, which can clearly be seen in these examples:

strs = ["aaa", "bbb", "ccc"]
chars = "abc"

strs.each do |str|
  if /\A[#{chars}]*/ =~ str
    print str, " - yes"
    puts
  else
    print str, " - no"
    puts
  end
end

--output:--
aaa - yes
bbb - yes
ccc - yes

It should be clear from the output that even though the string "aaa"
passes your test, it is not true that all the characters in the string
"abc" appear in the string "aaa".

Do it the other way around (and don't forget the \z):

if /\A[#{str}]*\z/ =~ chars

It's really the characters in str that you're testing, to make sure
that none of them fail to match the characters in chars. If the
variable names seem backwards, you can change them. It's the logic
that's important, and it works fine.


David

 
Robert Dober

me too. just got lucky this time ;-)
Knowledge --> the art of getting lucky very often, right Ara ;)
we can deny everything, except that we have the possibility of being better.
simply reflect on that.
h.h. the 14th dalai lama
BTW when I was referring to the quote I learnt most about I was
thinking about "Be kind whenever it is possible. It is always
possible".

Not that I dislike the others or apply any judgment I just wanted to
be clear that I *personally* learnt the most from the above :)

Cheers
Robert
 
7stud --

David said:
Do it the other way around (and don't forget the \z):

Whoops.


if /\A[#{str}]*\z/ =~ chars

It's really the characters in str that you're testing, to make sure
that none of them fail to match the characters in chars. If the
variable names seem backwards, you can change them. It's the logic
that's important, and it works fine.

Nice.
 
Pit Capitain

2008/5/9 ara.t.howard said:
cfp:~ > cat a.rb
class String
  def all_chars? chars
    tr(chars, '').empty?
  end
end

Using String#tr is nice, but the result is not what Todd wants:

s1 = "hello there"
s2 = "ohe"

(s2.unpack('c*') & s1.unpack('c*')).size == s2.size
=> true

class String
  def all_chars? chars
    tr(chars, '').empty?
  end
end

s1.all_chars?(s2)
=> false

Like in the regexp examples, you have to switch self and chars:

class String
  def all_chars? chars
    chars.tr(self, '').empty?
  end
end

s1.all_chars?(s2)
=> true

Regards,
Pit
 
