How to make combinations of an array to produce all possible expressions?

E

Erik Terpstra

I have an array 'conds', which contains some sub-expressions for an
xpath query:

conds = ["@title='Foo'", "@edition='Bar'", "@date='20040513'"]

Is there an existing library that lets me construct all possible
combinations like this?

puts conds.<some Array extension method>.collect{|n| n.join ' and '}

which produces:

@title='Foo' and @edition='Bar' and @date='20040513'
@title='Foo' and @edition='Bar'
@title='Foo' and @date='20040513'
@edition='Bar' and @date='20040513'
@title='Foo'
@edition='Bar'
@date='20040513'
 
R

Robert Klemme

Erik Terpstra said:
I have an array 'conds', which contains some sub-expressions for an
xpath query:

conds = ["@title='Foo'", "@edition='Bar'", "@date='20040513'"]

Is there an existing library that lets me construct all possible
combinations like this?

puts conds.<some Array extension method>.collect{|n| n.join ' and '}

which produces:

@title='Foo' and @edition='Bar' and @date='20040513'
@title='Foo' and @edition='Bar'
@title='Foo' and @date='20040513'
@edition='Bar' and @date='20040513'
@title='Foo'
@edition='Bar'
@date='20040513'

Not yet, but:

module Enumerable
def combine
masks = inject([[], 1]){|(ar, m), e| [ar<<m, m<<1]}[0]
all = masks.inject(0){|al, m| al|m}

result = []

for i in 1..all do
tmp = []

each_with_index do |e, idx|
tmp << e unless (masks[idx] & i) == 0
end

result << tmp
end

result
end
end


irb(main):098:0> puts conds.combine.collect{|n| n.join ' and '}
@title='Foo'
@edition='Bar'
@title='Foo' and @edition='Bar'
@date='20040513'
@title='Foo' and @date='20040513'
@edition='Bar' and @date='20040513'
@title='Foo' and @edition='Bar' and @date='20040513'
=> nil

robert
 
H

Harry Ohlsen

Erik said:
I have an array 'conds', which contains some sub-expressions for an
xpath query:

conds = ["@title='Foo'", "@edition='Bar'", "@date='20040513'"]
<snip>

There's no method I know of, but this seems to work (note that I've explicitly avoided generating an empty set, because you didn't have one in your example, but it should probably be included if we wanted to call this method "subsets" as I have) ...

module Enumerable
def subsets
values = []

(2 << length - 1).times do |n|
items = []

length.times do |i|
if n == 1
items << self
end
end

if items.length > 0 # I'd omit this test for a real "subsets"
values << items
end
end

return values
end
end

conds = ["@title='Foo'", "@edition='Bar'", "@date='20040513'"]

conds.subsets.collect { |subset| p subset.join(" and ") }
 
R

Robert Klemme

Harry Ohlsen said:
Erik said:
I have an array 'conds', which contains some sub-expressions for an
xpath query:

conds = ["@title='Foo'", "@edition='Bar'", "@date='20040513'"]
<snip>

There's no method I know of, but this seems to work (note that I've
explicitly avoided generating an empty set, because you didn't have one in
your example, but it should probably be included if we wanted to call this
method "subsets" as I have) ...
module Enumerable
def subsets
values = []

(2 << length - 1).times do |n|
items = []

length.times do |i|
if n == 1
items << self
end
end

if items.length > 0 # I'd omit this test for a real "subsets"
values << items
end
end

return values
end
end


Ah, similar idea but nicer coding. I like especially your calculation of
the counting range and int[idx] as bit test. I didn't know that one.

Btw, you don't need the test for length 0 if you do

for n in 1 ... (2<<length) do
....

Regards

robert
 
K

Kristof Bastiaensen

I have an array 'conds', which contains some sub-expressions for an
xpath query:

conds = ["@title='Foo'", "@edition='Bar'", "@date='20040513'"]

Hi,
I have found at least to ways.
Using recursion (maybe not so efficient):

class Array
def combine
comb = Proc.new do |a, *t|
tail = (t.empty? ? [] : comb[t])
[[a]] + tail.collect{ |e| [a] + e} + tail
end
comb[self]
end
end

The following should perform better:

class Array
def combine2
(1..(2**(size) - 1)).collect do |i|
i <<= 1
self.select { (i >>= 1) & 1 == 1}
end
end
end

irb(main):018:0> puts conds.combine.collect{|n| n.join ' and '}
@title='Foo'
@title='Foo' and @edition='Bar'
@title='Foo' and @edition='Bar' and @date='20040513'
@title='Foo' and @date='20040513'
@edition='Bar'
@edition='Bar' and @date='20040513'
@date='20040513'
=> nil
irb(main):019:0> puts conds.combine2.collect{|n| n.join ' and '}
@title='Foo'
@edition='Bar'
@title='Foo' and @edition='Bar'
@date='20040513'
@title='Foo' and @date='20040513'
@edition='Bar' and @date='20040513'
@title='Foo' and @edition='Bar' and @date='20040513'
=> nil


Kristof
 
H

Harry Ohlsen

Robert said:
Ah, similar idea but nicer coding.

Coming from you, that's a much appreciated compliment!
I like especially your calculation of
the counting range and int[idx] as bit test. I didn't know that one.

I discovered during a re-browse of pickaxe a couple of months ago, but this is the first time I've actually had a chance to use it.
Btw, you don't need the test for length 0 if you do

for n in 1 ... (2<<length) do

Good point.

Something I was thinking about was that the power set (set of subsets) becomes large very quickly (obvious from the "2 << length", I guess). It might be nice to have an iterator, too. The method that returns the collection of subsets could then just use it.

Something like this (I've removed the empty set test, since I think this is cleaner in a mathematical sense) ...

module Enumerable
def each_subset(&block)
(2 << length - 1).times do |n|
subset = []

length.times do |i|
if n == 1
subset << self
end
end

yield subset
end
end

def subsets
subsets = []

each_subset { |s| subsets << s }

return subsets
end
end

if __FILE__ == $0

conds = ["@title='Foo'", "@edition='Bar'", "@date='20040513'"]

puts "Using iterator ..."
puts

conds.each_subset { |subset| p subset.join(" and ") }

puts
puts "Using aggregate ..."
puts

conds.subsets.collect { |subset| p subset.join(" and ") }

end
 
R

Robert Klemme

Harry Ohlsen said:
Coming from you, that's a much appreciated compliment!

You're welcome! :)
I like especially your calculation of
the counting range and int[idx] as bit test. I didn't know that one.

I discovered during a re-browse of pickaxe a couple of months ago, but
this is the first time I've actually had a chance to use it.
Good point.

Something I was thinking about was that the power set (set of subsets)
becomes large very quickly (obvious from the "2 << length", I guess). It
might be nice to have an iterator, too. The method that returns the
collection of subsets could then just use it.

Another good point!
Something like this (I've removed the empty set test, since I think this
is cleaner in a mathematical sense) ...

That might be handled by an additional flag with a default value (which
one, the mathematical or the practical?). Practically you often don't
need / want the empty set.

Funny to see how this evolves. My current vesion looks like this:

module Enumerable
def each_subset(skip_empty = false)
for n in (skip_empty ? 1 : 0) ... (1 << size) do
subset = []

each_with_index do |elem, i|
subset << elem if n
end

yield subset
end

self
end

def powerset(skip_empty = false)
subsets = []

each_subset(skip_empty) { |s| subsets << s }

return subsets
end
end


Changes / Improvements:

- self[index] is not used since it is not available with all Enumerables
- flag 'skip_empty' added
- self returned from iterator
- renamed #subsets to #powerset (this is the mathematical correct term,
isn't it)
- changed (2 << length - 1) to (1 << length)

Here's another experimental version that works even if #size is not
supported. This really needs only #each. Alternatively one could do an
initial iteration to calculate the size - that saves the intermediate
array.

module Enumerable
def each_subset(skip_empty = false)
enum = respond_to?( :size ) ? self : to_a

for n in (skip_empty ? 1 : 0) ... (1 << enum.size) do
subset = []

enum.each_with_index do |elem, i|
subset << elem if n == 1
end

yield subset
end

self
end

def powerset(skip_empty = false)
subsets = []

each_subset(skip_empty) { |s| subsets << s }

return subsets
end
end

# test class
class Test
include Enumerable

def initialize(n); @n=n; end
def each; @n.times {|c| yield c}; self; end
end

p Test.new(3).powerset


Kind regards

robert
 
L

Linus Sellberg

module Enumerable
def each_subset(skip_empty = false)
enum = respond_to?( :size ) ? self : to_a

for n in (skip_empty ? 1 : 0) ... (1 << enum.size) do
subset = []

enum.each_with_index do |elem, i|
subset << elem if n == 1
end

yield subset
end

self
end

def powerset(skip_empty = false)
subsets = []

each_subset(skip_empty) { |s| subsets << s }

return subsets
end


irb(main):013:0> def powerset( x )
irb(main):014:1> x.inject([[]]) {|m, n| m.map {|b| [n] + b} + m }
irb(main):015:1> end
irb(main):017:0> a = [1,2,3]
=> [1, 2, 3]
irb(main):018:0> powerset a
=> [[3, 2, 1], [3, 2], [3, 1], [3], [2, 1], [2], [1], []]
irb(main):019:0>

Only works for arrays, but it might be possible to generalize it?
 
H

Harry Ohlsen

Robert said:
Funny to see how this evolves. My current vesion looks like this:

module Enumerable
def each_subset(skip_empty = false)
for n in (skip_empty ? 1 : 0) ... (1 << size) do
subset = []

each_with_index do |elem, i|
subset << elem if n
end

yield subset
end

self
end

def powerset(skip_empty = false)
subsets = []

each_subset(skip_empty) { |s| subsets << s }

return subsets
end
end


The first version in your latest e-mail didn't work for me. It just gave me the complete array every time.

I think it's because when n is zero, that's not the same as false, hence the "if" always succeeds.

You just need to change that line to

subset << elem if (n == 1)

The second version worked fine.

I like all of your clean-ups. I wonder whether re-coding in C would increase performance?

Anyway, it might be worth creating an RCR to have this added to Enumerable.

Cheers,

Harry O.
 
R

Robert Klemme

The first version in your latest e-mail didn't work for me. It just
gave me the complete array every time.
I think it's because when n is zero, that's not the same as false,

hence the "if" always succeeds.
You just need to change that line to

subset << elem if (n == 1)

The second version worked fine.


Yeah, I noticed the error but apparently didn't recheck the first version.
You're right of course.
I like all of your clean-ups.

Thank you!
I wonder whether re-coding in C would increase performance?

Dunno. I didn't write a ruby extension yet but I heard it's pretty easy.
:)
Anyway, it might be worth creating an RCR to have this added to
Enumerable.

Maybe. The crucial point is, is this general enough to place it there?
Well, the vote might show it.

Regards

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,599
Members
45,173
Latest member
GeraldReund
Top