P
Phrogz
SUMMARY
What's the fastest and/or shortest you can turn this:
in = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
"d", "e", "w", "d"]
into this:
["i", "l", "u", "v", "f", "e", "w", "d"]
?
DETAILS
The goal is, given a stream of items, to produce the same ordered
stream with no duplicates, where the relative position of the last item
wins. The shortest (and fastest?) for me turns out to be simply:
in.reverse.uniq.reverse
I have the question mark because the in-place version is sometimes
faster, sometimes slower in my tests:
out = in.dup; out.reverse!; out.uniq!; out.reverse!
I thought it might be faster to loop through the array in one pass
myself, but that turns out to be about 4x slower:
seen = {}
out = in.dup
out.each_with_index{ |v,i|
if n=h[v]
out[n] = nil
end
h[v] = i
}
out.compact!
The input for me isn't REALLY an array, but rather a series of items
that I receive one at a time from a depth-first traversal of a graph.
If this influences your answer, so be it
BACKGROUND
Why is this problem mildly interesting? See
http://phrogz.net/nodes/traversingdirectedgraph.asp
I was reading that page for nostalgia the other day and thought about
porting the calculation library to Ruby for fun, and to see how much
easier solving the problem would be in Ruby.
Here's the code I put together for fun (the cells and formulae in the
below correspond to the graphic on the above url), before adding the
'minimal update chain' optimization.
require 'set'
module ClassContainer
def initialize( *args )
self.class[ self.name ] = self
super
end
def self.included( klass )
klass.class_eval{
def self.inherited( subklass )
( @subclasses ||= [] ) << subklass
end
def self.all
@subclasses ||= []
@instances ||= {}
all = @instances.dup
@subclasses.each{ |sub| all.merge!( sub.all ) }
all
end
def self.each
all.each_value{ |i| yield i }
end
def self.[]( name )
all[ name ]
end
def self.[]=( name, instance )
( @instances ||= {} )[ name ] = instance
end
def self.sorted
all.values.sort_by{ |n| n.name.to_s }
end
}
end
end
module GraphNode
attr_reader
utgoing_edges, :incoming_edges
def initialize( *args )
@outgoing_edges = Set.new
@incoming_edges = Set.new
end
def edges
@outgoing_edges + @incoming_edges
end
def add_edge_to( other_node )
@outgoing_edges << other_node
other_node.incoming_edges << self
self
end
def clear_incoming
@incoming_edges.each{ |n|
n.outgoing_edges.delete self
}
@incoming_edges = Set.new
self
end
def traverse( seen_nodes = Set.new, &block )
new_seen = seen_nodes.dup << self
@outgoing_edges.each{ |destination|
unless seen_nodes.include?( destination )
yield destination
destination.traverse( new_seen, &block )
end
}
end
end
class Cell
include GraphNode
include ClassContainer
attr_reader :name
attr_accessor :value
def initialize( name, value=0 )
raise "Duplicate ID" if Cell[ name ]
@name = name
@value = value * 1.0
super
end
def value=( new_value )
# Naive update approach
return if @value == new_value
@value = new_value
update_dependents
new_value
end
def update_dependents
traverse{ |formula|
formula.evaluate( true )
}
end
def to_s
'<%s "%s" value=%.2f>' % [ self.class.name, @name, @value ]
end
end
class Formula < Cell
def initialize( *args )
super
@formula, @value = @value.to_s, 0.0
update_dependencies
end
def update_dependencies
clear_incoming
@formula.scan( /@(\w+)/ ).flatten.each{ |source_name|
if incoming = Cell[ source_name.to_sym ]
incoming.add_edge_to( self )
end
}
end
def evaluate( skip_dependents = false )
scope = ValueSpace.new
Cell.each{ |n| scope.set( n.name, n.value ) }
@value = eval( @formula, scope.get_binding )
puts "Updated #{self}"
update_dependents unless skip_dependents
end
def to_s
'<%s "%s" formula="%s" value=%.2f>' % [ self.class.name, @name,
@formula, @value ]
end
end
class ValueSpace
def set( name, value )
instance_variable_set :"@#{name}", value
end
def get_binding
binding
end
end
if $0 == __FILE__
Cell.new( :a, 5 )
Cell.new( :d, 3 )
Cell.new( :e, 32 )
Cell.new( :i, 10 )
Cell.new( :l, 12 )
Formula.new( :b, "@d + @e + @a" )
Formula.new( :g, "@h * @i" ).value = 20
Formula.new( :h, "@g / @i" ).value = 2
Formula.new( :c, "@a + 3" )
Formula.new( :f, "@b + @c" )
Formula.new( :j, "@c + 8" )
Formula.new( :k, "@l + @j" )
Formula.new( :m, "@l + 10" )
# Need to call here to handle circular dependencies
Formula.each{ |f| f.update_dependencies }
# Naive approach...
puts "Naive initial evaluation..."
Formula.each{ |f| f.evaluate }
puts "*" * 40
puts "Initial values..."
puts Cell.sorted
puts "*" * 40
puts "Changing 'a' to 7..."
Cell[ :a ].value = 7
puts "*" * 40
puts "Changing 'i' to 36..."
Cell[ :i ].value = 36
puts '-' * 40
puts "Per cell dependency..."
Cell.each{ |cell|
puts "#{cell.name} -> #{cell.outgoing_edges.map{|c| c.name}.join(',
')}"
}
end
(which outputs)
Naive initial evaluation...
Updated <Formula "c" formula="@a + 3" value=8.00>
Updated <Formula "f" formula="@b + @c" value=8.00>
Updated <Formula "j" formula="@c + 8" value=16.00>
Updated <Formula "k" formula="@l + @j" value=28.00>
Updated <Formula "f" formula="@b + @c" value=8.00>
Updated <Formula "j" formula="@c + 8" value=16.00>
Updated <Formula "k" formula="@l + @j" value=28.00>
Updated <Formula "b" formula="@d + @e + @a" value=40.00>
Updated <Formula "f" formula="@b + @c" value=48.00>
Updated <Formula "k" formula="@l + @j" value=28.00>
Updated <Formula "g" formula="@h * @i" value=20.00>
Updated <Formula "h" formula="@g / @i" value=2.00>
Updated <Formula "m" formula="@l + 10" value=22.00>
Updated <Formula "h" formula="@g / @i" value=2.00>
Updated <Formula "g" formula="@h * @i" value=20.00>
****************************************
Initial values...
<Cell "a" value=5.00>
<Formula "b" formula="@d + @e + @a" value=40.00>
<Formula "c" formula="@a + 3" value=8.00>
<Cell "d" value=3.00>
<Cell "e" value=32.00>
<Formula "f" formula="@b + @c" value=48.00>
<Formula "g" formula="@h * @i" value=20.00>
<Formula "h" formula="@g / @i" value=2.00>
<Cell "i" value=10.00>
<Formula "j" formula="@c + 8" value=16.00>
<Formula "k" formula="@l + @j" value=28.00>
<Cell "l" value=12.00>
<Formula "m" formula="@l + 10" value=22.00>
****************************************
Changing 'a' to 7...
Updated <Formula "b" formula="@d + @e + @a" value=42.00>
Updated <Formula "f" formula="@b + @c" value=50.00>
Updated <Formula "c" formula="@a + 3" value=10.00>
Updated <Formula "f" formula="@b + @c" value=52.00>
Updated <Formula "j" formula="@c + 8" value=18.00>
Updated <Formula "k" formula="@l + @j" value=30.00>
****************************************
Changing 'i' to 36...
Updated <Formula "h" formula="@g / @i" value=0.56>
Updated <Formula "g" formula="@h * @i" value=20.00>
Updated <Formula "g" formula="@h * @i" value=20.00>
Updated <Formula "h" formula="@g / @i" value=0.56>
----------------------------------------
Per cell dependency...
c -> f, j
e -> b
f ->
i -> h, g
l -> k, m
j -> k
b -> f
k ->
g -> h
m ->
a -> b, c
h -> g
d -> b
What's the fastest and/or shortest you can turn this:
in = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
"d", "e", "w", "d"]
into this:
["i", "l", "u", "v", "f", "e", "w", "d"]
?
DETAILS
The goal is, given a stream of items, to produce the same ordered
stream with no duplicates, where the relative position of the last item
wins. The shortest (and fastest?) for me turns out to be simply:
in.reverse.uniq.reverse
I have the question mark because the in-place version is sometimes
faster, sometimes slower in my tests:
out = in.dup; out.reverse!; out.uniq!; out.reverse!
I thought it might be faster to loop through the array in one pass
myself, but that turns out to be about 4x slower:
seen = {}
out = in.dup
out.each_with_index{ |v,i|
if n=h[v]
out[n] = nil
end
h[v] = i
}
out.compact!
The input for me isn't REALLY an array, but rather a series of items
that I receive one at a time from a depth-first traversal of a graph.
If this influences your answer, so be it
BACKGROUND
Why is this problem mildly interesting? See
http://phrogz.net/nodes/traversingdirectedgraph.asp
I was reading that page for nostalgia the other day and thought about
porting the calculation library to Ruby for fun, and to see how much
easier solving the problem would be in Ruby.
Here's the code I put together for fun (the cells and formulae in the
below correspond to the graphic on the above url), before adding the
'minimal update chain' optimization.
require 'set'
module ClassContainer
def initialize( *args )
self.class[ self.name ] = self
super
end
def self.included( klass )
klass.class_eval{
def self.inherited( subklass )
( @subclasses ||= [] ) << subklass
end
def self.all
@subclasses ||= []
@instances ||= {}
all = @instances.dup
@subclasses.each{ |sub| all.merge!( sub.all ) }
all
end
def self.each
all.each_value{ |i| yield i }
end
def self.[]( name )
all[ name ]
end
def self.[]=( name, instance )
( @instances ||= {} )[ name ] = instance
end
def self.sorted
all.values.sort_by{ |n| n.name.to_s }
end
}
end
end
module GraphNode
attr_reader
def initialize( *args )
@outgoing_edges = Set.new
@incoming_edges = Set.new
end
def edges
@outgoing_edges + @incoming_edges
end
def add_edge_to( other_node )
@outgoing_edges << other_node
other_node.incoming_edges << self
self
end
def clear_incoming
@incoming_edges.each{ |n|
n.outgoing_edges.delete self
}
@incoming_edges = Set.new
self
end
def traverse( seen_nodes = Set.new, &block )
new_seen = seen_nodes.dup << self
@outgoing_edges.each{ |destination|
unless seen_nodes.include?( destination )
yield destination
destination.traverse( new_seen, &block )
end
}
end
end
class Cell
include GraphNode
include ClassContainer
attr_reader :name
attr_accessor :value
def initialize( name, value=0 )
raise "Duplicate ID" if Cell[ name ]
@name = name
@value = value * 1.0
super
end
def value=( new_value )
# Naive update approach
return if @value == new_value
@value = new_value
update_dependents
new_value
end
def update_dependents
traverse{ |formula|
formula.evaluate( true )
}
end
def to_s
'<%s "%s" value=%.2f>' % [ self.class.name, @name, @value ]
end
end
class Formula < Cell
def initialize( *args )
super
@formula, @value = @value.to_s, 0.0
update_dependencies
end
def update_dependencies
clear_incoming
@formula.scan( /@(\w+)/ ).flatten.each{ |source_name|
if incoming = Cell[ source_name.to_sym ]
incoming.add_edge_to( self )
end
}
end
def evaluate( skip_dependents = false )
scope = ValueSpace.new
Cell.each{ |n| scope.set( n.name, n.value ) }
@value = eval( @formula, scope.get_binding )
puts "Updated #{self}"
update_dependents unless skip_dependents
end
def to_s
'<%s "%s" formula="%s" value=%.2f>' % [ self.class.name, @name,
@formula, @value ]
end
end
class ValueSpace
def set( name, value )
instance_variable_set :"@#{name}", value
end
def get_binding
binding
end
end
if $0 == __FILE__
Cell.new( :a, 5 )
Cell.new( :d, 3 )
Cell.new( :e, 32 )
Cell.new( :i, 10 )
Cell.new( :l, 12 )
Formula.new( :b, "@d + @e + @a" )
Formula.new( :g, "@h * @i" ).value = 20
Formula.new( :h, "@g / @i" ).value = 2
Formula.new( :c, "@a + 3" )
Formula.new( :f, "@b + @c" )
Formula.new( :j, "@c + 8" )
Formula.new( :k, "@l + @j" )
Formula.new( :m, "@l + 10" )
# Need to call here to handle circular dependencies
Formula.each{ |f| f.update_dependencies }
# Naive approach...
puts "Naive initial evaluation..."
Formula.each{ |f| f.evaluate }
puts "*" * 40
puts "Initial values..."
puts Cell.sorted
puts "*" * 40
puts "Changing 'a' to 7..."
Cell[ :a ].value = 7
puts "*" * 40
puts "Changing 'i' to 36..."
Cell[ :i ].value = 36
puts '-' * 40
puts "Per cell dependency..."
Cell.each{ |cell|
puts "#{cell.name} -> #{cell.outgoing_edges.map{|c| c.name}.join(',
')}"
}
end
(which outputs)
Naive initial evaluation...
Updated <Formula "c" formula="@a + 3" value=8.00>
Updated <Formula "f" formula="@b + @c" value=8.00>
Updated <Formula "j" formula="@c + 8" value=16.00>
Updated <Formula "k" formula="@l + @j" value=28.00>
Updated <Formula "f" formula="@b + @c" value=8.00>
Updated <Formula "j" formula="@c + 8" value=16.00>
Updated <Formula "k" formula="@l + @j" value=28.00>
Updated <Formula "b" formula="@d + @e + @a" value=40.00>
Updated <Formula "f" formula="@b + @c" value=48.00>
Updated <Formula "k" formula="@l + @j" value=28.00>
Updated <Formula "g" formula="@h * @i" value=20.00>
Updated <Formula "h" formula="@g / @i" value=2.00>
Updated <Formula "m" formula="@l + 10" value=22.00>
Updated <Formula "h" formula="@g / @i" value=2.00>
Updated <Formula "g" formula="@h * @i" value=20.00>
****************************************
Initial values...
<Cell "a" value=5.00>
<Formula "b" formula="@d + @e + @a" value=40.00>
<Formula "c" formula="@a + 3" value=8.00>
<Cell "d" value=3.00>
<Cell "e" value=32.00>
<Formula "f" formula="@b + @c" value=48.00>
<Formula "g" formula="@h * @i" value=20.00>
<Formula "h" formula="@g / @i" value=2.00>
<Cell "i" value=10.00>
<Formula "j" formula="@c + 8" value=16.00>
<Formula "k" formula="@l + @j" value=28.00>
<Cell "l" value=12.00>
<Formula "m" formula="@l + 10" value=22.00>
****************************************
Changing 'a' to 7...
Updated <Formula "b" formula="@d + @e + @a" value=42.00>
Updated <Formula "f" formula="@b + @c" value=50.00>
Updated <Formula "c" formula="@a + 3" value=10.00>
Updated <Formula "f" formula="@b + @c" value=52.00>
Updated <Formula "j" formula="@c + 8" value=18.00>
Updated <Formula "k" formula="@l + @j" value=30.00>
****************************************
Changing 'i' to 36...
Updated <Formula "h" formula="@g / @i" value=0.56>
Updated <Formula "g" formula="@h * @i" value=20.00>
Updated <Formula "g" formula="@h * @i" value=20.00>
Updated <Formula "h" formula="@g / @i" value=0.56>
----------------------------------------
Per cell dependency...
c -> f, j
e -> b
f ->
i -> h, g
l -> k, m
j -> k
b -> f
k ->
g -> h
m ->
a -> b, c
h -> g
d -> b