Speed Golf - Remove Early Dups

Discussion in 'Ruby' started by Phrogz, Dec 6, 2005.

  1. Phrogz

    Phrogz Guest

    SUMMARY
    What's the fastest and/or shortest you can turn this:
    in = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
    "d", "e", "w", "d"]

    into this:
    ["i", "l", "u", "v", "f", "e", "w", "d"]
    ?


    DETAILS
    The goal is, given a stream of items, to produce the same ordered
    stream with no duplicates, where the relative position of the last item
    wins. The shortest (and fastest?) for me turns out to be simply:
    in.reverse.uniq.reverse

    I have the question mark because the in-place version is sometimes
    faster, sometimes slower in my tests:
    out = in.dup; out.reverse!; out.uniq!; out.reverse!

    I thought it might be faster to loop through the array in one pass
    myself, but that turns out to be about 4x slower:
    seen = {}
    out = in.dup
    out.each_with_index{ |v,i|
    if n=h[v]
    out[n] = nil
    end
    h[v] = i
    }
    out.compact!

    The input for me isn't REALLY an array, but rather a series of items
    that I receive one at a time from a depth-first traversal of a graph.
    If this influences your answer, so be it :)


    BACKGROUND
    Why is this problem mildly interesting? See
    http://phrogz.net/nodes/traversingdirectedgraph.asp

    I was reading that page for nostalgia the other day and thought about
    porting the calculation library to Ruby for fun, and to see how much
    easier solving the problem would be in Ruby.

    Here's the code I put together for fun (the cells and formulae in the
    below correspond to the graphic on the above url), before adding the
    'minimal update chain' optimization.

    require 'set'
    module ClassContainer
    def initialize( *args )
    self.class[ self.name ] = self
    super
    end

    def self.included( klass )
    klass.class_eval{
    def self.inherited( subklass )
    ( @subclasses ||= [] ) << subklass
    end

    def self.all
    @subclasses ||= []
    @instances ||= {}
    all = @instances.dup
    @subclasses.each{ |sub| all.merge!( sub.all ) }
    all
    end

    def self.each
    all.each_value{ |i| yield i }
    end

    def self.[]( name )
    all[ name ]
    end

    def self.[]=( name, instance )
    ( @instances ||= {} )[ name ] = instance
    end

    def self.sorted
    all.values.sort_by{ |n| n.name.to_s }
    end
    }
    end
    end

    module GraphNode
    attr_reader :eek:utgoing_edges, :incoming_edges

    def initialize( *args )
    @outgoing_edges = Set.new
    @incoming_edges = Set.new
    end

    def edges
    @outgoing_edges + @incoming_edges
    end

    def add_edge_to( other_node )
    @outgoing_edges << other_node
    other_node.incoming_edges << self
    self
    end

    def clear_incoming
    @incoming_edges.each{ |n|
    n.outgoing_edges.delete self
    }
    @incoming_edges = Set.new
    self
    end

    def traverse( seen_nodes = Set.new, &block )
    new_seen = seen_nodes.dup << self
    @outgoing_edges.each{ |destination|
    unless seen_nodes.include?( destination )
    yield destination
    destination.traverse( new_seen, &block )
    end
    }
    end
    end

    class Cell
    include GraphNode
    include ClassContainer

    attr_reader :name
    attr_accessor :value
    def initialize( name, value=0 )
    raise "Duplicate ID" if Cell[ name ]
    @name = name
    @value = value * 1.0

    super
    end

    def value=( new_value )
    # Naive update approach
    return if @value == new_value
    @value = new_value
    update_dependents
    new_value
    end

    def update_dependents
    traverse{ |formula|
    formula.evaluate( true )
    }
    end

    def to_s
    '<%s "%s" value=%.2f>' % [ self.class.name, @name, @value ]
    end
    end

    class Formula < Cell
    def initialize( *args )
    super
    @formula, @value = @value.to_s, 0.0
    update_dependencies
    end

    def update_dependencies
    clear_incoming
    @formula.scan( /@(\w+)/ ).flatten.each{ |source_name|
    if incoming = Cell[ source_name.to_sym ]
    incoming.add_edge_to( self )
    end
    }
    end

    def evaluate( skip_dependents = false )
    scope = ValueSpace.new
    Cell.each{ |n| scope.set( n.name, n.value ) }
    @value = eval( @formula, scope.get_binding )
    puts "Updated #{self}"
    update_dependents unless skip_dependents
    end

    def to_s
    '<%s "%s" formula="%s" value=%.2f>' % [ self.class.name, @name,
    @formula, @value ]
    end
    end

    class ValueSpace
    def set( name, value )
    instance_variable_set :"@#{name}", value
    end
    def get_binding
    binding
    end
    end

    if $0 == __FILE__
    Cell.new( :a, 5 )
    Cell.new( :d, 3 )
    Cell.new( :e, 32 )
    Cell.new( :i, 10 )
    Cell.new( :l, 12 )

    Formula.new( :b, "@d + @e + @a" )
    Formula.new( :g, "@h * @i" ).value = 20
    Formula.new( :h, "@g / @i" ).value = 2
    Formula.new( :c, "@a + 3" )
    Formula.new( :f, "@b + @c" )
    Formula.new( :j, "@c + 8" )
    Formula.new( :k, "@l + @j" )
    Formula.new( :m, "@l + 10" )

    # Need to call here to handle circular dependencies
    Formula.each{ |f| f.update_dependencies }

    # Naive approach...
    puts "Naive initial evaluation..."
    Formula.each{ |f| f.evaluate }

    puts "*" * 40
    puts "Initial values..."
    puts Cell.sorted

    puts "*" * 40
    puts "Changing 'a' to 7..."
    Cell[ :a ].value = 7

    puts "*" * 40
    puts "Changing 'i' to 36..."
    Cell[ :i ].value = 36

    puts '-' * 40
    puts "Per cell dependency..."
    Cell.each{ |cell|
    puts "#{cell.name} -> #{cell.outgoing_edges.map{|c| c.name}.join(',
    ')}"
    }

    end

    (which outputs)

    Naive initial evaluation...
    Updated <Formula "c" formula="@a + 3" value=8.00>
    Updated <Formula "f" formula="@b + @c" value=8.00>
    Updated <Formula "j" formula="@c + 8" value=16.00>
    Updated <Formula "k" formula="@l + @j" value=28.00>
    Updated <Formula "f" formula="@b + @c" value=8.00>
    Updated <Formula "j" formula="@c + 8" value=16.00>
    Updated <Formula "k" formula="@l + @j" value=28.00>
    Updated <Formula "b" formula="@d + @e + @a" value=40.00>
    Updated <Formula "f" formula="@b + @c" value=48.00>
    Updated <Formula "k" formula="@l + @j" value=28.00>
    Updated <Formula "g" formula="@h * @i" value=20.00>
    Updated <Formula "h" formula="@g / @i" value=2.00>
    Updated <Formula "m" formula="@l + 10" value=22.00>
    Updated <Formula "h" formula="@g / @i" value=2.00>
    Updated <Formula "g" formula="@h * @i" value=20.00>
    ****************************************
    Initial values...
    <Cell "a" value=5.00>
    <Formula "b" formula="@d + @e + @a" value=40.00>
    <Formula "c" formula="@a + 3" value=8.00>
    <Cell "d" value=3.00>
    <Cell "e" value=32.00>
    <Formula "f" formula="@b + @c" value=48.00>
    <Formula "g" formula="@h * @i" value=20.00>
    <Formula "h" formula="@g / @i" value=2.00>
    <Cell "i" value=10.00>
    <Formula "j" formula="@c + 8" value=16.00>
    <Formula "k" formula="@l + @j" value=28.00>
    <Cell "l" value=12.00>
    <Formula "m" formula="@l + 10" value=22.00>
    ****************************************
    Changing 'a' to 7...
    Updated <Formula "b" formula="@d + @e + @a" value=42.00>
    Updated <Formula "f" formula="@b + @c" value=50.00>
    Updated <Formula "c" formula="@a + 3" value=10.00>
    Updated <Formula "f" formula="@b + @c" value=52.00>
    Updated <Formula "j" formula="@c + 8" value=18.00>
    Updated <Formula "k" formula="@l + @j" value=30.00>
    ****************************************
    Changing 'i' to 36...
    Updated <Formula "h" formula="@g / @i" value=0.56>
    Updated <Formula "g" formula="@h * @i" value=20.00>
    Updated <Formula "g" formula="@h * @i" value=20.00>
    Updated <Formula "h" formula="@g / @i" value=0.56>
    ----------------------------------------
    Per cell dependency...
    c -> f, j
    e -> b
    f ->
    i -> h, g
    l -> k, m
    j -> k
    b -> f
    k ->
    g -> h
    m ->
    a -> b, c
    h -> g
    d -> b
    Phrogz, Dec 6, 2005
    #1
    1. Advertising

  2. Phrogz

    ako... Guest

    perhaps in.reverse!.uniq!.reverse! ?
    ako..., Dec 6, 2005
    #2
    1. Advertising

  3. Phrogz

    ako... Guest

    i meant in.reverse.uniq!.reverse! (the first reverse is non-destructive)
    ako..., Dec 6, 2005
    #3
  4. Phrogz wrote:
    > SUMMARY
    > What's the fastest and/or shortest you can turn this:
    > in = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
    > "d", "e", "w", "d"]
    >
    > into this:
    > ["i", "l", "u", "v", "f", "e", "w", "d"]
    > ?
    >
    >
    > DETAILS
    > The goal is, given a stream of items, to produce the same ordered
    > stream with no duplicates, where the relative position of the last item
    > wins. The shortest (and fastest?) for me turns out to be simply:
    > in.reverse.uniq.reverse
    >

    ....
    > The input for me isn't REALLY an array, but rather a series of items
    > that I receive one at a time from a depth-first traversal of a graph.
    > If this influences your answer, so be it :)


    In = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
    "d", "e", "w", "d"]

    out = []

    In.each{ |x| out.delete( x ); out << x }

    p out
    William James, Dec 6, 2005
    #4
  5. Phrogz

    ako... Guest

    does this have a quadratic time complexity? doing this by sorting might
    be faster... or am i mistaken?

    konstantin
    ako..., Dec 6, 2005
    #5
  6. On Dec 5, 2005, at 6:47 PM, ako... wrote:
    > does this have a quadratic time complexity? doing this by sorting
    > might
    > be faster... or am i mistaken?


    Quadratic? No. The suggested algorithms are O(2n) or O(3n).
    Gavin Kistner, Dec 6, 2005
    #6
  7. On Dec 5, 2005, at 6:27 PM, ako... wrote:
    > i meant in.reverse.uniq!.reverse! (the first reverse is non-
    > destructive)


    Array#uniq! can return nil if no changes were made. This will not
    occur given the above input, but can for the general case.

    Using reverse rather than dup does shave a bit of time off, though,
    which puts it as the winner on my machine. So far:

    Rehearsal --------------------------------------------------------
    William James 4.230000 0.060000 4.290000 ( 4.369104)
    Simple Copies 1.180000 0.010000 1.190000 ( 1.227496)
    Dup and In-Place 1.210000 0.010000 1.220000 ( 1.240726)
    Reverse and In-Place 1.130000 0.010000 1.140000 ( 1.158873)
    Hash and Compact 7.220000 0.090000 7.310000 ( 7.552673)
    ---------------------------------------------- total: 15.150000sec

    user system total real
    William James 4.230000 0.040000 4.270000 ( 4.465252)
    Simple Copies 1.190000 0.010000 1.200000 ( 1.218064)
    Dup and In-Place 1.220000 0.010000 1.230000 ( 1.268994)
    Reverse and In-Place 1.120000 0.010000 1.130000 ( 1.152538)
    Hash and Compact 7.220000 0.060000 7.280000 ( 7.511923)


    input = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
    "d", "e", "w", "d"]

    require 'benchmark'
    Benchmark.bmbm( 20 ) do |r|
    n = 100_000

    r.report( 'William James' ){
    n.times{
    out = []
    input.each{ |x| out.delete( x ); out << x }
    }
    }

    r.report( 'Simple Copies' ){
    n.times{
    input.reverse.uniq.reverse
    }
    }

    r.report( 'Dup and In-Place' ){
    n.times{
    out = input.dup
    out.reverse!
    out.uniq!
    out.reverse!
    out
    }
    }

    r.report( 'Reverse and In-Place' ){
    n.times{
    out = input.reverse
    out.uniq!
    out.reverse!
    out
    }
    }

    r.report( "Hash and Compact" ){
    n.times{
    seen = {}
    out = input.dup
    out.each_with_index{ |val,idx|
    if old_idx = seen[ val ]
    out[ old_idx ] = nil
    end
    seen[ val ] = idx
    }
    }
    }

    end
    Gavin Kistner, Dec 6, 2005
    #7
  8. On Tue, 06 Dec 2005 02:27:44 +0100, ako... <> wrote:

    > i meant in.reverse.uniq!.reverse! (the first reverse is non-destructive=

    )

    Don't chain bang methods:

    >> ["i", "l", "u", "v", "f", "e", "w", "d"].reverse.uniq!.reverse!

    NoMethodError: undefined method `reverse!' for nil:NilClass
    from (irb):3
    from :0

    They return nil if nothing changed...
    Dominik Bathon, Dec 6, 2005
    #8
  9. Hi --

    On Tue, 6 Dec 2005, Phrogz wrote:

    > SUMMARY
    > What's the fastest and/or shortest you can turn this:
    > in = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
    > "d", "e", "w", "d"]
    >
    > into this:
    > ["i", "l", "u", "v", "f", "e", "w", "d"]
    > ?
    >
    >
    > DETAILS
    > The goal is, given a stream of items, to produce the same ordered
    > stream with no duplicates, where the relative position of the last item
    > wins. The shortest (and fastest?) for me turns out to be simply:
    > in.reverse.uniq.reverse


    This entry is not in contention on shortness or speed (don't even bother
    to benchmark it -- it's completely off the charts), but I just thought it
    looked really cool :)

    a.inject([]){|r,*b|r-b+b}


    David
    __
    David A. Black


    "Ruby for Rails", forthcoming from Manning Publications, April 2006!
    David A. Black, Dec 6, 2005
    #9
  10. Phrogz

    ako... Guest

    i do not see why not. this is almost exactly a bubble sort. which is in
    worst case quadratic.
    ako..., Dec 6, 2005
    #10
  11. On Dec 6, 2005, at 12:17 AM, ako... wrote:
    > i do not see why not. this is almost exactly a bubble sort. which
    > is in
    > worst case quadratic.


    It's not an arbitrary re-sorting of all the elements involved. The
    elements are all there, in the right order - you just have to weed
    out the few bad ones.
    Gavin Kistner, Dec 6, 2005
    #11
  12. Phrogz

    ako... Guest

    my reasoning was that the code loops over the array and on every
    iteration sequentially searches another array of the same length. this
    is similar to bubble sort where on every iteration the current element
    is moved to the beginning of the array until its place in the order is
    found. but this is not really important, so let us not argue over this
    ; -)
    konstantin
    ako..., Dec 6, 2005
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ham

    I need speed Mr .Net....speed

    Ham, Oct 28, 2004, in forum: ASP .Net
    Replies:
    6
    Views:
    2,317
    Antony Baula
    Oct 29, 2004
  2. DaveF

    remove dups from ArrayList

    DaveF, Nov 21, 2004, in forum: ASP .Net
    Replies:
    1
    Views:
    8,185
    Anders NorĂ¥s
    Nov 21, 2004
  3. efiedler
    Replies:
    1
    Views:
    2,017
    Tim Ward
    Oct 9, 2003
  4. Replies:
    2
    Views:
    2,269
    Howard
    Apr 28, 2004
  5. David C

    AppendDataBoundItems giving dups

    David C, Jun 25, 2007, in forum: ASP .Net
    Replies:
    2
    Views:
    1,368
    David C
    Jun 25, 2007
Loading...

Share This Page