[ANN] To Yield or Not to Yield: An Inferable Question

Discussion in 'Ruby' started by Michael Edgar, Apr 14, 2011.

  1. Hi Rubyists,

    My undergraduate thesis is focused on trying to answer interesting questions about Ruby
    code statically. I've written a blog post about my first novel result: attempting to infer
    whether a method uses `yield`, relative to the value of `block_given?`.

    http://carboni.ca/blog/p/To-Yield-or-Not-to-Yield-An-Inferable-Question

    I hope you all find it interesting, and I'd be happy to discuss, answer questions and provide
    clarifications either privately or in ruby-talk.

    Michael Edgar

    http://carboni.ca/
    Michael Edgar, Apr 14, 2011
    #1

  2. Re: To Yield or Not to Yield: An Inferable Question

    Points I'd raise:

    1. In my experience, very little real-world Ruby code uses
    'block_given?'. If it needs to yield, it just yields. I'd consider this
    to be a case of duck-typing.

    With yield you get a run-time error if no block was passed, but that's
    only one of a much larger set of method call errors (such as calling a
    method with argument of the wrong type).

    Consider also that very little code tests 'a.respond_to? :foo' before
    calling 'a.foo'.

    2. If a method uses &blk or Proc.new or yield, I'd say it's fairly safe
    to assume that the block *may* be called (at least from the point of
    view of automated documentation). Since it's unprovable in general even
    whether the method returns or not, it seems like hard work (for little
    benefit) to try to decide whether a method which accepts a block *never*
    actually calls it.

    3. As you're undoubtedly aware, Ruby is so dynamic that you can't
    analyse a method in isolation anyway. You can decide that a bareword
    like 'foo' is a method call, but you don't know what that method will
    actually do when the program is run - it could be redefined dynamically,
    either within a class or on single objects (in their singleton class).

    # in file one
    class Foo
      def foo
        true
      end
      def bar
        yield 123 if foo # yields, obviously
      end
    end

    # in file two
    a = Foo.new
    def a.foo; false; end
    a.bar { |x| puts "I got #{x}" } # actually it doesn't

    That's an admittedly contrived example, but dynamic method definition
    occurs quite a lot in real applications, e.g. web frameworks like Rails.

    Regards,

    Brian.

    --
    Posted via http://www.ruby-forum.com/.
    Brian Candler, Apr 17, 2011
    #2

  3. Re: To Yield or Not to Yield: An Inferable Question

    Hi Brian – Thanks for sharing your thoughts! Since much of what you note
    is commonly accepted about Ruby, I am happy that my research is
    subtle enough to warrant such discussion (and interesting enough to get
    an e-mail or two!)

    If you don't mind, I'd like to write a blog post sharing your concerns
    (anonymized, naturally) and my responses. Would that be okay?

    > Points I'd raise:
    >
    > 1. In my experience, very little real-world Ruby code uses
    > 'block_given?'. If it needs to yield, it just yields. I'd consider this
    > to be a case of duck-typing.


    This seems to suggest Rubyists rarely write methods that take blocks
    optionally. Of this, I am highly skeptical. Luckily, doing this work in my
    thesis will allow me to study statistics of how block_given? is used.

    > With yield you get a run-time error if no block was passed, but that's
    > only one of a much larger set of method call errors (such as calling a
    > method with argument of the wrong type).


    Correct, but my work intends to show that improper block use, when using yield,
    is a far more easily determined method call error in Ruby than a type error.
    What makes this research fascinating is that Ruby is rich enough to allow for
    such nuance! For a typical language which permits closures as arguments,
    one must use careful alias analysis and escape analysis. `yield` as syntactic
    sugar makes it a much simpler case to analyze, which is why I tackled it first.

    > Consider also that very little code tests 'a.respond_to? :foo' before
    > calling 'a.foo'.


    This does not reflect the intent of this analysis - please see below.

    > 2. If a method uses &blk or Proc.new or yield, I'd say it's fairly safe
    > to assume that the block *may* be called (at least from the point of
    > view of automated documentation). Since it's unprovable in general even
    > whether the method returns or not, it seems like hard work (for little
    > benefit) to try to decide whether


    Nearly every non-trivial property of a program's behavior is undecidable in
    general - see Rice's Theorem. [1] Luckily, compiler writers and PL theorists
    have been studying forms of analysis for decades to try to get around
    this and discover the patterns that we know can be analyzed. To address
    your example, of course termination is unprovable in general, but the class
    of functions for which termination is provable includes many, many
    real-world functions. [2] [3]

    > a method which accepts a block *never* actually calls it.



    Here's why this issue is worth tackling: ALL methods accept a block,
    and no matter how trivial, no tools will tell you that passing a block to
    that method was foolish, let alone statically:

    2.+(4) { |x, y| x ** y } #=> 6

    Additionally, no tool can tell you that a block is *required* by a method,
    even if it is obvious:

    # No tool currently documents that a block is required here
    def tap
      yield self
    end

    My work does not try to determine each and every case which
    triggers a yield, but merely to develop a coarse classification system for
    a method based on its overall approach to blocks: required, optional,
    or ignored. As I showed in my blog post (and as I will prove in my thesis),
    this classification can be determined precisely when the result of
    `block_given?` is stored only in simple constants (this includes temporaries)
    when `yield` is used.
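
    To make the three categories concrete, here is a deliberately naive
    sketch. It is not Laser's analysis (which relies on constant propagation);
    it only inspects tokens using Ruby's bundled Ripper lexer, and the
    name classify_block_use is hypothetical. It ignores &blk parameters,
    Proc.new, rescue clauses and control flow entirely.

```ruby
require "ripper"

# Naive, token-only classification of a method's approach to blocks.
# Assumption: a `yield` with no `block_given?` anywhere in the method
# means the block is required. Real analysis needs control flow.
def classify_block_use(method_source)
  tokens = Ripper.lex(method_source)

  # Each token is [[line, column], type, text, state]; comments and
  # string contents get their own token types, so they are not counted.
  uses_yield = tokens.any? { |_pos, type, text| type == :on_kw && text == "yield" }
  checks_block = tokens.any? { |_pos, type, text| type == :on_ident && text == "block_given?" }

  if !uses_yield
    :ignored
  elsif checks_block
    :optional
  else
    :required
  end
end
```

    For example, a method body of `yield self` would be classified as
    :required, `yield x if block_given?` as :optional, and a method with no
    yield at all as :ignored.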

    If one peruses the Ruby standard library, one will find that just in the Ruby
    code alone, block_given? occurs 265 times, in *every single case* is used
    to execute yield conditionally, and in every single case, the result is used
    only as a simple constant. [4]
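
    The count itself is easy to reproduce without ack. A rough sketch in Ruby,
    assuming a plain textual match is good enough (like ack, it also counts
    occurrences inside comments and strings); the helper names are
    hypothetical:

```ruby
# Count textual occurrences of block_given? in one source string.
def count_block_given(source)
  source.scan(/\bblock_given\?/).size
end

# Sum the counts over every .rb file below a directory.
# Returns 0 when the directory has no Ruby files (or does not exist).
def count_in_tree(root)
  Dir.glob(File.join(root, "**", "*.rb")).sum do |path|
    count_block_given(File.read(path))
  end
end
```

    Pointing count_in_tree at a checkout of the standard library's lib
    directory gives the total; the figure will differ between Ruby versions.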

    >
    > 3. As you're undoubtedly aware, Ruby is so dynamic that you can't
    > analyse a method in isolation anyway. You can decide that a bareword
    > like 'foo' is a method call, but you don't know what that method will
    > actually do when the program is run - it could be redefined dynamically,
    > either within a class or on single objects (in their singleton class).
    >


    Yes, this is one of the difficulties inherent in statically analyzing a dynamic
    language. Luckily, Laser does not analyze single methods; it works on
    a set of input files and traverses requires/loads by using constant propagation to
    handle changes to $LOAD_PATH and $LOADED_FEATURES. As you note,
    a naïve approach doesn't work, and having access to all input files is very
    important. There is code that will be very hard to handle: see SortedSet.setup's
    code as an example for which I haven't figured out an approach just yet.

    Dynamic method creation is, in my opinion, what challenges static analysis
    in Ruby the most. Naturally, in the general case, it makes all analysis impossible.
    What tool could figure out much about a program containing this code?

    def Object.inherited(klass)
      def klass.inherited(some_class)
        some_class.class_eval(gets)
      end
      klass.class_eval(gets)
    end

    My belief, whose validity my research hopes to support (but may ultimately
    reject, or land somewhere in the middle), is that such pathological code is less
    of an issue in real-world application code. I do not expect a library like
    RSpec, whose internals are full of dynamic magic, to get as much out of
    my research. This is the biggest challenge ahead of me. Luckily, existing
    work has seen success analyzing real-world code without even touching
    on this issue. [5]

    Thanks again for your interest! I hope my work continues to interest you
    as I continue over the coming months.

    References (sorry, I've only got Bibtex for some of these for now):

    [1] http://en.wikipedia.org/wiki/Rice's_theorem

    [2] @article{cook2006termination,
      title={{Termination proofs for systems code}},
      author={Cook, B. and Podelski, A. and Rybalchenko, A.},
      journal={ACM SIGPLAN Notices},
      volume={41},
      number={6},
      pages={415--426},
      issn={0362-1340},
      year={2006},
      publisher={ACM}
    }

    [3] @article{andreas6terminator,
      title={{Terminator: Beyond safety}},
      author={Andreas, R.C. and Cook, B. and Podelski, A. and Rybalchenko, A.},
      journal={In CAV'06, LNCS},
      volume={4144},
      pages={415--418}
    }

    [4] ack --ruby -c "block_given\\?" | grep -e ':[^0]$' | cut -d':' -f2 | awk '{s+=$1} END {print s}'
    gives the quantity, and using a context-ful grep is enough to see the usage patterns of
    each call. Almost every single call lies in an "if" or "unless" condition, or the condition of
    the ternary operator, and the result is not stored to a variable. lib/time.rb:264 has an example
    justifying my analysis of where block_given? is called once, its result stored in a variable,
    and then that variable is used as a constant to conditionally yield.

    [5] @article{ecstatic,
      title={{Ecstatic--Type Inference for Ruby Using the Cartesian Product Algorithm}},
      author={Kristensen, K.},
      journal={Master's thesis, Aalborg University},
      year={2007}
    }

    Michael Edgar

    http://carboni.ca/
    Michael Edgar, Apr 17, 2011
    #3
  4. Re: To Yield or Not to Yield: An Inferable Question

    On Apr 17, 2011, at 14:40 , Michael Edgar wrote:

    > Hi Brian – Thanks for sharing your thoughts! Since much of what you note
    > is commonly accepted about Ruby, I am happy that my research is
    > subtle enough to warrant such discussion (and interesting enough to get
    > an e-mail or two!)
    >
    > If you don't mind, I'd like to write a blog post sharing your concerns
    > (anonymized, naturally) and my responses. Would that be okay?
    >
    >> Points I'd raise:
    >>
    >> 1. In my experience, very little real-world Ruby code uses
    >> 'block_given?'. If it needs to yield, it just yields. I'd consider this
    >> to be a case of duck-typing.
    >
    > This seems to suggest Rubyists rarely write methods that take blocks
    > optionally. Of this, I am highly skeptical.


    You should be highly skeptical.

    From our seattle.rb projects:

    % ack -l block_given? */dev/{lib,test} | wc -l
    28

    And from my gauntlet setup:

    % ls | wc -l
    20245
    % find ~/.gauntlet -type f | xargs zgrep -l block_given? | wc -l
    4715

    So roughly 1 in 4 gems in my gauntlet downloads use block_given?

    I think that makes it clear that your work can provide a lot of insight.
    Ryan Davis, Apr 18, 2011
    #4
  5. On Thu, Apr 14, 2011 at 7:47 PM, Michael Edgar wrote:
    > My undergraduate thesis is focused on trying to answer interesting questions about Ruby
    > code statically. I've written a blog post about my first novel result: attempting to infer
    > whether a method uses `yield`, relative to the value of `block_given?`.
    >
    > http://carboni.ca/blog/p/To-Yield-or-Not-to-Yield-An-Inferable-Question


    I only get 404 for that link (even with ".html" appended). Is this
    the proper link?

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Apr 18, 2011
    #5
  6. It is the correct link, we had some unscheduled downtime. My apologies. Give it
    a couple minutes for the unicorns to kick in.

    Michael Edgar

    http://carboni.ca/

    On Apr 18, 2011, at 5:34 AM, Robert Klemme wrote:

    > On Thu, Apr 14, 2011 at 7:47 PM, Michael Edgar wrote:
    >> My undergraduate thesis is focused on trying to answer interesting questions about Ruby
    >> code statically. I've written a blog post about my first novel result: attempting to infer
    >> whether a method uses `yield`, relative to the value of `block_given?`.
    >>
    >> http://carboni.ca/blog/p/To-Yield-or-Not-to-Yield-An-Inferable-Question
    >
    > I only get 404 for that link (even with ".html" appended). Is this
    > the proper link?
    >
    > Kind regards
    >
    > robert
    >
    > --
    > remember.guy do |as, often| as.you_can - without end
    > http://blog.rubybestpractices.com/
    Michael Edgar, Apr 18, 2011
    #6
  7. On Mon, Apr 18, 2011 at 11:46 AM, Michael Edgar wrote:
    > It is the correct link, we had some unscheduled downtime. My apologies. Give it
    > a couple minutes for the unicorns to kick in.


    Now I see it. Thanks! This looks interesting and I think this is
    something to muse about further. On first glance I only noticed the
    complete absence of another case of "optional block" apart from calls
    guarded by block_given? or tests for &b parameter to be non nil:
    caught exceptions

    irb(main):003:0> def foo
    irb(main):004:1> yield
    irb(main):005:1> rescue LocalJumpError
    irb(main):006:1> end
    => nil
    irb(main):007:0> foo
    => nil
    irb(main):008:0> foo { puts "called" }
    called
    => nil

    irb(main):009:0> LocalJumpError.ancestors
    => [LocalJumpError, StandardError, Exception, Object, Kernel, BasicObject]

    irb(main):010:0> def bar; yield rescue LocalJumpError;end
    => nil
    irb(main):011:0> bar
    => LocalJumpError
    irb(main):012:0> bar { puts "called" }
    called
    => nil

    That will also be tricky since there are multiple exceptions that can
    be caught to make the failed call "disappear" plus you can have
    arbitrary nesting of begin - rescue - end blocks which can depend on
    each other in bad ways (although these are more on the side of
    pathological code).

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Apr 18, 2011
    #7
  8. Robert,

    Excellent, and thank you! It seems I had let myself be tricked by the
    more common use of "block_given?" into forgetting the actual semantics of
    a failed raise. My current analysis is equivalent to just skipping the
    exception handling part, and assuming the exception isn't handled. In
    fact, I realize now that I should not even have a special "yield"
    instruction, but instead lower it even further to something roughly like
    this:

    %temp = HiddenAnalyzerMagic.current_block_if_any
    if %temp
      %temp.call(...)
    else
      raise LocalJumpError.new('no block given (yield)', __FILE__, __LINE__)
    end

    In the "failed yield" case, constant propagation would take over, prune
    the if-true "%temp.call(...)" branch above, and then the only remaining
    branch does a raise. Raise just sets $! to the LocalJumpError constant
    and unconditionally jumps to a copy of the rescue handler. $! is
    read-only, so constant propagation works on it too, even though it's a
    global.
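
    The same lowering can be written in plain Ruby, as a sketch: capture the
    block explicitly and branch on whether it is nil. The names lowered_yield
    and tap_like are hypothetical and exist only for illustration; the
    analyzer's real intermediate form is not shown in this thread.

```ruby
# Behaves like `yield(*args)` for a method whose block was captured
# explicitly: call the block if there is one, otherwise raise the
# same error class a bare `yield` raises.
def lowered_yield(block, *args)
  if block
    block.call(*args)
  else
    raise LocalJumpError, "no block given (yield)"
  end
end

# A tap-like method written against the lowered form.
def tap_like(value, &block)
  lowered_yield(block, value)
  value
end
```

    Calling tap_like(3) { |v| p v } runs the block and returns 3, while
    tap_like(3) with no block raises LocalJumpError. With the branch made
    explicit, "block required" becomes a question about which branch is
    reachable.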

    A rescue handler is just a bunch of #=== calls and jumps, so a
    *sane* rescue handler which has constants in its rescue clause is
    actually just a bunch of (some_constant === $!) calls and branches
    on the result. Constant propagation can handle that, eliminating any
    rescue handlers that fail to match the LocalJumpError. If the exception
    is caught and not re-raised, the only path left in the rescue handler
    will lead out of the handler: it's optional! If it fails to be caught
    (either always or sometimes), there will be a path left to the next
    exception handler (or the Exit), via an error-path, and if that LJE
    continues to have an uncaught path until the Exit, then the yield is a
    required one.
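
    A small sketch of that matching step, under two loud assumptions: every
    rescue clause lists only class constants, and a matching handler never
    re-raises. Both helper names are hypothetical.

```ruby
# A rescue clause matches when any listed constant === the exception,
# which is the same test Ruby performs for `rescue A, B`.
def rescue_matches?(rescue_constants, error)
  rescue_constants.any? { |constant| constant === error }
end

# Classify an unguarded `yield` wrapped in a single rescue clause.
# Assumption: a matching handler swallows the error (no re-raise).
def classify_bare_yield(rescue_constants)
  failed_yield = LocalJumpError.new("no block given (yield)")
  rescue_matches?(rescue_constants, failed_yield) ? :optional : :required
end
```

    With this, classify_bare_yield([LocalJumpError]) and
    classify_bare_yield([StandardError]) both give :optional, because
    LocalJumpError descends from StandardError, while
    classify_bare_yield([ArgumentError]) gives :required.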

    Some graphviz graphs would help illustrate this, but I don't want to
    spam up the list too much with big PNGs.

    As usual, that's if you have constants and pure methods as your rescue
    handler. It all hopes you don't do something like this (Ruby 1.9 only):

    Handler = Object.new
    def Handler.===(other)
      # analyzer definitely not smart enough to know rand(10) always < 10
      other.message.size > rand(10)
    end

    def foo
      begin
        yield
      rescue Handler
        # always caught, because the LJE message is longer than 10 chars.
      end
    end

    foo is block-optional, but the analyzer as implemented would say it is
    block-required, as it can't prove the exception is always caught. Ouch.

    Michael Edgar

    http://carboni.ca/

    On Apr 18, 2011, at 7:16 AM, Robert Klemme wrote:

    > Now I see it. Thanks! This looks interesting and I think this is
    > something to muse about further. On first glance I only noticed the
    > complete absence of another case of "optional block" apart from calls
    > guarded by block_given? or tests for &b parameter to be non nil:
    > caught exceptions
    Michael Edgar, Apr 18, 2011
    #8
  9. Re: To Yield or Not to Yield: An Inferable Question

    Michael Edgar wrote in post #993395:
    >> 1. In my experience, very little real-world Ruby code uses
    >> 'block_given?'. If it needs to yield, it just yields. I'd consider this
    >> to be a case of duck-typing.

    >
    > This seems to suggest Rubyists rarely write methods that take blocks
    > optionally. Of this, I am highly skeptical.


    Ah, by "optionally" I think you mean "does one thing when a block is
    given, but something else when a block is not given". Now I think some
    more, there is a fairly common case:

    class MyFile
      def self.open(*args)
        file = open_it(*args)
        if block_given?
          begin
            yield file
          ensure
            file.close
          end
        else
          return file
        end
      end
    end

    Code analysis can tell you that it's OK to call the method either with
    or without a block (at least assuming no pathological use cases, like
    redefining 'block_given?')

    Regards,

    Brian.

    --
    Posted via http://www.ruby-forum.com/.
    Brian Candler, Apr 18, 2011
    #9
  10. Re: To Yield or Not to Yield: An Inferable Question

    Michael Edgar wrote in post #993395:
    > If one peruses the Ruby standard library, one will find that just in the
    > Ruby
    > code alone, block_given? occurs 265 times, in *every single case* is
    > used
    > to execute yield conditionally, and in every single case, the result is
    > used
    > only as a simple constant. [4]


    However there are some cases where this is done unnecessarily,
    net/telnet.rb being the prime example. e.g.

    if block_given?
      waitfor({"Prompt" => match, "Timeout" => time_out}){|c| yield c }
    else
      waitfor({"Prompt" => match, "Timeout" => time_out})
    end

    could have been written simply as:

    waitfor({"Prompt" => match, "Timeout" => time_out}, &blk)
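
    A self-contained sketch of why the two forms behave the same, assuming
    the enclosing method declares an explicit &blk parameter. Here
    waitfor_demo is a hypothetical stand-in and not Net::Telnet's actual
    method:

```ruby
# Stand-in for a method that yields only when a block was given.
def waitfor_demo(options)
  yield options["Prompt"] if block_given?
  :done
end

# The conditional style: re-wrap the caller's block by hand.
def cmd_with_conditional(match)
  if block_given?
    waitfor_demo("Prompt" => match) { |c| yield c }
  else
    waitfor_demo("Prompt" => match)
  end
end

# The forwarding style: pass the block (or nil) straight through.
# Passing &nil is the same as passing no block at all.
def cmd_with_forwarding(match, &blk)
  waitfor_demo("Prompt" => match, &blk)
end
```

    Both versions call the caller's block once with the prompt when a block
    is supplied, and both return :done either way.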

    Net::Telnet also has a load of conditionals because it lets you pass an
    optional block to each call for capturing debug information - an awkward
    API to use, because often you end up passing the same block every time.
    It would have been much easier to pass this in the options hash where it
    could have been set as a default.

    e.g.

    t = Net::Telnet.new("Debug" => lambda { |c| print c }, ...)
    t.cmd("foo")
    t.cmd("bar")
    t.cmd("baz")

    whereas at the moment you have to write

    t = Net::Telnet.new(...)
    out = lambda { |c| print c }
    t.cmd("foo",&out)
    t.cmd("bar",&out)
    t.cmd("baz",&out)

    Also, a Debug parameter could invoke the "<<" method instead of "call",
    which would make it usable with Files and Strings. Then Proc#<< could be
    aliased to call, and duck-typing would suddenly become a lot prettier.
    There would also be no need for Enumerator::Yielder either.
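
    A sketch of the "<<" idea, with one caveat: since Ruby 2.6, Proc#<< is
    defined as function composition, so aliasing it to call would clash with
    the core method. The sketch therefore wraps the block in a small adapter
    instead. BlockSink and emit_debug are hypothetical names.

```ruby
# Adapter that gives a block the same "<<" interface as String and Array.
class BlockSink
  def initialize(&block)
    @block = block
  end

  # Hand the chunk to the block and return self so calls can be chained.
  def <<(chunk)
    @block.call(chunk)
    self
  end
end

# Any object responding to "<<" can receive the debug output.
def emit_debug(sink)
  sink << "connecting\n" << "connected\n"
end
```

    The same emit_debug call then works with a String buffer, an Array, an
    open File, or a BlockSink wrapping a block.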

    Sorry, I've strayed right off there :)

    --
    Posted via http://www.ruby-forum.com/.
    Brian Candler, Apr 20, 2011
    #10
  11. Re: To Yield or Not to Yield: An Inferable Question

    On Apr 20, 2011, at 4:59 AM, Brian Candler wrote:

    > However there are some cases where this is done unnecessarily,
    > net/telnet.rb being the prime example. e.g.
    >
    > if block_given?
    >   waitfor({"Prompt" => match, "Timeout" => time_out}){|c| yield c }
    > else
    >   waitfor({"Prompt" => match, "Timeout" => time_out})
    > end
    >
    > could have been written simply as:
    >
    > waitfor({"Prompt" => match, "Timeout" => time_out}, &blk)


    I don't know much about the Telnet library, so I'll comment only on how
    this affects analysis.

    If you rewrote the code you provided as you suggested, the question of
    whether the block is used then simply depends on whether `waitfor` calls
    it. No matter how pathologically you write that method, if you introduce
    the current block as a variable (either via Proc::new or as an explicit
    block argument, or ...) an analyzer should assume that `waitfor(...,
    &blk)` may refer to the currently active block, unless it can prove
    otherwise.

    So the question becomes: how are blocks used by Net::Telnet#waitfor, and
    all overrides of #waitfor by subclasses which in turn do not override
    #cmd without invoking super? In other words, resolve the call to
    #waitfor, and recursively analyze the yield behavior of all possible
    targets of that method call. If analysis worked on one method, it will
    work on #waitfor! Indeed, the only definition of #waitfor I could find
    in the standard library has only two calls to yield:

    yield buf if block_given? # telnet.rb:594

    and

    yield nil if block_given? # telnet.rb:599

    The hard part is "resolve the call to #waitfor". My belief is that
    method resolution is undecidable in Ruby, though I haven't proven it
    just yet. The compiler writers live with this fact and haven't yet gone
    nuts, for which we owe them our sincerest gratitude. But in designing a
    linter, one is permitted to occasionally take shortcuts, perhaps even
    opinionated ones! While I must do my best to accommodate dynamic
    behavior, I personally have no issue with giving an incorrect analysis
    if you are nondeterministically creating a subclass of Net::Telnet and
    overriding methods.

    Somewhere down the line, it may be reasonable to turn off certain
    optimistic assumptions such as "by the time I analyze this method, I
    have seen definitions (using `def`, `eval(constant_string)`,
    `define_method(constant)`, ...) of all possible methods it may call."
    For now though, purely conservative inference is not yet my focus.

    Michael Edgar

    http://carboni.ca/
    Michael Edgar, Apr 20, 2011
    #11
  12. Re: To Yield or Not to Yield: An Inferable Question

    You are of course right in your analysis.

    My point is more that using "block_given?" in itself is API smell. It
    means you have one method which can be called in two different ways,
    with two different behaviours.

    The other main example in the core library (1.8.7+) is methods like
    'each' which return an Enumerator if you don't pass a block. I don't
    like that either.
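
    For readers unfamiliar with the pattern under discussion, a minimal
    sketch of an 'each' that returns an Enumerator when called without a
    block. The Countdown class is hypothetical; to_enum is the standard
    Object method.

```ruby
class Countdown
  def initialize(from)
    @from = from
  end

  # With a block: yield from..1. Without one: return an Enumerator
  # that will call this same method later, once a block is available.
  def each
    return to_enum(:each) unless block_given?

    @from.downto(1) { |n| yield n }
    self
  end
end
```

    So Countdown.new(3).each { |n| ... } iterates directly, while
    Countdown.new(3).each.to_a collects the values through the Enumerator.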

    --
    Posted via http://www.ruby-forum.com/.
    Brian Candler, Apr 21, 2011
    #12
  13. Re: To Yield or Not to Yield: An Inferable Question

    On Thu, Apr 21, 2011 at 11:47 AM, Brian Candler wrote:
    > You are of course right in your analysis.
    >
    > My point is more that using "block_given?" in itself is API smell. It
    > means you have one method which can be called in two different ways,
    > with two different behaviours.


    The standard and core libraries are full of those (File.open,
    Enumerable methods...).

    > The other main example in the core library (1.8.7+) is methods like
    > 'each' which return an Enumerator if you don't pass a block. I don't
    > like that either.


    I find that utterly convenient. For generating a series of values I
    often use something like

    17.times.map { ... }
    42.times.to_a

    I find that very elegant. Brian, we are (or rather: Ruby is) not
    losing you, are we? That would be sad.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Apr 21, 2011
    #13
  14. Re: To Yield or Not to Yield: An Inferable Question

    Robert K. wrote in post #994268:
    > The standard and core libraries are full of those (File.open,


    I'd forgotten about File.open with a block. That *is* good.

    > Brian, we are (or rather: Ruby is) not
    > losing you, are we?


    Only when 1.8 is end-of-life :)

    Regards,

    Brian.

    --
    Posted via http://www.ruby-forum.com/.
    Brian Candler, Apr 21, 2011
    #14