How to strip Ruby comments from a line of Ruby code?

Discussion in 'Ruby' started by Alexandre Mutel, Nov 19, 2009.

  1. Short description: do you know of any existing method that, given a
    line of Ruby code as a string, removes the comments from that line?

    ________________

    Long description:

    For my DSL project, I load my DSL files and apply a small preprocessing
    step to each line before running a single instance_eval on the whole
    preprocessed file.

    Basically, in my DSL, a line may start with a label followed by a ":",
    like this:
    my_label: here_is_a_dsl(arg1, arg2)

    The label may be followed by a DSL instruction.

    The preprocessor transforms the previous line into this one:

    newLabel(:my_label) { here_is_a_dsl(arg1, arg2) }

    using the following code:
    append = ""
    File.open(file).each do |line|
      match = line.match(/^([a-zA-Z_]\w+):[\s\r\n]+(.*)/)
      if match.nil?
        append += line
      else
        append += "newLabel(:#{match[1]}) { #{match[2]} }\n"
      end
    end

    The problem arises when there is a comment at the end of the input line:
    my_label: here_is_a_dsl(arg1, arg2) # my comments

    It then generates the following line:
    newLabel(:my_label) { here_is_a_dsl(arg1, arg2) # my comments }

    This means the closing "}" of the block is commented out, which causes a
    parse error for the whole file.

    I could put a newline after the match, like this:
    newLabel(:my_label) { here_is_a_dsl(arg1, arg2) # my comments
    }

    Unfortunately, I can then no longer debug my DSL, because the line
    numbers of the source no longer match those of the preprocessed file.

    -----

    I would like something really simple, without being forced to use a full
    Ruby parser to parse those lines and remove the comments.

    Any ideas?

    Thanks!
    --
    Posted via http://www.ruby-forum.com/.
    Alexandre Mutel, Nov 19, 2009
    #1

  2. Alexandre Mutel wrote:
    > Short description : My question is : do you know any available method,
    > giving the string of a Ruby line of code, to remove comments from this
    > line of code?
    >
    > I would like to have something really simple and not being forced to use
    > a full ruby language parser to parse those lines and remove the
    > comments.
    >
    > Any idea?
    >
    > Thanks!


    I'm still only learning regular expressions (I'll do another shameless
    plug for rubular.com here), but you could do this:
    string = string.match(/^.*#/).to_s[0...-1]

    Yes, it's a poor solution, but should you have nothing else, it'll do.
    Aldric Giacomoni, Nov 19, 2009
    #2

  3. Alexandre Mutel wrote:
    > I could put a newline after the match like this :
    > newLabel(:my_label) { here_is_a_dsl(arg1, arg2) # my comments
    > }
    >
    > Unfortunately, I'm no longer able to debug my DSL, because the
    > lines are not matching the preprocessed line.


    However if you eval each line individually, then you can pass in the
    source line number.

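    def foo; end
    src = "foo\nfoo\nbar"
    # String#each_with_index only exists in Ruby 1.8; on 1.9+ iterate the
    # lines explicitly with each_line.with_index
    src.each_line.with_index do |line, i|
      eval "#{line.chomp} {\n}", binding, "DSL", i + 1
    end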

    # Result:
    DSL:3: undefined method `bar' for main:Object (NoMethodError)

    Otherwise, if every input line maps to exactly two output lines, you can
    just patch up the line number in the exception by dividing by two.

    src = "foo\nfoo\nbar\n"
    begin
      eval src.gsub(/\n/, "{\n}\n"), binding, "DSL", 1
    rescue => e
      if e.backtrace.first =~ /\A(.*):(\d+)\z/
        e.backtrace.first.replace "#{$1}:#{($2.to_i+1) / 2}"
      end
      raise e
    end
    Brian Candler, Nov 19, 2009
    #3
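The effect of the file and line arguments to eval, as used in the post above, can be checked in isolation. This is only a minimal sketch and not code from the thread: the helper name reported_line and the method undefined_dsl_call are made up for illustration. It evals a two-line snippet with a chosen starting line and reads the reported line number back from the backtrace.

```ruby
# Hypothetical helper (not from the thread): evaluates +code+ as if it started
# at +first_line+ of a file called "DSL" and returns the line number that the
# resulting error reports, or nil if nothing was raised.
def reported_line(code, first_line)
  eval(code, binding, "DSL", first_line)
  nil
rescue NameError => e
  # NoMethodError is a subclass of NameError. The first backtrace entry for
  # code run through eval starts with "DSL:<line>".
  e.backtrace.first[/\ADSL:(\d+)/, 1].to_i
end

snippet = "ok = 1\nundefined_dsl_call()\n"
puts reported_line(snippet, 1)   # snippet numbered from line 1
puts reported_line(snippet, 10)  # same snippet, numbered from line 10
```

The failing call sits on the second line of the snippet, so the reported number should simply be the starting line plus one. That is the property the per-line eval relies on.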
  4. Aldric Giacomoni wrote:
    > I'm still only learning regular expressions (I'll do another shameless
    > plug for rubular.com here), but you could do this:
    > string = string.match(/^.*#/).to_s[0...-1]
    >
    > Yes, it's a poor solution, but should you have nothing else, it'll do.


    The problem with your solution is that it will also remove valid code,
    for example:
    myvar_s = "#{myvar}"

    The hard part is handling strings and their escape sequences correctly.
    It's possible, but it requires much more work. I just want to know
    whether someone else has already done this.
    Alexandre Mutel, Nov 19, 2009
    #4
  5. Brian Candler wrote:
    > def foo; end
    > src = "foo\nfoo\nbar"
    > src.each_with_index do |line,i|
    > eval "#{line} {\n}", binding, "DSL", i+1
    > end
    >
    > # Result:
    > DSL:3: undefined method `bar' for main:Object (NoMethodError)


    Wooo, thanks Brian!
    Alexandre Mutel, Nov 19, 2009
    #5
  6. Alexandre Mutel wrote:
    > Aldric Giacomoni wrote:
    >> I'm still only learning regular expressions (I'll do another shameless
    >> plug for rubular.com here), but you could do this:
    >> string = string.match(/^.*#/).to_s[0...-1]
    >>
    >> Yes, it's a poor solution, but should you have nothing else, it'll do.

    >
    > the problem with your solution is that this line of code will remove
    > valid code :
    > myvar_s = "#{myvar}"
    >
    > The problem is to handle correctly string escape sequence... it's
    > possible, but it requires much more work... I just want to know if
    > someone else did this?!


    Actually, no, because regexps are greedy by default, so it'll go to the
    very last '#' it finds.
    The other solution you got is more elegant, though.. :)

    Aldric Giacomoni, Nov 19, 2009
    #6
  7. Alexandre Mutel wrote:
    > Brian Candler wrote:
    >> def foo; end
    >> src = "foo\nfoo\nbar"
    >> src.each_with_index do |line,i|
    >> eval "#{line} {\n}", binding, "DSL", i+1
    >> end
    >>
    >> # Result:
    >> DSL:3: undefined method `bar' for main:Object (NoMethodError)

    >
    > Wooo, thanks Brian!


    Oops, I was too fast. In fact, I need to eval the whole file, because my
    DSL allows arbitrary Ruby code to be used (and so method definitions,
    etc.)
    Alexandre Mutel, Nov 19, 2009
    #7
  8. Aldric Giacomoni wrote:

    > Actually, no, because regexps are greedy by default, so it'll go to the
    > very last '#' it finds.
    > The other solution you got is more elegant, though.. :)

    Hmm, I'm not sure greediness helps here:

    line = "line = \"\#{args}\""
    => "line = \"\#{args}\""
    string = line.match(/^.*#/).to_s[0...-1]
    => "line = \""

    Expected: line = "#{args}"

    To strip comments with a regexp, you need to handle string quoting and
    escapes.


    Alexandre Mutel, Nov 19, 2009
    #8
  9. Aldric Giacomoni wrote:
    > Alexandre Mutel wrote:
    >> Aldric Giacomoni wrote:
    >>> I'm still only learning regular expressions (I'll do another shameless
    >>> plug for rubular.com here), but you could do this:
    >>> string = string.match(/^.*#/).to_s[0...-1]
    >>>
    >>> Yes, it's a poor solution, but should you have nothing else, it'll do.

    >>
    >> the problem with your solution is that this line of code will remove
    >> valid code :
    >> myvar_s = "#{myvar}"
    >>
    >> The problem is to handle correctly string escape sequence... it's
    >> possible, but it requires much more work... I just want to know if
    >> someone else did this?!

    >
    > Actually, no, because regexps are greedy by default, so it'll go to the
    > very last '#' it finds.


    file # => array containing each line of the file you want to clean up
    file.map! do |line|
      # keep lines without a '#' unchanged instead of turning them into nil
      line =~ /(^.*)#/ ? $1 : line
    end
    Aldric Giacomoni, Nov 19, 2009
    #9
  10. Alexandre Mutel wrote:

    > Expecting is : line = "#{args}"
    >
    > In order to strip comments using regexp, you need to handle string
    > escape.


    Ah.. What if the only '#' isn't a comment. Good point.
    Aldric Giacomoni, Nov 19, 2009
    #10
  11. Aldric Giacomoni wrote:
    > Alexandre Mutel wrote:
    >> Short description : My question is : do you know any available method,
    >> giving the string of a Ruby line of code, to remove comments from this
    >> line of code?
    >>
    >> I would like to have something really simple and not being forced to use
    >> a full ruby language parser to parse those lines and remove the
    >> comments.
    >>
    >> Any idea?
    >>
    >> Thanks!

    >
    > I'm still only learning regular expressions (I'll do another shameless
    > plug for rubular.com here), but you could do this:
    > string = string.match(/^.*#/).to_s[0...-1]
    >
    > Yes, it's a poor solution, but should you have nothing else, it'll do.


    It's not possible to do this reliably with regular expressions, because
    of the interaction of # with quoting constructs. You'll need a parser
    (Treetop can help make one).


    Best,
    --
    Marnen Laibow-Koser
    http://www.marnen.org

    Marnen Laibow-Koser, Nov 19, 2009
    #11
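One concrete way to follow the "use a parser" advice without writing one: Ruby 1.9 and later ship a lexer, Ripper, in the standard library. The sketch below is an illustration under assumptions rather than anything proposed in the thread. strip_comment is a made-up helper name, and it expects the text it is given to be valid Ruby on its own (for example the instruction after the label, not the "my_label:" prefix).

```ruby
require "ripper"

# Hypothetical helper: removes comment tokens from a fragment of Ruby code.
# Ripper.lex returns one entry per token: [[line, col], type, text, ...].
# Trailing whitespace left in front of the removed comment is stripped too.
def strip_comment(code)
  Ripper.lex(code)
        .reject { |_pos, type, _text| type == :on_comment }
        .map { |_pos, _type, text| text }
        .join
        .rstrip
end

puts strip_comment('here_is_a_dsl(arg1, arg2) # my comments')
puts strip_comment('myvar_s = "#{myvar}"')
```

Because the lexer knows when a # sits inside a string literal or starts an interpolation, the second call should leave the line untouched, which is the case the greedy regexp gets wrong.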
  12. Alexandre Mutel wrote:
    > Woop, i was to fast. In fact, i need an eval on the whole file, because
    > my dsl language allow ruby code to be used (and so definition of
    > methods... etc.)


    Then it sounds like you just need to separate the blocks of code
    appropriately. Do you want each line which begins with \w: (a labelled
    line) to be treated specially? Then the rest of the code between the
    labelled lines can be treated as a single string.

    Proof-of-concept:

    src = <<EOS
    def foo
      puts "XXX"
    end
    label1: foo # this is a test
    def bar
      puts "YYY"
    end
    label2: bar
    EOS

    def label(name)
      puts "Executing label #{name} now..."
      yield
    end

    b = binding
    line = 1
    src.split(/^(\w+:.*)\n/).each do |chunk|
      if chunk =~ /(\w+):(.*)$/
        eval "label(#{$1.inspect}) { #{$2}\n }", b, "DSL", line
        line += 1
      else
        eval chunk, b, "DSL", line
        line += chunk.split("\n").size
      end
    end
    Brian Candler, Nov 19, 2009
    #12
  13. Brian Candler wrote:
    > Then it sounds like you just need to separate the blocks of code
    > appropriately. Do you want each line which begins with \w: (a labelled
    > line) to be treated specially? Then the rest of the code between the
    > labelled lines can be treated as a single string.
    >
    > b = binding
    > line = 1
    > src.split(/^(\w+:.*)\n/).each do |chunk|
    > if chunk =~ /(\w+):(.*)$/
    > eval "label(#{$1.inspect}) { #{$2}\n }", b, "DSL", line
    > line += 1
    > else
    > eval chunk, b, "DSL", line
    > line += chunk.split("\n").size
    > end
    > end


    Damn, your solution was almost working, but when working on an external
    file, the "eval" loose the step in the code and I'm not able to go back
    to debug the DSL...
    OK, I'm probably going to forget about this option for now... I'll see
    later on how to do it.

    Thanks again Brian.
    Alexandre Mutel, Nov 19, 2009
    #13
  14. Alexandre Mutel wrote:
    > Damn, your solution was almost working well, but working on an external
    > file, the "eval" loose the step in the code


    What do you mean by "loose the step" - it's reporting the wrong line
    number? I hacked together that code very quickly, and I'm sure it's
    fixable. Here is a more verbose version that is more likely to have the
    correct line number.

    buf = nil
    buf_line = 0
    b = binding
    # String#each_with_index only exists in Ruby 1.8; use each_line on 1.9+
    src.each_line.with_index do |line, i|
      if line =~ /^(\w+):(.*)\n/
        label, code = $1, $2
        if buf
          eval buf, b, "DSL", buf_line+1
          buf = nil
        end
        eval "label(#{label.inspect}) { #{code}\n}", b, "DSL", i+1
      else
        unless buf
          buf = ""
          buf_line = i
        end
        buf << line
      end
    end
    if buf
      eval buf, b, "DSL", buf_line+1
      buf = nil
    end
    Brian Candler, Nov 19, 2009
    #14
  15. Brian Candler wrote:
    > Alexandre Mutel wrote:
    >> Damn, your solution was almost working well, but working on an external
    >> file, the "eval" loose the step in the code

    >
    > What do you mean by "loose the step" - it's reporting the wrong line
    > number? I hacked together that code very quickly, and I'm sure it's
    > fixable. Here is a more verbose version that is more likely to have the
    > correct line number.


    I mean that before the eval of a chunk, I'm still in the DSL code, but
    after I press F8, the debugger goes back to the line just after the eval
    (line += chunk.split("\n").size), although I didn't set any breakpoint
    there. It's weird, and after that I can't get back to stepping through
    the DSL code (even if I set breakpoints in it).
    I don't know whether it's a bug or a limitation of my debugger (I'm using
    RubyMine), or whether I'm missing something...



    Alexandre Mutel, Nov 19, 2009
    #15
  16. Oh I see - eval doesn't work with a ruby debugger. I guess the debugger
    is assuming that the line number in the exception backtrace is an offset
    from the start of the eval string, which it isn't here.

    I did think of another and simpler solution for you though. When you
    insert a newline and close-brace, add a semicolon and not another
    newline. e.g.

    n: foo(bar) # comment
    nextline

    becomes:

    label(:n) { foo(bar) # comment
    }; nextline

    How would that be?
    Brian Candler, Nov 19, 2009
    #16
  17. Brian Candler wrote:
    > n: foo(bar) # comment
    > nextline
    >
    > becomes:
    >
    > label(:n) { foo(bar) # comment
    > }; nextline
    >
    > How would that be?

    YES! It seems to work perfectly... the ; doesn't alter the line counting
    for the debugger. In fact, I tried this solution this morning without
    the semicolon... but yep, with the semicolon it makes sense now!

    Thanks very much Brian, this is helping me a lot.

    Alexandre Mutel, Nov 19, 2009
    #17
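Putting the semicolon idea together with the original preprocessor, a sketch could look like the following. It is only an illustration under assumptions: the method name preprocess and the constant LABEL_RE are invented here, labels are assumed to start in column 0, and newLabel is the block-taking DSL method from the first post. The closing brace of each labelled line is emitted at the start of the next output line, followed by a semicolon, so every input line still maps to exactly one output line.

```ruby
# Assumed label syntax: an identifier in column 0, a colon, whitespace, then
# the DSL instruction. Unlike the regexp in the first post, this one also
# accepts one-letter labels such as "n:".
LABEL_RE = /\A([a-zA-Z_]\w*):\s+(.*)\z/

# Hypothetical preprocessor: wraps labelled lines in newLabel(:name) { ... }
# and closes the block on the following line, so a trailing comment cannot
# swallow the "}" and line numbers stay aligned with the source.
def preprocess(source)
  out = String.new
  pending_close = false
  source.each_line do |raw|
    line = raw.chomp
    prefix = pending_close ? "}; " : ""
    if (m = LABEL_RE.match(line))
      out << prefix << "newLabel(:#{m[1]}) { #{m[2]}\n"
      pending_close = true
    else
      out << prefix << line << "\n"
      pending_close = false
    end
  end
  # A label on the very last line still needs its block closed.
  out << "}\n" if pending_close
  out
end

puts preprocess("n: foo(bar) # comment\nnextline\n")
```

The only case where the output gains a line is a label on the last line of the file, which cannot shift any line number that matters for debugging.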