How to strip Ruby comments from a line of Ruby code?

Alexandre Mutel

Short description: do you know of any existing method that, given a
string containing a line of Ruby code, removes the comments from that
line?

________________

Long description:

For my DSL project, I load my DSL files and apply a small
preprocessing step to each line before performing a global instance_eval
on the preprocessed file.

Basically, in my DSL, it is possible to put a label followed by
a ":" at the beginning of a line, like this:
my_label: here_is_a_dsl(arg1, arg2)

This label may be followed by a DSL instruction.

The preprocessor transforms the previous line into this line:

newLabel(:my_label) { here_is_a_dsl(arg1, arg2) }

using the following code:
append = ""
File.foreach(file) do |line|
  # \w* (rather than \w+) so that single-character labels also match
  match = line.match(/^([a-zA-Z_]\w*):\s+(.*)/)
  if match.nil?
    append += line
  else
    append += "newLabel(:#{match[1]}) { #{match[2]} }\n"
  end
end

The problem arises when there is a comment at the end of the input line:
my_label: here_is_a_dsl(arg1, arg2) # my comments

It then generates the following line:
newLabel(:my_label) { here_is_a_dsl(arg1, arg2) # my comments }

This means the closing "}" of the block is commented out, which causes a
parse error for the whole file.
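
A quick way to see the failure, sketched under the assumption that the generated call has the form newLabel(:my_label) { ... }. The helper name parses? is hypothetical; it wraps a snippet in a lambda so that Ruby only compiles it and never runs it, then reports whether the snippet is syntactically valid.

```ruby
# Hypothetical helper (not part of the original preprocessor):
# returns true when `src` is syntactically valid Ruby.
# The lambda wrapper means the code is compiled but never executed.
def parses?(src)
  eval("lambda {\n#{src}\n}")
  true
rescue SyntaxError
  false
end

one_line  = "newLabel(:my_label) { here_is_a_dsl(arg1, arg2) # my comments }"
two_lines = "newLabel(:my_label) { here_is_a_dsl(arg1, arg2) # my comments\n}"

puts parses?(one_line)   # false: the closing brace is swallowed by the comment
puts parses?(two_lines)  # true: the closing brace sits on its own line
```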

I could put a newline after the match, like this:
newLabel(:my_label) { here_is_a_dsl(arg1, arg2) # my comments
}

Unfortunately, I am then no longer able to debug my DSL, because the
line numbers of the original file no longer match those of the
preprocessed file.

-----

I would like something really simple, without being forced to use
a full Ruby parser to parse those lines and remove the
comments.

Any ideas?

Thanks!
 
Aldric Giacomoni

Alexandre said:
Short description : My question is : do you know any available method,
giving the string of a Ruby line of code, to remove comments from this
line of code?

I would like to have something really simple and not being forced to use
a full ruby language parser to parse those lines and remove the
comments.

Any idea?

Thanks!

I'm still only learning regular expressions (I'll do another shameless
plug for rubular.com here), but you could do this:
string = string.match(/^.*#/).to_s[0...-1]

Yes, it's a poor solution, but should you have nothing else, it'll do.
 
Brian Candler

Alexandre said:
I could put a newline after the match, like this:
newLabel(:my_label) { here_is_a_dsl(arg1, arg2) # my comments
}

Unfortunately, I am then no longer able to debug my DSL, because the
line numbers of the original file no longer match those of the
preprocessed file.

However if you eval each line individually, then you can pass in the
source line number.

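def foo; end
src = "foo\nfoo\nbar"
# String#each_with_index only exists on Ruby 1.8;
# each_line.with_index also works on later versions
src.each_line.with_index do |line,i|
  eval "#{line} {\n}", binding, "DSL", i+1
end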

# Result:
DSL:3: undefined method `bar' for main:Object (NoMethodError)

Otherwise, if every input line maps to exactly two output lines, you can
just patch up the line number in the exception by dividing by two.

src = "foo\nfoo\nbar\n"
begin
  eval src.gsub(/\n/, "{\n}\n"), binding, "DSL", 1
rescue => e
  if e.backtrace.first =~ /\A(.*):(\d+)\z/
    e.backtrace.first.replace "#{$1}:#{($2.to_i+1) / 2}"
  end
  raise e
end
 
Alexandre Mutel

Aldric said:
I'm still only learning regular expressions (I'll do another shameless
plug for rubular.com here), but you could do this:
string = string.match(/^.*#/).to_s[0...-1]

Yes, it's a poor solution, but should you have nothing else, it'll do.

The problem with your solution is that it will remove valid code from a
line such as:
myvar_s = "#{myvar}"

The difficulty is handling string escape sequences correctly... It's
possible, but it requires much more work... I just want to know whether
someone else has already done this.
 
Alexandre Mutel

Brian said:
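def foo; end
src = "foo\nfoo\nbar"
# String#each_with_index only exists on Ruby 1.8;
# each_line.with_index also works on later versions
src.each_line.with_index do |line,i|
  eval "#{line} {\n}", binding, "DSL", i+1
end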

# Result:
DSL:3: undefined method `bar' for main:Object (NoMethodError)

Wooo, thanks Brian!
 
Aldric Giacomoni

Alexandre said:
Aldric said:
I'm still only learning regular expressions (I'll do another shameless
plug for rubular.com here), but you could do this:
string = string.match(/^.*#/).to_s[0...-1]

Yes, it's a poor solution, but should you have nothing else, it'll do.

the problem with your solution is that this line of code will remove
valid code :
myvar_s = "#{myvar}"

The problem is to handle correctly string escape sequence... it's
possible, but it requires much more work... I just want to know if
someone else did this?!

Actually, no, because regexps are greedy by default, so it'll go to the
very last '#' it finds.
The other solution you got is more elegant, though.. :)
 
Alexandre Mutel

Alexandre said:
Wooo, thanks Brian!

Oops, I was too fast. In fact, I need an eval on the whole file, because
my DSL allows Ruby code to be used (and therefore method definitions,
etc.)
 
Alexandre Mutel

Aldric said:
Actually, no, because regexps are greedy by default, so it'll go to the
very last '#' it finds.
The other solution you got is more elegant, though.. :)

Hmm, I'm not sure greediness helps here:

line = "line = \"\#{args}\""
=> "line = \"\#{args}\""
string = line.match(/^.*#/).to_s[0...-1]
=> "line = "

Expected: line = "#{args}"

In order to strip comments using regexp, you need to handle string
escape.
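
One way to get this string awareness without writing a parser is to lean on a tokenizer. The sketch below assumes Ruby 1.9 or later, where the standard library ships Ripper. The helper name strip_comment is hypothetical, and because it lexes one line in isolation it cannot know whether that line sits inside a multi-line string or a heredoc.

```ruby
require "ripper"

# Hypothetical helper: returns `line` (passed without its trailing newline)
# with any trailing Ruby comment removed. Ripper.lex yields tokens shaped
# like [[lineno, column], type, text, state]; a '#' inside a string literal
# is never reported as an :on_comment token.
def strip_comment(line)
  comment = Ripper.lex(line).find { |tok| tok[1] == :on_comment }
  return line unless comment
  column = comment[0][1]                 # byte offset where the comment starts
  line.byteslice(0, column).rstrip
end

puts strip_comment('myvar_s = "#{myvar}"')                      # left untouched
puts strip_comment('here_is_a_dsl(arg1, arg2) # my comments')   # comment removed
```

Applied to match[2] in the preprocessor, this would let the closing brace stay on the same line.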
 
Aldric Giacomoni

Aldric said:
Alexandre said:
Aldric said:
I'm still only learning regular expressions (I'll do another shameless
plug for rubular.com here), but you could do this:
string = string.match(/^.*#/).to_s[0...-1]

Yes, it's a poor solution, but should you have nothing else, it'll do.

the problem with your solution is that this line of code will remove
valid code :
myvar_s = "#{myvar}"

The problem is to handle correctly string escape sequence... it's
possible, but it requires much more work... I just want to know if
someone else did this?!

Actually, no, because regexps are greedy by default, so it'll go to the
very last '#' it finds.

file # => array containing each line of the file you want to clean up
file.map! do |line|
  # keep the line unchanged when it contains no '#'
  line =~ /(^.*)#/ ? $1 : line
end
 
Aldric Giacomoni

Alexandre said:
Expected: line = "#{args}"

In order to strip comments using regexp, you need to handle string
escape.

Ah.. What if the only '#' isn't a comment. Good point.
 
Marnen Laibow-Koser

Aldric said:
Alexandre said:
Short description : My question is : do you know any available method,
giving the string of a Ruby line of code, to remove comments from this
line of code?

I would like to have something really simple and not being forced to use
a full ruby language parser to parse those lines and remove the
comments.

Any idea?

Thanks!

I'm still only learning regular expressions (I'll do another shameless
plug for rubular.com here), but you could do this:
string = string.match(/^.*#/).to_s[0...-1]

Yes, it's a poor solution, but should you have nothing else, it'll do.

It's not possible to do this reliably with regular expressions, because
of the interaction of # with quoting constructs. You'll need a parser
(Treetop can help make one).


Best,
 
Brian Candler

Alexandre said:
Oops, I was too fast. In fact, I need an eval on the whole file, because
my DSL allows Ruby code to be used (and therefore method definitions,
etc.)

Then it sounds like you just need to separate the blocks of code
appropriately. Do you want each line which begins with \w: (a labelled
line) to be treated specially? Then the rest of the code between the
labelled lines can be treated as a single string.

Proof-of-concept:

src = <<EOS
def foo
puts "XXX"
end
label1: foo # this is a test
def bar
puts "YYY"
end
label2: bar
EOS

def label(name)
  puts "Executing label #{name} now..."
  yield
end

b = binding
line = 1
src.split(/^(\w+:.*)\n/).each do |chunk|
  if chunk =~ /(\w+):(.*)$/
    eval "label(#{$1.inspect}) { #{$2}\n }", b, "DSL", line
    line += 1
  else
    eval chunk, b, "DSL", line
    line += chunk.split("\n").size
  end
end
 
Alexandre Mutel

Brian said:
Then it sounds like you just need to separate the blocks of code
appropriately. Do you want each line which begins with \w: (a labelled
line) to be treated specially? Then the rest of the code between the
labelled lines can be treated as a single string.

b = binding
line = 1
src.split(/^(\w+:.*)\n/).each do |chunk|
if chunk =~ /(\w+):(.*)$/
eval "label(#{$1.inspect}) { #{$2}\n }", b, "DSL", line
line += 1
else
eval chunk, b, "DSL", line
line += chunk.split("\n").size
end
end

Damn, your solution almost worked, but when working on an external
file, the "eval" loses the step in the code and I'm not able to get back
to debugging the DSL...
OK, I'll probably forget about this option for now... I'll see
later on how to do it.

Thanks again Brian.
 
Brian Candler

Alexandre said:
Damn, your solution almost worked, but when working on an external
file, the "eval" loses the step in the code

What do you mean by "loses the step" - it's reporting the wrong line
number? I hacked together that code very quickly, and I'm sure it's
fixable. Here is a more verbose version that is more likely to have the
correct line number.

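buf = nil
buf_line = 0
b = binding
# String#each_with_index only exists on Ruby 1.8;
# each_line.with_index also works on later versions
src.each_line.with_index do |line,i|
  if line =~ /^(\w+):(.*)\n/
    label, code = $1, $2
    # flush any plain Ruby code collected before this labelled line
    if buf
      eval buf, b, "DSL", buf_line+1
      buf = nil
    end
    eval "label(#{label.inspect}) { #{code}\n}", b, "DSL", i+1
  else
    unless buf
      buf = ""
      buf_line = i
    end
    buf << line
  end
end
if buf
  eval buf, b, "DSL", buf_line+1
  buf = nil
end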
 
Alexandre Mutel

Brian said:
What do you mean by "loses the step" - it's reporting the wrong line
number? I hacked together that code very quickly, and I'm sure it's
fixable. Here is a more verbose version that is more likely to have the
correct line number.

I mean that before the eval of a chunk I'm still in the DSL code, but
after I press F8 the debugger goes back to the line just
after the eval (line += chunk.split("\n").size), although I didn't set
any breakpoint there... It's weird, and after that I'm not able to get
back and step into the DSL code (even if I put some breakpoints in it).
I don't know whether it's a bug or a limitation of my debugger (I'm using
RubyMine), or whether I'm missing something...
 
Brian Candler

Oh I see - eval doesn't work with a ruby debugger. I guess the debugger
is assuming that the line number in the exception backtrace is an offset
from the start of the eval string, which it isn't here.

I did think of another and simpler solution for you though. When you
insert a newline and close-brace, add a semicolon and not another
newline. e.g.

n: foo(bar) # comment
nextline

becomes:

label(:n) { foo(bar) # comment
}; nextline

How would that be?
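
As a rough sketch of that idea, assuming the label call has the form label(:name) { ... } (the method name preprocess and the label pattern are hypothetical): the closing brace of each labelled line is pushed onto the start of the next physical line, so the output keeps the same line count as the input unless the very last line is a label.

```ruby
# Hypothetical preprocessor: rewrites "name: code" lines into
# "label(:name) { code", and closes the block with "}; " at the start of the
# following line so that line numbers stay aligned with the source.
def preprocess(source)
  label_line = /\A([a-zA-Z_]\w*):[ \t]+(.*)\z/   # assumed label syntax
  out = []
  close_pending = false   # true when the previous line opened a label block
  source.each_line do |raw|
    line = raw.chomp
    prefix = close_pending ? "}; " : ""
    if (m = label_line.match(line))
      out << "#{prefix}label(:#{m[1]}) { #{m[2]}"
      close_pending = true
    else
      out << "#{prefix}#{line}"
      close_pending = false
    end
  end
  out << "}" if close_pending   # the file ended on a labelled line
  out.join("\n") + "\n"
end

puts preprocess("n: foo(bar) # comment\nnextline\n")
```

Because the comment still ends at the newline, nothing has to be stripped from the labelled line itself.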
 
Alexandre Mutel

Brian said:
n: foo(bar) # comment
nextline

becomes:

label(:n) { foo(bar) # comment
}; nextline

How would that be?

YES! It seems to work perfectly... The ; doesn't alter the line counting
for the debugger. In fact, I tried this solution this morning without
the semicolon... but yep, it makes sense with the semicolon now!

Thanks very much Brian, this is helping me a lot.
 
