regexp help sought

rpardee · Feb 24, 2005

Hey All,

I'm trying to parse lines from my text editor's config file, which look
like this (pls watch for line wrap--there is one line per language,
starting with /L<<digit>>):

/L1"SAS" Line Comment = * Block Comment On = /* Block Comment Off = */ Block Comment On Alt = * Block Comment Off Alt = ; Nocase File Extensions = SAS
/L2"Visual Basic" Line Comment = ' File Extensions = BAS FRM CLS VBS CTL WSF
/L4"HTML" Nocase Noquote HTML_LANG Block Comment On =  Block Comment On Alt = <% Block Comment Off Alt = %> String Chars = "' File Extensions = HTM HTML ASP SHTML HTT HTX JSP
/L11"Ruby" Line Comment Num = 2# Block Comment On = =begin Block Comment Off = =end String Chars='" Escape Char = \ File Extensions = RB RBW

I'm trying to write a method for extracting the comment markers & their
types (line/block & on/off). Regexps seemed the obvious tool, and I
eventually came up with this one:

c = Regexp.new("(Line|Block) Comment (On |Off |On Alt |Off Alt)*= ([^\s\t\r\n\f]+) ")

This is working well so far, except that it only grabs out the first
type of comment in each line. I'd hoped that I could make it get all
the comment types by putting an additional set of parens and a +
quantifier around the whole expression:

c = Regexp.new("((Line|Block) Comment (On |Off |On Alt |Off Alt)*= ([^\s\t\r\n\f]+))+ ")

But that just seems to break it--that version doesn't capture anything.

Anybody got a clue for me? I'm using Ruby 1.8 on Windows. My code is
below. (And again, pls watch for line wrapping).

Thanks!

-Roy

def parse_comment_markers(line)
=begin

There are line comments & (2 different kinds of) block comments.

Line comments only have a start marker--EOL is the terminator.

Comment types are:
Line Comment = <>
Block Comment On = <>
Block Comment Off = <>
Block Comment On Alt = <>
Block Comment Off Alt = <>

Where <> can be any contiguous set of non-whitespace chars.

For Line comment marks, preceding digits specify the # of spaces minus 1
required after the nondigit portion of the marker. So for ruby, the line
comment mark is 2#, signifying that # is a comment only if it is followed
by a space. Ignore this for now.

So--funky regexp time. We want to grab sequences centered around the
string " Comment ". We want the single word prior to "Comment" and all
words between "Comment" and " = ", and then of course the contiguous
nonwhitespace following " = ".

=end
  puts line
  # Why doesn't the \S char class work?
  # c = Regexp.new("(Line|Block) Comment (On |Off |On Alt |Off Alt)*= (\S+)")
  c = Regexp.new("(Line|Block) Comment (On |Off |On Alt |Off Alt)*= ([^\s\t\r\n\f]+) ")
  cm = c.match(line)
  if cm.nil?
    puts "No match!"
  else
    puts cm.captures.join(" || ")
    puts "Comment type is \"" + cm.captures[0] + "\", and comment marker is \"" + cm.captures[2] + "\""
  end
end

parse_comment_markers("/L2 \"Ruby\" Line Comment = # Block Comment On = ' File Extensions = RB RBW")
parse_comment_markers("/L2 \"Ruby\" Block Comment On = =begin Block Comment Off = =end File Extensions = RB RBW")
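
On the "Why doesn't the \S char class work?" comment in the code: one likely cause is the double-quoted string. Ruby resolves backslash escapes in a double-quoted string before Regexp.new sees the text, so "\S" arrives as a plain "S", while "\s" becomes a literal space (which is why the longer character class still behaves). A minimal sketch of the difference; the constant and method names are made up for illustration:

```ruby
# Double quotes: Ruby rewrites the escape first, so "\S" is just "S".
DOUBLE_QUOTED = "= (\S+)"
# Single quotes: the backslash survives and reaches the regexp engine.
SINGLE_QUOTED = '= (\S+)'

SAMPLE = "Block Comment On = =begin Block Comment Off = =end"

# Hypothetical helper: build a regexp from source text and return the
# first capture group, or nil when nothing matches.
def first_marker(source, text)
  m = Regexp.new(source).match(text)
  m && m[1]
end

p DOUBLE_QUOTED                       # "= (S+)"
p first_marker(DOUBLE_QUOTED, SAMPLE) # nil
p first_marker(SINGLE_QUOTED, SAMPLE) # "=begin"
```

Writing the pattern as a regexp literal, /= (\S+)/, avoids the string-escaping step altogether.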

Robert Klemme · Feb 24, 2005

You want String#scan

matches = line.scan(re)

or

line.scan(re) do |match|
  # match is an array holding the captured groups of one occurrence
  ....
end
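
A minimal sketch of how scan might be applied to the config lines from the question. The names COMMENT_RE and extract_comment_markers are hypothetical, and the pattern is a reworked variant of the original (written as a regexp literal, without the required trailing space), not the poster's exact expression:

```ruby
# Assumed pattern: group 1 is Line/Block, group 2 is the optional
# On/Off (plus optional " Alt"), group 3 is the marker itself.
COMMENT_RE = /(Line|Block) Comment ((?:On|Off)(?: Alt)?)? ?= ?(\S+)/

# Returns one [kind, role, marker] array per comment definition found.
# role is nil for plain "Line Comment = ..." entries.
def extract_comment_markers(line)
  line.scan(COMMENT_RE)
end

sas = '/L1"SAS" Line Comment = * Block Comment On = /* Block Comment Off = */ ' \
      'Block Comment On Alt = * Block Comment Off Alt = ; Nocase File Extensions = SAS'

extract_comment_markers(sas).each do |kind, role, marker|
  puts "#{kind} #{role || '-'}: #{marker}"
end
# Line -: *
# Block On: /*
# Block Off: */
# Block On Alt: *
# Block Off Alt: ;
```

Because the pattern contains groups, scan returns an array of capture arrays, one per occurrence, instead of only the first match. Like the original expression, this sketch does not match the "Line Comment Num = 2#" form.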

Kind regards

robert

rpardee · Feb 25, 2005

Awesome--that's exactly what I needed. And much more readable than
elaborating the regexp.

Thanks!

-Roy
