In search of a Regexp character class for "weird command characters"


Nuralanur


Hello,

I have a regexp search problem.
I have written a text-correction program in Ruby which
reads in a text file and marks every word that's not
in a dictionary array red in an RTF output file
(still a plain text file, from Ruby's viewpoint) and saves that
file (i.e., it is still a text, with some commands specific
to Rich Text Format).
For instance, in a text, I have a citation "(Fox , 1970)."
Now, "(Fox" is not a correct English word, so it should
be red and bold, the comma is all right, so it stays black, and
"1970)." is not a correct English word, either, so it
should be red and bold, also.
In RTF, you can achieve this by replacing

a="(Fox , 1970)."

by

b=" \cf1\b (Fox \cf0\b0 , \cf1\b 1970). \cf0\b0 ".

Now, if you say

p b

Ruby will give the following output
" \0061\010 (Fox \0060\0100 , \0061\010 1970). \0060\0100 "

However, I would like to remove all the characters of the form '\' + number
from the RTF file in a next step.
Is there a character class for Regexps (like \w,\S etc.) that achieves
this?
I have learned so far that '\010' is one character, and not the same as
'backslash' + three digits.

Best regards,

Axel


 

David A. Black

Hi --

> However, I would like to remove all the characters of the form '\' + number
> from the RTF file in a next step.
> Is there a character class for Regexps (like \w,\S etc.) that achieves
> this?

Not a predefined one, but you can do:

b.gsub!(/[\006-\010]/,"")

which leaves you with:

" 1 (Fox 00 , 1 1970). 00 "

if you're sure that's what you want.
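
For anyone reading along, here is a minimal sketch of what is going on, assuming the string was typed as a double-quoted Ruby literal as in the original post. Inside double quotes, Ruby reads \cf as Control-F (byte 6) and \b as backspace (byte 8), so the RTF commands never reach the string. The variable names cleaned, rtf and plain, and the second pattern, are made up for this illustration and are not from the thread.

```ruby
# Double-quoted literal: \cf becomes byte 6 (Control-F), \b becomes byte 8 (backspace).
b = " \cf1\b (Fox \cf0\b0 , \cf1\b 1970). \cf0\b0 "

# Octal escapes inside the character class select bytes 6 through 8.
cleaned = b.gsub(/[\006-\010]/, "")
p cleaned   # => " 1 (Fox 00 , 1 1970). 00 "

# Single-quoted literal: the backslashes stay literal, so the RTF commands survive.
rtf = ' \cf1\b (Fox \cf0\b0 , \cf1\b 1970). \cf0\b0 '

# Hypothetical pattern for this example only: a backslash, lowercase letters,
# optional digits, and at most one trailing whitespace character.
plain = rtf.gsub(/\\[a-z]+\d*\s?/, "")
p plain.strip   # => "(Fox , 1970)."
```

If the goal is to write real RTF commands to the file, the single-quoted form (or doubled backslashes inside double quotes) is probably what you want. The range-based gsub only cleans up after the escapes have already been turned into control characters, which is why the stray digits are left behind.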


David
 
