In search of a Regexp character class for "weird command characters"

Nuralanur · Sep 28, 2005

Hello,

I have a regexp search problem.
I have written a text-correction program in Ruby that
reads in a text file, marks every word that is not
in a dictionary array in red in an RTF output file,
and saves that file. From Ruby's viewpoint the output is
still a plain text file, just with some commands specific
to Rich Text Format.
For instance, in a text, I have a citation "(Fox , 1970)."
Now, "(Fox" is not a correct English word, so it should
be red and bold, the comma is all right, so it stays black, and
"1970)." is not a correct English word, either, so it
should be red and bold, also.
In RTF, you can achieve this by replacing

a="(Fox , 1970)."

by

b=" \cf1\b (Fox \cf0\b0 , \cf1\b 1970). \cf0\b0 ".

Now, if you say

p b

Ruby will give the following output

" \0061\010 (Fox \0060\0100 , \0061\010 1970). \0060\0100 "

However, I would like to remove all the characters of the form '\' + number
from the RTF file in a next step.
Is there a character class for Regexps (like \w,\S etc.) that achieves
this?
I have learned so far that '\010' is one character, and not the same as
'backslash' + three digits.
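
A minimal sketch of why this happens, assuming the string is written in double quotes as above: Ruby reads `\cf` as the control character Ctrl-F and `\b` as a backspace, so each becomes a single character. The variable names are only for illustration, and the exact form `p` prints for these characters may differ between Ruby versions.

```ruby
# Double-quoted: Ruby interprets \cf as Ctrl-F (byte 6) and \b as backspace (byte 8).
double = "\cf1\b"
# Single-quoted: backslashes stay literal, so the RTF commands survive as typed.
single = '\cf1\b'

double_bytes  = double.bytes.to_a   # three characters: Ctrl-F, "1", backspace
single_length = single.length       # six characters: \ c f 1 \ b

puts double_bytes.inspect
puts single_length

# The octal escape \010 names the same single character as \b.
same_char = ("\010" == "\b")
puts same_char
```

So the "weird" characters are not a backslash followed by digits in the string itself; the digits only appear when `p` displays the control characters.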

Best regards,

Axel


David A. Black · Sep 28, 2005

Hi --

> However, I would like to remove all the characters of the form '\' + number
> from the RTF file in a next step.
> Is there a character class for Regexps (like \w,\S etc.) that achieves
> this?

Not a predefined one, but you can do:

b.gsub!(/[\006-\010]/,"")

which leaves you with:

" 1 (Fox 00 , 1 1970). 00 "

if you're sure that's what you want.
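
As a rough sketch of both options: the first part applies the octal range to the double-quoted string from the question. The second part is an assumption about what may be wanted instead, namely keeping the RTF commands literal with single quotes and removing whole commands. The names `stripped`, `rtf` and `plain` and the command pattern are hypothetical and only cover simple commands like the ones in this example.

```ruby
# The string as Ruby sees it once the double-quoted escapes are interpreted.
b = " \cf1\b (Fox \cf0\b0 , \cf1\b 1970). \cf0\b0 "

# The octal range \006-\010 covers Ctrl-F (\006) through backspace (\010).
# The digits that followed the commands are ordinary characters and remain.
stripped = b.gsub(/[\006-\010]/, "")
puts stripped.inspect

# Assumption: with single quotes the backslashes are kept, so a run of
# commands (backslash, letters, optional digits) plus the following space
# can be removed as a unit.
rtf = ' \cf1\b (Fox \cf0\b0 , \cf1\b 1970). \cf0\b0 '
plain = rtf.gsub(/(?:\\[a-z]+\d*)+ /, "")
puts plain.inspect
```

The second variant removes the trailing digits together with the commands, which the control-character range alone cannot do.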

David
