(R) character in RegEXP

Tsu-na-mi

Hi,

I am having trouble getting a simple regexp to recognize the
registered trademark symbol (R) when it is read from XML. The XML
uses ® for the symbol, and if I print the string after parsing,
it prints correctly. However, the regexp:

$string =~ s/(R)/somethingelse/g;

does not recognize the (R) symbol. NOTE: (R) is the single-ASCII
character. I also tried using \x{AE} which did not work either. The
regular TM symbol doesn't work either, and seems to throw everything
into unicode mode, screwing up other stuff like the bullet and
copyright symbols.

So my question is, if I have XML like:

<P>This is my Widget®</P>

And read it into a string with XML::Parser, how should I address this
character (and any char > 256 if you know)?

For the record, I am using Perl 5.8.3 on Red Hat 9.0. Thanks for any
help anyone can provide.
 
Alan J. Flavell

> I am having trouble getting a simple regexp to recognize the
> registered trademark symbol (R) when it is read from XML. The XML
> uses ® for the symbol,

But what are you giving to Perl - the character itself, or that
numerical character reference?

> However, the regexp:
>
> $string =~ s/(R)/somethingelse/g;
>
> does not recognize the (R) symbol.

You said the XML contains ®

How do you expect Perl to know what it's intended to mean?

Or were you getting that as a result from the XML parser? (sorry if
your description didn't seem clear enough).

> NOTE: (R) is the single-ASCII character.

Ahem. The ASCII code does not contain this character. ASCII is a
7-bit code, which is at the basis of numerous 8-bit codings.
Presumably you're thinking in terms of iso-8859-1, or maybe even
Windows-1252, as the 8-bit coding.
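
The point about ASCII can be checked directly. The following is only a minimal sketch: the variable names are made up for illustration, and the character is written as an escape so the script itself stays pure ASCII. It shows that the registered sign has code point 174, outside the 7-bit ASCII range, and that it takes one byte in iso-8859-1 but two in UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00AE REGISTERED SIGN, written as an escape (illustrative variable name).
my $reg = "\x{AE}";

# ASCII stops at 0x7F, so anything above that is not an ASCII character.
my $code     = ord($reg);
my $is_ascii = $code <= 0x7F ? 1 : 0;

# The same character encoded in two different 8-bit-based codings.
my $latin1 = encode("iso-8859-1", $reg);   # single byte 0xAE
my $utf8   = encode("UTF-8", $reg);        # two bytes 0xC2 0xAE

printf "code point %d (0x%02X), ASCII: %s\n",
    $code, $code, $is_ascii ? "yes" : "no";
printf "iso-8859-1: %d byte(s), UTF-8: %d byte(s)\n",
    length($latin1), length($utf8);
```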

> I also tried using \x{AE} which did not work either. The
> regular TM symbol doesn't work either, and seems to throw everything
> into unicode mode, screwing up other stuff like the bullet and
> copyright symbols.
> So my question is,

Not so fast! Let's see some Perl code first. Preferably a
manageable-sized snippet that's complete enough to run for ourselves
and that demonstrates the problem you're experiencing. You're
obviously in a quagmire, and folks would like to help you, but if you
keep yelling and struggling, then you're liable to just get deeper;
stay calm, take it step by step, show us your working...

> If I have XML like:
>
> <P>This is my Widget®</P>
>
> And read it into a string with XML::Parser, how should I address this
> character (and any char > 256 if you know)?

With respect, my advice would have to be that you -do- need to take a
while out with perldoc perluniintro and (maybe) perlunicode, to get a
start on Perl's handling of unicode. Just getting a prescription
handed out isn't going to be a great deal of help - one needs to
understand this sufficiently for it to make sense, rather than just
applying magic incantations.
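
For orientation while reading those documents, here is a hedged sketch of how characters can be addressed by code point or by name in a regexp once the string really holds Perl characters. The sample string and variable names are invented for illustration; this is not the poster's code, and it does not replace understanding the character model.

```perl
use strict;
use warnings;
use charnames ':full';   # needed on older Perls such as 5.8 for \N{...}

# Illustrative character string: U+2122 TRADE MARK SIGN and U+00AE REGISTERED SIGN.
my $text = "Widget\x{2122} and Gadget\x{AE}";

# Address a character above 255 by its hexadecimal code point...
my $tm_count = () = $text =~ /\x{2122}/g;

# ...or any character by its Unicode name.
my $reg_count = () = $text =~ /\N{REGISTERED SIGN}/g;

# Both can sit in a character class; here they are simply stripped.
(my $plain = $text) =~ s/[\x{AE}\x{2122}]//g;

# Declare the output encoding so wide characters would print cleanly.
binmode(STDOUT, ":encoding(UTF-8)");
print "$tm_count trademark, $reg_count registered, stripped: $plain\n";
```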

> For the record, I am using Perl 5.8.3 on Red Hat 9.0. Thanks for any
> help anyone can provide.

Looks OK. But I think you need to (a) show a bit more of your working
and (b) understand the difference between Perl's legacy 8-bit handling
and its Unicode character model.
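
To make (b) a little more concrete, here is a minimal sketch. It assumes the data reaches the program as UTF-8 encoded bytes; the hard-coded string and the variable names are illustrative only. The same text has a different length, and matches differently, depending on whether Perl sees it as bytes or as decoded characters.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: input arrives as UTF-8 bytes (hard-coded here for illustration).
my $bytes = "This is my Widget\xC2\xAE";

# Treated as bytes, the registered sign occupies two octets.
my $byte_len = length($bytes);

# After decoding, it is a single character with code point 0xAE.
my $chars    = decode("UTF-8", $bytes);
my $char_len = length($chars);

# On the decoded string the character can be matched by its code point.
(my $replaced = $chars) =~ s/\x{AE}/(R)/g;

print "$byte_len bytes, $char_len characters\n";
print encode("UTF-8", $replaced), "\n";
```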

So, let's see a bit more of your code, if we're to help with the
code. But your part of the bargain would be (no offence intended) to
get a bit more up to speed with the underlying principles.

hope this helps.
 
Jim Cochrane

> Hi,
>
> I am having trouble getting a simple regexp to recognize the
> registered trademark symbol (R) when it is read from XML. The XML
> uses ® for the symbol, and if I print the string after parsing,
> it prints correctly. However, the regexp:
>
> $string =~ s/(R)/somethingelse/g;

If this is literally what is in your code, you're using the (...) grouping
construct. If you want to literally match '(R)', you need to escape the
parens: s/\(R\)/...
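
A small sketch of the difference, using made-up sample strings: unescaped parentheses form a capture group around the letter R, escaped ones match the literal three-character text, and the single registered-sign character is a third, separate case that can be written by code point.

```perl
use strict;
use warnings;

my $text = "Widget(R) and WidgetR";

# Unescaped: (R) is a capture group, so every capital "R" matches.
(my $grouped = $text) =~ s/(R)/X/g;

# Escaped: only the literal sequence "(R)" matches.
(my $literal = $text) =~ s/\(R\)/X/g;

# The single registered-sign character, written by code point.
my $sym = "Widget\x{AE}";
(my $symbol = $sym) =~ s/\x{AE}/X/g;

print "$grouped\n$literal\n$symbol\n";
```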
 
