regular expressions, unicode and XML

ProvoWallis

Hi,

I'm hoping someone can help me. I'm hopelessly lost.

I'm trying to make a change in some XML files using a regular
expression (re.sub). I can capture the text I want to replace OK, but
when I replace it I end up with nothing: i.e., just a "" character in my
file.

data = re.sub(
    r'(?i)(?u)<title><emph typestyle=\"bf\">Sample Title</emph></title>'
    r'<para indent=\"none\" runin=\"1\"><emph typestyle=\"bf\">\—(.*?):</emph>',
    '<title><icon name="graphic"/> <emph typestyle="bf">Sample Title—\1:</emph></title>'
    '<para indent="none" runin="1">',
    data)

I think my problem is that I don't understand unicode or even know how
my XML is encoded b/c there is nothing in the XML declaration at the
top of the file.

I'd be grateful if someone could give a little advice or point me in the
right direction. I've read a bunch of stuff on the board but nothing
seems to click. I'm guessing I have to decode my file when I read it,
something like this:

raw = inputFile.read()
fileencoding = "utf-8"
data = raw.decode(fileencoding)

and then write it out similarly but this doesn't seem to work.

Any help appreciated,

Greg
 
Justin Ezequiel

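import codecs

# pth is the path to the XML file; decode it as UTF-8 while reading
f = codecs.open(pth, 'r', 'utf-8')
data = f.read()
f.close()

## data = re.sub ...

# encode back to UTF-8 while writing
f = codecs.open(pth, 'w', 'utf-8')
f.write(data)
f.close()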
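
For readers on Python 3, the same round trip might be sketched as below. The file name sample.xml, the shortened markup and the simplified pattern are made up for illustration; only the idea of naming the encoding on both the read and the write comes from the code above.

```python
import os
import re
import tempfile

DASH = "\u2014"  # the em dash that appears in the markup

# Hypothetical sample file; the real path and markup will differ.
path = os.path.join(tempfile.mkdtemp(), "sample.xml")
with open(path, "w", encoding="utf-8") as f:
    f.write('<emph typestyle="bf">' + DASH + 'Scope:</emph>')

# Decode while reading so that re works on text rather than raw bytes.
with open(path, "r", encoding="utf-8") as f:
    data = f.read()

# Simplified stand-in for the real substitution.
# The group reference is written as a raw string so re.sub sees "\1".
data = re.sub(DASH + r"(.*?):", "Sample Title" + DASH + r"\1:", data)

# Encode again while writing.
with open(path, "w", encoding="utf-8") as f:
    f.write(data)

print(repr(data))
```

Naming the encoding explicitly on both sides keeps the result independent of the platform default.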
 
ProvoWallis

Thanks for this but I'm still getting an "empty" character (I don't
know what else to call it) rather than the text captured by my regular
expression in my replaced text.

I even added the utf encoding declaration to my input data but still no
luck.

Any suggestions?
 
Justin Ezequiel

> when I replace it end up with nothing: i.e., just a "" character in my
> file.

how are you viewing the contents of your file?
are you printing it out to stdout?
are you opening your file in a non-Unicode-aware editor?
try print repr(data) after re.sub so that you see what you actually
have in data

btw, from where did you get your XML files?
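
To illustrate what repr can reveal, here is a small sketch with a made-up input string and a much shorter pattern than the one in the thread. One possible cause, which the thread itself does not confirm, is that the replacement string in the original call is not a raw string: in a plain Python string literal, \1 is an octal escape for the control character with code 1, so re.sub never sees a group reference.

```python
import re

text = "<emph>Scope:</emph>"  # made-up sample, not the poster's data

# Plain string: "\1" is the single character chr(1), not a backreference.
plain = re.sub(r"<emph>(.*?):</emph>", "<title>\1:</title>", text)

# Raw string: re.sub receives a backslash followed by "1" and expands group 1.
raw = re.sub(r"<emph>(.*?):</emph>", r"<title>\1:</title>", text)

print(repr(plain))  # '<title>\x01:</title>'
print(repr(raw))    # '<title>Scope:</title>'
```

If repr(data) shows \x01 where the captured text should be, prefixing the replacement string with r (or writing \\1) would be the thing to try.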
 
