question on regular expressions

D

Darren Dale

I'm stuck. I'm trying to make this:

file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C
%5Cfolderx%5Cfoldery%5Cmydoc2.pdf

(no linebreaks) look like this:

../mydoc1.pdf,./mydoc2.pdf

my regular expression abilities are dismal. I won't list all the
unsuccessful things I've tried, in a nutshell, the greedy operators are
messing me up, truncating the output to ./mydoc2.pdf. Could someone offer a
suggestion?

Thanks,
Darren
 
S

Sean Ross

Darren Dale said:
I'm stuck. I'm trying to make this:

file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C
%5Cfolderx%5Cfoldery%5Cmydoc2.pdf

(no linebreaks) look like this:

./mydoc1.pdf,./mydoc2.pdf

my regular expression abilities are dismal. I won't list all the
unsuccessful things I've tried, in a nutshell, the greedy operators are
messing me up, truncating the output to ./mydoc2.pdf. Could someone offer a
suggestion?

Thanks,
Darren

from os.path import basename
import urllib

url = 'file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf'
print './%s'%basename(urllib.url2pathname(url))

HTH,
Sean
 
M

Michael Fuhr

Darren Dale said:
I'm stuck. I'm trying to make this:

file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C
%5Cfolderx%5Cfoldery%5Cmydoc2.pdf

(no linebreaks) look like this:

./mydoc1.pdf,./mydoc2.pdf

my regular expression abilities are dismal.

This works for the example string you gave:

newstring = re.sub(r'[^,]*%5[Cc]', './', examplestring)

This replaces all instances of zero or more non-commas that are
followed by '%5C' or '%5c' with './'. Greediness causes the pattern
to replace everything up to the last '%5C' before a comma or the
end of the string.

Regular expressions aren't the only way to do what you want. Python
has standard modules for parsing URLs and file paths -- take a look
at urlparse, urllib/urllib2, and os.path.
 
D

Darren Dale

Michael said:
Darren Dale said:
I'm stuck. I'm trying to make this:

file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C
%5Cfolderx%5Cfoldery%5Cmydoc2.pdf

(no linebreaks) look like this:

./mydoc1.pdf,./mydoc2.pdf

my regular expression abilities are dismal.

This works for the example string you gave:

newstring = re.sub(r'[^,]*%5[Cc]', './', examplestring)

This replaces all instances of zero or more non-commas that are
followed by '%5C' or '%5c' with './'. Greediness causes the pattern
to replace everything up to the last '%5C' before a comma or the
end of the string.

Regular expressions aren't the only way to do what you want. Python
has standard modules for parsing URLs and file paths -- take a look
at urlparse, urllib/urllib2, and os.path.

Thanks to both of you. I thought re's were appropriate because the string I
gave is buried in an xml file. A more representative example is:

[...snip...]<url>file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf</url>[...snip...
data]<url>file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf</url>[...snip...]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,770
Messages
2,569,586
Members
45,084
Latest member
HansGeorgi

Latest Threads

Top