question on regular expressions

Darren Dale · Dec 3, 2004

I'm stuck. I'm trying to make this:

file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf

(no linebreaks) look like this:

./mydoc1.pdf,./mydoc2.pdf

My regular expression abilities are dismal. I won't list all the
unsuccessful things I've tried; in a nutshell, the greedy operators are
messing me up, truncating the output to ./mydoc2.pdf. Could someone offer a
suggestion?

Thanks,
Darren
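Darren doesn't show his attempts, so the two patterns below are only guesses at the kind of expression that produces the truncation he describes. This sketch contrasts a greedy `.*`, which can run across the comma, with a lazy `.*?`, which stops at the first '%5C' and so removes too little.

```python
import re

s = ('file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,'
     'file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf')

# Greedy (assumed attempt): .* runs to the end of the string, then backs off
# only as far as the LAST %5C, so the first URL and the comma are swallowed.
greedy = re.sub(r'.*%5[Cc]', './', s)

# Lazy (assumed attempt): .*? stops at the FIRST %5C after 'file://',
# so the folder names are left behind.
lazy = re.sub(r'file://.*?%5[Cc]', './', s)

print(greedy)  # ./mydoc2.pdf
print(lazy)    # ./folder1%5Cfolder2%5Cmydoc1.pdf,./folderx%5Cfoldery%5Cmydoc2.pdf
```

Neither quantifier alone fixes the problem; what is needed is a bound on how far a single match may run.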

Sean Ross · Dec 3, 2004

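import ntpath
from urllib.parse import unquote

url = 'file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf'
# unquote() decodes each %5C to a backslash; ntpath.basename() splits on
# backslashes on any OS, whereas os.path.basename only does so on Windows
print('./%s' % ntpath.basename(unquote(url)))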

HTH,
Sean
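Sean's snippet converts a single URL. A minimal sketch extending the same unquote-then-basename idea to the whole comma-separated string, assuming no file name contains a comma; `to_relative` is a hypothetical helper name, and the code uses Python 3's `urllib.parse.unquote` together with `ntpath.basename`.

```python
import ntpath
from urllib.parse import unquote


def to_relative(urls):
    """Hypothetical helper: turn comma-separated file:// URLs into './name' paths.

    Assumes no file name contains a literal comma.
    """
    # unquote() turns %5C into '\'; ntpath.basename() then keeps only the
    # part after the last '\' or '/', regardless of the host OS.
    return ','.join('./' + ntpath.basename(unquote(u)) for u in urls.split(','))


s = ('file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,'
     'file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf')
print(to_relative(s))  # ./mydoc1.pdf,./mydoc2.pdf
```

Splitting first and working on one URL at a time sidesteps greediness entirely, since no pattern ever sees more than one path.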

Michael Fuhr · Dec 3, 2004

This works for the example string you gave:

import re
newstring = re.sub(r'[^,]*%5[Cc]', './', examplestring)

This replaces every run of zero or more non-comma characters that ends
in '%5C' or '%5c' (the '%5C' itself included) with './'. Greediness
makes each match extend to the last '%5C' before a comma or the end of
the string.
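To see exactly what this pattern consumes, `re.findall` with the same expression lists each matched span. A short sketch on the example string; the variable names are illustrative only.

```python
import re

s = ('file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,'
     'file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf')
pattern = re.compile(r'[^,]*%5[Cc]')

# [^,]* is still greedy, but it cannot step over a comma, so each match is
# confined to one item and ends at that item's last %5C.
matched = pattern.findall(s)
print(matched)
# ['file://C:%5Cfolder1%5Cfolder2%5C', 'file://C%5Cfolderx%5Cfoldery%5C']

newstring = pattern.sub('./', s)
print(newstring)  # ./mydoc1.pdf,./mydoc2.pdf
```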

Regular expressions aren't the only way to do what you want. Python
has standard modules for parsing URLs and file paths: take a look
at urlparse, urllib/urllib2, and os.path (in Python 3, urllib.parse
and urllib.request).

Darren Dale · Dec 3, 2004

Thanks to both of you. I thought regular expressions were appropriate because
the string I gave is buried in an XML file. A more representative example is:

[...snip...]<url>file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf</url>[...snip... data]<url>file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf</url>[...snip...]
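The comma-bounded pattern carries over to the XML case by using '<' as the boundary instead of ','. A minimal sketch, assuming each URL sits directly inside a literal `<url>` element with no attributes and no nested markup; the surrounding XML below is a made-up stand-in for the snipped file contents.

```python
import re

# Hypothetical XML standing in for the elided file contents.
xml_text = (
    '<doc><url>file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf</url>'
    '<other>data</other>'
    '<url>file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf</url></doc>'
)

# [^<]* cannot run past the closing </url> tag, so each match stops at the
# last %5C inside one element. The captured <url> tag is written back
# in front of './'.
URL_PREFIX = re.compile(r'(<url>)[^<]*%5[Cc]')

result = URL_PREFIX.sub(r'\1./', xml_text)
print(result)
```

If the file is well-formed XML, parsing it with xml.etree.ElementTree and rewriting each url element's text avoids these assumptions about the markup.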

Similar Threads

Python Regular Expressions	4	Jun 22, 2011
Regular Expressions	4	Jun 17, 2008
Parsing Log records with regular expressions	2	Feb 3, 2011
Regular expressions, capture repeated groups	4	Jul 8, 2010
Python regular expressions just ain't PCRE	13	May 5, 2007
REGULAR EXPRESSIONS - identify specific entries as INVALID	5	Jun 20, 2008
regexp(ing) Backus-Naurish expressions ...	7	Mar 13, 2013
Basic Regular Expressions question...	15	Apr 6, 2005
