problem with regex

dimmaim

I want to find specific URLs in a txt file, but I have some issues. First, when I copy and paste just two lines from the file and assign them to a variable like this, it works, but only with triple quotes:

test='''_*_n.jpg","timelineCoverPhoto":"{\"focus\":{\"x\":0.5,\"y\":0.386925795053},\"photo\":{\"__type__\":{\"name\":\"Photo\"},\"image_lowres\":{\"uri\":\"https://fbcdn-photos-f-a.akamaihd.net/*-*-*/*_*_*_a.jpg\",\"width\":180,\"height\":179}}}","subscribeStatus":"IS_SUBSCRIBED","smallPictureUrl":"https://fbcdn-profile-a.akamaihd.net/*-*-*/s100x100/*_*_*_s.jpg","contactId":"*==","contactType":"USER","friendshipStatus":"ARE_FRIENDS","graphApiWriteId":"contact_*:*:*","hugePictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-frc3/*_*_*_n.jpg","profileFbid":"1284503586","isMobilePushable":"NO","lookupKey":null,"name":{"displayName":"* *","firstName":"*","lastName":"*"},"nameSearchTokens":["*","*"],"phones":[],"phoneticName":{"displayName":null,"firstName":null,"lastName":null},"isMemorialized":false,"communicationRank":1.1144714,"canViewerSendGift":false,"canMessage":true}
*=={"bigPictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-ash3/*.*.*.*/s200x200/*_*_*_n.jpg","timelineCoverPhoto":"{\"focus\":{\"x\":0..5,\"y\":0.49137931034483},\"photo\":{\"__type__\":{\"name\":\"Photo\"},\"image_lowres\":{\"uri\":\"https://fbcdn-photos-h-a.akamaihd.net/*-*-*/*_*_*_a.jpg\",\"width\":180,\"height\":135}}}","subscribeStatus":"IS_SUBSCRIBED","smallPictureUrl":"https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/s100x100/*_*_*_a.jpg","contactId":"*==","contactType":"USER","friendshipStatus":"ARE_FRIENDS","graphApiWriteId":"contact_*:*:*","hugePictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-ash3/c0.0.540.540/*_*_*_n.jpg","profileFbid":"*","isMobilePushable":"YES","lookupKey":null,"name":{"displayName":"* *","firstName":"*","lastName":"*"},"nameSearchTokens":["*","*"],"phones":[],"phoneticName":{"displayName":null,"firstName":null,"lastName":null},"isMemorialized":false,"communicationRank":1.2158813,"canViewerSendGift":false,"canMessage":true}'''

uri = re.findall(r'''uri\":\"https://fbcdn-(a-z|photos)?([^\'" >]+)''',test)
print uri

It works fine and I get my result: [('photos', '-f-a.akamaihd.net/*-*-*/*_*_*_a.jpg'), ('photos', '-h-a.akamaihd.net/*-*-*/*_*_*_a.jpg')]

But if I take those lines and save them into a txt file, as in the original (without the quotes), and do the following:

datafile=open('a.txt','r')
data_array=''
for line in datafile:
    data_array=data_array+line

uri = re.findall(r'''uri\":\"https://fbcdn-(a-z|photos)?([^\'" >]+)''',data_array)

then printing uri gives an empty list. What should I do to make it work for the lines of a txt file?
 
Roy Smith

I want to find specific URLs in a txt file, but I have some issues. First,
when I copy and paste just two lines from the file and assign them to a
variable like this, it works, but only with triple quotes:

test='''<long string elided>''' [...]
But if I take those lines and save them into a txt file, as in the original
(without the quotes) [it doesn't work]

I suspect this has nothing to do with regular expressions, but it's just
about string management.

The first thing you want to do is verify that the text you are reading
in from the file is the same as the text you have in triple quotes. So,
write a program like this:

test='''<long string elided>'''

datafile=open('a.txt','r')
data_array=''
for line in datafile:
    data_array=data_array+line

print test == data_array

If that prints True, then you've got the same text in both cases (and
you can go on to looking for other problems). I suspect it will print
False, though. So, now your task is to figure out where those two
strings differ. Maybe something like:

for c1, c2 in zip(test, data_array):
    print c1 == c2, repr(c1), repr(c2)

and look for the first place they're not the same. Hopefully that will
give you a clue what's going wrong.
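
Since zip stops at the end of the shorter string, a difference in length alone can go unnoticed, and printing every character pair gets noisy on long input. A minimal sketch of the same comparison that reports only the first differing index might look like this. It uses Python 3 syntax, and first_difference and the two short sample strings are made up for illustration; they are not the real file contents.

```python
from itertools import zip_longest


def first_difference(a, b):
    """Return the index of the first position where a and b differ, or None if they are equal."""
    # zip_longest pads the shorter string with None, so a pure length
    # difference is reported at the index where the shorter string ends.
    for i, (c1, c2) in enumerate(zip_longest(a, b)):
        if c1 != c2:
            return i
    return None


# Hypothetical samples: the first has plain quotes, the second keeps a
# literal backslash in front of each quote, as text read from a file might.
from_literal = '{"uri":"https://example.invalid/a.jpg"}'
from_file = '{\\"uri\\":\\"https://example.invalid/a.jpg\\"}'

i = first_difference(from_literal, from_file)
if i is not None:
    # Show a few characters around the mismatch from each string.
    print(i, repr(from_literal[i:i + 10]), repr(from_file[i:i + 10]))
```

With these samples the first mismatch is at index 1, where the second string has a backslash and the first has a double quote.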
 
Dave Angel

I want to find specific URLs in a txt file, but I have some issues. First, when I copy and paste just two lines from the file and assign them to a variable like this, it works, but only with triple quotes:

test='''_*_n.jpg","timelineCoverPhoto":"{\"focus\":{\"x\":0.5,\"y\":0.386925795053},\"photo\":{\"__type__\":{\"name\":\"Photo\"},\"image_lowres\":{\"uri\":\"https://fbcdn-photos-f-a.akamaihd.net/*-*-*

Why did you start a second thread with similar content two minutes
after the first? Do you expect us to compare the two messages and
figure out what you changed, or were you just impatient for a
response? I only check in here about 6 times a day, and I
imagine some might be even less often.

Your test string literal has lots of backslashes in it, which get
interpreted as escape sequences in a string literal, but not in a
file. If that's really what the file looks like, you're going
to want to use a raw string for the literal. I agree with Roy, you're
probably not getting the same string the two ways.
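
To make the point about escape sequences concrete, here is a small sketch. It is an illustration under assumptions, not a confirmed diagnosis of the real file. It takes one fragment of the posted data, builds it once as a normal literal and once as a raw literal (standing in for what reading the file would give if the file really contains the backslashes), and runs the posted pattern against both. TOLERANT is a hypothetical variant of the pattern that allows an optional backslash before each quote. Python 3 syntax is used.

```python
import re

# The same fragment written two ways. In the normal literal, \" collapses
# to a bare double quote; in the raw literal the backslash is kept.
plain = '''"image_lowres":{\"uri\":\"https://fbcdn-photos-f-a.akamaihd.net/*-*-*/*_*_*_a.jpg\",\"width\":180}'''
raw = r'''"image_lowres":{\"uri\":\"https://fbcdn-photos-f-a.akamaihd.net/*-*-*/*_*_*_a.jpg\",\"width\":180}'''

# Pattern exactly as posted. Note that (a-z|photos) matches the literal
# text "a-z" or "photos"; it is not a character class.
ORIGINAL = r'''uri\":\"https://fbcdn-(a-z|photos)?([^\'" >]+)'''

# Hypothetical variant: \\? allows an optional backslash before each quote,
# and the backslash is added to the excluded characters so the captured
# URL does not end with a stray backslash.
TOLERANT = r'''uri\\?":\\?"https://fbcdn-(a-z|photos)?([^'" >\\]+)'''

print(len(plain), len(raw))          # the raw version is longer
print(re.findall(ORIGINAL, plain))   # matches
print(re.findall(ORIGINAL, raw))     # no match: a backslash sits between uri and the quote
print(re.findall(TOLERANT, raw))     # matches again
```

If the comparison Roy suggested shows that the file text does contain backslashes, either declaring the test literal as a raw string or adjusting the pattern along these lines should make the two cases behave the same.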
 
