How can I extract all URLs in a string using re.findall()?

Thread starter could ildg
Start date Apr 7, 2005

I want to retrieve all URLs in a string. When I use re.findall, I get
a list of tuples.
My code is below:

Code:

import re

# Python 2 code; html is defined elsewhere and holds the text to search
url=unicode(r"((http|ftp)://)?(((([\d]+\.)+){3}[\d]+(/[\w./]+)?)|([a-z]\w*((\.\w+)+){2,})([/][\w.~]*)*)")
m=re.findall(url,html)
for i in m:
    print i

html is a string variable that contains many URLs.
The code prints many tuples, and none of them looks like a single
URL. For example, one of them is:

(u'http://', u'http', u'image.zhongsou.com/image/netchina.gif', u'',
u'', u'', u'', u'image.zhongsou.com', u'.com', u'.com',
u'/netchina.gif')

Why are there two "http" entries in it, and why are there so many empty
strings in the tuple above? It's obviously not a URL. How can I get the
URLs correctly?

Thanks in advance.
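
For reference, the tuples come from how re.findall treats capturing groups: when a pattern contains more than one group, it returns one tuple per match with one entry per group, and a group that did not take part in the match shows up as an empty string. The pattern above has eleven groups, so every match becomes an 11-tuple; the first two entries are the nested groups ((http|ftp)://) and (http|ftp), which is why "http" appears twice. The sketch below shows two common ways around this, written for Python 3. URL_RE is a simplified illustrative pattern, not the exact regex from the post, and the names extract_urls, extract_urls_keep_groups and sample are hypothetical.

```python
import re

# Simplified illustrative pattern (an assumption, not the poster's exact regex).
# Every group is non-capturing, (?:...), so re.findall returns whole matches
# as plain strings instead of tuples of groups.
URL_RE = re.compile(
    r"(?:(?:http|ftp)://)?"      # optional scheme
    r"(?:"
    r"(?:\d+\.){3}\d+"           # dotted numeric host such as 10.0.0.1
    r"|[a-z]\w*(?:\.\w+)+"       # or a host name containing at least one dot
    r")"
    r"(?:/[\w.~]*)*"             # optional path segments
)


def extract_urls(text):
    """Return every full match of URL_RE in text as a list of strings."""
    return URL_RE.findall(text)


def extract_urls_keep_groups(pattern, text):
    """Keep the capturing groups in pattern and take group(0), the whole match."""
    return [m.group(0) for m in re.finditer(pattern, text)]


sample = 'see <img src="http://image.zhongsou.com/image/netchina.gif"> and ftp://10.0.0.1/pub'
print(extract_urls(sample))
# ['http://image.zhongsou.com/image/netchina.gif', 'ftp://10.0.0.1/pub']
```

The second helper is the smaller change if the original pattern should stay as it is: re.finditer yields match objects, and group(0) is always the full matched text regardless of how many groups the pattern has.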

