Using Beautiful Soup to entangle bookmarks.html

Anthra Norell

Hi,
I'm trying to use the Beautiful Soup package to parse through the
"bookmarks.html" file which Firefox exports all your bookmarks into.
I've been struggling with the documentation trying to figure out how to
extract all the urls. Has anybody got a couple of longer examples using
Beautiful Soup I could play around with?

Thanks,
Martin.


Martin,

SE is a stream editor that does not introduce the overhead and complications of overkill parsing. See if it suits your needs:
http://cheeseshop.python.org/pypi/SE/2.2 beta
<EAT> # delete all unmatched input
"~(?i)<a.*?href.*?>~==\n" # keep hrefs and add a new line
"~(?i)[^>]+/a>~==\n\n" # keep text till end of anchor and add two newlines
| # run
parameter '' commands string output. Default is a file.
....

"http://www.inksupply.com/index.cfm?source=html/main2.html" ADD_DATE="1016024829" LAST_VISIT="1039439802" LAST_CHARSET="ISO-8859-1"
MIS Associates Inc.

"http://www.weink.com/" ADD_DATE="1016034183" LAST_VISIT="1118782455" LAST_CHARSET="windows-1252"
Inkjet, Laser, Copier, Fax Supplies

"http://www.nextrend.com/analysis/content/pr_9-19-2000.asp" ADD_DATE="1018037196" LAST_VISIT="1126289805" LAST_CHARSET="ISO-8859-1"
NexTrend - Press Releases

"http://wp.netscape.com/escapes/search/netsearch_E.html" ADD_DATE="1021644432" LAST_VISIT="1023182857" LAST_CHARSET="ISO-8859-1"
Net Search Page - Google

"http://www.python.org/" ADD_DATE="1021651575" LAST_VISIT="1121690494" LAST_CHARSET="ISO-8859-1"
Python Language Website

"http://www.teldir.com/real/frame.asp?page=http://www.whitepages.ch" ADD_DATE="1027354641" LAST_VISIT="1115386846" LAST_CHARSET="windows-1252"
http://www.teldir.com/real/frame.asp?page=http://www.whitepages.ch

.... etc.
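
For comparison, here is a rough sketch of the same idea (keep what follows href inside each anchor tag, then the link text) using only the standard library's re module instead of SE. The sample string, the pattern and the function name are made up for illustration and are not part of SE or of the original definitions above.

```python
import re

# Hypothetical sample in the bookmark format that Firefox exports.
SAMPLE = '''<DT><H3>Python</H3>
<DL><p>
    <DT><A HREF="http://www.python.org/" ADD_DATE="1021651575" LAST_CHARSET="ISO-8859-1">Python Language Website</A>
    <DT><A HREF="http://www.weink.com/" ADD_DATE="1016034183">Inkjet, Laser, Copier, Fax Supplies</A>
</DL><p>
'''

# One pattern per anchor: group 1 is everything after "href=" inside the tag,
# group 2 is the link text up to the closing </a>.
ANCHOR = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*([^>]*)>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)

def list_bookmarks(html):
    """Return (attributes, text) pairs for every anchor found in html."""
    return [(attrs.strip(), text.strip()) for attrs, text in ANCHOR.findall(html)]

# Print each bookmark as two lines followed by a blank line.
for attrs, text in list_bookmarks(SAMPLE):
    print(attrs)
    print(text)
    print()
```

Like the SE definitions, this keeps the ADD_DATE and LAST_CHARSET attributes along with the URL; stripping them would be a further substitution.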


You may refine this further by adding more deletions or substitutions. Adding them one at a time and examining the output after
each addition is straightforward. The SE object accepts strings as well as file names and then returns strings by default, so
developing interactively in an IDLE window using a sample data string is fast and painless: you can build up the definitions
incrementally, one step at a time.
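
If the goal is only the URL and the title, without the ADD_DATE and similar attributes, one possible refinement is sketched below. It uses the standard library's html.parser rather than SE or Beautiful Soup, and the class and function names are hypothetical. It can be tried on a sample data string in the same interactive way.

```python
from html.parser import HTMLParser

class BookmarkCollector(HTMLParser):
    """Collect (url, title) pairs from anchor tags (illustrative name)."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []   # finished (url, title) pairs
        self._href = None     # href of the anchor currently open, if any
        self._text = []       # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open anchor.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.bookmarks.append((self._href, "".join(self._text).strip()))
            self._href = None

def extract_bookmarks(html):
    """Return a list of (url, title) tuples found in html."""
    parser = BookmarkCollector()
    parser.feed(html)
    parser.close()
    return parser.bookmarks

# Hypothetical one-line sample taken from the kind of entry shown above.
sample = ('<DT><A HREF="http://www.python.org/" ADD_DATE="1021651575" '
          'LAST_CHARSET="ISO-8859-1">Python Language Website</A>')
print(extract_bookmarks(sample))
# prints [('http://www.python.org/', 'Python Language Website')]
```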

Regards

Frederic
 
