A little complex usage of Beautiful Soup Parsing Help!


SAKTHEESH

I am using Beautiful Soup to parse an HTML document and find all text
that is not contained inside any anchor element.

I came up with this code which finds all links within href but not the
other way around.

How can I modify this code to get only plain text using Beautiful
Soup, so that I can do some find and replace and modify the soup?

for a in soup.findAll('a', href=True):
    print a['href']


Example:

<html><body>
<div> <a href="www.test1.com/identify">test1</a> </div>
<div><br></div>
<div><a href="www.test2.com/identify">test2</a></div>
<div><br></div><div><br></div>
<div>
This should be identified

Identify me 1

Identify me 2
<p id="firstpara" align="center"> This paragraph should be<b>
identified </b>.</p>
</div>
</body></html>

Output:

This should be identified
Identify me 1
Identify me 2
This paragraph should be identified.

I am doing this to find the text that is not within `<a></a>`, then
find "Identify" in that text and replace it with "Replaced".

So the final output will be like this:

<html><body>
<div> <a href="www.test1.com/identify">test1</a> </div>
<div><br></div>
<div><a href="www.test2.com/identify">test2</a></div>
<div><br></div><div><br></div>
<div>
This should be identified

Replaced me 1

Replaced me 2
<p id="firstpara" align="center"> This paragraph should be<b>
identified </b>.</p>
</div>
</body></html>

Thanks for your time and help!
 

Thomas 'PointedEars' Lahn

SAKTHEESH said:
> I am using Beautiful Soup to parse a html to find all text that is Not
> contained inside any anchor elements
>
> I came up with this code which finds all links within href

_anchors_ _with_ `href' _attribute_ (commonly: links.)

> but not the other way around.

What would that be anyway?

> How can I modify this code to get only plain text using Beautiful
> Soup, so that I can do some find and replace and modify the soup?

RTFM:
<http://www.crummy.com/software/BeautifulSoup/documentation.html#contents>
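
One way to approach the original question is sketched below. It is only a sketch under stated assumptions: it assumes Python 3 and the current `bs4` package (hence `find_all` instead of the `findAll` used in the question), and the helper name `replace_outside_anchors` and the shortened sample markup are made up for illustration. The idea is to walk every text node, skip those that have an `<a>` ancestor, and replace the rest in place so the soup itself is modified.

```python
from bs4 import BeautifulSoup

# Shortened version of the markup from the question (illustrative only).
HTML = """<html><body>
<div> <a href="www.test1.com/identify">test1</a> </div>
<div><a href="www.test2.com/identify">Identify test2</a></div>
<div>
This should be identified

Identify me 1

Identify me 2
<p id="firstpara" align="center"> This paragraph should be<b>
identified </b>.</p>
</div>
</body></html>"""


def replace_outside_anchors(soup, old, new):
    """Replace `old` with `new` in every text node that has no <a> ancestor.

    Returns the number of text nodes that were changed.
    """
    changed = 0
    # Copy the result into a list first, because the tree is modified
    # while we loop over it.
    for node in list(soup.find_all(string=True)):
        if node.find_parent("a") is not None:
            continue  # text inside a link: leave it untouched
        if old in node:
            # A text node behaves like a str, so str.replace works on it.
            node.replace_with(node.replace(old, new))
            changed += 1
    return changed


soup = BeautifulSoup(HTML, "html.parser")
count = replace_outside_anchors(soup, "Identify", "Replaced")
print(soup)
```

Only text nodes are visited, so the `href` values are never touched, and the match is case-sensitive, which is why "identified" stays as it is. To merely list the text outside links, without replacing anything, the same loop can collect `node.strip()` for the nodes that pass the `find_parent("a")` check.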
 
