BeautifulSoup -- getting all the attributes of a tag?

7stud

You can treat a tag like a dictionary to obtain a specific attribute:

import BeautifulSoup as bs

html = "<div x='a' y='b' z='c'>hello</div>"

doc = bs.BeautifulSoup(html)
div = doc.find("div")
print div
print div["x"]

--output:--
a

But you can't iterate over a tag to get all the attributes:

import BeautifulSoup as bs

html = "<div x='a' y='b' z='c'>hello</div>"

doc = bs.BeautifulSoup(html)
div = doc.find("div")

for key in div:
    print key, div[key]

--output:--
hello
Traceback (most recent call last):
  File "test1.py", line 9, in ?
    print key, div[key]
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/BeautifulSoup.py", line 430, in __getitem__
    return self._getAttrMap()[key]
KeyError: u'hello'

How can you get all the attributes when you don't know the attribute
names ahead of time?
 
7stud

I figured it out:


import BeautifulSoup as bs

html = "<div x='a' y='b' z='c'>hello</div>"

doc = bs.BeautifulSoup(html)
div = doc.find("div")

for attr, val in div.attrs:
    print "%s:%s" % (attr, val)

--output:--
x:a
y:b
z:c
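
A note for anyone on the newer bs4 package rather than the BeautifulSoup 3 module used in this thread: there, `attrs` is a dict rather than a list of (name, value) pairs, so the loop needs `.items()`. Below is a minimal sketch, assuming bs4 is installed and Python 3 syntax. `tag_attributes` is a made-up helper name for illustration, not part of either library.

```python
# Sketch for the newer bs4 package (assumption: the thread itself uses BeautifulSoup 3).
from bs4 import BeautifulSoup


def tag_attributes(html, name):
    """Return the attributes of the first <name> tag as a plain dict (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(name)
    # In bs4, Tag.attrs is a dict, so copy it instead of unpacking pairs.
    return dict(tag.attrs) if tag is not None else {}


html = "<div x='a' y='b' z='c'>hello</div>"

# Same loop as in the post above, but iterating over dict items.
for attr, val in tag_attributes(html, "div").items():
    print("%s:%s" % (attr, val))
```

Returning a copy of the dict keeps the caller from accidentally modifying the parsed tag. If you want to edit attributes in place, work on `tag.attrs` directly instead.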
 
Paul McGuire

Just for another datapoint, here's how it looks with pyparsing.
-- Paul

from pyparsing import makeHTMLTags,SkipTo

html = """<div x="a" y="b" z="c">hello</div>"""

# HTML tags match case-insensitive'ly
divStart,divEnd = makeHTMLTags("DIV")
divTag = divStart + SkipTo(divEnd)("body") + divEnd

for div in divTag.searchString(html):
    print div.dump()
    print
    # dict-like access to results
    for k in div.keys():
        print k,div[k]
    # object.attribute access to results
    print div.body
    print div.x
    print div.y
    print

Prints:
['DIV', ['x', 'a'], ['y', 'b'], ['z', 'c'], False, 'hello', '</DIV>']
- body: hello
- empty: False
- endDiv: </DIV>
- startDiv: ['DIV', ['x', 'a'], ['y', 'b'], ['z', 'c'], False]
  - empty: False
  - x: a
  - y: b
  - z: c
- x: a
- y: b
- z: c

body hello
endDiv </DIV>
y b
x a
z c
startDiv ['DIV', ['x', 'a'], ['y', 'b'], ['z', 'c'], False]
empty False
hello
a
b
 
