Beautiful Soup Question: Filtering Images based on their width and height attributes

Discussion in 'Python' started by PicURLPy, Nov 30, 2006.

  1. PicURLPy

    PicURLPy Guest

    Hello,

    I want to extract some image links from different HTML pages; in
    particular, I want to extract those image tags whose height values are
    greater than 200. Is there an elegant way in BeautifulSoup to do this?
    PicURLPy, Nov 30, 2006
    #1

  2. Chris Mellon

    Chris Mellon Guest

    Re: Beautiful Soup Question: Filtering Images based on their width and height attributes

    On 30 Nov 2006 12:43:45 -0800, PicURLPy wrote:
    > Hello,
    >
    > I want to extract some image links from different HTML pages; in
    > particular, I want to extract those image tags whose height values are
    > greater than 200. Is there an elegant way in BeautifulSoup to do this?
    >


    Most image tags "in the wild" don't have height attributes, so you have
    to download the image to see what size it is.

    Chris Mellon, Nov 30, 2006
    #2

  3. Fredrik Lundh

    Re: Beautiful Soup Question: Filtering Images based on their width and height attributes

    Chris Mellon wrote:

    >> I want to extract some image links from different HTML pages; in
    >> particular, I want to extract those image tags whose height values are
    >> greater than 200. Is there an elegant way in BeautifulSoup to do this?

    >
    > Most image tags "in the wild" don't have height attributes, you have
    > to download the image to see what size it is.


    or at least a small portion of it; see the example at the bottom of this
    page for one way to get the size without downloading more than 1k or so:

    http://effbot.org/zone/pil-image-size.htm

    </F>
    Fredrik Lundh, Dec 1, 2006
    #3
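
The idea of reading only the start of a file to learn its dimensions can be illustrated without any third-party library. The sketch below is not the PIL-based example from the linked page; it is a standard-library stand-in that handles PNG files only, where the width and height sit in the first 24 bytes. The function name `png_size` is made up for this illustration.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header):
    """Return (width, height) read from the first 24 bytes of a PNG.

    Returns None if the data is too short or does not look like a PNG.
    Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type
    ("IHDR"), then width and height as big-endian unsigned 32-bit ints.
    """
    if len(header) < 24:
        return None
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

One possible use is to pass it the result of `urllib.request.urlopen(url).read(24)`, which reads only the first 24 bytes of the response body. Other formats such as JPEG and GIF store their dimensions differently and would need their own parsing, or a library such as PIL.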
  4. David Coffin

    David Coffin Guest

    Re: Beautiful Soup Question: Filtering Images based on their width and height attributes


    > Hello,
    >
    > I want to extract some image links from different HTML pages; in
    > particular, I want to extract those image tags whose height values are
    > greater than 200. Is there an elegant way in BeautifulSoup to do this?


    Yes.

    soup.findAll(lambda tag: tag.name == "img" and tag.has_key("height")
                 and int(tag["height"]) > 200)
    David Coffin, Dec 4, 2006
    #4
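
The one-liner above targets the Beautiful Soup API of the time, and `int(tag["height"])` raises `ValueError` on values such as `"50%"`. A more defensive version of the same filter might look like the sketch below. It assumes the current `bs4` package, where `find_all` accepts a function and tags offer a dict-style `get`; the helper names `pixel_height` and `is_tall_img` are invented here and are not part of Beautiful Soup.

```python
import re

# Accept plain pixel counts such as "300" or "300px"; reject "50%", "auto", etc.
_PIXELS = re.compile(r"\s*(\d+)\s*(?:px)?\s*$", re.IGNORECASE)


def pixel_height(value):
    """Return the height as an int, or None if it is missing or not in pixels."""
    if value is None:
        return None
    match = _PIXELS.match(str(value))
    return int(match.group(1)) if match else None


def is_tall_img(tag, min_height=200):
    """Predicate for find_all(): True for <img> tags taller than min_height."""
    if tag.name != "img":
        return False
    height = pixel_height(tag.get("height"))
    return height is not None and height > min_height
```

With `bs4` installed and a parsed `soup`, `soup.find_all(is_tall_img)` should return the matching tags, and `[tag.get("src") for tag in soup.find_all(is_tall_img)]` their links. Images without a `height` attribute are skipped; as noted earlier in the thread, their real size can only be found by fetching at least part of the file.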
