how to find not the next sibling but the 2nd sibling or find sibling "a" OR sibling "b"

Discussion in 'Python' started by localpricemaps@gmail.com, Jan 18, 2006.

  1. Guest

    i have some html which looks like this where i want to scrape out the
    href stuff (the www.cnn.com part)

    <div class="noFood">Cheese</div>
    <div class="food">Blue</div>
    <a class="btn" href = "http://www.cnn.com">


    so i wrote this code which scrapes it perfectly:

    for incident in row('div', {'class':'noFood'}):
        b = incident.findNextSibling('div', {'class': 'food'})
        print b
        n = b.findNextSibling('a', {'class': 'btn'})
        print n
        link = n['href'] + "','"

    problem is that sometimes the 2nd tag , the <div class="food"> tag , is
    sometimes called food, sometimes called drink. so sometimes it looks
    like this:

    <div class="noFood">Cheese</div>
    <div class="drink">Pepsi</div>
    <a class="btn" href = "http://www.cnn.com">

    how do i alter my script to take into account the fact that i will
    sometimes have food and sometimes have drink as the class name? is
    there a way to say "look for food or drink" or a way to say "look for
    this incident and then find not the next sibling but the 2nd next
    sibling" if that makes any sense?

    thanks
     
    , Jan 18, 2006
    #1

  2. Kent Johnson Guest

    wrote:
    > i have some html which looks like this where i want to scrape out the
    > href stuff (the www.cnn.com part)
    >
    > <div class="noFood">Cheese</div>
    > <div class="food">Blue</div>
    > <a class="btn" href = "http://www.cnn.com">
    >
    >
    > so i wrote this code which scrapes it perfectly:
    >
    > for incident in row('div', {'class':'noFood'}):
    >     b = incident.findNextSibling('div', {'class': 'food'})
    >     print b
    >     n = b.findNextSibling('a', {'class': 'btn'})
    >     print n
    >     link = n['href'] + "','"
    >
    > problem is that sometimes the 2nd tag , the <div class="food"> tag , is
    > sometimes called food, sometimes called drink.


    Apparently you are using Beautiful Soup. The value in the attribute
    dictionary can be a callable; try this:

    def isFoodOrDrink(attr):
        return attr in ['food', 'drink']

    b = incident.findNextSibling('div', {'class': isFoodOrDrink})

    Alternately you could omit the class spec and check for it in code.

    Kent
     
    Kent Johnson, Jan 19, 2006
    #2
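
A minimal sketch of the callable-filter approach Kent describes, written against the current `bs4` package rather than the 2006 Beautiful Soup release used in the thread. The `find_all`, `find_next_sibling` and `class_` spellings, the sample markup, and the `is_food_or_drink` name are assumptions for illustration, not the original poster's code.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Sample markup modelled on the thread; the second div may be food or drink.
HTML = """
<div>
  <div class="noFood">Cheese</div>
  <div class="drink">Pepsi</div>
  <a class="btn" href="http://www.cnn.com">more</a>
</div>
"""

def is_food_or_drink(value):
    # Receives the tag's class value (may be None for a tag without a class).
    return value in ("food", "drink")

soup = BeautifulSoup(HTML, "html.parser")
links = []
for incident in soup.find_all("div", class_="noFood"):
    b = incident.find_next_sibling("div", class_=is_food_or_drink)
    if b is None:  # no matching sibling: skip instead of failing later
        continue
    n = b.find_next_sibling("a", class_="btn")
    if n is not None:
        links.append(n["href"])

print(links)
```

Adding a third class name only means extending the tuple inside `is_food_or_drink`; the loop itself does not change.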

  3. Guest

    i actually realized there are 3 potentials for class names. either
    food or drink or dessert. so my question is whether or not i can alter
    your function to look like this?

    def isFoodOrDrinkOrDesert(attr):
        return attr in ['food', 'drink', 'desert']


    thanks in advance for the help

    Kent Johnson wrote:
    > wrote:
    > > i have some html which looks like this where i want to scrape out the
    > > href stuff (the www.cnn.com part)
    > >
    > > <div class="noFood">Cheese</div>
    > > <div class="food">Blue</div>
    > > <a class="btn" href = "http://www.cnn.com">
    > >
    > >
    > > so i wrote this code which scrapes it perfectly:
    > >
    > > for incident in row('div', {'class':'noFood'}):
    > >     b = incident.findNextSibling('div', {'class': 'food'})
    > >     print b
    > >     n = b.findNextSibling('a', {'class': 'btn'})
    > >     print n
    > >     link = n['href'] + "','"
    > >
    > > problem is that sometimes the 2nd tag , the <div class="food"> tag , is
    > > sometimes called food, sometimes called drink.

    >
    > Apparently you are using Beautiful Soup. The value in the attribute
    > dictionary can be a callable; try this:
    >
    > def isFoodOrDrink(attr):
    >     return attr in ['food', 'drink']
    >
    > b = incident.findNextSibling('div', {'class': isFoodOrDrink})
    >
    > Alternately you could omit the class spec and check for it in code.
    >
    > Kent
     
    , Jan 19, 2006
    #3
  4. Fredrik Lundh

    wrote:

    > i actually realized there are 3 potentials for class names. either
    > food or drink or dessert. so my question is whether or not i can alter
    > your function to look like this?
    >
    > def isFoodOrDrinkOrDesert(attr):
    >     return attr in ['food', 'drink', 'desert']


    what happens when you try that ?

    </F>
     
    Fredrik Lundh, Jan 19, 2006
    #4
  5. Kent Johnson Guest

    wrote:
    > i actually realized there are 3 potentials for class names. either
    > food or drink or dessert. so my question is whether or not i can alter
    > your function to look like this?
    >
    > def isFoodOrDrinkOrDesert(attr):
    >     return attr in ['food', 'drink', 'desert']


    Check the spelling of 'dessert' and give it a try.

    Kent
     
    Kent Johnson, Jan 19, 2006
    #5
  6. Guest

    ok i found something that works. instead of using the def i did this:

    for incident in row('div', {'class': 'food' or 'drink' }):

    and it worked!

    only thing is that i think i am messing up the logic and here is why

    So when i run my script i get results, meaning it scrapes some stuff
    out, but then i get errors where i am told:

    TypeError: unsupported operand type(s) for +: 'NullType' and 'str'


    Is this because of the logic in my code? i mean what i want the script
    to do is look for the <tr> tag and then find the first div tag named
    food or drink, find its sibling named food, drink or dessert and then
    find the button tag which is the following sibling and THEN scrape out
    the href. is it possible that after it finds that first div and looks
    for the next sibling and then the next sibling's href that it then
    tries to run the same process all over again starting with the 2nd div
    tag, and being that it can't find another div tag after the 2nd div tag
    it trips up?

    know what i mean?


    here is my code:

    for row in bs('tr'):
        for incident in row('div', {'class': 'food' or 'drink'}):
            b = incident.findNextSibling('div', {'class': 'food' or 'drink' or 'dessert'})
            n = b.findNextSibling('a', {'class': 'btn'})
            link = n['href'] + "','"
     
    , Jan 19, 2006
    #6
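
The 'NullType' in that traceback suggests that one of the lookups found nothing and the code then tried to add a string to the empty result. A rough sketch of guarding against that, using plain Python values in place of parsed tags. `format_link` and the `found` list are hypothetical stand-ins, and `None` stands in for whatever "not found" value the installed Beautiful Soup returns.

```python
def format_link(href):
    """Return href followed by the "','" separator, or None if nothing was found."""
    if href is None:
        return None
    return href + "','"

# Stand-in results: one successful lookup and one that found nothing.
found = ["http://www.cnn.com", None]
links = [format_link(href) for href in found]
print(links)

# Without the guard, adding a string to the empty result raises TypeError.
try:
    None + "','"
except TypeError as exc:
    error = str(exc)
    print(error)
```

Checking each lookup before using it keeps one incomplete row from stopping the whole scrape.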
  7. Fredrik Lundh

    wrote:

    > ok i found something that works. instead of using the def i did this:
    >
    > for incident in row('div', {'class': 'food' or 'drink' }):
    >
    > and it worked!


    'food' or 'drink' doesn't do what you think it does:

    >>> 'food' or 'drink'
    'food'
    >>> {'class': 'food' or 'drink'}
    {'class': 'food'}

    </F>
     
    Fredrik Lundh, Jan 19, 2006
    #7
  8. Guest

    hey fredrik,

    i don't understand what you are saying

    Fredrik Lundh wrote:
    > wrote:
    >
    > > ok i found something that works. instead of using the def i did this:
    > >
    > > for incident in row('div', {'class': 'food' or 'drink' }):
    > >
    > > and it worked!

    >
    > 'food' or 'drink' doesn't do what you think it does:
    >
    > >>> 'food' or 'drink'
    > 'food'
    >
    > >>> {'class': 'food' or 'drink'}
    > {'class': 'food'}
    >
    > </F>
     
    , Jan 19, 2006
    #8
  9. Brett Hoerner

    wrote:
    > hey fredrik,
    >
    > i don't understand what you are saying


    Do what he showed in the Python interactive shell,

    > Fredrik Lundh wrote:
    > > 'food' or 'drink' doesn't do what you think it does:
    > >
    > > >>> 'food' or 'drink'
    > > 'food'
    > >
    > > >>> {'class': 'food' or 'drink'}
    > > {'class': 'food'}


    "or" returns the first true element, anything but False or None, I
    think... so 'food' (a string) is true, and will always be returned by
    that code.

    http://diveintopython.org/power_of_introspection/and_or.html

    Brett
     
    Brett Hoerner, Jan 20, 2006
    #9
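
A small sketch of the point Fredrik and Brett are making, in plain Python with no Beautiful Soup involved. The `is_wanted` helper is a hypothetical name used only for illustration.

```python
# `or` returns its first truthy operand, so a non-empty string wins at once.
choice = 'food' or 'drink'
print(choice)  # food

# The dictionary therefore only ever asks for class "food".
attrs = {'class': 'food' or 'drink' or 'dessert'}
print(attrs)  # {'class': 'food'}

# Testing one value against several alternatives needs a membership test.
def is_wanted(value):
    return value in ('food', 'drink', 'dessert')

print([is_wanted(v) for v in ('food', 'drink', 'noFood')])  # [True, True, False]
```

The expression is evaluated before the dictionary is built, so the search never sees the alternatives.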
  10. Steve Holden Guest

    Brett Hoerner wrote:
    > wrote:

    [...]
    > "or" returns the first true element, anything but False or None, I
    > think... so 'food' (a string) is true, and always will return in that
    > code.


    Just in case newbies are reading: in Python several different values are
    considered false in the context of an "if" statement. These include

    False     # Boolean False
    0         # The integer zero
    0.0       # Floating-point zero
    (0+0j)    # Complex zero
    None      # The None object
    []        # The empty list
    ()        # The empty tuple
    {}        # The empty dictionary

    This is mainly to allow the convenience of writing

    if thing:
        ...

    However, one has to be careful that code that needs to treat None
    differently from [] uses explicit testing such as

    if thing is None:
        ...

    You can also construct your own classes so their instances evaluate to
    True or False according to your needs.

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC www.holdenweb.com
    PyCon TX 2006 www.python.org/pycon/
     
    Steve Holden, Jan 20, 2006
    #10
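
A short sketch of Steve's last two points, assuming Python 3, where the truth hook is `__bool__` (Python 2, current when the thread was written, called it `__nonzero__`). The `Basket` class is a hypothetical example, and the empty string is added to the list of false values here because it behaves the same way.

```python
class Basket:
    """Hypothetical container that is false when it holds no items."""

    def __init__(self, items=()):
        self.items = list(items)

    def __bool__(self):  # Python 3 hook; Python 2 used __nonzero__
        return len(self.items) > 0

# Every value here is treated as false by `if`.
falsy = [False, 0, 0.0, 0j, None, [], (), {}, '', Basket()]
print([bool(v) for v in falsy])

# An empty list is false, but it is not None.
thing = []
print(thing is None)  # False
print(not thing)      # True
```

An explicit `is None` test is what separates "no result" from "an empty result".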
  11. Guest

    well actually all i want it to do is find the first thing that shows up
    whether its class:food or class: drink so that works for me. only
    thing is that after it finds class:food i think it runs through the
    html again and finds the following class:drink and being that there is
    no class tag after that class: drink tag it fails.

    Fredrik Lundh wrote:
    > wrote:
    >
    > > ok i found something that works. instead of using the def i did this:
    > >
    > > for incident in row('div', {'class': 'food' or 'drink' }):
    > >
    > > and it worked!

    >
    > 'food' or 'drink' doesn't do what you think it does:
    >
    > >>> 'food' or 'drink'
    > 'food'
    >
    > >>> {'class': 'food' or 'drink'}
    > {'class': 'food'}
    >
    > </F>
     
    , Jan 23, 2006
    #11
  12. Fredrik Lundh

    wrote:

    > well actually all i want it to do is find the first thing that shows up
    > whether its class:food or class: drink so that works for me.


    what makes you think that looking for "food" only will find either
    "food" or "drink" ?

    </F>
     
    Fredrik Lundh, Jan 23, 2006
    #12