help with lists and writing to file in correct order

Discussion in 'Python' started by homepricemaps@gmail.com, Dec 26, 2005.

  1. Guest

    hey folks,

    have a logic question for you. appreciate the help in advance.

    i am scraping 3 pieces of information from the html namely the food
    name, store name and price. and i am doing this for many different
    food items found in the html including pizza, burgers, fries etc. what
    i want is to write out to a text file in the following order:

    pizza, pizza hut, 3.00
    burgers, burger king, 4.00
    noodles, panda inn, 2.00

    html is below. does anyone have a good recommendation for how to set up
    the code in such a manner that it writes to the text file in the order
    listed previously? any attempt i have made seems to write to the file
    like this

    noodles, panda inn, 3
    noodles, panda inn, 4
    noodles, panda inn, 2


    HTML
    <tr class="base"><td class="tall"><a name="D0L1" href="his/food"
    target="_blank">

    <td class="desc"><h2 id="foodName">pizza</h2>

    <div class="store"><a name="D0L3" href="/xPopups/nojs"
    target="_blank"><b>pizza hut</b></a></div>

    <td class="price">3.00</td>
    </tr>
    , Dec 26, 2005
    #1

  2. On Mon, 26 Dec 2005 13:54:37 -0800, homepricemaps wrote:

    > hey folks,
    >
    > have a logic question for you. appreciate the help in advance.
    >
    > i am scraping 3 pieces of information from the html namely the food
    > name, store name and price. and i am doing this for many different
    > food items found in the html including pizza, burgers, fries etc. what
    > i want is to write out to a text file in the following order:
    >
    > pizza, pizza hut, 3.00
    > burgers, burger king, 4.00
    > noodles, panda inn, 2.00
    >
    > html is below. does anyone have a good recommendation for how to set up
    > the code in such a manner that it writes to the text file in the order
    > listed previously? any attempt i have made seems to write to the file
    > like this
    >
    > noodles, panda inn, 3
    > noodles, panda inn, 4
    > noodles, panda inn, 2


    Instead of posting the HTML, how about if you post your code? Unless we
    see your code, how do you expect us to find the bug in it?



    --
    Steven.
    Steven D'Aprano, Dec 27, 2005
    #2

  3. Guest

    sorry guys, here is the code

    for incident in bs('a', {'class' : 'price'}):
        price = ""
        for oText in incident.fetchText( oRE):
            price += oText.strip() + "','"

    for incident in bs('div', {'class' : 'store'}):
        store = ""
        for oText in incident.fetchText( oRE):
            store += oText.strip() + "','"

    for incident in bs('h2', {'id' : 'food'}):
        food = ""
        for oText in incident.fetchText( oRE):
            food += oText.strip() + "','"
    , Dec 27, 2005
    #3
  4. On Mon, 26 Dec 2005 17:44:43 -0800, homepricemaps wrote:

    > sorry guys, here is the code
    >
    > for incident in bs('a', {'class' : 'price'}):
    >     price = ""
    >     for oText in incident.fetchText( oRE):
    >         price += oText.strip() + "','"
    >
    > for incident in bs('div', {'class' : 'store'}):
    >     store = ""
    >     for oText in incident.fetchText( oRE):
    >         store += oText.strip() + "','"
    >
    > for incident in bs('h2', {'id' : 'food'}):
    >     food = ""
    >     for oText in incident.fetchText( oRE):
    >         food += oText.strip() + "','"



    This is hardly all your code -- where is the part where you actually
    *write* something to the file? The problem is you are writing the same
    store and food to the file over and over again. After you have collected
    one line of store/food, you must write it to the file immediately, or at
    least save it in a list so you can write the lot at the end.


    --
    Steven.
    Steven D'Aprano, Dec 27, 2005
    #4
  5. Guest

    here is the write part:

    out = open("test.txt", 'a')
    out.write (store+ food+ price + "\n")
    out.close()


    , Dec 27, 2005
    #5
  6. Guest

    the problem with writing to the file immediately is that it ends up
    writing all food items together, and then all store items and then all
    prices

    i want

    food, store, price
    food, store, price
    , Dec 27, 2005
    #6
  7. wrote:
    > the problem with writing to the file immediately is that it ends up
    > writing all food items together, and then all store items and then all
    > prices
    >
    > i want
    >
    > food, store, price
    > food, store, price
    >

    Well, if it all fits in memory, append each to its own list, and then
    either finally if you can or periodically if you must:

    for food, store, price in zip(foods, stores, prices):
        <do some writing.>
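
    As a minimal sketch of that idea in current Python 3: the three lists
    below are made-up sample data standing in for whatever the scraper
    collects, and the output path is a hypothetical temporary file. The
    file is opened once and each pass of the loop writes one complete
    food, store, price line.

```python
import os
import tempfile

# Hypothetical sample data standing in for the scraped values.
foods = ["pizza", "burgers", "noodles"]
stores = ["pizza hut", "burger king", "panda inn"]
prices = ["3.00", "4.00", "2.00"]

# Hypothetical output path; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Open the file once, then write one line per (food, store, price) triple.
with open(out_path, "w") as out:
    for food, store, price in zip(foods, stores, prices):
        out.write("%s, %s, %s\n" % (food, store, price))

# Read the file back to check what was written.
with open(out_path) as result:
    written = result.read()
print(written)
```

    Note that zip() stops at the shortest list, so a missing price or
    store would silently drop or misalign rows.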

    --
    -Scott David Daniels
    Scott David Daniels, Dec 27, 2005
    #7
  8. Guest

    sorry for asking such beginner questions but i tried this and nothing
    wrote to my text file

    for food, price, store in bs(food, price, store):
        out = open("test.txt", 'a')
        out.write (food + price + store)
        out.close()


    while if i write the following without the for i at least get
    something?
    out = open("test.txt", 'a')
    out.write (food + price + store)
    out.close()


    , Dec 27, 2005
    #8
  9. Guest

    wrote:
    > sorry for asking such beginner questions but i tried this and nothing
    > wrote to my text file
    >
    > for food, price, store in bs(food, price, store):
    >     out = open("test.txt", 'a')
    >     out.write (food + price + store)
    >     out.close()
    >
    >
    > while if i write the following without the for i at least get
    > something?
    > out = open("test.txt", 'a')
    > out.write (food + price + store)
    > out.close()
    >

    Pull the open() and close() calls out of the loop. And use some other
    names for the variables, as they are very confusing and could be
    error-prone too.
    , Dec 27, 2005
    #9
  10. On Mon, 26 Dec 2005 20:56:17 -0800, homepricemaps wrote:

    > sorry for asking such beginner questions but i tried this and nothing
    > wrote to my text file
    >
    > for food, price, store in bs(food, price, store):
    >     out = open("test.txt", 'a')
    >     out.write (food + price + store)
    >     out.close()


    What are the contents of food, price and store? If "nothing wrote to my
    text file", chances are all three of them are the empty string.


    > while if i write the following without the for i at least get
    > something?
    > out = open("test.txt", 'a')
    > out.write (food + price + store)
    > out.close()


    You get "something". That's not much help. But I predict that what you are
    getting is the contents of food, price and store, at least one of which is
    not empty.

    You need to encapsulate your code by separating the part of the code that
    reads the html file from the part that writes the text file. I suggest
    something like this:


    def read_html_data(name_of_file):
        # I don't know BeautifulSoup, so you will have to fix this...
        datafile = BeautifulSoup(name_of_file)
        # somehow read in the foods, prices and stores
        # for each set of three, store them in a tuple (food, store, price)
        # then store the tuples in a list
        # something vaguely like this:
        data = []
        while 1:
            food = datafile.get("food") # or whatever
            store = datafile.get("store")
            price = datafile.get("price")
            data.append( (food,store,price) )
        datafile.close()
        return data

    def write_data_to_text(datalist, name_of_file):
        # Expects a list of tuples (food,store,price). Writes that list
        # to name_of_file separated by newlines.
        fp = file(name_of_file, "w")
        for triplet in datalist:
            fp.write("Food = %s, store = %s, price = %s\n" % triplet
        fp.close()


    Hope this helps.



    --
    Steven.
    Steven D'Aprano, Dec 27, 2005
    #10
  11. Why don't you use pickle instead of directly writing to the file yourself?
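
    A rough sketch of what that could look like in current Python 3. The
    records and the file name are made up for illustration. One caveat:
    pickle stores a binary snapshot of the list rather than the
    comma-separated text lines asked for earlier in the thread, so it
    only fits if the file is going to be read back by another Python
    program.

```python
import os
import pickle
import tempfile

# Hypothetical records standing in for the scraped (food, store, price) tuples.
records = [
    ("pizza", "pizza hut", "3.00"),
    ("burgers", "burger king", "4.00"),
]

# Hypothetical file name inside a temporary directory.
pickle_path = os.path.join(tempfile.mkdtemp(), "foods.pickle")

# pickle writes bytes, so the file is opened in binary mode.
with open(pickle_path, "wb") as fp:
    pickle.dump(records, fp)

# Loading gives back an equal list of tuples.
with open(pickle_path, "rb") as fp:
    restored = pickle.load(fp)

print(restored == records)
```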
    Siraj Kutlusan, Dec 27, 2005
    #11
  12. Guest

    hey steven-your example was very helpful. is there a paragraph symbol
    missing in

    fp.write("Food = %s, store = %s, price = %s\n" % triplet


    , Dec 28, 2005
    #12
  13. On Tue, 27 Dec 2005 20:11:59 -0800, homepricemaps wrote:

    > hey steven-your example was very helpful. is there a paragraph symbol
    > missing in
    >
    > fp.write("Food = %s, store = %s, price = %s\n" % triplet


    No, but there is a closing bracket missing:

    fp.write("Food = %s, store = %s, price = %s\n" % triplet)
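
    With the bracket in place, a runnable version of the writer function
    might be sketched as follows. This is written for current Python 3,
    so it uses open() in a with block instead of file(); the sample data
    and the output path are hypothetical.

```python
import os
import tempfile


def write_data_to_text(datalist, name_of_file):
    # Expects a list of (food, store, price) tuples and writes one per line.
    with open(name_of_file, "w") as fp:
        for triplet in datalist:
            fp.write("Food = %s, store = %s, price = %s\n" % triplet)


# Hypothetical sample data and output path, for illustration only.
sample = [("pizza", "pizza hut", "3.00"), ("noodles", "panda inn", "2.00")]
path = os.path.join(tempfile.mkdtemp(), "foods.txt")
write_data_to_text(sample, path)

# Read the file back to check the result.
with open(path) as fp:
    lines = fp.read().splitlines()
print(lines)
```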


    --
    Steven.
    Steven D'Aprano, Dec 28, 2005
    #13
  14. Kent Johnson Guest

    wrote:
    > sorry guys, here is the code
    >
    > for incident in bs('a', {'class' : 'price'}):
    >     price = ""
    >     for oText in incident.fetchText( oRE):
    >         price += oText.strip() + "','"
    >
    > for incident in bs('div', {'class' : 'store'}):
    >     store = ""
    >     for oText in incident.fetchText( oRE):
    >         store += oText.strip() + "','"
    >
    > for incident in bs('h2', {'id' : 'food'}):
    >     food = ""
    >     for oText in incident.fetchText( oRE):
    >         food += oText.strip() + "','"
    >


    I would use a loop that finds the row for a single item with something like

        for item in bs('tr', {'class' : 'base'}):

    then inside the loop fetch the values for store, food and price for that
    item and write them to your output file.
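
    To illustrate the row-by-row idea without depending on a particular
    BeautifulSoup version, here is a sketch that swaps in the standard
    library's xml.etree.ElementTree. That is an assumption made purely
    for the example: ElementTree needs well-formed markup, so the PAGE
    string below is a cleaned-up, made-up stand-in for the real page, and
    text_of() is a hypothetical helper. The point is that every lookup
    starts from the current row, not from the whole document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page; the real HTML may need
# a forgiving parser such as BeautifulSoup instead of ElementTree.
PAGE = """
<table>
  <tr class="base">
    <td class="desc"><h2 id="foodName">pizza</h2>
      <div class="store"><a name="D0L3"><b>pizza hut</b></a></div></td>
    <td class="price">3.00</td>
  </tr>
  <tr class="base">
    <td class="desc"><h2 id="foodName">burgers</h2>
      <div class="store"><a name="D0L3"><b>burger king</b></a></div></td>
    <td class="price">4.00</td>
  </tr>
</table>
"""


def text_of(element):
    # Join all nested text pieces and trim surrounding whitespace.
    return "".join(element.itertext()).strip()


records = []
for row in ET.fromstring(PAGE).findall(".//tr[@class='base']"):
    # Every lookup starts from `row`, so each pass sees only that row's cells.
    food = text_of(row.find(".//h2[@id='foodName']"))
    store = text_of(row.find(".//div[@class='store']"))
    price = text_of(row.find(".//td[@class='price']"))
    records.append((food, store, price))

for record in records:
    print(", ".join(record))
```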

    Kent
    Kent Johnson, Dec 28, 2005
    #14
  15. Guest

    hey kent thanks for your help.

    so i ended up using a loop but find that i end up getting the same set
    of results every time. the code is here:

    for incident in bs('tr'):
        data2 = []
        for incident in bs('h2', {'id' : 'dealName'}):
            product2 = ""
            for oText in incident.fetchText( oRE):
                product2 += oText.strip() + ';'

        for incident in bs('a', {'name' : 'D0L3'}):
            store2 = ""
            for oText in incident.fetchText( oRE):
                store2 += oText.strip() + ';'

        for incident in bs('a', {'class' : 'nojs'}):
            price2 = ""
            for oText in incident.fetchText( oRE):
                price2 += oText.strip() + ';'

        tuple2 = (product2, store2, price2)
        data2.append(tuple2)
        print data2

    and i end up getting the following instead of unique results

    pizza, pizzahut, 3.94
    pizza, pizzahut, 3.94
    pizza, pizzahut, 3.94
    pizza, pizzahut, 3.94
    , Dec 29, 2005
    #15
  16. Mike Meyer Guest

    writes:
    > hey kent thanks for your help.
    >
    > so i ended up using a loop but find that i end up getting the same set
    > of results every time. the code is here:
    >
    > for incident in bs('tr'):
    >     data2 = []
    >     for incident in bs('h2', {'id' : 'dealName'}):
    >         product2 = ""
    >         for oText in incident.fetchText( oRE):
    >             product2 += oText.strip() + ';'
    >
    >     for incident in bs('a', {'name' : 'D0L3'}):
    >         store2 = ""
    >         for oText in incident.fetchText( oRE):
    >             store2 += oText.strip() + ';'
    >
    >     for incident in bs('a', {'class' : 'nojs'}):
    >         price2 = ""
    >         for oText in incident.fetchText( oRE):
    >             price2 += oText.strip() + ';'
    >
    >     tuple2 = (product2, store2, price2)
    >     data2.append(tuple2)
    >     print data2


    Two things here that are bad in general:
    1) Doing string concatenation to build strings. This is slow in
    Python. Build lists of strings and join them, as below.

    2) Using incident as the index variable for all four loops. This is
    very confusing, and certainly part of your problem.

    > and i end up getting the following instead of unique results
    >
    > pizza, pizzahut, 3.94
    > pizza, pizzahut, 3.94
    > pizza, pizzahut, 3.94
    > pizza, pizzahut, 3.94


    Right. The outer loop doesn't do anything to change what the inner
    loops search, so they do the same thing every time through the outer
    loop. You want them to search the row returned by the outer loop each
    time.

    for row in bs('tr'):
        data2 = []
        for incident in row('h2', {'id' :'dealName'}):
            product2list = []
            for oText in incident.fetchText(oRE):
                product2list.append(oText.strip() + ';')
            product2 = ''.join(product2list)
        # etc.
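
    The list-and-join pattern on its own, as a small Python 3 sketch. The
    texts list is made-up data standing in for what fetchText() would
    return for one element.

```python
# Hypothetical text fragments standing in for one element's fetched text.
texts = ["  pizza ", " hut  ", "3.94"]

# Collect the cleaned pieces in a list, then join them once at the end.
pieces = []
for text in texts:
    pieces.append(text.strip() + ";")
joined = "".join(pieces)

print(joined)
```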

    <mike
    --
    Mike Meyer http://www.mired.org/home/mwm/
    Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
    Mike Meyer, Dec 29, 2005
    #16
  17. Guest

    hey mike-the sample code was very useful. have 2 questions

    when i use what you wrote, which is listed below, i get told
    UnboundLocalError: local variable 'product' referenced before
    assignment. if i however change row to incident in "for incident in
    bs('tr'):" i then get my tuples printed out nicely but once again get a
    long list of

    [('pizza;','pizza hut;', '3.94;')]
    [('pizza;','pizza hut;', '3.94;')]


    for row in bs('tr'):
        data=[]
        for incident in row('h2', {'id' : 'dealName'}):
            productlist = []
            for oText in incident.fetchText( oRE):
                productlist.append(oText.strip() + ';')
            product = ''.join(productlist)

        for incident in row('a', {'name' : 'D0L3'}):
            storelist = []
            for oText in incident.fetchText( oRE):
                storelist.append(oText.strip() + ';')
            store = ''.join(storelist)

        tuple = (product, store, price)
        data.append(tuple)
        print data




    , Dec 30, 2005
    #17
  18. Kent Johnson Guest

    wrote:
    > hey mike-the sample code was very useful. have 2 questions
    >
    > when i use what you wrote, which is listed below, i get told
    > UnboundLocalError: local variable 'product' referenced before
    > assignment.


    You would get this error if you have a <tr> that doesn't have an <h2
    id="dealName">. Do you have some <tr> that are not products? If so you
    need to filter them out somehow. Or have you misspelled something? Your
    sample data has id="foodName" not "dealName".

    You might do better with incremental development. Start with

        for row in bs('tr'):
            print row

    and expand from there. At each step use print statements to make sure
    you are finding the data you expect.

    Kent

    Kent Johnson, Dec 30, 2005
    #18
