Total Beginner - Extracting Data from a Database Online (Screenshot)

Discussion in 'Python' started by logan.c.graham@gmail.com, May 24, 2013.

  1. Guest

    Hey guys,

    I'm learning Python and I'm experimenting with different projects -- I like learning by doing. I'm wondering if you can help me here:

    http://i.imgur.com/KgvSKWk.jpg

    What this is is a publicly-accessible webpage that's a simple database of people who have used the website. Ideally what I'd like to end up with is an excel spreadsheet with data from the columns #fb, # vids, fb sent?, # email tm.

    I'd like to use Python to do it -- crawl the page and extract the data in a usable way.

    I'd love your input! I'm just a learner.
     
    , May 24, 2013
    #1

  2. Dave Angel Guest

    Re: Total Beginner - Extracting Data from a Database Online (Screenshot)

    On 05/24/2013 01:32 PM, wrote:
    > Hey guys,
    >
    > I'm learning Python


    Welcome.

    > and I'm experimenting with different projects -- I like learning by doing. I'm wondering if you can help me here:
    >
    > http://i.imgur.com/KgvSKWk.jpg
    >
    > What this is is a publicly-accessible webpage


    No, it's just a jpeg file, an image.

    > that's a simple database of people who have used the website. Ideally what I'd like to end up with is an excel spreadsheet with data from the columns #fb, # vids, fb sent?, # email tm.
    >
    > I'd like to use Python to do it -- crawl the page and extract the data in a usable way.
    >


    But there's no page to crawl. You may have to start by finding an OCR
    program to interpret the image as characters. Or find some other source for
    your data.

    > I'd love your input! I'm just a learner.
    >



    --
    DaveA
     
    Dave Angel, May 24, 2013
    #2

  3. RE: Total Beginner - Extracting Data from a Database Online (Screenshot)

    ### table_data_extraction.py ###
    # Usage: tables[id][row][column]
    # tables[0]        : 1st table
    # tables[1][2]     : 3rd row of 2nd table
    # tables[3][4][5]  : cell content of 6th column of 5th row of 4th table
    # len(tables)      : quantity of tables
    # len(tables[6])   : quantity of rows of 7th table
    # len(tables[7][8]): quantity of columns of 9th row of 8th table
    #
    # Python 2 code: on Python 3, urlopen lives in urllib.request and read()
    # returns bytes, which must be decoded to str before the regexes are applied.

    import re
    import urllib2

    # to retrieve the contents of the page
    page = urllib2.urlopen("http://example.com/page.html").read().strip()

    # to create the tables list (tables[table][row][cell]); the patterns are
    # case-sensitive and only match bare tags such as <TD>, not <td> or <TD class="x">
    tables=[[re.findall('<TD>(.*?)</TD>',r,re.S) for r in re.findall('<TR>(.*?)</TR>',t,re.S)] for t in re.findall('<TABLE>(.*?)</TABLE>',page,re.S)]


    Pretty simple. Good luck!
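For anyone who wants to try this approach without a live page, here is a minimal Python 3 sketch (not Carlos's exact code) that runs the same three nested regex searches on an inline HTML string. The function name `extract_tables` and the sample table are made up for illustration, and the patterns are loosened with `re.I` and `[^>]*` so that lower-case tags and tags with attributes also match. Regexes still break on nested tables or unclosed cells, so treat this as a quick hack rather than a real HTML parser.

```python
import re


def extract_tables(page):
    """Return tables[table][row][cell] using nested regex searches.

    Loosened variant: re.I makes tag matching case-insensitive and
    [^>]* lets a tag carry attributes such as <td class="x">.
    """
    flags = re.S | re.I
    return [
        [re.findall(r"<td[^>]*>(.*?)</td>", row, flags=flags)
         for row in re.findall(r"<tr[^>]*>(.*?)</tr>", table, flags=flags)]
        for table in re.findall(r"<table[^>]*>(.*?)</table>", page, flags=flags)
    ]


# Hypothetical sample page with one small table.
SAMPLE = """
<table>
  <tr><td>alice</td><td>3</td></tr>
  <TR><TD class="n">bob</TD><TD>5</TD></TR>
</table>
"""

tables = extract_tables(SAMPLE)
print(tables)  # [[['alice', '3'], ['bob', '5']]]
```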

    ----------------------------------------
    > Date: Fri, 24 May 2013 10:32:26 -0700
    > Subject: Total Beginner - Extracting Data from a Database Online (Screenshot)
    > From:
    > To:
    >
    > Hey guys,
    >
    > I'm learning Python and I'm experimenting with different projects -- I like learning by doing. I'm wondering if you can help me here:
    >
    > http://i.imgur.com/KgvSKWk.jpg
    >
    > What this is is a publicly-accessible webpage that's a simple database of people who have used the website. Ideally what I'd like to end up with is an excel spreadsheet with data from the columns #fb, # vids, fb sent?, # email tm.
    >
    > I'd like to use Python to do it -- crawl the page and extract the data in a usable way.
    >
    > I'd love your input! I'm just a learner.
    > --
    > http://mail.python.org/mailman/listinfo/python-list
     
    Carlos Nepomuceno, May 25, 2013
    #3
  4. Dave Angel Guest

    Re: Total Beginner - Extracting Data from a Database Online (Screenshot)

    On 05/24/2013 07:36 PM, Carlos Nepomuceno wrote:
    >
    > <SNIP>
    > page = urllib2.urlopen("http://example.com/page.html").read().strip()
    >
    > #to create the tables list
    > tables=[[re.findall('<TD>(.*?)</TD>',r,re.S) for r in re.findall('<TR>(.*?)</TR>',t,re.S)] for t in re.findall('<TABLE>(.*?)</TABLE>',page,re.S)]
    >
    >
    > Pretty simple. Good luck!


    Only if the page is html, which the OP's was not. It was an image. Try
    parsing that with regex.
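That difference can be checked in code: a .jpg URL serves bytes that start with the JPEG signature, not with HTML markup. A minimal sketch, assuming the body has already been downloaded; `looks_like` is a hypothetical helper and its HTML check is only a rough heuristic. With a live response on Python 3, the server's own answer is also available via `response.headers.get_content_type()` (for example `image/jpeg` versus `text/html`).

```python
# Signatures found at the very start of common image formats.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def looks_like(data):
    """Rough guess at what a downloaded body contains, from its first bytes."""
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(PNG_MAGIC):
        return "png"
    # Heuristic only: look for typical HTML openings near the start.
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")) or b"<table" in head:
        return "html"
    return "unknown"


print(looks_like(b"\xff\xd8\xff\xe0\x00\x10JFIF"))                # jpeg
print(looks_like(b"<!DOCTYPE html><html><table></table></html>"))  # html
```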



    --
    DaveA
     
    Dave Angel, May 25, 2013
    #4
  5. Re: Total Beginner - Extracting Data from a Database Online (Screenshot)

    On Sat, May 25, 2013 at 3:32 AM, <> wrote:
    > http://i.imgur.com/KgvSKWk.jpg
    >
    > What this is is a publicly-accessible webpage...


    If that's a screenshot of something that we'd be able to access
    directly, then why not just post a link to the actual thing? More
    likely I'm thinking it's NOT publicly accessible, which is why it's
    been censored.

    ChrisA
     
    Chris Angelico, May 25, 2013
    #5
  6. Guest

    If you are talking about accessing a web page, rather than an image, then you want to do what is known as screen scraping.

    One of the best tools for this is called BeautifulSoup.

    http://www.crummy.com/software/BeautifulSoup/
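BeautifulSoup is a third-party package, so as a stand-in that needs only the standard library, here is a rough sketch built on `html.parser.HTMLParser` that collects table cells into the same `tables[table][row][cell]` layout used earlier in the thread. The class name `TableParser` is made up; this is not BeautifulSoup's API, and it does not handle nested tables or omitted `</td>` tags.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every table cell as tables[table][row][cell]."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.tables = []
        self._cell = None  # text chunks of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case, whatever the source used.
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


parser = TableParser()
parser.feed("<table><tr><th>name</th><th>#fb</th></tr>"
            "<tr><td>alice</td><td>3</td></tr></table>")
print(parser.tables)  # [[['name', '#fb'], ['alice', '3']]]
```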
     
    , May 25, 2013
    #6
  8. Guest

    Sorry to be unclear -- it's a screenshot of the webpage, which is publicly accessible, but it contains sensitive information. A bad combination, admittedly, and something that'll be soon fixed.
     
    , May 26, 2013
    #8
  9. John Ladasky Guest

    On Friday, May 24, 2013 4:36:35 PM UTC-7, Carlos Nepomuceno wrote:
    > #to create the tables list
    > tables=[[re.findall('<TD>(.*?)</TD>',r,re.S) for r in re.findall('<TR>(.*?)</TR>',t,re.S)] for t in re.findall('<TABLE>(.*?)</TABLE>',page,re.S)]
    >
    >
    > Pretty simple.


    Two nested list comprehensions, with regex pattern matching?

    Logan did say he was a "total beginner." :^)
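For a beginner, the same one-liner may be easier to follow written as ordinary loops. A sketch that builds the list step by step and checks it against the original comprehension; the function name `extract_tables_loops` and the sample string are illustrative only.

```python
import re


def extract_tables_loops(page):
    """Same result as the nested list comprehension, written as plain loops."""
    tables = []
    for table_html in re.findall('<TABLE>(.*?)</TABLE>', page, re.S):
        rows = []
        for row_html in re.findall('<TR>(.*?)</TR>', table_html, re.S):
            # Every <TD>...</TD> in this row becomes one string in `cells`.
            cells = re.findall('<TD>(.*?)</TD>', row_html, re.S)
            rows.append(cells)
        tables.append(rows)
    return tables


page = "<TABLE><TR><TD>a</TD><TD>b</TD></TR><TR><TD>c</TD></TR></TABLE>"
print(extract_tables_loops(page))  # [[['a', 'b'], ['c']]]

# The original one-liner, for comparison: it produces the same structure.
one_liner = [[re.findall('<TD>(.*?)</TD>', r, re.S) for r in re.findall('<TR>(.*?)</TR>', t, re.S)] for t in re.findall('<TABLE>(.*?)</TABLE>', page, re.S)]
assert one_liner == extract_tables_loops(page)
```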
     
    John Ladasky, May 26, 2013
    #9
  10. Guest

    On Saturday, May 25, 2013 6:33:25 PM UTC-7, John Ladasky wrote:
    > On Friday, May 24, 2013 4:36:35 PM UTC-7, Carlos Nepomuceno wrote:
    > > #to create the tables list
    > > tables=[[re.findall('<TD>(.*?)</TD>',r,re.S) for r in re.findall('<TR>(.*?)</TR>',t,re.S)] for t in re.findall('<TABLE>(.*?)</TABLE>',page,re.S)]
    > >
    > > Pretty simple.
    >
    > Two nested list comprehensions, with regex pattern matching?
    >
    > Logan did say he was a "total beginner." :^)

    Oh goodness, yes, I have no clue.
     
    , May 28, 2013
    #10
  11. RE: Total Beginner - Extracting Data from a Database Online (Screenshot)

    ----------------------------------------
    > Date: Mon, 27 May 2013 17:58:00 -0700
    > Subject: Re: Total Beginner - Extracting Data from a Database Online (Screenshot)
    > From:
    > To:

    [...]
    >
    > Oh goodness, yes, I have no clue.


    For example:

    # to retrieve the contents of all column '# fb' (11th column from the image you sent)

    c11 = [tables[0][r][10] for r in range(len(tables[0]))]
    #      ----------------                -------------
    #      this is the content             this is the quantity
    #      of the 11th cell                of rows in tables[0]
    #      of row 'r'
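A small self-contained run of that line, using a made-up `tables` value in place of a real page. It also shows pulling several columns at once, which is what the original question asked for; apart from '# fb' at index 10, the column positions in `WANTED` are hypothetical placeholders, since the real ones depend on the page.

```python
# Hypothetical stand-in for the real `tables`: one table, two rows of 14 cells.
tables = [[
    ["a%d" % i for i in range(14)],
    ["b%d" % i for i in range(14)],
]]

# Carlos's line: index 10 is the 11th column ('# fb').
c11 = [tables[0][r][10] for r in range(len(tables[0]))]
print(c11)  # ['a10', 'b10']

# Several columns at once. Only the '# fb' index comes from the thread;
# the other indices are hypothetical placeholders.
WANTED = [("# fb", 10), ("# vids", 11), ("fb sent?", 12), ("# email tm", 13)]
picked = [[row[i] for _, i in WANTED] for row in tables[0]]
print(picked[0])  # ['a10', 'a11', 'a12', 'a13']
```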
     
    Carlos Nepomuceno, May 28, 2013
    #11
  12. Phil Connell Guest

    RE: Total Beginner - Extracting Data from a Database Online (Screenshot)

    On 28 May 2013 02:21, "Carlos Nepomuceno" <> wrote:
    >
    > ----------------------------------------
    > > Date: Mon, 27 May 2013 17:58:00 -0700
    > > Subject: Re: Total Beginner - Extracting Data from a Database Online (Screenshot)
    > > From:
    > > To:
    > [...]
    > >
    > > Oh goodness, yes, I have no clue.
    >
    > For example:
    >
    > # to retrieve the contents of all column '# fb' (11th column from the image you sent)
    >
    > c11 = [tables[0][r][10] for r in range(len(tables[0]))]


    Or rather:

    c11 = [row[10] for row in tables[0]]

    In most cases, range(len(x)) is a sign that you're doing it wrong :)
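Putting that suggestion together with the original goal (a spreadsheet), here is a sketch under the assumption that the wanted cells are already in a list of rows: iterate over the rows directly, use `enumerate()` when a row number is also needed, and write CSV with the standard `csv` module, which Excel can open. The sample `rows` and the helper `to_csv` are hypothetical.

```python
import csv
import io

# Hypothetical rows, header first; in the thread these would come from tables[0].
rows = [
    ["name", "# fb", "# vids"],
    ["alice", "3", "1"],
    ["bob", "5", "0"],
]

# Loop over the rows themselves rather than over range(len(rows)).
fb = [row[1] for row in rows[1:]]
print(fb)  # ['3', '5']

# When the position is needed as well, enumerate() provides it.
numbered = [(n, row[0]) for n, row in enumerate(rows[1:], start=1)]
print(numbered)  # [(1, 'alice'), (2, 'bob')]


def to_csv(rows):
    """Render rows as CSV text, a format Excel and most spreadsheets open."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


print(to_csv(rows))
```

To write a real file instead of a string, something like `with open("output.csv", "w", newline="") as f: csv.writer(f).writerows(rows)` would do; the file name is only an example, and `newline=""` is what the `csv` documentation recommends for files passed to `csv.writer`.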
     
    Phil Connell, May 28, 2013
    #12