Table/table rows/table data tag question?

Discussion in 'HTML' started by Rio, Nov 4, 2004.

  1. Rio

    Rio Guest

    Hi, my goal is to extract data from cells within tables on certain pages
    (sportsbook odds).

    I'm using Java to do this: I get the source of the page, put it in a
    string and pass that string (basically the source of the HTML page) to methods
    that cut it up sequentially.

    First they find whatever is between
    <table..... (any text, including data, attributes and other tags) up
    until.....</table>

    Whatever is in there belongs to table 1. That substring is cut out and
    passed to another method that finds

    <tr............anything....................</tr>

    That's row 1. That substring is cut out and passed to another method that
    finds

    <td...............anything..............</td>, which finally strips the
    other tags and extracts the data.

    Finally every cell has its data, table number, row number and cell number.
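The sequential cutting described above might look roughly like the following minimal Java sketch. The class and method names are hypothetical, not Rio's actual code; it uses plain `indexOf` searches, assumes lowercase tags, and deliberately shares the weakness discussed next: it always pairs an opening tag with the *first* closing tag after it.

```java
import java.util.ArrayList;
import java.util.List;

class NaiveTableCutter {

    // Returns every substring that starts with "<tagName" and ends with the
    // FIRST "</tagName>" after it. Assumes lowercase tags; nesting is NOT handled.
    static List<String> cut(String html, String tagName) {
        List<String> pieces = new ArrayList<>();
        String open = "<" + tagName;
        String close = "</" + tagName + ">";
        int from = 0;
        while (true) {
            int start = html.indexOf(open, from);
            if (start < 0) break;
            int end = html.indexOf(close, start);
            if (end < 0) break; // no closing tag: give up on the rest
            pieces.add(html.substring(start, end + close.length()));
            from = end + close.length();
        }
        return pieces;
    }

    // Removes any remaining tags and trims whitespace to get the cell text.
    static String stripTags(String fragment) {
        return fragment.replaceAll("<[^>]*>", "").trim();
    }

    // table -> rows -> cells, each level cut out of the previous level's substring.
    static List<String> cellTexts(String html) {
        List<String> cells = new ArrayList<>();
        for (String table : cut(html, "table"))
            for (String row : cut(table, "tr"))
                for (String cell : cut(row, "td"))
                    cells.add(stripTags(cell));
        return cells;
    }
}
```

On a flat, fully closed table such as `<table><tr><td>1.90</td><td>2.05</td></tr></table>` this yields the cell texts `1.90` and `2.05`; it breaks as soon as end tags are nested or missing.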



    The program works for the great majority of pages I'm trying to extract data
    from. It obviously fails when it encounters a table within a table:
    <table> (1).....
    <table> (2, within table 1)....
    ..........
    ...</table> (2)
    </table> (1)

    because it cuts from the first <table opening tag to the first </table>
    closing tag. That's also a problem I can deal with.
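One common way to handle the nesting problem is a depth counter: every `<table` raises it, every `</table>` lowers it, and the outer table ends when the counter gets back to zero. A minimal Java sketch under the same assumptions as before (lowercase tags, no `<table` hidden inside comments or scripts); the class name is hypothetical.

```java
class NestedTableCutter {

    // Returns the substring from the first "<table" to its MATCHING "</table>",
    // or null if that table is never closed. Nested tables stay inside the result.
    static String outermostTable(String html) {
        final String open = "<table";
        final String close = "</table>";
        int start = html.indexOf(open);
        if (start < 0) return null;
        int depth = 0;
        int pos = start;
        while (pos < html.length()) {
            int nextOpen = html.indexOf(open, pos);
            int nextClose = html.indexOf(close, pos);
            if (nextClose < 0) return null;            // unbalanced: no matching close
            if (nextOpen >= 0 && nextOpen < nextClose) {
                depth++;                               // the first or an inner table opens
                pos = nextOpen + open.length();
            } else {
                depth--;                               // the innermost open table closes
                pos = nextClose + close.length();
                if (depth == 0) return html.substring(start, pos);
            }
        }
        return null;
    }
}
```

This only fixes nesting. It still assumes every `<table` has a `</table>`, which is exactly what fails on the page described next.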


    NOW, THE PROBLEM!

    But one particular page is giving me a headache. I noticed my program
    miscounts cells and rows, misplaces data, etc. I wrote a method TO COUNT EACH
    OCCURRENCE OF the opening and closing <table, <tr and <td tags and found out
    that THE NUMBER OF OPENING AND CLOSING TAGS IS NOT THE SAME, so I can't
    design a program that reliably finds what I want.
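A tag counter like the one Rio describes could be written with a case-insensitive regular expression. This is a sketch, not his method; the class name and the pattern are assumptions, and it does not skip tags that appear inside comments, scripts or attribute values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagCounter {

    // Group 1 is "/" for a closing tag and empty for an opening one; group 2 is the name.
    // \b stops "<tr" from matching "<track" and "<table" from matching "<tablex".
    private static final Pattern TAG =
            Pattern.compile("<\\s*(/?)\\s*(table|tr|td)\\b", Pattern.CASE_INSENSITIVE);

    // Counts opening and closing tags, keyed like "<td" and "</td".
    static Map<String, Integer> count(String html) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : new String[] {"table", "tr", "td"}) {
            counts.put("<" + name, 0);
            counts.put("</" + name, 0);
        }
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String key = "<" + m.group(1) + m.group(2).toLowerCase();
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

For `<table><tr><TD>a<td>b</tr></table>` it reports two `<td` but zero `</td`, which is the kind of mismatch described above.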

    THE QUESTION IS: How is that possible, and how does IE know where a table
    (or table row, or cell) starts and where it ends? Is it possible that some
    <table, <tr or <td tags only serve to describe attributes of that table or
    row, and if so, how can I recognize them?


    Big thanks to anyone who just reads this :) !
     
    Rio, Nov 4, 2004
    #1

  2. Rio

    Jim Higson Guest

    So you are writing your own HTML parser?

    Why not just use one of the existing ones? (I'm pretty sure there's one in
    the JRE already, in javax.swing.text.html.parser.) It will deal with nested
    tables etc. for you.
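Jim's suggestion can be sketched with the parser that ships with the JRE: `javax.swing.text.html.parser.ParserDelegator` driving an `HTMLEditorKit.ParserCallback`, which receives start tags, end tags and text. The class name `SwingTableReader` and the row/cell bookkeeping are illustrative assumptions. The bookkeeping keys off start tags, so it does not depend on the page writing `</td>` or `</tr>`; it assumes a single level of tables (nested tables would need a stack of row lists).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class SwingTableReader {

    // Returns rows of trimmed cell texts. Assumes one level of tables (no nesting).
    static List<List<String>> readCells(String html) {
        List<List<StringBuilder>> rows = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private StringBuilder cell; // cell currently receiving text, or null

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TR) {
                    rows.add(new ArrayList<>());
                    cell = null;
                } else if (tag == HTML.Tag.TD || tag == HTML.Tag.TH) {
                    if (rows.isEmpty()) rows.add(new ArrayList<>()); // cell before any <tr>
                    cell = new StringBuilder();
                    rows.get(rows.size() - 1).add(cell);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                // Stop collecting text once a cell or the table ends.
                if (tag == HTML.Tag.TD || tag == HTML.Tag.TH || tag == HTML.Tag.TABLE) {
                    cell = null;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (cell != null) cell.append(data);
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<List<String>> result = new ArrayList<>();
        for (List<StringBuilder> row : rows) {
            List<String> texts = new ArrayList<>();
            for (StringBuilder sb : row) texts.add(sb.toString().trim());
            result.add(texts);
        }
        return result;
    }
}
```

Note that this parser is built around an old HTML DTD, so it is a convenience for tag soup rather than a guarantee of matching IE's interpretation.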

    If the pages are XHTML, you could even use a generic XML parser, such as
    Xerces. One of the advantages of XHTML is easy parsing with generic tools.
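For pages that really are well-formed XHTML, the standard DOM API in `javax.xml.parsers` is enough. A sketch, assuming the markup is well-formed and carries no DOCTYPE (a DOCTYPE pointing at the W3C DTD may make the parser try to fetch it); the class name is hypothetical. Ordinary tag soup, like the page in question, will usually make an XML parser reject the input rather than guess.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class XhtmlCells {

    // Returns the trimmed text of every td element, in document order.
    // Throws IllegalArgumentException if the input is not well-formed XML.
    static List<String> cellTexts(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            NodeList cells = doc.getElementsByTagName("td");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < cells.getLength(); i++) {
                texts.add(cells.item(i).getTextContent().trim());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```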
     
    Jim Higson, Nov 4, 2004
    #2

  3. Rio

    rf Guest

    Rio wrote:

    > THE QUESTION IS: How is it possible and how does IE know where one table
    > (table row or cell) starts and where it ends and is it possible that some
    > <table <tr or <td tags actually only serve to describe attributes of that
    > table or row, if so how can I recognize them?


    The closing tag for table rows and cells is optional. Browsers understand
    this.

    As you parse a td element, if you encounter a <td>, a <tr> or a </tr>, you
    *imply* a </td> for the td element. Rows are easier: as you parse a tr
    element, if you encounter a <tr> or a </table>, imply a </tr>.
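Richard's rule can be turned into a small parser. The sketch below is an illustration under stated assumptions, not any browser's actual algorithm: it tokenizes with a regular expression (no comments, scripts or stray `<` in text), treats `<th>` like `<td>`, closes an open cell on `<td>`, `<th>`, `<tr>`, `</tr>` or `</table>`, closes an open row on `<tr>`, `</tr>` or `</table>`, and keeps a stack so a table nested in a cell gets its own rows. All class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImpliedEndTagParser {

    // One parsed table: rows of cell texts.
    static final class Table {
        final List<List<String>> rows = new ArrayList<>();
    }

    // Per-table state: the open row and the text of the open cell.
    private static final class State {
        final Table table = new Table();
        List<String> row;   // null when no row is open
        StringBuilder cell; // null when no cell is open

        void closeCell() {  // written or implied </td>
            if (cell != null) { row.add(cell.toString().trim()); cell = null; }
        }
        void closeRow() {   // written or implied </tr>
            closeCell();
            row = null;
        }
    }

    // Either a tag (group 1 = "/" or "", group 2 = name) or a run of text (group 3).
    private static final Pattern TOKEN =
            Pattern.compile("<\\s*(/?)\\s*([a-zA-Z]+)[^>]*>|([^<]+)");

    // Returns tables in the order they close (inner tables before outer ones).
    static List<Table> parse(String html) {
        List<Table> done = new ArrayList<>();
        Deque<State> open = new ArrayDeque<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            State s = open.peek();
            if (m.group(3) != null) {                    // text
                if (s != null && s.cell != null) s.cell.append(m.group(3));
                continue;
            }
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase();
            if (name.equals("table") && !closing) {
                open.push(new State());
            } else if (s == null) {
                continue;                                // outside any table, or stray </table>
            } else if (name.equals("table")) {           // </table> ends row and cell too
                s.closeRow();
                done.add(open.pop().table);
            } else if (name.equals("tr")) {
                s.closeRow();                            // <tr> or </tr> ends the open row
                if (!closing) { s.row = new ArrayList<>(); s.table.rows.add(s.row); }
            } else if (name.equals("td") || name.equals("th")) {
                s.closeCell();                           // <td>, <th> or </td> ends the open cell
                if (!closing) {
                    if (s.row == null) { s.row = new ArrayList<>(); s.table.rows.add(s.row); }
                    s.cell = new StringBuilder();
                }
            }
        }
        while (!open.isEmpty()) {                        // tables never closed: close at end of input
            open.peek().closeRow();
            done.add(open.pop().table);
        }
        return done;
    }
}
```

For `<table><tr><td>a<td>b<tr><td>c</table>` this yields one table with rows `[a, b]` and `[c]`, even though no `</td>` or `</tr>` is written.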

    --
    Cheers
    Richard.
     
    rf, Nov 4, 2004
    #3
  4. Rio

    Rio Guest

    Thanks a lot for the reply, that's really helpful. How about the table tag:
    is it supposed to be closed properly? If yes, how is it possible to have an
    unequal number of opening and closing table tags?


    > The closing tag for table rows and cells is optional. Browsers understand
    > this.
    >
    > As you parse a td element if you encounter a <td> or a <tr> or a </tr> then
    > you *imply* a </td> for the td element. Rows are easier, as you parse the tr
    > element if you encounter a <tr> or a </table> then imply a </tr>.
     
    Rio, Nov 5, 2004
    #4
  5. Rio

    rf Guest

    Rio wrote:

    [top posting corrected]

    > > The closing tag for table rows and cells is optional. Browsers understand
    > > this.
    > >
    > > As you parse a td element if you encounter a <td> or a <tr> or a </tr> then
    > > you *imply* a </td> for the td element. Rows are easier, as you parse the tr
    > > element if you encounter a <tr> or a </table> then imply a </tr>.


    > Thanks a lot for the reply that's really helpful, how about table tag,


    Element. You are talking about the table *element*. It has an opening tag
    and a closing tag and, between these, some content.

    > are they supposed to be closed properly,


    Check the specification.
    http://www.w3.org/TR/html4/struct/tables.html#edef-TABLE

    It says there that the end tag is required.

    > if yes how is it possible to have
    > unequal number of opening and closing table tags?


    The spec says the table element must be closed. This does not mean that
    authors *will* close them. What happens then is up to each browser's error
    correction.
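Because an unclosed or stray `</table>` is left to the browser's error correction, it can help to locate the unbalanced tags before deciding how to recover. A minimal sketch (hypothetical class name, same regular-expression caveats as above) that pairs `<table` and `</table>` with a stack and reports the character offsets of tags that have no partner:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TableBalanceChecker {

    private static final Pattern TABLE_TAG =
            Pattern.compile("<\\s*(/?)\\s*table\\b", Pattern.CASE_INSENSITIVE);

    // Returns, sorted in document order, the offsets of stray "</table>" tags
    // (nothing open to close) and of "<table" tags still open at end of input.
    static List<Integer> unmatchedOffsets(String html) {
        List<Integer> unmatched = new ArrayList<>();
        Deque<Integer> openTables = new ArrayDeque<>();
        Matcher m = TABLE_TAG.matcher(html);
        while (m.find()) {
            if (m.group(1).isEmpty()) {
                openTables.push(m.start());   // remember where this table opened
            } else if (!openTables.isEmpty()) {
                openTables.pop();             // closes the innermost open table
            } else {
                unmatched.add(m.start());     // stray close with nothing open
            }
        }
        unmatched.addAll(openTables);         // tables that never got a closing tag
        Collections.sort(unmatched);
        return unmatched;
    }
}
```

An empty result means the table tags pair up; any offset points at a spot where the browser had to guess.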

    --
    Cheers
    Richard.
     
    rf, Nov 5, 2004
    #5
