TSV to HTML

Discussion in 'Python' started by Brian, May 31, 2006.

  1. Brian

    Brian Guest

    I was wondering if anyone here on the group could point me in a
    direction that would explain how to use python to convert a tsv file
    to html. I have been searching for a resource but have only seen
    information on dealing with converting csv to tsv. Specifically I want
    to take the values and insert them into an html table.

    I have been trying to figure it out myself, and in essence, this is
    what I have come up with. Am I on the right track? I really have the
    feeling that I am re-inventing the wheel here.

    1) in the code define a css
    2) use a regex to extract the info between tabs
    3) wrap the values in the appropriate tags and insert into table.
    4) write the .html file
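    Brian's four steps translate fairly directly into code. The following is a minimal Python 3 sketch, not anything posted in the thread: `STYLE`, `tsv_to_page`, and `write_page` are hypothetical names, the CSS is a placeholder, and step 2 uses `str.split("\t")` rather than a regex, since splitting on a single fixed character needs no pattern matching.

```python
import html

# Step 1: a CSS block defined in the code (hypothetical placeholder style)
STYLE = "<style type='text/css'>td { border: 1px solid #000; }</style>"

def tsv_to_page(text):
    """Build an HTML page containing one table from TSV text."""
    rows = []
    for line in text.splitlines():
        # Step 2: split on tabs; a plain split is enough for one delimiter
        fields = line.split("\t")
        # Step 3: escape each value, wrap it in <td>, then wrap the row in <tr>
        cells = "".join("<td>%s</td>" % html.escape(f) for f in fields)
        rows.append("<tr>%s</tr>" % cells)
    table = "<table>\n%s\n</table>" % "\n".join(rows)
    return "<html><head>%s</head><body>\n%s\n</body></html>\n" % (STYLE, table)

def write_page(text, path):
    # Step 4: write the .html file
    with open(path, "w", encoding="utf-8") as out:
        out.write(tsv_to_page(text))
```

    Calling `write_page(text, "out.html")` performs step 4; the file name is only an example.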

    Thanks again for your patience,
    Brian
     
    Brian, May 31, 2006
    #1

  2. Brian

    Tim Chase Guest

    > I was wondering if anyone here on the group could point me
    > in a direction that would explain how to use python to
    > convert a tsv file to html. I have been searching for a
    > resource but have only seen information on dealing with
    > converting csv to tsv. Specifically I want to take the
    > values and insert them into an html table.
    >
    > I have been trying to figure it out myself, and in
    > essence, this is what I have come up with. Am I on the
    > right track? I really have the feeling that I am
    > re-inventing the wheel here.
    >
    > 1) in the code define a css
    > 2) use a regex to extract the info between tabs
    > 3) wrap the values in the appropriate tags and insert into
    > table.
    > 4) write the .html file


    Sounds like you just want to do something like

    print "<table>"
    for line in file("in.tsv"):
        print "<tr>"
        # drop the trailing newline so it doesn't end up in the last cell
        items = line.rstrip("\n").split("\t")
        for item in items:
            print "<td>%s</td>" % item
        print "</tr>"
    print "</table>"

    It gets a little more complex if you need to clean each item
    for HTML entities/scripts/etc...but that's usually just a
    function that you'd wrap around the item:

    print "<td>%s</td>" % escapeEntity(item)

    using whatever "escapeEntity" function you have on hand.
    E.g.

    from xml.sax.saxutils import escape
    :
    :
    print "<td>%s</td>" % escape(item)
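    The two common escaping functions differ slightly. `xml.sax.saxutils.escape`, used above, converts only `&`, `<`, and `>` by default, while Python 3's `html.escape` also converts both quote characters unless `quote=False` is passed. That matters only if a value could end up inside an attribute; for plain `<td>` text either works. A small Python 3 check (not from the original post):

```python
import html
from xml.sax.saxutils import escape

value = 'Tom & "Jerry" <cat>'

# saxutils.escape: &, <, > only (quotes are left alone by default)
print(escape(value))        # Tom &amp; "Jerry" &lt;cat&gt;

# html.escape: also converts " and ' when quote=True (the default)
print(html.escape(value))   # Tom &amp; &quot;Jerry&quot; &lt;cat&gt;
```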

    It doesn't gracefully attempt to define headers using
    <thead>, <tbody>, and <th> sorts of rows, but a little
    toying should solve that.
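    One way that "little toying" might look, assuming the first line of the file holds the column names: emit it as `<th>` cells inside `<thead>` and the rest inside `<tbody>`. This is a Python 3 sketch; `cells` and `tsv_to_table` are hypothetical names, not from the thread.

```python
import html

def cells(values, tag):
    """Escape each value and wrap it in <tag>...</tag>."""
    return "".join("<%s>%s</%s>" % (tag, html.escape(v), tag) for v in values)

def tsv_to_table(lines):
    """Render TSV lines as a table; the first line becomes the <thead> row."""
    rows = [line.rstrip("\r\n").split("\t") for line in lines]
    if not rows:
        return "<table></table>"
    out = ["<table>", "<thead>", "<tr>%s</tr>" % cells(rows[0], "th"),
           "</thead>", "<tbody>"]
    out.extend("<tr>%s</tr>" % cells(row, "td") for row in rows[1:])
    out.extend(["</tbody>", "</table>"])
    return "\n".join(out)

print(tsv_to_table(["name\tqty\n", "apple\t3\n"]))
```

    Because it only iterates over `lines`, an open file object works as well as a list.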

    -tim
     
    Tim Chase, May 31, 2006
    #2

  3. Brian

    Dan M Guest

    > 1) in the code define a css
    > 2) use a regex to extract the info between tabs


    In place of this, you might want to look at
    http://effbot.org/librarybook/csv.htm
    Around the middle of that page you'll see how to use a delimiter other
    than a comma.
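    The same idea in Python 3 passes `delimiter="\t"` to `csv.reader` (the built-in `"excel-tab"` dialect is equivalent). This sketch reads from an in-memory string so it runs on its own. Because the csv module honors quoting, a quoted field may itself contain a tab, which a plain `split("\t")` or a simple regex would break apart:

```python
import csv
import io

data = 'name\tnote\n"Smith, J."\t"has\ta tab"\n'

# newline="" is what the csv docs recommend when opening real files;
# io.StringIO is used here only to keep the example self-contained.
reader = csv.reader(io.StringIO(data, newline=""), delimiter="\t")
for row in reader:
    print(row)
# ['name', 'note']
# ['Smith, J.', 'has\ta tab']

# A naive split sees three fields on the second line instead of two.
print(data.splitlines()[1].split("\t"))
# ['"Smith, J."', '"has', 'a tab"']
```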

    > 3) wrap the values in the appropriate tags and insert into table. 4)
    > write the .html file
    >
    > Thanks again for your patience,
    > Brian
     
    Dan M, May 31, 2006
    #3
  4. Brian wrote:
    > I was wondering if anyone here on the group could point me in a
    > direction that would explain how to use python to convert a tsv file
    > to html. I have been searching for a resource but have only seen
    > information on dealing with converting csv to tsv. Specifically I want
    > to take the values and insert them into an html table.


    import csv
    from xml.sax.saxutils import escape

    def tsv_to_html(input_file, output_file):
        output_file.write('<table><tbody>\n')
        for row in csv.reader(input_file, 'excel-tab'):
            output_file.write('<tr>')
            for col in row:
                output_file.write('<td>%s</td>' % escape(col))
            output_file.write('</tr>\n')
        output_file.write('</tbody></table>')

    Usage example:

    >>> from cStringIO import StringIO
    >>> input_file = StringIO('"foo"\t"bar"\t"baz"\n'
    ...                       '"qux"\t"quux"\t"quux"\n')
    >>> output_file = StringIO()
    >>> tsv_to_html(input_file, output_file)
    >>> print output_file.getvalue()
    <table><tbody>
    <tr><td>foo</td><td>bar</td><td>baz</td></tr>
    <tr><td>qux</td><td>quux</td><td>quux</td></tr>
    </tbody></table>
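    On Python 3, where `cStringIO` no longer exists and `print` is a function, Leif's function body runs unchanged; only the usage example needs adjusting. The port below swaps in `io.StringIO` and is not part of the original post. For real files, the csv documentation recommends opening them with `newline=""`.

```python
import csv
import io
from xml.sax.saxutils import escape

def tsv_to_html(input_file, output_file):
    """Write the rows of a TSV file object to output_file as an HTML table."""
    output_file.write('<table><tbody>\n')
    for row in csv.reader(input_file, 'excel-tab'):
        output_file.write('<tr>')
        for col in row:
            output_file.write('<td>%s</td>' % escape(col))
        output_file.write('</tr>\n')
    output_file.write('</tbody></table>')

input_file = io.StringIO('"foo"\t"bar"\t"baz"\n'
                         '"qux"\t"quux"\t"quux"\n')
output_file = io.StringIO()
tsv_to_html(input_file, output_file)
print(output_file.getvalue())
```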
     
    Leif K-Brooks, Jun 1, 2006
    #4
  5. Brian

    Brian Guest

    First let me say that I appreciate the responses that everyone has
    given.

    A friend of mine is a ruby programmer but knows nothing about python.
    He gave me the script below and it does exactly what I want, only it is
    in Ruby. Not knowing ruby, this is Greek to me, and I would like to
    re-write it in python.

    I ask then, is this essentially what others here have shown me to do,
    or is it in a different vein altogether?

    Code:

    class TsvToHTML
      @@styleBlock = <<-ENDMARK
        <style type='text/css'>
        td {
          border-left:1px solid #000000;
          padding-right:4px;
          padding-left:4px;
          white-space: nowrap;
        }
        .cellTitle {
          border-bottom:1px solid #000000;
          background:#ffffe0;
          font-weight: bold;
          text-align: center;
        }
        .cell0 { background:#eff1f1; }
        .cell1 { background:#f8f8f8; }
        </style>
      ENDMARK

      def TsvToHTML::wrapTag(data,tag,modifier = "")
        return "<#{tag} #{modifier}>" + data + "</#{tag}>\n"
      end # wrapTag

      def TsvToHTML::makePage(source)
        page = ""
        rowNum = 0
        source.readlines.each { |record|
          row = ""
          record.chomp.split("\t").each { |field|
            # replace blank fields with &nbsp;
            field.sub!(/^$/,"&nbsp;")
            # wrap in TD tag, specify style
            row += wrapTag(field,"td","class=\"" +
                ((rowNum == 0)?"cellTitle":"cell#{rowNum % 2}") +
                "\"")
          }
          rowNum += 1
          # wrap in TR tag, add row to page
          page += wrapTag(row,"tr") + "\n"
        }
        # finish page formatting
        [ [ "table","cellpadding=0 cellspacing=0 border=0" ], "body","html"
        ].each { |tag|
          page = wrapTag(@@styleBlock,"head") + page if tag == "html"
          page = wrapTag(page,*tag)
        }
        return page
      end # makePage
    end # class

    # stdin -> convert -> stdout
    print TsvToHTML.makePage(STDIN)
     
    Brian, Jun 1, 2006
    #5
  6. Brian

    Paddy Guest

    Brian wrote:
    > First let me say that I appreciate the responses that everyone has
    > given.
    >
    > A friend of mine is a ruby programmer but knows nothing about python.
    > He gave me the script below and it does exactly what I want, only it is
    > in Ruby. Not knowing ruby, this is Greek to me, and I would like to
    > re-write it in python.
    >
    > I ask then, is this essentially what others here have shown me to do,
    > or is it in a different vein altogether?
    >

    Leif's Python example uses the csv module, which understands a lot more
    about the peculiarities of the CSV/TSV formats.
    The Ruby example prepends a <style>...</style> block.

    The Ruby example splits the input into lines to form the table rows,
    and splits each line on tabs to form the cells.
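    The Ruby script also picks a CSS class per row: the first row gets `cellTitle`, and later rows alternate between `cell1` and `cell0` via `rowNum % 2`, which produces the striped look defined in its `<style>` block. That selection rule, sketched in Python 3 (`cell_class` is a made-up name for this illustration):

```python
def cell_class(row_num):
    """Mirror the Ruby expression: row 0 is the title row, later rows alternate."""
    if row_num == 0:
        return "cellTitle"
    return "cell%d" % (row_num % 2)

print([cell_class(n) for n in range(5)])
# ['cellTitle', 'cell1', 'cell0', 'cell1', 'cell0']
```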

    The thing about TSV/CSV formats is that there is no one format. You
    need to check how your TSV creator generates the TSV file:
    Does it put quotes around text fields?
    What kind of quotes?
    How does it represent null fields?
    Might you get fields that include newlines?
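    Each of those questions corresponds to a csv module setting. A Python 3 sketch with made-up sample data: the default `"excel-tab"` dialect treats double quotes as field delimiters (so a quoted field may span lines) and returns an empty field as `''`, while `quoting=csv.QUOTE_NONE` is an option when the producer never quotes and a literal `"` may appear in the data.

```python
import csv
import io

raw = 'id\tcomment\n1\t"line one\nline two"\n2\t\n'

# Default "excel-tab" dialect: double quotes delimit a field, so a quoted
# field may contain a newline; an empty field comes back as ''.
rows = list(csv.reader(io.StringIO(raw, newline=""), dialect="excel-tab"))
print(rows)
# [['id', 'comment'], ['1', 'line one\nline two'], ['2', '']]

# Naive line splitting breaks that multi-line record in two.
naive = [line.split("\t") for line in raw.splitlines()]
print(naive)
# [['id', 'comment'], ['1', '"line one'], ['line two"'], ['2', '']]

# If the producer never quotes and a literal " can appear in the data,
# turn quote processing off.
literal = list(csv.reader(io.StringIO('a\t"b\n', newline=""),
                          delimiter="\t", quoting=csv.QUOTE_NONE))
print(literal)
# [['a', '"b']]
```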

    - P.S. I'm not a Ruby programmer, just read the source ;-)
     
    Paddy, Jun 1, 2006
    #6
  7. On 31 May 2006 18:48:30 -0700, "Brian" <> declaimed
    the following in comp.lang.python:


    > Code:
    >
    > class TsvToHTML
    > @@styleBlock = <<-ENDMARK


    <snip>

    > print TsvToHTML.makePage(STDIN)


    Given that no "instances" are created, there's no real need to use a
    class (in Python, at least -- I don't know if Ruby is like Java, where
    everything is embedded in a class). A simple module (file) is
    sufficient.

    I took a few liberties -- like splitting out the table generation
    from the rest of the page, and adding argument parsing for input files
    (so this version will create multiple tables if multiple files were
    supplied). Be careful, one or two lines were wrapped by the news client.

    -=-=-=-=-=-=-=-
    # tsv2html.py
    # function module

    import sys

    # define CSS style definition
    STYLEBLOCK = """
    <style type="text/css">
    td {
        border-left:1px solid #000000;
        padding-right:4px;
        padding-left:4px;
        white-space: nowrap; }
    .cellTitle {
        border-bottom:1px solid #000000;
        background:#ffffe0;
        font-weight: bold;
        text-align: center; }
    .cell0 { background:#3ff1f1; }
    .cell1 { background:#f8f8f8; }
    </style>
    """

    # utility function to wrap "data" within
    # <tag modifier> data </tag>
    def wrapTag(data, tag, modifier = ""):
        if isinstance(tag, tuple):  # check for complex (tag, modifier) tuple
            tag, modifier = tag
        return "<%s %s>%s</%s>\n" % (tag, modifier, data, tag)

    # utility function to produce an HTML table
    # from tab-separated data read from
    # iterable source material
    def makeTable(source):
        tableParts = []
        rowNum = 0
        # get each line of source
        for record in source:
            rowParts = []
            # get each field of source; splitting on tabs
            # (strip only the line ending, so leading/trailing empty
            # fields are not lost)
            for field in record.rstrip("\r\n").split("\t"):
                # convert empty fields to a non-breaking space
                if not field: field = "&nbsp;"
                if rowNum:
                    # past the first row, alternate cell style
                    tagged = wrapTag(field, "td",
                                     'class="cell%s"' % (rowNum % 2))
                else:
                    # first row, use "title" style
                    tagged = wrapTag(field, "td",   #I'd use "th"
                                     'class="cellTitle"')
                # collect the tagged field as a list of row parts
                rowParts.append(tagged)
            rowNum += 1
            # join the row parts, and wrap as a row, collecting rows in list
            tableParts.append(wrapTag("".join(rowParts), "tr"))
        # join the rows with a new-line separator
        return wrapTag("\n".join(tableParts),
                       ("table",
                        'align="center" cellpadding="0" cellspacing="0" border="0"'))

    def makePage(data):
        # wrap the tables in the rest of the HTML tags: body, html
        for tag in ["body", "html"]:
            # if current tag is the <html>, insert a <head> block with
            # the CSS style definition
            if tag == "html":
                data = wrapTag(STYLEBLOCK, "head") + data
            data = wrapTag(data, tag)
        return data

    if __name__ == "__main__":
        # if command line arguments supplied, treat as file names
        if len(sys.argv) > 1:
            # all tables go into a single output file, TSV2HTML.html
            fout = open("TSV2HTML.html", "w")
            tables = []
            # for each file supplied
            for fid in sys.argv[1:]:
                # open for read
                fin = open(fid, "r")
                # generate a table from the file data
                tables.append(makeTable(fin))
                fin.close()
            fout.write(makePage("\n".join(tables)))
            fout.close()
        else:
            # no arguments, read stdin, write stdout
            sys.stdout.write(makePage(makeTable(sys.stdin)))  #could use print

    NOTE: no HTML escaping is done, and my test data sometimes caused
    problems.
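    One way to add the missing escaping to `makeTable` would be to pass each field through a helper like the hypothetical `render_field` below before calling `wrapTag`. It is written for Python 3 (`html.escape`); Python 2 code of that era would use `cgi.escape` or `xml.sax.saxutils.escape` instead. The one subtlety is ordering: the `&nbsp;` placeholder must bypass the escaper, otherwise it turns into the literal text `&amp;nbsp;`.

```python
import html

def render_field(field):
    """Return HTML-safe cell text; empty fields become a non-breaking space.

    Only real data is escaped. The &nbsp; entity is returned as-is, because
    escaping it would produce the visible text "&amp;nbsp;".
    """
    if not field:
        return "&nbsp;"
    return html.escape(field, quote=False)

print(render_field("a < b & c"))         # a &lt; b &amp; c
print(render_field(""))                  # &nbsp;
print(html.escape("&nbsp;", quote=False))  # &amp;nbsp;  (the pitfall)
```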
    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
     
    Dennis Lee Bieber, Jun 1, 2006
    #7
  8. Brian

    Brian Guest

    Dennis,

    Thank you for that response. Your code was very helpful to me. I
    think that actually seeing how it should be done in Python was a lot
    more educational than spending hours with trial and error.

    One question (and this is a topic that I still have trouble getting my
    arms around). Why is the text in STYLEBLOCK triple quoted?

    Thanks again,
    Brian
     
    Brian, Jun 1, 2006
    #8
  9. Brian wrote:
    > One question (and this is a topic that I still have trouble getting my
    > arms around). Why is the text in STYLEBLOCK triple quoted?


    Because triple-quoted strings can span lines and include single quotes
    and double quotes.
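    Both properties in a small Python 3 illustration (the CSS is trimmed from the STYLEBLOCK above, and the comment line is made up): one literal spans four physical lines and contains both `'` and `"` without any escaping.

```python
# One literal, four physical lines, with both ' and " used unescaped.
block = """<style type="text/css">
/* Brian's table */
td { padding-left:4px; }
</style>"""

print(len(block.splitlines()))  # 4
print(block.splitlines()[1])    # /* Brian's table */
```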

    --
    --Scott David Daniels
     
    Scott David Daniels, Jun 1, 2006
    #9
  10. On 1 Jun 2006 03:29:35 -0700, "Brian" <> declaimed the
    following in comp.lang.python:

    > Thank you for that response. Your code was very helpful to me. I
    > think that actually seeing how it should be done in Python was a lot
    > more educational than spending hours with trial and error.
    >

    It's not the best code around -- I hacked it together pretty much
    line-for-line from an assumption of what the Ruby was doing (I don't do
    Ruby -- too much PERL idiom in it)

    > One question (and this is a topic that I still have trouble getting my
    > arms around). Why is the text in STYLEBLOCK triple quoted?
    >

    Triple quotes allow: 1) use of single quotes within the block
    without needing to escape them; 2) the string to span multiple
    lines. A plain quoted string must fit on one logical line for the parser.

    I've practically never seen anyone use a line continuation character
    in Python. And triple quoting looks cleaner than parser concatenation.

    The alternatives would have been:

    Line Continuation:
    STYLEBLOCK = '\n\
    <style type="text/css">\n\
    td {\n\
    border-left:1px solid #000000;\n\
    padding-right:4px;\n\
    padding-left:4px;\n\
    white-space: nowrap; }\n\
    .cellTitle {\n\
    border-bottom:1px solid #000000;\n\
    background:#ffffe0;\n\
    font-weight: bold;\n\
    text-align: center; }\n\
    .cell0 { background:#3ff1f1; }\n\
    .cell1 { background:#f8f8f8; }\n\
    </style>\n\
    '
    Note the \n\ at the end of each line; the \n is there to keep the
    formatting of the generated HTML (otherwise everything would be one long
    line), and the final \ (which must be at the physical end of the line)
    signifies "this line is continued". Also note that I used ' rather than
    " to avoid escaping the " in text/css.
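    The continuation behaviour is easy to check in isolation. In this Python 3 snippet (shortened from the STYLEBLOCK alternative), the backslash-newline pairs vanish from the value, only the explicit `\n` escapes survive as line breaks, and any indentation on a continued line would become part of the string, which is why the continued lines start at the left margin.

```python
# A backslash at the very end of a physical line inside a string literal
# joins the next line; the explicit \n keeps a line break in the value.
s = '\n\
<style>\n\
</style>\n\
'
print(repr(s))  # '\n<style>\n</style>\n'

# Without the \n, the continued lines simply run together.
t = 'td {\
}'
print(repr(t))  # 'td {}'
```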

    Parser Concatenation:
    STYLEBLOCK = (
    '<style type="text/css">\n'
    "td {\n"
    " border-left:1px solid #000000;\n"
    " padding-right:4px;\n"
    " padding-left:4px;\n"
    " white-space: nowrap; }\n"
    ".cellTitle {\n"
    " border-bottom:1px solid #000000;\n"
    " background:#ffffe0;\n"
    " font-weight: bold;\n"
    " text-align: center; }\n"
    ".cell0 { background:#3ff1f1; }\n"
    ".cell1 { background:#f8f8f8; }\n"
    "</style>\n"
    )

    Note the use of ( ) where the original had """ """. Also note that
    each line has quotes at start/end (the first has ' to avoid escaping
    text/css). There are no commas separating each line (and the \n is still
    for formatting). Using the ( ) creates an expression, and Python is nice
    enough to let one split expressions inside () or [lists], {dicts}, over
    multiple lines (I used that feature in a few spots to put call arguments
    on multiple lines). Two strings that are next to each other

    "string1" "string2"

    are parsed as one string

    "string1string2"
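    A short Python 3 check of that rule, including the pitfall behind "there are no commas": adding commas inside the parentheses silently turns the expression into a tuple of separate strings instead of one string.

```python
# Adjacent string literals are joined at compile time, even across lines
# inside parentheses; no + operator or commas are involved.
joined = "string1" "string2"
print(joined == "string1string2")  # True

style = (
    '<style type="text/css">\n'
    "td { padding-left:4px; }\n"
    "</style>\n"
)
print(style.count("\n"))  # 3

# A comma changes the meaning: this is a tuple of three strings.
oops = (
    "a",
    "b",
    "c",
)
print(type(oops).__name__)  # tuple
```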

    Using """ (or ''') is the cleanest of those choices, especially if
    you want to do preformatted layout of the text. It works similarly to the
    Ruby/PERL construct that basically says: copy all text up to the next
    occurrence of MARKER_STRING.




    > Thanks again,
    > Brian

     
    Dennis Lee Bieber, Jun 1, 2006
    #10
  11. Brian

    Brian Guest

    Dennis Lee Bieber wrote:
    > On 1 Jun 2006 03:29:35 -0700, "Brian" <> declaimed the
    > following in comp.lang.python:
    >
    > > Thank you for that response. Your code was very helpful to me. I
    > > think that actually seeing how it should be done in Python was a lot
    > > more educational than spending hours with trial and error.
    > >

    <snip>


    Thank you for your explanation, now it makes sense.

    Brian
     
    Brian, Jun 1, 2006
    #11