Regular Expression - old regex module vs. re module

Discussion in 'Python' started by Steve, Jun 29, 2006.

  1. Steve

    Steve Guest

    Hi All,

    I'm having a tough time converting the following regex.compile patterns
    into the new re.compile format. There are also differences between
    regsub.sub() and re.sub().

    Could anyone lend a hand?


    import regsub
    import regex

    import re # << need conversion to this module

    .....

    """Convert perl style format symbology to printf tokens.

    Take a string and substitute computed printf tokens for perl style
    format symbology.

    For example:

    ###.## yields %6.2f
    ######## yields %8d
    <<<<< yields %-5s
    """


    exponentPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\*\*\*\*\)')
    floatPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\)')
    integerPattern = regex.compile('\(^\|[^\\#]\)\(##+\)')
    leftJustifiedStringPattern = regex.compile('\(^\|[^\\<]\)\(<<+\)')
    rightJustifiedStringPattern = regex.compile('\(^\|[^\\>]\)\(>>+\)')



    while 1: # process all integer fields
        print("Testing Integer")
        if integerPattern.search(s) < 0: break
        print("Integer Match : ", integerPattern.search(s).span() )
        # i1 , i2 = integerPattern.regs[2]
        i1 , i2 = integerPattern.search(s).span()
        width_total = i2 - i1
        f = '%'+`width_total`+'d'
        # s = regsub.sub(integerPattern, '\\1'+f, s)
        s = integerPattern.sub(f, s)
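
    For reference, the API differences that matter for the loop above can be
    sketched with the re module alone. The old regex and regsub modules are
    not available in current Pythons, so their behaviour is only described in
    the comments, from memory of the old documentation; treat those remarks as
    assumptions. The names integer_pattern, token and the sample string are
    made up for illustration.

```python
import re

# The old integerPattern rewritten in re syntax: groups use bare
# parentheses and alternation uses a bare '|'.
integer_pattern = re.compile(r'(^|[^\\#])(##+)')

s = "count: ##### items"

# Old regex: pattern.search(s) returned an index, or -1 when nothing matched.
# re: search() returns a match object, or None when nothing matches.
match = integer_pattern.search(s)
if match is not None:
    # Old regex kept the group offsets in pattern.regs[2];
    # with re they come from the match object.
    start, end = match.span(2)
    token = '%' + str(end - start) + 'd'
    # Old regsub.sub() replaced only the first occurrence; re's sub()
    # replaces every occurrence unless a count is passed.
    # \g<1> puts back the guard character captured by group 1.
    s = integer_pattern.sub(r'\g<1>' + token, s, 1)

print(s)
```

    Using span(2) rather than span() keeps the guard character out of the
    width calculation, which is what the commented-out regs[2] line was doing.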



    Thanks in advance!

    Steve
    Steve, Jun 29, 2006
    #1

  2. Steve

    Jim Segrave Guest

    In article <>,
    Steve <> wrote:
    >Hi All,
    >
    >I'm having a tough time converting the following regex.compile patterns
    >into the new re.compile format. There is also a differences in the
    >regsub.sub() vs. re.sub()
    >
    >Could anyone lend a hand?
    >
    >
    >import regsub
    >import regex
    >
    >import re # << need conversion to this module
    >
    >....
    >
    > """Convert perl style format symbology to printf tokens.
    >
    > Take a string and substitute computed printf tokens for perl style
    > format symbology.
    >
    > For example:
    >
    > ###.## yields %6.2f
    > ######## yields %8d
    > <<<<< yields %-5s
    > """


    Perhaps not optimal, but this processes things as requested. Note that
    all floats have to be done before any integer patterns are replaced.

    ==========================
    #!/usr/local/bin/python

    import re

    """Convert perl style format symbology to printf tokens.
    Take a string and substitute computed printf tokens for perl style
    format symbology.

    For example:

    ###.## yields %6.2f
    ######## yields %8d
    <<<<< yields %-5s
    """


    # handle cases where there's no integer or no fractional chars
    floatPattern = re.compile(r'(?<!\\)(#+\.(#*)|\.(#+))')
    integerPattern = re.compile(r'(?<![\\.])(#+)(?![.#])')
    leftJustifiedStringPattern = re.compile(r'(?<!\\)(<+)')
    rightJustifiedStringPattern = re.compile(r'(?<!\\)(>+)')

    def float_sub(matchobj):
        # fractional part may be in either groups()[1] or groups()[2]
        if matchobj.groups()[1] is not None:
            return "%%%d.%df" % (len(matchobj.groups()[0]),
                                 len(matchobj.groups()[1]))
        else:
            return "%%%d.%df" % (len(matchobj.groups()[0]),
                                 len(matchobj.groups()[2]))


    def unperl_format(s):
        changed_things = 1
        while changed_things:
            # lather, rinse and repeat until nothing new happens
            changed_things = 0

            mat_obj = leftJustifiedStringPattern.search(s)
            if mat_obj:
                s = re.sub(leftJustifiedStringPattern, "%%-%ds" %
                           len(mat_obj.groups()[0]), s, 1)
                changed_things = 1

            mat_obj = rightJustifiedStringPattern.search(s)
            if mat_obj:
                s = re.sub(rightJustifiedStringPattern, "%%%ds" %
                           len(mat_obj.groups()[0]), s, 1)
                changed_things = 1

            # must do all floats before ints
            mat_obj = floatPattern.search(s)
            if mat_obj:
                s = re.sub(floatPattern, float_sub, s, 1)
                changed_things = 1
                # don't fall through to the int code
                continue

            mat_obj = integerPattern.search(s)
            if mat_obj:
                s = re.sub(integerPattern, "%%%dd" % len(mat_obj.groups()[0]),
                           s, 1)
                changed_things = 1
        return s

    if __name__ == '__main__':
        testarray = ["integer: ####, integer # integer at end #",
                     "float ####.## no decimals ###. no int .### at end ###.",
                     "Left string <<<<<< short left string <",
                     "right string >>>>>> short right string >",
                     "escaped chars \\#### \\####.## \\<\\<<<< \\>\\><<<"]


        for s in testarray:
            print "Testing: %s" % s
            print "Result: %s" % unperl_format(s)
            print

    ======================

    Running this gives

    Testing: integer: ####, integer # integer at end #
    Result: integer: %4d, integer %1d integer at end %1d

    Testing: float ####.## no decimals ###. no int .### at end ###.
    Result: float %7.2f no decimals %4.0f no int %4.3f at end %4.0f

    Testing: Left string <<<<<< short left string <
    Result: Left string %-6s short left string %-1s

    Testing: right string >>>>>> short right string >
    Result: right string %6s short right string %1s

    Testing: escaped chars \#### \####.## \<\<<<< \>\><<<
    Result: escaped chars \#%3d \#%6.2f \<\<%-3s \>\>%-3s
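
    The search-and-substitute loop above can also be collapsed into one pass.
    The following is only a sketch of that alternative, not the code that
    produced the results shown: it assumes the same field rules, puts all
    four field types into a single alternation (float branch first, so the
    "floats before ints" rule comes from branch order), and lets a callback
    build each token. The names field_pattern, to_printf and
    unperl_format_once are hypothetical.

```python
import re

# One alternation, tried left to right at each position, so the float
# branch is listed before the integer branch.
field_pattern = re.compile(r'''
    (?<!\\)                         # not preceded by a backslash
    (?:
        (?P<flt>\#+\.\#*|\.\#+)     # ###.##, ###. or .###
      | (?P<integer>\#+)            # ####
      | (?P<left><+)                # <<<<
      | (?P<right>>+)               # >>>>
    )
''', re.VERBOSE)


def to_printf(match):
    """Build the printf token for whichever branch matched."""
    text = match.group(0)
    if match.group('flt') is not None:
        decimals = len(text) - text.index('.') - 1
        return '%%%d.%df' % (len(text), decimals)
    if match.group('integer') is not None:
        return '%%%dd' % len(text)
    if match.group('left') is not None:
        return '%%-%ds' % len(text)
    return '%%%ds' % len(text)


def unperl_format_once(s):
    # sub() with a callable replaces every field in a single scan
    return field_pattern.sub(to_printf, s)


print(unperl_format_once("float ####.## no decimals ###. no int .###"))
```

    Because replaced text is never rescanned, no outer "repeat until nothing
    changes" loop is needed.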



    --
    Jim Segrave ()
    Jim Segrave, Jun 30, 2006
    #2

  3. Steve

    Paul McGuire Guest

    "Steve" <> wrote in message
    news:...
    > Hi All,
    >
    > I'm having a tough time converting the following regex.compile patterns
    > into the new re.compile format. There is also a differences in the
    > regsub.sub() vs. re.sub()
    >
    > Could anyone lend a hand?
    >
    >


    Not an re solution, but pyparsing makes for an easy-to-follow program.
    TransformString only needs to scan through the string once - the
    "reals-before-ints" testing is factored into the definition of the
    formatters variable.

    Pyparsing's project wiki is at http://pyparsing.wikispaces.com.

    -- Paul

    -------------------
    from pyparsing import *

    """
    read Perl-style formatting placeholders and replace with
    proper Python %x string interp formatters

    ###### -> %6d
    ##.### -> %6.3f
    <<<<< -> %-5s
    >>>>> -> %5s


    """

    # set up patterns to be matched - Word objects match character groups
    # made up of characters in the Word constructor; Combine forces
    # elements to be adjacent with no intervening whitespace
    # (note use of results name in realFormat, for easy access to
    # decimal places substring)
    intFormat = Word("#")
    realFormat = Combine(Word("#")+"."+
    Word("#").setResultsName("decPlaces"))
    leftString = Word("<")
    rightString = Word(">")

    # define parse actions for each - the matched tokens are the third
    # arg to parse actions; parse actions will replace the incoming tokens with
    # value returned from the parse action
    intFormat.setParseAction( lambda s,l,toks: "%%%dd" % len(toks[0]) )
    realFormat.setParseAction( lambda s,l,toks: "%%%d.%df" %
    (len(toks[0]),len(toks.decPlaces)) )
    leftString.setParseAction( lambda s,l,toks: "%%-%ds" % len(toks[0]) )
    rightString.setParseAction( lambda s,l,toks: "%%%ds" % len(toks[0]) )

    # collect all formatters into a single "grammar"
    # - note reals are checked before ints
    formatters = rightString | leftString | realFormat | intFormat

    # set up our test string, and use transform string to invoke parse actions
    # on any matched tokens
    testString = """
    This is a string with
    ints: #### # ###############
    floats: #####.# ###.###### #.#
    left-justified strings: <<<<<<<< << <
    right-justified strings: >>>>>>>>>> >> >
    int at end of sentence: ####.
    """
    print formatters.transformString( testString )

    -------------------
    Prints:

    This is a string with
    ints: %4d %1d %15d
    floats: %7.1f %10.6f %3.1f
    left-justified strings: %-8s %-2s %-1s
    right-justified strings: %10s %2s %1s
    int at end of sentence: %4d.
    Paul McGuire, Jun 30, 2006
    #3
  4. Steve

    Jim Segrave Guest

    In article <ePapg.6149$>,
    Paul McGuire <._bogus_.com> wrote:

    >Not an re solution, but pyparsing makes for an easy-to-follow program.
    >TransformString only needs to scan through the string once - the
    >"reals-before-ints" testing is factored into the definition of the
    >formatters variable.
    >
    >Pyparsing's project wiki is at http://pyparsing.wikispaces.com.


    It fails for floats specified as ###. or .###: it outputs an integer
    format and the decimal point separately. It also ignores \# which
    should prevent the '#' from being included in a format.



    --
    Jim Segrave ()
    Jim Segrave, Jun 30, 2006
    #4
  5. Steve

    Paul McGuire Guest

    "Jim Segrave" <> wrote in message
    news:...
    >
    > If fails for floats specified as ###. or .###, it outputs an integer
    > format and the decimal point separately. It also ignores \# which
    > should prevent the '#' from being included in a format.
    >


    True. What is the spec for these formatting strings, anyway? I Googled a
    while, and it does not appear that this is really a Perl string formatting
    technique, despite the OP's comments to the contrary. And I'm afraid my
    limited Regex knowledge leaves the OP's example impenetrable to me. I got
    lost among the '\'s and parens.

    I actually thought that "###." was *not* intended to be floating point, but
    instead represented an integer before a sentence-ending period. You do have
    to be careful of making *both* leading and trailing digits optional, or else
    simple sentence punctuating periods will get converted to "%1f"!
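
    The pitfall is easy to reproduce with plain regular expressions. This is
    a small sketch with made-up names (too_loose, one_side, text); it only
    shows which substrings each pattern would pick up.

```python
import re

# Both digit runs optional: a lone '.' ending a sentence also matches.
too_loose = re.compile(r'#*\.#*')
# At least one '#' on one side of the point, as in ###. or .###
one_side = re.compile(r'#+\.#*|\.#+')

text = "Total: ###.## units. Done."

loose_hits = too_loose.findall(text)
strict_hits = one_side.findall(text)

print(loose_hits)
print(strict_hits)
```

    The stricter pattern leaves ordinary periods alone, but it would still
    read "####." at the end of a sentence as a float field, which is the
    ambiguity raised above.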

    As for *ignoring* "\#", it would seem to me we would rather convert this to
    "#", since "#" shouldn't be escaped in normal string interpolation.

    The following modified version adds handling for "\#", "\<" and "\>", and
    real numbers with no integer part. The resulting program isn't radically
    different from the first version. (I've highlighted the changes with "<==="
    marks.)

    -- Paul

    ------------------
    from pyparsing import Combine,Word,Optional,Regex

    """
    read Perl-style formatting placeholders and replace with
    proper %x string interp formatters

    ###### -> %6d
    ##.### -> %6.3f
    <<<<< -> %-5s
    >>>>> -> %5s


    """

    # set up patterns to be matched
    # (note use of results name in realFormat, for easy access to
    # decimal places substring)
    intFormat = Word("#")
    realFormat = Combine(Optional(Word("#"))+"."+ # <===
    Word("#").setResultsName("decPlaces"))
    leftString = Word("<")
    rightString = Word(">")
    escapedChar = Regex(r"\\[#<>]") # <===

    # define parse actions for each - the matched tokens are the third
    # arg to parse actions; parse actions will replace the incoming tokens with
    # value returned from the parse action
    intFormat.setParseAction( lambda s,l,toks: "%%%dd" % len(toks[0]) )
    realFormat.setParseAction( lambda s,l,toks: "%%%d.%df" %
    (len(toks[0]),len(toks.decPlaces)) )
    leftString.setParseAction( lambda s,l,toks: "%%-%ds" % len(toks[0]) )
    rightString.setParseAction( lambda s,l,toks: "%%%ds" % len(toks[0]) )
    escapedChar.setParseAction( lambda s,l,toks: toks[0][1] ) # <===

    # collect all formatters into a single "grammar"
    # - note reals are checked before ints
    formatters = rightString | leftString | realFormat | intFormat | escapedChar
    # <===

    # set up our test string, and use transform string to invoke parse actions
    # on any matched tokens
    testString = r"""
    This is a string with
    ints: #### # ###############
    floats: #####.# ###.###### #.# .###
    left-justified strings: <<<<<<<< << <
    right-justified strings: >>>>>>>>>> >> >
    int at end of sentence: ####.
    I want \##, please.
    """

    print testString
    print formatters.transformString( testString )

    ------------------
    Prints:

    This is a string with
    ints: #### # ###############
    floats: #####.# ###.###### #.# .###
    left-justified strings: <<<<<<<< << <
    right-justified strings: >>>>>>>>>> >> >
    int at end of sentence: ####.
    I want \##, please.


    This is a string with
    ints: %4d %1d %15d
    floats: %7.1f %10.6f %3.1f %4.3f
    left-justified strings: %-8s %-2s %-1s
    right-justified strings: %10s %2s %1s
    int at end of sentence: %4d.
    I want #%1d, please.
    Paul McGuire, Jun 30, 2006
    #5
  6. Steve

    Paul McGuire Guest

    "Jim Segrave" <> wrote in message
    news:...
    > In article <ePapg.6149$>,
    > Paul McGuire <._bogus_.com> wrote:
    >
    > >Not an re solution, but pyparsing makes for an easy-to-follow program.
    > >TransformString only needs to scan through the string once - the
    > >"reals-before-ints" testing is factored into the definition of the
    > >formatters variable.
    > >
    > >Pyparsing's project wiki is at http://pyparsing.wikispaces.com.

    >
    > If fails for floats specified as ###. or .###, it outputs an integer
    > format and the decimal point separately. It also ignores \# which
    > should prevent the '#' from being included in a format.
    >

    Ah! This may be making some sense to me now. Here are the OP's original
    re's for matching.

    exponentPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\*\*\*\*\)')
    floatPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\)')
    integerPattern = regex.compile('\(^\|[^\\#]\)\(##+\)')
    leftJustifiedStringPattern = regex.compile('\(^\|[^\\<]\)\(<<+\)')
    rightJustifiedStringPattern = regex.compile('\(^\|[^\\>]\)\(>>+\)')

    Each re seems to have two parts to it. The leading parts appear to be
    guards against escaped #, <, or > characters, yes? The second part of each
    re shows the actual pattern to be matched. If so:

    It seems that we *don't* want "###." or ".###" to be recognized as floats,
    floatPattern requires at least one "#" character on either side of the ".".
    Also note that single #, <, and > characters don't seem to be desired, but
    at least two or more are required for matching. Pyparsing's Word class
    accepts an optional min=2 constructor argument if this really is the case.
    And it also seems that the pattern is supposed to be enclosed in ()'s. This
    seems especially odd to me, since one of the main points of this funky
    format seems to be to set up formatting that preserves column alignment of
    text, as if creating a tabular output - enclosing ()'s just junks this up.

    My example also omitted the exponent pattern. This can be handled with
    another expression like realFormat, but with the trailing "****" characters.
    Be sure to insert this expression before realFormat in the list of
    formatters.
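
    One possible reading of those patterns, offered as a sketch: assuming the
    old regex module used Emacs-style syntax, \( \) are grouping parentheses
    and \| is alternation, so nothing in the input is literally enclosed in
    ()'s. Under that assumption group 1 is the escape guard and group 2 is
    the field. The re spellings below and the helper field() are
    illustrative, not taken from the OP's code.

```python
import re

# Old regex (Emacs-style) wrote groups as \( \) and alternation as \|;
# in re those are plain ( ) and |. Group 1 is the guard (start of string,
# or a character that is neither a backslash nor the field character) and
# group 2 is the field itself.
exponent_pattern = re.compile(r'(^|[^\\#])(#+\.#+\*\*\*\*)')
float_pattern = re.compile(r'(^|[^\\#])(#+\.#+)')
integer_pattern = re.compile(r'(^|[^\\#])(##+)')
left_pattern = re.compile(r'(^|[^\\<])(<<+)')
right_pattern = re.compile(r'(^|[^\\>])(>>+)')


def field(pattern, s):
    """Return the text of group 2 for the first match, or None."""
    m = pattern.search(s)
    if m is None:
        return None
    return m.group(2)


print(field(float_pattern, "x ###.## y"))       # a float field
print(field(float_pattern, "x ###. y"))         # no '#' after the point
print(field(integer_pattern, "a # b"))          # a single '#' is not a field
print(field(exponent_pattern, "v ##.###**** w"))
```

    Since the exponent pattern is the float pattern plus the trailing
    asterisks, it has to be tried before the float pattern, the same ordering
    concern as for the pyparsing formatters.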

    I may be completely off in my re interpretation. Perhaps one of the re
    experts here can explain better what the OP's re's are all about. Can
    anybody locate/cite the actual spec for this formatting, um, format?

    -- Paul
    Paul McGuire, Jun 30, 2006
    #6
  7. Steve

    Paul McGuire Guest

    "Jim Segrave" <> wrote in message
    news:...
    > If fails for floats specified as ###. or .###, it outputs an integer
    > format and the decimal point separately. It also ignores \# which
    > should prevent the '#' from being included in a format.
    >


    Here's a little more study on this (all tests are using Python 2.4.1):

    If floats are specified as "###.", should we generate "%4.0f" as the result?
    In fact, to get 3 leading places and a trailing decimal point, when 0
    decimal places are desired, should be formatted with "%3.0f." - we have to
    explicitly put in the trailing '.' character.
    >>> print ">%1.0f<" % 10.00001

    >10<
    >>> print ">%2.0f<" % 10.00001

    >10<
    >>> print ">%3.0f<" % 10.00001

    > 10<
    >>> print ">%3.0f.<" % 10.00001

    > 10.<

    But as we see below, if the precision field is not zero, the initial width
    consumes one character for the decimal point. If the precision field *is*
    zero, then the entire width is used for the integer part of the value, with
    no trailing decimal point.

    ".###" almost makes no sense. There is no floating point format that
    suppresses the leading '0' before the decimal point.
    >>> print ">%1.2f<" % 0.00001

    >0.00<
    >>> print ">%2.2f<" % 0.00001

    >0.00<
    >>> print ">%3.2f<" % 0.00001

    >0.00<
    >>> print ">%4.2f<" % 0.00001

    >0.00<
    >>> print ">%5.2f<" % 0.00001

    > 0.00<


    Using the %f with a nonzero precision field, will always output at least the
    number of decimal places, plus the decimal point and leading '0' if number
    is less than 1.

    This whole discussion so far has also ignored negative values. Again, we
    should really look more into the spec for this formatting scheme, rather
    than try to read the OP's mind.

    -- Paul
    Paul McGuire, Jun 30, 2006
    #7
  8. Steve

    Jim Segrave Guest

    In article <UCdpg.7174$>,
    Paul McGuire <._bogus_.com> wrote:
    >"Jim Segrave" <> wrote in message
    >news:...
    >> In article <ePapg.6149$>,
    >> Paul McGuire <._bogus_.com> wrote:
    >>
    >> >Not an re solution, but pyparsing makes for an easy-to-follow program.
    >> >TransformString only needs to scan through the string once - the
    >> >"reals-before-ints" testing is factored into the definition of the
    >> >formatters variable.
    >> >
    >> >Pyparsing's project wiki is at http://pyparsing.wikispaces.com.

    >>
    >> If fails for floats specified as ###. or .###, it outputs an integer
    >> format and the decimal point separately. It also ignores \# which
    >> should prevent the '#' from being included in a format.
    >>

    >Ah! This may be making some sense to me now. Here are the OP's original
    >re's for matching.
    >
    >exponentPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\*\*\*\*\)')
    >floatPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\)')
    >integerPattern = regex.compile('\(^\|[^\\#]\)\(##+\)')
    >leftJustifiedStringPattern = regex.compile('\(^\|[^\\<]\)\(<<+\)')
    >rightJustifiedStringPattern = regex.compile('\(^\|[^\\>]\)\(>>+\)')
    >
    >Each re seems to have two parts to it. The leading parts appear to be
    >guards against escaped #, <, or > characters, yes? The second part of each
    >re shows the actual pattern to be matched. If so:
    >
    >It seems that we *don't* want "###." or ".###" to be recognized as floats,
    >floatPattern requires at least one "#" character on either side of the ".".
    >Also note that single #, <, and > characters don't seem to be desired, but
    >at least two or more are required for matching. Pyparsing's Word class
    >accepts an optional min=2 constructor argument if this really is the case.
    >And it also seems that the pattern is supposed to be enclosed in ()'s. This
    >seems especially odd to me, since one of the main points of this funky
    >format seems to be to set up formatting that preserves column alignment of
    >text, as if creating a tabular output - enclosing ()'s just junks this up.
    >


    The poster was excluding escaped characters (those preceded by a '\'),
    but I've just looked up the Perl format statement and in fact fields
    always begin with a '@', and yes, having no digits on one side of the
    decimal point is legal. Strings can be left or right justified
    ('@<<<<', '@>>>>') or centred ('@||||'); numerics begin with an '@',
    contain '#' and may contain a decimal point. Fields beginning with '^'
    instead of '@' are omitted if the format is a numeric ('#' with/without
    decimal). I assumed from the poster's original patterns that one didn't
    have to worry about '@', but that's incorrect: they need to be present
    for a field to be a format as opposed to ordinary text, and there
    appears to be no way to embed an '@' in a format. It's worth noting that
    Perl does implicit float to int coercion, so it treats @### the same for
    ints and floats (no decimal printed).

    For the grisly details:

    http://perl.com/doc/manual/html/pod/perlform.html

    --
    Jim Segrave ()
    Jim Segrave, Jun 30, 2006
    #8
  9. Steve

    Paul McGuire Guest

    "Jim Segrave" <> wrote in message
    news:...
    <snip>
    > The poster was excluding escaped (with a '\' character, but I've just
    > looked up the Perl format statement and in fact fields always begin
    > with a '@', and yes having no digits on one side of the decimal point
    > is legal. Strings can be left or right justified '@<<<<', '@>>>>', or
    > centred '@||||', numerics begin with an @, contain '#' and may contain
    > a decimal point. Fields beginning with '^' instead of '@' are omitted
    > if the format is a numeric ('#' with/without decimal). I assumed from
    > the poster's original patterns that one has to worry about '@', but
    > that's incorrect, they need to be present to be a format as opposed to
    > ordinary text and there's appears to be no way to embed a '@' in an
    > format. It's worth noting that PERL does implicit float to int
    > coercion, so it treats @### the same for ints and floats (no decimal
    > printed).
    >
    > For the grisly details:
    >
    > http://perl.com/doc/manual/html/pod/perlform.html
    >
    > --
    > Jim Segrave ()
    >


    Ah, wunderbar! Some further thoughts...

    I can see that the OP omitted the concept of "@|||" centering, since the
    Python string interpolation forms only support right or left justified
    fields, and it seems he is trying to do some form of format->string interp
    automation. Adding centering would require not only composing a suitable
    string interp format, but also some sort of pad() operation in the arg
    passed to the string interp operation. I suspect this also rules out simple
    handling of the '^' operator as mentioned in the spec, and likewise for the
    trailing ellipsis if a field is not long enough for the formatted value.
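
    A rough sketch of what that pad() step might look like: the template gets
    a plain %s and the argument is padded before interpolation. The helper
    name center_field is hypothetical, and cutting the value to the field
    width is an assumption about the desired behaviour.

```python
def center_field(value, width):
    """Truncate str(value) to width, then pad both sides to centre it."""
    text = str(value)[:width]
    return text.center(width)


# The format token is just %s; the centring happens in the argument.
line = "[%s]" % center_field("abc", 7)
print(line)
```

    This is why centring cannot be expressed in the format string alone: the
    converter would have to rewrite the argument list as well as the
    template.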

    The '@' itself seems to be part of the field, so "@<<<<" would be a 5
    column, left-justified string. A bare '@' seems to be a single string
    placeholder (meaningless to ask right or left justified :) ), since this is
    used in the doc's hack for including a "@" in the output. (That is, as you
    said, the original spec provides no mechanism for escaping in a '@'
    character, it has to get hacked in as a value dropped into a single
    character field.)

    The Perl docs say that fields that are too long are truncated. This does
    not happen in Python string interps for numeric values, but it can be done
    with strings (using the precision field).
    >>> print "%-10s" % string.ascii_uppercase

    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    >>> print "%-10.10s" % string.ascii_uppercase

    ABCDEFGHIJ

    So if we were to focus on support for "@", "@>>>", "@<<<", "@###" and
    "@###.##" (with and without leading or trailing digits about the decimal)
    style format fields, this shouldn't be overly difficult, and may even meet
    the OP's requirements. (The OP seemed to also want some support for
    something like "@##.###****" for scientific notation, again, not a
    dealbreaker.)
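
    As a sketch of how small that subset is, here is one way the "@" fields
    listed above might be turned into printf tokens with re. It assumes the
    '@' counts toward the field width, maps a bare '@' to a one-character
    left-justified string, and uses the precision field to truncate strings
    as shown in the session above. The names perl_field, to_printf and
    convert are made up for illustration, and "^" fields, centring and the
    "****" exponent form are left out.

```python
import re

# Assumption: the '@' counts toward the field width.
perl_field = re.compile(r'''
    @(?:
        (?P<num>\#+(?:\.\#*)?|\.\#+)    # @###, @###.##, @###. or @.###
      | (?P<left><+)                    # @<<<
      | (?P<right>>+)                   # @>>>
    )?                                  # a bare @ is a one-character field
''', re.VERBOSE)


def to_printf(match):
    width = len(match.group(0))
    num = match.group('num')
    if num is not None:
        if '.' in num:
            decimals = len(num) - num.index('.') - 1
            return '%%%d.%df' % (width, decimals)
        return '%%%dd' % width
    if match.group('right') is not None:
        # precision == width truncates over-long strings
        return '%%%d.%ds' % (width, width)
    # left-justified, or a bare '@'
    return '%%-%d.%ds' % (width, width)


def convert(template):
    return perl_field.sub(to_printf, template)


fmt = convert("Name: @<<<<< Qty: @### Price: @##.## Tag: @>>")
print(fmt)
print(fmt % ("Atlantis-long", 42, 3.14159, "ab"))
```

    Any literal '@' or '%' in the surrounding text would need separate
    handling; this sketch does not attempt it.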

    -- Paul
    Paul McGuire, Jun 30, 2006
    #9
  10. Steve

    Jim Segrave Guest

    In article <R1fpg.6488$>,
    Paul McGuire <._bogus_.com> wrote:
    >"Jim Segrave" <> wrote in message
    >news:...
    >
    >I can see that the OP omitted the concept of "@|||" centering, since the
    >Python string interpolation forms only support right or left justified
    >fields, and it seems he is trying to do some form of format->string interp
    >automation. Adding centering would require not only composing a suitable
    >string interp format, but also some sort of pad() operation in the arg
    >passed to the string interp operation. I suspect this also rules out simple
    >handling of the '^' operator as mentioned in the spec, and likewise for the
    >trailing ellipsis if a field is not long enough for the formatted value.
    >
    >The '@' itself seems to be part of the field, so "@<<<<" would be a 5
    >column, left-justified string. A bare '@' seems to be a single string
    >placeholder (meaningless to ask right or left justified :) ), since this is
    >used in the doc's hack for including a "@" in the output. (That is, as you
    >said, the original spec provides no mechanism for escaping in a '@'
    >character, it has to get hacked in as a value dropped into a single
    >character field.)
    >
    >The Perl docs say that fields that are too long are truncated. This does
    >not happen in Python string interps for numeric values, but it can be done
    >with strings (using the precision field).
    >>>> print "%-10s" % string.ascii_uppercase

    >ABCDEFGHIJKLMNOPQRSTUVWXYZ
    >>>> print "%-10.10s" % string.ascii_uppercase

    >ABCDEFGHIJ
    >
    >So if we were to focus on support for "@", "@>>>", "@<<<", "@###" and
    >"@###.##" (with and without leading or trailing digits about the decimal)
    >style format fields, this shouldn't be overly difficult, and may even meet
    >the OP's requirements. (The OP seemed to also want some support for
    >something like "@##.###****" for scientific notation, again, not a
    >dealbreaker.)


    One would need a much clearer spec on what the OP really wants to do.
    Note that Perl formats have the variable names embedded as part of the
    format string, so writing a simple Perl->Python converter isn't going
    to work.

    I've given him a good start for an re based solution, you've given one
    for a pyparsing based one, at this point I'd hope the OP can take it
    from there or can come back with more specific questions on how to
    deal with some of the awfulness of the formats he's working with.




    --
    Jim Segrave ()
    Jim Segrave, Jun 30, 2006
    #10
  11. Steve

    Steve Guest

    Hi All!

    Thanks for your suggestions and comments! I was able to use some of
    your code and suggestions and have come up with this new version of
    Report.py.

    Here's the updated code :


    -----------------------------------------------------------------

    #!/usr/bin/env python
    """Provides two classes to create formatted reports.

    The ReportTemplate class reads a template file or string containing a
    fixed format with field tokens and substitutes member values from an
    arbitrary python object.

    The ColumnReportTemplate class takes a string argument to define a
    header and line format for multiple calls with sequence data.

    6/30/2006
    Steve Reiss () - Converted to re module methods

    """

    __author__ = "Robin Friedrich "
    __version__ = "1.0.0"

    import string
    import sys
    import re

    from types import StringType, ListType, TupleType, InstanceType, \
         FileType

    #these regex pattern objects are used in the _make_printf function

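    # In re syntax \( \) and \| would match literal characters, so the
    # escape guard is written as a negative lookbehind instead; the match
    # span is then exactly the field.
    exponentPattern = re.compile(r'(?<![\\#])#+\.#+\*\*\*\*')
    floatPattern = re.compile(r'(?<![\\#])#+\.#+')
    integerPattern = re.compile(r'(?<![\\#])##+')
    leftJustifiedStringPattern = re.compile(r'(?<![\\<])<<+')
    rightJustifiedStringPattern = re.compile(r'(?<![\\>])>>+')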

    ###################################################################
    # _make_printf #
    ###################################################################

    def _make_printf(s):
        """Convert perl style format symbology to printf tokens.

        Take a string and substitute computed printf tokens for perl style
        format symbology.

        For example:

            ###.## yields %6.2f
            ######## yields %8d
            <<<<< yields %-5s
        """
        # print("Original String = %s\n\n") % (s)

        while 1: # process all sci notation fields
            m = exponentPattern.search(s)
            if m is None: break # re returns None, not -1, on no match
            i1 , i2 = m.span()
            width_total = i2 - i1
            field = s[i1:i2-4]
            width_mantissa = len( field[string.find(field,'.')+1:] )
            f = '%'+`width_total`+'.'+`width_mantissa`+'e'
            s = exponentPattern.sub(f, s, 1)

        while 1: # process all floating pt fields
            m = floatPattern.search(s)
            if m is None: break
            i1 , i2 = m.span()
            width_total = i2 - i1
            field = s[i1:i2]
            width_mantissa = len( field[string.find(field,'.')+1:] )
            f = '%'+`width_total`+'.'+`width_mantissa`+'f'
            s = floatPattern.sub(f, s, 1)

        while 1: # process all integer fields
            m = integerPattern.search(s)
            if m is None: break
            i1 , i2 = m.span()
            width_total = i2 - i1
            f = '%'+`width_total`+'d'
            s = integerPattern.sub(f, s, 1)

        while 1: # process all left justified string fields
            m = leftJustifiedStringPattern.search(s)
            if m is None: break
            i1 , i2 = m.span()
            width_total = i2 - i1
            f = '%-'+`width_total`+'s'
            s = leftJustifiedStringPattern.sub(f, s, 1)

        while 1: # process all right justified string fields
            m = rightJustifiedStringPattern.search(s)
            if m is None: break
            i1 , i2 = m.span()
            width_total = i2 - i1
            f = '%'+`width_total`+'s'
            s = rightJustifiedStringPattern.sub(f, s, 1)

        s = re.sub('\\\\', ' ', s)
        # print
        # print("printf format = %s") % (s)
        return s


    ###################################################################
    # ReportTemplate #
    ###################################################################

    class ReportTemplate:
        """Provide a print formatting object.

        Defines an object which holds a formatted output template and can
        print values substituted from a data object. The data members from
        another Python object are used to substitute values into the
        template. This template object is initialized from a template
        file or string which employs the formatting technique below. The
        intent is to provide a specification template which preserves
        spacing so that fields can be lined up easily.

        Special symbols are used to identify fields into which values
        are substituted.

        These symbols are:

        #####      for right justified integer

        #.###      for fixed point values rounded mantissa

        #.###****  for scientific notation (four asterisks required)

        <<<<<      for left justified string

        >>>>>      for right justified string

        %%         is needed in the template to signify a real percentage
                   symbol

        \#         A backslash is used to escape the above ##, <<, >>
                   symbols if you need to use them outside a field spec.
                   The backslash will be removed upon output.

        The total width of the symbol and its decimal point position is
        used to compute the appropriate printf token; see the '_make_printf'
        function. The symbol must have at least two adjacent characters for
        it to be recognized as a field specifier.

        To the right of each line of template body, following a '@@'
        delimiter, is a comma separated list for corresponding variable
        names. Sequence objects are supported. If you place a name of a
        5-tuple for example, there should be five fields specified on the
        left prepared to take those values. Also, individual element or
        slices can be used. The values from these variable names will be
        substituted into their corresponding fields in sequence.

        For example:

        a line of template might look like:
            TGO1 = ####.# VGO = ##.####**** Vehicle: <<<<<<<<<< @@ t_go,v_go, vname
        and would print like:
            TGO1 = 22.4 VGO = -1.1255e+03 Vehicle: Atlantis
        """
        delimiter = '@@'

        def __init__( self, template = ''):
            self.body = []
            self.vars = []

            #read in and parse a format template
            try:
                tpl = open(template, 'r')
                lines = string.split(tpl.read(), '\n')[:-1]
                tpl.close()
            except IOError:
                lines = string.split(template, '\n')

            self.nrows = len(lines)

            for i in range(self.nrows):
                self.body.append([])
                self.vars.append([])

            for i in range(self.nrows):
                splits = string.split(lines[i], self.delimiter)
                body = splits[0] # I don't use tuple unpacking here because
                                 # I don't know if there was indeed a @@ on the line

                if len(splits) > 1 :
                    vars = splits[1]
                else:
                    vars = ''

                #if body[-1] == '\n':
                    #self.body[i] = body[:-1]
                #else:
                self.body[i] = body
                varstrlist = string.split(vars, ',')
                #print i, varstrlist

                for item in varstrlist:
                    self.vars[i].append(string.strip(item))

                #print self.vars
                if len(self.vars[i]) > 0:
                    self.body[i] = _make_printf( self.body[i] )
                else:
                    print 'Template formatting error, line', i+1

        def __repr__(self):
            return string.join(self.body, '\n')

        def __call__(self, *dataobjs):
            return self._format(dataobjs[0])


        def _format( self, dataobj ):
            """Return the values of the given data object substituted into
            the template format stored in this object.
            """
            # value[] is a list of lists of values from the dataobj
            # body[] is the list of strings with %tokens to print
            # if value == None just print the string without the % argument
            s = ''
            value = []

            for i in range(self.nrows):
                value.append([])

            for i in range(self.nrows):
                for vname in self.vars[i]:
                    try:
                        if string.find(vname, '[') < 0:
                            # this is the nominal case and a simple get will be faster
                            value[i].append(getattr(dataobj, vname))
                        else:
                            # I use eval so that I can support sequence values
                            # although it's slow.
                            value[i].append(eval('dataobj.'+vname))
                    except AttributeError, SyntaxError:
                        value[i].append('')

                if value[i][0] != '':
                    try:
                        temp_vals = []
                        for item in value[i]:
                            # items on the list of values for this line
                            # can be either literals or lists
                            if type(item) == ListType:
                                # take each element of the list and tack it
                                # onto the printing list
                                for element in item:
                                    temp_vals.append(element)
                            else:
                                temp_vals.append(item)
                        # self.body[i] is the current output line with % tokens
                        # temp_vals contains the values to be inserted into them.
                        s = s + (self.body[i] % tuple(temp_vals)) + '\n'
                    except TypeError:
                        print 'Error on this line. The data value(s) could not be formatted as numbers.'
                        print 'Check that you are not placing a string value into a number field.'
                else:
                    s = s + self.body[i] + '\n'
            return s

    def writefile(self, file, dataobj):
    """takes either a pathname or an open file object and a data
    object.
    Instantiates the template with values from the data object
    sending output to the open file.
    """
    if type(file) == StringType:
    fileobj = open(file,'w')
    elif type(file) == FileType:
    fileobj = file
    else:
    raise TypeError, '1st argument must be a pathname or an open file object.'
    fileobj.write(self._format(dataobj))
    if type(file) == StringType: fileobj.close()


    ###################################################################
    # isReportTemplate #
    ###################################################################

    def isReportTemplate(obj):
    """Return 1 if obj is an instance of class ReportTemplate.
    """
    if type(obj) == InstanceType and \
    string.find(`obj.__class__` , ' ReportTemplate ') > -1:
    return 1
    else:
    return 0


    ###################################################################
    # ColumnReportTemplate #
    ###################################################################

    class ColumnReportTemplate:
    """This class allows one to specify column oriented output formats.

    The first argument to the constructor is a format string containing
    the header text and a line of field specifier tokens. A line
    containing nothing but dashes, underbars, spaces or tabs is detected
    as the separator between these two sections. For example, a format
    string might look like this:

    '''Page &P Date: &M/D/Y Time: &h:m:s
    Time Event Factor A2 Factor B2
    -------- ------------------- ----------- -------------
    ###.#### <<<<<<<<<<<<<<<<<<< ##.###**** ##.######****'''

    The last line will be treated as the format for output data contained
    in a four-sequence. This line would (for example) be translated to
    '%8.4f %-19s %10.3e %13.6e' for value substitution.
    In the header text portion one may use special variable tokens
    indicating that runtime values should be substituted into the
    header block. These tokens start with a & character and are
    immediately followed by either a P or a time/date format string.
    In the above example the header contains references to page number,
    current date in month/day/year order, and the current time.
    Today it produced 'Page 2 Date: 10/04/96 Time: 15:13:28'
    See doc string for now() function for further details.

    An optional second argument is an output file handle to send
    written output to (default is stdout). Keyword arguments may be
    used to tailor the instance. At this time the 'page_length'
    parameter is the only useful one.

    Instances of this class are then used to print out any number of
    records with the write method. The write method argument must be a
    sequence of elements matching the number and data type implied by
    the field specification tokens.

    At the end of a page, a formfeed is output as well as new copy
    of the header text.
    """
    page_length = 50
    lineno = 1
    pageno = 1
    first_write = 1


        def __init__(self, format = '', output = sys.stdout, **kw):
            # print("Original format = ", format)
            self.output = output

            self.header_separator = re.compile('\n[-_\s\t]+\n')
            self.header_token = re.compile('&([^ \n\t]+)')

            for item, value in kw.items():
                setattr(self, item, value)

            try:
                # use try block in case there is NOT a header at all
                # NEW separation of header and body from format
                result = self.header_separator.search(format).start()

                # print("result = ", result)
                # get the header lines that were matched
                HeaderLine = self.header_separator.search(format).group()

                if result > -1:
                    # separate the header text from the format
                    # print("split = ", self.header_separator.split(format) )
                    HeaderPieces = self.header_separator.split(format)
                    # print("HeaderPiece[0] = ", HeaderPieces[0])
                    # print("HeaderPiece[1] = ", HeaderPieces[1])

                    # header text PLUS the matched HeaderLine
                    self.header = HeaderPieces[0] + HeaderLine
                    # convert the format chars to printf expressions
                    self.body = _make_printf(HeaderPieces[1])

            except :
                # fail block of TRY - no headings found - set to blank
                self.header = ''
                # need to process the format
                self.body = _make_printf(format)

            # print("header = ", self.header)
            # print("body = ", self.body)

            # parse the special chars (&Page &M/D/Y &h:m:s) in header
            self.header = self.prep_header(self.header)
            self.header_len = len(string.split(self.header,'\n'))
            self.max_body_len = self.page_length - self.header_len


        def prep_header(self, header):
            """Substitute the header tokens with a named string printf
            token. """
            start = 0
            new_header = ''
            self.header_values = {}

            # print("original header = %s") % (header)
            # split up the header w/ the regular expression
            HeaderPieces = self.header_token.split(header)

            HeadCount = 0

            for CurrentHeadPiece in HeaderPieces :

                # matching tokens to the pattern will be in the ODD
                # indexes of Heads[]
                if HeadCount % 2 == 1:
                    # print("Heads %s = %s") % (HeadCount,CurrentHeadPiece)
                    new_header = new_header + '%(' + CurrentHeadPiece +')s'
                    self.header_values[CurrentHeadPiece] = 1
                else:
                    new_header = new_header + CurrentHeadPiece

                HeadCount = HeadCount + 1

            # print("new header = %s") % (new_header)

            return new_header


    def write(self, seq):
    """Write the given sequence as a record in field format.
    Length of sequence must match the number and data type
    of the field tokens.
    """
    seq = tuple(seq)

    if self.lineno > self.max_body_len or self.first_write:
    self.new_page()
    self.first_write = 0

    self.output.write( self.body % seq + '\n' )
    self.lineno = self.lineno + 1

    def new_page(self):
    """Issue formfeed, substitute current values for header
    variables, then print header text.
    """
    for key in self.header_values.keys():
    if key == 'P':
    self.header_values[key] = self.pageno
    else:
    self.header_values[key] = now(key)

    header = self.header % self.header_values
    self.output.write('\f'+ header +'\n')
    self.lineno = 1
    self.pageno = self.pageno + 1

    def isColumnReportTemplate(obj):
    """Return 1 if obj is an instance of class ColumnReportTemplate.
    """
    if type(obj) == InstanceType and \
    string.find(`obj.__class__` , ' ColumnReportTemplate ') > -1:
    return 1
    else:
    return 0

    ###################################################################
    # now - return date and/or time value #
    ###################################################################

    def now(code='M/D/Y'):
    """Function returning a formatted string representing the current
    date and/or time. Input arg is a string using code letters to
    represent date/time components.

    Code Letter Expands to
    D Day of month
    M Month (two digit)
    Y Year (two digit)
    h hour (two digit 24-hour clock)
    m minutes
    s seconds

    Other characters such as '/' ':' '_' '-' and ' ' are carried through
    as is and can be used as separators.
    """
    import time
    T = {}
    T['year'], T['month'], T['dom'], T['hour'], T['min'], T['sec'], \
    T['dow'], T['day'], T['dst'] = time.localtime(time.time())
    T['yr'] = repr(T['year'])[-2:]
    formatstring = ''

    tokens = {'D':'%(dom)02d', 'M':'%(month)02d', 'Y':'%(yr)02s',
    'h':'%(hour)02d', 'm':'%(min)02d', 's':'%(sec)02d',
    '/':'/', ':':':', '-':'-', ' ':' ' , '_':'_', ';':';',
    '^':'^'}

    for char in code:
    formatstring = formatstring + tokens[char]

    return formatstring % T


    ###################################################################
    # test_Rt - Test Report Template #
    ###################################################################

    def test_RT():

    template_string = """
    --------------------------------------------------
    Date <<<<<<<<<<<<<<<<<<<<<<<<<<< Time >>>>>>> @@ date, time

    Input File : <<<<<<<<<<<<<<<<<<<<< @@ file[0]
    Output File : <<<<<<<<<<<<<<<<<<<<< @@ file[1]
    Corr. Coeff : ##.########**** StdDev : ##.### @@ coeff, deviation
    Fraction Breakdown : ###.# %% Run :\# ### @@ brkdwn, runno
    Passed In Value : ### @@ invalue
    --------------------------------------------------
    """
    class Data:

    def __init__(self, InValue):
    # self.date = "September 12, 1998"
    self.date = now()
    # self.time = "18:22:00"
    self.time = now('h:m:s') #datetime.time()

    self.file = ['TX2667-AE0.dat', 'TX2667-DL0.dat']
    self.coeff = -3.4655102872e-05
    self.deviation = 0.4018
    self.runno = 56 + InValue
    self.brkdwn = 43.11
    self.invalue = InValue

    Report = ReportTemplate(template_string)

    for i in range(2):
    D = Data(i)
    print Report(D)


    ###################################################################
    # test_Rt_file - Test Report Template from file #
    ###################################################################

    def test_RT_file():

    template_string ='ReportFormat1.txt' # filename of report format

    class Data:

    def __init__(self, InValue):
    self.date = now()
    self.time = now('h:m:s') #datetime.time()
    self.file = ['TX2667-AE0.dat', 'TX2667-DL0.dat']
    self.coeff = -3.4655102872e-05
    self.deviation = 0.4018
    self.runno = 56 + InValue
    self.brkdwn = 43.11
    self.invalue = InValue

    Report = ReportTemplate(template_string)

    for i in range(2):
    D = Data(i)
    print Report(D)

    ###################################################################
    # test_CRT - Test Column Report Template #
    ###################################################################

    def test_CRT():

    print
    print
    print "test_CRT()"
    print

    format='''
    Page &P Date: &M/D/Y Time: &h:m:s
    Test Column Report 1

    Time Event Factor A2 Factor B2
    -------- ------------------- ----------- -------------
    ####.### <<<<<<<<<<<<<<<<<<< ##.###**** ##.######****'''

    data = [12.225, 'Aftershock', 0.5419, 144.8]
    report = ColumnReportTemplate( format, page_length=15 )

    for i in range(0,200,10):
    if i > 0 :
    data = [data[0]+i, data[1], data[2]/i*10., data[3]*i/20.]
    report.write( data )

    ###################################################################
    # test_CRT2 - Test Column Report Template #
    ###################################################################

    def test_CRT2():

    print
    print
    print "test_CRT2()"
    print

    format='''
    Page &P Date: &M/D/Y Time: &h:m:s
    Test Column Report 2

    I ID City Factor A2 Factor B2
    --- ------ ------------------- ----------- -------------
    >>> #### <<<<<<<<<<<<<<<<<<< #####.## >>>>>>>'''


    data = [0, 5, 'Mt. View', 541, 144.2]
    report = ColumnReportTemplate( format, page_length=15 )

    for i in range(0,201,10):
    data = [i, data[1]+i, data[2], data[3] + (i*10), data[4] + (i * 20)]
    report.write( data )

    ###################################################################
    # test_CRT3 - Test Column Report Template - no header chars #
    ###################################################################

    def test_CRT3():

    print
    print
    print "test_CRT3()"
    print

    format='''
    Test Column Report 3

    I ID City Factor A2 Factor B2
    --- ------ ------------------- ----------- -------------
    >>> #### <<<<<<<<<<<<<<<<<<< #####.## <<<<<<<<<<'''

    #--- ------ ------------------- ----------- -------------

    data = [0, 5, 'Santa Cruz', 541, 144.2]
    report = ColumnReportTemplate( format, page_length=15 )

    for i in range(0,201,10):
    data = [i, data[1]+i, data[2], data[3] + (i*10), data[4] + (i * 20)]
    report.write( data )

    ###################################################################
    # test_CRT4 - Test Column Report Template - no header at all #
    ###################################################################

    def test_CRT4():

    print
    print
    print "test_CRT4()"
    print

    format='''>>> #### <<<<<<<<<<<<<<<<<<< #####.## #####.##'''

    data = [0, 5, 'Santa Cruz', 541, 144.2]
    report = ColumnReportTemplate( format, page_length=50 )

    for i in range(0,201,10):
    data = [i, data[1]+i, data[2], data[3] + (i*10), data[4] + (i * 20)]
    report.write( data )

    ###################################################################
    ############# M A I N ###########################
    ###################################################################

    def Main():

    print "\n\nTesting this module.\n\n"

    TheHeading = '''
    simple heading \#
    r-just int fixed point sci-notation left-just string right-just string
    ##### #.### #.###**** <<<<< >>>>>'''

    print
    print " Make printf Test : "
    print _make_printf(TheHeading)
    print
    print


    test_RT()
    test_CRT()

    print
    test_RT_file()
    print

    test_CRT2()
    test_CRT3()
    test_CRT4()

    print
    print "Current Date & time = ", now('M-D-Y h:m:s')


    if __name__ == "__main__":
    Main()
    Steve, Jun 30, 2006
    #11
  12. Steve

    Jim Segrave Guest

    In article <>,
    Steve <> wrote:
    >Hi All!
    >
    >Thanks for your suggestions and comments! I was able to use some of
    >your code and suggestions and have come up with this new version of
    >Report.py.
    >
    >Here's the updated code :
    >
    >exponentPattern = re.compile('\(^\|[^\\#]\)|#+\.#+\*\*\*\*')
    >floatPattern = re.compile('\(^\|[^\\#]\)|#+\.#+')
    >integerPattern = re.compile("\(^\|[^\\#]\)|\##+")
    >leftJustifiedStringPattern = re.compile('\(^\|[^\\<]\)|\<<+')
    >rightJustifiedStringPattern = re.compile('\(^\|[^\\>]\)|\>>+')



    Some comments and suggestions

    If you want to include backslashes in a string, either
    use raw strings, or use double backslashes (raw strings are much
    easier to read). Otherwise, you have an accident waiting to happen -
    '\(' _does_ make a two character string as you desired, but '\v' makes
    a one character string, which is unlikely to be what you wanted.
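
    The difference is easy to check interactively. This is a small sketch in current Python 3 syntax (the thread itself uses Python 2); the variable names are only for illustration.

```python
# A raw string keeps every backslash exactly as typed.
two_chars = r'\('   # backslash + open parenthesis
doubled = '\\('     # the same two characters, written with a doubled backslash
one_char = '\v'     # ordinary string: '\v' is the single vertical-tab character
raw_v = r'\v'       # raw string: backslash + 'v'

print(len(two_chars), len(doubled), len(one_char), len(raw_v))
print(two_chars == doubled)
```

    The first three lengths show why relying on unrecognised escapes is fragile: the pattern only survives as long as the letter after the backslash happens not to be a real escape code.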

    That's a stylistic point. But more serious is the leading part of all
    your regexes - '\(^\|[^\\#]\)|'. I'm not sure what you're trying to
    accomplish - presumably to skip over escaped format characters, but
    that's not what they do.

    \( says to look for a literal open parenthesis (you've escaped it, so
    it's not a grouping character). ^ says to match at the start of the
    line. Nothing can be at the start of the line once a parenthesis has
    already been matched, so this entire term, up to the alternation (the
    non-escaped pipe symbol), never matches anything. If you want to
    ignore escaped formatting characters before a format, then you should
    use a negative lookbehind assertion (see the library reference, 4.2.1,
    Regular Expression Syntax):

    '(?<!\\)'

    This says that the match to the format can't start immediately after
    a backslash. You then need a final pass over your format to remove
    the extra backslashes, otherwise they will appear in the output. You
    do make that pass, but you replace the backslashes with spaces, which
    may not be the right thing to do - how could you output a line like
    '################' as part of a template?
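
    A minimal sketch of the lookbehind in action, in Python 3 syntax. The names plain and guarded and the sample template are made up for the illustration; only the '(?<!\\)' prefix comes from the suggestion above.

```python
import re

plain = re.compile(r'##+')           # no protection against escaped '#'
guarded = re.compile(r'(?<!\\)##+')  # a match may not start right after a backslash

template = r'ID \## ####'

# Without the lookbehind the escaped pair is taken for a field.
print(plain.search(template).span())
# With it, the first match is the real four-character field.
print(guarded.search(template).span())

# Convert the real field, then drop the escaping backslashes in a final pass.
converted = guarded.sub(lambda m: '%%%dd' % len(m.group()), template)
final = converted.replace('\\', '')
print(final)
```

    Note that the lookbehind only protects the character directly after the backslash: in r'\###' the last two characters would still be matched as a field, so longer literal runs need every character escaped.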

    Other odds and ends comments interleaved here

    >###################################################################
    ># _make_printf #
    >###################################################################
    >
    >def _make_printf(s):
    > """Convert perl style format symbology to printf tokens.
    >
    > Take a string and substitute computed printf tokens for perl style
    > format symbology.
    >
    > For example:
    >
    > ###.## yields %6.2f
    > ######## yields %8d
    > <<<<< yields %-5s
    > """
    ># print("Original String = %s\n\n") % (s)
    >
    >
    > while 1: # process all sci notation fields
    > if exponentPattern.search(s) < 0: break
    > i1 , i2 = exponentPattern.search(s).span()
    > width_total = i2 - i1
    > field = s[i1:i2-4]
    > width_mantissa = len( field[string.find(field,'.')+1:] )
    > f = '%'+`width_total`+'.'+`width_mantissa`+'e'
    > s = exponentPattern.sub(f, s, 1)


    There are better ways to examine a match than with span(). Consider
    using grouping in your regex to get both the total width and the
    mantissa width directly.

    If:

    exponentPattern = re.compile(r'(?<!\\)(#+\.(#+)\*\*\*\*)')

    then you could do this:
    m = exponentPattern.search(s)
    if m:
        s = exponentPattern.sub("%%%d.%de" % (len(m.groups()[0]),
                                               len(m.groups()[1])), s, 1)

    m.groups()[0] will be the entire '#+\.#+\*\*\*\*' match, so its length
    is the field width to be printed.

    m.groups()[1] will be the string after the decimal point, not
    including the '*'s, so its length is the mantissa width.

    In my opinion, building the string by using the sprintf-like '%'
    formatting operator, rather than adding together a bunch of substrings,
    is easier to read and maintain.

    Similar use of grouping can be done for the other format string types.
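
    As a rough sketch of how that could look for all five field types, here is one possible version in Python 3 syntax. It is not the original _make_printf: the function name make_printf, the pattern names, and the use of a replacement function with sub() (instead of a while loop) are choices made for this example.

```python
import re

# Order matters: scientific fields must be converted before the plain
# fixed-point pattern gets a chance to match their '##.###' prefix.
exponent = re.compile(r'(?<!\\)(#+\.(#+)\*\*\*\*)')
fixed = re.compile(r'(?<!\\)(#+\.(#+))')
integer = re.compile(r'(?<!\\)(##+)')
left = re.compile(r'(?<!\\)(<<+)')
right = re.compile(r'(?<!\\)(>>+)')


def make_printf(s):
    """Replace every field spec in s with a printf token of the same width."""
    # group(1) is the whole field, group(2) the digits after the point.
    s = exponent.sub(lambda m: '%%%d.%de' % (len(m.group(1)), len(m.group(2))), s)
    s = fixed.sub(lambda m: '%%%d.%df' % (len(m.group(1)), len(m.group(2))), s)
    s = integer.sub(lambda m: '%%%dd' % len(m.group(1)), s)
    s = left.sub(lambda m: '%%-%ds' % len(m.group(1)), s)
    s = right.sub(lambda m: '%%%ds' % len(m.group(1)), s)
    return s


print(make_printf('###.## ######## <<<<< >>>>> ##.###****'))
```

    Passing a function to sub() handles every occurrence in one call, so the search/span/sub loop is no longer needed.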


    > s = re.sub('\\\\', ' ', s)
    > return s


    As I noted, should backslashes be converted to spaces? And again, it's
    easier to type and read if it uses raw strings:

    s = re.sub(r'\\', ' ', s)
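
    To make the question concrete, here is a short comparison in Python 3 syntax. The sample strings are invented; the third variant, which only unescapes the three format characters, is just one possible alternative and not something from the original module.

```python
import re

converted = r'Run :\# %3d'

# What the posted code does: every backslash becomes a space.
with_space = re.sub(r'\\', ' ', converted)
# What the class docstring promises: the backslash is removed.
removed = re.sub(r'\\', '', converted)
# A narrower option: only drop a backslash that escapes '#', '<' or '>'.
unescaped = re.sub(r'\\([#<>])', r'\1', r'dir\name \# %3d')

print(repr(with_space))
print(repr(removed))
print(repr(unescaped))
```

    The narrower pattern leaves unrelated backslashes in the template text alone, which may or may not be what is wanted.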


    >###################################################################
    ># ReportTemplate #
    >###################################################################
    >
    >class ReportTemplate:
    > """Provide a print formatting object.

    [Snip]
    > The total width of the symbol and its decimal point position is
    > used to compute the appropriate printf token; see 'make_printf'
    > method. The symbol must have at least two adjacent characters for


    A minor nit - make_printf is better described as a function, not a
    method (it's not part of the ReportTemplate class or any other class).

    [SNIP]

    > def __init__( self, template = ''):
    > self.body = []
    > self.vars = []
    >
    > #read in and parse a format template
    > try:
    > tpl = open(template, 'r')
    > lines = string.split(tpl.read(), '\n')[:-1]


    You'd be better off to use something like:

    lines = []
    for l in open(template, 'r').readlines():
        lines.append(l.rstrip())

    The use of rstrip discards any trailing whitespace and deals with
    reading Windows-generated files on a Unix box, where lines end in
    CRLF and you'd otherwise strip only the LF.
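
    A self-contained sketch of the same idea in Python 3 syntax. The helper name read_template_lines and the throwaway file are assumptions for the demonstration; newline='' is passed only so that Python 3 leaves the CRLF endings untouched and the effect of rstrip() is visible.

```python
import os
import tempfile


def read_template_lines(path):
    """Return the lines of a template file without trailing whitespace."""
    # newline='' disables newline translation, mimicking a Unix box
    # reading a file that was written on Windows.
    with open(path, 'r', newline='') as tpl:
        return [line.rstrip() for line in tpl]


# Write a small file with Windows (CRLF) line endings to try it on.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'Date <<<<< @@ date\r\nTime >>>>> @@ time  \r\n')

lines = read_template_lines(path)
os.remove(path)
print(lines)
```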

    > except IOError:
    > lines = string.split(template, '\n')


    I have my doubts about the advisability of assuming that you are
    either passed the name of a file containing a template or a
    template itself. A misspelled file name won't raise
    an error, it will simply be processed as a fixed output. I would have
    passed a flag to say if the template argument was a file name or a
    template, and terminated with an error if it was a nonexistent file.
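
    One way that flag could look, as a sketch in Python 3 syntax. The function name load_template and the keyword is_file are hypothetical; the original class takes a single template argument.

```python
def load_template(source, is_file=False):
    """Return the template as a list of lines.

    source is a file name when is_file is true, otherwise the template
    text itself. A missing file is reported instead of being silently
    treated as template text.
    """
    if is_file:
        # open() raises IOError (OSError in Python 3) if the file is missing.
        with open(source, 'r') as tpl:
            text = tpl.read()
    else:
        text = source
    return [line.rstrip() for line in text.splitlines()]


print(load_template('TGO1 = ####.# @@ t_go\nVGO = ##.## @@ v_go'))

try:
    load_template('no_such_template_file_for_demo.txt', is_file=True)
except IOError as err:
    print('template file problem:', err)
```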

    [SNIP]

    > def _format( self, dataobj ):
    > """Return the values of the given data object substituted into
    > the template format stored in this object.
    > """
    > # value[] is a list of lists of values from the dataobj
    > # body[] is the list of strings with %tokens to print
    > # if value == None just print the string without the % argument
    > s = ''
    > value = []
    >
    > for i in range(self.nrows):
    >     value.append([])
    >
    > for i in range(self.nrows):
    >     for vname in self.vars[i]:
    >         try:
    >             if string.find(vname, '[') < 0:
    >                 # this is the nominal case and a simple get will be faster
    >                 value[i].append(getattr(dataobj, vname))
    >             else:
    >                 # I use eval so that I can support sequence values
    >                 # although it's slow.
    >                 value[i].append(eval('dataobj.'+vname))
    >         except AttributeError, SyntaxError:
    >             value[i].append('')


    There's another way to do this - use getattr to retrieve the sequence,
    then use __getitem__ to index it.

    Something like this would work, again using a regex to separate out
    the index (you might want the regex compiled once at __init__
    time). The regex looks for a run of characters valid in a python
    variable name followed by an optional integer (with or without sign)
    index in square brackets. m.groups()[0] is the variable name portion,
    m.groups()[1] is the index if there's a subscripting term.

    try:
        m = re.match(r'([a-zA-Z_][a-zA-Z0-9._]*)\s*(?:\[\s*([+-]?\d+)\s*\])?\s*',
                     vname)
        if not m: raise SyntaxError
        if m.groups()[1] is None:
            value[i].append(getattr(dataobj, vname))
        else:
            value[i].append(getattr(dataobj, m.groups()[0]).\
                            __getitem__(int(m.groups()[1])))
    except (AttributeError, SyntaxError, IndexError):
    >     value[i].append('')


    This is a bit ugly, but avoids eval with all the dangers it carries -
    a deliberately hacked template file can be used to do a lot of damage,
    a badly written one could be hard to debug
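
    Pulled out into a standalone helper, the eval-free lookup might look like the sketch below (Python 3 syntax). The names NAME_INDEX, lookup and the Data class are invented for the example, and the regex is a slightly stricter variant of the one above: no dots in the name, and a '$' anchor so trailing junk is rejected.

```python
import re

# name, optionally followed by an integer index in square brackets
NAME_INDEX = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)\s*(?:\[\s*([+-]?\d+)\s*\])?\s*$')


def lookup(dataobj, vname):
    """Return dataobj.<name> or dataobj.<name>[index]; '' if that fails."""
    m = NAME_INDEX.match(vname)
    if not m:
        return ''
    name, index = m.groups()
    try:
        value = getattr(dataobj, name)
        if index is not None:
            value = value[int(index)]
    except (AttributeError, IndexError, KeyError, TypeError):
        return ''
    return value


class Data:
    def __init__(self):
        self.coeff = -3.4655102872e-05
        self.file = ['TX2667-AE0.dat', 'TX2667-DL0.dat']


d = Data()
print(lookup(d, 'coeff'))
print(lookup(d, 'file[1]'))
print(repr(lookup(d, 'file[5]')))           # index out of range
print(repr(lookup(d, '__import__("os")')))  # rejected, never evaluated
```

    Slices are not handled here; supporting them would mean extending the regex rather than falling back to eval.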

    > if value[i][0] != '':
    >     try:
    >         temp_vals = []
    >         for item in value[i]:
    >             # items on the list of values for this line
    >             # can be either literals or lists
    >             if type(item) == ListType:


    Might you be better off asking if item has a __getitem__? It would
    then work with tuples, and extending it to dictionaries would be easier.
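
    A sketch of that check in Python 3 syntax. The function name flatten is made up, and one caveat has to be handled: strings also have __getitem__, so they are excluded explicitly, as are dictionaries, which this sketch simply passes through untouched.

```python
def flatten(values):
    """Expand lists, tuples and similar sequences into one flat list."""
    flat = []
    for item in values:
        # Strings (and dicts) support indexing too, but should stay whole.
        if hasattr(item, '__getitem__') and not isinstance(item, (str, bytes, dict)):
            flat.extend(item)
        else:
            flat.append(item)
    return flat


print(flatten([1.5, ['a', 'b'], (2, 3), 'Atlantis']))
```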

    >def isReportTemplate(obj):
    > """Return 1 if obj is an instance of class ReportTemplate.
    > """
    > if type(obj) == InstanceType and \
    > string.find(`obj.__class__` , ' ReportTemplate ') > -1:
    > return 1
    > else:
    > return 0


    Why not just use isinstance(obj, classname)?
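
    For illustration, with two empty stand-in classes in place of the real ones (Python 3 syntax):

```python
class ReportTemplate:
    """Stand-in for the real class, only for this demonstration."""


class ColumnReportTemplate:
    """Stand-in for the real class, only for this demonstration."""


def isReportTemplate(obj):
    """Return True if obj is an instance of class ReportTemplate."""
    return isinstance(obj, ReportTemplate)


print(isReportTemplate(ReportTemplate()))
print(isReportTemplate(ColumnReportTemplate()))
print(isReportTemplate('ReportTemplate'))
```

    Unlike the string search on the class repr, isinstance also accepts instances of subclasses and returns True/False rather than 1/0.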

    >###################################################################
    ># ColumnReportTemplate #
    >###################################################################
    >
    >class ColumnReportTemplate:
    > """This class allows one to specify column oriented output formats.
    >



    --
    Jim Segrave ()
    Jim Segrave, Jul 1, 2006
    #12