Is there a maximum size to a Python program?

Discussion in 'Python' started by Carbon Man, Apr 27, 2009.

  1. Carbon Man

    Carbon Man Guest

    I have a program that is generated from a generic process. Its job is to
    check to see whether records (replicated from another system) exist in a
    local table, and if they don't, to add them. I have one of these programs
    for every table in the database. Everything works well until I do the
    postcode table. The generated code is 5MB for a system with no current
    data. Normally the file would not be this big, as only the changes are
    copied over. Python just quits; I have tried stepping through the code in
    the debugger, but it doesn't even start.
    I am thinking that dynamically generating the programs to run might not be
    such a good idea. It would be a shame to drop it, because the system needs
    to be generic and it runs from an XML file, so the resulting code could be
    pretty complex, and I am new to Python. The program did generate a .pyc,
    so it was able to compile.
    Thoughts, anyone?
     
    Carbon Man, Apr 27, 2009
    #1

  2. On Mon, 27 Apr 2009 13:01:22 +1000, Carbon Man wrote:

    > I have a program that is generated from a generic process. It's job is
    > to check to see whether records (replicated from another system) exist
    > in a local table, and if it doesn't, to add them. I have 1 of these
    > programs for every table in the database.



    Sounds like a needlessly complicated way of doing things. Surely
    replicating data from a database is a solved problem?


    > Everything works well until I
    > do the postcode table. The generated code is 5MB for a system with no
    > current data.


    I don't understand what you mean by "with no current data". Do you mean
    that you generate 5MB of source code when the original postcode table is
    empty? Or 5MB of source code when the *replicated* table is empty? Or
    something else?


    > Normally the file would not be this big as only the
    > changes are copied over. Python just quits, I have tried stepping
    > through the code in the debugger but it doesn't even start.


    That depends on how your generated code looks.

    Some years ago, somebody posted here complaining that his generated code
    wasn't working. It turned out that he was generating a single function
    looking something vaguely like this:

    def function(alist):
        alist[0] = 0
        alist[1] = 0
        alist[2] = 0
        alist[3] = 0
        ...
        alist[29999997] = 0
        alist[29999998] = 0
        alist[29999999] = 0
        return alist

    Naturally, Python would need to load all 30 million lines of code into a
    *single* code object, which was too large to load.

    After changing the generated code to look something like this:

    def function(alist):
        for i in xrange(30000000):
            alist[i] = 0
        return alist

    it worked perfectly.

    (I am a little fuzzy on the precise details, and googling is not helping,
    so please excuse any discrepancies between my memory of the thread and
    the actual reality.)



    > I am thinking that dynamically generating the programs to run might not
    > be such a good idea. It would be a shame to drop it because the system
    > needs to be generic and it runs from an XML file so the resulting code
    > could be pretty complex, and I am new to Python. The program did
    > generate a pyc so it was able to compile.
    > Thoughts anyone?


    Perhaps you could explain what the 5MB of code is supposed to do, and
    how you generate the code, and show a *small* number of sample lines.




    --
    Steven
     
    Steven D'Aprano, Apr 27, 2009
    #2

  3. Carbon Man

    Paul Hemans Guest

    Thanks for the reply,

    > Sounds like a needlessly complicated way of doing things. Surely
    > replicating data from a database is a solved problem?

    The data is coming from a legacy system and going into SQLalchemy to give
    database independence. We will probably update the legacy apps in the future
    but it is a massive job for little benefit.

    > I don't understand what you mean by "with no current data". Do you mean
    > that you generate 5MB of source code when the original postcode table is
    > empty? Or 5MB of source code when the *replicated* table is empty? Or
    > something else?

    5MB of source code when the replicated table is empty

    > Perhaps you could explain what the 5MB of code is supposed to do, and
    > how you generate the code, and show a *small* number of sample lines.

    Here is a generated file that works. Other files are essentially the same,
    but with more entries. This one only adds a record if it does not exist;
    the next stage was to update an existing record with changes.

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import schema
    import time
    from datetime import *
    from sqlalchemy import *
    from sqlalchemy.orm import *
    engine = create_engine('sqlite:///tutorial.db', echo=False)
    Session = sessionmaker(bind=engine)
    session = Session()

    entry = schema.BILLS()
    exists = session.query(schema.BILLS).filter(schema.BILLS.REFNO==u"1")
    if exists.count() == 0:
        entry.REFNO = u"1"
        entry.THRDPTY = u"""C5"""
        entry.AMOUNT = 0
        entry.TIMESTAMP = 0
        entry.ALLOCATED = 0
        session.add(entry)

    session.commit()
    session.close()
     
    Paul Hemans, Apr 27, 2009
    #3
  4. Carbon Man

    John Machin Guest

    On Apr 27, 1:01 pm, "Carbon Man" <> wrote:
    > I have a program that is generated from a generic process. It's job is to
    > check to see whether records (replicated from another system) exist in a
    > local table, and if it doesn't, to add them. I have 1 of these programs for
    > every table in the database. Everything works well until I do the postcode
    > table. The generated code is 5MB for a system with no current data. Normally
    > the file would not be this big as only the changes are copied over. Python
    > just quits, I have tried stepping through the code in the debugger but it
    > doesn't even start.
    > I am thinking that dynamically generating the programs to run might not be
    > such a good idea.


    I tend to agree with that line of thinking. What you need is:
    (a) ONE program. That's not one per table, that's one and only one,
    period/full-stop.
    (b) Per-table config info like column names and types -- you have this
    already.

    The idea is that instead of reading a data file and generating SQL per
    row, you generate SQL once, with parameter markers for the data
    values. Then you read the data file and execute the prepared SQL for
    each row.

    More detail:

    At the start, build an INSERT statement that's tailored for the table
    that you are updating:

    insert_sql = "INSERT INTO table_name VALUES (?,?,?,?)"
    or (if you are paranoid)
    "INSERT INTO table_name(col1,col2,col3,col4) VALUES (?,?,?,?)"

    Notes: (a) assumes four columns as an example (b) Lower-case stuff has
    to be replaced with the correct table name and column name(s) (c) "?"
    is the parameter marker for the DB module you are using

    For each row read from the data file, transform the input into a tuple
    of suitable data types, and then do the insert:
    import yourdbmodule

    # much later:
    try:
        cursor.execute(insert_sql, data_tuple)
    except yourdbmodule.IntegrityError:
        if it_is_the_primary_key_not_unique_exception:  # pseudocode condition
            pass  # ignoring this is part of your requirement
        else:
            raise  # probably never happens, but should be checked
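    As a concrete illustration of the prepared-statement approach above, here
    is a self-contained sketch using the standard library's sqlite3 module;
    the postcode table and its columns are invented for the example:

```python
import sqlite3

# Throwaway in-memory database; table and column names are invented
# purely for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE postcode (code TEXT PRIMARY KEY, suburb TEXT)")

# The INSERT is built ONCE, with "?" parameter markers, then reused
insert_sql = "INSERT INTO postcode (code, suburb) VALUES (?, ?)"

rows = [("2000", "Sydney"), ("3000", "Melbourne"), ("2000", "Sydney")]
for data_tuple in rows:
    try:
        cur.execute(insert_sql, data_tuple)
    except sqlite3.IntegrityError:
        pass  # primary key already present: skip it, per the requirement

conn.commit()
count = cur.execute("SELECT COUNT(*) FROM postcode").fetchone()[0]
```

    The statement is built once and executed per row; the duplicate key is
    skipped rather than treated as an error, so the generated-code-per-record
    approach is not needed.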

    > It would be a shame to drop it because the system needs to
    > be generic


    OTOH it would be a shame to continue with a process where the amount
    of code lying around is O(N_update_rows) instead of O(1) and the
    number of programs is O(N_tables) instead of O(1).

    > and it runs from an XML file so the resulting code could be
    > pretty complex, and I am new to Python.


    The above advice is in general language-agnostic; I'm just using
    Python code because you asked in this newsgroup. The fact that other
    languages may give you more rope with which to hang yourself is also
    immaterial :)

    HTH,
    John
     
    John Machin, Apr 27, 2009
    #4
  5. Carbon Man

    John Machin Guest

    On Apr 27, 2:34 pm, "Paul Hemans" <> wrote:

    [snip]

    > exists = session.query(schema.BILLS).filter(schema.BILLS.REFNO==u"1")
    > if exists.count == 0:
    >     entry.REFNO = u"1"
    >     entry.THRDPTY = u"""C5"""
    >     entry.AMOUNT = 0
    >     entry.TIMESTAMP = 0
    >     entry.ALLOCATED = 0
    >     session.add(entry)


    ἐδάκρυσεν ὁ Ἰησοῦς
     
    John Machin, Apr 27, 2009
    #5
  6. Carbon Man

    Paul Hemans Guest

    Thanks John, I understand where you are coming from and will try to digest
    it all. One problem, though, that I didn't mention in my original posting:
    the replication may only require updating one or more fields, which is a
    problem for generating a single SQL statement to cover all requests. I am
    having a look at eval and exec to see if they will give me a little more
    flexibility in runtime generation of the code.

    "John Machin" <> wrote in message
    news:...
    On Apr 27, 1:01 pm, "Carbon Man" <> wrote:
    > I have a program that is generated from a generic process. It's job is to
    > check to see whether records (replicated from another system) exist in a
    > local table, and if it doesn't, to add them. I have 1 of these programs
    > for
    > every table in the database. Everything works well until I do the postcode
    > table. The generated code is 5MB for a system with no current data.
    > Normally
    > the file would not be this big as only the changes are copied over. Python
    > just quits, I have tried stepping through the code in the debugger but it
    > doesn't even start.
    > I am thinking that dynamically generating the programs to run might not be
    > such a good idea.


    I tend to agree with that line of thinking. What you need is:
    (a) ONE program. That's not one per table, that's one and only one,
    period/full-stop.
    (b) Per-table config info like column names and types -- you have this
    already.

    The idea is that instead of reading a data file and generating SQL per
    row, you generate SQL once, with parameter markers for the data
    values. Then you read the data file and execute the prepared SQL for
    each row

    More detail:

    At the start, build an INSERT statement that's tailored for the table
    that you are updating:

    insert_sql = "INSERT INTO table_name VALUES (?,?,?,?)"
    or (if you are paranoid)
    "INSERT INTO table_name(col1,col2,col3,col4) VALUES (?,?,?,?)"

    Notes: (a) assumes four columns as an example (b) Lower-case stuff has
    to be replaced with the correct table name and column name(s) (c) "?"
    is the parameter marker for the DB module you are using

    For each row read from the data file, transform the input into a tuple
    of suitable data types, and then do the insert:
    import yourdbmodule
    # much later:
    try:
    cursor.execute(insert_sql, data_tuple)
    except yourdbmodule.IntegrityError: #
    if it's the primary key not unique exception:
    pass # ignoring this is part of your requirement
    else:
    raise # probably never happen, but should be checked

    > It would be a shame to drop it because the system needs to
    > be generic


    OTOH it would be a shame to continue with a process where the amount
    of code lying around is O(N_update_rows) instead of O(1) and the
    number of programs is O(N_tables) instead of O(1).

    > and it runs from an XML file so the resulting code could be
    > pretty complex, and I am new to Python.


    The above advice is in general language-agnostic; I'm just using
    Python code because you asked in this newsgroup. The fact that other
    languages may give you more rope with which to hang yourself is also
    immaterial :)

    HTH,
    John
     
    Paul Hemans, Apr 27, 2009
    #6
  7. On Monday 27 April 2009 05:01:22 Carbon Man wrote:
    > I have a program that is generated from a generic process. It's job is to
    > check to see whether records (replicated from another system) exist in a
    > local table, and if it doesn't, to add them. I have 1 of these programs for
    > every table in the database. Everything works well until I do the postcode
    > table. The generated code is 5MB for a system with no current data.
    > Normally the file would not be this big as only the changes are copied
    > over. Python just quits, I have tried stepping through the code in the
    > debugger but it doesn't even start.
    > I am thinking that dynamically generating the programs to run might not be
    > such a good idea. It would be a shame to drop it because the system needs
    > to be generic and it runs from an XML file so the resulting code could be
    > pretty complex, and I am new to Python. The program did generate a pyc so
    > it was able to compile.
    > Thoughts anyone?


    I agree with what most people here have said: don't generate Python code,
    it's a bad idea. Put your data in CSV files instead (one per table, named
    after the table, for instance). You might want to either have a separate
    file with metadata (column names, etc...) or have the first line be a header:

    "<col1_name:col1_type>","<col2_name:col2_type>"...
    "<data_for_col1>","<data_for_col2>",...
    "<data_for_col1>","<data_for_col2>",...

    Then use the csv module (in the standard library) to read the file.

    If your original data is in XML, you might want to parse it using lxml
    instead.
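    A minimal sketch of reading such a file with the csv module; the
    name:type header convention follows the suggestion above, and the column
    names and sample data are invented:

```python
import csv
import io

# Invented sample file: first line is a "name:type" header, the
# remaining lines are data rows.
sample = io.StringIO(
    '"REFNO:N","THRDPTY:C","AMOUNT:N"\r\n'
    '"1","C5","0"\r\n'
    '"2","C9","15"\r\n'
)

reader = csv.reader(sample)
# Split each "name:type" header cell into a (name, type) pair
columns = [tuple(cell.split(":")) for cell in next(reader)]

rows = []
for row in reader:
    record = {}
    for (name, ctype), value in zip(columns, row):
        # Convert numeric ("N") columns; everything else stays a string
        record[name] = int(value) if ctype == "N" else value
    rows.append(record)
```

    Each row comes back as a dict keyed by column name, ready to feed into a
    parameterised INSERT.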

    Cheers,

    Emm
     
    Emmanuel Surleau, Apr 27, 2009
    #7
  8. On Sun, 26 Apr 2009 21:51:00 -0700, John Machin wrote:

    > ἐδάκρυσεν ὁ Ἰησοῦς


    Alright, I give up. Is that APL code? *grin*


    --
    Steven
     
    Steven D'Aprano, Apr 27, 2009
    #8
  9. Carbon Man

    John Machin Guest

    On Apr 27, 3:31 pm, "Paul Hemans" <> wrote:
    > Thanks John, I understand where you are coming from and will try and digest
    > it all. One problem though that I didn't mention in my original posting was
    > that the replication may only require updating one or more fields, that is a
    > problem with a generating a single SQL statement to cover all requests.


    OK so you need TWO programs -- one tailored for initial loading of a
    table, the other for subsequent updates. Still O(1) :)

    OR you could still do it in one, by using the prepared whole-row SQL
    statement if appropriate, or building an SQL statement on the fly if
    it's an update.

    > I am
    > having a look at eval and exec


    WRONG WAY
    GO BACK

    > to see if they will give me a little more
    > flexibility in runtime generation of the code.


    And another thing that you didn't mention was that you are using
    SQLAlchemy -- perhaps you might like to ask your question on that
    package's forum ... including a few more facts than heretofore :)

    HTH
    John
     
    John Machin, Apr 27, 2009
    #9
  10. Carbon Man

    John Machin Guest

    On Apr 27, 3:48 pm, Steven D'Aprano
    <> wrote:
    > On Sun, 26 Apr 2009 21:51:00 -0700, John Machin wrote:
    > > ἐδάκρυσεν ὁ Ἰησοῦς

    >
    > Alright, I give up.


    Your googler cactus?

    > Is that APL code? *grin*


    No but +1 anyway :)
     
    John Machin, Apr 27, 2009
    #10
  11. Carbon Man

    John Machin Guest

    On Apr 27, 2:34 pm, "Paul Hemans" <> wrote:

    > entry = schema.BILLS()
    > exists = session.query(schema.BILLS).filter(schema.BILLS.REFNO==u"1")
    > if exists.count == 0:
    >     entry.REFNO = u"1"
    >     entry.THRDPTY = u"""C5"""
    >     entry.AMOUNT = 0
    >     entry.TIMESTAMP = 0
    >     entry.ALLOCATED = 0


    Instead of generating code like that, you can do it dynamically. The
    built-in setattr() function is your friend, e.g.

    for col_name in col_names:
        if needs_updating(col_name):
            value = get_new_value(col_name)
            setattr(entry, col_name, value)
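    The same idea in a self-contained form, with a plain class standing in
    for the SQLAlchemy mapped class (all names here are invented):

```python
class Entry(object):
    """Stand-in for a mapped schema class such as schema.BILLS."""
    pass

entry = Entry()
new_values = {"REFNO": "1", "THRDPTY": "C5", "AMOUNT": 0}

# Assign each column dynamically instead of generating one line of
# source code per field.
for col_name, value in new_values.items():
    setattr(entry, col_name, value)
```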

    >     session.add(entry)
    >
    > session.commit()
    > session.close()


    Cheers,

    John
     
    John Machin, Apr 27, 2009
    #11
  12. In message <gt3fv1$1aci$>, Paul Hemans wrote:

    > One problem though that I didn't mention in my original
    > posting was that the replication may only require updating one or more
    > fields, that is a problem with a generating a single SQL statement to
    > cover all requests.


    That's not a big issue. Assume the field names and corresponding values are
    coming from a Python dict, eg

    FieldValues = \
        {
            "field1" : ... value1 ...,
            "field2" : ... value2 ...,
        }

    then you can construct an SQL statement on the fly with something like

    sqlcmd = \
        (
            "insert into my_table set "
            +
            ", ".join
                (
                    "%s = %s" % (k, SQLString(FieldValues[k]))
                    for k in FieldValues.keys()
                )
        )

    where SQLString is as defined at
    <http://codecodex.com/wiki/index.php?title=Useful_MySQL_Routines#Quoting>.
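    For comparison, a sketch that builds the same kind of dict-driven
    statement but leaves value quoting to the database driver via parameter
    markers (the table and field names are invented, and the table name is
    assumed to come from trusted configuration, not from the data):

```python
import sqlite3

# Invented example values
FieldValues = {"field1": 42, "field2": "abc"}

cols = sorted(FieldValues)  # one fixed ordering for names and values alike
sqlcmd = "INSERT INTO my_table (%s) VALUES (%s)" % (
    ", ".join(cols),
    ", ".join("?" for _ in cols),
)
params = tuple(FieldValues[k] for k in cols)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (field1 INTEGER, field2 TEXT)")
conn.execute(sqlcmd, params)  # the driver handles quoting and types
row = conn.execute("SELECT field1, field2 FROM my_table").fetchone()
```

    Only the column names are interpolated into the SQL text; the values
    travel separately, so no SQLString-style quoting routine is needed.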
     
    Lawrence D'Oliveiro, Apr 27, 2009
    #12
  13. In message <Xns9BFA70E834F97duncanbooth@127.0.0.1>, Duncan Booth wrote:

    > Lawrence D'Oliveiro <_zealand> wrote:
    >
    >> In message <gt3fv1$1aci$>, Paul Hemans wrote:
    >>
    >>> One problem though that I didn't mention in my original
    >>> posting was that the replication may only require updating one or
    >>> more fields, that is a problem with a generating a single SQL
    >>> statement to cover all requests.

    >>
    >> That's not a big issue. Assume the field names and corresponding
    >> values are coming from a Python dict, eg
    >>
    >> FieldValues = \
    >> {
    >> "field1" : ... value1 ...,
    >> "field2" : ... value2 ...,
    >> }
    >>
    >> then you can construct an SQL statement on the fly with something like
    >>
    >> sqlcmd = \
    >> (
    >> "insert into my_table set "
    >> +
    >> ", ".join
    >> (
    >> "%s = %s" % (k, SQLString(FieldValues[k]))
    >> for k in FieldValues.keys()
    >> )
    >> )
    >>
    >> where SQLString is as defined at
    >> <http://codecodex.com/wiki/index.php?title=Useful_MySQL_Routines#Quoting>.

    >>
    >>

    >
    > Not so nice if the input data is something like:
    >
    > FieldValues = { "field1=0);DROP my_table;": "" }
    >
    > So you want something to validate fieldnames.


    That's not going to happen. These are field names, not values you're talking
    about.

    > Also you are assuming that all the values are strings ...


    No I'm not.
     
    Lawrence D'Oliveiro, Apr 27, 2009
    #13
  14. Carbon Man

    Mike Kent Guest

    On Apr 27, 1:49 am, John Machin <> wrote:

    > > I am
    > > having a look at eval and exec

    >
    > WRONG WAY
    > GO BACK


    +1 QOTW
     
    Mike Kent, Apr 27, 2009
    #14
  15. Carbon Man

    Aahz Guest

    In article <>,
    Steven D'Aprano <> wrote:
    >On Sun, 26 Apr 2009 21:51:00 -0700, John Machin wrote:
    >>
    >> ἐδάκρυσεν ὁ Ἰησοῦς

    >
    >Alright, I give up. Is that APL code? *grin*


    base64 encoding
    --
    Aahz () <*> http://www.pythoncraft.com/

    "If you think it's expensive to hire a professional to do the job, wait
    until you hire an amateur." --Red Adair
     
    Aahz, Apr 27, 2009
    #15
  16. On Mon, 27 Apr 2009 13:12:29 -0300, Aahz <> wrote:

    > In article <>,
    > Steven D'Aprano <> wrote:
    >> On Sun, 26 Apr 2009 21:51:00 -0700, John Machin wrote:
    >>>
    >>> ἐδάκρυσεν ὁ Ἰησοῦς

    >>
    >> Alright, I give up. Is that APL code? *grin*

    >
    > base64 encoding


    Yes, my news client kindly decoded it for me.
    It looks like Greek, and I can transliterate the last word as Iesous, so
    it might be some kind of Biblical reference. But I don't have the
    slightest idea of what that means...

    --
    Gabriel Genellina
     
    Gabriel Genellina, Apr 27, 2009
    #16
  17. On Mon, 27 Apr 2009 14:50:23 -0300, Gabriel Genellina
    <> wrote:

    > On Mon, 27 Apr 2009 13:12:29 -0300, Aahz <> wrote:
    >
    >> In article <>,
    >> Steven D'Aprano <> wrote:
    >>> On Sun, 26 Apr 2009 21:51:00 -0700, John Machin wrote:
    >>>>
    >>>> ἐδάκρυσεν ὁ Ἰησοῦς
    >>>
    >>> Alright, I give up. Is that APL code? *grin*

    >>
    >> base64 encoding

    >
    > Yes, my news client kindly decoded it for me.
    > It looks like Greek, and I can transliterate the last word as Iesous, so
    > it might be some kind of Biblical reference. But I don't have the
    > slightiest idea of what that means...


    Answering my own question: Jesus wept.
    http://en.wikipedia.org/wiki/Jesus_wept#Use_as_an_expletive

    (This reminds me of a book by Ellery Queen that I read a (very long) time
    ago; the phrase "And Ellery cried" was a complete chapter by itself. I
    didn't catch the Gospel reference that time - I thought it was just a
    matter of style, a very short (and dramatic) chapter.)

    --
    Gabriel Genellina
     
    Gabriel Genellina, Apr 27, 2009
    #17
  18. Carbon Man

    Paul Hemans Guest

    The reason I started down this path is that the program is grinding to a
    halt (my PC as well) and the only way to get it going again is:
    >>> import gc
    >>> gc.collect()

    2524104

    Memory used by the process is around 500MB; I have posted about this
    problem before on this newsgroup but didn't get a resolution. So I was
    trying to separate the logic into different files and finish each one
    with gc.collect().
    Sorry I didn't post the original problem as well, but I really just
    wanted the thing to work and not cloud the issue again.
    I have implemented your approach and the code is much cleaner (except for
    another problem that I have posted to sqlAlchemy). It doesn't solve my
    original problem with the memory, but I really need SOMETHING to work. So I
    will keep plodding away and then maybe move the XML processing to lxml and
    see if that resolves the memory problem.
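    On the memory side, incremental parsing may help more than gc.collect():
    both lxml and the standard library's xml.etree.ElementTree offer
    iterparse(), which lets each element be processed and then discarded
    instead of holding the whole DOM in memory. A rough sketch with the
    stdlib version (the document and tag names are invented):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document standing in for the replication file
xml_data = io.BytesIO(
    b"<DATA>"
    b"<TUPLE><REFNO>1</REFNO></TUPLE>"
    b"<TUPLE><REFNO>2</REFNO></TUPLE>"
    b"</DATA>"
)

seen = []
for event, elem in ET.iterparse(xml_data, events=("end",)):
    if elem.tag == "TUPLE":
        seen.append(elem.findtext("REFNO"))
        elem.clear()  # free the finished subtree so memory stays bounded
```

    Each TUPLE is handled as soon as its end tag arrives and then cleared,
    so memory use stays roughly constant regardless of file size.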


    "John Machin" <> wrote in message
    news:...
    On Apr 27, 3:31 pm, "Paul Hemans" <> wrote:
    > Thanks John, I understand where you are coming from and will try and
    > digest
    > it all. One problem though that I didn't mention in my original posting
    > was
    > that the replication may only require updating one or more fields, that is
    > a
    > problem with a generating a single SQL statement to cover all requests.


    OK so you need TWO programs -- one tailored for initial loading of a
    table, the other for subsequent updates. Still O(1) :)

    OR you could still do it in one, by using the prepared whole-row SQL
    statement if appropriate, or building an SQL statement on the fly if
    it's an update.

    > I am
    > having a look at eval and exec


    WRONG WAY
    GO BACK

    > to see if they will give me a little more
    > flexibility in runtime generation of the code.


    And another thing that you didn't mention was that you are using
    SQLAlchemy -- perhaps you might like to ask your question on that
    package's forum ... including a few more facts than heretofore :)

    HTH
    John
     
    Paul Hemans, Apr 28, 2009
    #18
  19. Carbon Man

    Paul Hemans Guest

    The whole story

    Hi Andrew,
    The reason I am using mapped objects is that I need to abstract from the
    database implementation allowing the replication to target a number of
    different platforms. This will definitely slow things down.
    > process a whole pile in memory and then (perhaps every 10,000 - when your
    > memory is about to run out and start paging) flush the session.

    Under Windows, how can I tell when memory is about to run out? I guess
    there is no cross-platform solution to this.
    Writing external files all came about from a post titled "Memory problems
    (garbage collection) by Carbon Man", which I never got a resolution to.
    I was trying to execute gc.collect() when a process was complete because I
    was having huge problems with memory (though the problem still remains). If
    I stop at "import schema", there are 2524104 objects processed by
    gc.collect().

    There is a new issue in the code (marked with #### Problem with SQLalchemy),
    but here is the program as it now stands:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    # Reads an XML file and creates schema.py from the TABLES branch (very
    # simplistic table structures).
    # Processes the nodes within the DATA branch, into SQLalchemy.
    from xml.dom import minidom
    import os
    import codecs
    from cStringIO import StringIO
    import time
    from datetime import *
    from sqlalchemy import *
    from sqlalchemy.orm import *

    class xmlProcessing:
        """ General class for XML processing"""

        def process(self, filename="", xmlString=""):
            if xmlString:
                pass
            elif filename:
                xmldoc = minidom.parse(filename)
                self.parse( xmldoc.documentElement )

        def parseBranch(self, parentNode):
            """ Process an XML branch """
            for node in parentNode.childNodes:
                try:
                    parseMethod = getattr(self, "parse_%s" % node.__class__.__name__)
                except AttributeError:
                    continue
                if parseMethod(node):
                    continue
                self.parseBranch(node)

        def parse_Document(self, node):
            pass

        def parse_Text(self, node):
            pass

        def parse_Comment(self, node):
            pass

        def parse_Element(self, node):
            try:
                handlerMethod = getattr(self, "do_%s" % node.tagName)
            except AttributeError:
                return False
            handlerMethod(node)
            return True

    class reptorParsing(xmlProcessing):
        """ Specific class for generating a SQLalchemy program to create tables
        and populate them with data"""

        def __init__(self):
            self.schemaPreface = StringIO()
            self.schemaPreface.write("""from sqlalchemy import *
    import time
    from datetime import *
    from sqlalchemy.ext.declarative import declarative_base
    #engine = create_engine('sqlite:///tutorial.db', echo=False)
    #metadata = MetaData()
    Base = declarative_base()""")
            self.schemaTables = StringIO()
            self.schemaTablesCreate = StringIO()
            self.schemaFields = StringIO()
            self.tableDict = {}
            self.tableName = StringIO()
            self.tables = StringIO()

        def parse(self, parentNode):
            """Main entry point to begin processing an XML document"""
            self.parseBranch(parentNode)
            # Properties such as schemaTables and .tables are populated by
            # the various methods below
            fupdate = codecs.open(os.path.join(os.getcwd(), "update.py"), 'w', 'UTF-8')
            if self.schemaTables:
                f = codecs.open(os.path.join(os.getcwd(), "schema.py"), 'w', 'UTF-8')
                f.write(self.schemaPreface.getvalue() + u"\n" +
                        self.schemaTables.getvalue() +
                        u"if __name__ == '__main__':\n" +
                        self.schemaTablesCreate.getvalue() +
                        u"\n engine = create_engine('sqlite:///tutorial.db', echo=False)\n" +
                        u" metadata = Base.metadata\n" +
                        u" metadata.create_all(engine)\n")
                f.close()
            if self.tables:
                fupdate.write(self.tables.getvalue())
            fupdate.close()

        def do_TABLES(self, tableNode):
            """Process schema for tables"""
            for node in tableNode.childNodes:
                self.tableName = node.tagName
                # Define a declarative mapping class
                self.schemaTables.write("""\nclass %s(Base):
     __tablename__ = '%s'
    """ % (self.tableName, self.tableName))
                self.schemaTablesCreate.write("\n tableDef = " + self.tableName + "()")
                self.schemaFields = StringIO()
                # allow for userA = users("Billy","Bob") via a __init__()
                self.schemaInitPreface = StringIO()
                self.schemaInitPreface.write(" def __init__(self")
                self.schemaInitBody = StringIO()
                self.parseBranch(node)
                self.schemaInitPreface.write("):\n")
                self.schemaTables.write(self.schemaFields.getvalue() + "\n" +
                                        self.schemaInitPreface.getvalue() +
                                        self.schemaInitBody.getvalue() + "\n")

            # Need a way to execute schema so that the tables will be created.

        def do_FIELDS(self, fieldsNode):
            """Process schema for fields within tables"""
            for node in fieldsNode.childNodes:
                if self.schemaFields:
                    self.schemaFields.write("\n")
                cType = ""
                initType = '""'
                # The attribute type holds the type of field
                crType = node.attributes["type"].value
                if crType == u"C":
                    cType = "String(length=%s)" % node.attributes["len"].value
                    initType = '""'
                elif crType == u"N" and node.attributes["dec"].value == u'0':
                    cType = "Integer"
                    initType = '0'
                elif crType == u"N":
                    cType = "Numeric(precision=%s, scale=%s)" % \
                        (node.attributes["len"].value, node.attributes["dec"].value)
                    initType = '0'
                elif crType == u"L":
                    cType = "Boolean"
                    initType = 'True'
                elif crType == u"T":
                    cType = "DateTime"
                    initType = "datetime.strptime('','')"
                elif crType == u"D":
                    cType = "Date"
                    initType = "datetime.strptime('','')"
                elif crType == u"M" or crType == u"G":
                    cType = "Text"
                    initType = '""'

                if node.attributes.getNamedItem("primary"):
                    cType += ", primary_key=True"
                self.schemaFields.write(" %s = Column(%s)" % (node.tagName, cType))
                self.schemaInitPreface.write(", \\\n %s=%s" % (node.tagName, initType))
                self.schemaInitBody.write(" self.%s = %s\n" % (node.tagName, node.tagName))
                self.tableDict[self.tableName + "." + node.tagName] = crType

        def do_DATA(self, dataNode):
            """This is for processing actual data to be pushed into the tables

            Layout is DATA -> TABLE_NAME key='primary_field' -> TUPLE ->
            FIELD_NAME -> VALUE"""
            # This is the schema program created in the TABLES branch
            import schema
            engine = create_engine('sqlite:///tutorial.db', echo=False)
            Session = sessionmaker(bind=engine)
            self.session = Session()

            for node in dataNode.childNodes:
                self.tableName = node.tagName

                self.keyValue = ""
                self.keyField = node.attributes["key"].value
                self.parseBranch(node)
                #self.tables.write("\nimport update_%s" % (self.tableName))
                self.fDataUpdate.close()
                self.session.commit()

            self.session.close()

        def do_TUPLE(self, tupleNode):
            """ A TUPLE is what the XML file refers to a table row.
            Sits below a DATA->table_name branch """

            import schema
            colValues = {}
            TI = eval("schema." + self.tableName + "()")
            exists = None
            for node in tupleNode.childNodes:
                for dataNode in node.childNodes:
                    crType = self.tableDict[self.tableName + "." + node.tagName]

                    if crType == u"C" or crType == u"M":
                        cValue = dataNode.data
                    elif crType == u"T":
                        cValue = datetime.strptime(dataNode.data, "%Y-%m-%d %H:%M")
                    elif crType == u"D":
                        cValue = datetime.strptime(dataNode.data, "%Y-%m-%d")
                    else:
                        cValue = dataNode.data
                    if node.tagName == self.keyField:
                        #### Problem with SQLalchemy.
                        exists = self.session.query(TI).filter(
                            getattr(TI, self.keyField) == cValue)
                    # Set the value onto a list to be processed below
                    colValues[node.tagName] = cValue

            if exists is None:
                return

            if exists.count() == 0:
                entry = TI
            else:
                entry = exists[0]

            for col_name in colValues:
                setattr(entry, col_name, colValues[col_name])
            if exists.count() == 0:
                # Insert
                self.session.add(TI)


    if __name__ == '__main__':
        replicate = reptorParsing()
        replicate.process(filename=os.path.join(os.getcwd(), "request.xml"))
        del replicate
     
    Paul Hemans, Apr 28, 2009
    #19
  20. Carbon Man

    Paul Hemans Guest

    Hi,
    Please see my post titled "The whole story"

    "Martin P. Hellwig" <> wrote in message
    news:...
    > Carbon Man wrote:
    >> I have a program that is generated from a generic process. It's job is to
    >> check to see whether records (replicated from another system) exist in a
    >> local table, and if it doesn't, to add them.

    > <cut>
    > To answer the topic question, it would be limited to the memory your
    > platform can allocate and I guess that the most critical part would be if
    > you run out of memory for handling the namespace references.
    >
    > However my magic 8 ball whispers to me that the provided solution tries to
    > be smarter than the given problem requires, which is not a good thing. If
    > so and if possible, please provide a more detailed explanation of the
    > actual problem you want to solve instead of the problem you currently have
    > with the provided solution.
    >
    > --
    > MPH
    > http://blog.dcuktec.com
     
    Paul Hemans, Apr 28, 2009
    #20
