Question about scoping

Discussion in 'Perl Misc' started by Bob Dubery, Feb 27, 2007.

  1. Bob Dubery

    Bob Dubery Guest

    Hi all,

    Some extracts from a program that I recently worked on follow.

    The program is a "listener" that waits for data coming in on a
    specific port. The incoming messages will contain 2 XML records, each
    of which must be validated.

    This program was exhibiting the characteristics of a memory leak. When
    started up it would consume a certain amount of memory, but over time
    the memory in use would grow and grow.

    Here's the code I found (not the whole program) then I'll show my
    changes and ask the actual question...

    <code snippets start>
    #! /usr/bin/perl

    use strict;
    use IO::Socket;
    use XML::DOM;
    use English;
    use DBI;
    use lib "/usr/local/PostOffice/progs/modules";
    use QueuesDatabase;

    .......
    .......
    my $line; # will contain all the input received
    my $data; # will contain the current input

    # get the input from the client
    my $bytesRead = sysread($new_sock, $data, 2048);

    # the data received can be more than 2048 characters, so we need to
    keep reading if we havn't received the
    # end of transmission character
    while ($bytesRead)
    {
    $line = $line . $data;

    # if it is the end of transmission, set the variable to 0 to exit
    the while loop
    if ($line =~ /$/)
    {
    $bytesRead = 0;
    }
    # otherwise keep reading from the socket
    else
    {
    $bytesRead = sysread($new_sock, $data, 2048);
    }
    }

    # make sure all the conditions are met
    validateFile($line);

    ......
    ......


    sub validateFile
    {
    # get the value passed to this subroutine
    my $dataString = shift;

    # remove the newline character and all other formatting characters
    chomp $dataString;
    $dataString =~ s/\t//g;
    $dataString =~ s/\n//g;
    $dataString =~ s/\r//g;
    $dataString =~ s/\f//g;

    # check for and remove the starting charater for the data transfer
    if ($dataString =~ /^/) # character (0x02)
    {
    $dataString =~ s/^//; # character (0x02)

    # check for and remove the ending charater for the data transfer
    if ($dataString =~ /$/) # character (0x03)
    {
    $dataString =~ s/$//; # character (0x03)

    # make sure the very first part of the file is a valid xml
    header with utf-8 encoding included
    unless ($dataString =~ /^<\?xml version=\"1\.0\" encoding=
    \"UTF-8\"\?>/)
    {
    $errMsg = "Invalid xml header for the routing header.";
    }
    else
    {
    # now check for the second xml header for the body file
    unless ($dataString =~ /<\?xml version=\"1\.0\" encoding=
    \"UTF-8\"\?>/)
    {
    $errMsg = "Invalid xml header for the xml body file.";
    }
    else
    {
    # separate the xml header and xml body file into 2 different
    variables
    $dataString =~ s/^(<\?xml version=\"1\.0\" encoding=
    \"UTF-8\"\?>.+)(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\>.+)$/
    $1$2/;
    my $headerFile = $1;
    my $bodyFile = $2;

    # parse the xml headerFile to make sure that it is a well-
    formed xml file
    eval {new XML::DOM::parser->parse($headerFile)}; # create
    a new parser object and load the input file

    # parser will have died with error message in $EVAL_ERROR if
    XML is not well-formed
    if($EVAL_ERROR)
    {
    $errMsg = "XML header file is not well-formed and can't be
    processed.";
    }
    else
    {
    # parse the xml bodyFile to make sure that it is a well-
    formed xml file
    eval {new XML::DOM::parser->parse($bodyFile)}; # create
    a new parser object and load the input file

    # parser will have died with error message in $EVAL_ERROR
    if XML is not well-formed
    if($EVAL_ERROR)
    {
    $errMsg = "XML body file is not well-formed and can't be
    processed.";
    }
    }
    }
    }
    }
    else
    {
    $errMsg = "No closing flag found for the data.";
    }
    }
    else
    {
    $errMsg = "No opening flag found for the data.";
    }
    }

    <code snippets end>

    So the program uses "strict" and 'my' is used in all the subroutines.

    Now I made the following change....

    <modified snippet starts>
    {
    # separate the xml header and xml body file into 2 different
    variables
    $dataString =~ s/^(<\?xml version=\"1\.0\" encoding=
    \"UTF-8\"\?>.+)(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)$/
    $1$2/;
    my $headerFile = $1;
    my $bodyFile = $2;

    # parse the xml headerFile to make sure that it is a well-
    formed xml file
    my $headerXML = eval {new XML::DOM::parser-
    >parse($headerFile)}; # create a new parser object and load the

    input file

    # parser will have died with error message in $EVAL_ERROR if
    XML is not well-formed
    if($EVAL_ERROR)
    {
    $errMsg = "XML header file is not well-formed and can't be
    processed.";
    }
    else
    {
    # parse the xml bodyFile to make sure that it is a well-
    formed xml file
    my $bodyXML = eval {new XML::DOM::parser-
    >parse($bodyFile)}; # create a new parser object and load the input

    file

    # parser will have died with error message in $EVAL_ERROR
    if XML is not well-formed
    if($EVAL_ERROR)
    {
    $errMsg = "XML body file is not well-formed and can't be
    processed.";
    }
    else
    {
    $bodyXML->dispose;
    }
    $headerXML->dispose;
    }
    }
    <modified snippet ends>

    So what I did was to open the XML::DOM::parser objects to variable
    names and then call the dispose method once the XML has been parsed to
    see if it's well formed. Result, memory usage - except for the brief
    period when a document is being parsed - is more or less constant.

    So my question is about the scoping in the original code. Within a
    subroutine there is a line like
    eval {new XML::DOM::parser->parse($headerFile)};

    How is that parser object scoped? I imagine that the author of the
    code expected the object to dissappear out of memory once the
    subroutine was entered.

    So why the constant growth in memory usage? I can think of only two
    possibilities....

    1) eval {new XML::DOM::parser->parse($headerFile)}; results in
    somethong that is globally scoped

    2) XML::DOM::parser creates a whole lot of other objects/variables in
    memory that persist even when the actual XML::DOM::parser object
    passes out of scope.

    Or is there another reason?

    Thanks

    Bob
    Bob Dubery, Feb 27, 2007
    #1
    1. Advertising

  2. Bob Dubery wrote:
    >
    > sub validateFile
    > {
    > # get the value passed to this subroutine
    > my $dataString = shift;
    >
    > # remove the newline character and all other formatting characters
    > chomp $dataString;
    > $dataString =~ s/\t//g;
    > $dataString =~ s/\n//g;
    > $dataString =~ s/\r//g;
    > $dataString =~ s/\f//g;


    OMG! Replace the previous five lines with:

    $dataString =~ tr/\t\n\r\f//d;


    > # check for and remove the starting charater for the data transfer
    > if ($dataString =~ /^/) # character (0x02)
    > {
    > $dataString =~ s/^//; # character (0x02)
    >
    > # check for and remove the ending charater for the data transfer
    > if ($dataString =~ /$/) # character (0x03)
    > {
    > $dataString =~ s/$//; # character (0x03)

    You don't need to run the same regular expression twice, nor do you really
    need the comment:

    # check for and remove the starting charater for the data transfer
    if ($dataString =~ s/^\x02//)
    {
    # check for and remove the ending charater for the data transfer
    if ($dataString =~ s/\x03$//)
    {



    John
    --
    Perl isn't a toolbox, but a small machine shop where you can special-order
    certain sorts of tools at low cost and in short order. -- Larry Wall
    John W. Krahn, Feb 27, 2007
    #2
    1. Advertising

  3. Bob Dubery

    Bob Dubery Guest

    John's reply was informative, but it did not address my immediate
    concern.

    However, I think I may have moved on to a state where I remain
    confused, but confused at a higher level :)

    Considering this....
    eval {new XML::DOM::parser->parse($headerFile)};

    Is it the case that because neither the parser object nor the results
    of the eval are assigned to any kind of variable that either or both
    are regarded as barewords?
    Bob Dubery, Mar 5, 2007
    #3
  4. Bob Dubery

    Ben Morrow Guest

    Quoth "Bob Dubery" <>:
    > John's reply was informative, but it did not address my immediate
    > concern.
    >
    > However, I think I may have moved on to a state where I remain
    > confused, but confused at a higher level :)
    >
    > Considering this....
    > eval {new XML::DOM::parser->parse($headerFile)};


    I'm amazed this syntax works at all. It appears to be parsed as

    eval { (new XML::DOM::parser)->parse($headerFile) };

    or

    eval { XML::DOM::parser->new->parse($headerFile) };

    which would be the recommended way to write it. The indirect object
    notation ('new Foo' as opposed to 'Foo->new') is considered to have been
    a bad idea: it confuses the reader, and under the wrong circumstances it
    can confuse Perl.

    > Is it the case that because neither the parser object nor the results
    > of the eval are assigned to any kind of variable that either or both
    > are regarded as barewords?


    Err... no. What do you think a 'bareword' is? It's actually an
    expression like XML::DOM::parser or STDOUT that Perl parses as a string
    even though it isn't quoted: hence 'bare'.

    Since you haven't posted a complete program, I can't tell what you are
    trying to achieve; but I would guess that you don't want to throw away
    the result of the parse. So you want something like

    my $doc = eval {
    XML::DOM::parser->new->parse($headerFile)
    }
    or die "parse of '$headerFile' failed: $@";

    (Of course, if you are just going to die on error you don't need the
    eval: just let XML::DOM::parser die for you. I am assuming you want some
    more sophisticated error handling here.)

    Ben

    --
    don't get my sympathy hanging out the 15th floor. you've changed the locks 3
    times, he still comes reeling though the door, and soon he'll get to you, teach
    you how to get to purest hell. you do it to yourself and that's what really
    hurts is you do it to yourself just you, you and noone else **
    Ben Morrow, Mar 5, 2007
    #4
  5. Bob Dubery

    Bob Dubery Guest

    On Mar 5, 8:57 am, Ben Morrow <> wrote:
    > Quoth "Bob Dubery" <>:
    >
    > > John's reply was informative, but it did not address my immediate
    > > concern.

    >
    > > However, I think I may have moved on to a state where I remain
    > > confused, but confused at a higher level :)

    >
    > > Considering this....
    > > eval {new XML::DOM::parser->parse($headerFile)};

    >
    > I'm amazed this syntax works at all. It appears to be parsed as
    >
    > eval { (new XML::DOM::parser)->parse($headerFile) };
    >
    > or
    >
    > eval { XML::DOM::parser->new->parse($headerFile) };
    >
    > which would be the recommended way to write it. The indirect object
    > notation ('new Foo' as opposed to 'Foo->new') is considered to have been
    > a bad idea: it confuses the reader, and under the wrong circumstances it
    > can confuse Perl.


    Well thanks for that. I will try making that change to this program.

    >
    > > Is it the case that because neither the parser object nor the results
    > > of the eval are assigned to any kind of variable that either or both
    > > are regarded as barewords?

    >
    > Err... no. What do you think a 'bareword' is? It's actually an
    > expression like XML::DOM::parser or STDOUT that Perl parses as a string
    > even though it isn't quoted: hence 'bare'.
    >
    > Since you haven't posted a complete program, I can't tell what you are
    > trying to achieve; but I would guess that you don't want to throw away
    > the result of the parse.


    This is a program written by somebody else. It's a "listener". It is
    started up and "listens" for messages coming in on a certain port. The
    incoming messages each contain two XML documents - one is a routing
    header and the second is the actual payload.

    I initially thought it had a memory leak as it kept on using more and
    more memory until eventually it either fell over or the server just
    ran out of resources.

    I started trying to figure out why this was happening, and the only
    thing that seemed potentially troublesome was the use of the XML
    parser.

    In fact the results of the parse were NOT being used. The parsing was
    only done to check that both XML documents were well-formed. If either
    was not well-formed then an error message had to be entered into a
    log, and a signal sent back out on the socket indicating a "nack".

    My thinking was that by not explicitly assigning the parser objects to
    a variable, thus making it impossible to close them, some resources
    used by instances of the parser were not being released.

    Since I changed the statement that you're commenting on to something
    like
    my $bodyXML = eval {new XML::DOM::parser- >parse($bodyFile)};

    and going on to perform a $bodyXML->dispose()

    the listener has run more reliably and memory usage is reduced. It is
    still creeping up with time, but much more slowly.

    So I'm going to try the alternate notation that you offer and see if
    that brings a further improvement.

    Thanks

    Bob
    Bob Dubery, Mar 9, 2007
    #5
  6. Bob Dubery

    Bob Dubery Guest

    On Mar 9, 4:58 pm, "Bob Dubery" <> wrote:
    > On Mar 5, 8:57 am, Ben Morrow <> wrote:
    > > I'm amazed this syntax works at all. It appears to be parsed as

    >
    > > eval { (newXML::DOM::parser)->parse($headerFile) };

    >
    > > or

    >
    > > eval {XML::DOM::parser->new->parse($headerFile) };

    >
    > > which would be the recommended way to write it. The indirect object
    > > notation ('new Foo' as opposed to 'Foo->new') is considered to have been
    > > a bad idea: it confuses the reader, and under the wrong circumstances it
    > > can confuse Perl.

    >
    > Well thanks for that. I will try making that change to this program.
    >

    <snip>
    > Since I changed the statement that you're commenting on to something
    > like
    > my $bodyXML = eval {newXML::DOM::parser- >parse($bodyFile)};
    >
    > and going on to perform a $bodyXML->dispose()
    >
    > the listener has run more reliably and memory usage is reduced. It is
    > still creeping up with time, but much more slowly.
    >
    > So I'm going to try the alternate notation that you offer and see if
    > that brings a further improvement.


    And it DID improve things... but the memory consumed by the program
    was still creeping up.

    In the end I tried XML::parser::Expat instead. This means some changes
    to the code as XML::parser::Expat objects cannot be reused and may
    only perform ONE parse. But the leak is fixed.

    Prior to this I had tried using XML::parser instead of XML::DOM, but
    the problem did not go away.

    Is there a memory leak in XML::parser?


    The code for the validation routine now looks like this....


    sub validateFile
    {
    # get the value passed to this subroutine
    my $dataString = shift;

    # eliminate tabs, line feeds etc...
    $dataString =~ tr/\t\n\r\f//d;
    # eliminate padding between tags...
    $dataString =~ s/>\s*</></g;

    # create parser objects
    # these are expat parsers - cannot be reused
    my $hParser = XML::parser::Expat->new(ProtocolEncoding => 'UTF-8');
    my $bParser = XML::parser::Expat->new(ProtocolEncoding => 'UTF-8');

    # check for and remove the starting charater for the data transfer
    if ($dataString =~ /^/) # character (0x02)
    {
    $dataString =~ s/^//; # character (0x02)

    # check for and remove the ending charater for the data transfer
    if ($dataString =~ /$/) # character (0x03)
    {
    $dataString =~ s/$//; # character (0x03)

    # make sure the very first part of the file is a valid xml header
    with utf-8 encoding included
    unless ($dataString =~ /^<\?xml version=\"1\.0\" encoding=
    \"UTF-8\"\?>/)
    {
    $errMsg = "Invalid xml header for the routing header.";
    }
    else
    {
    # now check for the second xml header for the body file
    unless ($dataString =~ /<\?xml version=\"1\.0\" encoding=
    \"UTF-8\"\?>/)
    {
    $errMsg = "Invalid xml header for the xml body file.";
    }
    else
    {
    # separate the xml header and xml body file into 2 different
    variables
    $dataString =~ s/^(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.
    +)(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)$/$1$2/;
    my $headerFile = $1;
    my $bodyFile = $2;

    # parse the xml headerFile to make sure that it is a well-formed
    xml file
    my $headerOK = eval {$hParser->parse($headerFile)};

    # parser will have died and eval failed if XML is not well-formed
    unless($headerOK)
    {
    $errMsg = "XML header file is not well-formed and can't be
    processed.";
    }
    else
    {
    # parse the xml bodyFile to make sure that it is a well-formed
    xml file
    my $bodyOK = eval {$bParser->parse($bodyFile)};

    # parser will have died with error message in $EVAL_ERROR if XML
    is not well-formed
    unless($bodyOK)
    {
    $errMsg = "XML body file is not well-formed and can't be
    processed.";
    }
    }
    }
    }
    }
    else
    {
    $errMsg = "No closing flag found for the data.";
    }
    }
    else
    {
    $errMsg = "No opening flag found for the data.";
    }
    $hParser->release;
    $bParser->release;
    }

    Now there is one remaining oddity... even though I have called the
    release methods on the parsers, the memory consumed by the parsers
    only gets released when the subroutine is executed again and the
    statements

    my $hParser = XML::parser::Expat->new(ProtocolEncoding => 'UTF-8');
    my $bParser = XML::parser::Expat->new(ProtocolEncoding => 'UTF-8');

    are executed.

    However, this means that the memory usage will not creep. It goes up
    when a large incoming message is received, and may stay up for some
    time, but the next incoming message will effectively release the
    memory resources by instantiating new Expat parsers.
    Bob Dubery, Mar 20, 2007
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Matt S.
    Replies:
    0
    Views:
    392
    Matt S.
    May 4, 2004
  2. Alan Little

    Beginner's scoping question

    Alan Little, Nov 10, 2004, in forum: Python
    Replies:
    6
    Views:
    259
    Alan Little
    Nov 12, 2004
  3. It's me

    A scoping question

    It's me, Dec 28, 2004, in forum: Python
    Replies:
    8
    Views:
    324
    Dennis Lee Bieber
    Dec 28, 2004
  4. 2b|!2b==?

    scoping question

    2b|!2b==?, Apr 21, 2007, in forum: C++
    Replies:
    1
    Views:
    282
    Ivan Vecerina
    Apr 21, 2007
  5. Matt Barnicle
    Replies:
    10
    Views:
    626
    Bruno Desthuilliers
    Dec 2, 2007
Loading...

Share This Page