Issue: unexpected value in $2 (Perl 5.10.1)

Discussion in 'Perl Misc' started by John Bokma, Mar 22, 2013.

  1. John Bokma

    John Bokma Guest

    The following piece of code assigns an unexpected value to $2 in a
    program I am working on. When I run tests, it works as expected, but
    when I run the actual program (which parses currently 2 large XML files)
    $2 gets assigned a random value when the second XML file is parsed [1]:

    $iso_gmt =~ /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z$/
    or croak "Not a valid ISO GMT date time: '$iso_gmt'";
    # Note: without the assingment below $2 might be set to a different
    # value for an unknown reason (bug in XML::parser C code?)
    # perl, v5.10.1
    #my $month = $2;
    my $seconds_since_epoch;
    my $error;
    $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );

    $iso_gmt = '2012-10-30T21:05:01Z';

    Examples of errors:

    Month '20129103' out of range 0..11
    Month '11826511' out of range 0..11
    Month '20079951' out of range 0..11
    Month '15160655' out of range 0..11
    Month '21972303' out of range 0..11
    Month '10208591' out of range 0..11

    0133254f - 20129103
    00b4754f - 11826511
    0132654f - 20079951
    00e7554f - 15160655
    014f454f - 21972303
    009bc54f - 10208591

    ^^^ looks like there is a pattern.

    Uncommenting the assignment of $2 to $month removes the effect.
    Reversing the parse order of the 2 XML files also removes the effect.

    Anyone a suggestion (or 2) to pin point what goes wrong here (or is this
    a know bug?). It looks like memory accidentally gets overwritten.


    [1] using XML::parser

    --
    John Bokma j3b

    Blog: http://johnbokma.com/ Perl Consultancy: http://castleamber.com/
    Perl for books: http://johnbokma.com/perl/help-in-exchange-for-books.html
    John Bokma, Mar 22, 2013
    #1
    1. Advertising

  2. John Bokma

    John Bokma Guest

    Ben Morrow <> writes:

    > Quoth John Bokma <>:
    >>
    >> The following piece of code assigns an unexpected value to $2 in a
    >> program I am working on. When I run tests, it works as expected, but
    >> when I run the actual program (which parses currently 2 large XML files)
    >> $2 gets assigned a random value when the second XML file is parsed [1]:
    >>
    >> $iso_gmt =~ /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z$/
    >> or croak "Not a valid ISO GMT date time: '$iso_gmt'";
    >> # Note: without the assingment below $2 might be set to a different
    >> # value for an unknown reason (bug in XML::parser C code?)
    >> # perl, v5.10.1
    >> #my $month = $2;
    >> my $seconds_since_epoch;
    >> my $error;
    >> $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );

    >
    > This code doesn't call XML::parser. Where are you calling it?


    My apologies for the unclear write up. The above piece of code is called
    by code that uses XML::parser to parse two XML files. Since the issue
    seems to depend on the order in which the XML files are parsed I don't
    think it's possible (or: easy) to create a minimal test case (otherwise
    I certainly would've posted that one). Note: tests that I run via prove
    to test the above code don't show the issue.

    > Also, I would usually want to assign the return value of m// rather than
    > relying on globals:


    Yes, same here, usually ;-).

    >> Examples of errors:
    >>
    >> Month '20129103' out of range 0..11
    >> Month '11826511' out of range 0..11
    >> Month '20079951' out of range 0..11
    >> Month '15160655' out of range 0..11
    >> Month '21972303' out of range 0..11
    >> Month '10208591' out of range 0..11
    >>
    >> 0133254f - 20129103
    >> 00b4754f - 11826511
    >> 0132654f - 20079951
    >> 00e7554f - 15160655
    >> 014f454f - 21972303
    >> 009bc54f - 10208591
    >>
    >> ^^^ looks like there is a pattern.

    >
    > Where does this output come from?


    timegm throws the "Month ... out of range ...". I used a one liner to
    convert those values to hex (given above).

    [..]
    > The output of Devel::peek::Dump($2) at various points would be helpful.
    > If you can it would also be worth trying with a perl built with
    > -DDEBUGGING, to see if you get an assertion failure.


    Thanks, will look into those two options. Right now I am not able to
    reproduce it. Most likely due to code changes.

    --
    John Bokma j3b

    Blog: http://johnbokma.com/ Perl Consultancy: http://castleamber.com/
    Perl for books: http://johnbokma.com/perl/help-in-exchange-for-books.html
    John Bokma, Mar 23, 2013
    #2
    1. Advertising

  3. John Bokma

    Klaus Guest

    On 23 mar, 00:44, Ben Morrow <> wrote:
    > Quoth John Bokma <>:
    > >     $iso_gmt =~ /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z$/
    > >    or croak "Not a valid ISO GMT date time: '$iso_gmt'";


    > Also, I would usually want to assign the return value of m// rather than
    > relying on globals:
    >     my ($year, $month, $day, $hour, $min, $sec) =
    >         $iso_gmt =~ /.../;


    I agree, but I would also add that it is perfectly safe to use $1, $2,
    $3, ... in an assignment to local variables immediatly after $iso_gmt
    =~ /.../ or croak "..."

    For example, the following lines are perfectly safe:

    $iso_gmt =~ /.../ or croak "...";
    my ($year, $month, $day, $hour, $min, $sec)
    = ($1, $2, $3, $4, $5, $6);

    However, the following line is not safe:

    > > $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );


    I am very suspicious about the fact that $1, $2, $3, ... are passed by
    reference to a subroutine timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 ).

    Variables such as $1, $2, $3, ... have very complicated scoping rules.

    Passing such variables by reference to a subroutine makes it very hard
    to guarantee the integrity of $1, $2, $3, ...
    Klaus, Mar 23, 2013
    #3
  4. John Bokma

    John Bokma Guest

    Klaus <> writes:

    > However, the following line is not safe:
    >
    >> > $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );


    Can you give me an example of calling a sub that shows this unsafe
    behavior? And/or the scoping issues you mention?

    I've always considered it perfectly safe to do something like:

    $foo =~ /(.)/;
    bar( $1 );

    if not, I like to know cases this can go wrong. Thanks.

    --
    John Bokma j3b

    Blog: http://johnbokma.com/ Perl Consultancy: http://castleamber.com/
    Perl for books: http://johnbokma.com/perl/help-in-exchange-for-books.html
    John Bokma, Mar 23, 2013
    #4
  5. John Bokma

    Klaus Guest

    On 23 mar, 16:01, John Bokma <> wrote:
    > Klaus <> writes:
    > > However, the following line is not safe:

    >
    > >> > $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );

    >
    > Can you give me an example of calling a sub that shows this unsafe
    > behavior? And/or the scoping issues you mention?
    >
    > I've always considered it perfectly safe to do something like:
    >
    > $foo =~ /(.)/;
    > bar( $1 );
    >
    > if not, I like to know cases this can go wrong. Thanks.


    Here is an example where

    Case 1 uses the safe bar($var)
    Case 2 uses the unsafe bar($1)

    ==============================================
    use strict;
    use warnings;

    my $foo = 'abc';

    $foo =~ /^(..).$/ or die "Error 1";
    my $var = $1;
    print "Case 1: ", bar($var), "\n";

    $foo =~ /^(..).$/ or die "Error 2";
    print "Case 2: ", bar($1), "\n";

    sub bar {
    my $z = '';

    $_[0] =~ /^(.).$/ or return "Error 3";
    $z .= $1;

    $_[0] =~ /^.(.)$/ or return "Error 4";
    $z .= $1;

    return "ok ('$z')";
    }
    ==============================================

    The output is:
    ==============================================
    Case 1: ok ('ab')
    Case 2: Error 4
    ==============================================
    Klaus, Mar 23, 2013
    #5
  6. John Bokma

    Klaus Guest

    On 23 mar, 16:01, John Bokma <> wrote:
    > Klaus <> writes:
    > > However, the following line is not safe:
    > >> > $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );


    > And/or the scoping issues you mention?


    see perldoc perlre:

    ++ These special variables, like the %+ hash and the numbered match
    ++ variables ($1 , $2 , $3 , etc.) are dynamically scoped until the
    ++ end of the enclosing block or until the next successful match,
    ++ whichever comes first.

    The scoping issue is that the execution of timegm($6, ...) falls in
    the same dynamically scoped block as the previously executed $iso_gmt
    =~ /.../ or croak "...";

    timegm($6, ...) can itself have regular expressions, in which case
    that regular expression overwrites whatever is contained in $6.

    The issue is that $6 is aliased to the first parameter of
    timegm($6, ...).

    These cases are not necessarily wrong, but they are very difficult to
    get right.
    Klaus, Mar 23, 2013
    #6
  7. Klaus <> writes:

    [...]

    > ==============================================
    > use strict;
    > use warnings;
    >
    > my $foo = 'abc';
    >
    > $foo =~ /^(..).$/ or die "Error 1";
    > my $var = $1;
    > print "Case 1: ", bar($var), "\n";
    >
    > $foo =~ /^(..).$/ or die "Error 2";
    > print "Case 2: ", bar($1), "\n";
    >
    > sub bar {
    > my $z = '';
    >
    > $_[0] =~ /^(.).$/ or return "Error 3";
    > $z .= $1;


    'use warnings' wouldn't complain about .= with a 'no value' variable
    on the left hand side and even if this wasn't the case, the useless
    concatenation could simply be replaced with a proper assignment in
    order to avoid the "cargo cult 'initialization'" (my $z = '').
    Rainer Weikusat, Mar 23, 2013
    #7
  8. On 2013-03-23 16:36, Klaus <> wrote:
    > On 23 mar, 16:01, John Bokma <> wrote:
    >> Klaus <> writes:
    >> > However, the following line is not safe:
    >> >> > $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );

    >
    >> And/or the scoping issues you mention?

    >
    > see perldoc perlre:
    >
    > ++ These special variables, like the %+ hash and the numbered match
    > ++ variables ($1 , $2 , $3 , etc.) are dynamically scoped until the
    > ++ end of the enclosing block or until the next successful match,
    > ++ whichever comes first.
    >
    > The scoping issue is that the execution of timegm($6, ...) falls in
    > the same dynamically scoped block as the previously executed $iso_gmt
    >=~ /.../ or croak "...";
    >
    > timegm($6, ...) can itself have regular expressions, in which case
    > that regular expression overwrites whatever is contained in $6.


    While that is theoretically possible, it doesn't seem to be the case
    with timegm(). I just skimmed the source and don't see any use of
    regexps there and frankly I can't think of any legitimate use timegm()
    could have for regexps.


    > The issue is that $6 is aliased to the first parameter of
    > timegm($6, ...).


    So I doubt that this is the issue. (Also I think John would have been
    able to extract a minimal test case if it was that easy)

    hp


    --
    _ | Peter J. Holzer | Fluch der elektronischen Textverarbeitung:
    |_|_) | Sysadmin WSR | Man feilt solange an seinen Text um, bis
    | | | | die Satzbestandteile des Satzes nicht mehr
    __/ | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
    Peter J. Holzer, Mar 23, 2013
    #8
  9. with <> John Bokma wrote:
    > Klaus <> writes:
    >
    >> However, the following line is not safe:
    >>
    >>> > $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );

    >
    > Can you give me an example of calling a sub that shows this unsafe
    > behavior? And/or the scoping issues you mention?


    #!/usr/bin/perl

    use strict;
    use warnings;
    use feature qw{ say };

    sub holy_cosmos {
    my $input = shift @_;
    'holy_hallelujah' =~ m{holy_(.*)};
    say $1 // 'missing'
    }

    'holy_rainbow' =~ m{(.*)_(.*)};
    say $1 // 'not found';
    holy_cosmos $1;
    say $1 // 'not found';

    __END__

    {2109:6} [0:0]% perl ~/foo.90HmzN.pl
    holy
    hallelujah
    holy
    {2374:7} [0:0]%

    I doubt there's any. Probably, that was the case in pre-historic Perls.
    Or this is some XS that directly manipulates variables in other scope.

    *CUT*

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
    Eric Pozharski, Mar 24, 2013
    #9
  10. >>>>> "Eric" == Eric Pozharski <> writes:

    Eric> sub holy_cosmos {
    Eric> my $input = shift @_;
    Eric> 'holy_hallelujah' =~ m{holy_(.*)};
    Eric> say $1 // 'missing'
    Eric> }

    This is bad. Very bad. If the match fails, it DOES NOT RESET the match
    variables. So you have broken code from the getgo.

    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <> <URL:http://www.stonehenge.com/merlyn/>
    Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
    See http://methodsandmessages.posterous.com/ for Smalltalk discussion
    Randal L. Schwartz, Mar 25, 2013
    #10
  11. with <> Ben Morrow wrote:
    *SKIP*

    > Try this:
    >
    > sub foo {
    > my ($x) = @_;
    > "y" =~ /(.)/;
    > say "$x:$_[0]";
    > }
    >
    > "x" =~ /(.)/;
    > foo $1;
    >
    > Note that this can only happen if you access @_ directly (or by
    > reference) after doing a pattern match, which is not common.


    Slowly: they said
    (<>)
    that this:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use feature qw{ say };

    sub foo {
    my ($x) = @_;
    "y" =~ /(.)/;
    say "$x:$_[0]";
    }

    "x" =~ /(.)/;
    foo $1;
    say $1;
    __END__

    should be this:

    x:y
    y

    or this:

    y:y
    y

    instead it is this:

    {1894:4} [0:0]% perl ~/foo.H2vhcu.pl
    x:y
    x

    p.s. I've already seen what Devel::peek::Dump said me, has nothing to
    do with subs, @_, or aliasing.

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
    Eric Pozharski, Mar 25, 2013
    #11
  12. with <> Randal L. Schwartz wrote:
    >>>>>> "Eric" == Eric Pozharski <> writes:

    >
    >Eric> sub holy_cosmos {
    >Eric> my $input = shift @_;
    >Eric> 'holy_hallelujah' =~ m{holy_(.*)};
    >Eric> say $1 // 'missing'
    >Eric> }
    >
    > This is bad. Very bad. If the match fails, it DOES NOT RESET the match
    > variables. So you have broken code from the getgo.


    How do you reset match variables?

    p.s. I already feel much better and would like to pump it even further.

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
    Eric Pozharski, Mar 25, 2013
    #12
  13. John Bokma

    Klaus Guest

    On 25 mar, 09:29, Eric Pozharski <> wrote:
    > with <> Randal L. Schwartz wrote:
    >
    > >>>>>> "Eric" == Eric Pozharski <> writes:

    >
    > >Eric> sub holy_cosmos {
    > >Eric>     my $input = shift @_;
    > >Eric>     'holy_hallelujah' =~ m{holy_(.*)};
    > >Eric>     say $1 // 'missing'
    > >Eric> }

    >
    > > This is bad.  Very bad.  If the match fails, it DOES NOT RESET the match
    > > variables.  So you have broken code from the getgo.

    >
    > How do you reset match variables?


    Usually you don't reset match variables at all.

    The point is to test the match, if it fails, then you don't look at
    match variables.

    For example:

    if ('holy_hallelujah' =~ m{holy_(.*)}) {
    say $1;
    }
    else {
    # do not look at match variables...
    say 'missing';
    }
    Klaus, Mar 25, 2013
    #13
  14. Eric Pozharski <> writes:
    > with <> Randal L. Schwartz wrote:
    >>>>>>> "Eric" == Eric Pozharski <> writes:

    >>
    >>Eric> sub holy_cosmos {
    >>Eric> my $input = shift @_;
    >>Eric> 'holy_hallelujah' =~ m{holy_(.*)};
    >>Eric> say $1 // 'missing'
    >>Eric> }
    >>
    >> This is bad. Very bad. If the match fails, it DOES NOT RESET the match
    >> variables. So you have broken code from the getgo.

    >
    > How do you reset match variables?


    The next successful match will 'reset' them. In this case, this means
    if the 'holy_hallelujah' match could fail (which it can't but a
    'non-constant' one could), the line below would test the value the
    last successful match put into $1.

    As Klaus already wrote: The easiest way to stay clear of $1
    .... pitfalls is 'never use them except in code which is conditionally
    executed when "the match" succeeded'.
    Rainer Weikusat, Mar 25, 2013
    #14
  15. with <> Ben Morrow wrote:
    >
    > Quoth Eric Pozharski <>:
    >> with <> Ben Morrow wrote:

    *SKIP*
    >> (<>)

    > Who they? (Google Groups appears to be so broken it can't look up a
    > msgid any more...)


    Giganews filters googlegroups? Amazing. Count me in.

    *SKIP*
    >> p.s. I've already seen what Devel::peek::Dump said me, has nothing
    >> to do with subs, @_, or aliasing.

    > Then you're using it wrong, because this has everything to do with @_
    > and aliasing.


    #!/usr/bin/perl

    use strict;
    use warnings;
    use feature qw{ say };

    'holy_rainbow' =~ m{(.*)_(.*)};
    say $1 // 'not found';

    {
    'holy_hallelujah' =~ m{holy_(.*)};
    say $1 // 'missing';
    }

    say $1 // 'not found';

    __END__

    {11145:5} [0:0]% perl ~/foo.4PonDs.pl
    holy
    hallelujah
    holy

    See? And no @_ involved.

    *CUT*

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
    Eric Pozharski, Mar 26, 2013
    #15
  16. Eric Pozharski <> writes:

    [...]

    >> Then you're using it wrong, because this has everything to do with @_
    >> and aliasing.

    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    > use feature qw{ say };
    >
    > 'holy_rainbow' =~ m{(.*)_(.*)};
    > say $1 // 'not found';
    >
    > {
    > 'holy_hallelujah' =~ m{holy_(.*)};
    > say $1 // 'missing';
    > }
    >
    > say $1 // 'not found';
    >
    > __END__
    >
    > {11145:5} [0:0]% perl ~/foo.4PonDs.pl
    > holy
    > hallelujah
    > holy
    >
    > See? And no @_ involved.


    What is this now supposed to demonstrate? As documented, the $1
    .... are always implicitly local to the enclosing block. This subthread
    was about passing them as arguments to a subroutine call which will
    cause them to be aliased to @_ slots and a successful pattern match
    executed by the subroutine will then change the arguments.

    Slightly less contrived example
    -----------------------------
    sub move_suffix
    {
    $_[0] =~ /_(.*)$/ and return "$_[1]_$1";
    return $_[1];
    }

    print(move_suffix('aaa_bbb', 'ccc'), "\n");

    'hhh_jjj' =~ /(.*)_/ and print(move_suffix('aaa_bbb', $1), "\n");
    ------------------------------

    This will print

    ccc_bbb
    bbb_bbb

    instead of

    ccc_bbb
    hhh_bbb

    Since assigning a scalar to another scalar implies copying the value, I
    usually don't use scalar assignments in simple (frequently executed)
    subroutines which don't need 'modifiable state variables' initialized
    to the passed argument values.
    Rainer Weikusat, Mar 26, 2013
    #16
  17. Eric Pozharski <> writes:
    > with <> Ben Morrow wrote:
    > *SKIP*
    >
    >> Try this:
    >>
    >> sub foo {
    >> my ($x) = @_;
    >> "y" =~ /(.)/;
    >> say "$x:$_[0]";
    >> }
    >>
    >> "x" =~ /(.)/;
    >> foo $1;
    >>
    >> Note that this can only happen if you access @_ directly (or by
    >> reference) after doing a pattern match, which is not common.

    >
    > Slowly: they said
    > (<>)
    > that this:
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    > use feature qw{ say };
    >
    > sub foo {
    > my ($x) = @_;
    > "y" =~ /(.)/;
    > say "$x:$_[0]";
    > }
    >
    > "x" =~ /(.)/;
    > foo $1;
    > say $1;
    > __END__
    >
    > should be this:
    >
    > x:y
    > y
    >
    > or this:
    >
    > y:y
    > y


    The statement was (paraphrased) that passing $1 ... as arguments to a
    subroutine call wouldn't be safe. This actually demonstrated by the
    example you posted because the value of $_[0] does change because of
    the pattern match: If something other than $1 had been passed, the
    output would be two times the value of that, not the value of the
    passed arguments once followed by y.
    Rainer Weikusat, Mar 26, 2013
    #17
  18. On 2013-03-23, John Bokma <> wrote:
    > Klaus <> writes:
    >
    >> However, the following line is not safe:
    >>
    >>> > $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );

    >
    > Can you give me an example of calling a sub that shows this unsafe
    > behavior? And/or the scoping issues you mention?
    >
    > I've always considered it perfectly safe to do something like:
    >
    > $foo =~ /(.)/;
    > bar( $1 );
    >
    > if not, I like to know cases this can go wrong. Thanks.


    While the answer to your question was already demonstrated in this
    thread (a year ago!), I still want to add my 2¢: I ***always*** write
    such calls as
    bar "$1";

    I think that the slowdown is negligible, while the benefits are
    enormous (you may guess that I spent many hours tracing this kind of bugs).

    Yours,
    Ilya
    Ilya Zakharevich, Jun 1, 2014
    #18
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Yin99

    unexpected return value

    Yin99, Aug 4, 2005, in forum: C++
    Replies:
    3
    Views:
    324
    Yin99
    Aug 4, 2005
  2. Luke Meyers
    Replies:
    8
    Views:
    364
    Luke Meyers
    May 27, 2006
  3. Replies:
    4
    Views:
    347
    Roland Pibinger
    Nov 10, 2006
  4. Jan Danielsson
    Replies:
    1
    Views:
    314
    Jan Danielsson
    Apr 14, 2007
  5. Samuel Lown

    Unexpected BigDecimal rounding issue

    Samuel Lown, May 6, 2009, in forum: Ruby
    Replies:
    5
    Views:
    178
    Robert Klemme
    May 6, 2009
Loading...

Share This Page