regular expression variables under debugger

Tad McClellan · Aug 26, 2006

wlcna said:
I must say a good number of you perl programmers seem like wusses.

You say this to people that you want something from?

I'm
not primarily a perl programmer

It will be a lot harder to get help from Perl programmers after
you call them names.

but just gotta say that.

Cutting off your nose to spite your face will not advance you
towards a solution to your problem.

Not all of
you, but a whole bunch.

We love you too.

Peter J. Holzer · Aug 26, 2006

5.8.2 is the version.

Re: getting the latest, is there a way to update perl without losing
all the libraries that are installed? Can you give me a tip on
dealing with this issue? I compile my own perl (under Linux)...

Yes. The Configure script asks for extra directories to add to @INC.
IIRC it even tries to detect an already installed perl and asks if it
should add these directories.
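
For anyone who wants to check what a given perl actually searches, here is a minimal sketch. It prints @INC and adds one extra directory at compile time with `use lib`. The path shown is hypothetical, so substitute wherever your older site_perl really lives. It is probably safest to assume that compiled (XS) modules need rebuilding for the new perl rather than relying on the old copies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical location of an older perl's site_perl directory.
# This path is an assumption for illustration; adjust it to your system.
use lib '/usr/local/lib/perl5/site_perl/5.8.2';

# List every directory this perl searches for modules, in search order.
# Directories added with "use lib" appear at the front of the list.
print "$_\n" for @INC;
```

Running `perl -V` prints the same @INC list at the end of its output, which is a quick way to compare two installations.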

hp

wlcna · Aug 26, 2006

Dr.Ruud said:
wlcna schreef:

How about the watch on $str?

I didn't see the compelling reason for that one - $str wasn't changing
while this problem was happening. But the watch on $1 produced the
interesting weird stuff happening with that utf8 thing I mentioned.
It's very clear to me that that is where the problem lies or at least
that is where the real problem is happening. There's more sample code
below, but if you want to know without running it, $str in both sets is
"Yahoo! U.S. News."

Another update: I modified the code just previously posted to try to
reproduce the problem using HTML::TreeBuilder instead of XML:: since
that's generally available without an install and easier for wusses to
try. Unfortunately, with HTML::TreeBuilder, this problem didn't seem to
occur (this surprised me)...

Here's the code I used to test HTML::, and I also cut out a few lines
that could be shortened since each single additional line seems to tax
SO MUCH the brains of some out here.

I'd note I picked the string to retrieve in the code below b/c it could
be retrieved via a parse using *either* the HTML or XML library (which
is not necessarily an easy thing to find), and in both cases via the
"as_text" method. This same string when retrieved using the XML library
did again produce the error, but when retrieved using HTML it did not.
Another update: I now am sometimes getting "Out of memory" bombs while
running these simple regexes using the XML library. Again, for those
who don't listen: same inputs, regex results different when using XML
library! You can't reproduce w/$str = "Yahoo! U.S. News".

I don't say for sure this is a Perl bug since God only knows it could be
some kind of data corruption in my computer, lightning striking, some
documented but to me unknown incompatibility, who knows. But something
is definitely wrong here and analyzing my two lines of regex code is not
really the point (for those doing that, not talking about you).

I'm going to try it on a second machine if possible....

---------------------

# HTML version
use LWP::Simple;
use HTML::TreeBuilder;

my $strUrl = 'http://rss.news.yahoo.com/rss/us';

my $strHtml = get( $strUrl );

my $t = new HTML::TreeBuilder;
$t->parse( $strHtml );
$t->eof;

my $str = $t->content->[0]->content->[0]->as_text;
$str =~ /(.*)news/i;
my $testPart = $1;
my $testWhole = $&;

my $breakpoint = 3;
print "testPart: <$testPart>, testWhole: <$testWhole>\n";

---------------------

And I may as well show the related XML version which still produces the
error. This code pulls out the exact same string using the same final
access method "as_text". And this one does show the problem (again, for
me of course).

---------------------

#!/usr/bin/perl
use strict;

# *SECOND* XML and RSS VERSION
use LWP::Simple;
use XML::TreeBuilder;

my $strUrl = 'http://rss.news.yahoo.com/rss/us';

# retrieve
my $strHtml = get( $strUrl );

# parse the data retrieved.
my $t = new XML::TreeBuilder;
$t->parse( $strHtml );
$t->eof;

my $str = $t->content->[1]->content->[1]->as_text;
$str =~ /(.*)news/i;
my $testPart = $1;
my $testWhole = $&;
my $breakpoint = 3;
print "testPart: $testPart, testWhole: $testWhole\n";

Dr.Ruud · Aug 26, 2006

wlcna schreef:

Missing:

#!/usr/bin/perl
use warnings ;
use strict ;

# HTML version
use LWP::Simple;
use HTML::TreeBuilder;

my $strUrl = 'http://rss.news.yahoo.com/rss/us';

my $strHtml = get( $strUrl );

my $t = new HTML::TreeBuilder;
$t->parse( $strHtml );
$t->eof;

my $str = $t->content->[0]->content->[0]->as_text;
$str =~ /(.*)news/i;
my $testPart = $1;
my $testWhole = $&;

my $breakpoint = 3;
print "testPart: <$testPart>, testWhole: <$testWhole>\n";

---------------------

And I may as well show the related XML version which still produces
the error. This code pulls out the exact same string using the same
final access method "as_text". And this one does show the problem
(again, for me of course).

Missing:

use warnings ;

use strict;

# *SECOND* XML and RSS VERSION
use LWP::Simple;
use XML::TreeBuilder;

my $strUrl = 'http://rss.news.yahoo.com/rss/us';

# retrieve
my $strHtml = get( $strUrl );

# parse the data retrieved.
my $t = new XML::TreeBuilder;
$t->parse( $strHtml );
$t->eof;

my $str = $t->content->[1]->content->[1]->as_text;

$str =~ /(.*)news/i;
my $testPart = $1;
my $testWhole = $&;
my $breakpoint = 3;
print "testPart: $testPart, testWhole: $testWhole\n";

if ($str =~ /(.*)news/i)
{
    my $testPart = $1;
    my $testWhole = $&;
    my $breakpoint = 3;
    print "testPart <$testPart>\n" ;
    print "testWhole <$testWhole>\n" ;
}
else
{
    die "no match" ;
}

Henry Law · Aug 26, 2006

wlcna said:
It's there, see my other post to your neighbor in this thread.

NNTP doesn't maintain rigid sequence of posts so I have no clue which
one you're talking about. I did find a post containing some code of
yours, which I copied and pasted into a file, to see if I could help
you, but running it produces this:

F:\WIP>wlcna.pl
syntax error at F:\WIP\wlcna.pl line 26, near ", for
"
"use" not allowed in expression at F:\WIP\wlcna.pl line 31, at end of line
BEGIN not safe after errors--compilation aborted at F:\WIP\wlcna.pl line 34.

Goodbye.

wlcna · Aug 27, 2006

Henry Law said:
NNTP doesn't maintain rigid sequence of posts so I have no clue which
one you're talking about. I did find a post containing some code of

You've never heard of a threaded news reader?? Also never heard of
groups.google.com??

yours, which I copied and pasted into a file, to see if I could help
you, but running it produces this:

Well everything I've posted runs fine and I've run the code on two
machines now - the only caveat being the XML:: stuff may not be
installed on your machine (and of course other things may not be
installed if you have an extremely minimal or old perl), but what you
posted looks more like you may have just made a silly copy/paste error -
for instance the word "for" never occurs in anything I posted.

But anyway, as a result of just running it on this other machine I think I
can tell you to not worry about trying it.

RESOLUTION: I've just found it doesn't happen on this other machine
with a slightly more recent perl version, 5.8.7.

That means I'm ascribing a resolution to either a more current version
or that my installation on this one machine is possibly messed up (so
possibly back to the lightning/data corruption/whatever explanation).

Now this doesn't entirely resolve the issue in my mind because as I
initially said I've seen the issue in other places when running perl
under the debugger, and though I couldn't reproduce it right now in
HTML:: I'm quite sure I've seen it there too, though of course I never
remembered it actually changing results of code execution as I've stated
(I use the debugger a lot to check things over).

So I will keep my eyes open for it the next time and analyze it further
then, AND HOPEFULLY THERE WILL BE NO NEXT TIME OF COURSE, but I won't be
analyzing anything in any important way until I do a complete update of
my perl installation. I considered it recent enough, but it's either
corrupt or not recent enough, hence I'm starting fresh. After this
though, I may post back regarding if I still see the problem on this one
machine, because this machine is Linux and the one where I do NOT see
the problem is Windows, which may also be a reason why I'm not seeing it
(in other words, it may reappear even after I do a fresh perl install).

Regarding installation tips, I've gotten one so far about modifying INC
during the config step of the installation but I'm quite sure you can
modify that even after the installation, so if I wanted to do that still
could later on, I'm quite certain.

But frankly, at this point I'm assuming modifying INC in this way to
point to the older version will be unsafe, partly because I notice that
the path to all these module files has the version number in it, so I'm
just going to reinstall perl libs as they come up, I don't think there
are TONS of them, and they're very easy to add, and I want to ensure the
new install is as clean as possible.

But if any of you know a nice and still very safe way to update the perl
version, like from just 5.8.2 to 5.8.7 or whatever it is now, something
small like that, I'd be interested in hearing if anyone knows (I won't
hold my breath).
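
One way to make a clean reinstall less painful is to take an inventory of the add-on modules before upgrading. The sketch below uses the core ExtUtils::Installed module to list what the current perl knows about. It only sees distributions that left a .packlist file behind, so treat the output as a starting point rather than a complete list.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use ExtUtils::Installed;

# Scan this perl's library directories for installed distributions.
my $installed = ExtUtils::Installed->new;

# Print one "Name version" line per distribution, sorted by name.
for my $module ( sort $installed->modules ) {
    my $version = $installed->version($module);
    $version = 'unknown' unless defined $version && length $version;
    print "$module $version\n";
}
```

Saving that output to a file before the upgrade gives you a checklist of modules to reinstall under the new perl.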

So that should wrap this noodling out here for now. I think only one
person who posted out here even has any clue what I'm talking about or
what's going on, but I will leave it at that for now.

Matt Garrish · Aug 27, 2006

wlcna said:
You've never heard of a threaded news reader?? Also never heard of
groups.google.com??

Have you never heard of providing context for your posts? Why do you
think your problem is so profoundly interesting to anyone that they
should go looking for your code, which in this case was pointless and
proved nothing?

Well everything I've posted runs fine and I've run the code on two
machines now - the only caveat being the XML:: stuff may not be
installed on your machine (and of course other things may not be
installed if you have an extremely minimal or old perl), but what you
posted looks more like you may have just made a silly copy/paste error -
for instance the word "for" never occurs in anything I posted.

But anyway, as a result of just running it on this other machine I think I
can tell you to not worry about trying it.

RESOLUTION: I've just found it doesn't happen on this other machine
with a slightly more recent perl version, 5.8.7.

That means I'm ascribing a resolution to either a more current version
or that my installation on this one machine is possibly messed up (so
possibly back to the lightning/data corruption/whatever explanation).

Now this doesn't entirely resolve the issue in my mind because as I
initially said I've seen the issue in other places when running perl
under the debugger, and though I couldn't reproduce it right now in
HTML:: I'm quite sure I've seen it there too, though of course I never
remembered it actually changing results of code execution as I've stated
(I use the debugger a lot to check things over).

I think everyone reading this thread realizes the problem is with you
and not perl. If you're going to post again, do like you've been asked
numerous times and post a real program that everyone can run that
exhibits your problem. Showing some debugging output proves nothing,
and leads people to the rightful conclusion that your installation is
broken, which for all your blustering is what you've finally come to
realize. Just because *you* think the sky is falling doesn't make it
so.

Matt

wlcna · Aug 27, 2006

Matt Garrish said:
Have you never heard of providing context for your posts? Why do you
think your problem is so profoundly interesting to anyone that they
should go looking for your code, which in this case was pointless and
proved nothing?

Well, I think it *would* be interesting if a regular expression that one
knows ahead of time *does* match and matches fine when in non-debugging
mode, but it does something completely different under the debugger,
gee, silly me, I thought that was interesting.

I still think it is, and if I see it again, I will post back because
this is not the first time I've seen something like that, as I
originally indicated.

And I did post the code that reproduced the problem for me, what like 3
or 4 times?? At first, I didn't have the exact code, and when that was
the case I said so and said I'd post it as soon as I had it because I
thought I had hit upon the actual problem area. And I figured out what
the problem was with no help from you all and I posted the code that
reproduced the problem.

And what did the folks out here say. First, I don't want to install any
XML:: library. Second, what I posted was not minimal enough, when in
fact it was, well at least give or take a line or two. Third, that I
needed use strict and use warnings, when these things of course had
nothing whatsoever to do with the issue.

I do expect people to read an entire thread before they post and be sure
when they say something berating me that they know what they're talking
about. You think that unreasonable? You prefer wasting bandwidth on
trivial discussions that are not to the point. Well, actually, often
usenet involves a little circuitousness while arriving at a proper
discussion, but this discussion never got there and I think I posted
plenty of info about the issue.

I expect people to have threaded news readers and be able to follow a
slightly lengthy thread before they post into it. I can't quote the
entire thread, I quote what came just before, and I don't waste
bandwidth repeating things 2 or 3 times.

Your "sky is falling" item is completely ridiculous as I don't think at
any point I was begging anyone out here for help, simply pointing out
odd behavior and trying to get comments. I think I can safely say that
at no point was I at risk of receiving help here and I wasn't really
asking for help, more like "serving notice" of a problem and getting
comments about what people thought about it (like is it lightning
striking or just par for the course).

I think everyone reading this thread realizes the problem is with you
and not perl.

Well, who knows what some people think and frankly who cares. I don't
have any idea what most of the people who have been posting in this
thread think about. I know that they can't read very well though and I
know they still don't understand things I've explained like 3 times or
so. I'm certainly not lowering myself to further explanations at this
point.

I'm optimistically thinking to myself that the brighter ones among you
probably just opted out of this thread as the one guy did, saying, "it's
probably a version problem" or an installation problem. That was in
fact the apt analysis.

I don't appreciate the people who come out here and make me waste time
by over-analyzing demo code in ways that have absolutely nothing to do
with my problem or my issue and making me hand-hold them through some
quite decently explained observations, and then ask for more and more
code, which I give, and then they simply whine and make pointless
observations that were already addressed elsewhere in the thread. These
people destroy constructive discussions, plain and simple and they are a
menace to usenet.

These people aren't helping me or doing anything useful and I don't care
if I alienate them.

So, what did I do wrong again??

Tad McClellan · Aug 27, 2006

I think only one
person who posted out here even has any clue what I'm talking about

People will think you're strange if you keep talking to yourself.

wlcna · Aug 27, 2006

Tad McClellan said:
People will think you're strange if you keep talking to yourself.

Well, *I* think I'm strange for talking out here, so you got me there.
But I'm actually talking to google now, just noting things for the
record (and my posts *are* the only record here!). I know one of the
other guys out here has never heard of google groups before, maybe you
haven't either...

Uri Guttman · Aug 27, 2006

w> Well, *I* think I'm strange for talking out here, so you got me there.
w> But I'm actually talking to google now, just noting things for the
w> record (and my posts *are* the only record here!). I know one of the
w> other guys out here has never heard of google groups before, maybe you
w> haven't either...

you are beyond clueless. please leave and go learn python. they need to
be told about google groups. and in case you didn't know, google didn't
invent usenet nor even searching usenet. but of course you know all
about google and perl bugs that don't exist.

uri

wlcna · Aug 27, 2006

Uri Guttman said:
you are beyond clueless. please leave and go learn python. they need to
be told about google groups. and in case you didn't know, google didn't
invent usenet nor even searching usenet. but of course you know all
about google and perl bugs that don't exist.

Yesh, I fanshy I know alot because I've heard of google. Yesh I do! I
fanshy that google created usenet, yesh I do!

Oh, man. Honestly I'm sorry this discussion is so bad...

First, let me close here by saying I do apologize for calling you a name
several posts ago. Seriously, I apologize for that. Very bad form,
sorry, just let this get the better of me.

But in parting I'd like to point out my beefs as a matter of setting the
record straight and trying to clear the air. Actually, before putting
this list together I hadn't actually realized that you were probably the
one presenting the biggest problem to me out here! I remembered the one
thing, but I had ascribed one of the other problems to someone else. I
thought there was at least one other involved with the below points, but
your name kept popping up when I reviewed the posts. Anyways...

1. THE PROGRAM WAS DOING DIFFERENT THINGS UNDER THE DEBUGGER THAN NOT
UNDER THE DEBUGGER. This was stated many times by me and it is a
self-evident piece of important information that should tell you a lot
about what I'm seeing, and almost every single one of those
participating in the thread totally ignored that or just never read it
in the first place.

2. (You actually did this one.) THE PROBLEM WAS IN THE STRINGS COMING
BACK FROM THE XML LIBRARY and someone (actually you!) tells me that I'm
dumb for not reproducing the problem by just typing in the string!! If
I could have reproduced the problem that way I would have. In fact,
that was the premise of my original post, I assumed I *could* do that,
and as stated, THAT DIDN'T DO IT! So, clearly pointed out that it
didn't work that way and if I typed it or copy/pasted it, things worked
fine. In fact, if I got the "identical" string from the HTML library
also, things were fine. Only from the XML library did the problem
appear and this was stated. And there was something I found going on in
a utf8 library with that string that wouldn't happen in the other
scenarios. Again, just blithely ignored.

3. (And of course, you did this one.) YOU TELL ME THE CODE DOES NOT
SHOW ANYTHING BECAUSE I DIDN'T PRINT A VARIABLE FOR YOU. Dude, why not
run the code yourself and print whatever variables you like, isn't that
why I posted it?? So you guys ask for code so you can try things out
for yourselves, I give that plus debugger executions showing a bit about
the problem, and you have a fit because the little debugger context I
show is not done how you'd prefer. The idea of posting the code was not
to provoke analysis but to get people to try it. So, pretty dense here.
If you really didn't want to install it (only takes 30 seconds), why not
say, "Dude, I don't want to install this crap, can you show me a
debugger run printing $str?" Instead you start some silly personal
attack.

4. I think you and like what 2 others telling me over and over again to
check the regex succeeded before accessing $1. That's a fine coding tip
or whatever it's supposed to be but it was entirely missing the point.
The issue was to show the problem happening and that was easiest done
WITH KNOWN MATCHING INPUTS. I.e. I'd prefer to just type it in and that
was the idea of the original code. Show a known matching input, show
that nothing comes back when it should. I had already stated
essentially that the regex is failing under the debugger but not in the
regular run. So what you guys are telling me is, well, this is stupid,
why don't you check if the regex is failing when you're seeing that it's
not working. I know it was failing in the debugger, and not in the
regular run with the identical inputs and this was the issue being
identified!

5. Telling me to use strict and use warnings (I don't think you ever
did this). I actually had use strict in there at the top of the file,
but I was putting only the really relevant clip of the file for people
to reproduce.

So anyway, if someone can tell me how I've been dense, which has been
said, I'd appreciate it, because I don't think I've been dense about
anything here, but anyway, I do apologize for the excessive rudeness.

Klaus · Aug 27, 2006

wlcna said:
1. THE PROGRAM WAS DOING DIFFERENT THINGS UNDER THE DEBUGGER
THAN NOT UNDER THE DEBUGGER.

This does not change the way you resolve your problem, that is by
reducing the complexity of your original program step-by-step until you
can reproduce the problem in a small but complete program.

See my response from 25/08/2006 at 08:17 pm
http://groups.google.fr/group/comp.lang.perl.misc/browse_frm/thread/4829d88d5e36830b/?hl=en&

2. THE PROBLEM WAS IN THE STRINGS COMING
BACK FROM THE XML LIBRARY

That's irrelevant information.

I have a program I was running through the debugger that had a usage
like this (this is not the code but similar):

1: my $str = "hello there yes i am here";
2: $str =~ /([a-z]*)s/;
3: $yes = $1;

[...] the debugger [...] tells me that $1 is "undef"
[...] *NOT IN THE DEBUGGER* and everything is normal,
and I check by doing old-fashioned print statements. $yes is "yes" as
expected.

The info >>$yes is "yes"<< is misleading information.
The problem is that for a given scalar "$str", the resulting "$1" after
the regular expression "/([a-z]*)s/" is not the same with and without
the debugger.

3. YOU TELL ME THE CODE DOES NOT
SHOW ANYTHING BECAUSE I DIDN'T PRINT
A VARIABLE FOR YOU.

Then add some prints to your code. Here is how I would do it:

print '$str = ', unpack('H*', $str), "\n";
$str =~ /([a-z]*)s/;
print '$1 = ', unpack('H*', $1), "\n";
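
The hex dump above can be extended to show one thing a plain print cannot: whether the string carries perl's internal UTF-8 flag. As far as is known, strings coming back from XML::Parser-based modules usually have that flag set, while a typed-in ASCII literal does not. Whether the flag is what actually triggers the behaviour in this thread is only an assumption. The sketch below builds a flagged copy of the same text with utf8::upgrade and runs the same regex on both.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $plain   = "Yahoo! U.S. News";   # ordinary literal, UTF-8 flag off
my $flagged = $plain;
utf8::upgrade($flagged);            # same characters, UTF-8 flag on

for my $pair ( [ plain => $plain ], [ flagged => $flagged ] ) {
    my ( $label, $str ) = @$pair;

    # Report the internal flag and the hex dump for each variant.
    printf "%-8s utf8 flag: %s, hex: %s\n",
        $label,
        ( utf8::is_utf8($str) ? 'on' : 'off' ),
        unpack( 'H*', $str );

    # Run the same match on both and check that it succeeded before using $1.
    if ( $str =~ /(.*)news/i ) {
        print "  \$1 = <$1>\n";
    }
    else {
        print "  no match\n";
    }
}
```

The two strings compare equal with `eq` and print identically, so a difference in behaviour between them, with or without `perl -d`, would point at the flag rather than at the characters.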

4. I think you and like what 2 others telling me over and over again to
check the regex succeeded before accessing $1.

Here is how I would check the regex before accessing $1:

$str =~ /([a-z]*)s/ or die qq{No match in '$str'};

5. Telling me to use strict and use warnings

Good advice

Matt Garrish · Aug 27, 2006

wlcna said:
Well, I think it *would* be interesting if a regular expression that one
knows ahead of time *does* match and matches fine when in non-debugging
mode, but it does something completely different under the debugger,
gee, silly me, I thought that was interesting.

Right, and we're all supposed to take your word for it, because
obviously you're so profoundly intelligent that it couldn't be your own
misunderstanding of what is going on in your program. Stop the presses
everyone, someone doesn't know how to use the debugger!

I still think it is, and if I see it again, I will post back because
this is not the first time I've seen something like that, as I
originally indicated.

And I did post the code that reproduced the problem for me, what like 3
or 4 times?? At first, I didn't have the exact code, and when that was
the case I said so and said I'd post it as soon as I had it because I
thought I had hit upon the actual problem area. And I figured out what
the problem was with no help from you all and I posted the code that
reproduced the problem.

And what did the folks out here say. First, I don't want to install any
XML:: library. Second, what I posted was not minimal enough, when in
fact it was, well at least give or take a line or two. Third, that I
needed use strict and use warnings, when these things of course had
nothing whatsoever to do with the issue.

You did nothing of the sort. You posted a lot of useless code and
debugging output that proved nothing, as others showed you. You're
hardly one to judge what will and will not help debug your problem when
your understanding of perl isn't very deep. It's your silly assertions
like your regular expression never failing that make people's eyes
roll. Why should anyone here make any effort to help you when you won't
even take the minimal steps necessary to prove that it's not your
poorly written code that's to blame?

Your "sky is falling" item is completely ridiculous as I don't think at
any point I was begging anyone out here for help, simply pointing out
odd behavior and trying to get comments. I think I can safely say that
at no point was I at risk of receiving help here and I wasn't really
asking for help, more like "serving notice" of a problem and getting
comments about what people thought about it (like is it lightning
striking or just par for the course).

And that's why you've irritated everyone here. I would suggest you
actually read the story of chicken little before you presume to
understand how it applies to you. Because you got bopped on the head by
an acorn, or in this case because you failed to understand what was
going on with your program, does not mean there is any imminent
catastrophe worth investigating. Before wasting people's time with
pointless inquiries after your own incompetence please take the time to
ensure that it's not just an acorn. You don't report a baseless problem
and then expect people to run around trying to reproduce it for you
because you're too lazy to write a program that exhibits it. If there
is a real problem in perl everyone should be able to reproduce it. If
you can't write a minimal script to do this then, gee, perhaps you
haven't really found a bug in perl.

Matt

Tad McClellan · Aug 27, 2006

First, let me close here by saying I do apologize for calling you a name
several posts ago. Seriously, I apologize for that. Very bad form,
sorry, just let this get the better of me.

If only folks had been cautioned against such things!

So anyway, if someone can tell me how I've been dense, which has been
said, I'd appreciate it, because I don't think I've been dense about
anything here,

Have you seen the Posting Guidelines that are posted here frequently?

Count to ten before composing a followup when you are upset

Count to ten after composing and before posting when you are upset

<g>

David Squire · Aug 27, 2006

You don't report a baseless problem
and then expect people to run around trying to reproduce it for you
because you're too lazy to write a program that exhibits it. If there
is a real problem in perl everyone should be able to reproduce it. If
you can't write a minimal script to do this then, gee, perhaps you
haven't really found a bug in perl.

This is not true. wlcna *did* post scripts that exhibit a problem. I
have run them (after adding checks on regex matches and more prints),
and there *is* weirdness under the debugger in one case and not the
other, which would appear to be module-related. This weirdness does not
appear when the scripts are run without the debugger.

Below I include my versions of the scripts, and a transcript showing
them run with and without the debugger. When using XML::TreeBuilder, I
get the sort of "Out of memory!" that wlcna reported.

---- Begin test_with_HTML::TreeBuilder.pl ----

#!/usr/bin/perl

use strict;
use warnings;

# HTML version

use LWP::Simple;
use HTML::TreeBuilder;

my $strUrl = 'http://rss.news.yahoo.com/rss/us';

my $strHtml = get( $strUrl );

my $t = new HTML::TreeBuilder;
$t->parse( $strHtml );
$t->eof;

my $str = $t->content->[0]->content->[0]->as_text;
print "\$str = $str\n";
($str =~ /(.*)news/i) or warn "No match!\n";
my $testPart = $1;
my $testWhole = $&;

my $breakpoint = 3;
print "testPart: <$testPart>, testWhole: <$testWhole>\n";

---- End test_with_HTML::TreeBuilder.pl ----

---- Begin test_with_XML::TreeBuilder.pl ----

#!/usr/bin/perl

use strict;
use warnings;

# *SECOND* XML and RSS VERSION
use LWP::Simple;
use XML::TreeBuilder;

my $strUrl = 'http://rss.news.yahoo.com/rss/us';

# retrieve
my $strHtml = get( $strUrl );

# parse the data retrieved.
my $t = new XML::TreeBuilder;
$t->parse( $strHtml );
$t->eof;

my $str = $t->content->[1]->content->[1]->as_text;
print "\$str = $str\n";
($str =~ /(.*)news/i) or warn "No match!\n";
my $testPart = $1;
my $testWhole = $&;

my $breakpoint = 3;
print "testPart: <$testPart>, testWhole: <$testWhole>\n";

---- End test_with_XML::TreeBuilder.pl ----

---- Begin Transcript ----

~/tmp (davids : dms54)
Aug 27 - 14:31 % ./test_with_HTML::TreeBuilder.pl
$str = Yahoo! U.S. News
testPart: <Yahoo! U.S. >, testWhole: <Yahoo! U.S. News>

~/tmp (davids : dms54)
Aug 27 - 14:31 % ./test_with_XML::TreeBuilder.pl
$str = Yahoo! U.S. News
testPart: <Yahoo! U.S. >, testWhole: <Yahoo! U.S. News>

~/tmp (davids : dms54)
Aug 27 - 14:31 % perl -d test_with_HTML::TreeBuilder.pl

Loading DB routines from perl5db.pl version 1.28
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(test_with_HTML::TreeBuilder.pl:11):
11: my $strUrl = 'http://rss.news.yahoo.com/rss/us';

DB<1> n
main::(test_with_HTML::TreeBuilder.pl:13):
13: my $strHtml = get( $strUrl );

DB<1> n
main::(test_with_HTML::TreeBuilder.pl:15):
15: my $t = new HTML::TreeBuilder;

DB<1> n
main::(test_with_HTML::TreeBuilder.pl:16):
16: $t->parse( $strHtml );

DB<1> n
main::(test_with_HTML::TreeBuilder.pl:17):
17: $t->eof;

DB<1> n
main::(test_with_HTML::TreeBuilder.pl:19):
19: my $str = $t->content->[0]->content->[0]->as_text;

DB<1> n
main::(test_with_HTML::TreeBuilder.pl:20):
20: print "\$str = $str\n";

DB<1> n
$str = Yahoo! U.S. News
main::(test_with_HTML::TreeBuilder.pl:21):
21: ($str =~ /(.*)news/i) or warn "No match!\n";

DB<1> n
main::(test_with_HTML::TreeBuilder.pl:22):
22: my $testPart = $1;

DB<1> print $1
Yahoo! U.S.

DB<2> n
main::(test_with_HTML::TreeBuilder.pl:23):
23: my $testWhole = $&;

DB<2> n
main::(test_with_HTML::TreeBuilder.pl:25):
25: my $breakpoint = 3;

DB<2> n
main::(test_with_HTML::TreeBuilder.pl:26):
26: print "testPart: <$testPart>, testWhole: <$testWhole>\n";

DB<2> n
testPart: <Yahoo! U.S. >, testWhole: <Yahoo! U.S. News>
Debugged program terminated. Use q to quit or R to restart,
use O inhibit_exit to avoid stopping after program termination,
h q, h R or h O to get additional info.

DB<2> q

~/tmp (davids : dms54)
Aug 27 - 14:32 % perl -d test_with_XML::TreeBuilder.pl

Loading DB routines from perl5db.pl version 1.28
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(test_with_XML::TreeBuilder.pl:10):
10: my $strUrl = 'http://rss.news.yahoo.com/rss/us';

DB<1> n
main::(test_with_XML::TreeBuilder.pl:13):
13: my $strHtml = get( $strUrl );

DB<1> n
main::(test_with_XML::TreeBuilder.pl:16):
16: my $t = new XML::TreeBuilder;

DB<1> n
main::(test_with_XML::TreeBuilder.pl:17):
17: $t->parse( $strHtml );

DB<1> n
main::(test_with_XML::TreeBuilder.pl:18):
18: $t->eof;

DB<1> n
main::(test_with_XML::TreeBuilder.pl:20):
20: my $str = $t->content->[1]->content->[1]->as_text;

DB<1> n
main::(test_with_XML::TreeBuilder.pl:21):
21: print "\$str = $str\n";

DB<1> n
$str = Yahoo! U.S. News
main::(test_with_XML::TreeBuilder.pl:22):
22: ($str =~ /(.*)news/i) or warn "No match!\n";

DB<1> n
perl(4560) malloc: *** vm_allocate(size=4291080192) failed (error code=3)
perl(4560) malloc: *** error: can't allocate region
perl(4560) malloc: *** set a breakpoint in szone_error to debug
Out of memory!
Debugged program terminated. Use q to quit or R to restart,
use O inhibit_exit to avoid stopping after program termination,
h q, h R or h O to get additional info.

DB<1>

---- End Transcript ----

---- Begin version details ----

~/tmp (davids : dms54)
Aug 27 - 14:40 % perl -v

This is perl, v5.8.6 built for darwin-thread-multi-2level
(with 2 registered patches, see perl -V for more detail)

# I downloaded and installed all the modules used from CPAN yesterday (I
have just reinstalled this system).

---- End version details ----

I have also tested a version where $str is fixed and no modules are
used. It has no weirdness, just as seen when using HTML::TreeBuilder.
I'll post it if you like.

Now, I know next to nothing about using the Perl debugger, but the
behaviour above is not what I would have expected. If this is because I
don't know how to use the debugger, I will be glad to be informed of how
to do it properly.

I think that the lesson of this thread is that both sides of the
discussion got angry early, with the result that very few people
actually read deeply or tried things. Once the insults and name-calling
start, the chances of analysis and help drop dramatically. There are
lessons for both sides here.

First and foremost, if you *have* done tests to eliminate cases such as
regexes not matching, leave them in the minimal example script so that
folks don't leap on that common newbie error as the likely explanation
for the problem. Likewise for including "use strict; use warnings;".
Getting snippy about this sort of thing results in people not reading
your later posts.

DS

Matt Garrish · Aug 27, 2006

David said:
I think that the lesson of this thread is that both sides of the
discussion got angry early, with the result that very few people
actually read deeply or tried things. Once the insults and name-calling
start, the chances of analysis and help drop dramatically. There are
lessons for both sides here.

You got that part right, but it's not a burden on any of us to be
responsive to an ass, which is what this fellow has proven to be time
and again in this thread. Hopefully this person will learn from your
example on how to properly diagnose and report a problem, though.

Matt

Peter J. Holzer · Aug 27, 2006

This is not true. wlcna *did* post scripts that exhibit a problem.

He did post scripts, and he claimed that they exhibit the problem. He
didn't show that they did, however (the behaviour depends on the
contents of the variable $str, and since that in turn depends on the
successful retrieval and contents of a web page, and he didn't print it,
we didn't know whether it did contain what he thinks it contained).

I also ran his scripts and could not reproduce the behaviour. I also ran
your scripts with and without the debugger with perl 5.8.4 (debian sarge
package) and 5.8.8 (compiled from source) and couldn't reproduce the
problem, either.

[Excerpts from your transcripts:]

main::(test_with_HTML::TreeBuilder.pl:20):
20: print "\$str = $str\n";

DB<1> n
$str = Yahoo! U.S. News

main::(test_with_XML::TreeBuilder.pl:21):
21: print "\$str = $str\n";

DB<1> n
$str = Yahoo! News: U.S. News

These do look the same, but what happens when you replace the print with

dumpstr($str);
sub dumpstr {
my ($s) = @_;
print utf8::is_utf8($s) ? "char" : "byte", " string: ";
for (split(//, $s)) {
printf("%#x %s ", ord($_), /[[:print:]]/ ? $_ : '.');
}
print "\n";
}

?

On my system test_with_HTML::TreeBuilder prints:

byte string: 0x59 Y 0x61 a 0x68 h 0x6f o 0x6f o 0x21 ! 0x20
0x4e N 0x65 e 0x77 w 0x73 s 0x3a : 0x20 0x55 U 0x2e . 0x53 S
0x2e . 0x20 0x4e N 0x65 e 0x77 w 0x73 s

but test_with_XML::TreeBuilder prints:

char string: 0x59 Y 0x61 a 0x68 h 0x6f o 0x6f o 0x21 ! 0x20
0x4e N 0x65 e 0x77 w 0x73 s 0x3a : 0x20 0x55 U 0x2e . 0x53 S
0x2e . 0x20 0x4e N 0x65 e 0x77 w 0x73 s

So one returns a byte string and the other a character string. Perl
should handle both identically, but maybe yours and the OP's don't.

So there is an even simpler (Look Ma! No TreeBuilder!) script to test
this:

----- test_without_TreeBuilder ---------------------------------
#!/usr/local/bin/perl

use strict;
use warnings;

my $str = 'Yahoo! U.S. News';
utf8::upgrade($str);
dumpstr($str);
($str =~ /(.*)news/i) or warn "No match!\n";
my $testPart = $1;
my $testWhole = $&;

my $breakpoint = 3;
print "testPart: <$testPart>, testWhole: <$testWhole>\n";

sub dumpstr {
my ($s) = @_;
print utf8::is_utf8($s) ? "char" : "byte", " string: ";
for (split(//, $s)) {
printf("%#x %s ", ord($_), /[[:print:]]/ ? $_ : '.');
}
print "\n";
}
----------------------------------------------------------------
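
As a cross-check on the claim that Perl should treat both kinds of string identically, here is a minimal sketch that compares a byte string with an upgraded copy of itself. It is not one of the scripts posted in the thread; it uses only the core utf8:: functions, and the names $bytes and $chars are just illustrative. On a healthy perl the two strings should compare equal and the regex should capture the same thing from each.

```perl
use strict;
use warnings;

my $bytes = 'Yahoo! U.S. News';
my $chars = $bytes;
utf8::upgrade($chars);    # same characters, UTF-8 internal representation

# Report the internal flag for each copy.
for my $pair ([bytes => $bytes], [chars => $chars]) {
    my ($name, $s) = @$pair;
    print "$name: ", (utf8::is_utf8($s) ? "char" : "byte"), " string\n";
}

# The internal representation must not affect string comparison.
print "strings compare equal\n" if $bytes eq $chars;

# The same match should succeed and capture the same text on both.
for my $s ($bytes, $chars) {
    if ($s =~ /(.*)news/i) {
        print "captured: <$1>\n";
    }
    else {
        warn "No match!\n";
    }
}
```

If the two "captured" lines differ, or one of them warns, that would point at the perl build rather than at the script.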

I think that the lesson of this thread is that both sides of the
discussion got angry early, with the result that very few people
actually read deeply or tried things. Once the insults and
name-calling start, the chances of analysis and help drop
dramatically.

ACK. I did run the OP's script but by then the thread had degenerated
into name-calling and since I couldn't reproduce the problem I didn't
see much value in posting my results.

First and foremost, if you *have* done tests to eliminate cases such as
regexes not matching, leave them in the minimal example script so that
folks don't leap on that common newbie error as the likely explanation
for the problem.

Right. This is especially important if your script reads data from an
external source. If you get data from a web page and don't test if that
was actually successful, most people (myself included) will assume that
the retrieval of the web page failed and not that there is a bug in
perl, as the former is far more likely.
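
A minimal sketch of what such checks might look like, assuming the content comes from LWP::Simple's get(), which returns undef on failure. The helper name extract_parts is hypothetical and not from the OP's scripts, and the loop feeds it a fixed string and an undef as stand-ins for a successful and a failed retrieval so that it runs without network access. It also captures the whole match in an explicit group rather than using $&, which is a design choice of this sketch, not something the thread settled on.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): validate the input and the
# match before touching any capture variables.
sub extract_parts {
    my ($content) = @_;

    # LWP::Simple's get() returns undef on failure, so test for that first.
    unless (defined $content) {
        warn "Retrieval failed: no content\n";
        return;
    }

    # Capture the whole match explicitly instead of relying on $&.
    if ($content =~ /((.*)news)/i) {
        return ($2, $1);    # (part before "news", whole match)
    }

    warn "No match!\n";
    return;
}

# Stand-ins for: my $content = get($strUrl);
for my $content ('Yahoo! U.S. News', undef) {
    my ($testPart, $testWhole) = extract_parts($content)
        or next;
    print "testPart: <$testPart>, testWhole: <$testWhole>\n";
}
```

With the checks left in, a reader can see at a glance that neither a failed download nor a failed match explains the symptom.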

hp

Matt Garrish · Aug 27, 2006

David said:
#!/usr/bin/perl

use strict;
use warnings;

# *SECOND* XML and RSS VERSION
use LWP::Simple;
use XML::TreeBuilder;

my $strUrl = 'http://rss.news.yahoo.com/rss/us';

# retrieve
my $strHtml = get( $strUrl );

# parse the data retrieved.
my $t = new XML::TreeBuilder;
$t->parse( $strHtml );
$t->eof;

my $str = $t->content->[1]->content->[1]->as_text;
print "\$str = $str\n";
($str =~ /(.*)news/i) or warn "No match!\n";
my $testPart = $1;
my $testWhole = $&;

my $breakpoint = 3;
print "testPart: <$testPart>, testWhole: <$testWhole>\n";

DB<1> n
perl(4560) malloc: *** vm_allocate(size=4291080192) failed (error code=3)
perl(4560) malloc: *** error: can't allocate region
perl(4560) malloc: *** set a breakpoint in szone_error to debug
Out of memory!
Debugged program terminated. Use q to quit or R to restart,

Now, I know next to nothing about using the Perl debugger, but the
behaviour above is not what I would have expected. If this is because I
don't know how to use the debugger, I will be glad to be informed of how
to do it properly.

I tried your code above, and can't reproduce the malloc error in the
debugger or out (on ActivePerl 5.8.8 build 817). Of course, going back
to the OP's original question, this was never an issue. He seemed to be
having some problem with undefined variables that he never proved was
anything.

Matt

David Squire · Aug 27, 2006

Peter said:
I also ran his scripts and could not reproduce the behaviour. I also ran
your scripts

Well, they're the OP's scripts, with a few little additions of mine.

with and without the debugger with perl 5.8.4 (debian sarge
package) and 5.8.8 (compiled from source) and couldn't reproduce the
problem, either.

[Excerpts from your transcripts:]

main::(test_with_HTML::TreeBuilder.pl:20):
20: print "\$str = $str\n";

DB<1> n
$str = Yahoo! U.S. News

main::(test_with_XML::TreeBuilder.pl:21):
21: print "\$str = $str\n";

DB<1> n
$str = Yahoo! News: U.S. News

These do look the same, but what happens when you replace the print with

dumpstr($str);
sub dumpstr {
my ($s) = @_;
print utf8::is_utf8($s) ? "char" : "byte", " string: ";
for (split(//, $s)) {
printf("%#x %s ", ord($_), /[[:print:]]/ ? $_ : '.');
}
print "\n";
}

?

On my system test_with_HTML::TreeBuilder prints:

byte string: 0x59 Y 0x61 a 0x68 h 0x6f o 0x6f o 0x21 ! 0x20
0x4e N 0x65 e 0x77 w 0x73 s 0x3a : 0x20 0x55 U 0x2e . 0x53 S
0x2e . 0x20 0x4e N 0x65 e 0x77 w 0x73 s

but test_with_XML::TreeBuilder prints:

char string: 0x59 Y 0x61 a 0x68 h 0x6f o 0x6f o 0x21 ! 0x20
0x4e N 0x65 e 0x77 w 0x73 s 0x3a : 0x20 0x55 U 0x2e . 0x53 S
0x2e . 0x20 0x4e N 0x65 e 0x77 w 0x73 s

So one returns a byte string and the other a character string. Perl
should handle both identically, but maybe yours and the OP's don't.

Ah. My hunch was that this was an encoding issue, especially since the
OP reported something about utf8_heavy.pl earlier in the thread (which
seemed to be related to the earlier "$1 undefined" issue). I have never
played around with encodings though, so I thought it best to wait for
someone else to jump in.

The debugger "Out of Memory" error is a real beauty though.

BTW my Perl (today) is the Mac OS X out-of-the box one.

DS
