OPEN( , Get , or slurping problem

Chris

Hi,

I'm trying to import a htm file (from an external site) into an array
and then parse each line to check for a certain line. I have tried
the following:

#!/usr/local/bin/perl -w
use warnings;
use strict;

use LWP::Simple;
my @site = ("http://www.webbuyeruk.co.uk/links.htm");

foreach my $site (@site){
my @content = get ($site);

print "Array entries: $#content\n";
}

the above puts all of the lines into the first array entry [0], how
can I change this??


Also the following:

open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
my @filedata = <MYFILE>;
close(MYFILE);

gives me the following result:
Can't open http://www.webbuyeruk.co.uk/links.htm : Invalid argument

Is this because it is trying to change the file instead of reading it?
How can I get around this?

Chris.
 
Paul Lalli

> Hi,
>
> I'm trying to import a htm file (from an external site) into an array
> and then parse each line to check for a certain line. I have tried
> the following:
>
> #!/usr/local/bin/perl -w
> use warnings;
> use strict;
>
> use LWP::Simple;
> my @site = ("http://www.webbuyeruk.co.uk/links.htm");
>
> foreach my $site (@site){
> my @content = get ($site);
>
> print "Array entries: $#content\n";
> }
>
> the above puts all of the lines into the first array entry [0], how
> can I change this??

perldoc LWP::Simple shows that get() returns a single string. That's its
behavior. If you want each line in a different element of an array, do it
yourself:

my @content = split /\n/, get($site); #assumes \n is what you mean by 'line'
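
A minimal sketch of the split approach, using a hard-coded string in place of the get() result so that it runs offline. The sample HTML and the names @lines and @hits are made up for illustration; the CRLF handling is an assumption about what the server might send.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for: my $content = get($url);   (sample text is invented)
my $content = "<html>\r\n<a href=\"a.htm\">A</a>\n<a href=\"b.htm\">B</a>\n</html>\n";

# get() returns undef on failure, so check before splitting.
die "Fetch failed\n" unless defined $content;

# Split on LF or CRLF; split drops trailing empty fields.
my @lines = split /\r?\n/, $content;

# scalar(@lines) is the element count; $#lines is the last index (count - 1).
printf "Lines: %d, last index: %d\n", scalar @lines, $#lines;

# Pick out the lines of interest.
my @hits = grep { /href/ } @lines;
print "$_\n" for @hits;
```

Note that $#content in the original script is the last index, not the count, so it prints 0 when the array holds exactly one element.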

> Also the following:
>
> open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
> my @filedata = <MYFILE>;
> close(MYFILE);
>
> gives me the following result:
> Can't open http://www.webbuyeruk.co.uk/links.htm : Invalid argument
>
> Is this because it is trying to change the file instead of reading it?
> How can I get around this?

What are you *trying* to do here? Your code is attempting to open a local
file named "(http://www.webbuyeruk.co.uk/links.htm)" and read from
it. I find it decidedly unlikely such a file exists on your local system.

Paul Lalli
 
Ben Morrow

> I'm trying to import a htm

HTML. Never mind that some people still use brain-damaged 8.3 names.

> file (from an external site) into an array
> and then parse each line to check for a certain line. I have tried
> the following:
>
> #!/usr/local/bin/perl -w
> use warnings;

No need for belt and braces: use warnings replaces -w :).

> use strict;
>
> use LWP::Simple;
> my @site = ("http://www.webbuyeruk.co.uk/links.htm");
>
> foreach my $site (@site){
> my @content = get ($site);
>
> print "Array entries: $#content\n";
> }
>
> the above puts all of the lines into the first array entry [0], how
> can I change this??

my $content = get $site;
my @content = split /\n/, $content;

Some people would object to my using both $content and @content here...
that is a matter of style you may wish to consider.

> Also the following:
>
> open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
> my @filedata = <MYFILE>;
> close(MYFILE);
>
> gives me the following result:
> Can't open http://www.webbuyeruk.co.uk/links.htm : Invalid argument

Well, what did you expect? Perl != PHP: open is for opening *files*.
Presuming you're on a Win32 system (something tells me you are :) this
will be looking for an 'http:' drive, which is, as the error message
said, invalid.

> Is this because it is trying to change the file instead of reading it?
> How can I get around this?

Use LWP, as you were.

You may also be better off using an HTML-parsing module than trying to
parse it by hand, depending on how constant the format of the page is.

Ben
 
Gunnar Hjalmarsson

Chris said:
> I'm trying to import a htm file (from an external site) into an
> array and then parse each line to check for a certain line. I have
> tried the following:
>
> #!/usr/local/bin/perl -w
> use warnings;
> use strict;
>
> use LWP::Simple;
> my @site = ("http://www.webbuyeruk.co.uk/links.htm");
>
> foreach my $site (@site){
> my @content = get ($site);
>
> print "Array entries: $#content\n";
> }
>
> the above puts all of the lines into the first array entry [0], how
> can I change this??

You need to think over when it's suitable to use an array and when
it's not. The get() function returns the content as a string, so why
not just do:

use LWP::Simple;
my $site = get 'http://www.webbuyeruk.co.uk/links.htm';
while ( $site =~ /(.*)/g ) {
    if ($1 =~ /PATTERN/) {
        print "Found\n";
        last;
    }
}

> Also the following:
>
> open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;

You can't open a URL! Please learn the difference between a path and a
URL.
 
Michele Dondi

> use LWP::Simple;
> my $site = get 'http://www.webbuyeruk.co.uk/links.htm';
> while ( $site =~ /(.*)/g ) {
> if ($1 =~ /PATTERN/) {
> print "Found\n";
> last;
> }
> }
> [...]
> > open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
>
> You can't open a URL! Please learn the difference between a path and a
> URL.

But then, if he *really* wants to open() the downloaded HTML, he could
do that "in memory":

# untested
open my $file, '<', \$site or die $!;
do_something while <$file>;


Michele
--
#!/usr/bin/perl -lp
BEGIN{*ARGV=do{open $_,q,<,,\$/;$_}}s z^z seek DATA,11,$[;($,
=ucfirst<DATA>)=~s x .*x q^~ZEX69l^^q,^2$;][@,xe.$, zex,s e1e
q 1~BEER XX1^q~4761rA67thb ~eex ,s aba m,P..,,substr$&,$.,age
__END__
 
Tore Aursand

> #!/usr/local/bin/perl -w
> use warnings;
> use strict;

No need for the '-w' flag as long as you 'use warnings';

#!/usr/local/bin/perl
#
use strict;
use warnings;

> use LWP::Simple;
> my @site = ("http://www.webbuyeruk.co.uk/links.htm");
>
> foreach my $site (@site){
> my @content = get ($site);
>
> print "Array entries: $#content\n";
> }
>
> the above puts all of the lines into the first array entry [0], how
> can I change this??

You usually don't want to change this, unless you really have to.
LWP::Simple's get() function returns a string. You could always split the
string on line breaks, but do you really have to?

foreach ( @site ) {
    my $content = get( $_ );
    unless ( defined $content ) {
        # Error
    }
}
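
One possible way to fill in that loop without building an array is to search the string directly, using /m so that ^ and $ anchor at line boundaries. This is only a sketch: fake_get() and the %pages hash are hypothetical stand-ins for LWP::Simple's get() so the example runs offline, the example.com URLs are invented, and 'links' is a placeholder pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for get(): returns undef for unknown URLs,
# which mirrors how get() signals failure.
my %pages = (
    'http://example.com/ok.htm' => "<html>\nsome links here\n</html>\n",
);
sub fake_get { return $pages{ $_[0] } }

my @site = ( 'http://example.com/ok.htm', 'http://example.com/missing.htm' );
my ( @found, @failed );

foreach my $url (@site) {
    my $content = fake_get($url);
    unless ( defined $content ) {
        push @failed, $url;
        warn "Could not fetch $url\n";
        next;
    }
    # /m lets ^ and $ match at each line; $1 holds the whole matching line.
    if ( $content =~ /^(.*links.*)$/m ) {
        push @found, $1;
        print "Found: $1\n";
    }
}
```

Because . does not match a newline by default, the captured text never spans more than one line.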

> open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
> my @filedata = <MYFILE>;
> close(MYFILE);

$site[0] refers to the first element of @site, which is the URL of the
site (mentioned in your code).

If you told us what you want to do with the returned HTML, we could
probably point you to some modules which would help you out.
 
Anno Siegel

> [...]
> my $content = get $site;
> my @content = split /\n/, $content;
>
> Some people would object to my using both $content and @content here...
> that is a matter of style you may wish to consider.

I wouldn't object at all. When the same content is represented in different
forms, using the same name for both is intuitive and describes the situation.
I do it all the time.

Anno
 
Gunnar Hjalmarsson

Michele said:
> > use LWP::Simple;
> > my $site = get 'http://www.webbuyeruk.co.uk/links.htm';
> > while ( $site =~ /(.*)/g ) {
> > if ($1 =~ /PATTERN/) {
> > print "Found\n";
> > last;
> > }
> > }
> > [...]
> > > open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
> >
> > You can't open a URL! Please learn the difference between a path
> > and a URL.
>
> But then, if he *really* wants to open() the downloaded HTML, he
> could do that "in memory":
>
> # untested
> open my $file, '<', \$site or die $!;
> do_something while <$file>;

Well, I tested, and it just made my script hang.
 
Ben Morrow

Gunnar Hjalmarsson said:
> Well, I tested, and it just made my script hang.

Yesss... $site has no newlines in. Michele meant something more like

my $html = get $site;
open my $FILE, '<', \$html or die $!;
do_summat while <$FILE>;

Ben
 
Gunnar Hjalmarsson

Ben said:
> Yesss... $site has no newlines in.

Yes, in my suggestion it has. :)

I figured out that the reason for my problems was that I run my test
script in taint mode. Untainting $site:

$site = $1 if $site =~ /(.*)/s;

does not make a difference, and I don't get any meaningful error
message. (Only "Premature end of script headers".)

But when running the script without tainting enabled, it worked fine.

Has anybody else experienced this odd behaviour due to taint mode?
 
Michele Dondi

> Well, I tested, and it just made my script hang.

I assumed that $site contains the downloaded page as a string (as in
the snippet I *quoted* in my post):

my $site = get 'http://www.webbuyeruk.co.uk/links.htm';

Since I'm offline now:


#!/usr/bin/perl

use strict;
use warnings;

my $site=<<"END";
<html>
very minimal HTML indeed!
</html>
END

open my $fh, '<', \$site or die $!;
/!$/ and print while <$fh>;

__END__


Michele
 
Michele Dondi

> Yesss... $site has no newlines in. Michele meant something more like
                 ^^^^^^^^^^^^^^^^^^

Hmmm, it shouldn't make a difference: see the following (tested),

#!/usr/bin/perl -l

use strict;
use warnings;

open my $fh, '<', \('foo') or die $!;
print <$fh>;

__END__

> my $html = get $site;

But... hey! For once I *think* I paid attention: go back to my post, I
wasn't referring to the OP's script (and hence $site), I quoted some
code where my $site is your $html. (Please forgive me for the pun!)


Michele
 
Michele Dondi

[snip]
> I figured out that the reason for my problems was that I run my test
> script in taint mode. Untainting $site:
>
> $site = $1 if $site =~ /(.*)/s;
>
> does not make a difference, and I don't get any meaningful error
> message. (Only "Premature end of script headers".)

Well, it's evident even from your .sig that you "have to do" with CGI
et similia. But from the posts (of yours) I read it seems you're *not*
"yet another Perl==CGI-kinda-guy", so is there any good reason for
testing the above snippet in *that* environment? That said, I hope
you'll find a good answer to your question...


Michele
 
Gunnar Hjalmarsson

Michele said:
> Gunnar said:
> > open my $file, '<', \$site or die $!;
> > do_something while <$file>;
> [snip]
> >
> > I figured out that the reason for my problems was that I run my
> > test script in taint mode. Untainting $site:
> >
> > $site = $1 if $site =~ /(.*)/s;
> >
> > does not make a difference, and I don't get any meaningful error
> > message. (Only "Premature end of script headers".)
>
> Well, it's evident even from your .sig that you "have to do" with
> CGI et similia.

Guilty as charged.

> But from the posts (of yours) I read it seems you're *not* "yet
> another Perl==CGI-kinda-guy",

Yeah, I do know that Perl == CGI returns false (or doesn't
compile...). ;-)

> so is there any good reason for testing the above snippet in *that*
> environment?

Well, I'm on a W98 box, and not very fond of the MS-DOS window. Maybe
the truth is that I have never bothered to learn how to configure
and/or use it properly.

Anyway, since most Perl things I do actually are CGI apps, I have
simply made it a habit to also run five-line test programs as CGI.

Not sure if those reasons are good enough. :)
 
David H. Adler

> Yeah, I do know that Perl == CGI returns false (or doesn't
> compile...). ;-)

Ahem.

~ 18:13:43% perl -e 'print "yikes!\n" if Perl == CGI';
yikes!

eq, however, is a different matter entirely... :)

dha
 
Gunnar Hjalmarsson

David said:
> Ahem.
>
> ~ 18:13:43% perl -e 'print "yikes!\n" if Perl == CGI';
> yikes!
>
> eq, however, is a different matter entirely... :)

Ouch! What can I say.. Maybe: You should have enabled strictures! ;-)
 
David H. Adler

> Ouch! What can I say.. Maybe: You should have enabled strictures! ;-)

But that wouldn't be any fun. :)

(for those of you wondering why this happens, it's because perl treats
all strings containing no digits the same way in numeric context
(iirc)).
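
To spell that out a little: == compares numerically, and a string that does not start with a number counts as 0 in numeric context, so it is the leading part of the string that matters rather than whether digits appear anywhere in it. eq compares as strings. A small sketch with quoted strings instead of barewords so that it compiles under strict; the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'numeric';    # silence "isn't numeric" warnings for this demo

my $num_equal = ( 'Perl' == 'CGI' ) ? 1 : 0;    # both are 0 in numeric context
my $str_equal = ( 'Perl' eq 'CGI' ) ? 1 : 0;    # string comparison
my $prefix    = '12abc' + 0;                    # leading digits are used
my $no_prefix = 'abc12' + 0;                    # no leading number, so 0

print "==: $num_equal, eq: $str_equal, '12abc': $prefix, 'abc12': $no_prefix\n";
# prints: ==: 1, eq: 0, '12abc': 12, 'abc12': 0
```

With warnings left fully enabled, the == comparison would also report that the arguments are not numeric, which is usually the first hint that eq was intended.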

dha
 
Michele Dondi

> > Well, it's evident even from your .sig that you "have to do" with
> > CGI et similia.
>
> Guilty as charged. [snip]
> Well, I'm on a W98 box, and not very fond of the MS-DOS window. Maybe
> the truth is that I have never bothered to learn how to configure
> and/or use it properly.

AFAIK, sad as it may be, there's not much to configure and/or "use
properly". But as far as I'm concerned I'm keen on cmd line UI's,
whatever they are! I grew up on good ol' MS-DOS, oh! those
days when it was natural for me to think that nothing could prevent a
priori anything with a M$ in it from being any good ;-)... I've used both
the standard shell and enhanced ones like 4dos... of course
discovering real shells under Linux was so breathtaking!! Still using
DOS prompt under W98 et similia here, though...

> Not sure if those reasons are good enough. :)

Well you didn't need to justify yourself, I was just being curious!


Michele
 
Michele Dondi

> Ahem.
>
> ~ 18:13:43% perl -e 'print "yikes!\n" if Perl == CGI';
> yikes!
>
> eq, however, is a different matter entirely... :)

FWIW I realized I should have written 'eq' in the first place soon
after sending my post. I half-heartily (<OT>BTW: is this idiomatically
correct in English?</OT>) wanted to post an amendment to it, but
eventually didn't: had I actually known that 'Perl' == 'CGI' returns
true (which I didn't!), I would have certainly done that! Just too
joke-prone not to take advantage of it myself!!


Michele
 
