OPEN( , Get , or slurping problem

Discussion in 'Perl Misc' started by Chris, Feb 16, 2004.

  1. Chris

    Chris Guest

    Hi,

    I'm trying to import a htm file (from an external site) into an array
    and then parse each line to check for a certain line. I have tried
    the following:

    #!/usr/local/bin/perl -w
    use warnings;
    use strict;

    use LWP::Simple;
    my @site = ("http://www.webbuyeruk.co.uk/links.htm");

    foreach my $site (@site){
    my @content = get ($site);

    print "Array entries: $#content\n";
    }

    the above puts all of the lines into the first array entry [0], how
    can I change this??


    Also the following:

    open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
    my @filedata = <MYFILE>;
    close(MYFILE);

    gives me the following result:
    Can't open http://www.webbuyeruk.co.uk/links.htm : Invalid argument

    Is this because it is trying to change the file instead of reading it?
    How can I get around this?

    Chris.
     
    Chris, Feb 16, 2004
    #1

  2. Chris

    Paul Lalli Guest

    On Mon, 16 Feb 2004, Chris wrote:

    > Hi,
    >
    > I'm trying to import a htm file (from an external site) into an array
    > and then parse each line to check for a certain line. I have tried
    > the following:
    >
    > #!/usr/local/bin/perl -w
    > use warnings;
    > use strict;
    >
    > use LWP::Simple;
    > my @site = ("http://www.webbuyeruk.co.uk/links.htm");
    >
    > foreach my $site (@site){
    > my @content = get ($site);
    >
    > print "Array entries: $#content\n";
    > }
    >
    > the above puts all of the lines into the first array entry [0], how
    > can I change this??
    >


    perldoc LWP::Simple shows that get() returns a single string. That's its
    behavior. If you want each line in a different element of an array, do it
    yourself:

    my @content = split /\n/, get($site); #assumes \n is what you mean by 'line'
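
    A slightly more defensive sketch of the same idea, for illustration only.
    The literal $page string is a made-up stand-in for what get($site) would
    return, so the snippet runs without LWP::Simple or a network connection.
    It also assumes the page may use CRLF line endings. Note that $#content
    is the last index, not the count, which is why the original script
    printed 0 for a one-element array.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the string returned by get($site).
# The real get() returns undef when the fetch fails.
my $page = "<html>\r\n<body>\r\n<a href=\"x.htm\">x</a>\r\n</body>\r\n</html>\r\n";

die "Could not fetch page\n" unless defined $page;

# Split on LF or CRLF so DOS-style line endings do not leave a
# stray \r at the end of every element.
my @content = split /\r?\n/, $page;

# scalar @content is the number of elements; $#content is the last index.
printf "Lines: %d\n", scalar @content;

# Pick out the lines that match a pattern (here: anything with href).
my @hits = grep { /href/ } @content;
print "Match: $_\n" for @hits;
```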

    >
    > Also the following:
    >
    > open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
    > my @filedata = <MYFILE>;
    > close(MYFILE);
    >
    > gives me the following result:
    > Can't open http://www.webbuyeruk.co.uk/links.htm : Invalid argument
    >
    > Is this because it is trying to change the file instead of reading it?
    > How can I get around this?



    What are you *trying* to do here? Your code is attempting to open a local
    file named "(http://www.webbuyeruk.co.uk/links.htm)" and read from
    it. I find it decidedly unlikely such a file exists on your local system.
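
    To make that concrete, here is a small sketch (not the original poster's
    code) that uses the three-argument form of open so the mode and the file
    name are spelled out separately. The $name variable is introduced here
    purely for illustration. The exact error text in $! depends on the
    operating system.

```perl
use strict;
use warnings;

my $url = 'http://www.webbuyeruk.co.uk/links.htm';

# Two-argument open treats the leading "<" as the read mode and takes
# everything after it, parentheses included, as a local file name.
my $name = "($url)";

# The three-argument form makes the mode and the name explicit.
my $opened = open(my $fh, '<', $name);
if ($opened) {
    close $fh;
    print "Opened a local file called $name\n";
}
else {
    print "Cannot open local file '$name': $!\n";
}
```

    open() only knows about the local file system (plus pipes and in-memory
    scalars), so fetching a URL still has to go through something like LWP.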

    Paul Lalli
     
    Paul Lalli, Feb 16, 2004
    #2

  3. Chris

    Ben Morrow Guest

    (Chris) wrote:
    > I'm trying to import a htm


    HTML. Never mind that some people still use brain-damaged 8.3 names.

    > file (from an external site) into an array
    > and then parse each line to check for a certain line. I have tried
    > the following:
    >
    > #!/usr/local/bin/perl -w
    > use warnings;


    No need for belt and braces: use warnings replaces -w :).

    > use strict;
    >
    > use LWP::Simple;
    > my @site = ("http://www.webbuyeruk.co.uk/links.htm");
    >
    > foreach my $site (@site){
    > my @content = get ($site);
    >
    > print "Array entries: $#content\n";
    > }
    >
    > the above puts all of the lines into the first array entry [0], how
    > can I change this??


    my $content = get $site;
    my @content = split /\n/, $content;

    Some people would object to my using both $content and @content here...
    that is a matter of style you may wish to consider.

    > Also the following:
    >
    > open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
    > my @filedata = <MYFILE>;
    > close(MYFILE);
    >
    > gives me the following result:
    > Can't open http://www.webbuyeruk.co.uk/links.htm : Invalid argument


    Well, what did you expect? Perl != PHP: open is for opening *files*.
    Presuming you're on a Win32 system (something tells me you are :) this
    will be looking for an 'http:' drive, which is, as the error message
    said, invalid.

    > Is this because it is trying to change the file instead of reading it?
    > How can I get around this?


    Use LWP, as you were.

    You may also be better off using an HTML-parsing module than trying to
    parse it by hand, depending on how constant the format of the page is.

    Ben

    --
    perl -e'print map {/.(.)/s} sort unpack "a2"x26, pack "N"x13,
    qw/1632265075 1651865445 1685354798 1696626283 1752131169 1769237618
    1801808488 1830841936 1886550130 1914728293 1936225377 1969451372
    2047502190/' #
     
    Ben Morrow, Feb 16, 2004
    #3
  4. Chris wrote:
    > I'm trying to import a htm file (from an external site) into an
    > array and then parse each line to check for a certain line. I have
    > tried the following:
    >
    > #!/usr/local/bin/perl -w
    > use warnings;
    > use strict;
    >
    > use LWP::Simple;
    > my @site = ("http://www.webbuyeruk.co.uk/links.htm");
    >
    > foreach my $site (@site){
    > my @content = get ($site);
    >
    > print "Array entries: $#content\n";
    > }
    >
    > the above puts all of the lines into the first array entry [0], how
    > can I change this??


    You need to think about when it's suitable to use an array and when
    it's not. The get() function returns the content as a string, so why
    not just do:

    use LWP::Simple;
    my $site = get 'http://www.webbuyeruk.co.uk/links.htm';
    while ( $site =~ /(.*)/g ) {
        if ($1 =~ /PATTERN/) {
            print "Found\n";
            last;
        }
    }

    > Also the following:
    >
    > open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;


    You can't open a URL! Please learn the difference between a path and a
    URL.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Feb 16, 2004
    #4
  5. On Mon, 16 Feb 2004 21:59:48 +0100, Gunnar Hjalmarsson
    <> wrote:

    > use LWP::Simple;
    > my $site = get 'http://www.webbuyeruk.co.uk/links.htm';
    > while ( $site =~ /(.*)/g ) {
    > if ($1 =~ /PATTERN/) {
    > print "Found\n";
    > last;
    > }
    > }

    [...]
    >> open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;

    >
    >You can't open a URL! Please learn the difference between a path and a
    >URL.


    But then, if he *really* wants to open() the downloaded HTML, he could
    do that "in memory":

    # untested
    open my $file, '<', \$site or die $!;
    do_something while <$file>;


    Michele
    --
    #!/usr/bin/perl -lp
    BEGIN{*ARGV=do{open $_,q,<,,\$/;$_}}s z^z seek DATA,11,$[;($,
    =ucfirst<DATA>)=~s x .*x q^~ZEX69l^^q,^2$;][@,xe.$, zex,s e1e
    q 1~BEER XX1^q~4761rA67thb ~eex ,s aba m,P..,,substr$&,$.,age
    __END__
     
    Michele Dondi, Feb 17, 2004
    #5
  6. Chris

    Tore Aursand Guest

    On Mon, 16 Feb 2004 12:04:27 -0800, Chris wrote:
    > #!/usr/local/bin/perl -w
    > use warnings;
    > use strict;


    No need for the '-w' flag as long as you 'use warnings';

    #!/usr/local/bin/perl
    #
    use strict;
    use warnings;

    > use LWP::Simple;
    > my @site = ("http://www.webbuyeruk.co.uk/links.htm");
    >
    > foreach my $site (@site){
    > my @content = get ($site);
    >
    > print "Array entries: $#content\n";
    > }
    >
    > the above puts all of the lines into the first array entry [0], how
    > can I change this??


    You usually don't want to change this, unless you really have to.
    LWP::Simple's get() function returns a string. You could always split the
    string on line breaks, but do you really have to?

    foreach ( @site ) {
        my $content = get( $_ );
        unless ( defined $content ) {
            # Error
        }
    }

    > open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
    > my @filedata = <MYFILE>;
    > close(MYFILE);


    $site[0] refers to the first element of @site, which is the URL of the
    site (mentioned in your code).

    If you told us what you want to do with the returning HTML, we could
    probably give you some tips to some modules which would help you out.


    --
    Tore Aursand <>
    "Leadership is doing what is right when no one is watching." -- George
    Van Valkenburg
     
    Tore Aursand, Feb 17, 2004
    #6
  7. Chris

    Anno Siegel Guest

    Ben Morrow <> wrote in comp.lang.perl.misc:

    [...]

    > my $content = get $site;
    > my @content = split /\n/, $content;
    >
    > Some people would object to my using both $content and @content here...
    > that is a matter of style you may wish to consider.


    I wouldn't object at all. When the same content is represented in different
    forms, using the same name for both is intuitive and describes the situation.
    I do it all the time.

    Anno
     
    Anno Siegel, Feb 17, 2004
    #7
  8. Michele Dondi wrote:
    > On Mon, 16 Feb 2004 21:59:48 +0100, Gunnar Hjalmarsson
    > <> wrote:
    >>
    >> use LWP::Simple;
    >> my $site = get 'http://www.webbuyeruk.co.uk/links.htm';
    >> while ( $site =~ /(.*)/g ) {
    >> if ($1 =~ /PATTERN/) {
    >> print "Found\n";
    >> last;
    >> }
    >> }

    >
    > [...]
    >
    >>> open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;

    >>
    >> You can't open a URL! Please learn the difference between a path
    >> and a URL.

    >
    > But then, if he *really* wants to open() the downloaded HTML, he
    > could do that "in memory":
    >
    > # untested
    > open my $file, '<', \$site or die $!;
    > do_something while <$file>;


    Well, I tested, and it just made my script hang.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Feb 17, 2004
    #8
  9. Chris

    Ben Morrow Guest

    Gunnar Hjalmarsson <> wrote:
    > Michele Dondi wrote:
    > > But then, if he *really* wants to open() the downloaded HTML, he
    > > could do that "in memory":
    > >
    > > # untested
    > > open my $file, '<', \$site or die $!;
    > > do_something while <$file>;

    >
    > Well, I tested, and it just made my script hang.


    Yesss... $site has no newlines in. Michele meant something more like

    my $html = get $site;
    open my $FILE, '<', \$html or die $!;
    do_summat while <$FILE>;

    Ben

    --
    'Deserve [death]? I daresay he did. Many live that deserve death. And some die
    that deserve life. Can you give it to them? Then do not be too eager to deal
    out death in judgement. For even the very wise cannot see all ends.'
    :-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:
     
    Ben Morrow, Feb 17, 2004
    #9
  10. Ben Morrow wrote:
    > Gunnar Hjalmarsson wrote:
    >>Michele Dondi wrote:
    >>>But then, if he *really* wants to open() the downloaded HTML, he
    >>>could do that "in memory":
    >>>
    >>> # untested
    >>> open my $file, '<', \$site or die $!;
    >>> do_something while <$file>;

    >>
    >>Well, I tested, and it just made my script hang.

    >
    > Yesss... $site has no newlines in.


    Yes, in my suggestion it has. :)

    I figured out that the reason for my problems was that I run my test
    script in taint mode. Untainting $site:

    $site = $1 if $site =~ /(.*)/s;

    does not make a difference, and I don't get any meaningful error
    message. (Only "Premature end of script headers".)

    But when running the script without tainting enabled, it worked fine.

    Anybody who has experienced this odd behaviour due to taint mode?

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Feb 17, 2004
    #10
  11. On Tue, 17 Feb 2004 13:08:46 +0100, Gunnar Hjalmarsson
    <> wrote:

    >> # untested
    >> open my $file, '<', \$site or die $!;
    >> do_something while <$file>;

    >
    >Well, I tested, and it just made my script hang.


    I assumed that $site contains the downloaded page as a string (as of
    the snippet I *quoted* in my post):

    my $site = get 'http://www.webbuyeruk.co.uk/links.htm';

    Since I'm offline now:


    #!/usr/bin/perl

    use strict;
    use warnings;

    my $site=<<"END";
    <html>
    very minimal HTML indeed!
    </html>
    END

    open my $fh, '<', \$site or die $!;
    /!$/ and print while <$fh>;

    __END__


    Michele
    --
    you'll see that it shouldn't be so. AND, the writting as usuall is
    fantastic incompetent. To illustrate, i quote:
    - Xah Lee trolling on clpmisc,
    "perl bug File::Basename and Perl's nature"
     
    Michele Dondi, Feb 17, 2004
    #11
  12. On Tue, 17 Feb 2004 12:40:41 +0000 (UTC), Ben Morrow
    <> wrote:

    >> Well, I tested, and it just made my script hang.

    >
    >Yesss... $site has no newlines in. Michele meant something more like

    ^^^^^^^^^^^^^^^^^^

    Hmmm, It shouldn't make a difference: see the following (tested),

    #!/usr/bin/perl -l

    use strict;
    use warnings;

    open my $fh, '<', \('foo') or die $!;
    print <$fh>;

    __END__


    >my $html = get $site;


    But... hey! For once I *think* I paid attention: go back to my post, I
    wasn't referring to the OP's script (and hence $site), I quoted some
    code where my $site is your $html. (Please forgive me for the pun!)


    Michele
    --
    you'll see that it shouldn't be so. AND, the writting as usuall is
    fantastic incompetent. To illustrate, i quote:
    - Xah Lee trolling on clpmisc,
    "perl bug File::Basename and Perl's nature"
     
    Michele Dondi, Feb 17, 2004
    #12
  13. Michele Dondi wrote:
    > On Tue, 17 Feb 2004 13:08:46 +0100, Gunnar Hjalmarsson
    > <> wrote:
    >>
    >>> # untested
    >>> open my $file, '<', \$site or die $!;
    >>> do_something while <$file>;

    >>
    >> Well, I tested, and it just made my script hang.

    >
    > I assumed that $site contains the downloaded page as a string (as
    > of the snippet I *quoted* in my post):
    >
    > my $site = get 'http://www.webbuyeruk.co.uk/links.htm';


    Yep. I explained in a reply to Ben the nature of the problem I
    encountered.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Feb 18, 2004
    #13
  14. On Tue, 17 Feb 2004 22:01:05 +0100, Gunnar Hjalmarsson
    <> wrote:

    >>>> open my $file, '<', \$site or die $!;
    >>>> do_something while <$file>;

    [snip]
    >I figured out that the reason for my problems was that I run my test
    >script in taint mode. Untainting $site:
    >
    > $site = $1 if $site =~ /(.*)/s;
    >
    >does not make a difference, and I don't get any meaningful error
    >message. (Only "Premature end of script headers".)


    Well, it's evident even from your .sig that you "have to do" with CGI
    et similia. But from the posts (of yours) I've read, it seems you're *not*
    "yet another Perl==CGI-kinda-guy", so is there any good reason for
    testing the above snippet in *that* environment? That said, I hope
    you'll find a good answer to your question...


    Michele
    --
    # This prints: Just another Perl hacker,
    seek DATA,15,0 and print q... <DATA>;
    __END__
     
    Michele Dondi, Feb 18, 2004
    #14
  15. Michele Dondi wrote:
    > Gunnar Hjalmarsson wrote:
    >
    >>>>> open my $file, '<', \$site or die $!;
    >>>>> do_something while <$file>;

    >
    > [snip]
    >
    >> I figured out that the reason for my problems was that I run my
    >> test script in taint mode. Untainting $site:
    >>
    >> $site = $1 if $site =~ /(.*)/s;
    >>
    >> does not make a difference, and I don't get any meaningful error
    >> message. (Only "Premature end of script headers".)

    >
    > Well, it's evident even from your .sig that you "have to do" with
    > CGI et similia.


    Guilty as charged.

    > But from the posts (of yours) I read it seems you're *not* "yet
    > another Perl==CGI-kinda-guy",


    Yeah, I do know that Perl == CGI returns false (or doesn't
    compile...). ;-)

    > so is there any good reason for testing the above snippet in *that*
    > environment?


    Well, I'm on a W98 box, and not very fond of the MS-DOS window. Maybe
    the truth is that I have never bothered to learn how to configure
    and/or use it properly.

    Anyway, since most Perl things I do actually are CGI apps, I have
    simply made it a habit to also run five-line test programs as CGI.

    Not sure if those reasons are good enough. :)

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Feb 18, 2004
    #15
  16. In article <c0vtj9$1cqoqv$-berlin.de>, Gunnar
    Hjalmarsson wrote:

    > Yeah, I do know that Perl == CGI returns false (or doesn't
    > compile...). ;-)


    Ahem.

    ~ 18:13:43% perl -e 'print "yikes!\n" if Perl == CGI';
    yikes!

    eq, however, is a different matter entirely... :)

    dha

    --
    David H. Adler - <> - http://www.panix.com/~dha/
    .... we didn't know what the hell we were doing, but we did it loud.
    - Andy Partridge on early XTC
     
    David H. Adler, Feb 18, 2004
    #16
  17. David H. Adler wrote:
    > In article <c0vtj9$1cqoqv$-berlin.de>, Gunnar
    > Hjalmarsson wrote:
    >
    >>Yeah, I do know that Perl == CGI returns false (or doesn't
    >>compile...). ;-)

    >
    > Ahem.
    >
    > ~ 18:13:43% perl -e 'print "yikes!\n" if Perl == CGI';
    > yikes!
    >
    > eq, however, is a different matter entirely... :)


    Ouch! What can I say.. Maybe: You should have enabled strictures! ;-)

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Feb 18, 2004
    #17
  18. In article <c10ud1$1dgan2$-berlin.de>, Gunnar
    Hjalmarsson wrote:
    > David H. Adler wrote:
    >> In article <c0vtj9$1cqoqv$-berlin.de>, Gunnar
    >> Hjalmarsson wrote:
    >>
    >>>Yeah, I do know that Perl == CGI returns false (or doesn't
    >>>compile...). ;-)

    >>
    >> Ahem.
    >>
    >> ~ 18:13:43% perl -e 'print "yikes!\n" if Perl == CGI';
    >> yikes!
    >>
    >> eq, however, is a different matter entirely... :)

    >
    > Ouch! What can I say.. Maybe: You should have enabled strictures! ;-)


    But that wouldn't be any fun. :)

    (for those of you wondering why this happens, it's because perl treats
    any string that doesn't start with a number as 0 in numeric context
    (iirc)).
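
    A short sketch of the difference, written so that it also runs under
    strict. The variables $x and $y are introduced here for illustration,
    with the strings quoted rather than left as barewords. Both strings
    numify to 0, so == sees 0 == 0, while eq compares the characters.

```perl
use strict;
use warnings;

# Without strict, the barewords Perl and CGI are treated as the strings
# 'Perl' and 'CGI'; they are quoted here so the script compiles under strict.
my ($x, $y) = ('Perl', 'CGI');

my $numeric_equal = do {
    no warnings 'numeric';   # silence the "isn't numeric" warnings for the demo
    $x == $y;                # both strings numify to 0, so this is true
};
my $string_equal = $x eq $y; # compares the characters, so this is false

print "==: ", ($numeric_equal ? "true" : "false"), "\n";
print "eq: ", ($string_equal  ? "true" : "false"), "\n";
```

    With warnings enabled and no "no warnings 'numeric'", perl would at
    least complain that the arguments aren't numeric.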

    dha

    --
    David H. Adler - <> - http://www.panix.com/~dha/
    "I'll keep him as an insurance policy, since, unfortunately, I can't
    kill him twice." - Scaroth
     
    David H. Adler, Feb 19, 2004
    #18
  19. On Wed, 18 Feb 2004 15:28:53 +0100, Gunnar Hjalmarsson
    <> wrote:

    >>> does not make a difference, and I don't get any meaningful error
    >>> message. (Only "Premature end of script headers".)

    >>
    >> Well, it's evident even from your .sig that you "have to do" with
    >> CGI et similia.

    >
    >Guilty as charged.

    [snip]
    >Well, I'm on a W98 box, and not very fond of the MS-DOS window. Maybe
    >the truth is that I have never bothered to learn how to configure
    >and/or use it properly.


    AFAIK, sad as it may be, there's not much to configure and/or "use
    properly". But as far as I'm concerned, I'm keen on command-line UIs,
    whatever they are! I grew up on good ol' MS-DOS, oh! those
    days when it was natural for me to think that nothing could prevent a
    priori anything with an M$ in it from being any good ;-)... I've used both
    the standard shell and enhanced ones like 4DOS... of course
    discovering real shells under Linux was so breathtaking!! Still using
    the DOS prompt under W98 et similia here, though...

    >Not sure if those reasons are good enough. :)


    Well you didn't need to justify yourself, I was just being curious!


    Michele
    --
    you'll see that it shouldn't be so. AND, the writting as usuall is
    fantastic incompetent. To illustrate, i quote:
    - Xah Lee trolling on clpmisc,
    "perl bug File::Basename and Perl's nature"
     
    Michele Dondi, Feb 19, 2004
    #19
  20. On Wed, 18 Feb 2004 23:16:49 +0000 (UTC), "David H. Adler"
    <> wrote:

    >> Yeah, I do know that Perl == CGI returns false (or doesn't
    >> compile...). ;-)

    >
    >Ahem.
    >
    >~ 18:13:43% perl -e 'print "yikes!\n" if Perl == CGI';
    >yikes!
    >
    >eq, however, is a different matter entirely... :)


    FWIW I realized I should have written 'eq' in the first place soon
    after sending my post. I half-heartily (<OT>BTW: is this idiomatically
    correct in English?</OT>) wanted to post an amendment to it, but
    eventually didn't: had I actually known that 'Perl' == 'CGI' returns
    true (which I didn't!), I would have certainly done that! Just too
    joke-prone not to take advantage of it myself!!


    Michele
    --
    you'll see that it shouldn't be so. AND, the writting as usuall is
    fantastic incompetent. To illustrate, i quote:
    - Xah Lee trolling on clpmisc,
    "perl bug File::Basename and Perl's nature"
     
    Michele Dondi, Feb 19, 2004
    #20