"getting" a website


Jürgen Exner

[nested quoted incorrectly marked by Uno]
Uno said:
What I want is a reasonable discussion of perl.

Excuse me, but that reply doesn't answer the question.

You wrote "the value of $_ is more than I wanted". So you were expecting
it to contain some value but it contained more than this value. We
cannot read your mind. Unless you tell us what you were expecting there
is no way for us to find out what this difference is and even less to
tell you why there is this difference.

I don't believe you honestly expected the variable to contain a
reasonable discussion of perl? Because your answer does make it sound
like that.
My notion of
"reasonable" might be another's notion of "highly idiomatic and usually OT."

Maybe, maybe not. The problem here is a very basic one: unless you tell
us what the expected outcome was and what the actual outcome was and how
those two are different we cannot tell you why they are different
because we don't know what that difference is.

jue
 

Uno

Well, then why don't you tell the administrator of your system that he
didn't finish his job?

He finally did. He just also happens to be my maid, mechanic, chef, and
stunt double. :)

I thought I was ahead of the curve to be able to get around using the f
and b keys:

"cats and dogs" =~ /cat|dog|bird/; # matches "cat"
"cats and dogs" =~ /dog|cat|bird/; # matches "cat"

Interesting. Might a person call the alternating operator "idempotent?"
 

Uno

Uno said:
On 03/10/2011 06:35 AM, Tad McClellan wrote: ^^^^^^^^^^^^^^^^^^^

[re-ordered, for thematic reasons]


I did NOT write that...

I did, and wish I had edited differently. Apparently I was surprised
that $_ was giving me the whole line instead of only what I was matching
against.

Well of course it does. It was my first rodeo writing a control with a
match in it.
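
For anyone who hits the same surprise: a successful match in a condition leaves $_ untouched, and the matched piece is available through a capture group such as $1. A minimal sketch of that, where the input line and the pattern are made up for illustration and are not taken from the original script:

```perl
use strict;
use warnings;

# Hypothetical input line, standing in for one line of a saved HTML page.
my @lines = ('<a href="http://example.com/a.pdf">paper</a>');

my @hrefs;
for (@lines) {
    # After a successful match $_ still holds the whole line;
    # only the capture group $1 holds the part inside the parentheses.
    if (/href="([^"]+)"/) {
        push @hrefs, $1;
        print "whole line: $_\n";
        print "captured:   $1\n";
    }
}
```

So if only the matched part is wanted, it is $1 (or a list assignment from the match) that should be printed, not $_.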

<snip>
 

Uri Guttman

U> "cats and dogs" =~ /cat|dog|bird/; # matches "cat"
U> "cats and dogs" =~ /dog|cat|bird/; # matches "cat"

U> Interesting. Might a person call the alternating operator "idempotent?"

not interesting. you don't get the ordering. a regex scans the data and
then tries to match it in the regex. cat will always match first since
it is first in the string.

uri
 

Randal L. Schwartz

Uno> "cats and dogs" =~ /cat|dog|bird/; # matches "cat"
Uno> "cats and dogs" =~ /dog|cat|bird/; # matches "cat"

The alternatives are tried from left to right at each starting position,
and the starting position itself moves through the string from left to
right.

Try this instead to see the difference:

"cats dogs" =~ /dog|cat|cats/;
"cats dogs" =~ /dog|cats|cat/;
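
To make the difference visible, the matched text can be printed. A small sketch along those lines, where matched_text is a hypothetical helper name introduced only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return the text that $re matched in $str,
# or undef if there was no match.
sub matched_text {
    my ($str, $re) = @_;
    my ($m) = $str =~ /($re)/;
    return $m;
}

# Both alternatives can match at position 0, so the one listed first wins.
print matched_text("cats dogs", qr/dog|cat|cats/), "\n";      # cat
print matched_text("cats dogs", qr/dog|cats|cat/), "\n";      # cats

# Only "cat" can match at position 0, so the order of alternatives is irrelevant.
print matched_text("cats and dogs", qr/dog|cat|bird/), "\n";  # cat
```

The order of alternatives only matters when more than one of them can match at the same starting position.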
 

Uno

On 03/18/2011 02:45 AM, Uno wrote:

[big snip]

Had I made better editing choices I would simply have posted what I had
here:
success is 200
http://www.germanresistance.com/documents/GermanResistanceV2.zip
success is 200
$ cat gurl9.pl
#!/usr/bin/perl -w
# gurl - get content from a URL

use LWP::Simple;
require HTML::TokeParser;

my $file = "bonhoeffer2";    # local HTML file to parse
my $URL  = 'http://germanresistance.com/index-of-papers/';

my $p = HTML::TokeParser->new($file);

while (my $token = $p->get_tag("a")) {
    my $url = $token->[1]{href} || "-";
    if ($url =~ m{http://www.germanresistance.com/documents.*(pdf|zip)}) {
        print "$url\n";

        my $success = getstore( $url, "/pdfs/$url" );
        print "success is $success\n";
    }
}

So why am I told that 200 is my success when I have none?

200 OK
Standard response for successful HTTP requests. The actual response
will depend on the request method used. In a GET request, the response
will contain an entity corresponding to the requested resource. In a
POST request the response will contain an entity describing or
containing the result of the action.

getstore($url, $file)

Gets a document identified by a URL and stores it in the file. The
return value is the HTTP response code.
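
Since the return value only reports the HTTP response code, it may be worth checking separately that a file actually exists at the target path. A minimal sketch of such a check, assuming a hypothetical helper named download_ok (not part of LWP::Simple) and a made-up path:

```perl
use strict;
use warnings;

# Hypothetical helper: count a download as successful only if the status
# code is in the 2xx range AND a non-empty file exists at the target path.
sub download_ok {
    my ($status, $path) = @_;
    return 0 unless $status >= 200 && $status < 300;
    my $size = -s $path;    # undef if the file is missing, 0 if empty
    return $size ? 1 : 0;
}

# With LWP::Simple this might be called as:
#   my $rc = getstore($url, $path);
#   warn "nothing saved for $url\n" unless download_ok($rc, $path);

# Made-up path that is assumed not to exist on this machine.
print download_ok(200, '/no/such/dir/file.pdf') ? "saved\n" : "not saved\n";
```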


So I was able to track down the loose ends here and get this squared away:

$ perl gurl9.pl
http://www.germanresistance.com/documents/Intro_to_Bonhoeffer.pdf
Intro_to_Bonhoeffer.pdf
success is 200
....
http://www.germanresistance.com/documents/GermanResistanceV2.zip
GermanResistanceV2.zip
success is 200
$ cd pdfs
$ ls
Agent_of_Grace.pdf
Bonhoeffer_on_Abortion.pdf
Chronology_of_the_life_of_Dietrich_Bonhoeffer.pdf
Dietrich_Bonhoeffer_and_Canossa.pdf
Dietrich_Bonhoeffer_and_Karl_Barth.pdf
Dietrich_Bonhoeffer_and_Liberalism.pdf
Dietrich_Bonhoeffer_and_the_Formula_of_Concord.pdf
Dietrich_Bonhoeffer_and_the_Russian_Religious_Renaissance.pdf
Dietrich_Bonhoeffer_and_the_Theology_of_the_Cross.pdf
Dietrich_Bonhoeffer_on_Authority.pdf
Dietrich_Bonhoeffer_the_resistance_and_the_two_kingdoms.pdf
Dietrich_Bonhoeffer_and_the_German_Resistance_'95.pdf
From_Dietrich_Bonhoeffer’s_Wedding_Sermon.pdf
GermanResistanceV2.zip
Intro_to_Bonhoeffer.pdf
Invidious_Comparisons.pdf
Luther_and_Bonhoeffer_misunderstood.pdf
Luther_Bonhoeffer_and_Revolution.pdf
Pius_XII_and_the_Jews.pdf
The_German_Resistance_60_years.pdf
The_German_Resistance_and_Dietrich_Bonhoeffer_'07.pdf
The_stereotyping_of_Dietrich_Bonhoeffer.pdf
$ cd ..
$ cat gurl9.pl
#!/usr/bin/perl -w
# gurl - get content from a URL

use LWP::Simple;
require HTML::TokeParser;

my $file = "bonhoeffer2";    # local HTML file to parse
my $URL  = 'http://germanresistance.com/index-of-papers/';

my $p = HTML::TokeParser->new($file);

my $dir = 'pdfs/';

while (my $token = $p->get_tag("a")) {
    my $url = $token->[1]{href} || "-";
    if ($url =~ m{http://www.germanresistance.com/documents.*(pdf|zip)}) {
        print "$url\n";

        # Strip the site prefix so only the bare file name remains.
        my $file2 = $url;
        $file2 =~ s#http://www.germanresistance.com/documents/##;
        print "$file2\n";

        my $success = getstore( $url, $dir . $file2 );
        print "success is $success\n";
    }
}

So I'm really pleased with this result and thank the forum for its help.
Also looking for tips on how to make this look like it wasn't written
with crayon.
--
Uno
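
One possible tidy-up, offered as a sketch rather than the definitive fix: instead of substituting away a hard-coded URL prefix, take whatever follows the last slash as the local file name. The helper name local_name is hypothetical; the URL and directory are the ones from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: use everything after the last slash of the URL
# as the local file name. Returns undef if the URL ends in a slash.
sub local_name {
    my ($url) = @_;
    my ($name) = $url =~ m{([^/]+)\z};
    return $name;
}

my $dir = 'pdfs/';
my $url = 'http://www.germanresistance.com/documents/GermanResistanceV2.zip';
print $dir . local_name($url), "\n";    # pdfs/GermanResistanceV2.zip
```

This keeps working if the documents move to a different directory on the server, since nothing about the prefix is assumed.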
 

Uno

U> "cats and dogs" =~ /cat|dog|bird/; # matches "cat"
U> "cats and dogs" =~ /dog|cat|bird/; # matches "cat"

U> Interesting. Might a person call the alternating operator "idempotent?"

not interesting. you don't get the ordering. a regex scans the data and
then tries to match it in the regex. cat will always match first since
it is first in the string.

So, are you saying that man perlretut is uninteresting? There's so much
about regexes that I don't know that I have to skip a lot of it.

I think I'm cool as cats because I just went up to it and typed
/alternat
and got

Matching this or that
Sometimes we would like our regexp to be able to match different
possible words or character strings. This is accomplished by using the
alternation metacharacter "|". To match "dog" or "cat", we form the
regexp "dog|cat". As before, Perl will try to match the regexp at the
earliest possible point in the string. At each character position,
Perl will first try to match the first alternative, "dog". If "dog"
doesn't match, Perl will then try the next alternative, "cat". If
"cat" doesn't match either, then the match fails and Perl moves to the
next position in the string. Some examples:

"cats and dogs" =~ /cat|dog|bird/; # matches "cat"
"cats and dogs" =~ /dog|cat|bird/; # matches "cat"

Even though "dog" is the first alternative in the second regexp, "cat"
is able to match earlier in the string.

, without hitting f once.

What other things do you ascribe idempotence to?
 

Uri Guttman

U> "cats and dogs" =~ /cat|dog|bird/; # matches "cat"
U> "cats and dogs" =~ /dog|cat|bird/; # matches "cat"
U> So, are you saying that man perlretut is uninteresting? There's so
U> much about regexes that I don't know that I have to skip a lot of it.

no, your comments are uninteresting. you don't seem to get how regexes
run and you read the docs and still don't get them.

U> Matching this or that Sometimes we would like our regexp to be
U> able to match different possible words or character strings. This
U> is accomplished by using the alternation metacharacter "|". To
U> match "dog" or "cat", we form the regexp "dog|cat". As before,
U> Perl will try to match the regexp at the earliest possible point in
U> the string. At each character position, Perl will first try to
U> match the first alternative, "dog". If "dog" doesn't match, Perl
U> will then try the next alternative, "cat". If "cat" doesn't match
U> either, then the match fails and Perl moves to the next position in
U> the string. Some examples:

U> "cats and dogs" =~ /cat|dog|bird/; # matches "cat"
U> "cats and dogs" =~ /dog|cat|bird/; # matches "cat"

U> Even though "dog" is the first alternative in the second regexp,
U> "cat" is able to match earlier in the string.

U> , without hitting f once.

which is just what happens and you seem to think it should be different.

uri
 

C.DeRykus

So, are you saying that man perlretut is uninteresting?  There's so much
about regexes that I don't know that I have to skip a lot of it.

I think I'm cool as cats because I just went up to it and typed
/alternat
and got

    Matching this or that
        Sometimes we would like our regexp to be able to match different
        possible words or character strings.  This is accomplished by
        using the alternation metacharacter "|".  To match "dog" or
        "cat", we form the regexp "dog|cat".  As before, Perl will try
        to match the regexp at the earliest possible point in the
        string.  At each character position, Perl will first try to
        match the first alternative, "dog".  If "dog" doesn't match,
        Perl will then try the next alternative, "cat".  If "cat"
        doesn't match either, then the match fails and Perl moves to
        the next position in the string.  Some examples:

            "cats and dogs" =~ /cat|dog|bird/;  # matches "cat"
            "cats and dogs" =~ /dog|cat|bird/;  # matches "cat"

        Even though "dog" is the first alternative in the second regexp,
        "cat" is able to match earlier in the string.

, without hitting f once.

What other things do you ascribe idempotence to?


That's because the match attempt stays at position 0 in the string
'cats and dogs' after alternative #1, 'dog', fails there. So the regex
engine then tries the other alternatives in succession at that same
position.

Perhaps you're conflating alternatives with
separate pattern matches:


$_ = 'cats and dogs';
print "got a dog and a cat\n"
    if /dog/ and /cat/;
 
