Making the simple impossible and the impossible unthinkable...


xfx.publishing

Heya. I'm up against what is turning out to be a tough one...

I'm trying to get a Perl script with LWP to get information from AOL
profiles.

What I'm doing is I have a LWP useragent logging in with my AIM
screenname and password at
https://my.screenname.aol.com/_cqr/login/login.psp, grabbing the
cookies, and then requesting
http://memberdirectory.aol.com/aolus/profile?sn=$screenname

The login part is working, and it looks like I'm caching the cookies
fine, but the results I get back from
http://memberdirectory.aol.com/aolus/profile?sn=$screenname are not the
same thing I'm getting in my real browser (Firefox), and they don't
contain the info I want to parse out.
From what I can tell there's some weird and utterly pointless
Javascripting going on. I set up a logger on my XP box (first free HTTP
logger I could find, so XP it was -- but I'm doing the code under
Debian Linux just so you know) and I'm getting cookies being sent that
I don't have -- which is giving the strong impression that some
Javascript somewhere is making these cookies up on the fly. This gives
me the strong impression that AOL takes the opposite philosophy of
Perl, and possibly has occasional meetings where people get together
and decide 'We don't have enough code -- we need more code to do the
same thing. Let's try and complicate things more.' I bet they have
buzzwords like 'complexify'. It's asinine to do so much stupid crap
just to set up login validation.

But, rant aside, if someone can figure out what's going on with this...

Here's the intended subroutine -- you will notice commented bits of
hair-pulling in places and debuggy bits, no doubt

But in theory this should be able to set up a hashref in the
$obj->{profile} key that contains all their AOL user info. In theory.
And I just can't get back the right info (and yes I double checked my
AIM password and stuff, and got distinctly different results putting in
a nonsense password just to make sure there wasn't manglement going on
there).

Anyway, well... this definitely counts as a challenge, if nothing else.
I'd like to have this working, but for the time being, I just can't get
the right page back.

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request;
use HTML::PullParser;

sub SetInfo {
    my $obj  = shift;
    my $user = shift;

    my $aol_login = $obj->{aol_login};
    my $aol_pass  = $obj->{aol_pass};

    unless ($aol_login and $aol_pass) {
        $obj->{errstr} = 'AOL or AIM Screen Name and Password required to '.
                         'access.';
        return;
    }

    # We is a BIG LIAH saying we's Firefox. In British. On XP. Just what
    # they expect. But some of these sites act like assholes so screw it
    # better that than this potentially not working if they block us
    # (LWP::UserAgent->new takes key/value options, hence "agent =>")
    my $web = LWP::UserAgent->new(
        agent => 'Mozilla/5.0 (Windows; U; Windows NT 5.1;'.
                 ' en-GB; rv:1.8.0.3) Gecko/20060426 '.
                 'Firefox/1.5.0.3');
    $web->cookie_jar(HTTP::Cookies->new(file           => "/usr/local/WPCookies.lwp",
                                        autosave       => 1,
                                        ignore_discard => 1));
    #$web->cookie_jar({});

    my $login = $web->post('https://my.screenname.aol.com/_cqr/login/login.psp',
        [sitedomain         => 'memberdirectory-beta.estage.aol.com',
         siteId             => '',
         lang               => 'en',
         locale             => 'us',
         authLev            => '1',
         siteState          => "OrigUrl%3Dhttp%253A%252F%252Fmemberdirectory.aol.com%252Faolus%252Fprofile%253Fsn%253D$user",
         isSiteStateEncoded => 'true',
         mcState            => 'initialized',
         usrd               => '1889976',
         loginId            => $aol_login,
         password           => $aol_pass,
         rememberMe         => 'off']);
    unless ($login->is_success) {
        $obj->{errstr} = 'AOL login failed: '. $login->status_line;
        return;
    }
    $obj->{login_page} = $login->content;
    $obj->{ua} = $web;
    #=stop
    # Debugging aid: keep the raw response headers and the Set-Cookie lines
    $obj->{login_headers} = $login->as_string;
    $obj->{login_headers} =~ s/\n\n.*//gsm;
    $obj->{login_cookies} = [];
    for my $h (split /[\r\n]+/, $obj->{login_headers}) {
        # Limit of 2 so that colons inside the header value survive
        my ($k, $v) = split /:\s+/, $h, 2;
        push @{$obj->{login_cookies}}, {$k => $v}
            if lc($k) eq 'set-cookie';
    }
    #=cut
    if ($obj->{login_page}
        =~ /You have entered an invalid Screen Name or password/) {
        $obj->{errstr} = 'AOL login failed: invalid login info.';
        return;
    }

    $obj->{pull_success} = "Didn't try yet.";
    my $url_base = 'http://memberdirectory.aol.com/aolus/profile?sn=';
    # The cookie jar attached to $web adds the Cookie header by itself.
    # Set-Cookie lines are response headers, so they are not passed here.
    my $req  = HTTP::Request->new(GET => "$url_base$user");
    my $resp = $web->request($req);

    if ($resp->is_success) {
        my $prof;
        $obj->{pull_success} = "Successful.";
        $obj->{page} = $resp->content;
        # Only <script> tags are reported. 'script' must stay out of
        # ignore_elements, or its start/end events never show up.
        my $p = HTML::PullParser->new(doc   => $resp->content,
                                      start => 'tagname, event, attr',
                                      end   => 'tagname, event, skipped_text',
                                      ignore_elements => [qw(style applet embed
                                                             object)],
                                      report_tags     => ['script']);
        while (my $token = $p->get_token) {
            my $type = $token->[1];
            next unless ($type eq 'end');
            my $script = $token->[2];
            next unless ($script =~ /var\s+nameString\s*=/);
            # this is the right script with the data in it
            # that is easy to read
            # (list-context matches, so a failed match gives undef
            # instead of a stale $1)
            ($prof->{online})  = $script =~ /var\s+memMessage\s*=\s*"I am (\w+)\."/;
            ($prof->{name})    = $script =~ /var\s+nameDetails\s*=\s*"([^"]*)"/;
            ($prof->{loc})     = $script =~ /var\s+locDetails\s*=\s*"([^"]*)"/;
            ($prof->{gender})  = $script =~ /var\s+genderDetails\s*=\s*"([^"]*)"/;
            ($prof->{marital}) = $script =~ /var\s+maritalDetails\s*=\s*"([^"]*)"/;
            ($prof->{hobbies}) = $script =~ /var\s+hobbiesDetails\s*=\s*"([^"]*)"/;
            ($prof->{gadgets}) = $script =~ /var\s+gadgetsDetails\s*=\s*"([^"]*)"/;
            ($prof->{occ})     = $script =~ /var\s+occDetails\s*=\s*"([^"]*)"/;
            ($prof->{quote})   = $script =~ /var\s+quoteDetails\s*=\s*"([^"]*)"/;
            ($prof->{links})   = $script =~ /var\s+linksDetails\s*=\s*"([^"]*)"/;
            for my $k (keys %{$prof}) {
                # Strip out annoying HTML tags in profiles
                next unless defined $prof->{$k};
                $prof->{$k} =~ s/<[^>]*>//gsm;
            }
            $obj->{profile} = $prof;
        }
    }
    else {
        $obj->{pull_success} = 'Failed';
        $obj->{errstr} = "Can't retrieve AOL member page for $user.\n";
        return;
    }
    return $obj->{profile};
}
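
A pattern worth spelling out for this kind of extraction: a failed match leaves `$1` holding whatever the previous successful match captured, so assigning from `$1` without checking the match can copy one field's value into the next. Below is a minimal sketch of a table-driven alternative that captures in list context, using only core Perl. The helper name `extract_js_vars` and the sample script text are made up for illustration; the JavaScript variable names are the ones used in the subroutine.

```perl
use strict;
use warnings;

# Profile field => JavaScript variable name (names taken from the post).
my %JS_VAR_FOR = (
    name    => 'nameDetails',
    loc     => 'locDetails',
    gender  => 'genderDetails',
    marital => 'maritalDetails',
    hobbies => 'hobbiesDetails',
    gadgets => 'gadgetsDetails',
    occ     => 'occDetails',
    quote   => 'quoteDetails',
    links   => 'linksDetails',
);

# Hypothetical helper: pull the quoted value of each known variable out of
# a chunk of JavaScript source. Fields whose variable is absent are skipped
# rather than being filled from a stale $1.
sub extract_js_vars {
    my ($script) = @_;
    my %prof;
    for my $field (sort keys %JS_VAR_FOR) {
        my $var = $JS_VAR_FOR{$field};
        # A list-context match returns the capture, or an empty list on failure.
        my ($value) = $script =~ /var\s+\Q$var\E\s*=\s*"([^"]*)"/;
        next unless defined $value;
        $value =~ s/<[^>]*>//g;    # strip HTML tags embedded in the value
        $prof{$field} = $value;
    }
    return \%prof;
}

# Invented sample input, only to show the call.
my $sample = <<'JS';
var nameString = "x";
var nameDetails = "<b>Jane</b> Doe";
var locDetails = "Dulles, VA";
JS

my $prof = extract_js_vars($sample);
print "$_ = $prof->{$_}\n" for sort keys %$prof;
```

Keeping the variable names in one hash also means a new field is a one-line change instead of another copied regex.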
 
Tad McClellan

Subject: Re: Making the simple impossible and the impossible unthinkable...


Please put the subject of your article in the Subject of your article.

Folks that are interested in web scraping will read articles whose
subject mentions web scraping.

They are likely to skip reading articles whose subject cannot be
determined by the Subject.

I'm trying to get a Perl script with LWP to get information from AOL
profiles.

Javascripting going on. I set up a logger on my XP box (first free HTTP
logger I could find, so XP it was --


Web Scraping Proxy

http://www.research.att.com/~hpk/wsp/

but I'm doing the code under
Debian Linux just so you know) and I'm getting cookies being sent that
I don't have -- which is giving the strong impression that some
Javascript somewhere is making these cookies up on the fly. This gives
me the strong impression that AOL takes the opposite philosophy of
Perl, and possibly has occasional meetings where people get together
and decide 'We don't have enough code -- we need more code to do the
same thing. Let's try and complicate things more.' I bet they have
buzzwords like 'complexify'. It's asinine to do so much stupid crap
just to set up login validation.


It is an AOL site.

The sun rises in the East.

So what's new? :)
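
If the HTTP log really does show cookies that only JavaScript could have created, one possible workaround is to put them into the LWP cookie jar by hand before requesting the profile page. This is only a sketch under assumptions: the cookie name `js_session` and its value are hypothetical placeholders (the real ones would have to be read from the proxy log), and the request is built but never sent.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;

my $jar = HTTP::Cookies->new;    # in-memory jar, no file involved

# set_cookie($version, $key, $val, $path, $domain, $port,
#            $path_spec, $secure, $maxage, $discard, \%rest)
# 'js_session' / 'abc123' are placeholders, not real AOL cookie names.
$jar->set_cookie(0, 'js_session', 'abc123', '/', '.aol.com',
                 undef, 1, 0, undef, 1);

# Build the request without sending it, and let the jar attach its cookies.
my $req = HTTP::Request->new(
    GET => 'http://memberdirectory.aol.com/aolus/profile?sn=example');
$jar->add_cookie_header($req);

# Inspect what would go out on the wire and compare it with the browser log.
print $req->as_string;
```

With a user agent, the same jar would be passed via `$ua->cookie_jar($jar)`, and the printed request can then be diffed against what the browser sends.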
 
krakle

It's asinine to do so much stupid crap
just to set up login validation.

Isn't that the point? And isn't that the reason you cannot access their
site with a script?
 
xfx.publishing

Isn't that the point? And isn't that the reason you cannot access their
site with a script?

I rather got the impression it was a side-effect, not the reason.
Thanks.
 
xfx.publishing

Tad said:
Please put the subject of your article in the Subject of your article.

Folks that are interested in web scraping will read articles whose
subject mentions web scraping.

They are likely to skip reading articles whose subject cannot be
determined by the Subject.

Well, potentially in my defence, the article is also about stupidity,
and the subject is, in fact, stupid!
Web Scraping Proxy

Thank you. And ironic that I work for them and yet would never have
known about that internally.

Okay no, that's not that ironic.
http://www.research.att.com/~hpk/wsp/

It is an AOL site.

The sun rises in the East.

So what's new? :)

Point.
 
Michele Dondi

Please put the subject of your article in the Subject of your article.
[snip]

Well, potentially in my defence, the article is also about stupidity,
and the subject is, in fact, stupid!

You're potentially confusing stupidity with meta-stupidity. A post
*about* stupidity need not have a stupid Subject, which is better suited
to a stupid post, whereas yours is not.

And IMHO your Subject is not stupid either: it's a witty remark, but
it does not make a good Subject. It makes a suboptimal one. It would
have been perfect for a "subtitle", though.


Michele
 