Making the simple impossible and the impossible unthinkable...

Discussion in 'Perl Misc' started by xfx.publishing@gmail.com, Jun 26, 2006.

  1. Guest

    Heya. I'm up against what is turning out to be a tough one...

    I'm trying to get a Perl script with LWP to get information from AOL
    profiles.

    What I'm doing is I have a LWP useragent logging in with my AIM
    screenname and password at
    https://my.screenname.aol.com/_cqr/login/login.psp, grabbing the
    cookies, and then requesting
    http://memberdirectory.aol.com/aolus/profile?sn=$screenname

    The login part is working, and I'm caching the cookies fine it looks
    like, but the results I get back from
    http://memberdirectory.aol.com/aolus/profile?sn=$screenname are not the
    same thing I'm getting in my real browser (firefox) -- and not getting
    the info I want out to parse.

    From what I can tell there's some weird and utterly pointless
    Javascripting going on. I set up a logger on my XP box (first free HTTP
    logger I could find, so XP it was -- but I'm doing the code under
    Debian Linux just so you know) and I'm getting cookies being sent that
    I don't have -- which is giving the strong impression that some
    Javascript somewhere is making these cookies up on the fly. This gives
    me the strong impression that AOL takes the opposite philosophy of
    Perl, and possibly has occasional meetings where people get together
    and decide 'We don't have enough code -- we need more code to do the
    same thing. Let's try and complicate things more.' I bet they have
    buzzwords like 'complexify'. It's asinine to do so much stupid crap
    just to set up login validation.

    But, rant aside, if someone can figure out what's going on with this...

    Here's the intended subroutine -- you will notice commented bits of
    hair-pulling in places and debuggy bits, no doubt

    But in theory this should be able to set up a hashref in the
    $obj->{profile} key that contains all their AOL user info. In theory.
    And I just can't get back the right info (and yes I double checked my
    AIM password and stuff, and got distinctly different results putting in
    a nonsense password just to make sure there wasn't manglement going on
    there).

    Anyway, well... this definitely counts as a challenge, if nothing else.
    I'd like to have this working, but for the time being, I just can't get
    the right page back.

    use LWP::UserAgent;
    use HTTP::Cookies;
    use HTML::PullParser;

    sub SetInfo {
        my $obj  = shift;
        my $user = shift;

        my $aol_login = $obj->{aol_login};
        my $aol_pass  = $obj->{aol_pass};

        unless ($aol_login and $aol_pass) {
            $obj->{errstr} = 'AOL or AIM Screen Name and Password required to '.
                             'access.';
            return undef;
        }


        # We is a BIG LIAH saying we's Firefox. In British. On XP. Just what
        # they expect. But some of these sites act like assholes so screw it
        # better that than this potentially not working if they block us
        # (the constructor takes key/value options, so the string goes in 'agent')
        my $web = LWP::UserAgent->new(
            agent => 'Mozilla/5.0 (Windows; U; Windows NT 5.1;'.
                     ' en-GB; rv:1.8.0.3) Gecko/20060426 '.
                     'Firefox/1.5.0.3');
        $web->cookie_jar(HTTP::Cookies->new(file => "/usr/local/WPCookies.lwp",
                                            autosave => 1,
                                            ignore_discard => 1));
        #$web->cookie_jar({});

        my $login = $web->post(
            'https://my.screenname.aol.com/_cqr/login/login.psp',
            [sitedomain => 'memberdirectory-beta.estage.aol.com',
             siteId => '',
             lang => 'en',
             locale => 'us',
             authLev => '1',
             siteState => "OrigUrl%3Dhttp%253A%252F%252Fmemberdirectory.aol.com%252Faolus%252Fprofile%253Fsn%253D$user",
             isSiteStateEncoded => 'true',
             mcState => 'initialized',
             usrd => '1889976',
             loginId => $aol_login,
             password => $aol_pass,
             rememberMe => 'off']);
        unless ($login->is_success) {
            $obj->{errstr} = 'AOL login failed: '. $login->status_line;
            return undef;
        }
        $obj->{login_page} = $login->content;
        $obj->{ua} = $web;
        #=stop
        # Debugging aid: keep the Set-Cookie headers from the login response.
        $obj->{login_headers} = $login->as_string;
        $obj->{login_headers} =~ s/\n\n.*//s;
        $obj->{login_cookies} = [];
        for my $h (split /[\r\n]+/, $obj->{login_headers}) {
            # Limit of 2 so a value containing ': ' is not cut short
            my ($k, $v) = split /:\s+/, $h, 2;
            push @{$obj->{login_cookies}}, {$k => $v}
                if defined $v and lc $k eq 'set-cookie';
        }
        #=cut
        if ($obj->{login_page}
            =~ /You have entered an invalid Screen Name or password/) {
            $obj->{errstr} = 'AOL login failed: invalid login info.';
            return undef;
        }

        $obj->{pull_success} = "Didn't try yet.";
        my $url_base = 'http://memberdirectory.aol.com/aolus/profile?sn=';
        # The cookie jar attached to $web already sends the stored cookies,
        # so a plain GET is enough; no headers need to be built by hand.
        my $resp = $web->get("$url_base$user");

        if ($resp->is_success) {
            my $prof = {};
            $obj->{pull_success} = "Successful.";
            $obj->{page} = $resp->content;
            # 'script' must stay out of ignore_elements, or the parser never
            # reports the <script> tags whose bodies we want to read.
            my $p = HTML::PullParser->new(doc => $resp->content,
                                          start => 'tagname, event, attr',
                                          end => 'tagname, event, skipped_text',
                                          ignore_elements => [qw(style applet embed
                                                                 object)],
                                          report_tags => ['script']);
            while (my $token = $p->get_token) {
                my $type = $token->[1];
                next unless ($type eq 'end');
                my $script = $token->[2];
                next unless $script =~ /var\s+nameString\s*=/;
                # this is the right script with the data in it
                # that is easy to read
                # (list-context matches leave a field undef when its variable
                # is missing, instead of reusing a stale $1)
                ($prof->{online}) = $script =~ /var\s+memMessage\s*=\s*"I am (\w+)\."/;
                ($prof->{name}) = $script =~ /var\s+nameDetails\s*=\s*"([^"]*)"/;
                ($prof->{loc}) = $script =~ /var\s+locDetails\s*=\s*"([^"]*)"/;
                ($prof->{gender}) = $script =~ /var\s+genderDetails\s*=\s*"([^"]*)"/;
                ($prof->{marital}) = $script =~ /var\s+maritalDetails\s*=\s*"([^"]*)"/;
                ($prof->{hobbies}) = $script =~ /var\s+hobbiesDetails\s*=\s*"([^"]*)"/;
                ($prof->{gadgets}) = $script =~ /var\s+gadgetsDetails\s*=\s*"([^"]*)"/;
                ($prof->{occ}) = $script =~ /var\s+occDetails\s*=\s*"([^"]*)"/;
                ($prof->{quote}) = $script =~ /var\s+quoteDetails\s*=\s*"([^"]*)"/;
                ($prof->{links}) = $script =~ /var\s+linksDetails\s*=\s*"([^"]*)"/;
            }
            for my $k (keys %{$prof}) {
                # Strip out annoying HTML tags in profiles
                $prof->{$k} =~ s/<[^>]*>//g if defined $prof->{$k};
            }
            $obj->{profile} = $prof;
            return 1;
        }
        else {
            $obj->{pull_success} = 'Failed';
            $obj->{errstr} = "Can't retrieve AOL member page for $user.\n";
            return undef;
        }
    }
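
One way to separate the parsing problem from the login problem is to run the extraction step against a page saved from Firefox, with no network involved. The following is only a sketch under assumptions: the sample script body is made up, and `extract_profile` and `%js_var` are hypothetical names that are not part of the subroutine above. The JavaScript variable names are the ones the subroutine looks for.

```perl
use strict;
use warnings;

# Hypothetical script body, shaped like the variables the subroutine expects.
my $sample = <<'END_JS';
var nameString = "x";
var memMessage = "I am online.";
var nameDetails = "<b>Jane</b> Doe";
var locDetails = "";
END_JS

# Profile key => JavaScript variable name (names taken from the post).
my %js_var = (
    name    => 'nameDetails',
    loc     => 'locDetails',
    gender  => 'genderDetails',
    marital => 'maritalDetails',
    hobbies => 'hobbiesDetails',
    gadgets => 'gadgetsDetails',
    occ     => 'occDetails',
    quote   => 'quoteDetails',
    links   => 'linksDetails',
);

# Pull the profile fields out of a script body; returns a hashref.
sub extract_profile {
    my ($text) = @_;
    my %prof;
    for my $key (keys %js_var) {
        # A list-context match returns an empty list on failure, so a
        # missing variable is skipped rather than given a stale value.
        my ($value) = $text =~ /var\s+\Q$js_var{$key}\E\s*=\s*"([^"]*)"/;
        next unless defined $value;
        $value =~ s/<[^>]*>//g;    # strip HTML tags, as the original does
        $prof{$key} = $value;
    }
    my ($online) = $text =~ /var\s+memMessage\s*=\s*"I am (\w+)\."/;
    $prof{online} = $online if defined $online;
    return \%prof;
}

my $prof = extract_profile($sample);
printf "%s: %s\n", $_, $prof->{$_} for sort keys %$prof;
```

If the fields come out right on a saved page but not on the page LWP fetches, the regexes are fine and the difference lies in what the server sends back to the script.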
     
    , Jun 26, 2006
    #1

  2. <> wrote:

    > Subject: Re: Making the simple impossible and the impossible unthinkable...



    Please put the subject of your article in the Subject of your article.

    Folks that are interested in web scraping will read articles whose
    subject mentions web scraping.

    They are likely to skip reading articles whose subject cannot be
    determined by the Subject.


    > I'm trying to get a Perl script with LWP to get information from AOL
    > profiles.



    > From what I can tell there's some weird and utterly pointless
    > Javascripting going on. I set up a logger on my XP box (first free HTTP
    > logger I could find, so XP it was --



    Web Scraping Proxy

    http://www.research.att.com/~hpk/wsp/
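
With a log of the browser session in hand, the cookies the browser sends can be compared against the cookies the server actually set, to see which ones must have come from somewhere else (such as JavaScript). A rough sketch, assuming the headers have been copied out of the log as plain strings; the header values and the helper name `unexplained_cookies` are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical data: a Cookie header copied from a browser-side HTTP log,
# and the Set-Cookie headers the script saw in its own responses.
my $browser_cookie_header = 'Cookie: SID=abc123; MC_AUTH=xyz; JSGEN=42';
my @set_cookie_headers = (
    'Set-Cookie: SID=abc123; path=/; domain=.example.com',
    'Set-Cookie: MC_AUTH=xyz; path=/',
);

# Return the cookie names the browser sent that no Set-Cookie header
# accounts for. Those are candidates for cookies created client-side.
sub unexplained_cookies {
    my ($cookie_header, @set_cookies) = @_;

    # Names the server handed out: the first name=value pair of each header.
    my %from_server;
    for my $h (@set_cookies) {
        my (undef, $value) = split /:\s*/, $h, 2;
        next unless defined $value;
        my ($name) = $value =~ /^\s*([^=;\s]+)\s*=/;
        $from_server{$name} = 1 if defined $name;
    }

    # Names the browser sent: every name=value pair in the Cookie header.
    my (undef, $sent) = split /:\s*/, $cookie_header, 2;
    my @names = map { /^\s*([^=\s]+)\s*=/ ? $1 : () } split /;/, $sent // '';

    return grep { !$from_server{$_} } @names;
}

print join(', ', unexplained_cookies($browser_cookie_header,
                                     @set_cookie_headers)), "\n";
```

With the made-up headers above this prints `JSGEN`, the one cookie name that appears in the browser's request without a matching Set-Cookie.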


    > but I'm doing the code under
    > Debian Linux just so you know) and I'm getting cookies being sent that
    > I don't have -- which is giving the strong impression that some
    > Javascript somewhere is making these cookies up on the fly. This gives
    > me the strong impression that AOL takes the opposite philosophy of
    > Perl, and possibly has occasional meetings where people get together
    > and decide 'We don't have enough code -- we need more code to do the
    > same thing. Let's try and complicate things more.' I bet they have
    > buzzwords like 'complexify'. It's asinine to do so much stupid crap
    > just to set up login validation.



    It is an AOL site.

    The sun rises in the East.

    So what's new? :)


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Jun 26, 2006
    #2

  3. Guest

    wrote:
    > It's asinine to do so much stupid crap
    > just to set up login validation.


    Isn't that the point? And isn't that the reason you cannot access their
    site with a script?
     
    , Jun 26, 2006
    #3
  4. Guest

    wrote:
    > wrote:
    > > It's asinine to do so much stupid crap
    > > just to set up login validation.

    >
    > Isn't that the point? And isn't that the reason you cannot access their
    > site with a script?


    I rather got the impression it was a side-effect, not the reason.
    Thanks.
     
    , Jun 26, 2006
    #4
  5. Guest

    Tad McClellan wrote:
    > Please put the subject of your article in the Subject of your article.
    >
    > Folks that are interested in web scraping will read articles whose
    > subject mentions web scraping.
    >
    > They are likely to skip reading articles whose subject cannot be
    > determined by the Subject.


    Well, potentially in my defence, the article is also about stupidity,
    and the subject is, in fact, stupid!

    > > I'm trying to get a Perl script with LWP to get information from AOL
    > > profiles.

    >
    > Web Scraping Proxy


    Thank you. And ironic that I work for them and yet would have never
    known about that internally.

    Okay no, that's not that ironic.

    > http://www.research.att.com/~hpk/wsp/
    >
    > It is an AOL site.
    >
    > The sun rises in the East.
    >
    > So what's new? :)


    Point.
     
    , Jun 26, 2006
    #5
  6. On 26 Jun 2006 12:34:02 -0700, wrote:

    >> Please put the subject of your article in the Subject of your article.

    [snip]

    >Well, potentially in my defence, the article is also about stupidity,
    >and the subject is, in fact, stupid!


    You're potentially confusing stupidity with meta-stupidity. A post
    *about* stupidity need not have a stupid Subject, which is better suited
    to a stupid post, which yours is not.

    And IMHO your Subject is not stupid either: it's a witty remark, but
    it does not make a good Subject. It makes a suboptimal one. It would
    have been perfect for a "subtitle", though.


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, Jun 30, 2006
    #6
