Traversing Yahoogroups group messages

lothar

I want to traverse a set of messages in a Yahoogroups group from a Python
program.

To get to the group's messages, one must log in.

This presents, I think, two problems:
1) handling the login form element, which has a JavaScript submit
routine;
2) keeping login state with cookies.

To anyone who knows something about the issues here, my questions are:

1) Is it possible to do this in Python?
2) If so, how do I handle the form and the JavaScript?
3) Does Java Python have a JavaScript engine, and do I need Java Python here?
4) If I need to use cookies, how do I know what to name a cookie and what
to set in it?

For context, I include the form element below.
The submit routine hash() is JavaScript.

<form method=post
action="https://login.yahoo.com/config/login?ej1cd3h7oogel" autocomplete=off
name=login_form onsubmit="return
hash(this,'http://login.yahoo.com/config/login')">
<table bgcolor=#6996e0 border="0" cellpadding="2" cellspacing="0"
width="100%">
<tr><td>
<table bgcolor="#eeeeee" border="0" cellpadding="2" cellspacing="0"
width="100%">
<tr><td bgcolor="#ffffff" align="center">

<table border="0" cellspacing="6" cellpadding="6" bgcolor="ffffff"
width="100%">
<tr bgcolor="eeeeee">
<td align="center">
<font face="arial"><b>Existing Yahoo! users</b></font><br>
<font face="arial" size="-1"><nobr>&nbsp;Enter your ID and password to sign
in&nbsp; </nobr></font>
<table border="0" cellpadding="4" cellspacing="0">
<tr> <td align="right">
<input type=hidden name=".tries" value="1" >
<input type=hidden name=".src" value="ygrp" >
<input type=hidden name=".md5" value="" >
<input type=hidden name=".hash" value="" >
<input type=hidden name=".js" value="" >
<input type=hidden name=".last" value="" >
<input type=hidden name="promo" value="" >
<input type=hidden name=".intl" value="us" >
<input type=hidden name=".bypass" value="" >
<input type=hidden name=".partner" value="" >
<input type=hidden name=".u" value="a4o4r550k8vss" >
<input type=hidden name=".v" value="0" >
<input type=hidden name=".challenge" value="RoxqFKs548c9Abju6nBMrQ3J1uly" >
<input type=hidden name=".yplus" value="" >
<input type=hidden name=".emailCode" value="" >
<input type=hidden name="pkg" value="" >
<input type=hidden name="stepid" value="" >
<input type=hidden name=".ev" value="" >
<input type=hidden name="hasMsgr" value=0>
<input type=hidden name=".chkP" value="Y">
<input type=hidden name=".done" value="http://groups.yahoo.com/group/legality-of-drivers-license/messages/35" >
<script language=javascript>
<!--
browser_string = navigator.appVersion + " " + navigator.userAgent;
if ( browser_string.indexOf("MSIE") < 0 ) {
if (navigator.mimeTypes) {
for (i = 0 ; i < navigator.mimeTypes.length ; i++) {
if (navigator.mimeTypes[i].suffixes.indexOf("yps") > -1) {
doGotIt();
}
}
} else {
dontGotIt();
}
} else {
if (browser_string.indexOf("Windows")>=0) {
doGotIt();
document.write('<object classid="clsid:41695A8E-6414-11D4-8FB3-00D0B7730277" CODEBASE="javascript:dontGotIt();" ID="Ymsgr" width="1" height="1">');
document.write('</object>');
}
}
//-->
</script>
<table border="0" cellpadding="2" cellspacing="0">
<tr> <td align="right" nowrap><font face="arial" size="-1">
Yahoo! ID:
</font></td>
<td><input name="login" size="17" value=""></td>
</tr>
<tr> <td align="right" nowrap><font face="arial"
size="-1">Password:</font></td>
<td><input name="passwd" type="password" size="17" maxlength="32"></td></tr>
<tr> <td colspan="2" nowrap align="center"><font face="arial" size="-1">
<input type="checkbox" name=".persistent" value="y">Remember my ID on this
computer</font></td>
</tr><tr>
<td>&nbsp;</td>

<td><input name=".save" type="submit" value="Sign In"></td>
</tr>
</table>
</td></tr>
<tr>
<td nowrap bgcolor="eeeeee" align="center">
<font face="arial" size="-1">Mode:
Standard | <a
href="https://login.yahoo.com/config/login?.src=ygrp&.v=0&.u=a4o4r550k8vss&.last=&promo=&.intl=us&.bypass=&.partner=&pkg=&stepid=&.done=http%3a//groups.yahoo.com/group/legality-of-drivers-license/messages/35"> Secure</a>
</font>
</td>
</tr>
</table>
</td></tr>
<tr bgcolor="eeeeee">
<td valign="top" align="center"> <font face="arial" size="-1">
<a
href="http://us.rd.yahoo.com/reg/sihflib/*http://login.yahoo.com/config/login?.src=ygrp&.intl=us&.help=1&.v=0&.u=a4o4r550k8vss&.last=&.last=&promo=&.bypass=&.partner=&pkg=&stepid=&.done=http%3a//groups.yahoo.com/group/legality-of-drivers-license/messages/35">Sign-in help</a>&nbsp;&nbsp;&nbsp;<a
href="http://us.rd.yahoo.com/reg/fpflib/*http://edit.yahoo.com/config/eval_forgot_pw?new=1&.done=http://groups.yahoo.com/group/legality-of-drivers-license/messages/35&.src=ygrp&partner=&.partner=&.intl=us&pkg=&stepid=&.last=">Forgot your password?</a>
</font></td></tr>
</table>
</td></tr></table>
</td></tr></table>
</form>
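The hash() routine only runs because a browser fires the form's onsubmit handler; a Python program never executes it. One option is therefore to collect the form's fields yourself, fill in the ID and password, and POST them to the form's action URL. Below is a minimal sketch using the standard library's html.parser. It assumes, untested, that the server accepts the plain password when .md5 and .hash are left empty, which is what a browser with JavaScript disabled would send. Values such as .challenge and .u look per-session, so the sketch parses a freshly fetched page instead of hard-coding them. FormFieldParser and login_form_data are hypothetical names.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the action URL and <input> name/value pairs of one named form.

    Hypothetical helper: it mimics what a browser submits with JavaScript
    turned off, so hidden fields such as .md5 and .hash stay empty.
    """

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self.in_form and attrs.get("name"):
            # A browser omits unchecked checkboxes, so skip them here too.
            if (attrs.get("type") or "").lower() == "checkbox":
                return
            # Hidden fields, text fields and the clicked submit button are kept.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def login_form_data(html, username, password, form_name="login_form"):
    """Return (action_url, fields) ready to be URL-encoded and POSTed."""
    parser = FormFieldParser(form_name)
    parser.feed(html)
    parser.close()
    fields = dict(parser.fields)
    # "login" and "passwd" are the input names used in the form above.
    fields["login"] = username
    fields["passwd"] = password
    return parser.action, fields
```

Because the <script> block inside the form is treated as raw text by html.parser, the document.write() calls in it do not confuse the field collection.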

Peter Hansen

lothar said:
> I want to traverse a set of messages in a Yahoogroups group from a Python
> program.
>
> To get to the group's messages, one must log in.
>
> This presents, I think, two problems:
> 1) handling the login form element, which has a JavaScript submit
> routine;
> 2) keeping login state with cookies.
>
> To anyone who knows something about the issues here, my questions are:
>
> 1) Is it possible to do this in Python?
Yes.

> 2) If so, how do I handle the form and the JavaScript?

There are a variety of approaches; which one suits you depends on the
platform you are using (e.g. Win32, Linux, other?) and on how
sophisticated and flexible you want the result to be.
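For the plain-HTTP route, a sketch using only the standard library might look like the following (Python 3 names; Python 2.4 and later have the same pieces as urllib2 and cookielib). The opener keeps whatever cookies the login response sets and sends them back on later requests, which covers the "keeping login state" problem. make_session and build_post are hypothetical helper names, and nothing here has been tried against Yahoo's servers.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar); the opener stores cookies in jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_post(action_url, fields):
    """Encode fields as application/x-www-form-urlencoded, like a browser POST."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        action_url,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Usage would be along the lines of opener.open(build_post(action_url, fields)) to log in, followed by opener.open(message_url).read() for each page; the same opener, and so the same cookie jar, has to be reused for every request.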

> 3) Does Java Python have a JavaScript engine, and do I need Java Python here?

Do you realize that Java has absolutely nothing to do with JavaScript
except forming part of its name? And no, you don't need it here.

> 4) If I need to use cookies, how do I know what to name a cookie and what
> to set in it?

By asking the server and watching the cookies that come back from
it. The ClientCookie module would presumably help. You could also
just turn off cookies in your browser, access the site, and see
whether it still works; maybe you don't need cookies at all.
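Watching the cookies can be done from Python too. ClientCookie was later folded into the standard library (cookielib in Python 2.4, http.cookiejar in Python 3), and a CookieJar can report exactly what a response set. The sketch below hands the jar a stand-in response object so it runs offline; FakeResponse, cookies_from_response, the URL and the cookie names are all made up for illustration and are not Yahoo's.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; CookieJar only calls .info()."""

    def __init__(self, set_cookie_headers):
        self._headers = email.message.Message()
        for value in set_cookie_headers:
            # Message.__setitem__ appends, so repeated Set-Cookie headers survive.
            self._headers["Set-Cookie"] = value

    def info(self):
        return self._headers


def cookies_from_response(url, set_cookie_headers):
    """Return {name: value} for the cookies a server set in one response."""
    jar = http.cookiejar.CookieJar()
    request = urllib.request.Request(url)
    jar.extract_cookies(FakeResponse(set_cookie_headers), request)
    return {cookie.name: cookie.value for cookie in jar}
```

With a real site you would instead pass the jar to urllib.request.HTTPCookieProcessor and iterate over it after opener.open() returns; each item is a Cookie object with name, value, domain and path attributes.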

> For context, I include the form element below.
> The submit routine hash() is JavaScript.

There have been similar questions and many responses on this subject
in the past. I suggest checking the newsgroup archives with Google
Groups, searching for terms such as "web scraping", and paying close
attention to any threads with responses by Cameron Laird or
John J. Lee. ;-)
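As a small illustration of that kind of web scraping: once logged in, traversing the messages mostly means fetching a page and following the message links in it. The sketch below pulls matching links out of a page with html.parser. The /message/<number> pattern, the example URLs and the names LinkCollector and message_links are assumptions for illustration, not Yahoo's documented page layout.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links whose href matches a pattern."""

    def __init__(self, base_url, pattern):
        super().__init__()
        self.base_url = base_url
        self.pattern = re.compile(pattern)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.pattern.search(href):
            # Resolve relative links against the page URL; keep first-seen order.
            url = urljoin(self.base_url, href)
            if url not in self.links:
                self.links.append(url)


def message_links(html, base_url, pattern=r"/message/\d+"):
    """Return the de-duplicated message URLs found in one fetched page."""
    collector = LinkCollector(base_url, pattern)
    collector.feed(html)
    collector.close()
    return collector.links
```

A traversal loop would fetch each returned URL with the same logged-in opener and repeat the extraction on any index pages it reaches, keeping a set of visited URLs so no message is fetched twice.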

-Peter