Accessing Yahoo Groups with Java

Craig

I'm trying to write a util app that will let me download all the
messages from one of my interest groups at groups.yahoo.com.
I have all the code to generate the queries working just fine but I
don't know how to handle the login interface presented by
groups.yahoo.com. It appears you make a request like this to be
prompted and redirected.

http://login.yahoo.com/config/login?.intl=us&.src=ygrp&.done=http://groups.yahoo.com/

I've never done much of this and it looks like you would need to store
session info. Maybe a cookie so you can go back?

Can anyone indicate how to go about creating a login interface that can
talk to yahoo and allow access to group messages?

TIA

Craig
 
Luke Tulkas

Craig said:
I'm trying to write a util app that will let me download all the
messages from one of my interest groups at groups.yahoo.com.
I have all the code to generate the queries working just fine but I
don't know how to handle the login interface presented by
groups.yahoo.com. It appears you make a request like this to be
prompted and redirected.

http://login.yahoo.com/config/login?.intl=us&.src=ygrp&.done=http://groups.yahoo.com/

I've never done much of this and it looks like you would need to store
session info. Maybe a cookie so you can go back?

Can anyone indicate how to go about creating a login interface that can
talk to yahoo and allow access to group messages?

You should simulate browser behaviour. For starters, write an HTTP fetch
utility (with GET and POST capabilities, timeout handling, etc.), which
you'll need later on. If there are redirects, you might want to call
HttpURLConnection.setFollowRedirects(false) and follow them yourself. Then
take it one step at a time: do one fetch, analyse it, do the next. At each
step, figure out which headers need to be set and set them accordingly.
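
A minimal sketch of such a step-by-step fetcher, assuming Java 11 or later. The class name StepFetcher, the timeout values, the hop limit and the User-Agent string are illustrative choices, not anything Yahoo requires. It uses the per-connection setInstanceFollowRedirects(false) rather than the static setFollowRedirects(false), prints each status code so every hop can be analysed, and installs a default CookieManager so that cookies set along the way are sent back on later requests.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class StepFetcher {
    /** True for the status codes that normally carry a Location header. */
    public static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a possibly relative Location value against the URL that produced it. */
    public static String resolve(String base, String location) {
        return URI.create(base).resolve(location).toString();
    }

    /** Fetches a URL with GET, following at most maxHops redirects by hand. */
    public static String fetch(String url, int maxHops) throws IOException {
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // we follow redirects ourselves
            conn.setConnectTimeout(10_000);         // milliseconds, arbitrary values
            conn.setReadTimeout(30_000);
            conn.setRequestProperty("User-Agent", "Mozilla/5.0"); // placeholder value

            int status = conn.getResponseCode();
            System.out.println(status + " " + url); // one line per hop, for analysis

            String location = conn.getHeaderField("Location");
            if (isRedirect(status) && location != null) {
                url = resolve(url, location);
                continue;
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        throw new IOException("Too many redirects");
    }

    public static void main(String[] args) throws IOException {
        // Keep cookies between requests made through HttpURLConnection.
        CookieHandler.setDefault(new CookieManager());
        System.out.println(fetch(args[0], 5).length() + " characters received");
    }
}
```

Run it with a URL as the only argument and compare the printed hops with what a browser does for the same address.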
 
Craig

Luke said:
You should simulate browser behaviour. For starters, write a http fetch
utility (possibly with GET and POST capabilities, with advanced timeout
handling, etc.), which you'll need later on. If there are redirects you
might want to HttpURLConnection.setFollowRedirects(false) and do it
yourself. Then take it one step at a time. Do one fetch, analyse it, do the
next. At each step figure out which headers need to be set and then set them
accordingly.

Hmmm... seems I need to catch a browser request or two. Any idea how
one would do that?
There must be a simple mechanism at work here, surely. Make a request,
get a response asking for username & password, authenticate, and then
have requests processed on the server. There must be documentation
around that describes the process, mustn't there?

Craig
 
Luke Tulkas

Craig said:
Hmmm.. seems I need to catch a browser request or two. Any idea on how
one would do that?
There must be a simple mechanism at work here surely. Make a request
get a response asking for username & password, authenticate and then
allow requests to be processed on the server. There must be
documentation around that describes the process mustn't there?

Yeah, well, try the RFCs. But let me help you a little: if authentication is
done using the HTTP protocol itself (not HTTP parameters), then at some point
you'll get a response code indicating that authentication is required. That
code is 401 (SC_UNAUTHORIZED); 407 (SC_PROXY_AUTHENTICATION_REQUIRED) is the
equivalent sent by a proxy. The next fetch must then set a header named
Authorization with a value of the following format: Basic
<Base64encoded(<username>:<password>)>. Note that there's a single space
between the literal string "Basic" and the Base64-encoded part, and a colon
between username and password before encoding. If authentication is done via
HTTP parameters, then you'll have to figure it out yourself.
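
The Authorization header format described above can be built with java.util.Base64 (Java 8 or later). This is only a sketch: the class and method names are made up for illustration, and it assumes the credentials are encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuth {
    /** Builds the Authorization header value: "Basic " + Base64(username:password). */
    public static String headerValue(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Prints: Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
        System.out.println("Authorization: " + headerValue("Aladdin", "open sesame"));
    }
}
```

On a connection it would be applied with conn.setRequestProperty("Authorization", BasicAuth.headerValue(user, password)).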

The rest is just plain fetching (HttpURLConnection) with appropriate headers
set. Oh, one more thing: if you're doing POST (or mixed POST and GET)
requests, then you have to write the POST parameters directly to the output
stream of the HttpURLConnection and set at least the following two headers:
Content-Type: application/x-www-form-urlencoded
Content-Length: <the size of the stuff you write to the output stream>

You also might want to set the usual headers your favorite browser sets,
like User-Agent, Referer etc.
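
A rough sketch of such a form POST, assuming Java 10 or later. The class name FormPost, the field names "login" and "passwd", and the User-Agent value are placeholders; the real field names have to be read from the login form itself. As far as I know, HttpURLConnection buffers the body and fills in Content-Length on its own when no streaming mode is set, so only Content-Type is set explicitly here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FormPost {
    /** Encodes parameters as name=value pairs joined by '&', URL-encoding each part. */
    public static String encode(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** Sends the parameters as a form POST and returns the response status code. */
    public static int post(String url, Map<String, String> params) throws IOException {
        byte[] body = encode(params).getBytes(StandardCharsets.US_ASCII);
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true); // required before writing a request body
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        conn.setRequestProperty("User-Agent", "Mozilla/5.0"); // placeholder value
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("login", args[1]);  // hypothetical field name
        params.put("passwd", args[2]); // hypothetical field name
        System.out.println(post(args[0], params));
    }
}
```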
 
Rogan Dawes

Craig wrote:

Hmmm.. seems I need to catch a browser request or two. Any idea on how
one would do that?
There must be a simple mechanism at work here surely. Make a request
get a response asking for username & password, authenticate and then
allow requests to be processed on the server. There must be
documentation around that describes the process mustn't there?

Craig

Take a look at WebScarab. It acts as a proxy between your browser and
the sites you visit, and records the exact bytes that your browser
sends, and the server sends back. It is then (relatively) trivial to
write something to automate those requests.

https://sourceforge.net/project/showfiles.php?group_id=64424&package_id=61823
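
The same kind of proxy can also sit in front of your own Java code, so that its requests can be compared with the browser's. A small sketch, assuming the proxy listens on localhost at a port you pass in; the class name ProxiedFetch and the timeout values are made up for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

class ProxiedFetch {
    /** Describes an HTTP proxy listening on localhost at the given port. */
    public static Proxy localProxy(int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress("localhost", port));
    }

    /** Sends a GET through the proxy and returns the response status code. */
    public static int status(String url, Proxy proxy) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
        conn.setConnectTimeout(10_000); // milliseconds, arbitrary values
        conn.setReadTimeout(30_000);
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ProxiedFetch <url> <proxy port>
        int port = Integer.parseInt(args[1]);
        System.out.println(status(args[0], localProxy(port)));
    }
}
```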

Rogan
 
Craig

Andrew said:
My apologies! I was getting a lot more hits for
<http://google.com/groups?q=login+yahoo&group=comp.lang.java.*>
...but thought I'd narrow it down.
It was TIC but I am surprised it is so hard to achieve.
Have a look through some of the posts
from the wider search for the last two months,
I am sure somebody was attempting it recently
who wrote back to say they'd had success.
I'll take another look through and see what I find.
Thanks for the Heads up.

Craig
 
Andrew Thompson

...
..I am surprised it is so hard to achieve.

From my scanning of the responses I got the
impression that Yahoo discouraged 'bots', and
their strategy to achieve that was to exclude
anything but browsers. Hence the 'pretend
you are a browser' way to approach it... (AFAIR)
 
