screen scrape + login

Discussion in 'ASP .Net' started by n8, Nov 24, 2004.

  1. n8

    n8 Guest

    Hi,

    I have to do the following and have been racking my brain with
    various solutions that have had not-so-great results.

    I want to use the System.Net.WebClient to submit data to a form (log a
    user in) and then redirect to the correct article.

    Here is the scenario.
    If you are not logged into the site, certain articles redirect you
    to an shtml login page. The login.shtml page posts to another URL
    for authentication and then lets you in. If you have clicked on an
    article that requires a login, you are sent to the login page with
    the article URL appended,
    www.domainname.com?orq:http://domainname.com/stories/112404/nca_2653091.shtml.
    I have tried sending a WebClient request to the URL that the above
    login form posts to, but I keep getting Method Not Allowed.

    Any ideas?
     
    n8, Nov 24, 2004
    #1

  2. n8

    bruce barker Guest

    more info required, but here is a typical login:

    1) you request a page with webclient
    2) you are returned a redirect header to the login page.
    3) your code detects the login redirect, then posts the required form data
    to the login page (manually view the login page to get the required form
    fields and method).

    note: an asp.net login site requires that you actually do a get of the
    login page to obtain a valid viewstate to post back. other systems may also
    require scraping the get data before doing the actual post.

    4) a successful post to the login will return a cookie value you must send
    on subsequent requests, and a redirect header to the originally requested
    page.
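
    Steps 2 and 3 can be sketched roughly as below. This is only a sketch,
    assuming the site answers with a 3xx status and a Location header that
    contains the login page name. The class LoginRedirectProbe, its two
    methods, and the loginPath parameter are made-up names for illustration,
    not part of any library.

```csharp
using System;
using System.Net;

static class LoginRedirectProbe
{
    // Pure helper: true when the status is a redirect and the Location
    // header mentions the login page (loginPath is a hypothetical parameter).
    public static bool IsLoginRedirect(HttpStatusCode status, string location, string loginPath)
    {
        int code = (int)status;
        bool isRedirect = code == 301 || code == 302 || code == 303 || code == 307;
        return isRedirect
            && location != null
            && location.IndexOf(loginPath, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Requests the page without following redirects, so the caller can see
    // where the server is trying to send it.
    public static bool RequiresLogin(string url, string loginPath, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = false;
        request.CookieContainer = cookies;
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            return IsLoginRedirect(response.StatusCode, response.Headers["Location"], loginPath);
        }
    }
}
```

    Passing the same CookieContainer to every later request is what carries
    the login cookie from step 4 forward.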


    -- bruce (sqlwork.com)
     
    bruce barker, Nov 24, 2004
    #2

  3. n8

    Scott Allen Guest

    I have an example of this here:

    http://odetocode.com/Articles/162.aspx

    It's basically posting to the login form, getting the cookie back, and
    then making sure to send the cookie along when requesting the
    protected content.

    --
    Scott
    http://www.OdeToCode.com/blogs/scott/

     
    Scott Allen, Nov 25, 2004
    #3
  4. n8

    Joe Fallon Guest

    Scott,
    FYI - that was one of the best articles on the subject I ever read.
    I was completely stuck on this issue about 6 months ago and I implemented it
    straight away using the concepts you presented here.

    Excellent work and explanation.
    --
    Joe Fallon
     
    Joe Fallon, Nov 25, 2004
    #4
  5. n8

    Scott Allen Guest

    Thanks, Joe. I appreciate the feedback.

    --
    Scott
    http://www.OdeToCode.com/blogs/scott/

     
    Scott Allen, Nov 25, 2004
    #5
  6. n8

    n8 Guest

    Thanks for the example. I had seen your example earlier and had tried
    it, but I always get to one particular point that I cannot seem to get
    beyond. There are two hidden fields, both called web.fixed_values, that
    appear to be something like a view state, but the page is shtml. I
    have been able to pull down the site, etc., but every time I try to
    post my data (with or without the web.fixed_values) I always get the
    response Method Not Allowed. Below is the code I am using along with
    the site I am trying to access with my account. Any further help on
    this would be greatly appreciated.

    private void Page_Load(object sender, System.EventArgs e)
    {
        string LOGIN_URL = "http://augustachronicle.com/login.shtml";
        string cookieAge = "31536000";

        try
        {
            // fetch the login page so the hidden fields can be read from it
            HttpWebRequest webRequest = WebRequest.Create(LOGIN_URL) as HttpWebRequest;

            StreamReader responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());

            string responseData = responseReader.ReadToEnd();
            responseReader.Close();

            // get the web fixed values
            string fixedvalue1 = ExtractFixedValues1(responseData);
            string fixedvalue2 = ExtractFixedValues2(responseData);

            // userName and password are not declared in this snippet;
            // presumably they are fields defined elsewhere in the page class
            string postData = String.Format(
                "web.fixed_values={0}&web.fixed_values={1}&ACTION=Login&USER={2}&PASS={3}&cookie_age={4}",
                fixedvalue1, fixedvalue2, userName, password, cookieAge);

            // have a cookie container ready to receive the forms auth cookie
            CookieContainer cookies = new CookieContainer();

            // now post to the login form
            webRequest = WebRequest.Create(LOGIN_URL) as HttpWebRequest;
            webRequest.Method = "POST";
            webRequest.ContentType = "application/x-www-form-urlencoded";
            webRequest.CookieContainer = cookies;

            // write the form values into the request message
            StreamWriter requestWriter = new StreamWriter(webRequest.GetRequestStream());
            requestWriter.Write(postData);
            requestWriter.Close();

            // we don't need the contents of the response, just the cookie it issues
            webRequest.GetResponse().Close();

            // now we can send our cookie along with a request for the protected page
            webRequest = WebRequest.Create("http://augustachronicle.com/stories/112404/usc_FBC--SpurrierProfile.shtml")
                as HttpWebRequest;
            webRequest.CookieContainer = cookies;
            responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());

            // and read the response
            responseData = responseReader.ReadToEnd();
            responseReader.Close();

            Response.Write(responseData);
        }
        catch (Exception ex)
        {
            Response.Write(ex.ToString());
        }
    }

    private string ExtractFixedValues1(string s)
    {
        string viewStateNameDelimiter = "web.fixed_values";
        string valueDelimiter = "value=\"";

        int viewStateNamePosition = s.IndexOf(viewStateNameDelimiter);
        int viewStateValuePosition = s.IndexOf(valueDelimiter, viewStateNamePosition);

        int viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
        int viewStateEndPosition = s.IndexOf("\"", viewStateStartPosition);

        return HttpUtility.UrlEncodeUnicode(
            s.Substring(viewStateStartPosition, viewStateEndPosition - viewStateStartPosition));
    }


    private string ExtractFixedValues2(string s)
    {
        string viewStateNameDelimiter = "web.fixed_values";
        string valueDelimiter = "value=\"";

        int viewStateNamePosition = s.IndexOf(viewStateNameDelimiter);
        int viewStateValuePosition = s.IndexOf(valueDelimiter, viewStateNamePosition);

        int viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
        int viewStateEndPosition = s.IndexOf("\"", viewStateStartPosition);

        string sTemp = s.Remove(0, viewStateEndPosition);

        viewStateNamePosition = sTemp.IndexOf(viewStateNameDelimiter);
        viewStateValuePosition = sTemp.IndexOf(valueDelimiter, viewStateNamePosition);

        viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
        viewStateEndPosition = sTemp.IndexOf("\"", viewStateStartPosition);

        return HttpUtility.UrlEncodeUnicode(
            sTemp.Substring(viewStateStartPosition, viewStateEndPosition - viewStateStartPosition));
    }
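
    The two extraction helpers above differ only in which occurrence of
    web.fixed_values they read, so they could probably be folded into one
    loop. The following is a sketch only, not a fix for the 405: it keeps the
    same assumption as the helpers above (the name attribute comes before
    value="..." in the markup), and it swaps in Uri.EscapeDataString for
    HttpUtility.UrlEncodeUnicode so that every field, including USER and
    PASS, is encoded the same way. HiddenFields, ExtractValues, and
    BuildFormBody are made-up names.

```csharp
using System;
using System.Collections.Generic;

static class HiddenFields
{
    // Returns the raw value of every occurrence of fieldName, in document order.
    // Assumes each occurrence is followed by value="..." (as the helpers above do).
    public static List<string> ExtractValues(string html, string fieldName)
    {
        const string valueDelimiter = "value=\"";
        List<string> values = new List<string>();
        int position = 0;
        while (true)
        {
            int namePosition = html.IndexOf(fieldName, position, StringComparison.Ordinal);
            if (namePosition < 0) break;
            int valuePosition = html.IndexOf(valueDelimiter, namePosition, StringComparison.Ordinal);
            if (valuePosition < 0) break;
            int start = valuePosition + valueDelimiter.Length;
            int end = html.IndexOf('"', start);
            if (end < 0) break;
            values.Add(html.Substring(start, end - start));
            position = end + 1; // continue searching after this value
        }
        return values;
    }

    // Joins name/value pairs into a form body, percent-encoding both sides.
    // Repeated names (such as two web.fixed_values entries) are allowed.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        List<string> parts = new List<string>();
        foreach (KeyValuePair<string, string> field in fields)
        {
            parts.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", parts.ToArray());
    }
}
```

    With this, the post data would be built by adding one web.fixed_values
    pair per extracted value, followed by ACTION, USER, PASS, and cookie_age.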


     
    n8, Nov 27, 2004
    #6
  7. n8

    Scott Allen Guest

    Everything looks like it is in order, Nathan. I'd examine the HTTP
    traffic between your program and the server to make sure it all
    matches exactly, even little things like the User-Agent header. I had
    one financial site reject HttpWebRequests until I set the UserAgent
    property to look just like IE. I guess it was a weak attempt at
    preventing screen scraping programs.
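
    One way to follow this advice from code is to set the UserAgent property
    and, when the POST fails, read the status and headers from the
    WebException instead of only printing the exception. The HTTP
    specification asks servers to list the permitted methods in an Allow
    header on a 405 response, although a given server may leave it out. This
    is a rough sketch: PostDiagnostics, Describe, and TryPost are made-up
    names, and the user agent string is whatever the caller chooses to send.

```csharp
using System;
using System.IO;
using System.Net;

static class PostDiagnostics
{
    // Formats the status code and the Allow header (if any) for display.
    public static string Describe(HttpStatusCode status, string allowHeader)
    {
        string allow = string.IsNullOrEmpty(allowHeader) ? "(no Allow header)" : allowHeader;
        return string.Format("{0} {1}; Allow: {2}", (int)status, status, allow);
    }

    // Posts the form data with a caller-supplied User-Agent and reports what
    // the server answered, even when the answer is an error status.
    public static string TryPost(string url, string postData, string userAgent, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.UserAgent = userAgent;
        request.CookieContainer = cookies;

        using (StreamWriter writer = new StreamWriter(request.GetRequestStream()))
        {
            writer.Write(postData);
        }

        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                return Describe(response.StatusCode, response.Headers["Allow"]);
            }
        }
        catch (WebException ex)
        {
            // Error statuses such as 405 arrive as a WebException that carries the response.
            HttpWebResponse error = ex.Response as HttpWebResponse;
            if (error == null) throw; // no HTTP response at all (DNS failure, timeout, ...)
            using (error)
            {
                return Describe(error.StatusCode, error.Headers["Allow"]);
            }
        }
    }
}
```

    For example, Describe(HttpStatusCode.MethodNotAllowed, "GET, HEAD")
    returns "405 MethodNotAllowed; Allow: GET, HEAD", which would suggest the
    URL being posted to does not accept POST at all.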

    --
    Scott
    http://www.OdeToCode.com/blogs/scott/

     
    Scott Allen, Nov 28, 2004
    #7
  8. n8

    n8 Guest

    Scott,

    Thanks for the information. I added a user agent to make it look like
    IE, but I still get the 405 Method Not Allowed error message. What is
    the best way to monitor the HTTP traffic between my application and
    the remote site? Are there any tools I can download to show me what
    is going back and forth?

    Thanks in advance,

    n8



     
    n8, Nov 28, 2004
    #8
  9. You might try a program called httplook. I think it is at
    http://www.httplook.com; if not, google for it...
     
    Wayne Brantley, Nov 29, 2004
    #9
  10. Also, if you get a fix - please let us know.

    "n8" <> wrote in message
    news:...
    > Scott,
    >
    > Thanks for the information. I added a useragent to make it look like
    > IE, but I still get the 405 Method not allowed error message. What is
    > the best way to monitor the HTTP Traffic between my application and
    > the remote site? Are there any tools i can download to show me what
    > is going back and forth?
    >
    > Thanks in advance,
    >
    > n8
    >
    >
    >
    > Scott Allen <bitmask@[nospam].fred.net> wrote in message
    > news:<>...
    >> Everything looks like it is in order, Nathan. I'd examine the HTTP
    >> traffic between your program and the server to make sure it all
    >> matches exactly, even little things like the Agent header. I had one
    >> financial site reject HttpWebRequests until I set the UserAgent
    >> property to look just like IE. I guess it was a weak attempt at
    >> preventing screen scraping programs.
    >>
    >> --
    >> Scott
    >> http://www.OdeToCode.com/blogs/scott/
    >>
    >> n 27 Nov 2004 12:39:42 -0800, (n8) wrote:
    >>
    >> >Thanks for the example. I had seen your example earlier and had tried
    >> >it and always get to one particular point where I cannot seem to get
    >> >beyond. There are two hidden fields both called web.fixed_values that
    >> >appear to be something like a view state but the page is shtml. I am
    >> >and have been able to pull down the site, etc. but everytime I try and
    >> >post my data (with or without the web.fixed_values) I always get the
    >> >response Method Not Allowed. Below is the code I am using along with
    >> >the sire I am trying to access with my account. ANy further help on
    >> >this would be greatly appreciated.
    >> >
    >> >private void Page_Load(object sender, System.EventArgs e)
    >> >{
    >> >string LOGIN_URL = "http://augustachronicle.com/login.shtml";
    >> >string cookieAge = "31536000";
    >> >
    >> >try
    >> >{
    >> >HttpWebRequest webRequest = WebRequest.Create(LOGIN_URL) as
    >> >HttpWebRequest;
    >> >
    >> >StreamReader responseReader = new
    >> >StreamReader(webRequest.GetResponse().GetResponseStream());
    >> >
    >> >string responseData = responseReader.ReadToEnd();
    >> >responseReader.Close();
    >> >
    >> >// get the web fixed values
    >> >string fixedvalue1 = ExtractFixedValues1(responseData);
    >> >
    >> >string fixedvalue2 = ExtractFixedValues2(responseData);
    >> >
    >> >string postData =
    >> >String.Format("web.fixed_values={0}&web.fixed_values={1}&ACTION=Login&USER={2}&PASS={3}&cookie_age={4}",fixedvalue1,fixedvalue2,userName,
    >> >password, cookieAge);
    >> >
    >> >// have a cookie container ready to receive the forms auth cookie
    >> >CookieContainer cookies = new CookieContainer();
    >> >
    >> >// now post to the login form
    >> >webRequest = WebRequest.Create(LOGIN_URL) as HttpWebRequest;
    >> >webRequest.Method = "POST";
    >> >webRequest.ContentType = "application/x-www-form-urlencoded";
    >> >webRequest.CookieContainer = cookies;
    >> >
    >> >// write the form values into the request message
    >> >StreamWriter requestWriter = new
    >> >StreamWriter(webRequest.GetRequestStream());
    >> >requestWriter.Write(postData);
    >> >requestWriter.Close();
    >> >
    >> >// we don't need the contents of the response, just the cookie it issues
    >> >webRequest.GetResponse().Close();
    >> >
    >> >// now we can send our cookie along with a request for the protected page
    >> >webRequest =
    >> >WebRequest.Create("http://augustachronicle.com/stories/112404/usc_FBC--SpurrierProfile.shtml")
    >> >as HttpWebRequest;
    >> >webRequest.CookieContainer = cookies;
    >> >responseReader = new
    >> >StreamReader(webRequest.GetResponse().GetResponseStream());
    >> >
    >> >// and read the response
    >> >responseData = responseReader.ReadToEnd();
    >> >responseReader.Close();
    >> >
    >> >Response.Write(responseData);
    >> >}
    >> >catch (Exception ex)
    >> >{
    >> >Response.Write(ex.ToString());
    >> >}
    >> >}
    >> >
    >> >private string ExtractFixedValues1(string s)
    >> >{
    >> >string viewStateNameDelimiter = "web.fixed_values";
    >> >string valueDelimiter = "value=\"";
    >> >
    >> >int viewStateNamePosition = s.IndexOf(viewStateNameDelimiter);
    >> >int viewStateValuePosition = s.IndexOf(
    >> >valueDelimiter, viewStateNamePosition
    >> >);
    >> >
    >> >int viewStateStartPosition = viewStateValuePosition +
    >> >valueDelimiter.Length;
    >> >int viewStateEndPosition = s.IndexOf("\"", viewStateStartPosition);
    >> >
    >> >return HttpUtility.UrlEncodeUnicode(
    >> >s.Substring(viewStateStartPosition,
    >> > viewStateEndPosition - viewStateStartPosition
    >> >)
    >> >);
    >> >}
    >> >
    >> >
    >> >private string ExtractFixedValues2(string s)
    >> >{
    >> >string viewStateNameDelimiter = "web.fixed_values";
    >> >string valueDelimiter = "value=\"";
    >> >
    >> >int viewStateNamePosition = s.IndexOf(viewStateNameDelimiter);
    >> >int viewStateValuePosition = s.IndexOf(valueDelimiter,
    >> >viewStateNamePosition
    >> > );
    >> >
    >> >int viewStateStartPosition = viewStateValuePosition +
    >> >valueDelimiter.Length;
    >> >int viewStateEndPosition = s.IndexOf("\"", viewStateStartPosition);
    >> >
    >> >string sTemp = s.Remove(0,viewStateEndPosition);
    >> >
    >> >viewStateNamePosition = sTemp.IndexOf(viewStateNameDelimiter);
    >> >viewStateValuePosition = sTemp.IndexOf(
    >> >valueDelimiter, viewStateNamePosition
    >> >);
    >> >
    >> >viewStateStartPosition = viewStateValuePosition +
    >> >valueDelimiter.Length;
    >> >viewStateEndPosition = sTemp.IndexOf("\"", viewStateStartPosition);
    >> >
    >> >return HttpUtility.UrlEncodeUnicode(
    >> >sTemp.Substring(
    >> >viewStateStartPosition,
    >> >viewStateEndPosition - viewStateStartPosition
    >> >)
    >> >);
    >> >}
    >> >
    >> >
    >> >Scott Allen <bitmask@[nospam].fred.net> wrote in message
    >> >news:<>...
    >> >> Thanks, Joe. I appreciate the feedback.
    >> >>
    >> >> --
    >> >> Scott
    >> >> http://www.OdeToCode.com/blogs/scott/
    >> >>
    >> >> On Wed, 24 Nov 2004 20:48:24 -0500, "Joe Fallon"
    >> >> <> wrote:
    >> >>
    >> >> >Scott,
    >> >> >FYI - that was one of the best articles on the subject I ever read.
    >> >> >I was completely stuck on this issue about 6 months ago and I
    >> >> >implemented it
    >> >> >straight away using the concepts you presented here.
    >> >> >
    >> >> >Excellent work and explanation.
     
    Wayne Brantley, Nov 29, 2004
    #10
  11. n8

    Scott Allen Guest

    One I've used with success is Fiddler.

    http://www.fiddlertool.com/fiddler/
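
    A .NET program does not always show up in Fiddler on its own, so it can
    help to point the request at the Fiddler proxy explicitly. A minimal
    sketch, assuming Fiddler is listening on 127.0.0.1:8888 (I believe that
    is the default, but check the port in Fiddler's options). The class
    name, method name, and user-agent string are made up for illustration:

```csharp
using System;
using System.Net;

public static class FiddlerTrace
{
    // Assumed Fiddler listener address; adjust if Fiddler is configured differently.
    public const string FiddlerProxy = "http://127.0.0.1:8888";

    // Builds a request that will be routed through the local proxy once it is sent.
    // Nothing goes over the wire until GetResponse() or GetRequestStream() is called.
    public static HttpWebRequest CreateTracedRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Proxy = new WebProxy(FiddlerProxy);
        // Illustrative IE-style user agent, not a value taken from the site.
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
        return request;
    }
}
```

    Send the request as usual and the session should appear in Fiddler's
    list, where it can be compared side by side with the one the browser
    produces.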

    --
    Scott
    http://www.OdeToCode.com/blogs/scott/

    On 28 Nov 2004 13:36:40 -0800, (n8) wrote:

    >Scott,
    >
    >Thanks for the information. I added a User-Agent header to make it look like
    >IE, but I still get the 405 Method Not Allowed error message. What is
    >the best way to monitor the HTTP traffic between my application and
    >the remote site? Are there any tools I can download to show me what
    >is going back and forth?
    >
    >Thanks in advance,
    >
    >n8
    >
     
    Scott Allen, Nov 29, 2004
    #11
  12. n8

    n8 Guest

    Scott,

    I loaded the Fiddler tool and traced the HTTP traffic. Everything
    getting sent looks no different than when I go directly to it. Am I
    to assume that they have a way of blocking screen scrapes and, if so,
    how would I explain this?

    Thanks,

    n8

    Scott Allen <bitmask@[nospam].fred.net> wrote in message news:<>...
    > One I've used with success is Fiddler.
    >
    > http://www.fiddlertool.com/fiddler/
    >
    > --
    > Scott
    > http://www.OdeToCode.com/blogs/scott/
    >
     
    n8, Nov 29, 2004
    #12
  13. n8

    Scott Allen Guest

    Hmm - I'm running out of ideas n8.

    I know there are sites out there blocking scrapers, but they usually
    either block an IP or use client side script and DHTML to try to screw
    up programs. If your app is sending the same traffic as the browser
    that wouldn't be an issue.

    So, my last idea is this:

    Last year I had a site that would occasionally reject web requests
    from a screen scraping program. It was in a loop moving through a
    paged result set, and I couldn't figure out the random failures. On a
    whim I put in a few Thread.Sleep calls to slow the scraper down
    between requests and it never failed again. I'm not sure if they monitored
    requests by IP to only allow so many per second or minute or what,
    but it was definitely timing related.
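
    Roughly what that looked like, as a sketch rather than the original
    code. The class, method, and delegate names are made up, and the delay
    is a guess you would have to tune for the site in question:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PoliteScraper
{
    // Calls fetch(url) for each URL in order, sleeping delayMs between
    // consecutive calls. fetch stands in for whatever performs the real
    // HTTP request (a hypothetical delegate, so the loop can be tested offline).
    public static List<string> FetchAll(IEnumerable<string> urls,
                                        Func<string, string> fetch,
                                        int delayMs)
    {
        List<string> pages = new List<string>();
        bool first = true;
        foreach (string url in urls)
        {
            if (!first)
            {
                Thread.Sleep(delayMs); // pause before every request except the first
            }
            first = false;
            pages.Add(fetch(url));
        }
        return pages;
    }
}
```

    Passing the fetch step in as a delegate keeps the throttling separate
    from the request code, so the same loop works whether the pages come
    from HttpWebRequest or WebClient.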

    I guess the only other thing I'd do is really double check those HTTP
    payloads and make sure everything matches - the headers, the POST data
    is properly encoded, the cookie is sent, etc. etc.
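
    Two things I would check first. A 405 generally means the URL itself
    does not accept the method, and the first post in this thread says
    login.shtml posts to another URL, so it is worth making sure the POST
    goes to the form's action and not to the login page. The other is that
    every field, including the user name and password, is URL-encoded. A
    minimal sketch of both, with made-up class and method names; the regex
    assumes a quoted action attribute and is not a real HTML parser:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class LoginPost
{
    // Returns the absolute URL that the first <form> on the page submits to.
    // Regex parsing of HTML is fragile; this assumes a quoted action attribute.
    public static Uri ResolveFormAction(string html, Uri pageUrl)
    {
        Match m = Regex.Match(html,
            "<form\\b[^>]*\\baction\\s*=\\s*[\"']([^\"']*)[\"']",
            RegexOptions.IgnoreCase);
        // A form without an action attribute posts back to the page itself.
        return m.Success ? new Uri(pageUrl, m.Groups[1].Value) : pageUrl;
    }

    // Builds an application/x-www-form-urlencoded body, escaping every name
    // and value. Repeated names (such as two web.fixed_values fields) are kept.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        StringBuilder body = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (body.Length > 0) body.Append('&');
            body.Append(Uri.EscapeDataString(field.Key));
            body.Append('=');
            body.Append(Uri.EscapeDataString(field.Value));
        }
        return body.ToString();
    }
}
```

    Note that Uri.EscapeDataString writes a space as %20 where a browser
    form post would write +. Servers normally accept either, but if you are
    matching a Fiddler capture byte for byte, that difference will show up.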

    HTH!

    --
    Scott
    http://www.OdeToCode.com/blogs/scott/

    On 29 Nov 2004 11:42:48 -0800, (n8) wrote:

    >Scott,
    >
    >I loaded the Fiddler tool and traced the HTTP traffic. Everything
    >getting sent looks no different than when I go directly to it. Am I
    >to assume that they have a way of blocking screen scrapes and, if so,
    >how would I explain this?
    >
    >Thanks,
    >
    >n8
    >
     
    Scott Allen, Nov 29, 2004
    #13
  14. n8

    n8 Guest

    A different approach: since I have been banging my head against the
    wall with this one, I thought I would try another. I thought I
    would create the cookies that the site requires for the user account
    on the fly, and everything would be great. I can create the cookies
    exactly, BUT if I set the Domain property the cookie does not get
    written; if I leave the property out, the cookie gets written as
    localhost. How do I get around this so I can set the Domain property?

    Thanks again,

    n8

    Scott Allen <bitmask@[nospam].fred.net> wrote in message news:<>...
    > Hmm - I'm running out of ideas n8.
    >
    > I know there are sites out there blocking scrapers, but they usually
    > either block an IP or use client side script and DHTML to try to screw
    > up programs. If your app is sending the same traffic as the browser
    > that wouldn't be an issue.
    >
    > So, my last idea is this:
    >
    > Last year I had a site that would occasionally reject web requests
    > from a screen scraping program. It was in a loop moving through a
    > paged result set, and I couldn't figure out the random failures. On a
    > whim I put in a few Thread.Sleep calls to slow the scraper down
    > between requests and it never failed again. I'm not sure if they monitored
    > requests by IP to only allow so many per second or minute or what,
    > but it was definitely timing related.
    >
    > I guess the only other thing I'd do is really double check those HTTP
    > payloads and make sure everything matches - the headers, the POST data
    > is properly encoded, the cookie is sent, etc. etc.
    >
    > HTH!
    >
    > --
    > Scott
    > http://www.OdeToCode.com/blogs/scott/
    >
     
    n8, Nov 30, 2004
    #14
  15. n8

    Scott Allen Guest

    I remember trying a similar approach once, but I believe it is a
    security feature that doesn't let us create a cookie for another
    domain. The IE ActiveX control wouldn't let me pass cookies in at all
    programmatically. Argh.
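
    That restriction is on the browser side: a browser will not store a
    cookie whose domain does not match the site that sent it. If the fetch
    happens on the server with HttpWebRequest, though, I believe you can put
    a cookie for any domain into the CookieContainer yourself. A minimal
    sketch; the class and method names are made up, and the cookie name and
    value are placeholders, not the real ones the site issues:

```csharp
using System;
using System.Net;

public static class ManualCookies
{
    // Creates a container holding a hand-built cookie scoped to the given domain.
    // "session_id" and its value are placeholders for illustration only.
    public static CookieContainer BuildContainer(string domain)
    {
        CookieContainer cookies = new CookieContainer();
        cookies.Add(new Cookie("session_id", "placeholder-value", "/", domain));
        return cookies;
    }

    // Attaches the container to a request; nothing is sent until GetResponse().
    public static HttpWebRequest CreateRequest(string url, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        return request;
    }
}
```

    This only helps when your own code makes the request and relays the
    page. It does not get a cookie for the other site into the user's
    browser.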

    --
    Scott
    http://www.OdeToCode.com/blogs/scott/

    On 30 Nov 2004 07:51:33 -0800, (n8) wrote:

    >A different approach: since I have been banging my head against the
    >wall with this one, I thought I would try another. I thought I
    >would create the cookies that the site requires for the user account
    >on the fly, and everything would be great. I can create the cookies
    >exactly, BUT if I set the Domain property the cookie does not get
    >written; if I leave the property out, the cookie gets written as
    >localhost. How do I get around this so I can set the Domain property?
    >
    >Thanks again,
    >
    >n8
    >
     
    Scott Allen, Nov 30, 2004
    #15