ScreenScraping and Viewstate

Guest

I'm writing a screen scraper in Visual Basic .NET that scrapes an ASP.NET
website. I've used a tool that echoes what my browser submits to the website
and what my scraper submits to the website. The submissions are identical
EXCEPT for the viewstate. I'm having a horrible time finding the right
encoding.

I can successfully parse the viewstate from the page. My parsed result
contains lots of + signs and ends with two = signs. When looking at the
browser submission, I see that these have been changed to %2B and %3D
respectively. I've tried running this viewstate string through the
HttpUtility.UrlEncodeUnicode method but no luck; my results still do not match
the web browser submission. Instead, UrlEncodeUnicode changes the + and = to
lowercase %2b and %3d.

Can someone explain the encoding to me? When I check View > Encoding in IE
for the page I'm trying to scrape, the encoding is set to UTF-8.
Am I correct in thinking that ALL I have to do is parse the viewstate, encode
it properly, and send it right back to the server?

There are no cookies involved on this site. Thanks.

Rob Reagan
(e-mail address removed)
 
bruce barker

When you scrape the page, the viewstate value is HTML-encoded, so you must
first HTML-decode it. When you post the viewstate value back, it must be
URL-encoded.

-- bruce (sqlwork.com)
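
The two steps described above (HTML-decode after scraping, URL-encode before posting) can be sketched roughly as follows. This is an illustrative sketch in Python, not code from the thread: the helper names `extract_viewstate` and `build_form_body` are made up for the example, and the regular expression assumes the hidden field is written with `name` before `value` and double-quoted attributes. In .NET, the corresponding calls would be `HttpUtility.HtmlDecode` and `HttpUtility.UrlEncode`. Note also that the hex digits in percent-escapes are case-insensitive, so `%2b` and `%2B` decode to the same character; the lowercase output by itself is probably not what the server is rejecting.

```python
import html
import re
from urllib.parse import unquote, urlencode


def extract_viewstate(page_html):
    """Pull the __VIEWSTATE value out of a page and HTML-decode it.

    Assumes the field looks like:
        <input type="hidden" name="__VIEWSTATE" ... value="..." />
    """
    match = re.search(
        r'<input[^>]*name="__VIEWSTATE"[^>]*value="([^"]*)"', page_html
    )
    if match is None:
        raise ValueError("no __VIEWSTATE field found")
    # The attribute value is HTML-encoded in the page source.
    return html.unescape(match.group(1))


def build_form_body(fields):
    """URL-encode form fields for an application/x-www-form-urlencoded POST."""
    # urlencode percent-escapes '+', '/' and '=' in each value.
    return urlencode(fields)


if __name__ == "__main__":
    # Hypothetical page fragment, for illustration only.
    page = '<input type="hidden" name="__VIEWSTATE" value="dDw+abc/12==" />'
    viewstate = extract_viewstate(page)
    print(build_form_body({"__VIEWSTATE": viewstate}))
    # Lowercase and uppercase escapes decode to the same character.
    print(unquote("%2b") == unquote("%2B"))
```

The remaining form fields the browser sends (any other hidden inputs and the submit button's name/value pair) would go into the same dictionary, so the whole body is encoded in one pass.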



 
