Spider and get tag information of one web page


discountonall

Hi all,
I would like to know if anyone has a code sample for this.
Let's say, for example:
http://shopping.yahoo.com/search;_y...zZWFyY2g-?p=+friendship+roses+&did=&x=51&y=10

As you can see, there are a lot of items.
I need to be able to get the image link, navigate url, price,
description etc. of each item and then store them in a database.

I know that there is a way of searching the HTML code and returning
values (but I don't know how).
Any help would be appreciated.
Thank you,
 

Guest

I know that there is a way of searching in the html code and return
values (but don't know how)

Use Regular Expressions.
More info: http://www.google.com/search?hl=en&q=regular+expressions+asp.net

In your case, you should get the page's HTML as text and parse it using patterns.

Here's the complete pattern to get the link, name, description and
price:

(?<=\<h2\>\<a\shref=\")
(?<url>(.|\n)*?)(\"\>)(?<name>(.|\n)*?)(\<\/a\></h2\>\n\<br\/\>)
(?<description>(.|\n)*?)(\n)
(.|\n)*?
(\<span\sclass\=\"price\"\>)(?<price>.*?)(\<\/span\>)

Note: in the code, the whole pattern has to be on one line.

Here's an example of the code:

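using System.Text.RegularExpressions; // at the top of the file

// t holds the HTML downloaded from the Yahoo! Shopping results page
string t = "html_from_yahoo";
// Use a verbatim string (@"...") so the backslashes reach the regex engine
// unchanged; inside it, write each literal " as "". Paste the full pattern here.
string e = @"(?<=\<h2\>............(\<\/span\>)";

Regex r = new Regex(e, RegexOptions.Compiled);
MatchCollection matches = r.Matches(t);

foreach (Match m in matches)
{
    Response.Write("name=" + m.Groups["name"].Value + "<br/>");
    Response.Write("description=" + m.Groups["description"].Value + "<br/>");
    Response.Write("url=" + m.Groups["url"].Value + "<br/>");
    Response.Write("price=" + m.Groups["price"].Value + "<br/>");
}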
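If you want to try the pattern outside a web page first, here is a self-contained console sketch of the same approach. It assumes the result markup looks like the sample HTML below; that HTML, the Item record and the ItemParser class are made up for illustration, not taken from Yahoo's real page. It passes RegexOptions.IgnorePatternWhitespace, which lets the pattern stay split over several lines, and collects each match into an object you could then insert into your database.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical record for one search result; field names are illustrative only.
record Item(string Url, string Name, string Description, string Price);

static class ItemParser
{
    // The pattern from the post, split over several lines. IgnorePatternWhitespace
    // makes the engine ignore the line breaks and indentation inside it.
    // In a verbatim string, \ is literal and "" stands for one " character.
    const string Pattern = @"
        (?<=\<h2\>\<a\shref=\"")
        (?<url>(.|\n)*?)(\""\>)(?<name>(.|\n)*?)(\<\/a\></h2\>\n\<br\/\>)
        (?<description>(.|\n)*?)(\n)
        (.|\n)*?
        (\<span\sclass\=\""price\""\>)(?<price>.*?)(\<\/span\>)";

    public static List<Item> Parse(string html)
    {
        var items = new List<Item>();
        var r = new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);
        foreach (Match m in r.Matches(html))
        {
            items.Add(new Item(m.Groups["url"].Value, m.Groups["name"].Value,
                               m.Groups["description"].Value, m.Groups["price"].Value));
        }
        return items;
    }

    static void Main()
    {
        // Made-up markup shaped the way the pattern expects. Joined with "\n"
        // because the pattern needs a bare \n (not \r\n) after </h2>.
        string html = string.Join("\n",
            "<h2><a href=\"http://example.com/item1\">Friendship Roses</a></h2>",
            "<br/>A dozen yellow roses",
            "<div><span class=\"price\">$39.99</span></div>",
            "<h2><a href=\"http://example.com/item2\">Rose Bouquet</a></h2>",
            "<br/>Mixed roses in a vase",
            "<span class=\"price\">$24.50</span>");

        foreach (Item item in Parse(html))
        {
            // This is where each Item could be written to a database table.
            Console.WriteLine($"{item.Name} | {item.Url} | {item.Description} | {item.Price}");
        }
    }
}
```

Keep in mind that the pattern depends on the exact markup (including the line break after </h2>), so if the page uses \r\n line endings or changes its HTML, the pattern has to be adjusted.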

Hope it helps
 

discountonall

I have the full string of the page.
I would like to know the syntax, for example, to find the whole string from <table class="item_table"
until the next one and return it as a string.
 

Guest

On May 9, 5:37 am, "(e-mail address removed)" wrote:
I have the full string of the page.
I would like to know the syntax, for example, to find the whole string from <table class="item_table"
until the next one and return it as a string.

I guess something similar to the following:

(\<table\sclass\=\"item_table\")(.|\n)*?(?=\<table\sclass\=\"item_table\")
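Here is a small C# sketch of that idea, run against a made-up HTML string (the TableSplitter class and the sample markup are hypothetical). It is the same pattern with the optional backslashes dropped, plus one change of my own: the lookahead also accepts \z (end of input). The pattern above only matches a table that is followed by another item_table, so the last one on the page would be missed. With \z the last chunk runs to the end of the input, so it also contains whatever follows the final table.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TableSplitter
{
    // From <table class="item_table" up to (not including) the next such tag,
    // or up to the end of the input (\z) for the last table.
    // Verbatim string: "" stands for one " character.
    const string Pattern = @"<table\sclass=""item_table""(.|\n)*?(?=<table\sclass=""item_table""|\z)";

    public static List<string> Split(string html)
    {
        var chunks = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern))
        {
            chunks.Add(m.Value); // the whole matched text of one table
        }
        return chunks;
    }

    static void Main()
    {
        // Made-up markup for illustration only.
        string html = "<p>header</p>"
            + "<table class=\"item_table\"><tr><td>Item 1</td></tr></table>\n"
            + "<table class=\"item_table\"><tr><td>Item 2</td></tr></table>\n"
            + "<p>footer</p>";

        foreach (string chunk in Split(html))
        {
            Console.WriteLine(chunk);
            Console.WriteLine("----");
        }
    }
}
```

Each returned string can then be searched with the item pattern from earlier in the thread.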