Web recognition

Nathan

Hello,
It's not really related to code, and more related to an algorithm
(which will be implemented in Perl).
My problem is as follows: given a website, for example http://www.nokia.com,
how can I really determine whether it's the manufacturer's site (Nokia's
official site...) or not? (For example, http://www.nokia-fans.com is
not, assuming there is something like that...)

The real problem arises when the manufacturer name does NOT
correspond to the site name, for example manufacturer name YTXT and
website http://www.XXX.co.uk

Any ideas?
 
Justin C

You will not be able to do this with code; it's hard enough to do it
manually. Even if you check with the domain registrar, there is no
certainty that the domain nokia.com is owned by the company Nokia. It
could be owned by a holding company or handled by a marketing company
on behalf of Nokia. There is, therefore, nothing concrete that any
algorithm could test.

Justin.
 
Peter J. Holzer

This cannot be automated because it needs real-world knowledge. You can
inspect whois data or (if the site uses https) the SSL certificate. But
that will tell you only that the domain is registered to:

Nokia Corporation
Nokia Corporation
P.O.Box 226 Nokia Group
- - 00045
FI

Whether the "Nokia Corporation" which has rented a certain postal box in
Finland is the manufacturer of rubber boots you are looking for is
something only you can decide. There may be several Nokia Corporations
in Finland (ok, there probably aren't, but let's assume you are looking
for a "John Smith" in New York ...).
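
For what it's worth, the certificate inspection mentioned above could be sketched roughly like this in Python (the original poster plans to use Perl; Python is used here only for illustration). The function names and the sample values in the usage note are hypothetical. Many certificates only validate the domain and carry no organization field at all, so a missing value proves nothing either way.

```python
import socket
import ssl


def fetch_certificate(host, port=443, timeout=10.0):
    """Connect over TLS and return the server certificate as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def subject_fields(cert):
    """Flatten the nested 'subject' tuples of a getpeercert() dict."""
    fields = {}
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            fields[key] = value
    return fields


def registered_organization(cert):
    """Return the subject's organizationName, or None if it is absent."""
    return subject_fields(cert).get("organizationName")
```

Calling `registered_organization(fetch_certificate("www.nokia.com"))` would return whatever organization name the certificate states, if any. Deciding whether that organization is the manufacturer you mean is still up to a human.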

hp
 
Jürgen Exner

How do _you_ define "real manufacturer web site"? Manufacturer of what?
Maybe Nokia-fans is a legitimate business, too, and has created and is
marketing their own products, maybe related to Nokia, maybe not. Then
which one is the correct "manufacturer" web site? Now how do _you_
know?

jue
 
Nathan

Jürgen Exner said:
How do _you_ define "real manufacturer web site"? Manufacturer of what?
Maybe Nokia-fans is a legitimate business, too, and has created and is
marketing their own products, maybe related to Nokia, maybe not. Then
which one is the correct "manufacturer" web site?  Now how do _you_
know?

jue

First of all, thank you all for replying.
Secondly, I consider a web site a manufacturer website if and only
if it's the official site.
You folks already gave some pointers which I will go and try. Of course,
I'm not looking for 100% accuracy, but 90-95% would meet my
expectations.
 
Ted Zlatanov

None of the following are exact but they may be useful to you, depending
on your purpose.

You can look at search engine rankings. Google's ranking may be
relevant, since it reflects how well-liked the site is and tries to
rank "manufacturer's" websites higher for the manufacturer's keyword.

You could also crawl the site and run a statistical spam filter against
it. Compare the results to known legitimate sites and known unofficial
sites in the same language.
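
As a rough illustration of the statistical-filter idea, here is a minimal naive Bayes text classifier in Python (the original poster plans to use Perl; Python is used here only for illustration). This is a sketch under assumptions, not any particular spam filter's algorithm: the class name, the labels and the training texts in the usage note are all made up, and the crawling step is left out entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocabulary = set()      # every token seen during training

    def train(self, label, text):
        tokens = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocabulary.update(tokens)

    def log_score(self, label, text):
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab_size = len(self.vocabulary)
        # Log prior: share of training documents carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokenize(text):
            score += math.log((counts[token] + 1) / (total + vocab_size))
        return score

    def classify(self, text):
        """Return the label with the highest log score for the text."""
        return max(self.word_counts, key=lambda label: self.log_score(label, text))
```

You would call `train("official", page_text)` and `train("unofficial", page_text)` on text from sites you have already labelled by hand, then `classify(new_page_text)` on the candidate. How well this separates official from unofficial sites depends entirely on the training set.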

Ted
 
Jürgen Exner

Nathan said:
First of all, thank you all for replying.
Secondly, I consider a web site a manufacturer website if and only
if it's the official site.
You folks already gave some pointers which I will go and try. Of course,
I'm not looking for 100% accuracy, but 90-95% would meet my
expectations.

OK, let me put it more bluntly: how is the program supposed to know
that a given item has been manufactured by Nokia and not by Nokia-fans?

jue
 
Peter J. Holzer

Jürgen Exner said:
OK, let me put it more bluntly: how is the program supposed to know
that a given item has been manufactured by Nokia and not by Nokia-fans?

And for the sake of the argument assume that there is a company called
"Nokia Fans" which produces fans, turbines and propellers.

hp
 
smallpond

Peter J. Holzer said:
And for the sake of the argument assume that there is a company called
"Nokia Fans" which produces fans, turbines and propellers.

hp

Also, there are supporters of that company called Nokia Fan fans
who sell mugs with the Nokia Fan Fan logo from their official website
which is hosted by eBay.

I think the best bet is to look up the site in one of the directories
like dir.yahoo.com. If the site domain matches the company listing,
then you have it. For example, if you look up MySQL, it will give you
a link to www.sun.com.
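
The domain comparison could be sketched as below in Python (the original poster plans to use Perl; Python is used here only for illustration). It assumes you have already obtained the listed URL from the directory by some means. The function names are hypothetical, and the comparison is deliberately naive: it only ignores case and a leading "www.".

```python
from urllib.parse import urlsplit


def normalized_host(url):
    """Lower-case the hostname of a URL and drop a leading 'www.' label."""
    # Bare host names such as "www.sun.com" need a "//" prefix to parse.
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    host = host.lower().rstrip(".")
    return host[4:] if host.startswith("www.") else host


def matches_listing(candidate_url, listed_url):
    """True if the candidate site has the same host as the directory listing."""
    candidate = normalized_host(candidate_url)
    listed = normalized_host(listed_url)
    return bool(candidate) and candidate == listed
```

For example, `matches_listing("http://www.nokia-fans.com", "http://www.nokia.com")` is false, because the two hosts differ.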
 
Wanna-Be Sys Admin

There's no way to do this automatically without verifying it manually
first; it has to be determined by someone at some point. This is
not a Perl question whatsoever.
 
sln

Nathan said:
First of all, thank you all for replying.
Secondly, I consider a web site a manufacturer website if and only
if it's the official site.
You folks already gave some pointers which I will go and try. Of course,
I'm not looking for 100% accuracy, but 90-95% would meet my
expectations.

I think what you need is to hack the domain registrars' databases and
do a heuristic comparison of corporate addresses gleaned from info
gathered from the registered stock symbols of known companies.
Create your own database, set to do weekly updates.

Somewhere in between all this, as the reliability approaches 100%
(never reaching it, of course), your custom database will shrink.

This is the if/if/if/if..., think-outside-the-box method.
Another absolute 99.99% method is to be some resource compactor
like Bill Gates.

-sln
 
Keith Thompson

Nathan said:
It's not really related to code, and more related to an algorithm
(which will be implemented in Perl). My problem is as follows: given
a website, for example http://www.nokia.com, how can I really
determine whether it's the manufacturer's site (Nokia's official
site...) or not?
[...]

Yeah, that's us. :cool:}
 
