Web recognition

Discussion in 'Perl Misc' started by Nathan, Jan 13, 2010.

  1. Nathan

    Nathan Guest

    Hello,
    It's not really related to code, and more related to an algorithm
    (which will be implemented in Perl).
    My problem is as follows: given a website, for example http://www.nokia.com,
    how can I really determine whether it's the manufacturer's site (Nokia's
    official site...) or not? (For example, http://www.nokia-fans.com is
    not, assuming there is something like that...)

    The real problem arises when the manufacturer name does NOT
    correspond to the site name, for example, manufacturer name: YTXT, and
    website http://www.XXX.co.uk

    Any ideas?
     
    Nathan, Jan 13, 2010
    #1

  2. Nathan

    Justin C Guest

    On 2010-01-13, Nathan <> wrote:
    > Hello,
    > It's not really related to code, and more related to an algorithm
    > (which will be implemented in Perl).
    > My problem is as follows: given a website, for example http://www.nokia.com,
    > how can I really determine whether it's the manufacturer's site (Nokia's
    > official site...) or not? (For example, http://www.nokia-fans.com is
    > not, assuming there is something like that...)
    >
    > The real problem arises when the manufacturer name does NOT
    > correspond to the site name, for example, manufacturer name: YTXT, and
    > website http://www.XXX.co.uk
    >
    > Any ideas?


    You will not be able to do this with code; it's hard enough to do it
    manually. Even if you check with the domain registrar, there is no
    certainty that the domain nokia.com is owned by the company Nokia: it
    could be owned by a holding company or handled by a marketing company
    on behalf of Nokia. There is, therefore, nothing concrete that any
    algorithm could test.

    Justin.

    --
    Justin C, by the sea.
     
    Justin C, Jan 13, 2010
    #2

  3. On 2010-01-13 11:04, Nathan <> wrote:
    > It's not really related to code, and more related to an algorithm
    > (which will be implemented in Perl).
    > My problem is as follows: given a website, for example http://www.nokia.com,
    > how can I really determine whether it's the manufacturer's site (Nokia's
    > official site...) or not? (For example, http://www.nokia-fans.com is
    > not, assuming there is something like that...)


    This cannot be automated because it needs real-world knowledge. You can
    inspect whois data or (if the site uses https) the SSL certificate. But
    that will tell you only that the domain is registered to:

    Nokia Corporation
    Nokia Corporation
    P.O.Box 226 Nokia Group
    - - 00045
    FI

    Whether the "Nokia Corporation" which has rented a certain postal box in
    Finland is the manufacturer of rubber boots you are looking for is
    something only you can decide. There may be several Nokia Corporations
    in Finland (ok, there probably aren't, but let's assume you are looking
    for a "John Smith" in New York ...).

    hp
     
    Peter J. Holzer, Jan 13, 2010
    #3
  4. Nathan <> wrote:
    >It's not really related to code, and more related to an algorithm
    >(which will be implemented in Perl).
    >My problem is as follows: given a website, for example http://www.nokia.com,
    >how can I really determine whether it's the manufacturer's site (Nokia's
    >official site...) or not? (For example, http://www.nokia-fans.com is
    >not, assuming there is something like that...)
    >
    >The real problem arises when the manufacturer name does NOT
    >correspond to the site name, for example, manufacturer name: YTXT, and
    >website http://www.XXX.co.uk


    How do _you_ define "real manufacturer web site"? Manufacturer of what?
    Maybe Nokia-fans is a legitimate business, too, and has created and is
    marketing their own products, maybe related to Nokia, maybe not. Then
    which one is the correct "manufacturer" web site? Now how do _you_
    know?

    jue
     
    Jürgen Exner, Jan 13, 2010
    #4
  5. Nathan

    Nathan Guest

    On Jan 13, 3:36 pm, Jürgen Exner <> wrote:
    > Nathan <> wrote:
    > >It's not really related to code, and more related to an algorithm
    > >(which will be implemented in Perl).
    > >My problem is as follows: given a website, for example http://www.nokia.com,
    > >how can I really determine whether it's the manufacturer's site (Nokia's
    > >official site...) or not? (For example, http://www.nokia-fans.com is
    > >not, assuming there is something like that...)
    >
    > >The real problem arises when the manufacturer name does NOT
    > >correspond to the site name, for example, manufacturer name: YTXT, and
    > >website http://www.XXX.co.uk
    >
    > How do _you_ define "real manufacturer web site"? Manufacturer of what?
    > Maybe Nokia-fans is a legitimate business, too, and has created and is
    > marketing their own products, maybe related to Nokia, maybe not. Then
    > which one is the correct "manufacturer" web site?  Now how do _you_
    > know?
    >
    > jue


    First of all, thank you all for replying.
    Secondly, I consider a website a manufacturer website if and only
    if it's the official site.
    You folks already gave some pointers that I will go and try. Of course
    I'm not looking for 100% accuracy, but 90-95% would meet my
    expectations.
     
    Nathan, Jan 13, 2010
    #5
  6. Nathan

    Ted Zlatanov Guest

    On Wed, 13 Jan 2010 03:04:53 -0800 (PST) Nathan <> wrote:

    N> It's not really related to code, and more related to an algorithm
    N> (which will be implemented in Perl).
    N> My problem is as follows: given a website, for example http://www.nokia.com,
    N> how can I really determine whether it's the manufacturer's site (Nokia's
    N> official site...) or not? (For example, http://www.nokia-fans.com is
    N> not, assuming there is something like that...)

    N> The real problem arises when the manufacturer name does NOT
    N> correspond to the site name, for example, manufacturer name: YTXT, and
    N> website http://www.XXX.co.uk

    None of the following are exact but they may be useful to you, depending
    on your purpose.

    You can look at search engine rankings. Google's ranking may be
    relevant, since it qualifies how well-liked the site is and tries to
    rank "manufacturer's" websites higher on the manufacturer's keyword.

    You could also crawl the site and run a statistical spam filter against
    it. Compare the results to known legitimate sites and known unofficial
    sites in the same language.
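
A minimal sketch of the statistical-filter idea above, written in Python purely for illustration (the same logic ports to Perl). It assumes you have already crawled some pages and labelled them by hand; the class name, the labels "official" and "unofficial", and the tiny training texts are all made up for the example, and a real filter would need far more data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training pages

    def train(self, label, text):
        self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.doc_counts[label] += 1

    def classify(self, text):
        # The vocabulary is every token seen during training, in any label.
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled training pages (invented for this sketch).
nb = NaiveBayes()
nb.train("official", "investor relations press releases careers corporate governance annual report")
nb.train("official", "products support warranty corporate press careers")
nb.train("unofficial", "fan forum unofficial community reviews rumors blog")
nb.train("unofficial", "unofficial fan site forum wallpapers downloads")

print(nb.classify("press releases and investor relations"))
```

The classifier only says which labelled group a page resembles; it cannot confirm ownership, so the result is a hint to combine with other signals.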

    Ted
     
    Ted Zlatanov, Jan 13, 2010
    #6
  7. Nathan <> wrote:
    >On Jan 13, 3:36 pm, Jürgen Exner <> wrote:
    >> Nathan <> wrote:
    >> >It's not really related to code, and more related to an algorithm
    >> >(which will be implemented in Perl).
    >> >My problem is as follows: given a website, for example http://www.nokia.com,
    >> >how can I really determine whether it's the manufacturer's site (Nokia's
    >> >official site...) or not? (For example, http://www.nokia-fans.com is
    >> >not, assuming there is something like that...)
    >>
    >> >The real problem arises when the manufacturer name does NOT
    >> >correspond to the site name, for example, manufacturer name: YTXT, and
    >> >website http://www.XXX.co.uk
    >>
    >> How do _you_ define "real manufacturer web site"? Manufacturer of what?
    >> Maybe Nokia-fans is a legitimate business, too, and has created and is
    >> marketing their own products, maybe related to Nokia, maybe not. Then
    >> which one is the correct "manufacturer" web site? Now how do _you_
    >> know?
    >
    >First of all, thank you all for replying.
    >Secondly, I consider a website a manufacturer website if and only
    >if it's the official site.
    >You folks already gave some pointers that I will go and try. Of course
    >I'm not looking for 100% accuracy, but 90-95% would meet my
    >expectations.


    Ok, let me put it more bluntly: how is the program supposed to know
    that a given item has been manufactured by Nokia and not by Nokia-fans?

    jue
     
    Jürgen Exner, Jan 13, 2010
    #7
  8. On 2010-01-13 17:49, Jürgen Exner <> wrote:
    > Nathan <> wrote:
    >>On Jan 13, 3:36 pm, Jürgen Exner <> wrote:
    >>> Nathan <> wrote:
    >>> >It's not really related to code, and more related to an algorithm
    >>> >(which will be implemented in Perl). My problem is as follows:
    >>> >given a website, for example http://www.nokia.com, how can I really
    >>> >determine whether it's the manufacturer's site (Nokia's official
    >>> >site...) or not? (For example, http://www.nokia-fans.com is not,
    >>> >assuming there is something like that...)
    >>>
    >>> >The real problem arises when the manufacturer name does NOT
    >>> >correspond to the site name, for example, manufacturer name: YTXT,
    >>> >and website http://www.XXX.co.uk
    >>>
    >>> How do _you_ define "real manufacturer web site"? Manufacturer of what?
    >>> Maybe Nokia-fans is a legitimate business, too, and has created and is
    >>> marketing their own products, maybe related to Nokia, maybe not. Then
    >>> which one is the correct "manufacturer" web site? Now how do _you_
    >>> know?
    >>
    >>First of all, thank you all for replying.
    >>Secondly, I consider a website a manufacturer website if and only
    >>if it's the official site.
    >>You folks already gave some pointers that I will go and try. Of course
    >>I'm not looking for 100% accuracy, but 90-95% would meet my
    >>expectations.
    >
    > Ok, let me put it more bluntly: how is the program supposed to know
    > that a given item has been manufactured by Nokia and not by Nokia-fans?


    And for the sake of the argument assume that there is a company called
    "Nokia Fans" which produces fans, turbines and propellers.

    hp
     
    Peter J. Holzer, Jan 13, 2010
    #8
  9. Nathan

    smallpond Guest

    On Jan 13, 2:37 pm, "Peter J. Holzer" <> wrote:
    > On 2010-01-13 17:49, Jürgen Exner <> wrote:
    > > Nathan <> wrote:
    > >>On Jan 13, 3:36 pm, Jürgen Exner <> wrote:
    > >>> Nathan <> wrote:
    > >>> >It's not really related to code, and more related to an algorithm
    > >>> >(which will be implemented in Perl). My problem is as follows:
    > >>> >given a website, for example http://www.nokia.com, how can I really
    > >>> >determine whether it's the manufacturer's site (Nokia's official
    > >>> >site...) or not? (For example, http://www.nokia-fans.com is not,
    > >>> >assuming there is something like that...)
    > >>>
    > >>> >The real problem arises when the manufacturer name does NOT
    > >>> >correspond to the site name, for example, manufacturer name: YTXT,
    > >>> >and website http://www.XXX.co.uk
    > >>>
    > >>> How do _you_ define "real manufacturer web site"? Manufacturer of what?
    > >>> Maybe Nokia-fans is a legitimate business, too, and has created and is
    > >>> marketing their own products, maybe related to Nokia, maybe not. Then
    > >>> which one is the correct "manufacturer" web site? Now how do _you_
    > >>> know?
    > >>
    > >>First of all, thank you all for replying.
    > >>Secondly, I consider a website a manufacturer website if and only
    > >>if it's the official site.
    > >>You folks already gave some pointers that I will go and try. Of course
    > >>I'm not looking for 100% accuracy, but 90-95% would meet my
    > >>expectations.
    > >
    > > Ok, let me put it more bluntly: how is the program supposed to know
    > > that a given item has been manufactured by Nokia and not by Nokia-fans?
    >
    > And for the sake of the argument assume that there is a company called
    > "Nokia Fans" which produces fans, turbines and propellers.
    >
    >         hp


    Also, there are supporters of that company called Nokia Fan fans
    who sell mugs with the Nokia Fan Fan logo from their official website
    which is hosted by eBay.

    I think the best bet is to look up the site in one of the directories
    like dir.yahoo.com. If the site domain matches the company listing,
    then you have it. For example, if you look up MySQL it will give you
    a link to www.sun.com.
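
The domain-matching step suggested above could look roughly like this, sketched in Python for illustration. The `DIRECTORY` table is a hypothetical hand-made stand-in for whatever the directory lookup returns (no real directory API is called here), and the function names are invented for the example. The MySQL entry mirrors the case mentioned above; the Nokia entry is an assumption.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for a directory lookup: company name -> listed site.
# Fetching these listings from a real web directory is not shown.
DIRECTORY = {
    "Nokia": "http://www.nokia.com",
    "MySQL": "http://www.sun.com",
}


def host_of(url):
    """Return the lowercase host of a URL, without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url  # urlparse only finds the host after a scheme
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_listed_site(company, candidate_url, directory=DIRECTORY):
    """True if the candidate's host equals the listed host or is a subdomain of it."""
    listing = directory.get(company)
    if listing is None:
        return False  # company not in the directory: nothing to compare against
    listed = host_of(listing)
    candidate = host_of(candidate_url)
    return bool(listed) and (candidate == listed or candidate.endswith("." + listed))


print(is_listed_site("Nokia", "http://www.nokia.com/phones"))
print(is_listed_site("Nokia", "http://www.nokia-fans.com"))
```

Comparing whole host names, rather than searching for the company name inside the URL, is what keeps nokia-fans.com from matching nokia.com. It also handles the case where the company name does not appear in the domain at all, as long as the directory has an entry.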
     
    smallpond, Jan 13, 2010
    #9
  10. Nathan wrote:

    > Hello,
    > It's not really related to code, and more related to an algorithm
    > (which will be implemented in Perl).
    > My problem is as follows: given a website, for example
    > http://www.nokia.com, how can I really determine whether it's the
    > manufacturer's site (Nokia's official site...) or not? (For example,
    > http://www.nokia-fans.com is not, assuming there is something like
    > that...)
    >
    > The real problem arises when the manufacturer name does NOT
    > correspond to the site name, for example, manufacturer name: YTXT, and
    > website http://www.XXX.co.uk
    >
    > Any ideas?


    There's no way to do this automatically without verifying it manually
    first; it's got to be determined by someone, at some point. This is
    not a Perl question whatsoever.
    --
    Not really a wanna-be, but I don't know everything.
     
    Wanna-Be Sys Admin, Jan 13, 2010
    #10
  11. Nathan

    Guest

    On Wed, 13 Jan 2010 05:49:54 -0800 (PST), Nathan <> wrote:

    >On Jan 13, 3:36 pm, Jürgen Exner <> wrote:
    >> Nathan <> wrote:
    >> >It's not really related to code, and more related to an algorithm
    >> >(which will be implemented in Perl).
    >> >My problem is as follows: given a website, for example http://www.nokia.com,
    >> >how can I really determine whether it's the manufacturer's site (Nokia's
    >> >official site...) or not? (For example, http://www.nokia-fans.com is
    >> >not, assuming there is something like that...)
    >>
    >> >The real problem arises when the manufacturer name does NOT
    >> >correspond to the site name, for example, manufacturer name: YTXT, and
    >> >website http://www.XXX.co.uk
    >>
    >> How do _you_ define "real manufacturer web site"? Manufacturer of what?
    >> Maybe Nokia-fans is a legitimate business, too, and has created and is
    >> marketing their own products, maybe related to Nokia, maybe not. Then
    >> which one is the correct "manufacturer" web site? Now how do _you_
    >> know?
    >>
    >> jue
    >
    >First of all, thank you all for replying.
    >Secondly, I consider a website a manufacturer website if and only
    >if it's the official site.
    >You folks already gave some pointers that I will go and try. Of course
    >I'm not looking for 100% accuracy, but 90-95% would meet my
    >expectations.


    I think what you need is to hack a domain registrar's database and
    do a heuristic comparison of corporate addresses gleaned from info
    gathered from registered stock symbols of known companies.
    Create your own database, set to do weekly updates.

    Somewhere in between all this, as the reliability approaches 100%
    (never reaching it, of course), your custom database will shrink.

    This is using the if/if/if/if..., think-outside-the-box method.
    Another absolute 99.99% method is to be some resource compactor
    like Bill Gates.
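
A rough sketch of the "heuristic comparison of corporate addresses" part, in Python for illustration. It assumes the registrant address comes from an ordinary whois lookup and the reference address from your own records; the function names and the second, unrelated address are invented for the example. It uses the standard library's `difflib.SequenceMatcher`, which is only one of many possible similarity measures.

```python
import re
from difflib import SequenceMatcher


def normalize(address):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", address.lower())
    return " ".join(cleaned.split())


def address_similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two addresses."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Registrant address as it appears in the whois record quoted earlier in the thread.
whois_address = "Nokia Corporation, P.O.Box 226 Nokia Group, 00045, FI"
# Hypothetical reference entries from your own company database.
known_address = "nokia corporation p.o.box 226 nokia group 00045 fi"
other_address = "Example Widgets Ltd, 12 Example Street, Springfield"

print(address_similarity(whois_address, known_address))
print(address_similarity(whois_address, other_address))
```

A cutoff on the ratio would have to be tuned against addresses you have verified by hand, and a high score still only says the two strings look alike, not that the registrant is the manufacturer.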

    -sln
     
    , Jan 14, 2010
    #11
  12. Nathan <> writes:
    > It's not really related to code, and more related to an algorithm
    > (which will be implemented in Perl). My problem is as follows: given
    > a website, for example http://www.nokia.com, how can I really
    > determine whether it's the manufacturer's site (Nokia's official
    > site...) or not?

    [...]

    Yeah, that's us. :cool:}

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Jan 14, 2010
    #12