Extract domain name

Discussion in 'Perl Misc' started by Shabam, Nov 12, 2004.

  1. Shabam

    How do you fetch just the domain name part of a variable in a script? The
    variable can be "http://www.domain.com/blahblah/whatever/page.htm" or
    "http://sub.domain.com/blahblah/whatever/page.htm".

    What I need is to extract just the "domain.com".
    Shabam, Nov 12, 2004
    #1

  2. Paul Lalli

    [removed non-existent groups, removed off-topic AOL group, set followups
    to c.l.p.m.]

    "Shabam" <> wrote in message
    news:...
    > How do you fetch just the domain name part of a variable in a script? The
    > variable can be "http://www.domain.com/blahblah/whatever/page.htm" or
    > "http://sub.domain.com/blahblah/whatever/page.htm".
    >
    > What I need is to extract just the "domain.com".


    Try using the Regexp::Common module from CPAN. I seem to recall it has
    patterns for parsing URIs.

    Paul Lalli
    Paul Lalli, Nov 12, 2004
    #2
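
    Paul's suggestion points at Regexp::Common's URI patterns, which are not reproduced here. As a swapped-in, core-Perl alternative (not Regexp::Common's API), the generic URI-splitting regex from RFC 3986, Appendix B, can isolate the authority part of a URL; the minimal sketch below then strips any user info and port to leave the host. The sub name host_of is hypothetical, and IPv6 literals such as [::1] are not handled.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: return the lowercased host of an absolute URL, or undef.
    sub host_of {
        my ($url) = @_;
        # Generic URI splitter from RFC 3986, Appendix B; capture 4 is the authority.
        my @parts = $url =~ m{^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?};
        my $authority = $parts[3];
        return undef unless defined $authority && length $authority;
        $authority =~ s/^.*@//;     # drop "user:password@" if present
        $authority =~ s/:\d*$//;    # drop ":port" if present (IPv6 literals not handled)
        return lc $authority;
    }

    for my $url ("http://www.domain.com/blahblah/whatever/page.htm",
                 "http://sub.domain.com/blahblah/whatever/page.htm") {
        print host_of($url), "\n";
    }
    ```

    This only isolates the host (www.domain.com, sub.domain.com); trimming it down to domain.com is the harder part discussed in the rest of the thread.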

  3. Look at the URI module. IMHO, it's a good and simple tool for parsing URLs:

    use strict; use warnings;
    use URI;
    # host() omits any user info and ":port" that authority() would include
    (my $domain = URI->new("http://www.domain.com/blahblah/whatever/page.htm")->host) =~ s/^www\.//i;


    Regards,
    Andrew

    Shabam wrote on 12 November 2004 16:02:

    > How do you fetch just the domain name part of a variable in a script? The
    > variable can be "http://www.domain.com/blahblah/whatever/page.htm" or
    > "http://sub.domain.com/blahblah/whatever/page.htm".
    >
    > What I need is to extract just the "domain.com".


    --
    Andrew
    Andrew Tkachenko, Nov 12, 2004
    #3
  4. Sorry, I didn't pay attention to the subdomains in your example.
    So, IMHO, it depends on your task: if it lets you guess the possible
    TLD values, then just split the domain name into parts and keep only the
    matched TLD and SLD (second-level domain).

    Regards,
    Andrew

    Ryan Thompson wrote on 12 November 2004 17:38:

    > [ Cross-post trimmed ]
    >
    > Shabam wrote to :
    >
    >> How do you fetch just the domain name part of a variable in a script?
    >> The variable can be "http://www.domain.com/blahblah/whatever/page.htm"
    >> or "http://sub.domain.com/blahblah/whatever/page.htm".
    >>
    >> What I need is to extract just the "domain.com".
    >
    > This is definitely a non-trivial problem. Fortunately, it's been
    > partially solved already. I'm involved in the SpamAssassin and SURBL
    > projects, where this really became obvious when spammers started
    > obfuscating URIs, and using domains from many different TLDs where it
    > takes a lot of research to determine where to chop the hostname to get
    > the actual registrar domain.
    >
    > There's much more to it than using a library or regexp.
    >
    > See get_uri_list() in SpamAssassin 3's PerMsgStatus.pm for one
    > "industrial strength" solution to this problem, which still has room for
    > improvement.
    >
    > - Ryan
    >


    --
    Andrew
    Andrew Tkachenko, Nov 12, 2004
    #4
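
    Andrew's split-and-match idea might look like the minimal sketch below, assuming the caller can list the TLDs it cares about. The %KNOWN_TLD set and the sub name registered_domain are hypothetical. The host is taken with a simple regex, split on dots, and if its last label is a known TLD, only the last two labels (SLD and TLD) are kept; otherwise the sub returns undef.

    ```perl
    use strict;
    use warnings;

    # Hypothetical set of TLDs this task is allowed to "guess"; extend as needed.
    my %KNOWN_TLD = map { $_ => 1 } qw(com net org);

    # Return "sld.tld" when the URL's host ends in a known TLD, else undef.
    sub registered_domain {
        my ($url) = @_;
        # Host: text after "scheme://" (and any "user@"), up to "/", "?", "#" or ":".
        my ($host) = $url =~ m{^[a-z][a-z0-9+.-]*://(?:[^/?#\@]*\@)?([^/?#:]+)}i
            or return undef;
        my @labels = split /\./, lc $host;
        return undef unless @labels >= 2 && $KNOWN_TLD{ $labels[-1] };
        return join '.', @labels[-2, -1];    # keep only the SLD and TLD
    }

    print registered_domain("http://www.domain.com/blahblah/whatever/page.htm"), "\n";
    print registered_domain("http://sub.domain.com/blahblah/whatever/page.htm"), "\n";
    ```

    Keeping exactly two labels is what breaks on suffixes like co.uk, as Joe points out in the next post.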
  5. Joe Smith

    Shabam wrote:

    > How do you fetch just the domain name part of a variable in a script? The
    > variable can be "http://www.domain.com/blahblah/whatever/page.htm" or
    > "http://sub.domain.com/blahblah/whatever/page.htm".
    >
    > What I need is to extract just the "domain.com".


    The problem is not well defined.

    For "http://www.tacp.toshiba.com/" do you want "tacp.toshiba.com" or just
    "toshiba.com"? For "http://story.news.yahoo.com", is "news" included or not?
    You can't just use the last two components in all cases, such as
    "http://www.toyota.co.jp" or "http://www.bbc.co.uk".

    -Joe
    Joe Smith, Nov 14, 2004
    #5
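
    One common way around the "last two components" problem is to find the longest matching public suffix and keep one more label to its left. The minimal sketch below assumes a tiny, hypothetical suffix table (%SUFFIX) and a hypothetical sub name domain_for_host; a real program would load a complete list such as the Public Suffix List, whose wildcard and exception rules this sketch ignores. It takes a bare host name, not a URL.

    ```perl
    use strict;
    use warnings;

    # Hypothetical, deliberately tiny suffix table; a real list has thousands of entries.
    my %SUFFIX = map { $_ => 1 } qw(com net org jp co.jp uk co.uk);

    # Return the registrable domain for a bare host name: the longest known
    # suffix plus the one label to its left, or undef if there is none.
    sub domain_for_host {
        my ($host) = @_;
        my @labels = split /\./, lc $host;
        # Walk from the longest candidate suffix (the whole host) to the shortest.
        for my $i (0 .. $#labels) {
            my $suffix = join '.', @labels[$i .. $#labels];
            next unless $SUFFIX{$suffix};
            return undef if $i == 0;    # the host is itself a public suffix
            return join '.', @labels[$i - 1 .. $#labels];
        }
        return undef;                   # no known suffix at all
    }

    print domain_for_host($_), "\n"
        for qw(www.tacp.toshiba.com story.news.yahoo.com www.toyota.co.jp www.bbc.co.uk);
    ```

    This settles only where the registered name starts; whether tacp.toshiba.com or toshiba.com is wanted is still the policy question Joe raises.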
  6. Shabam

    > The problem is not well defined.
    >
    > For "http://www.tacp.toshiba.com/" do you want "tacp.toshiba.com" or just
    > "toshiba.com"? For "http://story.news.yahoo.com", is "news" included or
    > not?
    > You can't just use the last two components in all cases, such as
    > "http://www.toyota.co.jp" or "http://www.bbc.co.uk".


    What I would need is just the domain name part. In this case it would be
    "toshiba.com" only. No subdomains. My domains will be simple
    (com/net/org), so complicated situations like "toyota.co.jp" wouldn't apply.
    Shabam, Nov 14, 2004
    #6
  7. sam

    Shabam wrote:

    >>The problem is not well defined.
    >>
    >>For "http://www.tacp.toshiba.com/" do you want "tacp.toshiba.com" or just
    >>"toshiba.com"? For "http://story.news.yahoo.com", is "news" included or
    >>not?
    >>You can't just use the last two components in all cases, such as
    >>"http://www.toyota.co.jp" or "http://www.bbc.co.uk".
    >
    > What I would need is just the domain name part. In this case it would be
    > "toshiba.com" only. No subdomains. My domains will be simple
    > (com/net/org), so complicated situations like "toyota.co.jp" wouldn't apply.

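    I'm not an expert, but the following regex should work:

    use strict; use warnings;
    my $url = "http://www.abc.xyz.toy-0-ota.com";
    # Group the TLD alternation so "net" and "org" also need a label in front.
    my ($domain) = $url =~ m{^https?://(?:[^/:?#]*\.)?([0-9a-zA-Z-]+\.(?:com|net|org))(?=[/:?#]|$)}i;
    print "$domain\n";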

    Sam
    sam, Nov 18, 2004
    #7

Similar Threads
  1. Extract domain name (Shabam, Nov 12, 2004, in forum: Perl). Replies: 2, Views: 569. Andrew Tkachenko, Nov 12, 2004
  2. Joey. Replies: 0, Views: 332
  3. Berlin Brown. Replies: 9, Views: 25,748. Nigel Wade, Mar 21, 2006
  4. Chem Leakhina. Replies: 2, Views: 128. Robert Klemme, Jun 23, 2009
  5. Extract domain name (Charles Calvert, Aug 20, 2010, in forum: Ruby). Replies: 5, Views: 138. Charles Calvert, Aug 23, 2010