Complex regex question

Discussion in 'Perl Misc' started by Tuxedo, Sep 26, 2009.

  1. Tuxedo

    Tuxedo Guest

    Hi,

    I use a simple grep procedure that munches through a domain zone file and
    reports the existing domains matching a particular keyword. It also
    returns matches in the nameserver field, although it is not meant to. For
    example, these are the first few lines of output from
    'grep KOMODO zonefile.txt':

    KOMODODRAGON NS DNS2.GORGE.NET.
    KOMODODRAGON NS SERV.GORGE.NET.
    HELIOCENTRIC NS NS1.KOMODOTEK
    HELIOCENTRIC NS NS2.KOMODOTEK
    DIVEKOMODO NS NS1.PUREHOST
    DIVEKOMODO NS NS2.PUREHOST
    KOMODO-TECH NS NS1.CISCO
    KOMODO-TECH NS NS2.CISCO
    KOMODOSYSTEM NS DNS.NETFORCE.IT.
    KOMODOSYSTEM NS NS2.IPOINT.IT.
    KOMODOISLAND-TOURS NS NS1.BALINTER.NET.
    KOMODOISLAND-TOURS NS NS2.BALINTER.NET.

    Each matching domain, being the first string on a line, may have two or
    more name servers associated with it, so the result contains one line per
    name server for every match (usually two lines, but sometimes more).

    However, I would like to output a list with only one line per matching
    domain, regardless of the number of nameservers.

    I would also like to exclude any match that occurs in the nameserver
    field, i.e. anything after the first whitespace (e.g. the third and fourth
    domains in the above example, the HELIOCENTRIC lines, should not appear at
    all). In other words, only return matches found in the first field of a
    line, before any whitespace.

    So the result strips all matches in the nameserver field as well as any
    duplicate domains. When the above list is processed, the result would
    simply be one domain per line and one line per domain:

    KOMODODRAGON
    DIVEKOMODO
    KOMODO-TECH
    KOMODOSYSTEM
    KOMODOISLAND-TOURS

    The purpose is simply to return a list of domains against a particular
    keyword, stripping the irrelevant parts. I'm not quite sure how to do this,
    although I guess Perl is the best tool, being the de-facto regex master!

    Any suggestions or snippet code would be greatly appreciated!

    Many thanks,
    Tuxedo
     
    Tuxedo, Sep 26, 2009

  2. Tuxedo

    Tuxedo Guest

    Tad J McClellan wrote:

    [...]

    > perldoc -q duplicate


    [...]

    > ---------------
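    > #!/usr/bin/perl
    > use warnings;
    > use strict;
    >
    > my $term = 'KOMODO';
    >
    > my %seen;    # domains already printed
    > while ( <DATA> ) {
    >     # match the keyword only within the first whitespace-delimited field
    >     if ( /^(\S*$term\S*)/ ) {
    >         print "$1\n" unless $seen{$1}++;    # print each domain once
    >     }
    > }
    >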


    [...]

    Thanks for the perldoc tip and for the working regex magic :)
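
    For anyone finding this later: the quoted script reads from the DATA
    filehandle, and the data section was snipped in the quote above. Below is a
    minimal self-contained sketch of the same idea, assuming the zone lines are
    held in an array. The names matching_domains and @zone are my own and not
    from Tad's post, and I have added \Q...\E on the assumption that a keyword
    might one day contain regex metacharacters such as a dot.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the unique domains (first whitespace-delimited field of each line)
# that contain $term, in order of first appearance.
# Hypothetical helper name, used here for illustration only.
sub matching_domains {
    my ( $term, @lines ) = @_;
    my ( %seen, @domains );
    for my $line (@lines) {
        # ^ anchors at line start and \S* cannot cross whitespace, so a
        # keyword in the nameserver field never matches.
        # \Q...\E makes the keyword match literally.
        next unless $line =~ /^(\S*\Q$term\E\S*)/;
        push @domains, $1 unless $seen{$1}++;    # keep first occurrence only
    }
    return @domains;
}

# Sample lines copied from the grep output in the first post.
my @zone = (
    'KOMODODRAGON NS DNS2.GORGE.NET.',
    'KOMODODRAGON NS SERV.GORGE.NET.',
    'HELIOCENTRIC NS NS1.KOMODOTEK',
    'HELIOCENTRIC NS NS2.KOMODOTEK',
    'DIVEKOMODO NS NS1.PUREHOST',
    'DIVEKOMODO NS NS2.PUREHOST',
    'KOMODO-TECH NS NS1.CISCO',
    'KOMODO-TECH NS NS2.CISCO',
    'KOMODOSYSTEM NS DNS.NETFORCE.IT.',
    'KOMODOSYSTEM NS NS2.IPOINT.IT.',
    'KOMODOISLAND-TOURS NS NS1.BALINTER.NET.',
    'KOMODOISLAND-TOURS NS NS2.BALINTER.NET.',
);

print "$_\n" for matching_domains( 'KOMODO', @zone );
```

    On the sample lines this prints the five domains listed in my first post,
    one per line. To run it against the real zone file, the array could be
    replaced by a loop over a filehandle opened on zonefile.txt.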

    Tuxedo
     
    Tuxedo, Sep 26, 2009
