Complex regex question

Tuxedo

Hi,

I use a simple grep command to munch through a domain zone file and report
existing domains that match a particular keyword. It also returns matches in
the nameserver field (although it is not meant to). For example, these are
the first few lines of output from 'grep KOMODO zonefile.txt':

KOMODODRAGON NS DNS2.GORGE.NET.
KOMODODRAGON NS SERV.GORGE.NET.
HELIOCENTRIC NS NS1.KOMODOTEK
HELIOCENTRIC NS NS2.KOMODOTEK
DIVEKOMODO NS NS1.PUREHOST
DIVEKOMODO NS NS2.PUREHOST
KOMODO-TECH NS NS1.CISCO
KOMODO-TECH NS NS2.CISCO
KOMODOSYSTEM NS DNS.NETFORCE.IT.
KOMODOSYSTEM NS NS2.IPOINT.IT.
KOMODOISLAND-TOURS NS NS1.BALINTER.NET.
KOMODOISLAND-TOURS NS NS2.BALINTER.NET.

Each domain, being the first string on a line, may have two or more
nameservers associated with it, so the output contains one line per
domain/nameserver pair (usually two lines per domain, sometimes more).

However, I would like to output a list with only one line per domain match,
regardless of the number of nameservers.

I would also like to exclude any match that comes from the nameserver
field, i.e. anything after the first whitespace (e.g. the third and fourth
lines in the above example should not appear at all). In other words, only
return matches where no whitespace occurs between the start of the line and
the keyword (does this make sense?).
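
The rule described above amounts to testing the keyword against the first whitespace-delimited field only. A minimal Perl sketch of that idea, assuming fields are separated by whitespace as in the sample output; the sub name first_field_matches is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true only when the keyword occurs in the first
# whitespace-delimited field (the domain), not in the nameserver field.
sub first_field_matches {
    my ($line, $term) = @_;
    my ($domain) = split ' ', $line;    # first field of the line
    return defined $domain && index($domain, $term) >= 0;
}

# Two lines taken from the sample output above.
for my $line ('DIVEKOMODO NS NS1.PUREHOST', 'HELIOCENTRIC NS NS1.KOMODOTEK') {
    my $verdict = first_field_matches($line, 'KOMODO') ? 'keep' : 'skip';
    printf "%-32s %s\n", $line, $verdict;
}
```

This only covers the field restriction; removing duplicate domains is a separate step.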

So the result strips any matches in the nameserver field altogether, as
well as any duplicate domains. When the above list is processed, the
result would be simply one domain per line and one line per domain:

KOMODODRAGON
DIVEKOMODO
KOMODO-TECH
KOMODOSYSTEM
KOMODOISLAND-TOURS

The purpose is simply to return a list of domains against a particular
keyword, stripping the irrelevant parts. I'm not quite sure how to do this,
although I guess Perl is the best tool, being the de facto regex master!

Any suggestions or code snippets would be greatly appreciated!

Many thanks,
Tuxedo
 
Tuxedo

Tad J McClellan wrote:

[...]
perldoc -q duplicate
[...]

---------------
#!/usr/bin/perl
use warnings;
use strict;

my $term = 'KOMODO';

my %seen;    # domains already printed
while ( <DATA> ) {
    # match the keyword only within the first whitespace-delimited field
    if ( /^(\S*$term\S*)/ ) {
        print "$1\n" unless $seen{$1}++;
    }
}

[...]

Thanks for the perldoc tip and for the working regex magic :)

Tuxedo
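
For anyone wanting to try the quoted approach without the elided parts, here is a self-contained sketch under a few assumptions: the sample lines from the original question stand in for the zone file, the sub name unique_domains is made up for illustration, and \Q...\E is an addition (not part of the quoted script) so that a keyword containing regex metacharacters is matched literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns each domain (first field) containing $term,
# once, in order of first appearance.
sub unique_domains {
    my ($term, @lines) = @_;
    my (%seen, @domains);
    for my $line (@lines) {
        # Anchored at line start; \S* cannot cross whitespace, so the
        # nameserver field is never examined.
        next unless $line =~ /^(\S*\Q$term\E\S*)/;
        push @domains, $1 unless $seen{$1}++;
    }
    return @domains;
}

# Sample lines copied from the question, standing in for zonefile.txt.
my @sample = (
    'KOMODODRAGON NS DNS2.GORGE.NET.',
    'KOMODODRAGON NS SERV.GORGE.NET.',
    'HELIOCENTRIC NS NS1.KOMODOTEK',
    'HELIOCENTRIC NS NS2.KOMODOTEK',
    'DIVEKOMODO NS NS1.PUREHOST',
    'DIVEKOMODO NS NS2.PUREHOST',
    'KOMODO-TECH NS NS1.CISCO',
    'KOMODO-TECH NS NS2.CISCO',
    'KOMODOSYSTEM NS DNS.NETFORCE.IT.',
    'KOMODOSYSTEM NS NS2.IPOINT.IT.',
    'KOMODOISLAND-TOURS NS NS1.BALINTER.NET.',
    'KOMODOISLAND-TOURS NS NS2.BALINTER.NET.',
);

print "$_\n" for unique_domains('KOMODO', @sample);
```

With the sample lines this should print the five domains listed in the original question. To run it against a real file, the lines would be read from that file and passed in place of @sample.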
 
