Alan J. Flavell
Sorry, I'd love to put a pithy description of the problem in the
subject line, just like the smart-questions FAQ, but this time I
couldn't manage it. Here's what my problem boils down to, as best I
can simplify it (motivation at the end, for anyone who cares).
Simplified problem:
I've got a block of text that contains several dotted IP addresses in
the form [a.b.c.d]. What I need to do is find the first one of those
addresses which I don't recognise.
To define "which I don't recognise", I can provide a list of explicit
addresses, or a pattern, or whatever's convenient.
OK, I've no problem matching a dotted IP address and capturing the
result, that's easy. What I can't work out is a strategy for skipping
over a match when the address is in a list of, or matches a pattern
of, addresses which aren't of interest.
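To make the difficulty concrete, here is a minimal sketch in Python, whose `re` module accepts this Perl-style pattern unchanged. The sample text, host names and addresses are invented for illustration. The plain capture always returns the first bracketed address, even when that address is one of the known ones.

```python
import re

# Hypothetical sample text, standing in for the concatenated headers.
text = (
    "from relay1.example.org [192.0.2.10] by mx.example.net; "
    "from relay2.example.org [192.0.2.11] by relay1.example.org; "
    "from unknown.example.com [203.0.113.7] by relay2.example.org"
)

# The "easy" part: match a bracketed dotted quad and capture it.
plain = re.compile(r"\[(\d+\.\d+\.\d+\.\d+)\]")

# search() stops at the leftmost match, so a known address wins
# whenever it appears before the one we actually want.
first = plain.search(text).group(1)
print(first)  # 192.0.2.10, which is a known relay, not the wanted address
```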
The constraint of the actual application is that I have to supply a
Perl-compatible regex, which will return the answer via the regex
capture mechanism (...). So looping through the text in a program
isn't feasible, it seems.
There will be further such [...] in the text, so it's not the last one
that I'm looking for: it's the first one that doesn't match one of the
known addresses.
Advice please?
OK, the motivation. This wodge of text is in fact the concatenated
contents of a bunch of "Received:" headers from forwarded mail. We
know where the forwarded mail came from (those will be the addresses
which I already know about and am not interested in, so I want to skip
their matches), and it might have been forwarded several times between
different mail servers within the forwarding site (it varies between
examples), so we'll want to skip over a variable number of IPs that we
recognise, in order to pick up the first one that we don't recognise.
This will then be the IP address from which _they_ accepted the mail
before forwarding it to us, and I want to get that IP so that I can
look it up in a dnsRBL to help decide whether it is forwarded spam.
There are several (a small number of) forwarding sites of interest to
us, but if I can see a strategy for dealing with one, then I don't see
any problem with extending it to a few more. It's just that I don't
know how to make the match against, say, \[(\d+\.\d+\.\d+\.\d+)\] get
skipped if it happens to be one of the ones that I'm not interested in.
(yes, I have pored over perlretut, but perhaps I'm looking at the
problem in the wrong way...)
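One possible direction, offered only as a sketch: a negative lookahead placed straight after the opening bracket can refuse any match that starts with a known address, so the engine moves on to the next bracket by itself. This assumes the application's regex engine supports `(?!...)`, as Perl and PCRE do. The `known` list, the sample text and the variable names below are hypothetical. Python is used only to exercise the pattern, and the pattern itself uses no Python-specific syntax.

```python
import re

# Hypothetical list of addresses belonging to the forwarding site.
known = ["192.0.2.10", "192.0.2.11"]

# Alternation of the known addresses, with the dots escaped.
known_alt = "|".join(re.escape(ip) for ip in known)

# (?!...) is a negative lookahead: right after "[", the match is refused
# if a known address followed by "]" comes next. The \] inside the
# lookahead stops 192.0.2.10 from also excluding 192.0.2.100.
pattern = re.compile(
    r"\[(?!(?:" + known_alt + r")\])(\d+\.\d+\.\d+\.\d+)\]"
)

# Hypothetical sample text with two known relays, then two unknown hosts.
text = (
    "from relay1.example.org [192.0.2.10] by mx.example.net; "
    "from relay2.example.org [192.0.2.11] by relay1.example.org; "
    "from unknown.example.com [203.0.113.7] by relay2.example.org; "
    "from origin.example.com [198.51.100.3] by unknown.example.com"
)

m = pattern.search(text)
first_unknown = m.group(1) if m else None
print(first_unknown)  # 203.0.113.7
```

The known addresses could equally be written as a pattern inside the lookahead, for example `192\.0\.2\.\d+` for a whole range, and further forwarding sites would just add alternatives to the same group.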
cheers