Re: Can JavaMail detect a non-existent email address?

David Segall

I would like to check incoming mail to determine if the sender's
address is reachable without sending an email to the address. Is this
possible using JavaMail? If so, some sample code would be very much
appreciated. False positives are no problem and a small percentage of
false negatives is tolerable.

In case it has some bearing on the answer the application is a spam
filter that treats as spam anything that cannot be replied to.

My apologies for the cross post. It appears that nobody reads
comp.lang.java.misc.
 
Harald Hein

David Segall said:
I would like to check incoming mail to determine if the sender's
address is reachable without sending an email to the address. Is this
possible using JavaMail?

It is not possible with ANY mailing package, due to the limitations of
the SMTP protocol.

You could check if there is an MX record for the sender domain - but
this would not guarantee that there is such a user at the domain. You
could use VRFY to check if a user exists - but you would have to
connect to the domain's mail server, and many admins have turned this
feature off because of privacy issues and abuse by spammers for
address verification.
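
For anyone who still wants to try VRFY against servers that allow it, the part that matters is how the reply code is read. The following is only a minimal sketch: the class name VrfyReply, the Verdict enum and the mapping of codes to verdicts are illustrative assumptions, not part of JavaMail. It treats 250 and 251 as "exists", 550 as "no such user", and everything else (including 252, which many servers return when VRFY is disabled) as "unknown".

```java
class VrfyReply {
    /** Hypothetical outcomes of a VRFY probe. */
    enum Verdict { EXISTS, UNKNOWN, NO_SUCH_USER }

    /**
     * Classify one SMTP reply line, e.g. "252 Cannot VRFY user",
     * by its three-digit status code.
     */
    static Verdict classify(String replyLine) {
        if (replyLine == null || replyLine.length() < 3) {
            return Verdict.UNKNOWN;
        }
        int code;
        try {
            code = Integer.parseInt(replyLine.substring(0, 3));
        } catch (NumberFormatException e) {
            return Verdict.UNKNOWN; // not an SMTP reply line
        }
        switch (code) {
            case 250: // address verified
            case 251: // user not local, but the server will forward
                return Verdict.EXISTS;
            case 550: // mailbox unavailable
                return Verdict.NO_SUCH_USER;
            default:  // 252, 502 and anything else tell us nothing
                return Verdict.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        // Classify reply lines given on the command line.
        for (String line : args) {
            System.out.println(line + " -> " + classify(line));
        }
    }
}
```

Since a few missed spam messages are tolerable here, a filter built on this would probably treat UNKNOWN the same as EXISTS.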

David Segall said:
In case it has some bearing on the answer the application is a spam
filter that treats as spam anything that cannot be replied to.

You are building a spam filter, but have no idea how e-mail works? Bad
idea.
 
Sudsy

David said:
I would like to check incoming mail to determine if the sender's
address is reachable without sending an email to the address. Is this
possible using JavaMail? If so, some sample code would be very much
appreciated. False positives are no problem and a small percentage of
false negatives is tolerable.

In case it has some bearing on the answer the application is a spam
filter that treats as spam anything that cannot be replied to.

My apologies for the cross post. It appears that nobody reads
comp.lang.java.misc.

There's no guaranteed way to validate the username of the claimed
sender. There's the VRFY command in RFC821 but it's no longer
reliable, primarily because of the spammers. What you CAN do is
verify the originating domain using JNDI. Following is a small
program which effectively does 'nslookup -type=MX <host> | wc -l'
but in Java. Note that an exception will be thrown if the lookup
fails (no DNS records for hostname).

import java.util.Hashtable;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class MXLookup {

    public static void main( String args[] ) {
        if( args.length == 0 ) {
            System.err.println( "Usage: MXLookup host [...]" );
            System.exit( 12 );
        }
        for( int i = 0; i < args.length; i++ ) {
            try {
                System.out.println( args[i] + " has " +
                    doLookup( args[i] ) + " mail servers" );
            }
            catch( Exception e ) {
                // thrown when the lookup fails (no DNS records for hostname)
                e.printStackTrace();
            }
        }
    }

    // Returns the number of MX records that DNS reports for hostName.
    static int doLookup( String hostName ) throws NamingException {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put("java.naming.factory.initial",
            "com.sun.jndi.dns.DnsContextFactory");
        DirContext ictx = new InitialDirContext( env );
        Attributes attrs = ictx.getAttributes( hostName,
            new String[] { "MX" });
        Attribute attr = attrs.get( "MX" );
        if( attr == null )
            return( 0 );
        return( attr.size() );
    }
}
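
The program above takes host names, whereas a spam filter starts from the sender's address. A minimal sketch of pulling out the domain part follows; the class SenderDomain and the method domainOf are hypothetical names, and it assumes the address has already been reduced to the bare user@host form (no display name, no angle brackets).

```java
import java.util.Locale;

class SenderDomain {
    /**
     * Return the lower-cased domain part of a bare address such as
     * "user@example.com", or null if there is no usable domain.
     */
    static String domainOf(String address) {
        if (address == null) {
            return null;
        }
        String trimmed = address.trim();
        int at = trimmed.lastIndexOf('@');
        // need at least one character on each side of the '@'
        if (at <= 0 || at == trimmed.length() - 1) {
            return null;
        }
        return trimmed.substring(at + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Print the domain for each address given on the command line.
        for (String address : args) {
            System.out.println(address + " -> " + domainOf(address));
        }
    }
}
```

The result can then be handed to a DNS lookup; a null return means the address is not worth looking up at all.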
 
GaryM

Sudsy said:
There's no guaranteed way to validate the username of the claimed
sender. There's the VRFY command in RFC821 but it's no longer
reliable, primarily because of the spammers. What you CAN do is
verify the originating domain using JNDI. Following is a small
program which effectively does 'nslookup -type=MX <host> | wc -l'
but in Java. Note that an exception will be thrown if the lookup
fails (no DNS records for hostname).

One thing to remember about this approach is that some companies write
to you from a domain that has no MX record. I think Fidelity
Investments does this (or did). Consequently, if a spam test is based on
an MX record not existing, you may get unwanted false positives. Better
to also include a test for an 'A' record, since mail for a domain
without an MX record is delivered to the host named by its 'A' record.
By doing this you are effectively rejecting falsified hosts.
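
A rough sketch of the combined MX-or-A test, using the same JNDI DNS provider as the MXLookup program. The class and method names (MailRouteCheck, hasMailRoute, domainCanReceiveMail) are made up for illustration, and treating every lookup failure as "no route" is an assumption that a real filter may want to soften for timeouts.

```java
import java.util.Hashtable;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

class MailRouteCheck {
    /** True if the DNS answer holds at least one MX or, failing that, A record. */
    static boolean hasMailRoute(Attributes attrs) {
        return hasValues(attrs, "MX") || hasValues(attrs, "A");
    }

    private static boolean hasValues(Attributes attrs, String type) {
        Attribute attr = attrs.get(type);
        return attr != null && attr.size() > 0;
    }

    /** Ask DNS for the MX and A records of a domain; needs network access. */
    static boolean domainCanReceiveMail(String domain) {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put("java.naming.factory.initial",
                "com.sun.jndi.dns.DnsContextFactory");
        try {
            DirContext ctx = new InitialDirContext(env);
            try {
                Attributes attrs =
                    ctx.getAttributes(domain, new String[] { "MX", "A" });
                return hasMailRoute(attrs);
            } finally {
                ctx.close();
            }
        } catch (NamingException e) {
            // Lookup failed, e.g. the name does not exist. Timeouts end up
            // here too, so this is a pessimistic choice.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String domain : args) {
            System.out.println(domain + ": " + domainCanReceiveMail(domain));
        }
    }
}
```

Keeping hasMailRoute separate from the lookup means the decision rule can be exercised without touching the network.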

HTH.
 
Roedy Green

David Segall said:
I would like to check incoming mail to determine if the sender's
address is reachable without sending an email to the address. Is this
possible using JavaMail?

I validate email addresses with some regexes and then check the
domains for MX records to see if they exist.

The following code is somewhat stricter than the RFC.

package com.mindprod.bulk;

import java.util.HashSet;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.mail.internet.AddressException;
import javax.mail.internet.InternetAddress;

/**
* Validate syntax of email addresses.
* Does not probe to see if mailserver exists in DNS or online.
* See MailProber for that.
* See ValidateEmailFile for an example of how to use this class.
*
* @author Roedy Green, Canadian Mind Products
* @version 1.0
* to do: check validity of & in first part of email address.
* Appears in practice.
*/
public class EmailSyntaxValidator
{
private static boolean debugging = false;

/**
* Check how likely an email address is to be valid.
* The higher the number returned, the more likely the address is valid.
* This method does not probe the internet in any way to see if
* the corresponding mail server or domain exists.
*
* @param email bare computer email address.
* e.g. (e-mail address removed)
* No "Roedy Green" <[email protected]> style addresses.
* No local addresses, e.g. roedy.
*
* @return 0 = email address is definitely malformed, e.g. missing @,
* or ends in .invalid
* <br>
* 1 = address does not meet one of the valid patterns below.
* It still might be ok according to some obscure rule in RFC 822.
* Java InternetAddress accepts it as valid.
* <br>
* 2 = unknown top level domain.
* <br>
* 3 = dots at beginning or end, doubled in name.
* <br>
* 4 = address of form xxx@[209.139.205.2] using IP
* <br>
* 5 = address of form (e-mail address removed) Dots _ or - in first part of name
* <br>
* 6 = address of form (e-mail address removed) rare, but known, domain
* <br>
* 7 = address of form (e-mail address removed) or any national suffix.
* <br>
* 8 = address of form (e-mail address removed) matching this national suffix,
* e.g. .ca in Canada, .de in Germany
* <br>
* 9 = address of form (e-mail address removed) .org .net .edu .gov .biz -- official domains
*/
public static int howValid(String email)
{
if ( email == null )
{
return 0;
}
email = email.trim().toLowerCase();
int dotPlace = email.lastIndexOf('.');
if ( 0 < dotPlace && dotPlace < email.length()-1 )
{
/* have at least x.y */
String tld = email.substring(dotPlace+1);
if ( badTLDs.contains(tld) )
{
/* deliberate invalid address */
return 0;
}
// make sure none of the fragments start or end in _ or -
String[] fragments = splitter.split(email);
boolean clean = true;
for ( int i=0; i<fragments.length; i++ )
{
if ( fragments[i].startsWith("_") ||
fragments[i].endsWith("_") ||
fragments[i].startsWith("-") ||
fragments[i].endsWith("-") )
{
clean = false;
break;
}
} // end for
if ( clean )
{
Matcher m9 = p9.matcher(email);
if ( m9.matches() )
{
if ( officialTLDs.contains(tld) ) return 9;
else if ( thisCountry.equals(tld) ) return 8;
else if ( nationalTLDs.contains(tld) ) return 7;
else if ( rareTLDs.contains(tld) ) return 6;
else return 3; /* unknown tld */
}
// allow dots in name
Matcher m5 = p5.matcher(email);
if ( m5.matches() )
{
if ( officialTLDs.contains(tld) ) return 5;
else if ( thisCountry.equals(tld) ) return 5;
else if ( nationalTLDs.contains(tld) ) return 5;
else if ( rareTLDs.contains(tld) ) return 5;
else return 2; /* unknown tld */
}

// IP
Matcher m4 = p4.matcher(email);
if ( m4.matches() ) return 4; /* can't tell TLD */

// allow even lead/trail dots in name, except at start of domain
Matcher m3 = p3.matcher(email);
if ( m3.matches() )
{
if ( officialTLDs.contains(tld) ) return 3;
else if ( thisCountry.equals(tld) ) return 3;
else if ( nationalTLDs.contains(tld) ) return 3;
else if ( rareTLDs.contains(tld) ) return 3;
else return 2; /* unknown domain */
}
} // end if clean
}
// allow even unclean addresses, and addresses without a TLD,
// to have a whack at passing RFC:822
try
{

/* see if InternetAddress likes it, it follows RFC:822.
It will accept names without domains though. */
InternetAddress.parse(email, true /* strict */);
// it liked it, no exception happened. Seems very sloppy.
return 1;
}
catch ( AddressException e )
{
// it did not like it
return 0;
}
}

// allow _ - in name, lead and trailing ones are filtered later, no +.
static Pattern p9 =
Pattern.compile("[a-z0-9\\-_]++@[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)++");

// to split into fields
static Pattern splitter = Pattern.compile("[@\\.]");

// to allow - _ dots in name, no +
static Pattern p5 =
Pattern.compile("[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)*@[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)++");

// IP style names, no +
static Pattern p4 =
Pattern.compile("[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)*@\\[([0-9]{1,3}\\.){3}[0-9]{1,3}\\]");

// allow dots anywhere, but not at start of domain name, no +
static Pattern p3 =
Pattern.compile("[a-z0-9\\-_\\.]++@[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)++");

/**
* build a HashSet from a array of String literals.
*
* @param list array of strings
* @return HashSet you can use to test if a string is in the set.
*/
static HashSet hmaker(String[] list)
{
HashSet map = new HashSet(Math.max((int) (list.length/.75f) + 1,
16));
for ( int i=0; i<list.length; i++ )
{
map.add(list[i]);
}
return map;
}

static final String thisCountry =
Locale.getDefault().getCountry().toLowerCase();

static final HashSet officialTLDs =
hmaker(new String[]
{
"aero",
"biz",
"coop",
"com",
"edu",
"gov",
"info",
"mil",
"museum",
"name",
"net",
"org",
"pro",
});

static final HashSet rareTLDs =
hmaker(new String[]
{
"cam",
"mp3",
"agent",
"art",
"arts",
"asia",
"auction",
"aus",
"bank",
"cam",
"chat",
"church",
"club",
"corp",
"dds",
"design",
"dns2go",
"e",
"email",
"exp",
"fam",
"family",
"faq",
"fed",
"film",
"firm",
"free",
"fun",
"g",
"game",
"games",
"gay",
"ger",
"globe",
"gmbh",
"golf",
"gov",
"help",
"hola",
"i",
"inc",
"int",
"jpn",
"k12",
"kids",
"law",
"learn",
"llb",
"llc",
"llp",
"lnx",
"love",
"ltd",
"mag",
"mail",
"med",
"media",
"mp3",
"netz",
"nic",
"nom",
"npo",
"per",
"pol",
"prices",
"radio",
"rsc",
"school",
"scifi",
"sea",
"service",
"sex",
"shop",
"sky",
"soc",
"space",
"sport",
"tech",
"tour",
"travel",
"usvi",
"video",
"web",
"wine",
"wir",
"wired",
"zine",
"zoo",
});

static final HashSet nationalTLDs =
hmaker(new String[]
{
"ac",
"ad",
"ae",
"af",
"ag",
"ai",
"al",
"am",
"an",
"ao",
"aq",
"ar",
"as",
"at",
"au",
"aw",
"az",
"ba",
"bb",
"bd",
"be",
"bf",
"bg",
"bh",
"bi",
"bj",
"bm",
"bn",
"bo",
"br",
"bs",
"bt",
"bv",
"bw",
"by",
"bz",
"ca",
"cc",
"cd",
"cf",
"cg",
"ch",
"ci",
"ck",
"cl",
"cm",
"cn",
"co",
"cr",
"cu",
"cv",
"cx",
"cy",
"cz",
"de",
"dj",
"dk",
"dm",
"do",
"dz",
"ec",
"ee",
"eg",
"eh",
"er",
"es",
"et",
"fi",
"fj",
"fk",
"fm",
"fo",
"fr",
"fx",
"ga",
"gb",
"gd",
"ge",
"gf",
"gg",
"gh",
"gi",
"gl",
"gm",
"gn",
"gp",
"gq",
"gr",
"gs",
"gt",
"gu",
"gw",
"gy",
"hk",
"hm",
"hn",
"hr",
"ht",
"hu",
"id",
"ie",
"il",
"im",
"in",
"io",
"iq",
"ir",
"is",
"it",
"je",
"jm",
"jo",
"jp",
"ke",
"kg",
"kh",
"ki",
"km",
"kn",
"kp",
"kr",
"kw",
"ky",
"kz",
"la",
"lb",
"lc",
"li",
"lk",
"lr",
"ls",
"lt",
"lu",
"lv",
"ly",
"ma",
"mc",
"md",
"mg",
"mh",
"mk",
"ml",
"mm",
"mn",
"mo",
"mp",
"mq",
"mr",
"ms",
"mt",
"mu",
"mv",
"mw",
"mx",
"my",
"mz",
"na",
"nc",
"ne",
"nf",
"ng",
"ni",
"nl",
"no",
"np",
"nr",
"nu",
"nz",
"om",
"pa",
"pe",
"pf",
"pg",
"ph",
"pk",
"pl",
"pm",
"pn",
"pr",
"ps",
"pt",
"pw",
"py",
"qa",
"re",
"ro",
"ru",
"rw",
"sa",
"sb",
"sc",
"sd",
"se",
"sg",
"sh",
"si",
"sj",
"sk",
"sl",
"sm",
"sn",
"so",
"sr",
"st",
"sv",
"sy",
"sz",
"tc",
"td",
"tf",
"tg",
"th",
"tj",
"tk",
"tm",
"tn",
"to",
"tp",
"tr",
"tt",
"tv",
"tw",
"tz",
"ua",
"ug",
"uk",
"um",
"us",
"uy",
"uz",
"va",
"vc",
"ve",
"vg",
"vi",
"vn",
"vu",
"wf",
"ws",
"ye",
"yt",
"yu",
"za",
"zm",
"zw",
});

static final HashSet badTLDs =
hmaker(new String[]
{
"invalid",
"nowhere",
"noone",
});


public static void main (String[] args)
{
System.out.println(howValid("[email protected]"));
}
} // end class EmailSyntaxValidator
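
To see what the two strictest patterns accept without pulling in JavaMail, here is a small stand-alone sketch. The regexes are copied from p9 and p5 in the class above; the class PatternDemo and the methods plain and dotted are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class PatternDemo {
    // Same expression as p9: no dots allowed before the @.
    static final Pattern P9 = Pattern.compile(
        "[a-z0-9\\-_]++@[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)++");

    // Same expression as p5: dots allowed between fragments before the @.
    static final Pattern P5 = Pattern.compile(
        "[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)*@[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)++");

    /** True if the whole (lower-case) address matches the p9 form. */
    static boolean plain(String email) {
        return P9.matcher(email).matches();
    }

    /** True if the whole (lower-case) address matches the p5 form. */
    static boolean dotted(String email) {
        return P5.matcher(email).matches();
    }

    public static void main(String[] args) {
        for (String email : args) {
            System.out.println(email + " plain=" + plain(email)
                + " dotted=" + dotted(email));
        }
    }
}
```

Both patterns expect lower-case input, which howValid guarantees by calling toLowerCase() before matching.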
 
David Segall

Harald Hein said:
It is not possible with ANY mailing package, due to the limitations of
the SMTP protocol.

You could check if there is an MX record for the sender domain - but
this would not guarantee that there is such a user at the domain. You
could use VRFY to check if a user exists - but you would have to
connect to the domain's mail server, and many admins have turned this
feature off because of privacy issues and abuse by spammers for
address verification.


You are building a spam filter, but have no idea how e-mail works? Bad
idea.

Thanks for the information and for your caution. I will probably
ignore the caution because I find writing some code the best way of
learning about such a topic.
 
