Re: Can JavaMail detect a non-existent email address?

Discussion in 'Java' started by David Segall, Nov 9, 2003.

  1. David Segall

    David Segall Guest

    I would like to check incoming mail to determine if the sender's
    address is reachable without sending an email to the address. Is this
    possible using JavaMail? If so, some sample code would be very much
    appreciated. False positives are no problem and a small percentage of
    false negatives is tolerable.

    In case it has some bearing on the answer the application is a spam
    filter that treats as spam anything that cannot be replied to.

    My apologies for the cross post. It appears that nobody reads
    comp.lang.java.misc.
    David Segall, Nov 9, 2003
    #1

  2. David Segall

    Harald Hein Guest

    "David Segall" wrote:

    > I would like to check incoming mail to determine if the sender's
    > address is reachable without sending an email to the address. Is this
    > possible using JavaMail?


    It is not possible with ANY mailing package, due to the limitations of
    the SMTP protocol.

    You could check if there is an MX record for the sender domain - but
    this would not guarantee that there is such a user at the domain. You
    could use VRFY to check if a user exists - but you would have to
    connect to the domain's mail server, and many admins have turned this
    feature off because of privacy issues and abuse by spammers for
    address verification.
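    To show what the VRFY exchange looks like on the wire, here is a minimal sketch. It assumes you have already opened a TCP connection to port 25 of the domain's mail server (found through its MX record) and wrapped the socket streams in a reader and a writer. The class and method names (VrfySketch, readReply, vrfy) are hypothetical. The reply codes follow RFC 5321: 250/251 mean the user is known, 550/551 mean unknown, 252 means the server will not confirm either way, and 502 means VRFY is disabled. As Harald says, most servers now answer 252 or 502, so treat the result as a hint at best.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class VrfySketch {

    /** Reads one (possibly multi-line) SMTP reply and returns its 3-digit code. */
    static int readReply(BufferedReader in) throws IOException {
        String line;
        do {
            line = in.readLine();
            if (line == null || line.length() < 3) {
                throw new IOException("Connection closed or malformed reply");
            }
            // Continuation lines look like "250-..."; the last line is "250 ..."
        } while (line.length() > 3 && line.charAt(3) == '-');
        return Integer.parseInt(line.substring(0, 3));
    }

    /**
     * Reads the greeting, sends HELO, then VRFY for one user, and returns
     * the VRFY reply code. The 221 reply to QUIT is not read here.
     */
    static int vrfy(BufferedReader in, Writer out, String heloName, String user)
            throws IOException {
        readReply(in);                              // 220 greeting
        out.write("HELO " + heloName + "\r\n");
        out.flush();
        readReply(in);                              // 250 reply to HELO
        out.write("VRFY " + user + "\r\n");
        out.flush();
        int code = readReply(in);                   // 250, 252, 550, 502, ...
        out.write("QUIT\r\n");
        out.flush();
        return code;
    }

    public static void main(String[] args) throws IOException {
        // Canned server replies stand in for a real socket connection.
        String server = "220 mx.example.com ESMTP\r\n"
                      + "250 mx.example.com\r\n"
                      + "252 Cannot VRFY user\r\n";
        BufferedReader in = new BufferedReader(new StringReader(server));
        StringWriter out = new StringWriter();
        System.out.println(vrfy(in, out, "client.example.org", "david"));
    }
}
```

    Against a real server you would pass new BufferedReader(new InputStreamReader(socket.getInputStream(), "US-ASCII")) and an OutputStreamWriter on socket.getOutputStream() instead of the string-backed streams.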

    > In case it has some bearing on the answer the application is a spam
    > filter that treats as spam anything that cannot be replied to.


    You are building a spam filter, but have no idea how e-mail works? Bad
    idea.
    Harald Hein, Nov 9, 2003
    #2

  3. David Segall

    Sudsy Guest

    David Segall wrote:
    > I would like to check incoming mail to determine if the sender's
    > address is reachable without sending an email to the address. Is this
    > possible using JavaMail? If so, some sample code would be very much
    > appreciated. False positives are no problem and a small percentage of
    > false negatives is tolerable.
    >
    > In case it has some bearing on the answer the application is a spam
    > filter that treats as spam anything that cannot be replied to.
    >
    > My apologies for the cross post. It appears that nobody reads
    > comp.lang.java.misc.


    There's no guaranteed way to validate the username of the claimed
    sender. There's the VRFY command in RFC821 but it's no longer
    reliable, primarily because of the spammers. What you CAN do is
    verify the originating domain using JNDI. Following is a small
    program which effectively does 'nslookup -type=MX <host> | wc -l'
    but in Java. Note that an exception will be thrown if the lookup
    fails (no DNS records for hostname).

    import java.util.Hashtable;
    import javax.naming.NamingException;
    import javax.naming.directory.Attribute;
    import javax.naming.directory.Attributes;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.InitialDirContext;

    public class MXLookup {

        public static void main( String args[] ) {
            if( args.length == 0 ) {
                System.err.println( "Usage: MXLookup host [...]" );
                System.exit( 12 );
            }
            // Look up each host name given on the command line
            for( int i = 0; i < args.length; i++ ) {
                try {
                    System.out.println( args[i] + " has " +
                        doLookup( args[i] ) + " mail servers" );
                }
                catch( Exception e ) {
                    // Thrown when there are no DNS records for the host at all
                    e.printStackTrace();
                }
            }
        }

        // Returns the number of MX records published for hostName
        static int doLookup( String hostName ) throws NamingException {
            Hashtable<String, String> env = new Hashtable<String, String>();
            env.put("java.naming.factory.initial",
                    "com.sun.jndi.dns.DnsContextFactory");
            DirContext ictx = new InitialDirContext( env );
            Attributes attrs = ictx.getAttributes( hostName,
                    new String[] { "MX" });
            Attribute attr = attrs.get( "MX" );
            if( attr == null )
                return( 0 );
            return( attr.size() );
        }
    }
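    Counting the records tells you whether the domain accepts mail at all; to actually talk to its mail server (for example to try VRFY) you need the host names in preference order. My understanding is that the JNDI DNS provider returns each MX value as a string of the form "preference host", e.g. "10 mail.example.com.", but that text format is an assumption worth checking on your JVM. Inside doLookup you could collect the values with attr.getAll() and String.valueOf(...) and pass them to something like the hypothetical helper below, which sorts them (lowest preference first) and strips the trailing root dot.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MXOrder {

    /** One parsed MX record; a lower preference value is tried first. */
    static final class MxHost {
        final int preference;
        final String host;

        MxHost(int preference, String host) {
            this.preference = preference;
            this.host = host;
        }
    }

    /** Parses "preference host" strings and returns the hosts in preference order. */
    static List<String> hostsByPreference(List<String> mxValues) {
        List<MxHost> parsed = new ArrayList<>();
        for (String value : mxValues) {
            String[] parts = value.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // not of the form "preference host"
            }
            int preference;
            try {
                preference = Integer.parseInt(parts[0]);
            } catch (NumberFormatException e) {
                continue; // preference is not a number
            }
            // DNS names may end with the root "."; drop it for display and connecting
            String host = parts[1].endsWith(".")
                    ? parts[1].substring(0, parts[1].length() - 1)
                    : parts[1];
            parsed.add(new MxHost(preference, host));
        }
        // List.sort is stable, so equal preferences keep their DNS order
        parsed.sort(Comparator.comparingInt((MxHost m) -> m.preference));
        List<String> hosts = new ArrayList<>();
        for (MxHost m : parsed) {
            hosts.add(m.host);
        }
        return hosts;
    }

    public static void main(String[] args) {
        List<String> values = List.of("20 backup.example.com.", "10 mail.example.com.");
        System.out.println(hostsByPreference(values)); // [mail.example.com, backup.example.com]
    }
}
```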
    Sudsy, Nov 9, 2003
    #3
  4. David Segall

    GaryM Guest

    Sudsy <> wrote in news:3FAE7D10.8000702@hotmail.com:

    > There's no guaranteed way to validate the username of the claimed
    > sender. There's the VRFY command in RFC821 but it's no longer
    > reliable, primarily because of the spammers. What you CAN do is
    > verify the originating domain using JNDI. Following is a small
    > program which effectively does 'nslookup -type=MX <host> | wc -l'
    > but in Java. Note that an exception will be thrown if the lookup
    > fails (no DNS records for hostname).


    One thing to remember about this approach is that some companies write
    to you from a domain that has no MX record. I think Fidelity
    Investments does this (or did). Consequently, if a spam test is based on
    an MX record not existing, you may get unwanted false positives. Better
    to also include a test for an 'A' record. By doing this, you are
    effectively rejecting falsified hosts.
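    The MX-then-A test can be sketched as follows: accept the domain if it publishes at least one MX record, otherwise fall back to an A record (SMTP itself treats a bare A record as an implicit MX, per RFC 5321 section 5.1). The decision is kept separate from the DNS query so it can be tried on hand-built Attributes without touching the network. The class and method names are hypothetical, and I am assuming the JNDI DNS provider reports a non-existent domain as NameNotFoundException; other failures (timeouts, no DNS server configured) are deliberately left to the caller.

```java
import java.util.Hashtable;
import javax.naming.NameNotFoundException;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class DomainCheck {

    /** True if the DNS attributes contain at least one MX or, failing that, an A record. */
    static boolean acceptsMail(Attributes attrs) {
        return hasValues(attrs.get("MX")) || hasValues(attrs.get("A"));
    }

    private static boolean hasValues(Attribute attr) {
        return attr != null && attr.size() > 0;
    }

    /** Queries DNS for MX and A records; a non-existent domain yields false. */
    static boolean domainAcceptsMail(String domain) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            return acceptsMail(ctx.getAttributes(domain, new String[] { "MX", "A" }));
        } catch (NameNotFoundException e) {
            return false; // the domain does not exist at all
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        for (String domain : args) {
            System.out.println(domain + " accepts mail: " + domainAcceptsMail(domain));
        }
    }
}
```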

    HTH.
    GaryM, Nov 9, 2003
    #4
  5. David Segall

    Roedy Green Guest

    On Sun, 09 Nov 2003 12:09:09 GMT, David Segall
    <> wrote or quoted :

    >I would like to check incoming mail to determine if the sender's
    >address is reachable without sending an email to the address. Is this
    >possible using JavaMail?


    I validate email addresses with some regexes and then check the
    domains with MX addresses to see if they exist.

    The following code is somewhat stricter than the RFC.

    package com.mindprod.bulk;

    import java.util.HashSet;
    import java.util.Locale;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import javax.mail.internet.AddressException;
    import javax.mail.internet.InternetAddress;

    /**
    * Validate syntax of email addresses.
    * Does not probe to see if the mail server exists in DNS or online.
    * See MailProber for that.
    * See ValidateEmailFile for an example of how to use this class.
    *
    * @author Roedy Green, Canadian Mind Products
    * @version 1.0
    * to do: check validity of & in first part of email address. Appears
    * in practice.
    */
    public class EmailSyntaxValidator
    {
    private static boolean debugging = false;

    /**
    * Check how likely an email address is to be valid.
    * The higher the number returned, the more likely the address is valid.
    * This method does not probe the internet in any way to see if
    * the corresponding mail server or domain exists.
    *
    * @param email bare computer email address.
    * e.g.
    * No "Roedy Green" <> style addresses.
    * No local addresses, e.g. roedy.
    *
    * @return 0 = email address is definitely malformed, e.g. missing @,
    * or ends in .invalid
    * <br>
    * 1 = address does not meet one of the valid patterns below.
    * It still might be ok according to some obscure rule in RFC 822;
    * Java InternetAddress accepts it as valid.
    * <br>
    * 2 = unknown top level domain.
    * <br>
    * 3 = dots at beginning or end, doubled in name.
    * <br>
    * 4 = address of form xxx@[209.139.205.2] using IP
    * <br>
    * 5 = address of form Dots _ or - in first part of name
    * <br>
    * 6 = address of form rare, but known, domain
    * <br>
    * 7 = address of form or any national suffix.
    * <br>
    * 8 = address of form the matching this national suffix,
    * e.g. .ca in Canada, .de in Germany
    * <br>
    * 9 = address of form .org .net .edu .gov .biz -- official domains
    */
    public static int howValid(String email)
    {
    if ( email == null )
    {
    return 0;
    }
    email = email.trim().toLowerCase();
    int dotPlace = email.lastIndexOf('.');
    if ( 0 < dotPlace && dotPlace < email.length()-1 )
    {
    /* have at least x.y */
    String tld = email.substring(dotPlace+1);
    if ( badTLDs.contains(tld) )
    {
    /* deliberate invalid address */
    return 0;
    }
    // make sure none of fragments start or end in _ or -
    String[] fragments = splitter.split(email);
    boolean clean = true;
    for ( int i=0; i<fragments.length; i++ )
    {
    if ( fragments[i].startsWith("_") ||
    fragments[i].endsWith("_") ||
    fragments[i].startsWith("-") ||
    fragments[i].endsWith("-") )
    {
    clean = false;
    break;
    }
    } // end for
    if ( clean )
    {
    Matcher m9 = p9.matcher(email);
    if ( m9.matches() )
    {
    if ( officialTLDs.contains(tld) ) return 9;
    else if ( thisCountry.equals(tld) ) return 8;
    else if ( nationalTLDs.contains(tld) ) return 7;
    else if ( rareTLDs.contains(tld) ) return 6;
    else return 3; /* unknown tld */
    }
    // allow dots in name
    Matcher m5 = p5.matcher(email);
    if ( m5.matches() )
    {
    if ( officialTLDs.contains(tld) ) return 5;
    else if ( thisCountry.equals(tld) ) return 5;
    else if ( nationalTLDs.contains(tld) ) return 5;
    else if ( rareTLDs.contains(tld) ) return 5;
    else return 2; /* unknown tld */
    }

    // IP
    Matcher m4 = p4.matcher(email);
    if ( m4.matches() ) return 4; /* can't tell TLD */

    // allow even lead/trail dots in name, except at start of domain
    Matcher m3 = p3.matcher(email);
    if ( m3.matches() )
    {
    if ( officialTLDs.contains(tld) ) return 3;
    else if ( thisCountry.equals(tld) ) return 3;
    else if ( nationalTLDs.contains(tld) ) return 3;
    else if ( rareTLDs.contains(tld) ) return 3;
    else return 2; /* unknown domain */
    }
    } // end if clean
    }
    // allow even unclean addresses, and addresses without a TLD, to
    // have a whack at passing RFC 822
    try
    {

    /* See if InternetAddress likes it; it follows RFC 822.
    It will accept names without domains, though. */
    InternetAddress.parse(email, true /* strict */);
    // it liked it, no exception happened. Seems very sloppy.
    return 1;
    }
    catch ( AddressException e )
    {
    // it did not like it
    return 0;
    }
    }

    // allow _ - in name; leading and trailing ones are filtered later; no +.
    static Pattern p9 =
    Pattern.compile("[a-z0-9\\-_]++@[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)++");

    // to split into fields
    static Pattern splitter = Pattern.compile("[@\\.]");

    // to allow - _ dots in name, no +
    static Pattern p5 =
    Pattern.compile("[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)*@[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)++");

    // IP style names, no +
    static Pattern p4 =
    Pattern.compile("[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)*@\\[([0-9]{1,3}\\.){3}[0-9]{1,3}\\]");

    // allow dots anywhere, but not at start of domain name, no +
    static Pattern p3 =
    Pattern.compile("[a-z0-9\\-_\\.]++@[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)++");

    /**
    * build a HashSet from a array of String literals.
    *
    * @param list array of strings
    * @return HashSet you can use to test if a string is in the set.
    */
    static HashSet hmaker(String[] list)
    {
    HashSet map = new HashSet(Math.max((int) (list.length/.75f) + 1, 16));
    for ( int i=0; i<list.length; i++ )
    {
    map.add(list[i]);
    }
    }
    return map;
    }

    static final String thisCountry =
    Locale.getDefault().getCountry().toLowerCase();

    static final HashSet officialTLDs =
    hmaker(new String[]
    {
    "aero",
    "biz",
    "coop",
    "com",
    "edu",
    "gov",
    "info",
    "mil",
    "museum",
    "name",
    "net",
    "org",
    "pro",
    });

    static final HashSet rareTLDs =
    hmaker(new String[]
    {
    "agent",
    "art",
    "arts",
    "asia",
    "auction",
    "aus",
    "bank",
    "cam",
    "chat",
    "church",
    "club",
    "corp",
    "dds",
    "design",
    "dns2go",
    "e",
    "email",
    "exp",
    "fam",
    "family",
    "faq",
    "fed",
    "film",
    "firm",
    "free",
    "fun",
    "g",
    "game",
    "games",
    "gay",
    "ger",
    "globe",
    "gmbh",
    "golf",
    "gov",
    "help",
    "hola",
    "i",
    "inc",
    "int",
    "jpn",
    "k12",
    "kids",
    "law",
    "learn",
    "llb",
    "llc",
    "llp",
    "lnx",
    "love",
    "ltd",
    "mag",
    "mail",
    "med",
    "media",
    "mp3",
    "netz",
    "nic",
    "nom",
    "npo",
    "per",
    "pol",
    "prices",
    "radio",
    "rsc",
    "school",
    "scifi",
    "sea",
    "service",
    "sex",
    "shop",
    "sky",
    "soc",
    "space",
    "sport",
    "tech",
    "tour",
    "travel",
    "usvi",
    "video",
    "web",
    "wine",
    "wir",
    "wired",
    "zine",
    "zoo",
    });

    static final HashSet nationalTLDs =
    hmaker(new String[]
    {
    "ac",
    "ad",
    "ae",
    "af",
    "ag",
    "ai",
    "al",
    "am",
    "an",
    "ao",
    "aq",
    "ar",
    "as",
    "at",
    "au",
    "aw",
    "az",
    "ba",
    "bb",
    "bd",
    "be",
    "bf",
    "bg",
    "bh",
    "bi",
    "bj",
    "bm",
    "bn",
    "bo",
    "br",
    "bs",
    "bt",
    "bv",
    "bw",
    "by",
    "bz",
    "ca",
    "cc",
    "cd",
    "cf",
    "cg",
    "ch",
    "ci",
    "ck",
    "cl",
    "cm",
    "cn",
    "co",
    "cr",
    "cu",
    "cv",
    "cx",
    "cy",
    "cz",
    "de",
    "dj",
    "dk",
    "dm",
    "do",
    "dz",
    "ec",
    "ee",
    "eg",
    "eh",
    "er",
    "es",
    "et",
    "fi",
    "fj",
    "fk",
    "fm",
    "fo",
    "fr",
    "fx",
    "ga",
    "gb",
    "gd",
    "ge",
    "gf",
    "gg",
    "gh",
    "gi",
    "gl",
    "gm",
    "gn",
    "gp",
    "gq",
    "gr",
    "gs",
    "gt",
    "gu",
    "gw",
    "gy",
    "hk",
    "hm",
    "hn",
    "hr",
    "ht",
    "hu",
    "id",
    "ie",
    "il",
    "im",
    "in",
    "io",
    "iq",
    "ir",
    "is",
    "it",
    "je",
    "jm",
    "jo",
    "jp",
    "ke",
    "kg",
    "kh",
    "ki",
    "km",
    "kn",
    "kp",
    "kr",
    "kw",
    "ky",
    "kz",
    "la",
    "lb",
    "lc",
    "li",
    "lk",
    "lr",
    "ls",
    "lt",
    "lu",
    "lv",
    "ly",
    "ma",
    "mc",
    "md",
    "mg",
    "mh",
    "mk",
    "ml",
    "mm",
    "mn",
    "mo",
    "mp",
    "mq",
    "mr",
    "ms",
    "mt",
    "mu",
    "mv",
    "mw",
    "mx",
    "my",
    "mz",
    "na",
    "nc",
    "ne",
    "nf",
    "ng",
    "ni",
    "nl",
    "no",
    "np",
    "nr",
    "nu",
    "nz",
    "om",
    "pa",
    "pe",
    "pf",
    "pg",
    "ph",
    "pk",
    "pl",
    "pm",
    "pn",
    "pr",
    "ps",
    "pt",
    "pw",
    "py",
    "qa",
    "re",
    "ro",
    "ru",
    "rw",
    "sa",
    "sb",
    "sc",
    "sd",
    "se",
    "sg",
    "sh",
    "si",
    "sj",
    "sk",
    "sl",
    "sm",
    "sn",
    "so",
    "sr",
    "st",
    "sv",
    "sy",
    "sz",
    "tc",
    "td",
    "tf",
    "tg",
    "th",
    "tj",
    "tk",
    "tm",
    "tn",
    "to",
    "tp",
    "tr",
    "tt",
    "tv",
    "tw",
    "tz",
    "ua",
    "ug",
    "uk",
    "um",
    "us",
    "uy",
    "uz",
    "va",
    "vc",
    "ve",
    "vg",
    "vi",
    "vn",
    "vu",
    "wf",
    "ws",
    "ye",
    "yt",
    "yu",
    "za",
    "zm",
    "zw",
    });

    static final HashSet badTLDs =
    hmaker(new String[]
    {
    "invalid",
    "nowhere",
    "noone",
    });


    public static void main (String[] args)
    {
    System.out.println(howValid("kellizer@.hotmail.com"));
    }
    } // end class EmailSyntaxValidator
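    The class above leans on JavaMail only for its last-resort RFC 822 check. The two-step idea described at the top of this post (regex first, then an MX lookup on the domain) can be sketched in plain Java by reusing the shape of the strictest pattern, p9, together with the fragment check, and returning the domain for the DNS step. The class and method names are hypothetical, and this covers only the strictest case, so addresses with dots in the local part (which p5 accepts) are rejected here.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class EmailDomainExtractor {

    // Same shape as p9: letters, digits, - and _ in the local part
    // (no dots, no +), and at least one dot in the domain.
    private static final Pattern STRICT =
        Pattern.compile("[a-z0-9\\-_]++@[a-z0-9\\-_]++(\\.[a-z0-9\\-_]++)++");

    // Splits the address on @ and . so each fragment can be checked.
    private static final Pattern SPLITTER = Pattern.compile("[@.]");

    /** Returns the domain if the address passes the syntax checks, else null. */
    static String plausibleDomain(String email) {
        if (email == null) {
            return null;
        }
        String e = email.trim().toLowerCase(Locale.ROOT);
        if (!STRICT.matcher(e).matches()) {
            return null;
        }
        // No fragment may start or end with _ or -
        for (String fragment : SPLITTER.split(e)) {
            if (fragment.startsWith("_") || fragment.endsWith("_")
                    || fragment.startsWith("-") || fragment.endsWith("-")) {
                return null;
            }
        }
        return e.substring(e.indexOf('@') + 1); // hand this to the MX lookup
    }

    public static void main(String[] args) {
        System.out.println(plausibleDomain("Roedy@MindProd.com"));    // mindprod.com
        System.out.println(plausibleDomain("kellizer@.hotmail.com")); // null
    }
}
```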


    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
    Roedy Green, Nov 9, 2003
    #5
  6. David Segall

    David Segall Guest

    Harald Hein <> wrote:

    >"David Segall" wrote:
    >
    >> I would like to check incoming mail to determine if the sender's
    >> address is reachable without sending an email to the address. Is this
    >> possible using JavaMail?

    >
    >It is not possible with ANY mailing package, due to the limitations of
    >the SMTP protocol.
    >
    >You could check if there is an MX record for the sender domain - but
    >this would not guarantee that there is such a user at the domain. You
    >could use VRFY to check if a user exists - but you would have to
    >connect to the domain's mail server, and many admins have turned this
    >feature off because of privacy issues and abuse by spammers for
    >address verification.
    >
    >> In case it has some bearing on the answer the application is a spam
    >> filter that treats as spam anything that cannot be replied to.

    >
    >You are building a spam filter, but have no idea how e-mail works? Bad
    >idea.

    Thanks for the information and for your caution. I will probably
    ignore the caution because I find writing some code the best way of
    learning about such a topic.
    David Segall, Nov 10, 2003
    #6
  7. David Segall

    GaryM Guest

    David Segall <> wrote in
    news::

    > Thanks for the information and for your caution. I will probably
    > ignore the caution because I find writing some code the best way of
    > learning about such a topic.
    >


    See also http://www.paulgraham.com/spam.html for some excellent advice
    on spam filtering.
    GaryM, Nov 10, 2003
    #7
