[OT] E-mail syntax

  • Thread starter Thomas Mlynarczyk

Thomas Mlynarczyk

Hi,

I apologize if this is off-topic here, but can anyone tell me where to find
information about the correct syntax for e-mail addresses? I have read
RFC2822, but it seems that further restrictions apply. And apart from
the RFCs, Google didn't turn up any really authoritative information.

Greetings,
Thomas
 

Philip Ronan

Thomas said:
Hi,

I apologize if this is off-topic here, but can anyone tell me where to find
information about the correct syntax for e-mail addresses? I have read
RFC2822, but it seems that further restrictions apply. And apart from
the RFCs, Google didn't turn up any really authoritative information.

What else do you need?

RFC 1034 (I think) describes further restrictions on the characters that can
be used in domain names, but I assume you know that already if you're
familiar with RFC 2822.

Phil
 

Thomas Mlynarczyk

Also sprach Philip Ronan:

[RFC2822]
What else do you need?
RFC 1034 (I think) describes further restrictions on the characters
that can be used in domain names, but I assume you know that already
if you're familiar with RFC 2822.

Yes, but that's just where my confusion starts. If RFC1034/1035 imposes
further restrictions on the domain part, then why does RFC2822 allow a
"richer" syntax in the first place? (Besides, the wording in RFC1035 is
"The following syntax will result in fewer problems with many applications
that use domain names...", which sounds like a non-binding recommendation
at best.) Then, unless I have overlooked something, RFC2822 does not impose
any size limits, while RFC1034/1035 does. And RFC2822 allows a domain
without any dots, while RFC2821 requires at least one dot. How am I to cope
with all those contradictory specifications?
 

Philip Ronan

Thomas said:
Yes, but that's just where my confusion starts. If RFC1034/1035 imposes
further restrictions on the domain part, then why does RFC2822 allow a
"richer" syntax in the first place?

I suppose that's because there's no point in having valid domain names
defined in multiple documents. A valid email address should conform to both.

Then, unless I have overlooked something, RFC2822 does not impose
any size limits, while RFC1034/1035 does. And RFC2822 allows a domain
without any dots, while RFC2821 requires at least one dot. How am I to cope
with all those contradictory specifications?

OK, I'm not that familiar with RFC1034/1035, but I assume they only relate
to the part of an email address that comes after the "@". A dot would be
essential because there are no web domains that consist of a single TLD
(like "com" or "uk").

AFAIK, there is no limit on the length of the part before the "@"

Phil
 

Thomas Mlynarczyk

Also sprach Philip Ronan:
I suppose that's because there's no point in having valid domain names
defined in multiple documents.

That makes sense, but then why not a simple reference to the RFC defining
domain names?
A valid email address should conform to both.

So I must combine all the different specs and use the "greatest common
denominator"?
OK, I'm not that familiar with RFC1034/1035, but I assume they only
relate to the part of an email address that comes after the "@". A
dot would be essential because there are no web domains that consist
of a single TLD (like "com" or "uk").

What about "localhost"? But it seems, that RFC2821 is contradictory within
itself. Section 2.3.5 says: "A domain (or domain name) consists of one or
more dot-separated components." Thus, "my-domain" would be valid. But - same
RFC(!) - section 4.1.2 defines

Domain = (sub-domain 1*("." sub-domain)) / address-literal

In other words, at least one dot is required. Which section am I to believe?
AFAIK, there is no limit on the length of the part before the "@"

RFC2821, section 4.5.3.1 says there is a limit of 64 characters. But then, I
think it is RFC1035 which explains that mail addresses are converted to
domain names by making the part before the @ another "subdomain", and
because of the way such a domain name is encoded, the limit should be 63
characters (not counting quotes or escape-backslashes). So, again, what am I
to believe? It's all so confusing :-(
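
To make the discrepancy concrete, here is a minimal Python sketch that tests a domain against both readings of RFC2821. It is only an illustration under a few assumptions: the names SUB_DOMAIN, PROSE_DOMAIN, ABNF_DOMAIN and check_domain are made up for this example, the address-literal alternative is left out, and the 63-character check is the per-label limit from RFC1035.

```python
import re

# sub-domain as in RFC2821 section 4.1.2: a letter or digit, optionally
# followed by letters, digits and hyphens, ending in a letter or digit.
SUB_DOMAIN = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# Section 2.3.5 prose: one or more dot-separated components.
PROSE_DOMAIN = re.compile(rf"{SUB_DOMAIN}(?:\.{SUB_DOMAIN})*")

# Section 4.1.2 grammar: sub-domain 1*("." sub-domain), so at least one dot.
# The address-literal alternative is deliberately omitted in this sketch.
ABNF_DOMAIN = re.compile(rf"{SUB_DOMAIN}(?:\.{SUB_DOMAIN})+")


def check_domain(domain: str) -> dict:
    """Report which of the rules discussed above the domain satisfies."""
    return {
        "prose_2_3_5": PROSE_DOMAIN.fullmatch(domain) is not None,
        "abnf_4_1_2": ABNF_DOMAIN.fullmatch(domain) is not None,
        # RFC1035 limits each label to 63 characters.
        "labels_le_63": all(len(label) <= 63 for label in domain.split(".")),
    }


print(check_domain("localhost"))
print(check_domain("example.com"))
```

With this sketch, "localhost" and "my-domain" satisfy the section 2.3.5 wording but fail the section 4.1.2 grammar, which is exactly the conflict described above.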
 

Philip Ronan

Thomas said:
Also sprach Philip Ronan:


That makes sense, but then why not a simple reference to the RFC defining
domain names?

I agree it could be made clearer.
So I must combine all the different specs and use the "greatest common
denominator"?

That sounds like the safest way of doing things, yes.
What about "localhost"? But it seems, that RFC2821 is contradictory within
itself. Section 2.3.5 says: "A domain (or domain name) consists of one or
more dot-separated components." Thus, "my-domain" would be valid. But - same
RFC(!) - section 4.1.2 defines

Domain = (sub-domain 1*("." sub-domain)) / address-literal

In other words, at least one dot is required. Which section am I to believe?

My head is starting to hurt now :-(
You should submit a comment about that.
RFC2821, section 4.5.3.1 says there is a limit of 64 characters.

It also says that longer strings may sometimes be required. So if the
local-part contains more than 64 characters it could still be syntactically
valid even if it isn't accepted everywhere.
But then, I
think it is RFC1035 which explains that mail addresses are converted to
domain names by making the part before the @ another "subdomain", and
because of the way such a domain name is encoded, the limit should be 63
characters (not counting quotes or escape-backslashes). So, again, what am I
to believe? It's all so confusing :-(

Now I'm confused too!
 

Toby Inkster

Thomas said:
What about "localhost"? But it seems, that RFC2821 is contradictory within
itself.

Many RFCs are.

From RFC 791 (The Internet Protocol), but applicable to pretty much any
Internet Standard:
| In general, an implementation must be conservative in its sending
| behavior, and liberal in its receiving behavior.

Bear this in mind and you'll know in your heart which address formats you
should send and which you should accept. :)
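
One way to read that advice in code is to keep two separate checks: a strict one for addresses you generate yourself and a lenient one for addresses you are handed. The following Python sketch is only an illustration; the function names and the exact strict rules (dot-atom local part of at most 64 characters, dotted domain with labels of at most 63 characters) are assumptions chosen for this example, not something any RFC prescribes in this form.

```python
import re

# Characters allowed in an unquoted local part (atext in RFC2822).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# A domain label of at most 63 characters.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
# Strict form: dot-atom local part, "@", domain with at least one dot.
STRICT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*@{LABEL}(?:\.{LABEL})+")


def safe_to_send(address: str) -> bool:
    """Conservative check for addresses we emit ourselves."""
    if STRICT.fullmatch(address) is None:
        return False
    local_part = address.rpartition("@")[0]
    return len(local_part) <= 64


def acceptable_to_receive(address: str) -> bool:
    """Liberal check: only require something on both sides of the last '@'."""
    local_part, at_sign, domain = address.rpartition("@")
    return bool(at_sign and local_part and domain)


for candidate in ("thomas@example.com", "user@localhost", "$@$"):
    print(candidate, safe_to_send(candidate), acceptable_to_receive(candidate))
```

The lenient check leaves the final verdict to the mail system that actually has to deliver the message, while the strict check keeps your own output within the forms that every specification mentioned in this thread agrees on.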
 

Thomas Mlynarczyk

Also sprach Toby Inkster:
Many RFCs are.

<astonishment level="utter">But aren't they the stuff standards are made
from? (RFC2821 being a standard itself, if I'm not wrong.) So shouldn't they
be as "perfect" as possible?</astonishment>

From RFC 791 (The Internet Protocol), but applicable to pretty much
any Internet Standard:
Bear this in mind and you'll know in your heart which address formats
you should send and which you should accept. :)

So if I want a script to syntactically validate an e-mail address entered by
a user, my only reference is to be RFC2822 including all "obsolete parts"?
Thus permitting addresses like $@$ or (@(@)@)a@[@@@]? Or would it be
reasonable to "restrict" the domain part further? (I guess I can ignore the
comment syntax for the purpose mentioned above, as it cannot be "regexed"
due to the possible nesting.)
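
Although nested comments are out of reach for a single classic regular expression, a small scanner with a depth counter can remove them before any further checks. Here is a minimal Python sketch of that idea. The function name strip_comments is made up for this example, and it assumes the simplified rules that a backslash escapes the next character and that parentheses inside a double-quoted string are not comments.

```python
def strip_comments(text: str) -> str:
    """Remove (possibly nested) parenthesized comments from an address."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a double-quoted string (only at depth 0)
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Escaped pair: keep it outside comments, drop it inside them.
            if depth == 0:
                out.append(text[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")":
            if depth == 0:
                raise ValueError("closing parenthesis without opening one")
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    if depth != 0 or in_quotes:
        raise ValueError("unbalanced comment or quoted string")
    return "".join(out)


print(strip_comments("(@(@)@)a@[@@@]"))  # prints a@[@@@]
```

After this step the remaining string contains no comments, so the rest of the syntax can be checked with whatever rules you settle on.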
 

Thomas Mlynarczyk

Also sprach Philip Ronan:
That sounds like the safest way of doing things, yes.

Makes things awfully complicated, though. Especially, as I cannot be sure if
all those other restrictions would indeed *always* apply.

[RFC2821 is contradictory within itself]
My head is starting to hurt now :-(
You should submit a comment about that.

This RFC is now more than three and a half years old. I can't quite believe
I'm the first to spot an error in it. I'd rather assume there is a "natural
explanation" - but where to find it?
Now I'm confused too!

Hey, this was supposed to be the other way round, so that we would now both
be enlightened and not both confused! :-(
 
