package emile

  1. Overview
  2. Docs
Legend:
Library
Module
Module type
Parameter
Class
Class type

This is an aggregation of rules used to parse an e-mail address. The goal of this documentation is to show relations between RFCs, updates, and final description of parts needed to parse an e-mail address.

Obviously, this part is most a copy-paste from RFCs to explain what we implement. And for a client, it's a boring and indigestible (but needed) work. We provide implementations only for people know what they really need — and avoid duplicate code in some ways.

But the biggest advise about this module is just to ignore it and move on — like what I really want when I wrote this documentation.

val is_vchar : char -> bool

From RFC5234 (used in RFC5322).

VCHAR = %x21-7E ; visible (printing) characters
val is_obs_no_ws_ctl : char -> bool

From RFC5322.

obs-NO-WS-CTL = %d1-8 /   ; US-ASCII control
                %d11 /    ;  characters that do not
                %d12 /    ;  include the carriage
                %d14-31 / ;  return, line feed, and
                %d127     ;  white space characters
val is_ctext : char -> bool

From RFC822.

ctext = <any CHAR excluding "(",          ; => may be folded
         ")", BACKSLASH & CR, & including
         linear-white-space>

From RFC1522 (occurrences).

5. Use of encoded-words in message headers

  (2) An encoded-word may appear within a comment delimited by "(" and
      ")", i.e., wherever a "ctext" is allowed.  More precisely, the
      RFC 822 ABNF definition for "comment" is amended as follows:

      comment = "(" *(ctext / quoted-pair / comment / encoded-word) ")"

      A "Q"-encoded encoded-word which appears in a comment MUST NOT
      contain the characters "(", ")" or DQUOTE encoded-word that
      appears in a "comment" MUST be separated from any adjacent
      encoded-word or "ctext" by linear-white-space.

7. Conformance

  A mail reading program claiming compliance with this specification
  must be able to distinguish encoded-words from "text", "ctext", or
  "word"s, according to the rules in section 6, anytime they appear in
  appropriate places in message headers.  It must support both the "B"
  and "Q" encodings for any character set which it supports.  The
  program must be able to display the unencoded text if the character

From RFC2047 § Appendix.

+ clarification: an 'encoded-word' may appear immediately following
  the initial "(" or immediately before the final ")" that delimits a
  comment, not just adjacent to "(" and ")" *within* *ctext.

From RFC2822.

ctext = NO-WS-CTL / ; Non white space controls
        %d33-39 /   ; The rest of the US-ASCII
        %d42-91 /   ;  characters not including "(",
        %d93-126    ;  ")", or BACKSLASH

From RFC5322.

ctext     = %d33-39 /     ; Printable US-ASCII
            %d42-91 /     ;  characters not including
            %d93-126 /    ;  "(", ")", or BACKSLASH
            obs-ctext
obs-ctext = obs-NO-WS-CTL

Update from RFC 2822
+ Removed NO-WS-CTL from ctext

From RFC5335.

ctext          =/ UTF8-xtra-char
UTF8-xtra-char =  UTF8-2 / UTF8-3 / UTF8-4
UTF8-2         =  %xC2-DF UTF8-tail
UTF8-3         =  %xE0 %xA0-BF UTF8-tail /
                  %xE1-EC 2(UTF8-tail) /
                  %xED %x80-9F UTF8-tail /
                  %xEE-EF 2(UTF8-tail)
UTF8-4         =  %xF0 %x90-BF 2( UTF8-tail ) /
                  %xF1-F3 3( UTF8-tail ) /
                  %xF4 %x80-8F 2( UTF8-tail )
UTF8-tail      =  %x80-BF

From RFC6532.

ctext =/ UTF8-non-ascii

Note about UTF-8, the process is out of this scope where we check only one byte here. Note about compliance with RFC1522, it's out of scope where we check only one byte here.

val is_qtext : char -> bool

From RFC822.

qtext = <any CHAR excepting DQUOTE,    ; => may be folded
         BACKSLASH & CR, and including
         linear-white-space>

From RFC2822.

qtext = NO-WS-CTL / ; Non white space controls
        %d33 /      ; The rest of the US-ASCII
        %d35-91 /   ;  characters not including BACKSLASH
        %d93-126    ;  or the quote character

From RFC5322.

qtext     = %d33 /        ; Printable US-ASCII
            %d35-91 /     ;  characters not including
            %d93-126 /    ;  BACKSLASH or the quote character
            obs-qtext
obs-qtext = obs-NO-WS-CTL

From RFC5335 (see is_ctext about UTF-xtra-char).

utf8-qtext = qtext / UTF8-xtra-char

From RFC6532.

qtext =/ UTF8-non-ascii

Note about UTF-8, the process is out of this scope where we check only one byte here.

val is_atext : char -> bool

The ABNF of atext is not explicit from RFC822 but the relic could be find here.

atom = 1*<any CHAR except specials, SPACE and CTLs>

From RFC2822.

atext = ALPHA / DIGIT / ; Any character except controls,
        "!" / "#" /     ;  SP, and specials.
        "$" / "%" /     ;  Used for atoms
        "&" / "'" /
        "*" / "+" /
        "-" / "/" /
        "=" / "?" /
        "^" / "_" /
        "`" / "{" /
        "|" / "}" /
        "~"

From RFC5322.

atext = ALPHA / DIGIT / ; Printable US-ASCII
        "!" / "#" /     ;  characters not including
        "$" / "%" /     ;  specials.  Used for atoms.
        "&" / "'" /
        "*" / "+" /
        "-" / "/" /
        "=" / "?" /
        "^" / "_" /
        "`" / "{" /
        "|" / "}" /
        "~"

From RFC535 (see is_ctext about UTF-xtra-char).

utf8-atext = ALPHA / DIGIT /
             "!" / "#" /     ; Any character except
             "$" / "%" /     ; controls, SP, and specials.
             "&" / "'" /     ; Used for atoms.
             "*" / "+" /
             "-" / "/" /
             "=" / "?" /
             "^" / "_" /
             "`" / "{" /
             "|" / "}" /
             "~" /
             UTF8-xtra-char

From RFC6532.

atext =/ UTF8-non-ascii

Note about, UTF-8, the process is out of this scope where we check only byte here.

val is_wsp : char -> bool

From RFC822.

LWSP-char = SPACE / HTAB ; semantics = SPACE

From RFC2882 and RFC5322, we did not find any occurrence of LWSP-char. It replaced by WSP (available on RFC5234).

val is_quoted_pair : char -> bool

From RFC822.

quoted-pair = BACKSLASH CHAR ; may quote any char
CHAR is case-sensitive

From RFC2822.

quoted-pair = (BACKSLASH text) / obs-qp
text        = %d1-9 /                     ; Characters excluding CR and LF
              %d11 /
              %d12 /
              %d14-127 /
              obs-text
obs-text    = *LF *CR *(obs-char *LF *CR)
obs-char    = %d0-9 / %d11 /              ; %d0-127 except CR and
              %d12 / %d14-127             ;  LF
obs-qp      = BACKSLASH (%d0-127)

From RFC5322.

quoted-pair = (BACKSLASH (VCHAR / WSP)) / obs-qp
obs-qp      = BACKSLASH (%d0 / obs-NO-WS-CTL / LF / CR)

From RFC5335 (see is_ctext about UTF-xtra-char).

utf8-text        = %d1-9 /         ; all UTF-8 characters except
                   %d11-12 /       ; US-ASCII NUL, CR, and LF
                   %d14-127 /
                   UTF8-xtra-char
utf8-quoted-pair = (BACKSLASH utf8-text) / obs-qp

Note this function is fun _chr -> true. Note RFC5322 (last version of e-mail) does not mention an update from RFC2822. RFC6532 does not mention an update of quoted-pair. This implemention follow RFC5322 without unicode support.

val is_dtext : char -> bool

From RFC822.

dtext = <any CHAR excluding "[",          ; => may be folded
         "]", BACKSLASH & CR, & including
         linear-white-space>

From RFC2822.

dtext = NO-WS-CTL / ; Non white space controls

        %d33-90 /   ; The rest of the US-ASCII
        %d94-126    ;  characters not including "[",
                    ;  "]", or BACKSLASH

From RFC5322.

+ Removed NO-WS-CTL from dtext

dtext     = %d33-90 /                   ; Printable US-ASCII
            %d94-126 /                  ;  characters not including
            obs-dtext                   ;  "[", "]", or BACKSLASH
obs-dtext = obs-NO-WS-CTL / quoted-pair

Note quoted-pair can not be processed here where we handle only one byte.

val quoted_pair : char Angstrom.t
val fws : (bool * bool * bool) Angstrom.t

From RFC822.

Each header field can be viewed as a single, logical  line  of
ASCII  characters,  comprising  a field-name and a field-body.
For convenience, the field-body  portion  of  this  conceptual
entity  can be split into a multiple-line representation; this
is called "folding".  The general rule is that wherever  there
may  be  linear-white-space  (NOT  simply  LWSP-chars), a CRLF
immediately followed by AT LEAST one LWSP-char may instead  be
inserted.  Thus, the single line

    To:  "Joe & J. Harvey" <ddd @Org>, JJV @ BBN

can be represented as:

    To:  "Joe & J. Harvey" <ddd @ Org>,
            JJV@BBN

and

    To:  "Joe & J. Harvey"
                    <ddd@ Org>, JJV
     @BBN

and

    To:  "Joe &
     J. Harvey" <ddd @ Org>, JJV @ BBN

     The process of moving  from  this  folded   multiple-line
representation  of a header field to its single line represen-
tation is called "unfolding".  Unfolding  is  accomplished  by
regarding   CRLF   immediately  followed  by  a  LWSP-char  as
equivalent to the LWSP-char.

Note:  While the standard  permits  folding  wherever  linear-
       white-space is permitted, it is recommended that struc-
       tured fields, such as those containing addresses, limit
       folding  to higher-level syntactic breaks.  For address
       fields, it  is  recommended  that  such  folding  occur
       between addresses, after the separating comma.

From RFC2822 § 3.2.3 & RFC2822 § 4.2.

White space characters, including white space used in folding
(described in section 2.2.3), may appear between many elements in
header field bodies.  Also, strings of characters that are treated as
comments may be included in structured field bodies as characters
enclosed in parentheses.  The following defines the folding white
space (FWS) and comment constructs.

Strings of characters enclosed in parentheses are considered comments
so long as they do not appear within a "quoted-string", as defined in
section 3.2.5.  Comments may nest.

There are several places in this standard where comments and FWS may
be freely inserted.  To accommodate that syntax, an additional token
for "CFWS" is defined for places where comments and/or FWS can occur.
However, where CFWS occurs in this standard, it MUST NOT be inserted
in such a way that any line of a folded header field is made up
entirely of WSP characters and nothing else.

FWS = ([*WSP CRLF] 1*WSP) /   ; Folding white space
      obs-FWS

In the obsolete syntax, any amount of folding white space MAY be
inserted where the obs-FWS rule is allowed.  This creates the
possibility of having two consecutive "folds" in a line, and
therefore the possibility that a line which makes up a folded header
field could be composed entirely of white space.

obs-FWS = 1*WSP *(CRLF 1*WSP)

From RFC5322 § 3.2.2 & RFC322 § 4.2.

White space characters, including white space used in folding
(described in section 2.2.3), may appear between many elements in
header field bodies.  Also, strings of characters that are treated as
comments may be included in structured field bodies as characters
enclosed in parentheses.  The following defines the folding white
space (FWS) and comment constructs.

Strings of characters enclosed in parentheses are considered comments
so long as they do not appear within a "quoted-string", as defined in
section 3.2.4.  Comments may nest.

There are several places in this specification where comments and FWS
may be freely inserted.  To accommodate that syntax, an additional
token for "CFWS" is defined for places where comments and/or FWS can
occur.  However, where CFWS occurs in this specification, it MUST NOT
be inserted in such a way that any line of a folded header field is
made up entirely of WSP characters and nothing else.

FWS = ([*WSP CRLF] 1*WSP) / obs-FWS ; Folding white space

In the obsolete syntax, any amount of folding white space MAY be
inserted where the obs-FWS rule is allowed.  This creates the
possibility of having two consecutive "folds" in a line, and
therefore the possibility that a line which makes up a folded header
field could be composed entirely of white space.

obs-FWS = 1*WSP *(CRLF 1*WSP)
val comment : unit Angstrom.t

From RFC822.

comment = "(" *(ctext / quoted-pair / comment) ")"

From RFC2822.

ccontent = ctext / quoted-pair / comment
comment  = "(" *([FWS] ccontent) [FWS] ")"

From RFC5322.

ccontent = ctext / quoted-pair / comment
comment  = "(" *([FWS] ccontent) [FWS] ")"
val cfws : unit Angstrom.t

From RFC822, see fws and obs_fws.

From RFC2822.

CFWS = *([FWS] comment) (([FWS] comment) / FWS)

From RFC5322.

CFWS = (1*([FWS] comment) [FWS]) / FWS

Update from RFC 2822:
+ Simplified CFWS syntax.
val qcontent : string Angstrom.t

From RFC822.

quoted-string = DQUOTE *(qtext/quoted-pair) DQUOTE ; Regular qtext or
                                                   ;   quoted chars.

From RFC2822.

qcontent = qtext / quoted-pair

From RFC5322.

qcontent = qtext / quoted-pair

From RFC5355.

utf8-qcontent = utf8-qtext / utf8-quoted-pair
qcontent = utf8-qcontent
val quoted_string : string Angstrom.t

From RFC822.

quoted-string = DQUOTE *(qtext/quoted-pair) DQUOTE ; Regular qtext or
                                                   ;   quoted chars.

From RFC2047.

+ An 'encoded-word' MUST NOT appear within a 'quoted-string'

From RFC2822.

quoted-string = [CFWS]
                DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                [CFWS]

A quoted-string is treated as a unit.  That is, quoted-string is
identical to atom, semantically.  Since a quoted-string is allowed to
contain FWS, folding is permitted.  Also note that since quoted-pair
is allowed in a quoted-string, the quote and backslash characters may
appear in a quoted-string so long as they appear as a quoted-pair.

Semantically, neither the optional CFWS outside of the quote
characters nor the quote characters themselves are part of the
quoted-string; the quoted-string is what is contained between the two
quote characters.  As stated earlier, the BACKSLASH in any quoted-pair
and the CRLF in any FWS/CFWS that appears within the quoted-string are
semantically "invisible" and therefore not part of the quoted-string
either.

Note in other words, space(s) in FWS are "visible" between DQUOTE.

From RFC5322.

quoted-string = [CFWS]
                DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                [CFWS]

Note currenlty, this implementation has a bug about multiple spaces in quoted-string. We need to update fws to count how many space(s) we skip.

val atom : string Angstrom.t

From RFC822.

atom = 1*<any CHAR except specials, SPACE and CTLs>

Difference from RFC 733:
- Atoms may not contain SPACE.

From RFC2822.

atom = [CFWS] 1*atext [CFWS]

From RFC5322.

atom = [CFWS] 1*atext [CFWS]

From RFC5335.

utf8-atom = [CFWS] 1*utf8-atext [CFWS]
val word : word Angstrom.t

From RFC822.

word = atom / quoted-string

From RFC2822.

word = atom / quoted-string

From RFC5322.

word = atom / quoted-string
val dot_atom_text : string list Angstrom.t

From RFC2822.

dot-atom-text = 1*atext *("." 1*atext)

From RFC5322.

dot-atom-text = 1*atext *("." 1*atext)
val dot_atom : string list Angstrom.t

From RFC2822.

dot-atom = [CFWS] dot-atom-text [CFWS]

From RFC5322.

dot-atom = [CFWS] dot-atom-text [CFWS]
val local_part : local Angstrom.t

From RFC822.

local-part = word *("." word) ; uninterpreted
                              ; case-preserved

The local-part of an  addr-spec  in  a  mailbox  specification
(i.e.,  the  host's  name for the mailbox) is understood to be
whatever the receiving mail protocol server allows.  For exam-
ple,  some systems do not understand mailbox references of the
form "P. D. Q. Bach", but others do.

This specification treats periods (".") as lexical separators.
Hence,  their  presence  in  local-parts which are not quoted-
strings, is detected.   However,  such  occurrences  carry  NO
semantics.  That is, if a local-part has periods within it, an
address parser will divide the local-part into several tokens,
but  the  sequence  of  tokens will be treated as one uninter-
preted unit.  The sequence  will  be  re-assembled,  when  the
address is passed outside of the system such as to a mail pro-
tocol service.

For example, the address:

                   First.Last@Registry.Org

is legal and does not require the local-part to be  surrounded
with  quotation-marks.   (However,  "First  Last" DOES require
quoting.)  The local-part of the address, when passed  outside
of  the  mail  system,  within  the  Registry.Org  domain,  is
"First.Last", again without quotation marks.

From RFC2822 § 3.4.1 & RFC2822 § 4.4.

local-part = dot-atom / quoted-string / obs-local-part
obs-local-part = word *("." word)

The local-part portion is a domain dependent string.  In addresses,
it is simply interpreted on the particular host as a name of a
particular mailbox.

Update:
+ CFWS within local-parts and domains not allowed.*

From RFC5322 § 3.4.1 & RFC5322 § 4.4.

local-part = dot-atom / quoted-string / obs-local-part
obs-local-part = word *("." word)
val obs_local_part : local Angstrom.t
val domain_literal : string Angstrom.t

From RFC822.

domain-literal = "[" *(dtext / quoted-pair) "]"

o   Square brackets ("[" and "]") are used to indicate the
    presence  of  a  domain-literal, which the appropriate
    name-domain  is  to  use  directly,  bypassing  normal
    name-resolution mechanisms.

Domain-literals which refer to domains within the ARPA  Inter-
net  specify  32-bit  Internet addresses, in four 8-bit fields
noted in decimal, as described in Request for  Comments  #820,
"Assigned Numbers."  For example:

                         [10.0.3.19]

Note:  THE USE OF DOMAIN-LITERALS IS STRONGLY DISCOURAGED.  It
       is  permitted  only  as  a means of bypassing temporary
       system limitations, such as name tables which  are  not
       complete.

From RFC2822.

domain-literal = [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]

From RFC5322.

domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
val obs_domain : string list Angstrom.t
val domain : domain Angstrom.t

From RFC822 § 6.1, RFC822 § 6.2.1, RFC822 § 6.2.2 & RFC822 § 6.2.3.

domain = sub-domain *("." sub-domain)
sub-domain = domain-ref / domain-literal
domain-ref = atom                        ; symbolic reference

6.2.1.  DOMAINS

A name-domain is a set of registered (mail)  names.   A  name-
domain  specification  resolves  to  a subordinate name-domain
specification  or  to  a  terminal  domain-dependent   string.
Hence,  domain  specification  is  extensible,  permitting any
number of registration levels.

Name-domains model a global, logical, hierarchical  addressing
scheme.   The  model is logical, in that an address specifica-
tion is related to name registration and  is  not  necessarily
tied  to  transmission  path.   The  model's  hierarchy  is  a
directed graph, called an in-tree, such that there is a single
path  from  the root of the tree to any node in the hierarchy.
If more than one path actually exists, they are considered  to
be different addresses.

The root node is common to all addresses; consequently, it  is
not  referenced.   Its  children  constitute "top-level" name-
domains.  Usually, a service has access to its own full domain
specification and to the names of all top-level name-domains.

The "top" of the domain addressing hierarchy -- a child of the
root  --  is  indicated  by  the right-most field, in a domain
specification.  Its child is specified to the left, its  child
to the left, and so on.

Some groups provide formal registration services;  these  con-
stitute   name-domains   that  are  independent  logically  of
specific machines.  In addition, networks and machines  impli-
citly  compose name-domains, since their membership usually is
registered in name tables.

In the case of formal registration, an organization implements
a  (distributed)  data base which provides an address-to-route
mapping service for addresses of the form:

                 person@registry.organization

Note that "organization" is a logical  entity,  separate  from
any particular communication network.

A mechanism for accessing "organization" is universally avail-
able.   That mechanism, in turn, seeks an instantiation of the
registry; its location is not indicated in the address specif-
ication.   It  is assumed that the system which operates under
the name "organization" knows how to find a subordinate regis-
try.  The registry will then use the "person" string to deter-
mine where to send the mail specification.

The latter,  network-oriented  case  permits  simple,  direct,
attachment-related address specification, such as:

                      user@host.network

Once the network is accessed, it is expected  that  a  message
will  go  directly  to the host and that the host will resolve
the user name, placing the message in the user's mailbox.

6.2.2.  ABBREVIATED DOMAIN SPECIFICATION

Since any number of  levels  is  possible  within  the  domain
hierarchy,  specification  of  a  fully  qualified address can
become inconvenient.  This standard permits abbreviated domain
specification, in a special case:

    For the address of  the  sender,  call  the  left-most
    sub-domain  Level  N.   In a header address, if all of
    the sub-domains above (i.e., to the right of) Level  N
    are  the same as those of the sender, then they do not
    have to appear in the specification.   Otherwise,  the
    address must be fully qualified.

    This feature is subject  to  approval  by  local  sub-
    domains.   Individual  sub-domains  may  require their
    member systems, which originate mail, to provide  full
    domain  specification only.  When permitted, abbrevia-
    tions may be present  only  while  the  message  stays
    within the sub-domain of the sender.

    Use of this mechanism requires the sender's sub-domain
    to reserve the names of all top-level domains, so that
    full specifications can be distinguished from abbrevi-
    ated specifications.

For example, if a sender's address is:

         sender@registry-A.registry-1.organization-X

and one recipient's address is:

        recipient@registry-B.registry-1.organization-X

and another's is:

        recipient@registry-C.registry-2.organization-X

then ".registry-1.organization-X" need not be specified in the
the  message,  but  "registry-C.registry-2"  DOES  have  to be
specified.  That is, the first two addresses may  be  abbrevi-
ated, but the third address must be fully specified.

When a message crosses a domain boundary, all  addresses  must
be  specified  in  the  full format, ending with the top-level
name-domain in the right-most field.  It is the responsibility
of  mail  forwarding services to ensure that addresses conform
with this requirement.  In the case of abbreviated  addresses,
the  relaying  service must make the necessary expansions.  It
should be noted that it often is difficult for such a  service
to locate all occurrences of address abbreviations.  For exam-
ple, it will not be possible to find such abbreviations within
the  body  of  the  message.   The "Return-Path" field can aid
recipients in recovering from these errors.

Note:  When passing any portion of an addr-spec onto a process
       which  does  not interpret data according to this stan-
       dard (e.g., mail protocol servers).  There must  be  NO
       LWSP-chars  preceding  or  following the at-sign or any
       delimiting period ("."), such as  shown  in  the  above
       examples,   and   only  ONE  SPACE  between  contiguous
       <word>s.

6.2.3.  DOMAIN TERMS

A domain-ref must be THE official name of a registry, network,
or  host.   It  is  a  symbolic  reference, within a name sub-
domain.  At times, it is necessary to bypass standard  mechan-
isms  for  resolving  such  references,  using  more primitive
information, such as a network host address  rather  than  its
associated host name.

To permit such references, this standard provides the  domain-
literal  construct.   Its contents must conform with the needs
of the sub-domain in which it is interpreted.

Domain-literals which refer to domains within the ARPA  Inter-
net  specify  32-bit  Internet addresses, in four 8-bit fields
noted in decimal, as described in Request for  Comments  #820,
"Assigned Numbers."  For example:

                         [10.0.3.19]

Note:  THE USE OF DOMAIN-LITERALS IS STRONGLY DISCOURAGED.  It
       is  permitted  only  as  a means of bypassing temporary
       system limitations, such as name tables which  are  not
       complete.

The names of "top-level" domains, and  the  names  of  domains
under  in  the  ARPA Internet, are registered with the Network
Information Center, SRI International, Menlo Park, California.

From RFC2822 § 3.4.1 & RFC2822 § 4.4.

domain = dot-atom / domain-literal / obs-domain
obs-domain = atom *("." atom)

From RFC5322 § 3.4.1 & RFC5322 § 4.4.

domain = dot-atom / domain-literal / obs-domain
obs-domain = atom *("." atom)

Note from RFC5322, we should accept any domain as `Literal and let the user to resolve it. Currently, we fail when we catch a `Literal and do the best effort where we follow RFC5321. But may be it's inconvenient (or not?) to fail.

val id_left : local Angstrom.t

From RFC2822 § 3.6.4 & RFC2822 § 4.5.4.

obs-id-left = local-part
no-fold-quote = DQUOTE *(qtext / quoted-pair) DQUOTE
id-left = dot-atom-text / no-fold-quote / obs-id-left

From RFC5322 § 3.6.4 & RFC5322 § 4.5.4.

id-left = dot-atom-text / obs-id-left
obs-id-left = local-part
val no_fold_literal : string Angstrom.t

From RFC2822.

no-fold-literal = "[" *(dtext / quoted-pair) "]"

From RFC5322.

no-fold-literal = "[" *dtext "]"
val id_right : domain Angstrom.t

From RFC2822 § 3.6.4 & RFC2822 § 4.5.4.

id-right = dot-atom-text / no-fold-literal / obs-id-right
obs-id-right = domain

From RFC5322 § 3.6.4 & RFC5322 § 4.5.4.

id-right = dot-atom-text / no-fold-literal / obs-id-right
obs-id-right = domain
val msg_id : (local * domain) Angstrom.t

From RFC822 § 4.1 & RFC822 § 6.1.

addr-spec = local-part "@" domain ; global address
msg-id = "<" addr-spec ">" ; Unique message id

From RFC2822.

msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]

Update:
+ CFWS within msg-id not allowed.*

The message identifier (msg-id) is similar in syntax to an angle-addr
construct without the internal CFWS.

From RFC5322.

msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]

Update:
+ Removed no-fold-quote from msg-id.  Clarified syntax

The message identifier (msg-id) itself MUST be a globally unique
identifier for a message.  The generator of the message identifier
MUST guarantee that the msg-id is unique.  There are several
algorithms that can be used to accomplish this.  Since the msg-id has
a similar syntax to addr-spec (identical except that quoted strings,
comments, and folding white space are not allowed), a good method is
to put the domain name (or a domain literal IP address) of the host
on which the message identifier was created on the right-hand side of
the "@" (since domain names and IP addresses are normally unique),
and put a combination of the current absolute date and time along
with some other currently unique (perhaps sequential) identifier
available on the system (for example, a process id number) on the
left-hand side.  Though other algorithms will work, it is RECOMMENDED
that the right-hand side contain some domain identifier (either of
the host itself or otherwise) such that the generator of the message
identifier can guarantee the uniqueness of the left-hand side within
the scope of that domain.

Semantically, the angle bracket characters are not part of the
msg-id; the msg-id is what is contained between the two angle bracket
characters.
val addr_spec : mailbox Angstrom.t

From RFC822.

addr-spec = local-part "@" domain ; global address

From RFC2822.

An addr-spec is a specific Internet identifier that contains a
locally interpreted string followed by the at-sign character ("@",
ASCII value 64) followed by an Internet domain.  The locally
interpreted string is either a quoted-string or a dot-atom.  If the
string can be represented as a dot-atom (that is, it contains no
characters other than atext characters or "." surrounded by atext
characters), then the dot-atom form SHOULD be used and the
quoted-string form SHOULD NOT be used. Comments and folding white
space SHOULD NOT be used around the "@" in the addr-spec.

addr-spec = local-part "@" domain

From RFC5322.

Note: A liberal syntax for the domain portion of addr-spec is
given here.  However, the domain portion contains addressing
information specified by and used in other protocols (e.g.,
[RFC1034], [RFC1035], [RFC1123], [RFC5321]).  It is therefore
incumbent upon implementations to conform to the syntax of
addresses for the context in which they are used.

addr-spec = local-part "@" domain
val angle_addr : mailbox Angstrom.t

From RFC822.

The ABNF of angle-addr is not explicit from RFC 822 but the relic could be find here, as a part of mailbox:

mailbox =  addr-spec         ; simple address
        /  phrase route-addr ; name & addr-spec

From RFC2822 § 3.4 & RFC2822 § 4.4.

obs-domain-list = "@" domain *( *(CFWS / "," ) [CFWS] "@" domain)
obs-route = [CFWS] obs-domain-list ":" [CFWS]
obs-angle-addr = [CFWS] "<" [obs-route] addr-spec ">" [CFWS]
angle-addr = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr

From RFC5322 § 3.4 & RFC5322 § 4.4.

obs-domain-list = *(CFWS / ",") "@" domain
                  *("," [CFWS] ["@" domain])
obs-route = obs-domain-list ":"
obs-angle-addr = [CFWS] "<" obs-route addr-spec ">" [CFWS]
angle-addr = [CFWS] "<" addr-spec ">" [CFWS] /
val obs_domain_list : domain list Angstrom.t
val obs_route : domain list Angstrom.t
val obs_angle_addr : mailbox Angstrom.t
val phrase : phrase Angstrom.t

From RFC822.

phrase = 1*word ; Sequence of words

From RFC2047 § 2 & RFC2047 § 5.

(3) As a replacement for a 'word' entity within a 'phrase', for example,
    one that precedes an address in a From, To, or Cc header.  The ABNF
    definition for 'phrase' from RFC 822 thus becomes:

    phrase = 1*( encoded-word / word )

    In this case the set of characters that may be used in a "Q"-encoded
    'encoded-word' is restricted to: <upper and lower case ASCII
    letters, decimal digits, "!", "*", "+", "-", "/", "=", and "_"
    (underscore, ASCII 95.)>.  An 'encoded-word' that appears within a
    'phrase' MUST be separated from any adjacent 'word', 'text' or
    'special' by 'linear-white-space'.

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
charset = token    ; see section 3
encoding = token   ; see section 4
token = 1*<Any CHAR except SPACE, CTLs, and especials>
especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
            <"> / "/" / "[" / "]" / "?" / "." / "="
encoded-text = 1*<Any printable ASCII character other than "?"
                  or SPACE>
               ; (but see "Use of encoded-words in message
               ; headers", section 5)

From RFC2822 § 3.2.6 & RFC2822 § 4.1.

obs-phrase = word *(word / "." / CFWS)
phrase = 1*word / obs-phrase

Update:
+ Period allowed in obsolete form of phrase.

From RFC5322 § 3.2.5 & RFC5322 § 4.1.

phrase = 1*word / obs-phrase

Note: The "period" (or "full stop") character (".") in obs-phrase
is not a form that was allowed in earlier versions of this or any
other specification.  Period (nor any other character from
specials) was not allowed in phrase because it introduced a
parsing difficulty distinguishing between phrases and portions of
an addr-spec (see section 4.4).  It appears here because the
period character is currently used in many messages in the
display-name portion of addresses, especially for initials in
names, and therefore must be interpreted properly.

obs-phrase = word *(word / "." / CFWS)
val obs_phrase : phrase Angstrom.t

See phrase.

val display_name : phrase Angstrom.t

From RFC822.

mailbox =  addr-spec         ; simple address
        /  phrase route-addr ; name & addr-spec

From RFC2822.

display-name = phrase
name-addr = [display-name] angle-addr

Note: Some legacy implementations used the simple form where the
addr-spec appears without the angle brackets, but included the name
of the recipient in parentheses as a comment following the addr-spec.
Since the meaning of the information in a comment is unspecified,
implementations SHOULD use the full name-addr form of the mailbox,
instead of the legacy form, to specify the display name associated
with a mailbox.  Also, because some legacy implementations interpret
the comment, comments generally SHOULD NOT be used in address fields
to avoid confusing such implementations.

From RFC5322.

name-addr = [display-name] angle-addr
display-name = phrase
val mailbox : mailbox Angstrom.t

From RFC822.

mailbox =  addr-spec         ; simple address
        /  phrase route-addr ; name & addr-spec

From RFC2822.

mailbox = name-addr / addr-spec

From RFC5322.

mailbox = name-addr / addr-spec
OCaml

Innovation. Community. Security.