$Id: E2E-spec.txt,v 1.21 2006/08/08 23:30:17 nickm Exp $

                                MIX3:2
         Type-III Remailers: End-to-end Encoding and Delivery

                            George Danezis
                           Roger Dingledine
                            Nick Mathewson
                             (who else?)

Status of this Document

   This draft document ("E2E-spec.txt") describes a proposed
   specification for Type III remailers.  It is not a final version.
   It has not yet been submitted to any standards body.

Abstract

   This document describes formats and algorithms used by client
   software for Type III mixes, and by Type III mixes that support
   delivery.  It does not describe the formats and algorithms that
   these applications share with relay-only remailers; for these, see
   "minion-spec.txt".

   Although this document discusses security issues in implementing
   Type III mix software, it is not comprehensive, nor does it discuss
   all implementation issues.

Table of Contents

            Status of this Document
            Abstract
            Table of Contents
   1.       Introduction
   1.1.     Terminology

   2.       End-to-end message encoding
   2.1.     Building blocks
   2.1.1.   Building blocks: Compression
   2.1.2.   Building blocks: K-of-N fragmentation
   2.1.3.   Building blocks: Whitening
   2.2.     Generating messages
   2.2.1.   Packet format
   2.2.2.   Generating plaintext forward messages
   2.2.3.   Generating encrypted forward messages
   2.2.4.   Generating reply messages
   2.2.5.   Generating stateless SURBs
   2.3.     Decoding messages
   2.3.1.   Decoding algorithm
   2.3.2.   Overcompressed messages
   2.3.3.   Reconstruction issues

   3.       Message Delivery
   3.1.     General issues
   3.1.1.   ASCII armor
   3.1.2.   RFC822 headers
   3.2.     MBOX
   3.2.1.   Formatting: Routing information
   3.2.2.   Formatting: Message body
   3.2.3.   Delivery
   3.2.4.   Server descriptor section
   3.3.     SMTP
   3.3.1.   Formatting: Routing information
   3.3.2.   Formatting: Message body
   3.3.3.   Delivery
   3.3.4.   Server descriptor section
   3.4.     Fragments
   3.4.1.   Server descriptor section
   3.5.     News
   3.5.1.   Formatting: routing information
   3.5.2.   Formatting: message body
   3.5.3.   Delivery
   3.5.4.   Server descriptor section
   A.1.     Appendix: Versioning and alphas

1. Introduction

   This document is an adjunct to the main Type III (Mixminion) Mix
   protocol specification in "minion-spec.txt".  Whereas the main
   specification describes formats and algorithms which all compliant
   Type III mixes and clients must generate and process, this
   document describes formats and algorithms for use at the edges of
   the system: by user software that generates anonymous messages; by
   user software that generates reply blocks; by exit nodes that
   deliver Type III messages; and by user software that decrypts
   encrypted Type III messages.

   Although it is possible for Type III mixes to support other
   delivery methods, and possible for clients to encode end-to-end
   messages in different ways, we _strongly_ encourage new
   implementations to remain compatible with these methods.  Since
   users receive anonymity from the cover traffic provided by other
   users, additional methods and options for delivery and encoding
   come at the expense of decreasing the anonymity provided by the
   system.

1.1. Terminology

      * "Packet" - A Type III mix packet.  Fixed in size to 32 KB.

      * "Message" - A variable size end-to-end message, transmitted
        from one location to another.  May be larger or smaller
        than a single packet.

      * "Forward message" - A message sent anonymously to a known
        recipient.

      * "Direct reply message" - A message sent to a known recipient,
        with no attempt to maintain sender anonymity.

      * "Anonymous reply message" - A message sent anonymously to an
        unknown recipient.

      * "Forward plaintext message" - A forward message sent in the
        clear.

      * "Forward encrypted message" - A forward message encrypted to
        the public key of its recipient.

      * "Corrupted packet" - A packet which has had its payload
        corrupted on the second leg of its path.  [Other forms of
        corruption will result in the packet not being delivered.]

2. End-to-end message encoding

   This section describes a method for encoding messages of any size
   into payloads for one or more Type III packets, using compression
   to reduce bandwidth and K-of-N message fragmentation to improve
   reliability with large messages.  This method limits opportunities
   for end-to-end traffic analysis by making corrupted packets,
   packets from forward encrypted messages, and packets from reply
   messages indistinguishable to anyone besides their intended
   recipients.  It allows forward plaintext messages, however, to be
   read by exit servers, so that they can deliver them to recipients
   who aren't running Type III client software.

   This form of End-to-End encoding is used for the delivery types
   SMTP and MBOX, and will likely be appropriate for other delivery
   methods with similar needs.

2.1. Building blocks
   [XXXX]

2.1.1. Compression

   We define a compression primitive based on ZLIB, as defined in
   RFC-1950 and RFC-1951.  Because these standards describe only a
   message format, and do not mandate a single compression algorithm,
   we must restrict the allowable means of compression further.  (If
   we allowed message encoders to choose among valid ZLIB compression
   algorithms, they would become partitionable.)

   Specifically, we define COMPRESS(M) to equal the result of
   compressing M using zlib 1.1.4 (as available from www.gzip.org) with
   the following parameters, not forcing any explicit flushes until the
   message is compressed:
         Compression level    = 9 (maximum)
         Compression method   = DEFLATED (default)
         Window size          = 32K (default)
         Memory level         = 8 (default)
         Compression strategy = DEFAULT (default)

   Implementation note: Any software may be used, so long as it gives
   the same result as zlib 1.1.4 with these parameters.  Mark Adler
   from the Gzip team has stated that all zlib versions from 1.0.4
   through 1.1.4 give identical results, but this *may* change in some
   future version.
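   Implementation sketch (non-normative): COMPRESS and DECOMPRESS can
   be expressed with Python's standard "zlib" module and the parameters
   listed above.  This assumes the zlib library linked into the
   interpreter produces the same output as zlib 1.1.4; as noted above,
   that is not guaranteed for later versions, so an interoperable
   implementation should check its output against known vectors.  The
   DECOMPRESS shown here is unbounded and omits the overcompression
   check of section 2.3.2.

```python
import zlib

def compress(m: bytes) -> bytes:
    """COMPRESS(M): one zlib stream using the parameters fixed above."""
    c = zlib.compressobj(
        9,                        # compression level: 9 (maximum)
        zlib.DEFLATED,            # compression method: DEFLATED
        15,                       # window bits: 2**15 = 32K window
        8,                        # memory level: 8 (default)
        zlib.Z_DEFAULT_STRATEGY,  # compression strategy: DEFAULT
    )
    # No explicit intermediate flushes: feed the whole message, then finish.
    return c.compress(m) + c.flush()

def decompress(c: bytes) -> bytes:
    """DECOMPRESS(C): raises zlib.error when C is not a valid zlib stream."""
    return zlib.decompress(c)
```

   The level-9 setting is visible in the zlib stream header, whose first
   two octets are 78 DA.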

   We also define DECOMPRESS as the inverse of COMPRESS: namely, ZLIB
   decompression as described in RFCs 1950 and 1951.  Note that
   DECOMPRESS is not defined for every sequence of octets.

2.1.2. K-of-N fragmentation

   We define a primitive, FRAGMENT, that breaks a K-packet message
   into N>K packets, such that any K of those packets are sufficient
   to reconstruct the original message with high probability.
   FRAGMENT(M,K,N,i) is the i'th such packet.

   We also define a primitive RECONSTRUCT(K,N,P_i1, P_i2,...P_iK) that
   reconstructs the original message.

   Note that these primitives only need to provide the above property
   of erasure correction, and need not provide "secret splitting": It
   is harmless if the message can be reconstructed from fewer than K
   packets.

   Currently, we use the algorithm described by Luigi Rizzo in several
   papers and implemented in software at
             http://info.iet.unipi.it/~luigi/fec.html .
   This algorithm has several advantages: first, it has freely-available
   implementations in C and Java under a modified-BSD style license.
   Second, it runs with acceptable performance for modest values of K
   (less than 30 or so).  Third, it does not seem to be patent
   encumbered.

   Of course, this algorithm has disadvantages.  First, it lacks a
   complete, byte-level specification beyond that given at the URL
   above.  Second, it performs poorly with very large K.  Sadly, it
   seems to be about the best we can do without touching patented
   algorithms.

   To avoid large K, we split the message into chunks, and do K-of-N
   fragmentation on the chunks.  With this scheme, a message can be
   reconstructed if and only if K packets from each chunk arrive
   intact. Because packet loss is likely to be nonuniformly concentrated
   at specific remailers and links, this is not so dangerous to
   reliability as it might initially appear.

   [NOTE: I am not happy with this current situation: there needs to
    be _some_ unpatented probabilistic O(N) algorithm out there! -NM]

   We divide and fragment as follows:

       PROCEDURE: Divide a message into packets.  [DIVIDE(M,PS,EXF)]
       ARGUMENTS
           M: the message to send
           PS: payload size (fixed)
           EXF: expansion factor 

       Let M_SIZE = CEIL(LEN(M) / PS)

       Let K = Min(16, 2**CEIL(Log2(M_SIZE)))
       Let NUM_CHUNKS = CEIL(M_SIZE / K)

       Let M = M | PRNG(Rand(KEY_LEN), NUM_CHUNKS*PS*K - Len(M))

       For i from 0 to NUM_CHUNKS-1:
          Let CHUNK_i = M[i*PS*K : PS*K]
       End loop

       Let N = Ceil(EXF*K)

       For i from 0 to NUM_CHUNKS-1:
         For j from 0 to N-1:
           FRAGMENTS[i*N+j] = FRAGMENT(CHUNK_i, K, N, j)
         End loop
       End loop

       Return FRAGMENTS.

  [If we find an unpatented O(N) algorithm, we use it instead of this
   junk. -NM]

  Everyone must use the same EXF. The value of EXF is 4/3. 
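   Implementation sketch (non-normative): the bookkeeping part of DIVIDE
   (choosing K, N and NUM_CHUNKS, padding, and slicing chunks) is shown
   below.  The erasure code itself is not reproduced: "fragment" is a
   hypothetical caller-supplied function standing in for FRAGMENT
   (for example, a binding to Rizzo's FEC library), and os.urandom()
   stands in for PRNG(Rand(KEY_LEN), ...).

```python
import math
import os
from fractions import Fraction

EXF = Fraction(4, 3)  # the fixed expansion factor; exact, no float rounding

def divide_params(msg_len: int, ps: int, exf: Fraction = EXF):
    """Return (K, N, NUM_CHUNKS) as computed by DIVIDE."""
    if msg_len <= 0:
        raise ValueError("DIVIDE needs a non-empty message")
    m_size = (msg_len + ps - 1) // ps              # CEIL(LEN(M) / PS)
    k = min(16, 1 << (m_size - 1).bit_length())    # Min(16, 2**CEIL(Log2(M_SIZE)))
    num_chunks = (m_size + k - 1) // k             # CEIL(M_SIZE / K)
    n = math.ceil(exf * k)                         # Ceil(EXF * K)
    return k, n, num_chunks

def divide(m: bytes, ps: int, fragment, exf: Fraction = EXF):
    """Pad M to NUM_CHUNKS*K*PS octets, chunk it, and FRAGMENT each chunk."""
    k, n, num_chunks = divide_params(len(m), ps, exf)
    chunk_len = ps * k
    # Stand-in for PRNG(Rand(KEY_LEN), ...): random padding to whole chunks.
    m = m + os.urandom(num_chunks * chunk_len - len(m))
    fragments = []
    for i in range(num_chunks):
        chunk = m[i * chunk_len:(i + 1) * chunk_len]
        for j in range(n):
            fragments.append(fragment(chunk, k, n, j))  # lands at index i*N + j
    return fragments
```

   For example, a 100000-octet compressed message with a fragment size
   of 28K - 47 = 28625 octets gives M_SIZE = 4, so K = 4, N = 6, and a
   single chunk.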

2.1.3. Building blocks: whitening

   After some fragments of a message have been stored, but before the
   entire message has been received, a window of vulnerability exists
   on the exit server.  To prevent any portion of a message from being
   read in the clear before enough packets from the message have
   arrived, apply the following whitening formula to messages before
   fragmentation:

   WHITEN(M) = SPRP_Encrypt(K_whiten, "WHITEN", M)
   UNWHITEN(M) = SPRP_Decrypt(K_whiten, "WHITEN", M)

   where K_whiten is equal to the octet sequence [57 48 49 54 45 4E].

   Note that applying WHITEN and DIVIDE together has the effect of
   turning DIVIDE from an erasure correcting code into a full secret
   sharing encoding: If insufficient packets are received to reconstruct
   the whole message, none of the message can be reconstructed.
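   Implementation sketch (non-normative): the all-or-nothing effect
   comes from using a wide-block SPRP keyed with a public constant.  The
   code below illustrates it with a LIONESS-style four-round construction
   built only from Python's standard library.  The key schedule, the
   SHAKE-256 stream function, and HMAC-SHA1 round function are
   stand-ins chosen for illustration and are NOT the SPRP defined in
   "minion-spec.txt"; output is not interoperable.

```python
import hashlib
import hmac

K_WHITEN = bytes.fromhex("57 48 49 54 45 4E")  # the ASCII octets "WHITEN"
HASH_LEN = 20  # SHA-1 digest length

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _subkeys(key: bytes, mode: bytes):
    # Hypothetical key schedule: four 20-octet subkeys derived with SHA-1.
    return [hashlib.sha1(key + mode + bytes([i])).digest() for i in range(4)]

def _stream(key: bytes, n: int) -> bytes:
    # Stand-in stream cipher (SHAKE-256 output), not the spec's cipher.
    return hashlib.shake_256(key).digest(n)

def _mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha1).digest()

def sprp_encrypt(key: bytes, mode: bytes, m: bytes) -> bytes:
    if len(m) < HASH_LEN:
        raise ValueError("input shorter than one hash block")
    k1, k2, k3, k4 = _subkeys(key, mode)
    left, right = m[:HASH_LEN], m[HASH_LEN:]
    right = _xor(right, _stream(_xor(left, k1), len(right)))
    left = _xor(left, _mac(k2, right))
    right = _xor(right, _stream(_xor(left, k3), len(right)))
    left = _xor(left, _mac(k4, right))
    return left + right

def sprp_decrypt(key: bytes, mode: bytes, c: bytes) -> bytes:
    if len(c) < HASH_LEN:
        raise ValueError("input shorter than one hash block")
    k1, k2, k3, k4 = _subkeys(key, mode)
    left, right = c[:HASH_LEN], c[HASH_LEN:]
    left = _xor(left, _mac(k4, right))        # undo the rounds in reverse
    right = _xor(right, _stream(_xor(left, k3), len(right)))
    left = _xor(left, _mac(k2, right))
    right = _xor(right, _stream(_xor(left, k1), len(right)))
    return left + right

def whiten(m: bytes) -> bytes:
    return sprp_encrypt(K_WHITEN, b"WHITEN", m)

def unwhiten(m: bytes) -> bytes:
    return sprp_decrypt(K_WHITEN, b"WHITEN", m)
```

   Because every output octet depends on every input octet, changing or
   losing any part of the whitened data (here, its last octet) garbles
   the unwhitened result from the very first octet.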

2.2. Generating messages

   When sending a message as packets, a sender follows these steps:

      1. Determine the type of the message: encrypted forward, plaintext
         forward, or reply.
      2. Compress and possibly fragment the message into a set of
         payloads.  (The size of the payloads will depend on the type
         of the message.)
      3. Annotate each payload with a payload header.  (The payload
         header includes size, integrity, and fragmentation
         information.)
      4. According to the type of the message, encode each payload into
         a final 28KB payload and (possibly) a 20-octet decoding handle.
      5. For each payload, select a list of servers to form a path
         through the network.
      6. Using the decoding handle, payload contents, and route for
         each payload, generate a 32KB type III packet.
      7. Deliver each packet.

   This section describes steps 1 through 4.  Step 5 is described
   more fully in "path-spec.txt".  Steps 6 and 7 are described in
   "minion-spec.txt".

2.2.1. Packet format

   As described in minion-design.txt, every Mixminion packet contains a
   28KB payload and two 2KB headers.  In this design, the routing
   information for the final hop in the final header contains a 20-octet
   decoding handle or 'tag'.

2.2.1.1. Decoding handles

   The decoding handle is used by the recipient to determine how to
   decode or decrypt the final message.

   (Because it is part of the header, this decoding handle must
   be generated by the same entity that creates the second leg of the
   path: the message sender in the case of forward messages, and the
   SURB generator in the case of reply messages.)

   In all types of messages, the decoding handle looks like a 159-bit
   random number, preceded by a single zero bit.

   In the various packet types, the decoding handle is used as follows:

      * Plaintext forward - the handle is unused; its value is random.

      * Encrypted forward - the handle holds the first 20 octets of the
        RSA-encrypted session key for this packet.

      * Reply - the handle (generated by the SURB creator) contains a
        random seed used to seed the RNG that generated the master
        secrets for this SURB header.

2.2.1.2. Payload formats

   The payload of every packet, when decrypted, begins with one of the
   two following headers.

   SINGLE-PACKET-MESSAGE Payload:  [Header size is 22 octets]

      (Single-packet message flag) [1 bit: 0]
      (Length of packet contents)  [15 bits]
      (Hash of packet contents)    [20 octets]

   The length field encodes the size of the contents of the packet
   after compression.

   FRAGMENT Header:  [Header size is 47 octets]

      (Multi-packet message flag)       [1 bit: 1]
      (Packet index)                    [23 bits]
      (Hash of remaining fields and packet
       contents)                        [20 octets]
      (Message ID)                      [20 octets]
      (Message size)                    [4 octets]

   The packet index contains the position of the packet within the
   FRAGMENTS list generated by DIVIDE.  The Message ID is a random
   20-octet sequence.  The Message Size is the size of the whole
   message after compression.

   We define the constants FRAGMENT_HEADER_LEN = 47 and
   SINGLETON_HEADER_LEN=22.

   To break a message into packets with headers (steps 2 and 3 in 2.2
   above), an implementation follows these steps:

      PROCEDURE: PACKETIZE_MESSAGE(M, OVERHEAD)
      ARGUMENTS
          M: the message to send
          OVERHEAD: overhead needed for message type

      Let M_C = COMPRESS(M).

      If LEN(M_C)+SINGLETON_HEADER_LEN+OVERHEAD <= 28KB :
          Let PADDING_LEN = 28KB - LEN(M_C) - SINGLETON_HEADER_LEN
                            - OVERHEAD

          Let PADDING = Rand(PADDING_LEN)

          Return a singleton payload containing:
              Flag 0 | Int(15,LEN(M_C)) | Hash(M_C | PADDING) | M_C | PADDING

      Otherwise:
          Let FRAGMENTS = DIVIDE(WHITEN(M_C),
                                 28KB-OVERHEAD-FRAGMENT_HEADER_LEN, EXF)

          Let ID = Rand(TAG_LEN)

          Let SZ = Int(32,Len(M_C))

          For every FRAGMENT_i in FRAGMENTS:

             Let PAYLOAD_i = a fragment payload containing:
                Flag 1 | Int(23,i) | Hash(ID | SZ | FRAGMENT_i ) | ID
                   | SZ | FRAGMENT_i

          Return every PAYLOAD_i.
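   Implementation sketch (non-normative): PACKETIZE_MESSAGE in Python.
   It assumes that Hash is SHA-1 and that the multi-octet fields are
   big-endian; "divide" is a hypothetical caller-supplied callable that
   stands in for DIVIDE (with the fixed EXF, and whitening if applied)
   and must return fragments of exactly the size it is given.

```python
import hashlib
import os
import struct
import zlib

PAYLOAD_LEN = 28 * 1024
SINGLETON_HEADER_LEN = 22
FRAGMENT_HEADER_LEN = 47
TAG_LEN = 20

def _hash(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()  # assumption: Hash is SHA-1

def _compress(m: bytes) -> bytes:
    c = zlib.compressobj(9, zlib.DEFLATED, 15, 8, zlib.Z_DEFAULT_STRATEGY)
    return c.compress(m) + c.flush()

def packetize_message(m: bytes, overhead: int, divide):
    """Return payloads that are each PAYLOAD_LEN - overhead octets long."""
    m_c = _compress(m)
    if len(m_c) + SINGLETON_HEADER_LEN + overhead <= PAYLOAD_LEN:
        pad_len = PAYLOAD_LEN - len(m_c) - SINGLETON_HEADER_LEN - overhead
        body = m_c + os.urandom(pad_len)
        # Flag bit 0 plus a 15-bit length; len(m_c) < 2**15 always holds here.
        return [struct.pack(">H", len(m_c)) + _hash(body) + body]
    fragments = divide(m_c, PAYLOAD_LEN - overhead - FRAGMENT_HEADER_LEN)
    msg_id = os.urandom(TAG_LEN)               # ID = Rand(TAG_LEN)
    sz = struct.pack(">I", len(m_c))           # SZ = Int(32, Len(M_C))
    payloads = []
    for i, frag in enumerate(fragments):
        flag_and_index = (0x800000 | i).to_bytes(3, "big")  # flag 1, 23-bit index
        payloads.append(flag_and_index + _hash(msg_id + sz + frag)
                        + msg_id + sz + frag)
    return payloads
```

   A short message yields one singleton payload; incompressible data
   larger than one payload takes the fragment path.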

2.2.2. Generating plaintext forward messages

   To send a plaintext forward message M, we first let PAYLOADS =
   PACKETIZE_MESSAGE(M, 0).  For every PAYLOAD_i, we set TAG_i to a
   randomly chosen 159-bit integer.

   We then transmit each payload with its corresponding tag.
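   Implementation sketch (non-normative): a random decoding handle in
   the form required by 2.2.1.1 (a zero bit followed by 159 random
   bits) can be produced by clearing the top bit of 20 random octets.
   os.urandom() is used here as the assumed randomness source.

```python
import os

TAG_LEN = 20

def random_tag() -> bytes:
    """A 20-octet decoding handle: one zero bit, then 159 random bits."""
    tag = bytearray(os.urandom(TAG_LEN))
    tag[0] &= 0x7F  # clear the most significant bit of the first octet
    return bytes(tag)
```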

2.2.2.1. Plaintext forward messages and fragments

   We require some additional complexity to minimize the time during
   which the destination of a fragmented plaintext forward message sits
   decrypted in a remailer's storage.

   Unlike encrypted forward and reply fragmented messages, which are
   delivered packet-by-packet to their recipients, plaintext forward
   fragmented messages are reassembled by an exit node before
   delivery.  (We cannot require that their recipients reassemble
   them, since we assume that the recipient of a plaintext forward
   message may not be running any anonymity software.)

   Having exit nodes reassemble fragments introduces two problems:

      1) Because the exit node must hold fragments until they can be
         reassembled, an attacker can attempt to exhaust the exit
         node's resources by flooding it with fragments for messages
         that never arrive.

      2) Because the recipient of a type III packet is encoded in its
         final subheader, an attacker who compromises a type III
         remailer can read the identities of all the users who have
         pending plaintext forward messages.

   We specify a solution to problem 2 here.  When generating
   fragmented plaintext forward messages, the message generator uses a
   routing type of "FRAGMENT" (0x0103), an empty routing info, and
   prepends the following fields to the message body before
   compressing and fragmenting it:

         RS Routing size    2 octets
         RT Routing type    2 octets
         RI Routing info    (variable length; RS=Len(RI))
    (These fields are as described in minion-spec.txt.)

    Thus, the destination of a message is not retrievable by the exit
    node until enough fragments have been received to decrypt the
    message itself.  In this way, we minimize the window in which the
    identity of a recipient is visible to the exit node.
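   Implementation sketch (non-normative): prepending RS | RT | RI to the
   body, and splitting it off again at the exit node after reassembly.
   Big-endian (network) byte order for the 2-octet fields is an
   assumption here; the example routing info (a 20-octet handle followed
   by the mailbox name "alice", using the MBOX type 0x0101 of section
   3.2) is hypothetical.

```python
import struct

FRAGMENT_ROUTING_TYPE = 0x0103  # routing type placed in the packet header

def prepend_routing(rt: int, ri: bytes, body: bytes) -> bytes:
    """Prefix RS | RT | RI to the body before compressing and fragmenting."""
    if len(ri) > 0xFFFF:
        raise ValueError("routing info too long for a 2-octet size field")
    return struct.pack(">HH", len(ri), rt) + ri + body

def split_routing(m: bytes):
    """Exit-node side, after reassembly: return (RT, RI, body)."""
    if len(m) < 4:
        raise ValueError("message too short for RS and RT")
    rs, rt = struct.unpack(">HH", m[:4])
    ri = m[4:4 + rs]
    if len(ri) != rs:
        raise ValueError("truncated routing info")
    return rt, ri, m[4 + rs:]
```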

2.2.3.   Generating encrypted forward messages

   To send an encrypted forward message M to a user with an RSA public
   key PK with length PKLEN (in octets), we set PAYLOADS =
   PACKETIZE_MESSAGE(M, PK_OVERHEAD_LEN-TAG_LEN+SPRP_KEY_LEN).  (We lose 42
   octets to OAEP padding and 20 to encode the session key, but gain
   20 by spilling the encrypted data into the decoding tag.)

   For every payload PAYLOAD_i:

      Repeat:
           Let K = Rand(SPRP_KEY_LEN).
           Let P = K | PAYLOAD_i
           Let P0 = PK_Encrypt(PK, P[0:PKLEN-PK_OVERHEAD_LEN])
      Until the most significant bit of P0[0] is equal to 0.
      Let P1 = SPRP_Encrypt(K, "END-TO-END ENCRYPT",
                            P[PKLEN-PK_OVERHEAD_LEN : Len(P)-PKLEN+PK_OVERHEAD_LEN])
      Let TAG_i = P0[0:TAG_LEN]
      Let EPAYLOAD_i = P0[TAG_LEN:Len(P0)-TAG_LEN] | P1

   We then transmit every payload EPAYLOAD_i with the corresponding tag TAG_i.
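   Implementation sketch (non-normative): the octet layout of an
   encrypted forward payload.  "pk_encrypt" and "sprp_encrypt" are
   hypothetical caller-supplied callables standing in for PK_Encrypt
   (RSA-OAEP, returning PKLEN octets) and SPRP_Encrypt.  The retry loop
   enforces the requirement of 2.2.1.1 that a decoding handle begin with
   a zero bit.

```python
import os

PK_OVERHEAD_LEN = 42  # OAEP padding overhead, per the text above
TAG_LEN = 20
SPRP_KEY_LEN = 20
PAYLOAD_LEN = 28 * 1024

def encrypt_payload(payload: bytes, pklen: int, pk_encrypt, sprp_encrypt):
    """Return (TAG_i, EPAYLOAD_i) for one payload."""
    rsa_part = pklen - PK_OVERHEAD_LEN  # plaintext octets that fit under OAEP
    while True:
        k = os.urandom(SPRP_KEY_LEN)
        p = k + payload
        p0 = pk_encrypt(p[:rsa_part])
        if p0[0] & 0x80 == 0:           # handle must start with a zero bit
            break
    p1 = sprp_encrypt(k, b"END-TO-END ENCRYPT", p[rsa_part:])
    # The first TAG_LEN octets of the RSA block spill into the decoding handle.
    return p0[:TAG_LEN], p0[TAG_LEN:] + p1
```

   With a 2048-bit key (PKLEN = 256) and a payload produced by
   PACKETIZE_MESSAGE with overhead 42, the encrypted payload comes out
   at exactly 28K octets.  The stubs in the check below perform no
   encryption; they only preserve lengths.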

2.2.4.   Generating reply messages

   To send a reply message M to an anonymous recipient, we set
   PAYLOADS = PACKETIZE(M, 0).   We send each PAYLOAD_i with a
   separate SURB_i -- we must have enough to use a different SURB for
   each message.  We do not need to include TAG (decoding handle)
   fields: they are a part of the SURB.

   SURB users SHOULD keep track of which SURBs they have used to
   prevent multiple use, at least until the SURBs have expired.

2.2.5.   Generating stateless SURBs

   In order to avoid storing a set of keys for every outstanding
   SURB, SURB generators use the following SURB generation
   procedure.  To use this method, SURB generators must store a
   separate long-term secret for each identity they wish to associate
   with a chain of SURBs.

   (Client software MUST support multiple identities, and MUST make
   it clear to the user which identity has been associated with each
   incoming SURB.

   Nickname comparisons SHOULD be done in a case insensitive
   manner.)

   To generate a SURB for a path of length PATH_LEN, using a long-term
   secret SEC:

      Repeat:
         Let SEED = a random 159-bit seed.
      Until Hash(SEED | SEC | "Validate") ends with a 0 octet.

      Let K = Hash(SEED | SEC | "Generate")[0:KEY_LEN]

      Let STREAM = PRNG(K, KEY_LEN*(PATH_LEN + 1))

      Let SHARED_SECRET = STREAM[PATH_LEN*KEY_LEN:KEY_LEN]

      For i in 1 .. PATH_LEN
         Let MS_i = STREAM[(PATH_LEN-i)*KEY_LEN : KEY_LEN]

      Generate a reply block using MS_i as the master secret for the
      i'th hop in the path, SEED as the tag, and SHARED_SECRET as the
      end-to-end shared secret.
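   Implementation sketch (non-normative): seed selection and secret
   derivation for stateless SURBs.  Hash is assumed to be SHA-1 and
   KEY_LEN is assumed to be 16 octets; the PRNG is replaced by SHAKE-256
   output, which is NOT the spec's PRNG, so these secrets will not
   interoperate.  The point illustrated is that SEED and SEC alone
   regenerate every secret, so nothing per-SURB needs to be stored.

```python
import hashlib
import os

TAG_LEN = 20
KEY_LEN = 16  # assumed key length

def _hash(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()  # assumption: Hash is SHA-1

def _prng(key: bytes, n: int) -> bytes:
    # Stand-in for PRNG(K, n); NOT interoperable with real SURBs.
    return hashlib.shake_256(b"PRNG" + key).digest(n)

def is_surb_tag(seed: bytes, sec: bytes) -> bool:
    return _hash(seed + sec + b"Validate")[-1] == 0

def new_seed(sec: bytes) -> bytes:
    """Draw 159-bit seeds until one validates (about 256 tries on average)."""
    while True:
        seed = bytearray(os.urandom(TAG_LEN))
        seed[0] &= 0x7F                 # 159 bits: leading bit is zero
        if is_surb_tag(bytes(seed), sec):
            return bytes(seed)

def surb_secrets(seed: bytes, sec: bytes, path_len: int):
    """Return ([MS_1 .. MS_PATH_LEN], SHARED_SECRET)."""
    k = _hash(seed + sec + b"Generate")[:KEY_LEN]
    stream = _prng(k, KEY_LEN * (path_len + 1))

    def piece(i: int) -> bytes:
        # STREAM[i*KEY_LEN : KEY_LEN] in the spec's [offset:length] notation
        return stream[i * KEY_LEN:(i + 1) * KEY_LEN]

    shared_secret = piece(path_len)
    master_secrets = [piece(path_len - i) for i in range(1, path_len + 1)]
    return master_secrets, shared_secret
```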

2.3. Decoding messages

   When a Type III mix receives an exit packet, it tries to decode it
   (if it can) before delivery, and otherwise delivers it undecoded.
   When a client receives an undecoded exit packet, it tries to
   decode it before presenting it to the user.

2.3.1. Decoding algorithm

   Message decoders recognize plaintext (singleton or fragment)
   payloads by checking whether the hash fields match the calculated
   hash of the rest of the packet.

   If a message decoder knows one or more SURB secrets, it then checks
   the decoding handle 'TAG' to see whether Hash(TAG | SEC |
   "Validate") ends with a zero octet for any secret SEC.  If so, the
   decoder generates secrets from TAG | SEC as in SURB generation, and
   successively decrypts the payload with up to MAX_PATH of them,
   checking each time for a plaintext payload.

   If no SURB secrets are known, or if no SURB secrets yield a
   plaintext payload, and the decoder knows one or more secret keys
   SK_i, it then checks whether PK_Decrypt(SK_i, TAG |
   P[0:PK_LEN-TAG_LEN]) has valid OAEP padding for some SK_i.  If so,
   it extracts K from the first 20 octets of the decrypted value, and
   uses K to LIONESS-decrypt the rest of the payload.

   If none of these approaches works, the decoder has failed.  Upon
   failure, an exit node simply delivers the undecoded message and
   decoding handle to the message's recipient, in hopes that the
   recipient will have SEC or SK values.  A client, however, marks
   the message as JUNK.


   PROCEDURE: DECODE_PLAINTEXT_PAYLOAD
   ARGUMENTS:
       P: a payload to decode

   If the first bit of P[0] is 0:
      If P[2:HASH_LEN] = Hash(P[2+HASH_LEN:Len(P)-2-HASH_LEN]):
         SZ = P[0:2]
         Return "Singleton", P[2+HASH_LEN : SZ]
      Otherwise,
         Return "Unknown"
   Otherwise, if the first bit of P[0] is 1:
      If P[3:HASH_LEN] = Hash(P[3+HASH_LEN:Len(P)-3-HASH_LEN]):
         IDX = P[0:3] & 0x7fffff
         MSG_ID = P[3+HASH_LEN:20]
         MSG_SZ = P[3+HASH_LEN+20:4]
         FRAG = P[3+HASH_LEN+20+4:Len(P)-3-HASH_LEN-20-4]
         Return "Fragment", IDX, MSG_ID, MSG_SZ, FRAG
      Otherwise:
         Return "Unknown"
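   Implementation sketch (non-normative): DECODE_PLAINTEXT_PAYLOAD in
   Python, assuming Hash is SHA-1 and big-endian integer fields.  The
   minimum-length checks are an extra safeguard not stated in the
   pseudocode above.

```python
import hashlib

HASH_LEN = 20

def _hash(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()  # assumption: Hash is SHA-1

def decode_plaintext_payload(p: bytes):
    if len(p) < 2 + HASH_LEN:
        return ("Unknown",)
    if p[0] & 0x80 == 0:                     # singleton: flag bit is 0
        if p[2:2 + HASH_LEN] != _hash(p[2 + HASH_LEN:]):
            return ("Unknown",)
        sz = int.from_bytes(p[0:2], "big")   # 15-bit length (flag bit is 0)
        return ("Singleton", p[2 + HASH_LEN:2 + HASH_LEN + sz])
    off = 3 + HASH_LEN                       # fragment: flag bit is 1
    if len(p) < off + 24 or p[3:off] != _hash(p[off:]):
        return ("Unknown",)
    idx = int.from_bytes(p[0:3], "big") & 0x7FFFFF
    msg_id = p[off:off + 20]
    msg_sz = int.from_bytes(p[off + 20:off + 24], "big")
    return ("Fragment", idx, msg_id, msg_sz, p[off + 24:])
```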

   PROCEDURE: DECODE_PAYLOAD
   ARGUMENTS:
       P: a payload to decode
       TAG: decoding handle for the payload
       SK_1 ... SK_n: Optionally, a list of RSA secret keys
       SEC_1 ... SEC_n: Optionally, a list of SURB secrets.

   If DECODE_PLAINTEXT_PAYLOAD(P) is not "Unknown", return it.

   For all SEC_i:
      If Hash(TAG | SEC_i | "Validate") ends with a zero octet:
         K = Hash(TAG | SEC_i | "Generate")[0:KEY_LEN]
         STREAM = PRNG(K, MAX_PATH * KEY_LEN)
         Let P_t = P.
         For j in 0 ... MAX_PATH-1:
            Let P_t = SPRP_Encrypt(STREAM[j * KEY_LEN : KEY_LEN],
                                   "PAYLOAD_ENCRYPT",
                                   P_t)
            If DECODE_PLAINTEXT_PAYLOAD(P_t) is not "Unknown", return it.

   For all SK_i:
      Let E0 = TAG | P[0:Len(SK_i)-TAG_LEN]
      Let P0 = PK_Decrypt(SK_i, E0).
      If the OAEP padding is valid:
         Let K = P0[0:SPRP_KEY_LEN]
         Let P0' = P0[SPRP_KEY_LEN:Len(P0)-SPRP_KEY_LEN]
         Let P1 = SPRP_Decrypt(K, "END-TO-END ENCRYPT",
                               P[Len(SK_i)-TAG_LEN : Len(P)-Len(SK_i)+TAG_LEN])
         If DECODE_PLAINTEXT_PAYLOAD(P0'|P1) is not "Unknown", return it.

   Otherwise, return "Unknown".

2.3.2. Overcompressed messages

   Because zlib allows up to 1000-fold compression, using zlib for
   message compression creates opportunities for serious mailbombing.

   When decoding a message, decoders MUST check whether the
   decompressed size of the message will be "far longer" than the
   compressed size.  In general, if C = COMPRESS(P), and Len(P) > 20K,
   and Len(P)/Len(C) > 20, then P SHOULD be considered overcompressed.

   Decoders MUST decode incrementally, so that they can notice
   overcompressed messages without using too much space.  Upon
   encountering an overcompressed message, an exit node MUST mark it
   as such and deliver it to the user without decompressing it.

   Upon encountering an overcompressed message, client software SHOULD
   alert the user and require explicit confirmation before
   decompressing the message.
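   Implementation sketch (non-normative): bounded decompression with
   Python's "zlib" module.  By the rule above, P is overcompressed
   exactly when Len(P) exceeds max(20K, 20*Len(C)), so the decoder asks
   zlib for at most one octet beyond that bound and never holds more.
   Reading "20K" as 20*1024 octets is an assumption.

```python
import zlib

MIN_SIZE = 20 * 1024  # "20K", read here as 20 * 1024 octets
MAX_RATIO = 20

def decompress_checked(c: bytes):
    """Return DECOMPRESS(c), or None if the message is overcompressed."""
    limit = max(MIN_SIZE, MAX_RATIO * len(c))
    d = zlib.decompressobj()
    out = d.decompress(c, limit + 1)  # zlib stops after limit + 1 octets
    if len(out) > limit:
        return None  # overcompressed: deliver armored, without decompressing
    if not d.eof:
        raise zlib.error("truncated or corrupt zlib stream")
    return out
```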

2.3.3. Reconstruction issues

   When a server receives a plaintext forward packet containing a
   fragment of a message, it should behave as follows:

      - If the server does not support reconstruction, or if the hash
        of the packet is invalid, it drops the packet.

      - From the declared message size of the message, it calculates
        the number of packets it expects.  If this number is higher
        than the maximum number of packets the server is willing to
        construct, it drops the packet.

      - If the server has already logged this message ID as finished,
        it drops the packet.

      - If the server has other packets with this message ID, and
        those packets have a different message size, it drops all
        packets with this message ID and logs this message ID as
        finished.  If the server encounters multiple packets with the
        same message ID and packet index, it drops all packets with
        that message ID, and marks the message ID as finished.

      - The server stores this packet pending reconstruction.  If
        enough packets are held for this message ID to reconstruct
        the message, the server reconstructs the message (as
        described below), delivers it, deletes all fragment packets
        for this message ID, and marks the message ID as finished.

   To prevent denial of service attacks, a server MAY delete old
   fragments when it is low on disk space.  [XXXX is this the best
   approach?]
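   Implementation sketch (non-normative): the reconstruction rules above
   as an in-memory bookkeeping class.  "Enough packets" is taken to mean
   at least K fragments from every chunk, with K, N and NUM_CHUNKS
   recomputed from the declared size as in DIVIDE; "packets expected" is
   taken to be N*NUM_CHUNKS.  The checks for a positive size and an
   in-range index are extra safeguards not stated in the text, and the
   actual RECONSTRUCT and delivery are left to the caller.

```python
import math
from fractions import Fraction

EXF = Fraction(4, 3)

def chunk_params(msg_size: int, ps: int):
    """(K, N, NUM_CHUNKS) for a compressed message of msg_size octets."""
    m_size = (msg_size + ps - 1) // ps
    k = min(16, 1 << (m_size - 1).bit_length())
    return k, math.ceil(EXF * k), (m_size + k - 1) // k

class Reassembler:
    def __init__(self, ps: int, max_packets: int):
        self.ps = ps                    # fragment size used by DIVIDE
        self.max_packets = max_packets  # local policy limit
        self.pending = {}               # msg_id -> (msg_size, {index: fragment})
        self.finished = set()           # message IDs logged as finished

    def _finish(self, msg_id: bytes):
        self.pending.pop(msg_id, None)
        self.finished.add(msg_id)

    def add(self, msg_id: bytes, idx: int, msg_size: int, frag: bytes):
        """Handle one fragment whose hash has already been verified.

        Returns ("dropped",), ("stored",), or ("complete", fragments).
        """
        if msg_size <= 0 or msg_id in self.finished:
            return ("dropped",)
        k, n, num_chunks = chunk_params(msg_size, self.ps)
        total = n * num_chunks
        if total > self.max_packets or not 0 <= idx < total:
            return ("dropped",)
        size, frags = self.pending.setdefault(msg_id, (msg_size, {}))
        if size != msg_size or idx in frags:
            self._finish(msg_id)        # conflicting size or duplicate index
            return ("dropped",)
        frags[idx] = frag
        per_chunk = [0] * num_chunks
        for i in frags:
            per_chunk[i // n] += 1      # fragment i belongs to chunk i // N
        if all(count >= k for count in per_chunk):
            self._finish(msg_id)        # caller reconstructs and delivers
            return ("complete", frags)
        return ("stored",)
```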

3. Delivery

   This section describes the standard message delivery types provided
   with the Mixminion Type III mix implementation.  Other
   implementations MAY implement other types in other ways, though
   they SHOULD avoid adding new types in ways that would exacerbate
   partitioning attacks.

   This document *does not* describe routing types or transfer methods
   used for mix-to-mix communication; see "minion-spec.txt" for those.

3.1. General issues

3.1.1. ASCII armor

   When encoding an overcompressed or undecodeable Type III message,
   exit nodes MUST apply OpenPGP ASCII armor, as defined in RFC2440,
   section 6.2.  The header text is "BEGIN TYPE III ANONYMOUS
   MESSAGE".  There are two armor headers: "Message-type" (required)
   and "Decoding-handle" (optional).  The value of "Message-type" must
   be "encrypted" for an undecodeable message, and "overcompressed"
   for an overcompressed message.  For an undecodeable message, the
   decoding handle MUST be included, base-64-encoded, as the value
   of "Decoding-handle".  Otherwise, the "Decoding-handle" header
   MUST be omitted.

   When encountering a plaintext fragment, if the exit type is not
   "FRAGMENT", the exit node SHOULD deliver the fragment as-is, with
   the armor described above and "Message-type" set to "fragment".

   When encoding a plaintext Type III message, exit nodes MAY apply
   OpenPGP ASCII armor if the message contains characters other than
   printing ASCII, and no encoding is specified in the message.  When
   doing so, the "Message-type" header must be "binary".

   Otherwise, exit nodes MAY format the message with OpenPGP armor
   headers and dash-escaped text.  In this case, the "Message-type"
   header MUST be "plaintext".

   [XXXX Right now, there's no way to specify an encoding in a
   message.  Don't worry--you didn't misread. -NM]
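   Implementation sketch (non-normative): producing the armor described
   above, with the RFC 2440 CRC-24 checksum line.  Breaking base64 lines
   at 64 characters and using "\n" line endings are choices made here,
   not requirements stated in this section.

```python
import base64

ARMOR_TEXT = "TYPE III ANONYMOUS MESSAGE"

def crc24(data: bytes) -> int:
    """OpenPGP CRC-24 (RFC 2440, section 6.1)."""
    crc = 0xB704CE
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= 0x1864CFB
    return crc & 0xFFFFFF

def armor(data: bytes, message_type: str, decoding_handle=None) -> str:
    lines = ["-----BEGIN %s-----" % ARMOR_TEXT, "Message-type: " + message_type]
    if decoding_handle is not None:  # only for undecodeable messages
        lines.append("Decoding-handle: "
                     + base64.b64encode(decoding_handle).decode("ascii"))
    lines.append("")                 # blank line ends the armor headers
    body = base64.b64encode(data).decode("ascii")
    lines.extend(body[i:i + 64] for i in range(0, len(body), 64))
    checksum = crc24(data).to_bytes(3, "big")
    lines.append("=" + base64.b64encode(checksum).decode("ascii"))
    lines.append("-----END %s-----" % ARMOR_TEXT)
    return "\n".join(lines) + "\n"
```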

3.1.2. RFC822 headers

   Delivery types that deliver messages via email or news protocols
   need to support setting a limited set of headers from message
   payloads.

   Headers fall into four classes:
      1. Set by exit node, not by message sender.  (Example: "Date")
      2. Set in packet header by path generator.  (Example: "To")
      3. May be set by message sender. (Example: "Subject")
      4. May be set partially by message sender. (Example: "From")

   To encode header values, we use the following message format:
      MESSAGE ::= HEADERS DATA
      HEADERS ::= HEADER HEADERS | HEADER_END
      DATA ::= (any sequence of octets)
      HEADER ::= HEADER_NAME COLON HEADER_VAL NL
      HEADER_END ::= NL
      NL ::= (ascii NL, hex 0A).
      COLON ::= (ascii ':', hex 3A).
      HEADER_VAL ::= HEADER_VAL_CHAR HEADER_VAL | (empty)
      HEADER_VAL_CHAR ::= (any character in the range hex 20 through
                           hex 7E inclusive)
      HEADER_NAME ::= HEADER_NAME_CHAR HEADER_NAME | HEADER_NAME_CHAR
      HEADER_NAME_CHAR ::= (any character in the range hex 21 through
                           hex 7E inclusive, excluding hex 3A.)

   Design note: We explicitly decline to implement full RFC[2]822.  This
   would add to the implementation complexity of Type III
   implementations, and endanger anonymity by allowing nonuniformity
   among client software packages.

   Unlike RFC[2]822, clients MUST use only recognized header names,
   and SHOULD normalize header values by removing leading or trailing
   space.  Unlike RFC[2]822, servers MUST remove unrecognized headers.

   To prevent distinguishability between clients, headers MUST appear
   in lexical (alphabetical) order.  Servers MUST NOT use out-of-order
   headers.

   To help implementations comply with RFC2822, each header MUST NOT
   be longer than 900 characters.
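   Implementation sketch (non-normative): encoding and parsing the
   header block defined by the grammar above.  Several readings here
   are assumptions: the 900-character limit is applied to the header
   line without its newline, "lexical order" is plain character-code
   order with duplicates counted as out of order, and a server skips
   (rather than rejects the message for) out-of-order or unrecognized
   headers.

```python
import re

# NAME ":" VALUE NL, with the character ranges from the grammar above.
_HEADER_LINE = re.compile(rb"([\x21-\x39\x3b-\x7e]+):([\x20-\x7e]*)\n")
MAX_HEADER_LEN = 900

def encode_headers(headers: dict, body: bytes) -> bytes:
    """Client side: trimmed values, names in lexical order, blank line, data."""
    out = b""
    for name in sorted(headers):
        line = ("%s:%s\n" % (name, headers[name].strip())).encode("ascii")
        if not _HEADER_LINE.fullmatch(line) or len(line) - 1 > MAX_HEADER_LEN:
            raise ValueError("bad header: %r" % name)
        out += line
    return out + b"\n" + body

def parse_headers(msg: bytes, recognized: set):
    """Server side: return (headers, data), skipping disallowed headers."""
    headers, last, pos = {}, None, 0
    while msg[pos:pos + 1] != b"\n":       # HEADER_END is a bare newline
        m = _HEADER_LINE.match(msg, pos)
        if m is None:
            raise ValueError("malformed header block")
        pos = m.end()
        name = m.group(1).decode("ascii")
        value = m.group(2).decode("ascii").strip()
        in_order = last is None or name > last
        if in_order:
            last = name
        if in_order and name in recognized and len(m.group(0)) - 1 <= MAX_HEADER_LEN:
            headers[name] = value
    return headers, msg[pos + 1:]
```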

3.2. MBOX

   The routing type 0x101 corresponds to MBOX delivery.  Conceptually,
   an MBOX is an internally visible, Type III-only delivery address,
   specific to a single exit node.

3.2.1. Formatting: Routing information

   The routing info for an MBOX header MUST contain a 20-octet
   decoding handle, followed by a variable width MBOX name.  Exit
   nodes MUST drop packets addressed to unknown MBOXes, or packets
   with malformed routing info fields.

   The interpretation of the MBOX name is left to the exit node, and
   will vary between exit nodes.  Typically, exit nodes map from
   'username' to 'username@localhost' for a limited set of their user
   names, and deliver messages via sendmail.  Exit nodes MAY implement
   other schemes.
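   Implementation sketch (non-normative): splitting MBOX routing info
   into its decoding handle and MBOX name, and applying the drop rules
   above.  The name-to-address table and the "alice@localhost" mapping
   are hypothetical exit-node configuration; treating an empty or
   non-ASCII name as malformed is an assumption.

```python
TAG_LEN = 20

def parse_mbox_routing(ri: bytes, known_mboxes: dict):
    """Return (decoding_handle, address), or None if the packet must be dropped."""
    if len(ri) <= TAG_LEN:
        return None                  # malformed: no room for an MBOX name
    handle, name = ri[:TAG_LEN], ri[TAG_LEN:]
    try:
        address = known_mboxes.get(name.decode("ascii"))
    except UnicodeDecodeError:
        return None                  # malformed MBOX name
    if address is None:
        return None                  # unknown MBOX
    return handle, address
```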

3.2.2. Formatting: Message body

  Header encoding is as described in 3.1.2 above.

  The following headers are allowed:
        "SUBJECT"  (any.  Must be no more than 900 characters long.)
        "FROM"     (any sequence of printing ASCII characters
                    excluding '"', '[', ']', and ':'. )

        "IN-REPLY-TO" (an RFC2822 msg-id)
        "REFERENCES" (a list of RFC2822 msg-ids)

  [XXXX Are msg-ids really what we want? Should we say more?  Should
        we restrict encoding? -NM]

  [XXXX The client should not be allowed to supply IN-REPLY-TO.  Instead
        the exit node should set IN-REPLY-TO to the last msg-id in the
        REFERENCES header.
        Also there should be a standard encoding of msg-ids: for
        instance msg-ids should be separated by exactly one space
        (0x20).
        Additionally there should be a specified way of dealing with
        references lines that are too long.  The standard in news seems
        to be to keep the first msg-id in the list (the original post
        that started a thread), and then remove as many of the following
        msg-ids so that the line fits in the space available.
        (XXX: need to cross-check that with the usefor drafts).  - PP]

  [XXXX Okay; let me know when you have crosschecked successfully, and
        when you have a citation for this. -NM]

  Unrecognized or malformatted headers MUST be removed.

3.2.3. Delivery

   When delivering an MBOX message via email, an exit node MUST
   construct an RFC2822 message as follows:

   The "To" line is the mailbox of the corresponding recipient.

   The "Subject" line is taken from the contents of the "SUBJECT"
   header, removing trailing and leading whitespace.  OPTIONALLY,
   implementations may prepend a short marker (e.g., "[ANON]" or
   "[MBOX]").  If no "Subject" line is provided in the message, exit
   nodes SHOULD include a preconfigured one, such as "Type III Anonymous
   Message".

   If the server is configured to allow user-supplied From addresses,
   the "From" line is generated as follows:
      Let F be the contents of the "FROM" header in the message.
      Remove all leading or trailing whitespace from F.
      Replace all sequences of 2 or more space characters in F with a
        single space.

      Prepend a double quote and a preconfigured marker (e.g.,
        "[ANON]") followed by a space; then append a closing double
        quote, a space, and a preconfigured exit node mailbox (e.g.,
        <nobody@____.com>).

      (Thus, if the sender specifies a "FROM" header of 'Lance
      Cottrell', an implementation could generate a 'From' header of
      the form:  "From: "[ANON] Lance Cottrell" <nobody@___.org>".)

   Otherwise (if the server is not configured to allow user-supplied
   From addresses) the "From" line is set to a preconfigured value.

      [Rationale: Previously, I'd argued for having only a single
      supported "From" policy, as a measure to prevent linkability
      based on client option preferences.  Adam Back correctly pointed
      out that this is silly.  Consider that _any_ use or non-use of
      From addresses makes messages linkable *in itself*.  In other
      words, Eve can already tell which messages set their from
      addresses; she gains nothing by learning that those messages
      have chosen an exit node with From support to do so. -NM]
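   The "From" procedure above might look like the following minimal
   sketch.  The parameter names are hypothetical, the example mailbox
   uses the reserved example.org domain, and falling back to the
   preconfigured value when FROM is absent is an assumption (the text
   does not cover that case).  It also assumes FROM has already been
   validated and therefore contains no double quote.

```python
import re


def make_from(headers, allow_from, marker, exit_mailbox, fixed_from):
    """Return the value for the outgoing "From" line.

    allow_from:   True if this exit node accepts user-supplied From text.
    marker:       preconfigured marker, e.g. "[ANON]".
    exit_mailbox: preconfigured exit node mailbox, e.g. "<nobody@example.org>".
    fixed_from:   preconfigured value used when user From text is not allowed
                  (and, in this sketch, also when no FROM header is present).
    """
    f = headers.get("FROM")
    if not allow_from or f is None:
        return fixed_from
    f = f.strip()                 # remove leading/trailing whitespace
    f = re.sub(r" {2,}", " ", f)  # collapse runs of 2+ spaces into one
    # Assumes FROM was validated earlier and so contains no double quote.
    return '"%s %s" %s' % (marker, f, exit_mailbox)
```

   With FROM set to "  Lance   Cottrell ", this produces
   "[ANON] Lance Cottrell" <nobody@example.org>, matching the example
   in the text.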

   The "Date" line should be the current date.

   The "In-Reply-To" and "References" lines should be taken verbatim
   from the corresponding headers, if those headers are present.
   [XXXX Is this sensible? -NM]
   [XXXX See my comment about in-reply-to above. -PP]

   The "X-Anonymous" line SHOULD be present, and set to "yes".
   [XXXX Do you really think this is a good idea? -PP]
   [XXXX Yes.  There's nothing wrong with letting people who don't
         want anonymous messages block 'em.  (On IRC, PP says that he
         is worried about large ISPs filtering.)  This bears more
         thought, but I think it's probably a bad idea to deliberately
         screw over unwilling recipients in order to bypass
         clueless/draconian sysadmins. -NM]

   Note again that all unrecognized or malformed headers MUST be
   removed.

   The payload SHOULD be escaped as described in 3.1.1.

3.2.4. Server descriptor section

   Servers that support MBOX delivery MAY include a [Delivery/MBOX]
   section, containing the entry "Version: 1.0".  Other servers
   MUST NOT include a [Delivery/MBOX] section.

   This section MUST include a "Maximum-size" line, containing the
   maximum permitted message size in KB (before compression).  Note
   that because of base64-encoding, actual delivered messages may be
   longer than this by a factor of ~1.33.  The value must be at least
   "32".  It MUST contain an "Allow-From" line, containing 'yes' if the
   server allows user-supplied From addresses and 'no' if it does not.
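   A client or directory might check these constraints roughly as
   follows.  This is a sketch: it assumes the section has already been
   parsed into a dictionary of entry names to string values (the
   parser is not shown), and it checks for "1.0" only, although
   Appendix A.1 notes that alpha software publishes '0.x' instead.  The
   same checks apply to the [Delivery/SMTP] section in 3.3.4.

```python
import re


def check_delivery_section(section, min_size_kb=32):
    """Return a list of problems found in a parsed [Delivery/MBOX] section.

    `section` is assumed to be a dict mapping entry names to string values,
    produced by some descriptor parser not shown here.
    """
    problems = []
    if section.get("Version") != "1.0":
        problems.append("Version must be 1.0")
    size = section.get("Maximum-size")
    if size is None or not re.fullmatch(r"[0-9]+", size):
        problems.append("Maximum-size must be an integer number of KB")
    elif int(size) < min_size_kb:
        problems.append("Maximum-size must be at least %d" % min_size_kb)
    if section.get("Allow-From") not in ("yes", "no"):
        problems.append("Allow-From must be 'yes' or 'no'")
    return problems


def approx_delivered_kb(max_size_kb):
    """Rough delivered size: base64 turns every 3 octets into 4 (~1.33x)."""
    return max_size_kb * 4 / 3
```

   For instance, a "Maximum-size" of 33 KB corresponds to roughly 44 KB
   once base64-encoded.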

3.3. SMTP

   The routing type 0x100 corresponds to SMTP (email) delivery.

3.3.1. Formatting: Routing information

   The routing information for an SMTP message MUST contain a 20-octet
   decoding handle, followed by a variable-width mailbox.

   A mailbox MUST be a list of at least one but no more than eight addresses,
   separated by NUL characters ([00]).  Each address MUST be the
   "username@host" part of an RFC2821 mailbox.  (Permitting full RFC2822
   address syntax would allow too much distinguishability between senders,
   and would make blacklisting hard.)
   A mailbox MUST obey the following format:

   [XXXX Mixminion through 0.0.6 does not support multiple destination
     addresses.]

      MAILBOX ::= LOCALPART AT HOSTPART
      LOCALPART ::= ATOM | LOCALPART DOT ATOM
      HOSTPART ::= ATOM | HOSTPART DOT ATOM
      ATOM ::= ATOMCHAR | ATOM ATOMCHAR
      ATOMCHAR ::= Any character in the range hex 21 through hex 7E,
             excluding '[', ']', '(', ')', '<', '>', '@', ',', '.',
             ';', ':', '\', and '"'.
      AT ::= '@' (ASCII hex 40)
      DOT ::= '.' (ASCII hex 2E)
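   The routing-information layout and the grammar above can be checked
   with a sketch like the following.  The function names are
   hypothetical.  It treats the routing information as raw octets,
   splits the mailbox list on NUL, and enforces the 1-to-8 limit.  It
   does not enforce the IP-address rule discussed below: bracketed
   literals are already rejected because '[' is not an ATOMCHAR, but a
   bare dotted quad still matches the grammar.

```python
# Sketch: parse and validate SMTP routing information (section 3.3.1).
HANDLE_LEN = 20   # decoding handle length, from the text above
MAX_ADDRS = 8     # at most eight addresses per routing-info field

# ATOMCHAR: octets 0x21..0x7E, minus the excluded punctuation.
ATOMCHARS = frozenset(chr(c) for c in range(0x21, 0x7F)) - set('[]()<>@,.;:\\"')


def _is_atom(s):
    """ATOM ::= one or more ATOMCHARs."""
    return len(s) > 0 and all(ch in ATOMCHARS for ch in s)


def is_valid_mailbox(addr):
    """MAILBOX ::= LOCALPART AT HOSTPART, each part a DOT-separated ATOM list."""
    local, at, host = addr.partition("@")
    if not at:
        return False
    return (all(_is_atom(p) for p in local.split("."))
            and all(_is_atom(p) for p in host.split(".")))


def parse_smtp_routing_info(ri):
    """Split routing info (bytes) into (decoding_handle, [mailbox, ...])."""
    if len(ri) <= HANDLE_LEN:
        raise ValueError("routing info too short")
    handle, rest = ri[:HANDLE_LEN], ri[HANDLE_LEN:]
    # Non-ASCII octets raise UnicodeDecodeError, a subclass of ValueError.
    addrs = [a.decode("ascii") for a in rest.split(b"\x00")]
    if not 1 <= len(addrs) <= MAX_ADDRS:
        raise ValueError("routing info must name 1 to 8 mailboxes")
    for a in addrs:
        if not is_valid_mailbox(a):
            raise ValueError("malformed mailbox: %r" % a)
    return handle, addrs
```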

   Additionally a HOSTPART MUST NOT be an IP address -- it would make
   blacklisting hard, and encourage senders to resolve target hosts.

   [XXXX I suspect the above should be "SHOULD NOT." -NM]
   [XXXX It is possible that an email address is in fact
         example@[192.186.2.1], so it really should be SHOULD NOT.
         Also can we reach all possible mail addresses using this
         strict syntax?  Do we care? -PP]
   [XXXX Using IP addresses does, as noted above, make blacklisting
         hard.  But we should review RFC2821 to see if we care. -NM]

   Software that allows users to send a message to multiple recipients SHOULD
   automatically place the recipient mailboxes in lexicographical order,
   eliminate duplicates, and divide them into groups of eight (with one short
   group at the end).
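   The recipient grouping above can be sketched in a few lines.  In
   this sketch "lexicographical order" is Python's code-point ordering
   (equal to octet ordering for ASCII), and duplicates are exact string
   matches; no case-folding is done, since the text does not call for
   any.

```python
def group_recipients(mailboxes, group_size=8):
    """Sort, de-duplicate, and split recipients into groups of at most 8.

    Every group is full except possibly the last, which may be short.
    """
    ordered = sorted(set(mailboxes))  # remove duplicates, then sort
    return [ordered[i:i + group_size]
            for i in range(0, len(ordered), group_size)]
```

   Ten distinct recipients thus yield one group of eight followed by a
   group of two.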

3.3.2. Formatting: Message body

   The message body format is exactly as the MBOX format, as
   described above in 3.2.2.

3.3.3. Delivery

   To deliver an SMTP message, an exit node that supports the SMTP
   delivery type SHOULD construct an RFC2822 message as described in
   3.2.3 above, additionally setting the 'To' line to the mailboxes
   given in the message header.

   [XXXX Should a separate outgoing message be created for each incoming
      address? -NM]

   Implementations SHOULD allow exit node operators to configure
   additional fields, and to block specific 'To' addresses.

3.3.4. Server descriptor section

   Servers that support SMTP delivery MAY include a [Delivery/SMTP]
   section, containing the entry "Version: 1.0".  Other servers
   MUST NOT include a [Delivery/SMTP] section.

   This section MUST include a "Maximum-size" line, containing the
   maximum permitted message size in KB (before compression).  Note
   that because of base64-encoding, actual delivered messages may be
   longer than this by a factor of ~1.33. The value must be at least
   "32".  A server MAY drop any message that uncompresses to be
   longer than this size.  It MUST contain an "Allow-From"
   line, containing 'yes' if the server allows user-supplied From
   addresses and 'no' if it does not.

3.4. Fragments

3.4.1. Server descriptor section

   When a server supports message reconstruction, it MAY include a
   "[Delivery/Fragmented]" section as described here.  Other servers
   MUST NOT include a "[Delivery/Fragmented]" section.

   The section, if present, MUST contain a 'Version' entry, with the
   value "1.0".  It also MUST contain a "Maximum-Fragments" line,
   containing the maximum size (in fragments) of a message that the
   server is willing to reconstruct.

3.5. News

   [XXXX expand this from notes.]

3.5.1. Formatting: routing information

   [RI must contain 1-3 newsgroups, 0-8 mailboxes, and a subject.]

3.5.2. Formatting: message body

   [Headers are followup-to, reply-to, references, from, (in-reply-to?),
   x-no-archive, (messageid?) ]

3.5.3. Delivery

3.5.4. Server descriptor section

A.1. Appendix: versioning and alphas

   Today's alpha code does not publish its version as '1.0'; it uses
   '0.x' instead (currently '0.1' for all versions in this document).
   Production versions MUST NOT retain backward compatibility
   with pre-production releases.

A.2. Appendix: storing client secrets

   The following describes the format used by the Mixminion reference
   software to store SURB keys.  Other software MAY use this format.
   Clients that do so MUST implement it as described here.

   [Rationale: earlier, we specified a standard export format for SURB
   secrets. Exporting secret keys, however, is _bad_: once the ability
   exists, the secret keys tend to get sucked out and
   re-stored---often less securely---by other applications.  Peter
   Gutmann describes several instances of this somewhere, and I'm not
   the kind of guy to argue with Peter Gutmann. -NM]

   [XXX This format is supported by Mixminion 0.0.6 and later: earlier
   versions of the software use a more Python-specific format that
   you really shouldn't try to read.]

   First, the keyring itself is stored with RFC2440-style ASCII armor,
   with header text "BEGIN TYPE III KEYRING" and an armor header
   "Version" with a value "0.1".  The contents to be encoded are:

         magic       [8 octets]
         format type [1 octet == 0]
         salt        [8 octets]
         encdata     [variable]

   Where 'magic' is "KEYRING2" [ 4B 45 59 52 49 4E 47 32 ], 'salt' is
   a randomly chosen octet sequence, and 'encdata' is computed from
   the actual identity data 'data' and a user-selected password
   'password' as follows:

         Let padding = Rand(1024*CEIL(LEN(data)/1024) - LEN(data))
         Let data' = Int(32, LEN(data)) | data | padding
         Let hash = H(data' | salt | magic)
         Let key = H(salt | password | salt)[0:KEY_LEN]
         Let encdata = Encrypt(key, data' | hash)
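   The construction above is sketched below, under stated assumptions:
   H is taken to be SHA-1 and KEY_LEN to be 16, as elsewhere in the
   Type III specifications; Int(32, n) is taken to be a 4-octet
   big-endian integer; and Encrypt() is not implemented here but passed
   in as a hypothetical `encrypt(key, octets)` hook.  The sketch covers
   the framing, padding, hash, and key derivation, plus the
   corresponding check after decryption.

```python
import hashlib
import hmac
import os
import struct

MAGIC = b"KEYRING2"
KEY_LEN = 16   # assumption: a 16-octet (AES-128) key
HASH_LEN = 20  # SHA-1 digest length


def H(data):
    # Assumption: H is SHA-1, as in the rest of the Type III specifications.
    return hashlib.sha1(data).digest()


def keyring_key(password, salt):
    """key = H(salt | password | salt)[0:KEY_LEN]"""
    return H(salt + password + salt)[:KEY_LEN]


def keyring_plaintext(data, salt):
    """Return data' | hash, the octets that Encrypt() is applied to."""
    pad_len = 1024 * ((len(data) + 1023) // 1024) - len(data)
    data_p = struct.pack(">I", len(data)) + data + os.urandom(pad_len)
    return data_p + H(data_p + salt + MAGIC)


def keyring_contents(data, password, salt, encrypt):
    """magic | format type (0) | salt | encdata.

    `encrypt(key, octets)` is a caller-supplied (hypothetical) hook standing
    in for the spec's Encrypt(); it is not implemented here.
    """
    enc = encrypt(keyring_key(password, salt), keyring_plaintext(data, salt))
    return MAGIC + b"\x00" + salt + enc


def keyring_unpack(plaintext, salt):
    """After decryption: check the hash, then strip the length and padding."""
    data_p, digest = plaintext[:-HASH_LEN], plaintext[-HASH_LEN:]
    if not hmac.compare_digest(H(data_p + salt + MAGIC), digest):
        raise ValueError("wrong password or corrupted keyring")
    (n,) = struct.unpack(">I", data_p[:4])
    return data_p[4:4 + n]
```

   A fresh salt would typically come from os.urandom(8).  Padding to a
   multiple of 1024 octets hides the exact amount of stored key
   material, and a hash mismatch after decryption is how a wrong
   password is detected.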

   The format of the actual data is as follows:

         KeyData ::= Item *
         Item ::= ItemType [1 octet]
                  ItemLen  [2 octets]
                  ItemVal  [ItemLen octets]

   Implementations MUST skip over items with unrecognized types, and
   preserve them when modifying the keyring.  Implementations MUST NOT
   depend on any order of items within the keyring.
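   The item encoding can be handled generically, as in the sketch
   below.  It assumes ItemLen is a big-endian (network order) 2-octet
   value.  Parsing keeps every item, including unrecognized types, so
   re-serializing the list preserves them as the text requires.
   Callers simply ignore the types they do not understand.

```python
import struct


def parse_items(keydata):
    """Split KeyData into a list of (item_type, item_value) pairs.

    Items of unrecognized types are kept so they can be written back.
    """
    items = []
    off = 0
    while off < len(keydata):
        if off + 3 > len(keydata):
            raise ValueError("truncated item header")
        item_type, item_len = struct.unpack_from(">BH", keydata, off)
        off += 3
        if off + item_len > len(keydata):
            raise ValueError("truncated item value")
        items.append((item_type, keydata[off:off + item_len]))
        off += item_len
    return items


def serialize_items(items):
    """Inverse of parse_items: ItemType | ItemLen | ItemVal for each item."""
    return b"".join(struct.pack(">BH", t, len(v)) + v for t, v in items)
```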

   SURB keys have the following format:
         SURBKeyType    [00]
         SURBKeyLen     [2 octets]
         SURBKeyExpires [4 octets]
         SURBKeyName    [Variable; NUL-terminated]
         SURBKeySecret  [Variable]

   SURBKeyType and SURBKeyLen conform to the fields ItemType and
   ItemLen as described above.  SURBKeyExpires is a 4-octet timestamp
   (rounded to the nearest midnight GMT), after which the key should
   be removed from the keyring.  SURBKeyName is a NUL-terminated name
   for this identity, in lowercase.  SURBKeySecret is the master
   secret for this SURB identity -- it should be at least 20 octets
   long.
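   The value portion of a SURB key item could be built and parsed as
   sketched below.  Several details here are assumptions not stated in
   the text: SURBKeyExpires is read as big-endian seconds since the
   Unix epoch, and the name is encoded as UTF-8.  The 3-octet
   ItemType/ItemLen header is left to generic item-handling code.

```python
import struct

DAY = 86400  # seconds per day


def round_to_midnight(t):
    """Round a Unix timestamp (seconds) to the nearest midnight GMT."""
    return ((t + DAY // 2) // DAY) * DAY


def build_surb_key_value(expires, name, secret):
    """ItemVal for a SURB key: Expires | lowercase Name | NUL | Secret."""
    return (struct.pack(">I", round_to_midnight(expires))
            + name.lower().encode("utf-8") + b"\x00" + secret)


def parse_surb_key_value(value):
    """Inverse of build_surb_key_value; returns (expires, name, secret)."""
    if len(value) < 5:
        raise ValueError("SURB key item too short")
    (expires,) = struct.unpack(">I", value[:4])
    nul = value.find(b"\x00", 4)
    if nul < 0:
        raise ValueError("SURB key name is not NUL-terminated")
    return expires, value[4:nul].decode("utf-8"), value[nul + 1:]
```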

   In order to implement key rotation, multiple SURB keys may exist
   for the same identity.  Clients SHOULD always generate SURBs using
   the latest-expiring key, and SHOULD accept reply messages using all
   unexpired keys.  Client software SHOULD generate a new key for an
   identity whenever it is generating a SURB and the newest existing
   key for that identity would expire before the software expects to
   receive messages sent using that SURB.
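   The rotation policy above can be sketched as follows.  Keys are
   represented as hypothetical (expires, name, secret) tuples, and
   `make_key` is a hypothetical callback that creates a fresh key; how
   new secrets and expiry times are chosen is outside this sketch.

```python
def unexpired_keys(keys, name, now):
    """All keys for `name` that have not expired; replies may use any."""
    return [k for k in keys if k[1] == name and k[0] > now]


def key_for_new_surb(keys, name, now, surb_lifetime, make_key):
    """Return the key to use when generating a SURB for identity `name`.

    keys:          mutable list of (expires, name, secret) tuples
    now:           current time, in the same units as `expires`
    surb_lifetime: how long the client expects replies to this SURB to arrive
    make_key:      hypothetical callback returning a fresh (expires, name, secret)
    """
    newest = max(unexpired_keys(keys, name, now), key=lambda k: k[0],
                 default=None)
    if newest is None or newest[0] < now + surb_lifetime:
        # The newest key would expire too soon: rotate in a new one.
        newest = make_key(name)
        keys.append(newest)
    return newest
```

   SURBs are always generated with the latest-expiring key, while
   incoming replies are tried against every key returned by
   unexpired_keys(), so replies to SURBs made before a rotation still
   decode.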