$Id: E2E-spec.txt,v 1.21 2006/08/08 23:30:17 nickm Exp $

MIX3:2
Type-III Remailers: End-to-end Encoding and Delivery

George Danezis
Roger Dingledine
Nick Mathewson
(who else?)

Status of this Document

This draft document ("E2E-spec.txt") describes a proposed specification for Type III remailers. It is not a final version. It has not yet been submitted to any standards body.

Abstract

This document describes the formats and algorithms used by client software for Type III mixes, and by Type III mixes that support delivery. It does not describe the formats and algorithms that these applications share with relay-only remailers; for those, see "minion-spec.txt".

Although this document discusses security issues in implementing Type III mix software, it is not comprehensive, nor does it discuss all implementation issues.

Table of Contents

   Status of this Document
   Abstract
   Table of Contents
   1. Introduction
      1.1. Terminology
   2. End-to-end message encoding
      2.1. Building blocks
         2.1.1. Building blocks: Compression
         2.1.2. Building blocks: K-of-N fragmentation
         2.1.3. Building blocks: Whitening
      2.2. Generating messages
         2.2.1. Packet format
         2.2.2. Generating plaintext forward messages
         2.2.3. Generating encrypted forward messages
         2.2.4. Generating reply messages
         2.2.5. Generating stateless SURBs
      2.3. Decoding messages
         2.3.1. Decoding algorithm
         2.3.2. Overcompressed messages
         2.3.3. Reconstruction issues
   3. Delivery
      3.1. General issues
         3.1.1. ASCII armor
         3.1.2. RFC822 headers
      3.2. MBOX
         3.2.1. Formatting: Routing information
         3.2.2. Formatting: Message body
         3.2.3. Delivery
         3.2.4. Server descriptor section
      3.3. SMTP
         3.3.1. Formatting: Routing information
         3.3.2. Formatting: Message body
         3.3.3. Delivery
         3.3.4. Server descriptor section
      3.4. Fragments
         3.4.1. Server descriptor section
      3.5. News
         3.5.1. Formatting: routing information
         3.5.2. Formatting: message body
         3.5.3. Delivery
         3.5.4. Server descriptor section
   A.1. Appendix: Versioning and alphas
   A.2. Appendix: Storing client secrets

1. Introduction

This document is an adjunct to the main Type III (Mixminion) Mix protocol specification in "minion-spec.txt". Whereas the main specification describes formats and algorithms that all compliant Type III mixes and clients must generate and process, this document describes formats and algorithms for use at the edges of the system: by user software that generates anonymous messages; by user software that generates reply blocks; by exit nodes that deliver Type III messages; and by user software that decrypts encrypted Type III messages.

Although it is possible for Type III mixes to support other delivery methods, and possible for clients to encode end-to-end messages in different ways, we _strongly_ encourage new implementations to remain compatible with these methods. Since users receive anonymity from the cover traffic provided by other users, every additional method or option for delivery and encoding decreases the anonymity the system provides.

1.1. Terminology

   * "Packet" - A Type III mix packet. Fixed in size at 32 KB.
   * "Message" - A variable-size end-to-end message, transmitted from one location to another. May be larger or smaller than a single packet.
   * "Forward message" - A message sent anonymously to a known recipient.
   * "Direct reply message" - A message sent to a known recipient, with no attempt to maintain sender anonymity.
   * "Anonymous reply message" - A message sent anonymously to an unknown recipient.
   * "Forward plaintext message" - A forward message sent in the clear.
   * "Forward encrypted message" - A forward message encrypted to the public key of its recipient.
   * "Corrupted packet" - A packet whose payload has been corrupted on the second leg of its path. [Other forms of corruption will result in the packet not being delivered.]

2. End-to-end message encoding

This section describes a method for encoding messages of any size into payloads for one or more Type III packets, using compression to reduce bandwidth and K-of-N message fragmentation to improve reliability with large messages.

This method limits opportunities for end-to-end traffic analysis by making corrupted packets, packets from forward encrypted messages, and packets from reply messages indistinguishable to anyone besides their intended recipients. It does, however, allow exit servers to read forward plaintext messages, so that they can deliver them to recipients who aren't running Type III client software.

This form of end-to-end encoding is used for the SMTP and MBOX delivery types, and will likely be appropriate for other delivery methods with similar needs.

2.1. Building blocks

[XXXX]

2.1.1. Building blocks: Compression

We define a compression primitive based on ZLIB, as defined in RFC 1950 and RFC 1951. Because these standards describe only a message format, and do not mandate a single compression algorithm, we must further restrict the allowable means of compression. (If we allowed message encoders to choose among valid ZLIB compression algorithms, users would become partitionable by their choice of compressor.)

Specifically, we define COMPRESS(M) to equal the result of compressing M using zlib 1.1.4 (as available from www.gzip.org) with the following parameters, not forcing any explicit flushes until the whole message is compressed:

    Compression level    = 9 (maximum)
    Compression method   = DEFLATED (default)
    Window size          = 32K (default)
    Memory level         = 8 (default)
    Compression strategy = DEFAULT (default)

Implementation note: Any software may be used, so long as it gives the same result as zlib 1.1.4 with these parameters. Mark Adler of the Gzip team has averred that all zlib versions from 1.0.4 through 1.1.4 give this result, but that this *may* change in some future version.

We also define DECOMPRESS as the inverse of COMPRESS: namely, ZLIB decompression as described in RFCs 1950 and 1951.
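As an illustration only, COMPRESS and DECOMPRESS as defined above can be sketched with Python's standard zlib module. This sketch assumes that the zlib library linked into the Python build produces output identical to zlib 1.1.4 for these settings; the implementation note above suggests this, but does not guarantee it for every version. The function names are hypothetical. The decompressor raises ValueError on input that is not a complete, valid zlib stream.

```python
import zlib


def compress(m: bytes) -> bytes:
    """COMPRESS(M): a zlib (RFC 1950) stream with the fixed parameters above."""
    c = zlib.compressobj(
        level=9,                           # maximum compression
        method=zlib.DEFLATED,              # the only method zlib defines
        wbits=15,                          # 2**15 = 32K window, zlib header
        memLevel=8,                        # default memory level
        strategy=zlib.Z_DEFAULT_STRATEGY,  # default strategy
    )
    # No intermediate flushes: a single final Z_FINISH flush after all input.
    return c.compress(m) + c.flush()


def decompress(c: bytes) -> bytes:
    """DECOMPRESS(C): inverse of COMPRESS; rejects invalid or truncated input."""
    d = zlib.decompressobj(wbits=15)
    try:
        out = d.decompress(c) + d.flush()
    except zlib.error as e:
        raise ValueError("not a valid zlib stream") from e
    if not d.eof:
        # Input ended before the end-of-stream marker and Adler-32 checksum.
        raise ValueError("truncated zlib stream")
    return out
```

Because every encoder must emit identical bytes for a given input, the parameters are passed explicitly rather than relying on library defaults.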
Note that DECOMPRESS is not defined for every sequence of octets.

2.1.2. Building blocks: K-of-N fragmentation

We define a primitive, FRAGMENT, that breaks a K-packet message into N>K packets, such that any K of those packets are sufficient to reconstruct the original message with high probability. FRAGMENT(M,K,N,i) is the i'th such packet. We also define a primitive RECONSTRUCT(K,N,P_i1,P_i2,...,P_iK) that reconstructs the original message from any K distinct packets.

Note that these primitives only need to provide the above property of erasure correction, and need not provide "secret splitting": it is harmless if the message can be reconstructed from fewer than K packets.

Currently, we use the algorithm described by Luigi Rizzo in several papers and implemented in software at http://info.iet.unipi.it/~luigi/fec.html . This algorithm has several advantages. First, it has freely available implementations in C and Java under a modified-BSD-style license. Second, it runs with acceptable performance for modest values of K (less than 30 or so). Third, it does not seem to be patent-encumbered.

Of course, this algorithm has disadvantages. First, it lacks a complete, byte-level specification beyond that given at the URL above. Second, it performs poorly with very large K. Sadly, it seems to be about the best we can do without touching patented algorithms.

To avoid large K, we split the message into chunks, and perform K-of-N fragmentation on each chunk. With this scheme, a message can be reconstructed if and only if K packets from each chunk arrive intact. Because packet loss is likely to be concentrated nonuniformly at specific remailers and links, this is not as dangerous to reliability as it might initially appear.

[NOTE: I am not happy with this current situation: there needs to be _some_ unpatented probabilistic O(N) algorithm out there! -NM]

We divide and fragment as follows:

PROCEDURE: Divide a message into packets. [DIVIDE(M,PS,EXF)]

ARGUMENTS
    M: the message to send
    PS: payload size (fixed)
    EXF: expansion factor

    Let M_SIZE = CEIL(LEN(M) / PS)
    Let K = Min(16, 2**CEIL(Log2(M_SIZE)))
    Let NUM_CHUNKS = CEIL(M_SIZE / K)
    Let M = M | PRNG(Rand(KEY_LEN), NUM_CHUNKS*PS*K - Len(M))
    For i from 0 to NUM_CHUNKS-1:
        Let CHUNK_i = M[i*PS*K : PS*K]
    End loop
    Let N = CEIL(EXF*K)
    For i from 0 to NUM_CHUNKS-1:
        For j from 0 to N-1:
            FRAGMENTS[i*N+j] = FRAGMENT(CHUNK_i, K, N, j)
        End loop
    End loop
    Return FRAGMENTS.

[If we find an unpatented O(N) algorithm, we will use it instead of this junk. -NM]

Everyone must use the same EXF. The value of EXF is 4/3.

2.1.3. Building blocks: Whitening

While some fragments of a message are stored, but before the entire message has been received, a window of vulnerability exists on the exit server. To prevent any portion of a message from being read in the clear before enough packets from the message have arrived, apply the following whitening formula to messages before fragmentation:

    WHITEN(M)   = SPRP_Encrypt(K_whiten, "WHITEN", M)
    UNWHITEN(M) = SPRP_Decrypt(K_whiten, "WHITEN", M)

where K_whiten is equal to the octet sequence [57 48 49 54 45 4E] (the ASCII string "WHITEN").

Note that applying WHITEN and DIVIDE together has the effect of turning DIVIDE from an erasure-correcting code into a full secret-sharing encoding: if too few packets are received to reconstruct the whole message, none of the message can be reconstructed.

2.2. Generating messages

When sending a message as packets, a sender follows these steps:

    1. Determine the type of the message: encrypted forward, plaintext forward, or reply.
    2. Compress and possibly fragment the message into a set of payloads. (The size of the payloads will depend on the type of the message.)
    3. Annotate each payload with a payload header. (The payload header includes size, integrity, and fragmentation information.)
    4. According to the type of the message, encode each payload into a final 28KB payload and (possibly) a 20-octet decoding handle.
    5. For each payload, select a list of servers to form a path through the network.
    6. Using the decoding handle, payload contents, and route for each payload, generate a 32KB Type III packet.
    7. Deliver each packet.

This section describes steps 1 through 4. Step 5 is described more fully in "path-spec.txt". Steps 6 and 7 are described in "minion-spec.txt".

2.2.1. Packet format

As described in minion-design.txt, every Mixminion packet contains a 28KB payload and two 2KB headers. In this design, the routing information for the final hop in the final header contains a 20-octet decoding handle, or 'tag'.

2.2.1.1. Decoding handles

The decoding handle is used by the recipient to determine how to decode or decrypt the final message. (Because it is part of the header, this decoding handle must be generated by the same entity that creates the second leg of the path: the message sender in the case of forward messages, and the SURB generator in the case of reply messages.)

In all types of messages, the decoding handle looks like a 159-bit random number, preceded by a single zero bit. In the various packet types, the decoding handle is used as follows:

    * Plaintext forward - the handle is unused; its value is random.
    * Encrypted forward - the handle holds the first 20 octets of the RSA-encrypted session key for this packet.
    * Reply - the handle (generated by the SURB creator) contains a random seed used to seed the RNG that generated the master secrets for this SURB header.

2.2.1.2. Payload formats

The payload of every packet, when decrypted, begins with one of the two following headers.

SINGLE-PACKET-MESSAGE Payload: [Header size is 22 octets]

    (Single-packet message flag)  [1 bit: 0]
    (Length of packet contents)   [15 bits]
    (Hash of packet contents)     [20 octets]

The length field encodes the size of the contents of the packet after compression.
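To illustrate the SINGLE-PACKET-MESSAGE layout above (and the singleton branch of PACKETIZE_MESSAGE in 2.2.1.2), here is a minimal Python sketch that builds and checks a singleton payload. It rests on several assumptions that are not stated in this section: Hash is SHA-1 (20 octets, matching HASH_LEN), "28KB" means 28*1024 octets, and the 1-bit flag plus 15-bit length are packed big-endian into the first two octets. make_singleton and parse_singleton are hypothetical helper names.

```python
import hashlib
import os
import struct

PAYLOAD_LEN = 28 * 1024     # "28KB" payload, assumed to be 28*1024 octets
HASH_LEN = 20
SINGLETON_HEADER_LEN = 22   # 2 octets (flag + length) + 20-octet hash


def H(data: bytes) -> bytes:
    # Assumption: Hash is SHA-1, as in minion-spec.txt.
    return hashlib.sha1(data).digest()


def make_singleton(m_c: bytes, overhead: int = 0) -> bytes:
    """Build a singleton payload from compressed contents M_C."""
    padding_len = PAYLOAD_LEN - len(m_c) - SINGLETON_HEADER_LEN - overhead
    if padding_len < 0:
        raise ValueError("contents too large for a single packet")
    body = m_c + os.urandom(padding_len)      # M_C | PADDING
    # Flag bit 0 followed by a 15-bit length: a big-endian uint16 below 0x8000.
    return struct.pack(">H", len(m_c)) + H(body) + body


def parse_singleton(p: bytes):
    """Return the compressed contents, or None if P is not a valid singleton."""
    if p[0] & 0x80:                           # flag bit 1: a fragment payload
        return None
    size = struct.unpack(">H", p[:2])[0]      # high bit is known to be 0 here
    body = p[2 + HASH_LEN:]
    if H(body) != p[2:2 + HASH_LEN] or size > len(body):
        return None
    return body[:size]
```

Since the hash covers the random padding as well as the contents, flipping any bit after the header makes the payload unrecognizable as plaintext, which is what lets decoders distinguish it from encrypted or reply payloads.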
FRAGMENT Header: [Header size is 47 octets]

    (Multi-packet message flag)                     [1 bit: 1]
    (Packet index)                                  [23 bits]
    (Hash of remaining fields and packet contents)  [20 octets]
    (Message ID)                                    [20 octets]
    (Message size)                                  [4 octets]

The packet index contains the position of the packet within the FRAGMENTS list generated by DIVIDE. The Message ID is a random 20-octet sequence. The Message Size is the size of the whole message after compression.

We define the constants FRAGMENT_HEADER_LEN = 47 and SINGLETON_HEADER_LEN = 22.

To break a message into packets with headers (steps 2 and 3 in 2.2 above), an implementation follows these steps:

PROCEDURE: PACKETIZE_MESSAGE(M, OVERHEAD)

ARGUMENTS
    M: the message to send
    OVERHEAD: overhead needed for message type

    Let M_C = COMPRESS(M).
    If LEN(M_C)+SINGLETON_HEADER_LEN+OVERHEAD <= 28KB:
        Let PADDING_LEN = 28KB - LEN(M_C) - SINGLETON_HEADER_LEN - OVERHEAD
        Let PADDING = Rand(PADDING_LEN)
        Return a singleton payload containing:
            Flag 0 | Int(15,LEN(M_C)) | Hash(M_C | PADDING) | M_C | PADDING
    Otherwise:
        Let FRAGMENTS = DIVIDE(M_C, 28KB-OVERHEAD-FRAGMENT_HEADER_LEN, EXF)
        Let ID = Rand(TAG_LEN)
        Let SZ = Int(32,Len(M_C))
        For every FRAGMENT_i in FRAGMENTS:
            Let PAYLOAD_i = a fragment payload containing:
                Flag 1 | Int(23,i) | Hash(ID | SZ | FRAGMENT_i) | ID | SZ | FRAGMENT_i
        Return every PAYLOAD_i.

2.2.2. Generating plaintext forward messages

To send a plaintext forward message M, we first let PAYLOADS = PACKETIZE_MESSAGE(M, 0). For every PAYLOAD_i, we set TAG_i to a randomly chosen 159-bit integer. We then transmit each payload with its corresponding tag.

2.2.2.1. Plaintext forward messages and fragments

We require some additional complexity to minimize the time during which the destination of a fragmented plaintext forward message is held in the clear in the exit remailer's storage. Unlike fragmented encrypted forward and reply messages, which are delivered packet-by-packet to their recipients, fragmented plaintext forward messages are reassembled by an exit node before delivery.
(We cannot require that their recipients reassemble them, since we assume that the recipient of a plaintext forward message may not be running any anonymity software.)

Having exit nodes reassemble fragments introduces two problems:

    1) Because the exit node must hold fragments until they can be reassembled, an attacker can attempt to exhaust the exit node's resources by flooding it with fragments of messages whose remaining fragments never arrive.

    2) Because the recipient of a Type III packet is encoded in its final subheader, an attacker who compromises a Type III remailer can read the identities of all the users who have pending plaintext forward messages.

We specify a solution to problem 2 here. When generating fragmented plaintext forward messages, the message generator uses a routing type of "FRAGMENT" (0x0103) and an empty routing info, and prepends the following fields to the message body before compressing and fragmenting it:

    RS  Routing size   2 octets
    RT  Routing type   2 octets
    RI  Routing info   (variable length; RS=Len(RI))

(These fields are as described in minion-spec.txt.)

Thus, the exit node cannot learn the destination of a message until it has received enough fragments to reconstruct the message itself. In this way, we minimize the window in which the identity of a recipient is visible to the exit node.

2.2.3. Generating encrypted forward messages

To send an encrypted forward message M to a user with an RSA public key PK of length PKLEN (in octets), we set PAYLOADS = PACKETIZE_MESSAGE(M, PK_OVERHEAD_LEN-TAG_LEN+SPRP_KEY_LEN). (We lose 42 octets to OAEP padding and 20 to encode the session key, but gain 20 by spilling the encrypted data into the decoding tag.)

For every payload PAYLOAD_i:

    Repeat:
        Let K = Rand(SPRP_KEY_LEN).
        Let P = K | PAYLOAD_i
        Let P0 = PK_Encrypt(PK, P[0:PKLEN-PK_OVERHEAD_LEN])
    Until the most significant bit of P0[0] is equal to 0.
    Let P1 = SPRP_Encrypt(K, "END-TO-END ENCRYPT", P[PKLEN-PK_OVERHEAD_LEN : Len(P)-PKLEN+PK_OVERHEAD_LEN])
    Let TAG_i = P0[0:TAG_LEN]
    Let EPAYLOAD_i = P0[TAG_LEN:Len(P0)-TAG_LEN] | P1

(The loop ensures that TAG_i, like every decoding handle, begins with a zero bit; see 2.2.1.1.)

We then transmit every payload EPAYLOAD_i with the corresponding tag TAG_i.

2.2.4. Generating reply messages

To send a reply message M to an anonymous recipient, we set PAYLOADS = PACKETIZE_MESSAGE(M, 0). We send each PAYLOAD_i with a separate SURB_i; we must have enough SURBs to use a different one for each payload. We do not need to include TAG (decoding handle) fields: they are part of the SURB.

SURB users SHOULD keep track of which SURBs they have used to prevent multiple use, at least until the SURBs have expired.

2.2.5. Generating stateless SURBs

In order to avoid storing a set of keys for every outstanding SURB, SURB generators use the following SURB generation procedure.

To use this method, SURB generators must store a separate long-term secret for each identity they wish to associate with a chain of SURBs. (Client software MUST support multiple identities, and MUST make it clear to the user which identity has been associated with each incoming SURB. Nickname comparisons SHOULD be done in a case-insensitive manner.)

To generate a SURB for a path of length PATH_LEN, using a long-term secret SEC:

    Repeat:
        Let SEED = a random 159-bit seed.
    Until Hash(SEED | SEC | "Validate") ends with a 0 octet.
    Let K = Hash(SEED | SEC | "Generate")[0:KEY_LEN]
    Let STREAM = PRNG(K, KEY_LEN*(PATH_LEN + 1))
    Let SHARED_SECRET = STREAM[PATH_LEN*KEY_LEN : KEY_LEN]
    For i in 1 .. PATH_LEN:
        Let MS_i = STREAM[(PATH_LEN-i)*KEY_LEN : KEY_LEN]
    Generate a reply block using MS_i as the master secret for the i'th hop in the path, SEED as the tag, and SHARED_SECRET as the end-to-end shared secret.

2.3. Decoding messages

When a Type III mix receives an exit packet, it tries to decode it (if it can) before delivery, and otherwise delivers it undecoded.
When a client receives an undecoded exit packet, it tries to decode it before presenting it to the user.

2.3.1. Decoding algorithm

Message decoders recognize plaintext (singleton or fragment) payloads by checking whether the hash field matches the calculated hash of the rest of the packet.

If a message decoder knows one or more SURB secrets, it then checks the decoding handle 'TAG' to see whether Hash(TAG | SEC | "Validate") ends with a zero octet for any secret SEC. If so, the decoder generates secrets from TAG | SEC as in SURB generation, and successively decrypts the payload with up to MAX_PATH of them, checking each time for a plaintext payload.

If no SURB secrets are known, or if no SURB secrets yield a plaintext payload, and the decoder knows one or more secret keys SK_i, it then checks whether PK_Decrypt(SK_i, TAG | P[0:PK_LEN-TAG_LEN]) has valid OAEP padding for some SK_i. If so, it extracts K from the first 20 octets of the decrypted value, and uses K to LIONESS-decrypt the rest of the payload.

If none of these approaches works, the decoder has failed. Upon failure, an exit node simply delivers the undecoded message and decoding handle to the message's recipient, in hopes that the recipient will have the necessary SEC or SK values. A client, however, marks the message as JUNK.

PROCEDURE: DECODE_PLAINTEXT_PAYLOAD

ARGUMENTS:
    P: a payload to decode

    If the first bit of P[0] is 0:
        If P[2:HASH_LEN] = Hash(P[2+HASH_LEN:Len(P)-2-HASH_LEN]):
            SZ = P[0:2]
            Return "Singleton", P[2+HASH_LEN : SZ]
        Otherwise:
            Return "Unknown"
    Otherwise, if the first bit of P[0] is 1:
        If P[3:HASH_LEN] = Hash(P[3+HASH_LEN:Len(P)-3-HASH_LEN]):
            IDX = P[0:3] & 0x7fffff
            MSG_ID = P[3+HASH_LEN:20]
            MSG_SZ = P[3+HASH_LEN+20:4]
            FRAG = P[3+HASH_LEN+20+4:Len(P)-3-HASH_LEN-20-4]
            Return "Fragment", IDX, MSG_ID, MSG_SZ, FRAG
        Otherwise:
            Return "Unknown"

PROCEDURE: DECODE_PAYLOAD

ARGUMENTS:
    P: a payload to decode
    TAG: decoding handle for the payload
    SK_1 ... SK_n: Optionally, a list of RSA secret keys
    SEC_1 ... SEC_n: Optionally, a list of SURB secrets.

    If DECODE_PLAINTEXT_PAYLOAD(P) is not "Unknown", return it.
    For all SEC_i:
        If H(TAG | SEC_i | "Validate") ends with a zero octet:
            K = H(TAG | SEC_i | "Generate")[0:KEY_LEN]
            STREAM = PRNG(K, MAX_PATH * KEY_LEN)
            Let P_t = P.
            For j in 0 ... MAX_PATH-1:
                Let P_t = SPRP_Encrypt(STREAM[j * KEY_LEN : KEY_LEN], "PAYLOAD_ENCRYPT", P_t)
                If DECODE_PLAINTEXT_PAYLOAD(P_t) is not "Unknown", return it.
    For all SK_i:
        Let E0 = TAG | P[0:Len(SK_i)-TAG_LEN]
        Let P0 = PK_Decrypt(SK_i, E0).
        If the OAEP padding is valid:
            Let K = P0[0:KEY_LEN]
            Let P0' = P0[KEY_LEN:Len(P0)-KEY_LEN]
            Let P1 = SPRP_Decrypt(K, "END-TO-END ENCRYPT", P[Len(SK_i)-TAG_LEN : Len(P)-Len(SK_i)+TAG_LEN])
            If DECODE_PLAINTEXT_PAYLOAD(P0'|P1) is not "Unknown", return it.
    Return "Unknown".

2.3.2. Overcompressed messages

Because zlib allows up to 1000-fold compression, using zlib for message compression creates opportunities for serious mailbombing. When decoding a message, decoders MUST check whether the decompressed size of the message will be "far longer" than the compressed size. In general, if C = COMPRESS(P), and Len(P) > 20K, and Len(P)/Len(C) > 20, then P SHOULD be considered overcompressed. Decoders MUST decompress incrementally, so that they can notice overcompressed messages without using too much space.

Upon encountering an overcompressed message, an exit node MUST mark it as such and deliver it to the user without decompressing it.

Upon encountering an overcompressed message, client software SHOULD alert the user and require explicit confirmation before decompressing the message.

2.3.3. Reconstruction issues

When a server receives a plaintext forward packet containing a fragment of a message, it should behave as follows:

    - If the server does not support reconstruction, or if the hash of the packet is invalid, it drops the packet.
    - From the declared message size of the message, it calculates the number of packets it expects.
      If this number is higher than the maximum number of packets the server is willing to reconstruct, it drops the packet.
    - If the server has already logged this message ID as finished, it drops the packet.
    - If the server has other packets with this message ID, and those packets have a different message size, it drops all packets with this message ID and logs this message ID as finished. If the server encounters multiple packets with the same message ID and packet index, it drops all packets with that message ID, and marks the message ID as finished.
    - The server stores this packet pending reconstruction. If enough packets are held for this message ID to reconstruct the message, the server reconstructs the message (as described below), delivers it, deletes all fragment packets for this message ID, and marks the message ID as finished.

To prevent denial-of-service attacks, a server MAY delete old fragments when it is low on disk space. [XXXX is this the best approach?]

3. Delivery

This section describes the standard message delivery types provided with the Mixminion Type III mix implementation. Other implementations MAY implement other types in other ways, though they SHOULD avoid adding new types in ways that would exacerbate partitioning attacks.

This document *does not* describe routing types or transfer methods used for mix-to-mix communication; see "minion-spec.txt" for those.

3.1. General issues

3.1.1. ASCII armor

When encoding an overcompressed or undecodeable Type III message, exit nodes MUST apply OpenPGP ASCII armor, as defined in RFC 2440, section 6.2. The header text is "BEGIN TYPE III ANONYMOUS MESSAGE". There are two armor headers: "Message-type" (required) and "Decoding-handle" (optional). The value of "Message-type" must be "encrypted" for an undecodeable message, and "overcompressed" for an overcompressed message. For an undecodeable message, the decoding handle MUST be included, base64-encoded, as the value of "Decoding-handle".
Otherwise, the "Decoding-handle" header MUST be omitted.

When encountering a plaintext fragment, if the exit type is not "FRAGMENT", the exit node SHOULD deliver the fragment as-is, with the armor described above and "Message-type" set to "fragment".

When encoding a plaintext Type III message, exit nodes MAY apply OpenPGP ASCII armor if the message contains characters other than printing ASCII and no encoding is specified in the message. When doing so, the "Message-type" header must be "binary". Otherwise, exit nodes MAY format the message with OpenPGP armor headers and dash-escaped text. In this case, the "Message-type" header MUST be "plaintext".

[XXXX Right now, there's no way to specify an encoding in a message. Don't worry--you didn't misread. -NM]

3.1.2. RFC822 headers

Delivery types that deliver messages via email or news protocols need to support setting a limited set of headers from message payloads. Headers fall into 4 classes:

    1. Set by exit node, not by message sender. (Example: "Date")
    2. Set in packet header by path generator. (Example: "To")
    3. May be set by message sender. (Example: "Subject")
    4. May be set partially by message sender. (Example: "From")

To encode header values, we use the following message format:

    MESSAGE ::= HEADERS DATA
    HEADERS ::= HEADER HEADERS | HEADER_END
    DATA ::= (any sequence of octets)
    HEADER ::= HEADER_NAME COLON HEADER_VAL NL
    HEADER_END ::= NL
    NL ::= (ascii NL, hex 0A).
    COLON ::= (ascii ':', hex 3A).
    HEADER_VAL ::= HEADER_VAL_CHAR HEADER_VAL | (empty)
    HEADER_VAL_CHAR ::= (any character in the range hex 20 through hex 7E inclusive)
    HEADER_NAME ::= HEADER_NAME_CHAR HEADER_NAME | HEADER_NAME_CHAR
    HEADER_NAME_CHAR ::= (any character in the range hex 21 through hex 7E inclusive, excluding hex 3A.)

Design note: We explicitly decline to implement full RFC[2]822. This would add to the implementation complexity of Type III implementations, and endanger anonymity by allowing nonuniformity among client software packages.
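The message-format grammar above is simple enough to check with a single regular expression per header. The following Python sketch (parse_message is a hypothetical name, not part of any Mixminion API) splits a decoded message into its header list and DATA, and raises ValueError on anything that does not match the grammar. It enforces only the grammar itself; trimming values, rejecting unrecognized names, ordering, and length limits are separate rules layered on top.

```python
import re
from typing import List, Tuple

# HEADER ::= HEADER_NAME COLON HEADER_VAL NL
#   HEADER_NAME_CHAR: hex 21-7E except hex 3A (':'), at least one
#   HEADER_VAL_CHAR:  hex 20-7E, possibly none
_HEADER = re.compile(rb"([\x21-\x39\x3b-\x7e]+):([\x20-\x7e]*)\n")


def parse_message(msg: bytes) -> Tuple[List[Tuple[str, str]], bytes]:
    """Split MESSAGE into (HEADERS, DATA); raise ValueError if malformed."""
    headers = []
    pos = 0
    while True:
        if msg[pos:pos + 1] == b"\n":          # HEADER_END: a bare NL
            return headers, msg[pos + 1:]
        m = _HEADER.match(msg, pos)
        if m is None:
            raise ValueError(f"malformed header at offset {pos}")
        headers.append((m.group(1).decode("ascii"), m.group(2).decode("ascii")))
        pos = m.end()
```

Note that under this grammar a message with no headers still begins with a single NL, and everything after the first empty line is DATA, even if it contains further blank lines.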
Unlike RFC[2]822, clients MUST use only recognized header names, and SHOULD normalize header values by removing leading or trailing space. Unlike RFC[2]822, servers MUST remove unrecognized headers.

To prevent distinguishability between clients, headers MUST appear in lexical (alphabetical) order. Servers MUST NOT use out-of-order headers.

To help implementations comply with RFC 2822, each header MUST NOT be longer than 900 characters.

3.2. MBOX

The routing type 0x101 corresponds to MBOX delivery. Conceptually, an MBOX is an internally visible, Type III-only delivery address, specific to a single exit node.

3.2.1. Formatting: Routing information

The routing info for an MBOX header MUST contain a 20-octet decoding handle, followed by a variable-width MBOX name. Exit nodes MUST drop packets addressed to unknown MBOXes, or packets with malformed routing info fields.

The interpretation of the MBOX name is left to the exit node, and will vary between exit nodes. Typically, exit nodes map from 'username' to 'username@localhost' for a limited set of their user names, and deliver messages via sendmail. Exit nodes MAY implement other schemes.

3.2.2. Formatting: Message body

Header encoding is as described in 3.1.2 above. The following headers are allowed:

    "SUBJECT"      (any. Must be no more than 900 characters long.)
    "FROM"         (any sequence of printing ASCII characters excluding '"', '[', ']', and ':'.)
    "IN-REPLY-TO"  (an RFC2822 msg-id)
    "REFERENCES"   (a list of RFC2822 msg-ids)

[XXXX Are msg-ids really what we want? Should we say more? Should we restrict encoding? -NM]

[XXXX The client should not be allowed to supply IN-REPLY-TO. Instead the exit node should set IN-REPLY-TO to the last msg-id in the REFERENCES header. Also there should be a standard encoding of msg-ids: for instance, msg-ids should be separated by exactly one space (0x20). Additionally, there should be a specified way of dealing with references lines that are too long.
The standard in news seems to be to keep the first msg-id in the list (the original post that started a thread), and then remove as many of the following msg-ids as needed so that the line fits in the space available. (XXX: need to cross-check that with the usefor drafts). - PP]

[XXXX Okay; let me know when you have cross-checked successfully, and when you have a citation for this. -NM]

Unrecognized or malformatted headers MUST be removed.

3.2.3. Delivery

When delivering an MBOX message via email, an exit node MUST construct an RFC2822 message as follows:

The "To" line is the mailbox of the corresponding recipient.

The "Subject" line is taken from the contents of the "SUBJECT" header, removing leading and trailing whitespace. OPTIONALLY, implementations may prepend a short marker (e.g., "[ANON]" or "[MBOX]"). If no "SUBJECT" header is provided in the message, exit nodes SHOULD include a preconfigured one, such as "Type III Anonymous Message".

If the server is configured to allow user-supplied From addresses, the "From" line is generated as follows:

    Let F be the contents of the "FROM" header in the message.
    Remove all leading or trailing whitespace from F.
    Replace all sequences of 2 or more space characters in F with a single space.
    Prepend a double quote and a preconfigured marker (e.g., "[ANON]") to F, then append a closing double quote and a preconfigured exit node mailbox (e.g., ).

(Thus, if the sender specifies a "FROM" header of 'Lance Cottrell', an implementation could generate a 'From' header of the form: "From: "[ANON] Lance Cottrell" ".)

Otherwise (if the server is not configured to allow user-supplied From addresses) the "From" line is set to a preconfigured value.

[Rationale: Previously, I'd argued for having only a single supported "From" policy, as a measure to prevent linkability based on client option preferences. Adam Back correctly pointed out that this is silly. Consider that _any_ use or non-use of From addresses makes messages linkable *in itself*.
In other words, Eve can already tell which messages set their From addresses; she gains nothing by learning that those messages have chosen an exit node with From support to do so. -NM]

The "Date" line should be the current date.

The "In-Reply-To" and "References" lines should be taken verbatim from the corresponding headers, if those headers are present. [XXXX Is this sensible? -NM] [XXXX See my comment about in-reply-to above. -PP]

The "X-Anonymous" line SHOULD be present, and set to "yes". [XXXX Do you really think this is a good idea? -PP] [XXXX Yes. There's nothing wrong with letting people who don't want anonymous messages block 'em. (On IRC, PP says that he is worried about large ISPs filtering.) This bears more thought, but I think it's probably a bad idea to deliberately screw over unwilling recipients in order to bypass clueless/draconian sysadmins. -NM]

Note again that all unrecognized or malformatted headers MUST be removed.

The payload SHOULD be escaped as described in 3.1.1.

3.2.4. Server descriptor section

Servers that support MBOX delivery MAY include a [Delivery/MBOX] section, containing the entry "Version: 1.0". Other servers MUST NOT include a [Delivery/MBOX] section.

This section MUST include a "Maximum-size" line, containing the maximum permitted message size in KB (before compression). Note that because of base64 encoding, actual delivered messages may be longer than this by a factor of ~1.33. The value must be at least "32".

It MUST contain an "Allow-From" line, containing 'yes' if the server allows user-supplied From addresses and 'no' if it does not.

3.3. SMTP

The routing type 0x100 corresponds to SMTP (email) delivery.

3.3.1. Formatting: Routing information

The routing information for an SMTP header MUST contain a 20-octet decoding handle, followed by a variable-width mailbox. A mailbox MUST be a list of at least one but no more than eight addresses, separated by NUL characters ([00]).
Each address MUST be the "username@host" part of an RFC2821 mailbox. (Using full RFC2822 allows too much distinguishability between senders, and makes blacklisting hard.) A mailbox MUST obey the following format: [XXXX Mixminion through 0.0.6 does not support multiple destination addresses.]

    MAILBOX ::= LOCALPART AT HOSTPART
    LOCALPART ::= ATOM | LOCALPART DOT ATOM
    HOSTPART ::= ATOM | HOSTPART DOT ATOM
    ATOM ::= ATOMCHAR | ATOM ATOMCHAR
    ATOMCHAR ::= Any character in the range hex 21 through hex 7E, excluding '[', ']', '(', ')', '<', '>', '@', ',', '.', ';', ':', '\', and '"'.
    AT ::= '@' (ASCII hex 40)
    DOT ::= '.' (ASCII hex 2E)

Additionally, a HOSTPART MUST NOT be an IP address: it would make blacklisting hard, and encourage senders to resolve target hosts. [XXXX I suspect the above should be "SHOULD NOT." -NM] [XXXX It is possible that an email address is in fact example@[192.186.2.1], so it really should be SHOULD NOT. Also can we reach all possible mail addresses using this strict syntax? Do we care? -PP] [XXXX Using IP addresses does, as noted above, make blacklisting hard. But we should review RFC2821 to see if we care. -NM]

Software that allows users to send a message to multiple recipients SHOULD automatically place the recipient mailboxes in lexicographical order, eliminate duplicates, and divide them into groups of eight (with one short group at the end).

3.3.2. Formatting: Message body

The message body format is exactly the same as the MBOX format, described above in 3.2.2.

3.3.3. Delivery

To deliver an SMTP message, an exit node that supports the SMTP delivery type SHOULD construct an RFC2822 message as described in 3.2.3 above, additionally setting the 'To' line to the mailboxes given in the message header. [XXXX Should a separate outgoing message be created for each incoming address? -NM]

Implementations SHOULD allow exit node operators to configure additional fields, and to block specific 'To' addresses.

3.3.4. Server descriptor section

Servers that support SMTP delivery MAY include a [Delivery/SMTP] section, containing the entry "Version: 1.0". Other servers MUST NOT include a [Delivery/SMTP] section.

This section MUST include a "Maximum-size" line, containing the maximum permitted message size in KB (before compression). Note that because of base64 encoding, actual delivered messages may be longer than this by a factor of ~1.33. The value must be at least "32". A server MAY drop any message that uncompresses to be longer than this size.

It MUST contain an "Allow-From" line, containing 'yes' if the server allows user-supplied From addresses and 'no' if it does not.

3.4. Fragments

3.4.1. Server descriptor section

When a server supports message reconstruction, it MAY include a "[Delivery/Fragmented]" section as described here. Other servers MUST NOT include a "[Delivery/Fragmented]" section.

The section, if present, MUST contain a 'Version' entry, with the value "1.0". It also MUST contain a "Maximum-Fragments" line, containing the maximum size (in fragments) of a message that the server is willing to reconstruct.

3.5. News

[XXXX expand this from notes.]

3.5.1. Formatting: routing information

[RI must contain 1-3 newsgroups, 0-8 mailboxes, and a subject.]

3.5.2. Formatting: message body

[Headers are followup-to, reply-to, references, from, (in-reply-to?), x-no-archive, (messageid?)]

3.5.3. Delivery

3.5.4. Server descriptor section

A.1. Appendix: Versioning and alphas

Today's alpha code does not publish its version as '1.0'; it uses '0.x' instead (currently '0.1' for all versions in this document). Production versions MUST NOT retain backward compatibility with pre-production releases.

A.2. Appendix: Storing client secrets

The following describes the format used by the Mixminion reference software to store SURB keys. Other software MAY use this format. Clients that do so MUST implement it as described here.
   [Rationale: Earlier, we specified a standard export format for SURB
    secrets.  Exporting secret keys, however, is _bad_: once the
    ability exists, the secret keys tend to get sucked out and
    re-stored (often less securely) by other applications.  Peter
    Gutmann describes several instances of this somewhere, and I'm not
    the kind of guy to argue with Peter Gutmann. -NM]

   [XXX This format is supported by Mixminion 0.0.6 and later; earlier
    versions of the software use a more Python-specific format that
    you really shouldn't try to read.]

   First, the keyring itself is stored with RFC2440-style ASCII armor,
   with header text "BEGIN TYPE III KEYRING" and an armor header
   "Version" with the value "0.1".  The contents to be encoded are:

       magic          [8 octets]
       format type    [1 octet == 0]
       salt           [8 octets]
       encdata        [variable]

   where 'magic' is "KEYRING2" [4B 45 59 52 49 4E 47 32], 'salt' is a
   randomly chosen octet sequence, and 'encdata' is computed from the
   actual identity data 'data' and a user-selected password 'password'
   as follows:

       Let padding = Rand(1024*CEIL(LEN(data)/1024) - LEN(data))
       Let data'   = Int(32, LEN(data)) | data | padding
       Let hash    = H(data' | salt | magic)
       Let key     = H(salt | password | salt)[0:KEY_LEN]
       Let encdata = Encrypt(key, data' | hash)

   The format of the actual data is as follows:

       KeyData ::= Item *
       Item    ::= ItemType [1 octet]
                   ItemLen  [2 octets]
                   ItemVal  [ItemLen octets]

   Implementations MUST skip over items with unrecognized types, and
   preserve them when modifying the keyring.  Implementations MUST NOT
   depend on any order of items within the keyring.

   SURB keys have the following format:

       SURBKeyType     [00]
       SURBKeyLen      [2 octets]
       SURBKeyExpires  [4 octets]
       SURBKeyName     [Variable; NUL-terminated]
       SURBKeySecret   [Variable]

   SURBKeyType and SURBKeyLen conform to the fields ItemType and
   ItemLen as described above.

   SURBKeyExpires is a 4-octet timestamp (rounded to the nearest
   midnight GMT), after which the key should be removed from the
   keyring.
   SURBKeyName is a NUL-terminated name for this identity, in
   lowercase.

   SURBKeySecret is the master secret for this SURB identity; it should
   be at least 20 octets long.

   In order to implement key rotation, multiple SURB keys may exist for
   the same identity.  Clients SHOULD always generate SURBs using the
   latest-expiring key, and SHOULD accept reply messages using all
   unexpired keys.  Client software SHOULD generate a new key for an
   identity whenever it is generating a SURB and the newest existing
   key for that identity would expire before the software expects to
   receive messages sent using that SURB.
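   The keyring encoding, item parsing, and key-rotation rules in this
   appendix can be sketched in Python as below.  This is a minimal,
   non-authoritative sketch of the framing only, and every function
   name is hypothetical.  Assumptions (ours, not stated in this
   appendix): H is SHA-1 and KEY_LEN is 16 octets, as we read
   minion-spec.txt; Int(32, x) and the 2-octet ItemLen are big-endian;
   the password is already an octet string; timestamps are plain
   integers; and Encrypt is supplied by the caller, since minion-spec.txt
   defines it in terms of AES, which the Python standard library does
   not provide.  ASCII armoring is omitted.

```python
import hashlib
import hmac
import os
import struct

MAGIC = b"KEYRING2"
KEY_LEN = 16   # assumption: AES-128 key length, as in minion-spec.txt
HASH_LEN = 20  # assumption: H is SHA-1


def H(m: bytes) -> bytes:
    return hashlib.sha1(m).digest()


def seal_keyring(data: bytes, password: bytes, encrypt, salt: bytes = None) -> bytes:
    """Return magic | format type | salt | encdata (the octets to be armored)."""
    salt = os.urandom(8) if salt is None else salt
    padded = 1024 * -(-len(data) // 1024)          # 1024*CEIL(LEN(data)/1024)
    data_p = struct.pack(">I", len(data)) + data + os.urandom(padded - len(data))
    digest = H(data_p + salt + MAGIC)
    key = H(salt + password + salt)[:KEY_LEN]
    return MAGIC + b"\x00" + salt + encrypt(key, data_p + digest)


def open_keyring(blob: bytes, password: bytes, decrypt) -> bytes:
    """Reverse seal_keyring; raise ValueError on a bad password or corruption."""
    if blob[:8] != MAGIC or blob[8:9] != b"\x00":
        raise ValueError("not a format-0 KEYRING2 keyring")
    salt, encdata = blob[9:17], blob[17:]
    plain = decrypt(H(salt + password + salt)[:KEY_LEN], encdata)
    data_p, digest = plain[:-HASH_LEN], plain[-HASH_LEN:]
    if len(data_p) < 4 or not hmac.compare_digest(H(data_p + salt + MAGIC), digest):
        raise ValueError("bad password or corrupted keyring")
    (length,) = struct.unpack(">I", data_p[:4])
    if length > len(data_p) - 4:
        raise ValueError("bad length field")
    return data_p[4:4 + length]


def parse_items(data: bytes):
    """Split KeyData into (ItemType, ItemVal) pairs, keeping unknown types."""
    items, pos = [], 0
    while pos < len(data):
        if pos + 3 > len(data):
            raise ValueError("truncated item header")
        itype, ilen = struct.unpack(">BH", data[pos:pos + 3])
        val = data[pos + 3:pos + 3 + ilen]
        if len(val) != ilen:
            raise ValueError("truncated item value")
        items.append((itype, val))
        pos += 3 + ilen
    return items


def parse_surb_key(val: bytes):
    """Decode the ItemVal of a type-0x00 item into (expires, name, secret)."""
    (expires,) = struct.unpack(">I", val[:4])
    nul = val.index(b"\x00", 4)                    # end of SURBKeyName
    return expires, val[4:nul], val[nul + 1:]


def reply_keys(keys, name, now):
    """All unexpired (expires, name, secret) keys for an identity."""
    return [k for k in keys if k[1] == name and k[0] > now]


def surb_key(keys, name, now, needed_until):
    """Latest-expiring key, or None if a new key should be generated first."""
    newest = max(reply_keys(keys, name, now), key=lambda k: k[0], default=None)
    return newest if newest and newest[0] >= needed_until else None
```

   If Encrypt is a keystream XOR, as we understand minion-spec.txt to
   define it, the same callable can be passed as both encrypt and
   decrypt.  Because parse_items() returns unrecognized item types
   untouched, a writer can re-serialize them when modifying the
   keyring, which the preservation requirement above calls for.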