$Id: HACKING,v 1.26 2004/05/10 16:46:21 nickm Exp $ Hacking Mixminion HOW TO CONTRIBUTE PATCHES: - Send them to the list mixminion-dev@freehaven.net, or to me (nickm@freehaven.net). - Include a statement saying that you give permission for me to redistribute your work under any FSF-approved license Mixminion may use in the future, including (but not limited to) LGPL, GPL, modified (3-clause) BSD, X11, or MIT. System requirements: Python >= 2.0 (see PORTING NOTES below) OpenSSL 0.9.7 (If you don't have it, the Makefile knows how to download it for you.) NON-HACKING PROJECTS: - We need documentation. - See the spec for open issues. - We need a logo. (An alien with an eggbeater or an alien at a DJ's mixer are two ideas.) THINGS TO HACK: (Good introductory projects that I won't get to myself for at least a version or two.) - We need some distributed stress-test code. Ideally, it should use SSH to build Mixminion on a number of machines and start servers on those machines, then run a bunch of clients to send messages through the system. It should be possible to run the code with different network and server configurations. (Difficulty: moderate. Invasiveness: none.) - The current implementations for the MBOX and SMTP modules open a new connection to the local MTA for each outgoing message. This is inefficient; they should batch as much as possible. (Difficulty: easy. Invasiveness: slight.) - The current implementation for the MBOX and SMTP modules do not support ESMTP (over TLS). They should. (Difficulty: easy. Invasiveness: slight.) - In addition to the stress-test code above, we should have some automated integration test code to start many servers with different configurations on the same server, and test them with calls to the client code. This can probably share a good deal of logic with the stress-test code. A nice extra would be to allow testing multiple versions of Mixminion with multiple versions of the client, and have multihost support. (Difficulty: moderate. Invasiveness: none.) - It would be neat if all the boilerplate that servers spit out were configurable via some kind of generic boilerplate mechanism, and stored in separate files. (Difficulty: easy. Invasiveness: slight.) - We could use some way to cap total disk usage, and handle full disks sanely. (Difficulty: moderate. Invasiveness: moderate.) - If you have access to a multiprocessor machine, it would be nice to make good use of more than one CPU. Right now, the network code runs in parallel with the processing code, but the processing thread accounts (I think) for most of the CPU use. It would be nice to support multiple processing threads and multiple network threads (round-robin, not one-per-connection). (Difficulty: easy/moderate, depending on your knowledge of writing multithreaded code. Invasiveness: slight.) - We have a specification for multiple exit addresses, but it needs to be implemented... (Difficulty: easy. Invasiveness: slight.) THINGS TO THINK ABOUT AND HACK: (Introductory projects that will take some specification work. Please, get your spec discussed on mixminion-dev and checked into CVS *before* you submit an implementation for any of these! [See spec-issues.txt for more open issues].) - We should have an incoming email gateway for users to use reply blocks to send messages anonymously without using Mixminion software. (Spec difficulty: easy. Implementation difficulty: moderate. Invasiveness: slight.) - Want a real challenge? We have an allusive description for how to do K-of-N fragmentation in our E2E-spec documentation. Go flesh out the description. (Spec difficulty: moderate. Implementation difficulty: already implemented.) - Right now, we never generate link padding or dummy messages. The code is there, but it never gets triggered. Specify when it gets triggered (and justify why this improves anonymity). (Spec difficulty: ????. Implementation difficulty: easy once you know how. Invasiveness: some.) - We could use IPv6 support. The big specification problem here is routing: an IPv4-only server simply cannot deliver to a server without an IPv4 address. Any path-generation algorithm I can come up with has troublesome anonymity implications. If you can come up with an algorithm that doesn't, the code should be pretty easy to do. (Spec difficulty: ????. Implementation difficulty: easy. Invasiveness: some.) HARD HACKING: - Do you want to dive into Python and OpenSSL internals? Right now, we allow all our data to get swapped out to disk. That's no good! Now, you _could_ install an OS that encrypts your swap, but that's not an option for everybody. Write code to use memlock to protect sensitive data structures (keys, packets, etc) as necessary. [If you're feeling unsubtle, use memlockall to keep *all* data from getting swapped out... but watch out for ticked-off admins!] (Difficulty: easy/hard depending on whether you take the easy way out with memlockall, or whether you actually do the right thing. Invasiveness: all over the place.) - Want a real challenge? Right now, we store sensitive files right on the file system, and do our best to overwrite them when they need to be deleted... but go read the comment in Common.py to see why this doesn't really work. We *could* have people use encrypted filesystems.... but that's not an option for everyone. Here's what you do: Write code for a generic encrypted filestore, and have Mixminion use that as appropriate instead of the filesystem directly. This will be easier than writing a real encrypted filesystem, since: - All of the files are the same size (within a few K). - There are no hard or soft links - There are no attributes: no ownership, no modes, no special files, no atime/mtime/ctime... just name and size. - There is no arbitrary nesting of directories; the set of directories is very small. - No more than one process needs to be able to access the filestore at a time. (Though multiple threads might need to.) Your code should probably be generic enough for other Python projects to use. :) (Difficulty: hard. Invasiveness: moderate.) - Port the code to use NSS or libgcrypt/GNUTLS instead of/in addition to OpenSSL. (The OpenSSL license conflicts with the GPL, and makes us unlinkable with GPL'd code under some circumstances.) Note that NSS does not, today, have any support for server-side DHE; if you're going to go down the NSS route, you should contribute an implementation for server-side DHE. (Difficulty: hard. Invasiveness: moderate.) - We should eventually port to Twisted, which looks to be an incredibly cool and featureful networking platform written in Python. Issues: - I don't want to switch Twisted's favored crypto library: PyOpenSSL. This library, however cool, lacks a lot of symmetric cipher functionality I need, and I really don't want to carry _two_ crypto library dependencies. This means that we need to either bring PyOpenSSL up to date, or change Twisted's SSL wrapper to speak _minionlib's admittedly limited OpenSSL dialect. - Twisted has functionality for a few things that Mixminion's libraries currently duplicate, such as logging. We'll need to integrate these. - I'd rather keep the changes one-by-one, and not have to port the entire codebase in one step. First, we'd change the async loop, then change more and more of the other IO-intensive stuff to use it. - The install process MUST NOT get hard! Requiring users to install Twisted beforehand could get tricky. We'll need to either make the Twisted install project learn all of our Python-Python-Who's-Got-The- Python trickery, or add a 'download and install twisted' target. Benefits: - Move server HTTP activity into async loop. - Freebies like windows portability, bandwidth throttling, etc. (Investigate more) (Difficulty: moderate. Invasiveness: moderate.) - After porting to GnuTLS or NSS, separate out all of the C code to its own process! Feel free to refactor the code while you're at it. - We could really use a nymserver. DESIGN PRINCIPLES: - It's not done till it's documented. - It's not done till it's tested. - Don't build general-purpose functionality. Only build the operations you need. - "Premature optimization is the root of all evil." -Knuth Resist the temptation to optimize until it becomes a necessity. CODING STYLE: - See PEP-0008: "Style Guide For Python Code". I believe in most of it. (http://www.python.org/peps/pep-0008.html) - Also see PEP-0257 for documentation; we're not there yet, but it's happening. (http://www.python.org/peps/pep-0257.html) - Magic strings: "XXXX" indicates a defect in the code. "FFFF" indicates a missing feature. "????" indicates an untested or iffy block. "DOCDOC" indicates missing documentation. PORTABILITY NOTES: - I've already backported to Python 2.0. (I refuse to backport to 1.5 or 1.6.) - Right now, we're dependent on OpenSSL. OpenSSL's license has an old-style BSD license that isn't compatible with the GPL. We have two other options, it seems: - libnss: this is a dual-license GPL/MPL library from Mozilla. Sadly, we can't use it now, because it doesn't yet support server-side DHE. Bugzilla says that server-side DHE is targeted for 3.5. Perhaps then we can port, but I wouldn't hold my breath. - gnutls/libgcrypt: These are the GNU offerings; the relevant portions of each are licensed under the LGPL. They don't support OAEP, but we've already got an implementation of that in Python. So for now, it's OpenSSL. I'll accept any patches that make us run under gnutls/libgcrypt as well, but I think in the long term we should migrate to libnss entirely. CAVEATS: - If I haven't got a test for it in tests.py, assume it doesn't work. - The code isn't threadsafe. It will become so only if it must. FINDING YOUR WAY AROUND THE CODE. - All the C code is in src/. Right now, this is just a set of thin Python wrappers for the OpenSSL functionality we use. - The Python code lives in lib/mixminion (for client and shared code) and lib/mixminion/server (for server code only). The main loop is in lib/mixminion/server/ServerMain; the asynchronous server code is in lib/mixminion/server/MMTPServer. - A message coming in to a Mixminion server takes the following path: Received in MMTPServer; V Stored in an 'Incoming' queue, implemented in ServerMain and Queue. V Validated, decrypted, padded by code in PacketHandler. V Stored in a 'Mix' pool, implemented in ServerMain and Queue. V A batch of messages is selected by the 'Mix pool' for delivery. V ---Some are placed in an 'Outgoing' queue... I V I ... and delivered to other Mixminion servers via MMTPServer V ---Others are queued and delivered by other exit methods, implemented and selected by ModuleManager. --Nick (for emacs) Local Variables: mode:text indent-tabs-mode:nil fill-column:77 End: