A method of free speech on the Internet: random pads

A plea

Please help me make this page easier to understand. I know that I am not expressing myself adequately. I appreciate any suggestions or comments that would improve this document.

I also would like your help turning this idea into a standard. Link this page from your own web pages, or copy it, and spread the idea of free communication through random pads around you. If you are adventurous, make your own pads, or mirror other people's pads.

Copyright © 2000 by David A. Madore (david.madore@ens.fr).

This document can be distributed under the terms of the GNU General Public License.

The original version of this document is http://www.eleves.ens.fr:8080/home/madore/misc/freespeech.html.

What is this all about?

The Internet is a wonderful place for free speech. However, this free speech is sometimes actively suppressed by various factors. For example, some people want to prevent various kinds of information from spreading on the Internet. In some cases this is quite moral. I suppose it is understandable to try to forbid terrorists from using the Internet as a medium for organizing their unethical actions. In most cases, however, it is an attempt at censorship.

Now it is a fact that all such attempts must fail. We can rejoice or be chagrined, but there is no sense in trying to forbid information from being spread, because the very nature of information makes it completely intractable and unlocalized. The suggestion outlined in this page makes this blatant. It makes it possible to spread any kind of information in any way and in a completely undetectable way.

Let me make this clear: I am not completely persuaded that this possibility of free speech is entirely a good thing. If abused, it may even be a dangerous and terrible thing. But it exists, and it makes no sense to suppress it. It can be used for good, and it can be used for evil. Please stick to the former. Thank you.

Disclaimer: In case I have not been clear enough: I am opposed to any criminal actions. I do not wish the technique described here to be used to such ends. I firmly condemn them.

The principle

(Read this carefully. It may not seem to make sense at first if you are not used to this kind of method. But it is, in fact, all very trivial.)

The principle is very simple. People distribute samples of random data. We call these samples pads. A pad is a file containing random bits, completely indistinguishable from white noise, and of a fixed length. I propose a standard length of 128kB (131072 bytes) for each pad.

Each pad should be given a name such as pad-md5-d41d8cd98f00b204e9800998ecf8427e.dat, where the 32 hexadecimal digits following the prefix pad-md5- are the MD5 fingerprint of the pad's data. This is so the pads can be recognized. This naming convention guarantees that in practice we don't risk a collision. It is essential that there be nothing in a pad to tell when it was created (no date information). (A former version of this document suggested using the first 8 bytes of the pad's actual data. Now this is deprecated.)

If you need a tool to compute an MD5 fingerprint, you can find one on my FTP site (but note that many systems come with such an md5sum tool as standard).
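The naming rule is easy to express in code. Here is a minimal Python sketch, not one of the original tools; the function name pad_name is made up for this illustration, and it assumes nothing beyond the pad-md5- prefix and .dat suffix described above.

```python
import hashlib


def pad_name(data: bytes) -> str:
    """Return the file name that the naming convention assigns to these bytes."""
    return "pad-md5-" + hashlib.md5(data).hexdigest() + ".dat"


print(pad_name(b""))  # pad-md5-d41d8cd98f00b204e9800998ecf8427e.dat
```

The fingerprint in the example name given above happens to be the MD5 of the empty string, so that name only illustrates the format; it is not the name of an actual 128kB pad.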

Pads should be mirrored as much as possible around the Internet. However, no single site should ever mirror all the pads, nor too large a fraction of them.

Each pad by itself is completely without value. It is a mere hunk of random data. However, if you combine several pads together by XORing them, you can recover some data that are hidden in the pads. The information is in no single pad, but it is somehow delocalized in all the pads together.

The point is that if a suppressed piece of information can only be recovered by combining n pads (and no fewer), then no single person is distributing anything of value, since all each one is doing is giving out a sample of random data. Indeed, most of the people distributing these pads might not even be aware that their pad can be used to produce the information in question.

Again, each pad is truly mathematically indistinguishable from random noise, and it is completely impossible, in the absence of date information, to know who put the data in a set of pads.
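The principle can be tried out on a toy scale. The following Python sketch is only an illustration under simplifying assumptions: the pads are 12 bytes long rather than 128 kilobytes, they live in memory rather than on different sites, and the helper name xor_all is invented here.

```python
import os
from functools import reduce


def xor_all(blocks):
    """XOR together byte strings of equal length, position by position."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)


message = b"free speech!"                            # 12 bytes of toy data
pads = [os.urandom(len(message)) for _ in range(4)]  # four pads made by other people
new_pad = xor_all(pads + [message])                  # a fifth pad that hides the message

recovered = xor_all(pads + [new_pad])                # all five pads together
partial = xor_all(pads[1:] + [new_pad])              # the same set with one pad missing

print(recovered)            # the original message
print(partial == message)   # False, except with negligible probability
```

With one pad missing, the result is the message XORed with that missing pad, which is just as random as the pad itself.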

How does it work?

To produce a pad, just produce 128 kilobytes of random data and store them in a file that is named according to its contents. Under the GNU/Linux operating system, this is done as follows:

dd if=/dev/urandom of=pad.dat bs=1k count=128
mv pad.dat pad-md5-`md5sum pad.dat | cut -c 1-32`.dat

If you wish to store some information in a set of pads, you must choose a certain number of pads that other people have produced, and which are stored on different sites (please do not choose several pads from a single site). Choose about five such pads (three is an absolute minimum, unless the data you are storing is really innocuous, and seven is a maximum, beyond which retrieving the data will be too much of a pain). Then take the data you wish to store (it must be at most 128 kilobytes), and XOR it with all the pads you have selected (using the Perl script given below). This will give you a new pad: it is also made of completely random data, but XORing it together with the pads you have selected will give back the hidden data, padded (pun unintended) with zeroes.
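The hiding step can be sketched in Python as follows. This is an assumption-laden illustration rather than the Perl script referred to above: the names hide and xor_bytes are hypothetical, the pads are assumed to be already downloaded as local files, and nothing here enforces the advice about how many pads to choose or where they come from.

```python
import hashlib
import os

PAD_SIZE = 128 * 1024  # 131072 bytes, the standard pad length


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings using big-integer arithmetic."""
    return (int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).to_bytes(len(a), "big")


def hide(data: bytes, pad_paths, out_dir="."):
    """XOR data with the given pads and write the result as a new pad.

    Returns the path of the new pad, named after its MD5 fingerprint.
    """
    if len(data) > PAD_SIZE:
        raise ValueError("data must be at most 128 kilobytes")
    result = data.ljust(PAD_SIZE, b"\0")  # pad the data with zeroes
    for path in pad_paths:
        with open(path, "rb") as f:
            pad = f.read()
        if len(pad) != PAD_SIZE:
            raise ValueError(f"{path} is not a standard-length pad")
        result = xor_bytes(result, pad)
    name = "pad-md5-" + hashlib.md5(result).hexdigest() + ".dat"
    out_path = os.path.join(out_dir, name)
    with open(out_path, "wb") as f:
        f.write(result)
    return out_path
```

A call such as hide(open("message.txt", "rb").read(), chosen_pads), where message.txt and chosen_pads are placeholders, writes the new pad into the current directory. The file's timestamp still reveals when it was created, so that has to be dealt with separately.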

You must name this new pad with the same convention used to name randomly generated pads (see above). You must make sure it is completely indistinguishable from such a pad. If it can be proved that your pad was generated after the other ones, you lose. It is up to you to find ways to arrange for this to be practically impossible. Some suggestions follow:

At some point, you or someone else must release the information that such a set of pads, when XORed together, produce such hidden data. It is best if the person disclosing this information is not distributing any pads. The pads, of course, are just named by their 32-hex-digit MD5.

Now suppose you want to recover some data. Your first task is to locate an announcement stating that the data you want are recoverable by XORing such and such a set of pads. Then you must locate the pads in question by their MD5 fingerprints. They will be sitting on different pad repositories, otherwise the security of the data would be questionable. Perhaps someone could implement a pad search engine or maintain a list of pad repositories. Anyway, once you have all the pads, you simply XOR them together using the Perl script given below, and you get the data.
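Recovery can be sketched in the same spirit. The Python function below is a hypothetical helper, not the Perl script itself: it assumes the pads have been saved locally under their pad-md5-….dat names, checks each one against the fingerprint in its name before using it, and strips the trailing zero padding.

```python
import hashlib
import os

PAD_SIZE = 128 * 1024  # 131072 bytes, the standard pad length


def recover(pad_paths) -> bytes:
    """XOR a set of pads together and return the hidden data.

    Each pad is first checked against the MD5 fingerprint in its file name,
    so that a corrupted or substituted pad is detected.
    """
    total = 0
    for path in pad_paths:
        with open(path, "rb") as f:
            pad = f.read()
        expected = "pad-md5-" + hashlib.md5(pad).hexdigest() + ".dat"
        if len(pad) != PAD_SIZE or os.path.basename(path) != expected:
            raise ValueError(f"{path} does not match its name or length")
        total ^= int.from_bytes(pad, "big")
    data = total.to_bytes(PAD_SIZE, "big")
    return data.rstrip(b"\0")  # drop the zero padding
```

Stripping zeroes is a convenience for text. If the hidden data may itself end in zero bytes, keep the full 128 kilobytes instead, as the Perl script does.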

Provably innocent pads

A provably innocent pad is one whose contents are produced not truly randomly but by some method producing seemingly random data, but which in fact can be easily described. Here are some examples:

Producing a provably innocent pad is less risky than producing a truly random one, because you always have the option of proving your innocence by showing how the pad was generated (which you do not have in the case of a truly random pad, since in the latter case you cannot prove that your pad was truly random).
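To make the idea concrete, here is one possible construction in Python. It is my own assumption for the sake of illustration, not a method listed on this page: the pad is obtained by hashing a publicly describable seed together with a counter, using SHA-256. The function name innocent_pad and the choice of hash are hypothetical; encrypting a well-known text under a known key would serve the same purpose.

```python
import hashlib

PAD_SIZE = 128 * 1024  # 131072 bytes, the standard pad length


def innocent_pad(seed: bytes) -> bytes:
    """Expand a public seed into PAD_SIZE pseudo-random bytes.

    Block i is SHA-256(seed || i), with i written as an 8-byte big-endian counter.
    """
    blocks = []
    for counter in range(PAD_SIZE // hashlib.sha256().digest_size):
        blocks.append(hashlib.sha256(seed + counter.to_bytes(8, "big")).digest())
    return b"".join(blocks)
```

To prove innocence, the owner reveals the seed; anyone can then rerun the function and compare the MD5 fingerprint of the output with the pad's name.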

Extra notes

I am not a lawyer. The law is so fundamentally perverse that it might end up deciding that it is illegal to distribute a 128-kilobyte-long block of random data.

If you distribute illegal information, regardless of the method used to hide the information, you may be caught. There is a fundamental difference between wanting freedom of speech and wanting to break the law.

That being said, consider the following. Suppose something illegal has been distributed using a set of pads. That is, the XOR of a certain number of pads gives a file whose distribution is considered illegal.

Then what? Can anyone be convicted? The people distributing the pads might not know about the data in the first place. In fact, most of them have merely uploaded a block of random data with a funny name: can this truly be considered a crime? (Remember also that some of the pads might be provably innocent, although they are not immediately revealed as such. Certainly it is not a crime to distribute an encrypted version of Homer's Odyssey.)

Nor can it be claimed that this whole pad system is devised for illegal purposes because this is not the case. The point of this system is to promote free speech on the Internet, nothing else.

Nor can anyone order that every pad making up the set be removed or destroyed, because any one of them might be used in some other XOR operation to produce a completely innocuous piece of text. So freedom of speech stands in the way of such an injunction.

A reply on Slashdot

This site has been mentioned in an article on Slashdot (2000/06/18). Many comments have been posted and many people have sent me emails with various suggestions. As I can't answer them individually, I am publishing the following response, which I also posted on Slashdot.

Please note that this was written in a hurry, so it is probably even more lousy than the rest of this page.

Hi. I'm the author of the page in question, and an unwitting victim of the Slashdot effect (well, not truly unwitting: Erik Moeller, who posted the story, was kind enough to notify me in time). I received many emails about it, all of which I've read, as well as a good many posts in the current discussion. I can't possibly reply to them all, but I'll try to answer some of the most frequent or important comments here.

First note that the page was written in February (2000/02/19 to 2000/02/23 to be precise), so it is not new. However, I do not claim any kind of originality, nor paternity of the idea: it is a small variation on the protocol described in section 6.3 ("Anonymous Message Broadcast") of Bruce Schneier's book on cryptography. In any case, I think it is pretty obvious in the first place. I am merely suggesting a few practical ideas to make it workable. There is nothing great or revolutionary about anything, and I never made that claim.

One thing should be made clear from the start: the whole idea is not about obscuring what the data is (i.e. it is not strictly speaking cryptography) but about who is sending the data. And, even more specifically, it is about making legal conviction impossible so long as the presumption of innocence is maintained (whether the presumption of innocence still means anything in these dark days is another question :-/ ); thus, it is normal that the story appeared on Slashdot's "Your Rights Online" section.

Please also note that I am not making a political statement. This is not a libertarian manifesto. I am not stating that you should use this system to send out assassination messages against the President / the Prime Minister / the King / the Pope / <insert your favorite assassination victim here>; I am merely stating that you can, and that this is none of my business.

Many have pointed out that my suggested way of naming pads is bad. That's true: using the MD5 (or SHA1 or any other kind of hash) signature would be a better idea. But it doesn't really matter all that much what the pads are named unless we want the system to be resistant to malicious tampering, which was not one of my avowed goals. Indeed, we can get this almost for free, so we might as well. Let's say we could have a symlink pointing from pad_md5_whatever.dat to the pad of the given md5 for each pad in each repository, and "combination recipes" could be given with these links so as to make them resistant to tampering.

Similarly for secret sharing: my idea was not to have a system which is hard to censor (there are other, far better, solutions for this), but to have one which is hard to track.

Another thing I should make quite clear is that the system in itself is not used to hide data: it is used to hide the origin of data. This is why all comments on the "OTP is secure as long as the pad is truly one-time" line, or all remarks to the effect that it is trivial to find all relevant data among the padset, are quite true but completely irrelevant. If you want to hide the data on top of hiding the origin, then you use a traditional cipher; for example, you encrypt your data using Blowfish and you use that data (the ciphertext, which for all intents and purposes is random) as input to the pad system. So long as you don't release the key, nobody can tell that there is Blowfish-encrypted data hidden in the pad system. The two are completely orthogonal. (It is true that my remark about the difficulty of finding "recognizable data" in the pad system is very misleading and irrelevant. I should remove that: never mind that part.) As for my comment about the birthday effect, it is merely about accidental collisions, not at all about malicious action.

Somebody asks what is wrong with storing all pads in the same place since anyone can download them all. That is true, but that is beside the point. The point is that as long as a site does not have a complete set of pads yielding readable data, it is not, by itself, breaking any law, and all it is distributing is white noise; whereas if it stores one complete set of pads, then it is distributing the forbidden document in some form. Naturally, if someone wants to collect a complete set of pads, it is a good idea; but to distribute it is dangerous.

Finally, there is the central question of whether the legal argument (which is the crux of the matter) holds water. Presumably it doesn't, but that will at least prove one thing: the argument shows that any kind of law restricting free speech contradicts the presumption of innocence. Some have pointed out that one could monitor the pad system, and the last pad published in a set of pads would always be the culprit: this is not true, because it might have been delayed, or it might be provably innocent (which implies the former, actually), and you can never quite be sure.

Imagine the following scenario: someone points out on some Usenet group that eight publicly available pads, when XORed together, give something like DeCSS code. Judge summons the 'someone' in question, who claims that he just noticed that by randomly XORing pads together; not unconvincing, so judge lets the guy go. Then judge summons the pad owners. Starts with the most recently published pad: but the owner explains "look, my pad is just an encryption using the key 'foobar' of the first 128kB of (some standard transcription of) Shakespeare's Tempest; the idea had been floating around for some time, I just decided to publish it". Judge checks statement: it's true. So apparently the data was "published" earlier than was thought, it just took some time to come out; that makes things rather difficult to track. Second owner similarly points out that his pad is just a sequence of decimals of pi in binary. Third owner is in a country over which judge has no jurisdiction, so nothing to do there. Fourth and fifth owners seem to have created their pads at the very same time, and both state obstinately that they generated pure white noise (following, say, a story on Slashdot about pads being a great idea). Sixth owner says he generated his pad by XORing another dozen other pads with an innocent message (which he shows to judge). Seventh owner refuses to answer judge's question. Eighth owner posted his pad before DeCSS even appeared, so must be innocent (or really?). Now what does judge do? Convict some owners? All? None? Problem is, judge is impressed with first poster's proof, and can't run the risk of convicting someone who might afterward prove that his pad was innocent. Presumption of innocence.
Even if judge merely issues an injunction that the pads be taken off the network, every owner appeals on the ground that the pads were reused in making some other messages (innocuous ones) and that removing them would be a serious breach of first amendment (or whatever you call this thing about free speech).

Anyhow, this is the summary: there's nothing new or revolutionary about the whole pad system; in fact, it's pretty trivial. But it does make one point: that information is fundamentally delocalized and that any attempt to pinpoint it or to find a culprit will fail. For better or for worse.

The Perl script

The following Perl script, which you can also get by FTP, will take the files given on the command line, XOR them together, and send the result of the XOR on the standard output:

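#! /usr/local/bin/perl
# XOR the files given on the command line and print the result.
use IO::File;
# Open every pad named on the command line.
for ( $k=0 ; $k<=$#ARGV ; $k++ ) {
    $F[$k] = new IO::File "<$ARGV[$k]"
	or die "Can't open $ARGV[$k]\n";
}
# Produce exactly 128 kilobytes of output, one byte at a time.
for ( $i=0 ; $i<128*1024 ; $i++ ) {
    $byte = 0;
    for ( $k=0 ; $k<=$#ARGV ; $k++ ) {
	# A file shorter than 128kB is treated as padded with zero bytes.
	$b = "\000" unless ((read ($F[$k], $b, 1)) or 0)==1;
	$byte ^= unpack "C", $b;
    }
    print (pack "C", $byte);
}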

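I am told that this script does not work correctly under Microsoft Windows. I have no further information about this; a likely cause is that the script never calls binmode on its file handles, which Perl needs on Windows in order to read and write binary data unaltered.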

Actually, the following script written by Sven Neuhaus (neuhaus@scruznet.com) is much faster:

#!/usr/local/bin/perl -w
# much faster version of xorpad.pl
# This program is in the public domain.

use strict;
use constant BUFFER_SIZE => 4096;

my %files;

foreach my $file (@ARGV) {
	local *FH;
	open(FH, "<$file") or die "Can't open $file: $!\n";
	$files{$file} = *FH;
}
for (my $i=0; $i < 128*1024/BUFFER_SIZE; $i++) {
	my $outbuf;
	foreach my $file (keys %files) {
		my $buffer;
		my $count = read($files{$file}, $buffer, BUFFER_SIZE);
		die "Error reading from $file: $!\n" unless defined $count;
		$buffer .= "\000" x (BUFFER_SIZE - $count)
			if $count < BUFFER_SIZE; # padding
		$outbuf ^= $buffer;
	}
	print $outbuf;
}

exit 0;
#eof. This file has not been truncate

You can download it by FTP.

Contributed code

Marcel Popescu (marcel@aiurea.com) has written a program in Delphi, running under Microsoft Windows, that will let you generate random pads or XOR some pads together (to hide data or to retrieve hidden data, just like the previous Perl script). You can download this program from this FTP directory. The author has put the program in the Public Domain and claims no copyright over it.

I have not been able to test this program, as I do not have MS Windows. Two days ago (2000/03/05), he gave me a new version (it is the new version which is accessible with the links above), which corrects a weakness in the previous version (the random number algorithm used to generate the pads was not cryptographically secure).

Rob Ostensen (rob@txcyber.com) wrote a simple pad generation script in Perl (it seems that it does not use the new MD5 naming convention, so, uh, perhaps it is best not to use it). See also this page. An anonymous person also sent me a little shell script that does something similar.

xercist (xercist@lammah.com) wrote a C program to XOR pads together.

Christopher T. Johnson (cjohnson@camelot.com) wrote a C program under the GPL to create random pads.

Big Attic House put a program to combine pads together on their Delphi page.

Michael Orlov (orlovm@cs.bgu.ac.il) wrote a servlet that will let you easily combine several pads and download the result. The source code to this servlet is also available (local mirror here).

Sample pads

I have generated three sample pads. They are available (together with the perl script) in this FTP directory. Here are their codes:

The XOR of these three pads will produce a readable text. (Keeping all three on one site is contrary to the principle stated above, that no two pads of a set should be stored on the same site; but since the text is innocuous, this is not a problem.) I encourage you to check this, so as to make sure you have understood the principles. Notice that you cannot tell which two pads were randomly generated and which one was obtained by XORing the two others with the message (I myself no longer know).

Known pad repositories

Here is the list of known pad repositories so far. In order to minimize information on the date of creation, I am listing them in alphabetical order of URL:

If you set up a pad repository of your own, whether to mirror an existing one or to come up with your own set of pads, please tell me about it, so I can mention it here. Maintaining a pad repository is a pretty easy process. The one important point to remember is not to let out any hint as to the date of creation of the pads you store. Remember that even your HTTP server might give away some information about that without your knowing it: so please at least use the Unix touch utility on your pads (e.g. touch -m -t 197004011428.57 pad*.dat) to avoid this.
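For repositories where touch is not available, the same precaution can be sketched in Python. This is an illustration under assumptions: the function name scrub_timestamps is made up, the pads are assumed to match pad*.dat in a single directory, and the fixed date is an arbitrary choice.

```python
import glob
import os

# 1970-04-01 14:28:57 read as UTC, expressed in seconds since the epoch.
# Any fixed value would do, as long as it is unrelated to the real creation date.
FIXED_TIME = 7828137


def scrub_timestamps(directory: str) -> int:
    """Set the access and modification times of every pad file to FIXED_TIME.

    Returns the number of files touched.
    """
    paths = glob.glob(os.path.join(directory, "pad*.dat"))
    for path in paths:
        os.utime(path, (FIXED_TIME, FIXED_TIME))
    return len(paths)
```

This only resets the times that os.utime can set. Server logs, backups and directory listings may still carry dates, so they deserve a look as well.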

Generate a lot of random pads. And please do not start storing data in the pad system until it has reached a decent size.

Of related interest is the Freenet project, which uses a Java client/server program to establish a distributed and decentralized data network on which to store information, so that it would be very hard to censor. It might be possible to combine the Freenet project with the pads system, that is, store some pads on Freenet, to guarantee at once both anonymity of free speech and security against removal attempts.

The Publius project is vastly more ambitious than my simple pad suggestion. It suggests using a true secret-sharing mechanism rather than simple XOR. On the subject of secret sharing, you might want to look at the secret sharing program (instructions for use are included in the source itself) that is mentioned on my programs page.

Thanks

Thanks to Jon Robertson (touri@pobox.com) for having read through this page and made suggestions for improvements. Thanks also to Erik Moeller (moeller@scireview.de) for having written the Slashdot article about this page. Thanks to all those who wrote to me (and sorry if I didn't have time to write back to each individually).


David Madore

Last modified: $Date: 2002/09/02 21:33:14 $