How to mirror this site

I'm putting up this page because I noticed that a few people have been mirroring my site; notably using wget. I have nothing against that (as long as the robots.txt file is read and obeyed — which wget does), but it is probably not very convenient and almost certainly resource-wasteful.

Consequently, I have put up an rsync server to simplify the mirroring. It serves all the files on my web site, and it is accessible at rsync://www.eleves.ens.fr:8873/www/.

Quite probably, your web browser does not understand rsync://-type URLs (I know of none that does, and they are not even an official standard). However, you can download the rsync program from rsync.samba.org, and use it to make sense of the above URL. The advantage of using rsync over wget is that it is smarter, faster and more bandwidth-efficient; furthermore, it will preserve modification times and file access modes verbatim (if asked to).

Once you have rsync installed, you can use it to mirror this site by doing

    rsync -avcz --delete rsync://www.eleves.ens.fr:8873/www/ /some/directory

where /some/directory is the place where you want the mirror made. Note that if you want to mirror only a subset of this site, e.g. the computers directory, you can do that with

    rsync -avcz --delete rsync://www.eleves.ens.fr:8873/www/computers/ /some/directory

For more information, please consult the rsync documentation.
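
Because --delete removes anything in the destination directory that is not present on the server, it may be worth previewing a run before performing it. The following is only a minimal sketch of one way to do that, using rsync's -n (--dry-run) option, which lists what would be transferred or deleted without changing anything. The function name mirror and the RSYNC override variable are assumptions made for this illustration, not part of rsync or of this site.

```shell
#!/bin/sh
# Sketch only: "mirror" and the RSYNC variable are hypothetical names.
# RSYNC_URL is the rsync address of this site.
RSYNC_URL='rsync://www.eleves.ens.fr:8873/www/'

# mirror MODE DEST
#   MODE "preview": add -n (--dry-run) so rsync only lists what it would do
#   MODE "run":     really transfer files and delete stale local ones
#   DEST:           local directory that holds the mirror
mirror() {
    mode=$1
    dest=$2
    case "$mode" in
        preview) opts='-avczn' ;;
        run)     opts='-avcz' ;;
        *)       echo "usage: mirror preview|run DEST" >&2
                 return 1 ;;
    esac
    # RSYNC may name another program (useful for testing);
    # it defaults to plain rsync.
    "${RSYNC:-rsync}" "$opts" --delete "$RSYNC_URL" "$dest"
}
```

With this function loaded into the shell, one would first call mirror preview /some/directory, read the list of files to be copied or deleted, and only then call mirror run /some/directory.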

Please note, once more, that I have nothing against the use of wget: I merely provide the rsync method as a convenience. However, please do not use a program of your own invention to perform a recursive web-suck. This is very dangerous if you have not properly studied all the relevant documentation or if you do not follow the standards to the letter, and it can cause all sorts of problems, like log flooding on the server side, denial-of-service attacks and so on.

If you have any questions, please do not hesitate to contact me. Likewise, do not hesitate to contact me if you wish to learn how to set up an rsync server of your own.

Lastly, if you plan to mirror this site, especially if you wish to make the mirror publicly accessible, please tell me about it. It's just that I like to know about such things.

Thank you for your attention.


David Madore
Last modified: $Date: 2002/06/17 22:41:22 $