David Madore's WebLog: More Linux woes


Entry #0654


More Linux woes

I don't know how (or even exactly when) it started, but my computer has this strange problem which causes it to unpredictably replace certain parts of random files with null bytes (always a multiple of 256 bytes, it seems, and aligned at such multiples; typically around 2 kilobytes in a given file). As one can expect, this causes all sorts of horrendous difficulties, and it tends to be pretty damn hard to find out where the problem lies (even knowing that this behavior occurs, finding exactly which file has been altered to cause a given malfunction is not an easy task).
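To make the search for altered files a little less painful, here is a minimal sketch of a scanner that looks for the pattern described above: runs of null bytes whose start and length are both multiples of 256. The names `find_null_runs` and `scan_file` are made up for illustration, and the approach only flags candidates, since plenty of healthy files (binaries, databases, sparse files) legitimately contain aligned zero blocks.

```python
BLOCK = 256  # granularity and alignment of the zeroed regions, as observed


def find_null_runs(data: bytes, block: int = BLOCK, min_blocks: int = 1):
    """Return (offset, length) pairs for maximal runs of all-zero aligned blocks.

    Only complete blocks are examined; a trailing partial block is ignored.
    """
    zero = bytes(block)
    n_full = len(data) // block
    runs = []
    start = None  # index of the first block in the current zero run, if any
    for i in range(n_full):
        if data[i * block:(i + 1) * block] == zero:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_blocks:
                runs.append((start * block, (i - start) * block))
            start = None
    # Close a run that extends to the last complete block.
    if start is not None and n_full - start >= min_blocks:
        runs.append((start * block, (n_full - start) * block))
    return runs


def scan_file(path, block: int = BLOCK, min_blocks: int = 1):
    """Read a whole file and report its aligned null runs (hypothetical helper)."""
    with open(path, "rb") as f:
        return find_null_runs(f.read(), block, min_blocks)
```

Calling `scan_file` with `min_blocks=8` would restrict the report to runs of at least 2 kilobytes, which matches the typical damage size and cuts down on false positives.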

So far none of my personal data has been affected, it seems: only various system files, which I have been able to recover. But the nagging doubt is always present: what if one of my important files gets corrupted and I don't notice it, make backups of it in various places, and end up really screwing everything up? I'd like to have my peace of mind back.

The trouble is that I have no idea what causes the problem. It's probably not a hardware flaw: I have good reasons to believe that memory, CPU and hard drives are sane. I suspect a bug in the Linux kernel, in the ReiserFS layer, perhaps occurring only on SMP boxen, and perhaps starting only with version 2.6.5 or 2.6.6. But the bug has proven remarkably elusive: I tried all sorts of intensive stress-testing on the filesystem (creating a small number of large files, a large number of small files, simultaneously writing and reading, and all sorts of variants, with RC4 streams), and found no way to reproduce the corruption in vitro, if I may say. So I can't write any kind of bug report that would be of any use, and I don't know which Linux version I should downgrade to (or even whether the problem is really in the kernel and not, for example, in some obscure part of the C library).
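For what it's worth, the kind of write-then-verify test described above can be sketched as follows. This is not the actual test harness: it swaps RC4 for a SHA-256 counter-mode stream from the standard library (any reproducible pseudo-random stream serves the same purpose), and the function names are invented for the example. One caveat is that a read immediately following the write is likely to be served from the page cache, so a test of this shape may well miss corruption that only shows up on disk.

```python
import hashlib
import os


def keystream(seed: bytes, length: int) -> bytes:
    """Deterministic pseudo-random bytes: SHA-256 over seed plus a counter.

    Stand-in for the RC4 stream mentioned in the text.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def write_stream(path, seed: bytes, length: int) -> None:
    """Write the stream for `seed` to `path` and ask the OS to flush it to disk."""
    with open(path, "wb") as f:
        f.write(keystream(seed, length))
        f.flush()
        os.fsync(f.fileno())


def verify_stream(path, seed: bytes, length: int, block: int = 256):
    """Return the offsets of every `block`-sized chunk that differs from the stream."""
    expected = keystream(seed, length)
    with open(path, "rb") as f:
        actual = f.read()
    bad = []
    # Cover the longer of the two so truncation or growth is reported too.
    for off in range(0, max(len(expected), len(actual)), block):
        if expected[off:off + block] != actual[off:off + block]:
            bad.append(off)
    return bad
```

Because the stream is regenerated from the seed, nothing but the seed and length needs to be remembered between the write and the check, so files can be verified long after they were written, possibly after a reboot.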

I'm rather annoyed at this, but I really don't know what to do. If I had just a little more knowledge about the problem I could post on the linux-kernel mailing list, but as things are, this would be pretty useless.

