David Madore's WebLog: ECC errors → mystery solved; watchdog timer

Entry #1490

(Tuesday)

ECC errors → mystery solved; watchdog timer

And now for something completely different (or rather, equally geeky): more about ECC RAM.

First, I'm told that what I called memory sticks should properly be referred to as memory modules (memory sticks are, indeed, something rather different). Well, screw the Linux source for confusing me: I should have trusted my feelings in this respect. But no matter.

The mystery of my phony ECC errors has been solved and, of course, the answer is ridiculously simple: my BIOS was in a quick boot mode in which it does not initialize RAM at startup, so the ECC check bits were bogus and caused numerous errors for as long as the regions in question had not been written to. (Part of the mystery remains, however: if these regions of RAM had never been written to, why were they being read at all? They couldn't possibly contain anything useful… I suspect one of two things: either entire pages were being read, because that's simpler, or else a memory bus read operation always fetches a certain quantity (perhaps 128 bytes) which is larger than what a write operation covers, so bogus data was being read at the edge of previously written valid data.)

The silver lining of this is that now I know the ECC reporting mechanism works, and I was able to use this to test a little ECC-checking Perl script I wrote for my chipset under Linux (you can find it here). So if an error is detected or corrected on any of the three computers I administer that have ECC RAM, I will get an email telling me about it. I feel my files are much safer now. ☺️
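For the curious, here is a rough idea of what such a check can look like. This is not my Perl script (which talks to my particular chipset): it is a minimal Python sketch that assumes the kernel's generic EDAC driver is loaded and exposes the `ce_count` and `ue_count` counters under `/sys/devices/system/edac/mc`. The function names are made up for the illustration.

```python
from pathlib import Path


def read_ecc_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} read from EDAC sysfs.

    Assumes the Linux EDAC layout (mc0, mc1, ... each holding ce_count and
    ue_count); controllers whose counters cannot be read are skipped.
    """
    counts = {}
    for mc in sorted(Path(edac_root).glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue
        counts[mc.name] = (corrected, uncorrected)
    return counts


def format_report(counts):
    """Return one line per controller that has seen any error, else ''."""
    lines = [
        f"{name}: {ce} corrected, {ue} uncorrected"
        for name, (ce, ue) in sorted(counts.items())
        if ce or ue
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    report = format_report(read_ecc_counts())
    if report:
        # Printing only when something is wrong means that, run from cron,
        # the job produces mail only when there are errors to report.
        print(report)
```

Run from cron, this stays silent as long as the counters are zero, and cron mails whatever it prints. Note that the counters are cumulative since boot, so a real script would want to remember the last values it saw rather than complain again at every run.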

Also along the lines of improving computer reliability, I discovered (almost by accident) that the same three computers, like most machines with recent Intel chipsets, have a hardware watchdog feature, which I activated. This means that if the computer hangs, the hardware will detect it (once the watchdog is started, it needs to be pinged at regular intervals, and a missed ping is taken as a sign of a hang) and cause a reboot. Hopefully this means I will no longer (or not so often) need to email my mother and ask her to reboot the computer in my room in Orsay. 😉
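The pinging itself is very simple. Below is a minimal Python sketch of it, assuming the standard Linux watchdog interface: opening `/dev/watchdog` starts the timer, every write to it counts as a ping, and writing the character `V` just before closing disarms it again (on drivers that support this "magic close" and were not loaded with the nowayout option). The function name and its parameters are hypothetical, and in practice one would rather run an existing watchdog daemon than roll one's own.

```python
import time


def pet_watchdog(device="/dev/watchdog", interval=10.0, pings=None,
                 sleep=time.sleep):
    """Ping the watchdog every `interval` seconds; return the ping count.

    With pings=None the loop runs forever (the normal daemon case). With a
    finite count, the watchdog is disarmed with the magic 'V' write before
    the device is closed, assuming the driver honours it.
    """
    sent = 0
    with open(device, "wb", buffering=0) as wd:
        while pings is None or sent < pings:
            wd.write(b"\0")  # any write resets the hardware countdown
            sent += 1
            sleep(interval)
        wd.write(b"V")  # magic close: ask the driver to stop the timer
    return sent


if __name__ == "__main__":
    # The interval must be shorter than the hardware timeout, or the
    # machine reboots even though nothing is wrong.
    pet_watchdog()
```

If the process doing the pinging dies or the machine freezes, the writes stop, the countdown runs out, and the hardware resets the machine, which is the whole point.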
