[geeks] memtest86 question (are correctable ECC errors, errors?)
Jonathan C. Patschke
jp at celestrion.net
Wed Sep 26 01:05:17 CDT 2007
On Wed, 26 Sep 2007, Patrick Giagnocavo wrote:
> I have been running memtest86 against a 16GB RAM , dual Opteron 248
> CPU system for almost 5 hours.
>
> During that time, there have been no errors, but, there have been 215
> ECC errors, all of which were corrected.
That's an -awful- lot for only 5 hours of testing time. You should see
approximately zero. $ork has a couple of 64GB systems, and they take a
large chunk of a day to do a full run of memtest86. The goal of burn-in
for those systems is zero errors over the course of a week.
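To put the reported count in perspective, here is a rough back-of-the-envelope sketch using only the figures quoted above. It assumes the error rate stays constant over time, which a failing or badly seated module is not guaranteed to do, and it rounds "almost 5 hours" up to 5.

```python
# Figures quoted in the thread.
errors = 215        # corrected ECC errors reported by memtest86
hours = 5           # "almost 5 hours", rounded up (assumption)
total_gb = 16       # installed RAM

# Assumption: the error rate is constant over time.
rate_per_hour = errors / hours               # corrected errors per hour
rate_per_gb_hour = rate_per_hour / total_gb  # per GB of RAM per hour
projected_week = rate_per_hour * 7 * 24      # same rate sustained for a week

print(f"{rate_per_hour:.1f} corrected errors/hour")
print(f"{rate_per_gb_hour:.4f} corrected errors/GB/hour")
print(f"{projected_week:.0f} corrected errors projected over a week")
```

Against a burn-in target of zero errors in a week, a projection in the thousands is far outside the acceptable range.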
> Can someone who knows more than me about memory architecture explain
> whether that means the RAM is bad or is it OK?
Think of it like RAID. You're having to rely on your parity data a lot
more than you'd really like.

If it were non-ECC memory, you'd have an extremely unreliable system.
Since you have ECC, you have a system that is hopefully reliable, but
whose memory is failing. It just hasn't failed enough to eat through
the redundancy yet.
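As an illustration of what "corrected" means and where the redundancy runs out, here is a toy single-error-correcting, double-error-detecting (SECDED) Hamming code over 4 data bits. This is only a sketch: ECC DIMMs protect 64 data bits with 8 check bits, and the function names `encode` and `decode` are made up for this example rather than taken from any memory controller or tool.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1, 2, 4 hold the
    Hamming check bits; indices 3, 5, 6, 7 hold the data bits.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        # Each check bit makes the parity of its covered positions even.
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (nibble, status): status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    overall_odd = sum(code) % 2 == 1

    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # One bit flipped; the syndrome is its index (0 = overall parity bit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"

    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


word = encode(0b1011)
word[5] ^= 1                      # one flipped bit: silently repaired
one_flip = decode(word)
word[6] ^= 1                      # a second flipped bit: detected, not repaired
two_flips = decode(word)
print(one_flip, two_flips)
```

Every corrected error is a word that was one more flipped bit away from the uncorrectable case, which is the sense in which the memory has not yet eaten through its redundancy.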
Try re-seating the memory first. Macs Pro, for instance, tend to throw
very large numbers of parity errors if the memory modules aren't seated
absolutely perfectly (I've seen upwards of 4000/hr), but will run
without other ill effects.
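If the machine ends up running Linux after the re-seat, one way to check whether the corrected-error count has stopped climbing is to read the kernel's EDAC counters. This is a sketch that assumes an EDAC driver for the memory controller is loaded and exposes `ce_count` and `ue_count` under `/sys/devices/system/edac/mc/`; the function name `read_ecc_counts` is made up for this example, and on a system without EDAC it simply returns an empty result.

```python
from pathlib import Path

# Standard EDAC sysfs location; only present when an EDAC driver is loaded.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ecc_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs files."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())
            ue = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts
```

Calling `read_ecc_counts()` before and after a long workload and comparing the corrected counts gives a rough picture of whether re-seating helped.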
If anything, this illustrates why ECC should be a mandatory requirement
for anything other than a gaming/media-playback system.
--
Jonathan Patschke ) "So far, 99% of illegal activity has been caused
Elgin, TX ( by criminals."
USA ) --David Willis