[rescue] weird Opteron 865 / Tyan 4882 problem

Patrick Giagnocavo patrick at zill.net
Tue Jul 14 20:35:29 CDT 2009


Hoping someone can give me some advice on this system:

4x Opteron 865 (dual core CPUs)

As you know, each Opteron has its "own" RAM using its on-board memory
controller.  Each CPU has 4 slots.

All CPUs are fully populated with 4x1GB ECC DIMMs, except for CPU1.

Conditions under which the system will hang:
CPU1 with no RAM
CPU1 with 4x DIMMs

Conditions under which the system will boot fine:
CPU1 with 2x DIMMs

BUT the system does not see CPU1's RAM at all!  That is, I see 12GB
RAM at bootup and when the system is running, not 14GB as it should be.
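
To spell out the arithmetic behind the 12GB vs. 14GB figures, here is a
minimal sketch.  The DIMM counts and the 1GB DIMM size are taken from
the description above; the socket labels CPU0, CPU2 and CPU3 are only
assumed names for the three fully populated CPUs, and the helper
functions are purely illustrative.

```python
# DIMMs installed per CPU socket in the configuration that boots.
# Only "CPU1" is named in the post; the other labels are assumed.
DIMM_SIZE_GB = 1
dimms_per_cpu = {"CPU0": 4, "CPU1": 2, "CPU2": 4, "CPU3": 4}


def expected_total_gb(dimms, dimm_size_gb=DIMM_SIZE_GB):
    """Total RAM the system should report if every DIMM is detected."""
    return sum(count * dimm_size_gb for count in dimms.values())


def missing_gb(dimms, reported_gb, dimm_size_gb=DIMM_SIZE_GB):
    """Difference between the installed RAM and what the system reports."""
    return expected_total_gb(dimms, dimm_size_gb) - reported_gb


expected = expected_total_gb(dimms_per_cpu)
shortfall = missing_gb(dimms_per_cpu, reported_gb=12)
print(f"expected {expected} GB, reported 12 GB, missing {shortfall} GB")
```

The shortfall equals exactly the two 1GB DIMMs on CPU1, which is why the
missing memory points at that socket rather than at any of the others.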

My thinking is that either:

1.  CPU1 is partially busted (specifically its memory controller) and
should be replaced

OR

2.  There is a problem with the DIMM slots themselves, like maybe a
resistor or some other electrical channel problem.

My main concern is reliability - I want to run ESXi v4 on this beast and
put it in colocation.  I can get by with 12GB provided the system
remains stable, and performance is still pretty decent even with UMA
instead of NUMA.

Any ideas?  Suggestions for further testing?

Cordially

Patrick


