[rescue] mysterious hard hangs, two different sun4m's
James Lockwood
james at foonly.com
Tue Sep 3 02:06:43 CDT 2002
On Mon, 2 Sep 2002, Skeezics Boondoggle wrote:
> anyway, i'm at a loss. i suppose i should try enabling the deadman code
> in the kernel, see if i can get ANY kind of debugging info out of it...
> i'm not sure what advice y'all might have that i haven't already thought
> of or tried; in 12 years banging on sun hardware these kinds of hard hangs
> are so rare that i'm just *mystified* that i've now moved the problem from
> one machine to another by just swapping their places on the desk. i think
> it's gremlins. a cia plot. sunspots. or i'm just going mad.
How hard of a lock is it? Does the system still respond to L1-A?
If so, boot with kadb and try to reproduce it. Once it hangs, break into
the debugger and get a backtrace. First-order analysis: suspect hardware
if the hang point wanders dramatically, suspect software if it stays in a
relatively small number of places. Using the SX stresses some weird parts
of the memory controller.
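
The first-order rule above can be sketched in a few lines of Python. This is
only an illustration of the reasoning, not something kadb provides: the
function name, both thresholds, and the idea of recording the top frame of
each backtrace by hand are assumptions.

```python
from collections import Counter


def classify_hangs(hang_sites, min_samples=5, max_software_sites=3):
    """Apply the 'wandering vs. clustered' rule to recorded hang sites.

    hang_sites: one entry per hang, e.g. the top symbol of the backtrace,
                copied by hand from the debugger.
    min_samples and max_software_sites are assumed thresholds; tune to taste.
    """
    counts = Counter(hang_sites)
    if len(hang_sites) < min_samples:
        # Too few hangs captured to say anything either way.
        verdict = "inconclusive"
    elif len(counts) <= max_software_sites:
        # The hang keeps landing in a small number of places.
        verdict = "software"
    else:
        # The hang point wanders across many unrelated places.
        verdict = "hardware"
    return verdict, counts.most_common()
```

Feeding it six hangs that all landed in the same routine gives "software";
six hangs in six unrelated routines gives "hardware". It is a first cut
only, and a wandering hang point says nothing about which part is bad.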
Is the watchdog reset enabled? If it is a "hard" hang, does it respond to
a keyboard replug event (which enters the kernel at a higher interrupt
priority than L1-A)?
Try pulling one CPU and see what happens. If you still get the hangs,
swap it for the other. Drop down to a single DIMM. You know, standard
problem isolation techniques.
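
For bookkeeping, the isolation runs can be laid out and scored as below. This
is a rough sketch under assumptions: the part labels ("cpu0", "dimm1", ...)
are hypothetical, each run uses exactly one CPU and one DIMM, and the full
cross product is more runs than you would usually need.

```python
from itertools import product


def isolation_plan(cpus, dimms):
    """List every single-CPU, single-DIMM configuration, one per test run."""
    return list(product(cpus, dimms))


def suspects(results):
    """Return the parts whose presence matches the hangs exactly.

    results maps (cpu, dimm) -> True if that configuration hung.
    A part is a suspect only if every run that included it hung and no run
    without it hung. An empty list means the hang did not follow any single
    part, so the cause is likely elsewhere.
    """
    parts = {part for combo in results for part in combo}
    found = []
    for part in sorted(parts):
        with_part = [hung for combo, hung in results.items() if part in combo]
        without = [hung for combo, hung in results.items() if part not in combo]
        if all(with_part) and not any(without):
            found.append(part)
    return found
```

If the hangs show up in every configuration, suspects() returns an empty
list, which is the "you still get the hangs" case: neither the CPUs nor the
DIMMs you swapped are the common factor.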
Unless your desk has a Tesla coil directly underneath it, I wouldn't worry
about the problem migrating with position. :)
-James