[rescue] replacing an Ultra2

Jonathan C. Patschke jp at celestrion.net
Wed Apr 18 17:22:47 CDT 2007


On Wed, 18 Apr 2007, William Enestvedt wrote:

>> Ask Jonathan Patschke about his field service tech horror stories.
>>
>   Oh, everyone has stories -- but pictures, oy, that's another thing
> entirely! Jonathan, anything to share with the rest of the class?  :7)

This isn't so much a horror story as it is a serial of incompetence, due
to the scope of hardware involved.  If he'd touched one of the IBM
minis over at my previous job (however, we had IBM service there, which
was stellar), it'd be a horror story.

Before I came to work at my current day-job, there was no one here with
any real Sun experience, despite having a couple hundred of them in
service.  They had a service contract with a very large IT
outsourcer[0], as a result.

So, when one of our Blade 1000 systems started giving e-cache errors, I
called in a ticket on the system.  I pulled the system from its home,
set it aside in our vendor workroom, and left him to his business.

"Are you sure this system was working otherwise?" he asks, after having
fit a new CPU.  It was coming up with all sorts of foul errors.  Ah!
He'd swapped in a US-IIIcu CPU to replace a US-III.  That requires a
firmware update.  So I downed another system, swapped CPUs, updated the
firmware, and swapped back.  The system still didn't come up.  We
determined that the replacement CPU must be defective, so he set about
removing it to take back.

He started by completely unscrewing one side of the CPU carrier, -lifting up
on it-, and then remembering he had to unscrew the other side.  You
Don't Do That with US-III edge connectors.  I asked him if that's how he
installed it, so he proceeded to show me that he did exactly that,
while using a cheap ratchet instead of the Sun-supplied torque tool.

So, I lectured him about that, showed him the proper way to do it (even
showed him chapter-and-verse in the Blade 1000 service guide and
pointed out the nifty (if cheesy) torque-wrench Sun supplied with every
system) and sent him back to his employer to obtain another CPU.

He returned about a week later with a new CPU, which he hurriedly
installed, after which the system didn't come up at all.  So, he tried
the other slot; still didn't come up.  I turned around in time to see
him trying the original slot by forcing the module in using a pair of
pliers.

   "GET THE HELL OUT OF MY WORKROOM."

http://jonathan.celestrion.net/photos/ngc/

So, I called the asshat's manager, and they shipped us a new system
board.  Same tech.  I figured he -had- to have learned his lesson by
now.  No, he chewed that one up the same way.

The phone call that ensued was one that I wish I'd recorded.
Suffice it to say that $outsourcer would have a replacement system in
the datacenter the following day or we would drop the contract and
deduct the replacement cost of the system from any balances outstanding.

The following day, another Blade 1000 showed up.  A nonworking Blade
1000, so we sent it and the nonworking tech away.  I've no idea whether
it was working when it arrived and he killed it swapping memory and
disks or whether it was a dead unit from the start.

We canceled our contract with them that afternoon.  To their credit,
they did show up three weeks later with a working system, which is still
running, modulo having eaten two or three DIMMs since then.

Word is that they canned that tech.  I've no reason to doubt them, as
he'd likely eaten up a couple weeks' salary in replacement parts at our
site alone.

Now we buy our own spare parts, and I install them.  Life is good.

Bad enough as that was, I can't think of many things that could beat
Pete Wargo's story of the CSE seating node boards in an E10000 by
-kicking them- into the midplane.


[0] Which I shan't name, but let's just say that, given the company and
     this fellow's method of "making things work", he must've been
     working on armored military hardware previously.
-- 
Jonathan Patschke ) "If we keep our pride, though paradise is lost, we
Elgin, TX        (   will pay the price, but we cannot count the cost."
USA               )                             --Neil Peart, "Bravado"


