[geeks] computer room gallery 8-)
Eric Dittman
geeks at sunhelp.org
Sat Jan 5 00:57:53 CST 2002
> > I think requiring an NDA while investigating is terrible service. I've
> > never had to sign an NDA to get a vendor to investigate or debug a problem.
> > I don't think blaming the problems on the environment was any more than a
> > delaying factor.
>
> I thought so too. However, I spent a good amount of time analyzing
> this and other problems. I saw some pretty out-of-spec datacenters. Like
> the one with an open door to the outside, and the one that was running
> at 65F and 80% humidity... While it certainly wasn't the cause of the
> problem, a poor datacenter environment really brought problems like this
> and others to light. (Makes sense, like margin testing.)
I've seen some pretty poor data centers, too. However, I've seen systems
running in those data centers without any problems.
> > There also appear to have been a couple of revised modules which didn't
> > actually fix the problem as the cache was mirrored but still didn't
> > have ECC. There was also the fix that Sun produced that impacted
> > performance.
>
> ECC didn't come along until the UltraSPARC III. The Mirrored SRAM works
> quite well, as the chance of getting parity hits on the same bit in two
> modules is about the same as that of me voting for any of the George
> Bushes.
After the mirrored caches were installed in eBay's systems, didn't they
still have some crashes related to cache corruption?
> > I hope they got new architects for their CPUs. The design problems they
> > had with the CPU module were not consistent with their earlier work.
>
> The CPU isn't really at fault. The UltraSPARC I and II are good chips,
> and in certain modules perform well - the Ultra-1 is a great box, and the
> CPU modules in the U2/30/60/etc. are really solid. I had an E4000 with the
> 1M cache 250's that ran for 3 years w/2 unscheduled downtimes, both were
> disk failures. The problem was poor planning for the scaling of the
> cache, and Sun suffered (as they should have) as a result.
I can't really agree here. The problem with memory cell errors due
to radiation was known long enough for the designers to have taken
that into account.
--
Eric Dittman
dittman at dittman.net
Check out the DEC Enthusiasts Club at http://www.dittman.net/