Anyone else having problems with Solaris 10 and WD SATA drives?
I’m at my wit’s end here, so I’m posting in hopes that someone else has run into this problem.
I finally got all of the hardware for the new SunHELP server in place at the remote colo (in Austin),
and shipped them the disk (WD RE3 1TB “Raid Edition” SATA). It arrived and was installed in the system today.
The T1000 has the latest firmware for the OBP and ALOM installed as of last Friday.
The disk was fine when I tested it here before shipping, and didn’t have any problems during the (jumpstart) install. However, after install (with zfs root pool on the single disk) and reboot, I get a ton of these:
WARNING: /pci@7c0/pci@0/pci@8/scsi@2/sd@0,0 (sd1):
    Error for Command: write(10)    Error Level: Retryable
    Requested Block: 369161036      Error Block: 369161036
    Vendor: ATA                     Serial Number: WD-WMAT
    Sense Key: Unit Attention
    ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Never anything more than a “Retryable” error level, and “zpool scrub” says I have no errors. The disk is using the same SATA and power cables that were plugged into the “factory” 80GB SATA disk, which worked fine until it was removed to make room for the 1TB drive.
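For anyone who wants to compare notes, this is roughly what I’m poking at from the console (assuming the default root pool name, rpool):

# zpool status -x
# zpool scrub rpool
# zpool status -v rpool
# iostat -En
# fmdump -eV | tail -40

“zpool status -x” should report that all pools are healthy, “iostat -En” shows the sd driver’s soft/hard/transport error counters per device, and “fmdump -eV” dumps the raw FMA ereports, which is another place this kind of error tends to show up.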
If anyone else has seen issues like this with WD SATA drives and SPARC systems, please let me know.
Update: I’d like to publicly thank Hichael Morton for buying and overnighting a Hitachi 1TB SATA disk so that the server upgrade can go as planned while I figure out what the problem with the Western Digital disk is.
I had some similar problems, but I doubt they were drive-related, and I ended up with data corruption. Let me explain how I set things up.
I am using a Sun Blade 150 (at 650MHz) with SCSI drives, and I have two operating systems on it: (Open)Solaris (which I usually keep current) and FreeBSD/sparc. The reason for that is that both have ZFS support, so I can easily ‘mount’ the ZFS partition on either operating system.
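(By ‘mount’ I really mean the usual export/import cycle; ‘tank’ here is just a placeholder pool name:)

# zpool export tank      (on the OS you are about to leave)
# zpool import           (on the other OS, to list the pools it can see)
# zpool import tank

The one catch is the pool version: if one side upgrades the pool past what the other side’s ZFS understands, the import fails, and “zpool upgrade -v” on each system shows which pool versions it supports.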
Well, after a few months and a few upgrades (which shouldn’t have caused any problems), my ZFS partition got corrupted. I have no clue how this happened, but it drove me away from this very neat system… I tried the drives on a Linux (x86) system to check for bad blocks or anything else obvious, but I didn’t find anything suspicious.
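(For the curious, the sort of check I mean is along these lines; /dev/sdb is just a stand-in for whatever the disk shows up as:)

# smartctl -a /dev/sdb     (SMART attributes and error log, from the smartmontools package)
# badblocks -sv /dev/sdb   (non-destructive, read-only surface scan)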
Odd, no? I submitted messages on the Solaris forum, but nobody was able to help, so I no longer run ZFS or OpenSolaris, which is a shame.
Jason: Which release of FreeBSD were you using? The ZFS code in FreeBSD 7 was unquestionably not ready for production. Supposedly FreeBSD 8’s ZFS code is finally stable, but I’d still hesitate to trust it in production. If I had to guess, I’d say the odds are that FreeBSD corrupted your ZFS pool.
Phil:
I was using FreeBSD 7 (on sparc, obviously). ZFS seemed like the right way to store and “exchange” data between my (Open)Solaris and FreeBSD systems while having snapshots (which I like to abuse).
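(‘Abuse’ meaning roughly this kind of thing; the dataset name is made up:)

# zfs snapshot tank/data@before-upgrade
# zfs list -t snapshot
# zfs rollback tank/data@before-upgrade

Cheap snapshots before every upgrade or experiment, and a rollback when something goes sideways.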
The sad part of this story is that the instability (which led, in my case, to filesystem corruption) and the lack of support (in the OpenSolaris forums) drove me away from the OpenSolaris project altogether, and I’ve ported (or rather, had to port) my code to Linux. A tiny portion still runs on Linux/(ultra)sparc, but sometimes I wonder for how long.
Part of the reason this code was on sparc in the first place is its heavy reliance on context switches.
Things have changed in the Linux scene since then, and filesystems with snapshot capabilities seem to be surfacing, although obviously none is as advanced as ZFS at this time. Knowing how fast the Linux community moves, I wouldn’t be surprised if things looked much better next year.
One thing that would clearly drive people back to (Open)Solaris would be a kernel + userland as easy to compile and install as *BSD’s, but I have yet to see this.
I’m not sure where Solaris is heading with Oracle, which has always been a huge Linux supporter.