Anyone else having problems with Solaris 10 and WD SATA drives?

Posted by Bill Bradford on Mar 18, 2010

I’m at my wit’s end here, so I’m posting in hopes that someone else might have had this problem.

I finally got all of the hardware for the new SunHELP server in place at the remote colo (in Austin),
and shipped them the disk (a WD RE3 1TB “RAID Edition” SATA drive). It arrived and was installed in the system today.
The T1000 has the latest firmware for the OBP and ALOM installed as of last Friday.

The disk was fine when I tested it here before shipping, and didn’t have any problems during the (JumpStart) install. However, after the install (with a ZFS root pool on the single disk) and reboot, I get a ton of these:

WARNING: /pci@7c0/pci@0/pci@8/scsi@2/sd@0,0 (sd1):
        Error for Command: write(10)               Error Level: Retryable
        Requested Block: 369161036                 Error Block: 369161036
        Vendor: ATA                                Serial Number:      WD-WMAT
        Sense Key: Unit Attention
        ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

Never anything more than a “Retryable” error level, and “zpool scrub” reports no errors. The disk is using the
same SATA and power cables that were plugged into the “factory” 80GB SATA disk, which worked fine before
it was removed to make room for the 1TB drive.
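
For anyone who wants to dig into the same thing, these are the kinds of commands I’ve been poking at it with (assuming the default root pool name rpool from the JumpStart install; adjust for your own pool and device names):

    # per-device soft/hard/transport error counters since boot
    iostat -En

    # pool health, plus any read/write/checksum errors ZFS itself has logged
    zpool status -v rpool

    # kick off another scrub, then check on its progress
    zpool scrub rpool
    zpool status rpool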

If anyone else has seen issues like this with WD SATA drives and SPARC systems, please let me know.

Update: I’d like to publicly thank Hichael Morton for buying and overnighting a Hitachi 1TB SATA disk so that the server upgrade can go as planned while I figure out what the problem with the Western Digital disk is.

4 responses to “Anyone else having problems with Solaris 10 and WD SATA drives?”

  1. jason says:

    I had some similar problems, but I doubt they were drive-related, and I ended up with data corruption. Let me explain how I set things up.

    I’m using a Sun Blade 150 (650MHz) with SCSI drives, and I have two operating systems on it: (Open)Solaris (which I usually keep current) and FreeBSD/sparc. The reason is that both have ZFS support, so I can easily ‘mount’ the ZFS pool on either operating system.
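
    By ‘mount’ I really mean an export/import cycle rather than a plain mount; roughly like this, where “tank” is just a stand-in for my pool name:

        # on the OS I’m leaving, before rebooting
        zpool export tank

        # on the OS I’ve just booted into
        zpool import tank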

    Well, after a few months and a few upgrades (which shouldn’t have caused any problems), my ZFS pool got corrupted. I have no clue how it happened, but it drove me away from an otherwise very neat system… I tried the drives on a Linux (x86) system to check for bad blocks, but I didn’t find anything suspicious.

    Odd, no? I’ve submitted messages on the Solaris forums, but nobody was able to help, so I no longer run ZFS or OpenSolaris, which is a shame.

  2. Phil says:

    Jason: Which release of FreeBSD were you using? The ZFS code in FreeBSD 7 was unquestionably not ready for production. Supposedly FreeBSD 8’s ZFS code is finally stable, but I’d still hesitate to trust it in production. If I had to guess, I’d say the odds are that FreeBSD corrupted your ZFS pool.
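
    If you ever try that dual-OS setup again, it’d be worth comparing which on-disk ZFS version each kernel supports before moving the pool back and forth; something along these lines (“tank” standing in for your pool name):

        # list the pool versions this kernel understands
        zpool upgrade -v

        # show the version a given pool is currently at
        zpool get version tank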

  3. jason says:

    Phil:

    I’ve used FreeBSD 7 (on sparc, obviously). ZFS seemed the right way to store and “exchange” data between my (Open)Solaris and FreeBSD systems while having snapshots (which I like to abuse).

    The sad part of this story is that the instability (which, in my case, led to filesystem corruption) and the lack of support (on the OpenSolaris forums) drove me away from the OpenSolaris project altogether. I’ve had to port my code to Linux; a tiny portion still runs on linux/(ultra)sparc, but sometimes I wonder for how long.

    Part of the reason this code was on sparc in the first place is that it depends heavily on context-switch performance.

    Things have changed in the Linux scene since then, and filesystems with snapshot capabilities are starting to surface, although none is as advanced as ZFS at this time. Knowing how fast the Linux community moves, I wouldn’t be surprised if things looked much better next year.

  4. jason says:

    One thing that would clearly drive people back to (Open)Solaris would be a kernel and userland as easy to compile and install as the *BSDs’, but I have yet to see that.

    I’m not sure where Solaris is heading under Oracle, which has always been a huge Linux supporter.