[geeks] A U30 puzzle
Phil Stracchino
alaric at metrocast.net
Fri Jul 3 12:26:31 CDT 2009
Folks,
As previously discussed, I transferred my NAS to a new dual-Xeon box. I
just reinstalled the U30 that was previously doing the job with Solaris
10u7 SPARC (200905). One thing I discovered in the process was that
during the month or so it's been shut down, one of the SCSI controllers
(a SunSwift PCI) had gone south. So I took that board out and replaced
it with a Symbios SYM22801 dual SCSI card (dual U160, I think), which
fixed that issue and spread the twelve array disks across two SCSI
controllers they now have all to themselves, entirely separate from the
internal disks. They're configured as raidz1 with a hot spare slice on
the second internal disk.
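For reference, the pool layout amounts to something like this (device names below are placeholders, not the actual c/t numbers on minbar):

```shell
# Sketch only -- c1/c2 and the target numbers here are hypothetical
# stand-ins for the two SYM22801 channels, six disks each, with a
# hot-spare slice (s7 here) on the second internal disk.
zpool create spool raidz1 \
    c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
    c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
    spare c0t1d0s7
```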
minbar:root:~:44 # prtdiag
System Configuration:  Sun Microsystems  sun4u Sun Ultra 30 UPA/PCI (UltraSPARC-II 248MHz)
System clock frequency: 83 MHz
Memory size: 512 Megabytes

========================= CPUs =========================

                    Run   Ecache  CPU    CPU
Brd  CPU  Module    MHz     MB    Impl.  Mask
---  ---  -------  -----  ------  -----  ----
 0    0      0      248    1.0    US-II   1.1

========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot         Name                    Model
---  ----  ----  -----------  ----------------------  --------------
 0   PCI   33    On-Board     network-SUNW,hme
 0   PCI   33    On-Board     scsi-glm/disk (block)   Symbios,53C875
 0   PCI   33    pcib slot 2  scsi-glm/disk (block)   Symbios,53C875
 0   PCI   33    pcib slot 2  scsi-glm/disk (block)   Symbios,53C875
 0   PCI   66    pcia slot 1  ethernet-pci8086,1001
 0   UPA   83    29           FFB, Double Buffered    SUNW,501-4788
 0   UPA   83    30           AFB, Double Buffered

No failures found in System
===========================
Of course, the machine could stand to have more RAM, but I only have
32MB DIMMs in it and don't have anything bigger.
Time to create a 1GB file on the array is actually slightly faster than
on the single boot disk, which rather argues against ZFS bringing the
machine to its knees:
minbar:root:~:42 # time dd if=/dev/zero bs=1M count=1000 of=/spool/export/bigfile
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 43.014 s, 24.4 MB/s
real 0m43.088s
user 0m0.037s
sys 0m11.405s
minbar:root:~:43 # time dd if=/dev/zero bs=1M count=1000 of=/bigfile
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 45.1339 s, 23.2 MB/s
real 0m45.197s
user 0m0.031s
sys 0m10.449s
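(For what it's worth, the MB/s figure dd prints is decimal megabytes, i.e. just bytes over elapsed seconds over 10^6; checking the first run:)

```shell
# dd's reported rate is bytes / elapsed seconds / 1e6 (decimal MB/s)
awk 'BEGIN { printf "%.1f MB/s\n", 1048576000 / 43.014 / 1e6 }'
# -> 24.4 MB/s, matching dd's own figure
```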
Now, here's the puzzling part. When I was using this machine as my
primary NAS box, running Solaris 9 on it, I could get an honest
98 Mbit/s transfer rate off it across the network over its internal
hme. But after being shut down for a month or so and then brought back
up as a backup cache of the data on babylon4's array, it's slower than
a sick dog. One reason I reinstalled it with Solaris 10 was to see
whether something on the OS had somehow gotten badly corrupted. Even
just a copy across the network from /dev/zero dumped to /dev/null is
slow. Here's 100MB from babylon4 to /dev/null on minbar:
babylon4:root:~:52 # dd if=/dev/zero bs=1M count=100 | ssh minbar dd of=/dev/null
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 64.7485 s, 1.6 MB/s
204800+0 records in
204800+0 records out
Wondering if the onboard hme had developed a fault similar to the
SunSwift's, I popped a spare PCI gigabit interface into minbar's 64-bit
PCI slot. This is the same copy, over a direct point-to-point
connection from babylon4's bge1 to minbar's e1000g0:
babylon4:root:~:53 # dd if=/dev/zero bs=1M count=100 | ssh minbar-sync dd of=/dev/null
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 50.4264 s, 2.1 MB/s
204800+0 records in
204800+0 records out
2.1 megabytes/s from /dev/zero to /dev/null over a point-to-point
gigabit connection is absurd. rsync, zfs send, cp -av via nfs mount,
everything is slow.
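One variable I suppose is worth eliminating is ssh itself -- cipher overhead isn't free on a 248MHz US-II, even though prstat says the CPU is nearly idle. A raw TCP copy would settle it; a sketch, assuming netcat is installed on both hosts (exact nc flags vary between netcat builds, and port 9000 is arbitrary):

```shell
# Same /dev/zero -> /dev/null copy, but over a plain TCP socket,
# taking ssh and its cipher overhead out of the path entirely.
# On minbar (listener):
nc -l -p 9000 > /dev/null
# On babylon4 (sender):
dd if=/dev/zero bs=1M count=100 | nc minbar 9000
```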
Anyone have any thoughts on the matter? I've looked everywhere I can
think of, and I can't find anything obviously *wrong* (aside from the
failing SCSI controller I've already replaced) ... the machine just
seems to be running at a fraction of the throughput it ought to, for no
reason I can find. vmstat looks reasonable and the machine's not
swapping hard; iostat -Cx looks sane; prstat shows the CPU sitting at a
couple of percent utilization.
minbar:root:~:47 # vmstat -p
     memory           page          executable      anonymous      filesystem
   swap    free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
 2093840 155632   6  20   0   0   2    0    0    0    0    0    0   18    0    0
minbar:root:~:48 # vmstat
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 rm s0 s3   in   sy   cs us sy id
 0 0 0 2093848 155600  6  20 18  0  0  0  2 -0 -0  0  3  469 1180  160  4  6 90
Typical iostat frame during a cp -av from an nfs mount of babylon4's
array to minbar's array:
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
c0 0.2 0.0 17.0 0.0 0.0 0.0 8.8 0 0
sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd3 0.2 0.0 17.0 0.0 0.0 0.0 8.8 0 0
sd6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
c1 0.0 6.7 0.0 21.1 0.0 0.1 21.8 0 5
sd23 0.0 1.1 0.0 3.7 0.0 0.0 23.9 0 1
sd24 0.0 1.1 0.0 3.7 0.0 0.0 21.5 0 1
sd25 0.0 1.0 0.0 3.6 0.0 0.0 16.8 0 1
sd26 0.0 1.3 0.0 2.7 0.0 0.0 24.4 0 1
sd27 0.0 1.1 0.0 3.8 0.0 0.0 22.7 0 1
sd28 0.0 1.1 0.0 3.7 0.0 0.0 20.3 0 1
c2 0.0 8.6 0.0 19.4 0.0 0.3 31.1 0 7
sd38 0.0 1.0 0.0 3.7 0.0 0.0 28.2 0 1
sd39 0.0 1.2 0.0 1.9 0.0 0.0 19.3 0 1
sd40 0.0 1.5 0.0 3.6 0.0 0.1 33.4 0 1
sd41 0.0 1.5 0.0 3.6 0.0 0.0 32.4 0 1
sd42 0.0 1.5 0.0 3.6 0.0 0.0 29.6 0 1
sd43 0.0 1.8 0.0 2.9 0.0 0.1 39.0 0 1
fd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
ramdisk1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
nfs1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
nfs2 6.8 0.0 29.4 0.0 0.0 0.4 63.4 0 43
Nothing here looks like a machine that's struggling to keep its head
above water or sitting hard up against a bottleneck. I'm baffled. The
closest I can get to the kind of throughput I *should* be seeing is a
direct cp -av from an NFS mount over the point-to-point gigE connection:
babylon4:root:~:67 # time dd if=/dev/zero bs=1M count=1000 of=/netstore/bigfile
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 16.6926 s, 62.8 MB/s
real 0m16.709s
user 0m0.015s
sys 0m3.395s
minbar:root:~:61 # time dd if=/netstore-sync/bigfile bs=1M of=/export/bigfile
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 113.532 s, 9.2 MB/s
real 1m53.764s
user 0m0.048s
sys 0m24.881s
This is clearly still nowhere near what it ought to be, though.
At this point, I'm baffled.
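One more thing I still mean to check: what the hme actually negotiated. A silent duplex mismatch with the switch would produce exactly this kind of throughput collapse, and the stock hme ndd parameters will show it:

```shell
# What the hme negotiated:
ndd -get /dev/hme link_speed     # 1 = 100 Mbit, 0 = 10 Mbit
ndd -get /dev/hme link_mode      # 1 = full duplex, 0 = half duplex
# What the link partner advertised during autonegotiation:
ndd -get /dev/hme lp_100fdx_cap
ndd -get /dev/hme lp_100hdx_cap
# A half-duplex mismatch also shows up as a climbing collision count:
netstat -i
```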
--
Phil Stracchino, CDK#2 DoD#299792458 ICBM: 43.5607, -71.355
alaric at caerllewys.net alaric at metrocast.net phil at co.ordinate.org
Renaissance Man, Unix ronin, Perl hacker, Free Stater
It's not the years, it's the mileage.