[geeks] Growing a ZFS pool, a workaround (was Re: Impressive...)
Francois Dion
francois.dion at gmail.com
Mon Mar 9 12:14:11 CDT 2009
On Mon, Mar 9, 2009 at 12:21 PM, Francois Dion <francois.dion at gmail.com> wrote:
> On Sat, Mar 7, 2009 at 10:21 AM, velociraptor <velociraptor at gmail.com> wrote:
>> On Fri, Mar 6, 2009 at 3:04 PM, velociraptor <velociraptor at gmail.com> wrote:
>>
>>> I need to educate myself more, obviously. I didn't realize you could
>>> not grow the pools transparently.
>
> If they had that, can you imagine their market share? Even the new Sun
> FISHworks appliances don't do it.
>
> Actually, I do want to mention something. You can, somewhat. You can
> add another disk to an already created pool as a spare, or add two
> mirrored drives to a pool. You don't have to create a new pool per
> pair; you can just add. But it doesn't redistribute the existing data
> across all the drives in a striped/mirrored way, as if the pool had
> been created like that from the start.
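> For example, something along these lines (the device names here are
> just placeholders):
>
> # zpool add mypool spare c1t4d0
> # zpool add mypool mirror c1t5d0 c1t6d0
>
> The added mirror becomes an extra top-level vdev: new writes get spread
> across it, but the data already on the first mirror stays put.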
>
> There is also a trick I've been using for the past few years, though.
> Because of zfs features like compression and adaptive endianness, if
> you move the files around, zfs will do some of the restructuring for
> you. Let me explain:
>
> Create a zpool and a zfs filesystem with compression off, and copy
> files onto it. Then enable compression and move the files between two
> zfs filesystems on the same pool (both share the total disk space set
> by the zpool; afterwards you unmount the original FS and point the
> second FS's mountpoint at the original's). You now have more free
> space, because the data gets compressed as it is rewritten.
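> Roughly, the sequence looks like this (a sketch, not an actual
> transcript; pool and filesystem names are made up):
>
> # zpool create mypool mirror c1t0d0 c1t1d0
> # zfs create mypool/fs1              # compression is off at this point
>   ... copy files into /mypool/fs1 ...
> # zfs set compression=on mypool      # new writes will be compressed
> # zfs create mypool/fs2              # inherits compression=on
> # mv /mypool/fs1/* /mypool/fs2/      # rewritten, so now compressed
> # zfs set mountpoint=none mypool/fs1
> # zfs set mountpoint=/mypool/fs1 mypool/fs2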
>
> I quickly mention compression here:
>
> http://solarisdesktop.blogspot.com/2007/02/stick-to-zfs-or-laptop-with-mirrored.html
> (there are all kinds of hacks you can do with zfs that are not
> officially supported or much documented)
>
> Similarly, create a zpool on an x86 machine and copy files to it. Then
> export the zpool from the x86 machine and import it on a sparc machine.
> Again, move the files from one zfs FS to the other. The blocks are now
> in the native endianness, so no translation is done on the fly anymore.
> I've done that to move a large Oracle database from sparc to x86, and
> then later took that SATA disk from the x86 box and loaded it on a
> sparc box, shuffled files to get native endianness, and finally added
> a mirror after the fact.
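> Roughly (again a sketch; the host and pool names are made up):
>
> x86box# zpool export mypool
>   ... move the disks, or re-present the LUNs, to the sparc box ...
> sparcbox# zpool import mypool
> sparcbox# zfs create mypool/fs2
> sparcbox# mv /mypool/fs1/* /mypool/fs2/   # rewritten blocks are native endian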
>
> This same approach should work to take advantage of all the drives in
> an expanded zpool, but I'm not sure how to test this. It would be a
> good question for the zfs list. That part is theory only, as I've not
> tried it yet, and like I said, there's no easy way to verify that the
> data really is striped across all the drives.
>
>>> My original plan was to take two
>>> new disks and stripe them, move the data to those, then use the
>>> existing 3 disks to start the zpool, transfer the data to zfs, then
>>> put the two new disks into the zpool and grow it. This will have to
>>> be revised, and I'm going to need to reconsider the disks I use for
>>> the project.
>>
>> Looking at my data, I think I can still work this with the planned
>> hard drives, though the data transfer may be a bit hairy since I'll be
>> using a non-redundant zfs pool for the temporary storage.
>
> Just to be safe, make sure you do a scrub on the new pool once the
> data is on it, and that it completes without error, before destroying
> the original data...
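> Something like:
>
> # zpool scrub newpool
> # zpool status -v newpool   # wait until the scrub finishes with 0 errors
>
> (where newpool is whatever you called the pool holding the copy).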
>
>> One question I do have--and I'll search some more after I get some
>> solid sleep (fscking on-call pager did that just the right interval to
>> not really sleep thing this evening :-( ): If I create a zpool over
>> the top of a hw raid array, will zfs see the space when that raid
>> device gets bigger?
>
> No. This is easily tested with slices. You would have to add that
> extra space as a new LUN, not grow the same LUN. I wouldn't use
> hardware raid5 at all. Even if you did, you would need a minimum of 2
> LUNs so you can mirror them with zfs, so that zfs is able to recover
> from any corruption (which the hardware raid wouldn't detect most of
> the time). The nice feature of zfs is end-to-end data integrity.
>
> Does your raid controller have a battery-backed cache?
>
>> I know it's not any more "space efficient" than mirroring two raidz
>> zpools (if that is even possible).
>
> The equivalent of raid 0+1 or raid 1+0 is possible. You could also use
> zfs to mirror two hardware raid5 LUNs, or raidz a set of hardware
> mirrors, or mirror hardware stripes, etc.
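> E.g., if the controller presents two raid5 LUNs as c2t0d0 and c2t1d0
> (made-up names), you could do:
>
> # zpool create tank mirror c2t0d0 c2t1d0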
>
>> I see a couple of mentions of
>> using raid controllers in jbod mode, and given what zfs does it's
>> obvious why you'd do that.
>
> Even with SANs. You can use LUNs, but again you need to mirror them
> with zfs, or raidz 3 or more LUNs.
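> E.g. (again with made-up LUN names):
>
> # zpool create tank raidz c3t0d0 c3t1d0 c3t2d0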
>
>> I'm just wondering about zfs over things
>> like LUNs, which I don't see much talk about other than some
>> operational "here's how you add a LUN to a zpool" kind of thing.
>> Pointers to more "enterprise-y" info appreciated.
>
> LUNs add complexity with no real benefit in real life.
>
> BTW, you can test anything in zfs with files instead of actual devices.
>
> If I get a chance I'll try the dual-FS trick, add more "disks"
> (files), and see if I can't use dtrace or something to monitor which
> "disks" are accessed for a given file. Since this pool would be quiet
> except for my direct operations, zpool iostat -v should be at least
> enough to give a definite no or a maybe. If it's a definite no, there's
> no point in writing dtrace code for nothing.
>
> Francois
I edited the subject since we are going off on a bit of a tangent...
So, as to the compression/endian trick applied to growing a pool with
another mirror: it does appear to work, viz:
root at wel:/ # mkdir zfs_test
root at wel:/ # cd zfs_test
root at wel:/zfs_test # mkfile 128M disk1
root at wel:/zfs_test # mkfile 128M disk2
root at wel:/zfs_test # mkfile 128M disk3
root at wel:/zfs_test # mkfile 128M disk4
root at wel:/zfs_test # zpool create mypool mirror /zfs_test/disk1 /zfs_test/disk2
root at wel:/zfs_test # zfs create mypool/fs1
root at wel:/zfs_test # ls /mypool/
fs1
root at wel:/zfs_test # cp /usb/mp3/f/Francois\ Dion/IDM-4011\ -\ Test\ Tones\ I/* /mypool/fs1
root at wel:/zfs_test # zpool add mypool mirror /zfs_test/disk3 /zfs_test/disk4
root at wel:/zfs_test # zfs create mypool/fs2
root at wel:/zfs_test # cd /mypool/fs1
root at wel:/mypool/fs1 # ls
00 - IDM-4011 - Francois Dion - Test Tones I.m3u
00 - Test Tones I.m3u
01 - 2600.mp3
02 - Nynex.mp3
03 - KP.mp3
04 - override.mp3
05 - ST.mp3
06 - Gibson.mp3
07 - DCC.mp3
08 - phrack.mp3
09 - 0062.mp3
Readme.rtf
TestToneI.cl5
http---www.cimastudios.com-fdion-.URL
test_tones.png
In another window I'm doing zpool iostat -v mypool 5:
                       capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M      0      0      0      0
  mirror             33.7M  89.3M      0      0      0      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0      0      0
  mirror               54K   123M      0      0      0      0
    /zfs_test/disk3      -      -      0      0      0      0
    /zfs_test/disk4      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M     31     84  3.77M  4.34M
  mirror             33.7M  89.3M     31     47  3.77M  2.06M
    /zfs_test/disk1      -      -     19     24  2.43M  2.06M
    /zfs_test/disk2      -      -     11     23  1.35M  2.06M
  mirror               54K   123M      0     36      0  2.28M
    /zfs_test/disk3      -      -      0     23      0  2.28M
    /zfs_test/disk4      -      -      0     23      0  2.28M
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               41.5M   205M      2     66   167K  2.31M
  mirror             30.0M  93.0M      2     25   167K  1019K
    /zfs_test/disk1      -      -      0     14  77.5K  1020K
    /zfs_test/disk2      -      -      1     13  89.6K  1020K
  mirror             11.4M   112M      0     41      0  1.31M
    /zfs_test/disk3      -      -      0     20      0  1.31M
    /zfs_test/disk4      -      -      0     19      0  1.31M
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.9M   212M      0      0      0      0
  mirror             16.0M   107M      0      0      0      0
    /zfs_test/disk1      -      -      0      0      0    817
    /zfs_test/disk2      -      -      0      0      0    817
  mirror             17.9M   105M      0      0      0      0
    /zfs_test/disk3      -      -      0      0      0    817
    /zfs_test/disk4      -      -      0      0      0    817
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.9M   212M      0      0      0      0
  mirror             16.0M   107M      0      0      0      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0      0      0
  mirror             17.9M   105M      0      0      0      0
    /zfs_test/disk3      -      -      0      0      0      0
    /zfs_test/disk4      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----
Then I did:
root at wel:/mypool/fs1 # mv * ../fs2
root at wel:/mypool/fs1 # cd ../fs2
waited a bit more and did:
root at wel:/mypool/fs2 # mplayer 02\ -\ Nynex.mp3
while running zpool iostat again:
                       capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M      0      0  51.1K      0
  mirror             15.4M   108M      0      0  25.5K      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0  25.5K      0
  mirror             18.5M   105M      0      0  25.5K      0
    /zfs_test/disk3      -      -      0      0      0      0
    /zfs_test/disk4      -      -      0      0  25.5K      0
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M      0      0  76.6K      0
  mirror             15.4M   108M      0      0  51.1K      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0  51.1K      0
  mirror             18.5M   105M      0      0  25.5K      0
    /zfs_test/disk3      -      -      0      0  25.5K      0
    /zfs_test/disk4      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M      0      0  76.6K      0
  mirror             15.4M   108M      0      0      0      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0      0      0
  mirror             18.5M   105M      0      0  76.6K      0
    /zfs_test/disk3      -      -      0      0  76.6K      0
    /zfs_test/disk4      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M      1      0   204K      0
  mirror             15.4M   108M      0      0  76.6K      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0  76.6K      0
  mirror             18.5M   105M      0      0   128K      0
    /zfs_test/disk3      -      -      0      0   128K      0
    /zfs_test/disk4      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----
                       capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M      1      0   255K      0
  mirror             15.4M   108M      0      0   128K      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0   128K      0
  mirror             18.5M   105M      0      0   128K      0
    /zfs_test/disk3      -      -      0      0   128K      0
    /zfs_test/disk4      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----
Now, this is just with mirrors. With raidz, if you try to add a single
disk, you'll end up with a pool made of one raidz1 vdev plus one
non-redundant single disk, so your pool as a whole is no longer
redundant. You would really need to add a mirror of disk4 and disk5.
Let me try this.
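The setup was basically the same as the mirror test, something like
this (I'm not pasting the exact commands, just the gist):

root at wel:/zfs_test # mkfile 128M disk5
root at wel:/zfs_test # zpool destroy mypool
root at wel:/zfs_test # zpool create mypool raidz /zfs_test/disk1 /zfs_test/disk2 /zfs_test/disk3
root at wel:/zfs_test # zfs create mypool/fs1
  (copy the same test files into /mypool/fs1)
root at wel:/zfs_test # zpool add mypool mirror /zfs_test/disk4 /zfs_test/disk5
root at wel:/zfs_test # zfs create mypool/fs2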
Yep, on a mv from fs1 to fs2 you get:
                       capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               50.7M   442M     25     77  3.08M  4.36M
  raidz1             50.6M   319M     25     44  3.08M  2.16M
    /zfs_test/disk1      -      -     16     17  1.03M  1.08M
    /zfs_test/disk2      -      -     15     16  1000K  1.08M
    /zfs_test/disk3      -      -     17     16  1.09M  1.08M
  mirror               60K   123M      0     33      0  2.20M
    /zfs_test/disk4      -      -      0     22      0  2.20M
    /zfs_test/disk5      -      -      0     22      0  2.20M
-------------------  -----  -----  -----  -----  -----  -----
Reads come from disks 1, 2 and 3, and writes go to disks 1, 2, 3, 4 and 5.
And using mplayer, I see:
                       capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               42.5M   451M      1      0   204K      0
  raidz1             25.8M   344M      0      0   102K      0
    /zfs_test/disk1      -      -      0      0  51.1K      0
    /zfs_test/disk2      -      -      0      0      0      0
    /zfs_test/disk3      -      -      0      0  51.1K      0
  mirror             16.7M   106M      0      0   102K      0
    /zfs_test/disk4      -      -      0      0   102K      0
    /zfs_test/disk5      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----
So it appears that would work. It is just a bit of a pain to do. And of
course, don't wait until your raidz1 is nearly full to try this.
Final point, a warning: I think I've mentioned this before, but you
cannot shrink a pool. While you can use my trick to work around the
pool-growing problem, there is absolutely no way to move the data off
of a specific disk in order to remove it from the pool. Once a disk is
added, you can only replace it with a spare.
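E.g., if one of the devices has to come out, the best you can do is
something like this (disk6 being a new file or device of at least the
same size):

root at wel:/zfs_test # zpool replace mypool /zfs_test/disk3 /zfs_test/disk6

That swaps in a replacement, but the vdev stays in the pool and the
pool never gets any smaller.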
Francois