multipathing and bandwidth


Sorry to rehash an old question here, but I'm panicking a bit.

I've done the math, and our SAN should be capable of pushing out 2000MB/s+, easy. I have a Mac Pro Server with an Apple quad port fibre channel card installed. My Qlogic switch ports are all set correctly per the recommended Xsan/Qlogic settings from Apple. I'm barely breaking 500MB/s through this one server, though. I'm seeing all the LUNs on every FC port, so I don't understand why this is an issue. It seems to me like I've either misconfigured something or I've greatly underestimated the capabilities of this card.

Here's a summary of the fiber paths:
I've got 4x 16TB ActiveRAIDs, each configured with two 7-drive RAID5 LUNs. Each RAID controller is spanned across two switches. There are five Qlogic switches in total, and all the 10Gb stacking ports are connected for maximum redundancy. The RAID controllers populate most of switches 1 and 2. The Mac Pro Server with the quad FC card is connected to switches 1-4.

I'd gladly take any recommendations for config changes or hardware upgrades. Perhaps we just need a different FC card?



I tested the SAN throughput to the server with only 1 FC port connected, and I got about 300MB/s. Adding another port got me up to about 500MB/s. Adding more ports after that does nothing.

While running the same bandwidth test on multiple SAN clients simultaneously, I'm beginning to see them collectively approach the total bandwidth that the SAN is capable of. So I know that works.

So why can't I get it coming through one server? Is it my Apple LSI HBA, my Qlogic 5600 switches, or (God forbid) my ActiveRAIDs' controllers?

[b]Additional background:[/b]
I'm planning to share this SAN to our edit labs via 2x 10Gb Ethernet. Everything makes sense on paper, so I can't figure out why this isn't working.

[b]Followup #2:[/b]
My first tests were done using Blackmagic's Disk Speed Test. The best I could do was 500MB/s. A test with Disk Fire resulted in 800MB/s. Much better, but still not what I'd expect from a 4x4Gb/s FC HBA.

AJA's System Test was a bit harder to gauge, but I could get about 700-800MB/s.

I'm going to scavenge another HBA from another box tomorrow and see if adding more ports yields better results.

Maybe this was all in my head... How do these numbers look to everyone?

[b]Followup #3:[/b]
Well, that was decidedly unpleasant...

My Mac Pro Server has an Apple quad FC card (LSI7404EP). I added a dual port FC card (LSI7204EP) to see if I could get up to six ports. I hadn't connected the two new ports to the fabric yet, but the original four were still connected when I booted the machine. As soon as it booted, every single controller in my RAIDs began to crash. I had to physically reseat them all to get things running again. The SAN volume came back online alright (although all my clients were rather cranky), but I've stopped the volume while I run triage on the whole setup. In the Active Viewer app, every array says it's regenerating its parity data.


cvfsck -j
cvfsck -nv

[code]Super Block information.
FS Created On : Sat Oct 1 22:59:37 2011
Inode Version : '2.5' - XSan 2.2 named streams inode version (0x205)
File System Status : Clean
Allocated Inodes : 1137664
Free Inodes : 241973
FL Blocks : 176
Next Inode Chunk : 0x1481e
Metadump Seqno : 0
Restore Journal Seqno : 0
Windows Security Indx Inode : 0x7
Windows Security Data Inode : 0x8
Quota Database Inode : 0x9
ID Database Inode : 0xd
Client Write Opens Inode : 0xa

Stripe Group MetadataAndJournal ( 0) 0x3a354c0 blocks.
Stripe Group Video-1 ( 1) 0x57500f40 blocks.
Stripe Group Video-2 ( 2) 0x57500f40 blocks.
Stripe Group Other ( 3) 0x24615b00 blocks.

Building Inode Index Database 2871296 (100%).
Super Block - Number of Inodes of 0x115c00 is wrong! Should be 0x2bd000.
Super Block - Next Inode Chunk of 0x1481e is wrong! Should be 0x312b6.

Verifying NT Security Descriptors
Found 13719 NT Security Descriptors: all are good

Verifying Free List Extents.
Super Block - Free List Size of 0xb0 is wrong! Should be 0x180.

Scanning inodes 2871296 (100%).

Sorting extent list for MetadataAndJournal pass 1/1
Updating bitmap for MetadataAndJournal extents 65255 ( 1%).
Sorting extent list for Video-1 pass 1/1
Updating bitmap for Video-1 extents 2683966 ( 50%).
Sorting extent list for Video-2 pass 1/1
Updating bitmap for Video-2 extents 5055572 ( 95%).
Sorting extent list for Other pass 1/1
Updating bitmap for Other extents 5279327 (100%).

Checking for dead inodes 2871296 (100%).

Checking directories 58409 (100%).

Scanning for orphaned inodes 2871296 (100%).

Verifying link & subdir counts 2871296 (100%).

Super Block - Free Inodes of 0x3b135 is wrong! Should be 0x14560.

Repairing free list.

Checking pending free list.

Checking Arbitration Control Block.

Checking MetadataAndJournal allocation bit maps (100%).
Checking Video-1 allocation bit maps (100%).
Checking Video-2 allocation bit maps (100%).
Checking Other allocation bit maps (100%).

File system 'Xsan' was modified.

File system 'Xsan'. Blocks-3540089216 free-940766956 Inodes-2871296 free-83296.[/code]

I ran cvfsck -wv and it seems to have repaired everything. I'm going to wait a little while for my RAID controllers to settle down before I continue.
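For anyone reading the cvfsck output later: the "is wrong! Should be" lines are in hex, while the summary fields are in decimal. A minimal sketch that converts them, using only values copied from the output above (the variable and function names are made up for illustration):

```python
# (field, value cvfsck found in the super block, value cvfsck says it should be)
# All numbers are copied from the cvfsck -nv output above.
superblock_mismatches = [
    ("Number of Inodes", 0x115c00, 0x2bd000),
    ("Next Inode Chunk", 0x1481e, 0x312b6),
    ("Free List Size", 0xb0, 0x180),
    ("Free Inodes", 0x3b135, 0x14560),
]


def describe(mismatches):
    """Return one readable line per mismatch, with the values in decimal."""
    return [
        f"{name}: super block had {found}, cvfsck computed {expected}"
        for name, found, expected in mismatches
    ]


for line in describe(superblock_mismatches):
    print(line)
```

The "found" values match the Super Block header (Allocated Inodes 1137664, Free Inodes 241973, FL Blocks 176), and the "should be" values match the final summary line (Inodes-2871296, free-83296). So the super block counters were stale and cvfsck recomputed them from its scan.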

abstractrude wrote:

Hmmm. 2000 MB/Sec is a lot. What is your workflow requiring?
Are you doing 4K??

Blackmagic Disk Speed Test is a strange tool; it has some features to keep SSDs and other modern drives from gaming it. I have noticed it gives odd results in Xsan/StorNext tests. I recommend using command-line tools or even the AJA System Test, which is feeling a bit long in the tooth.
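As one example of a command-line style test, here is a minimal sketch of a sequential write/read timer. It is not a replacement for a proper benchmark: the function names, default sizes, and the example path are assumptions for illustration, and the read pass can be served from the client's cache unless the file is much larger than RAM.

```python
import os
import time

MB = 1024 * 1024


def sequential_write_mb_per_s(path, total_mb=1024, block_mb=8):
    """Write total_mb of data in block_mb chunks, fsync, and return MB/s."""
    block = os.urandom(block_mb * MB)
    blocks = total_mb // block_mb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually left the client
    elapsed = time.perf_counter() - start
    return (blocks * block_mb) / elapsed


def sequential_read_mb_per_s(path, block_mb=8):
    """Read the whole file in block_mb chunks and return MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_mb * MB)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return (total_bytes / MB) / elapsed


# Hypothetical usage (the mount point is an assumption):
#   print(sequential_write_mb_per_s("/Volumes/Xsan/throughput_test.bin", total_mb=16384))
#   print(sequential_read_mb_per_s("/Volumes/Xsan/throughput_test.bin"))
```

Large blocks and a large total size are the point here: video workloads are long sequential transfers, so small test files mostly measure caching rather than the fabric.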

How are your storage pools configured?

-Trevor Carlson

We've used this SAN with 40+ Xsan clients for the past few years, and it's been working great. Now we're switching to GigE for our client infrastructure (iMacs, budget limits, new hardware uncertainty...). The Mac Pro is going to serve approximately 40 video editing clients via 2x10GbE links into our gigabit switches. Bandwidth needs on the client end are very minimal, due to H.264 and AVCHD. It's mostly for intro editing classes. I've already set up this same infrastructure for one lab of 16 clients and it works flawlessly.

[b]Storage Pools:[/b]
Video 1 = 4x 7x1TB RAID5 LUNs (Video)
Video 2 = 4x 7x1TB RAID5 LUNs (Video)
Other = 2x 6x1TB RAID5 LUNs (cache files, small data only)

The Video pools are for general SAN storage. I have a few folders set with affinity for the Other pool.

I just hope I don't have to buy another whole server to make this work. Maybe this could be accomplished with a few Mac minis, Promise SANLinks, and Thunderbolt to 10Gb adapters.

Anyone have any input on setting MultiPathMethod for my two "Video" pools? Is this safe to do with a live volume, or do I really risk breaking things?

An update to my saga over the past couple weeks, mostly for posterity:

First, unbeknownst to me, Apple's LSI7404EP is a x8 PCIe card. By default, Apple ships all their FC cards in slot 4, which is a x4 slot. I moved the card to a x16 slot and I'm seeing the numbers I expect to see. Rookie mistake, but hurray it works!
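A back-of-the-envelope check of why the slot matters. This sketch assumes the card links at PCIe 1.x speed (250MB/s per lane, per direction) and uses the nominal 400MB/s per 4Gb FC port; both figures are assumptions for illustration and not something measured in this thread.

```python
FC_PORT_MB_S = 400         # nominal payload of one 4Gb FC port, per direction
PCIE_GEN1_LANE_MB_S = 250  # ASSUMPTION: the HBA negotiates PCIe 1.x lanes


def fc_demand_mb_s(ports, per_port=FC_PORT_MB_S):
    """Aggregate bandwidth the FC ports could ask of the slot."""
    return ports * per_port


def slot_ceiling_mb_s(lanes, per_lane=PCIE_GEN1_LANE_MB_S):
    """Raw ceiling of the PCIe link before protocol overhead."""
    return lanes * per_lane


def is_slot_bottleneck(ports, lanes):
    """True when the slot cannot carry what the FC ports can deliver."""
    return slot_ceiling_mb_s(lanes) < fc_demand_mb_s(ports)


for lanes in (4, 8):
    print(
        f"x{lanes}: slot {slot_ceiling_mb_s(lanes)} MB/s vs "
        f"4 ports {fc_demand_mb_s(4)} MB/s, "
        f"bottleneck={is_slot_bottleneck(4, lanes)}"
    )
```

Under those assumptions a x4 link tops out around 1000MB/s raw, which would fit with the 700-800MB/s ceiling I was hitting, while x8 leaves headroom for all four ports. A x8 card in a x16 slot still only uses eight lanes.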

I thought the cause for my RAID crashes might have been their firmware. Though Active Storage is no longer in business, I was still able to get the v1.40 firmware from their old support site. After updating all my controllers from v1.28 (which went flawlessly), I discovered that my MDCs were both complaining about duplicate LUNs. The update apparently changed the model identifier string for each of the RAIDs as well as their WWNs. I stopped the volume and shut everything down. Following a reboot (in proper order), the duplicate LUNs were still there. On my active MDC, I stopped the volume using cvadmin, then told the Xsan Admin app to close the connection to this SAN. I reconnected to it with File > New, and all the duplicates were gone. Ran cvfsck and it reported no anomalies. Repeated the disconnect/reconnect procedure for my secondary MDC after it booted. Volume started and mounts just fine.

Back to my crashed RAID controllers from last week, I suspect that it was probably a firmware version mismatch between the two FC HBAs I installed for testing (LSI7404EP from 2013 and LSI7204EP from 2008). I installed another LSI7404EP and this has not caused any complaints from my RAIDs. Not sure if it was the card or the firmware update on the RAIDs themselves, but everything seems to be working (knock on wood).

DLpres wrote:

Thanks for all the detailed info, pgsengstock. I've been researching and tinkering with multipathing recently, and it does seem to be somewhat of a dark art in our (video) industry. Most just use it for redundancy and not bandwidth.

What I was trying to do is break the 4Gb-per-controller barrier (in a 4Gb system) where the fiber port is the bottleneck. We currently have single-controller RAIDs (Maxx Digital / QSAN) running MetaSAN, but Fibre Channel is Fibre Channel.
I get 480MB/s reads so apparently I did "break the 4Gb barrier" but was hoping to see numbers closer to 550-600MB/s. Just as you've noticed, there's very little help to be had.
If I recall correctly, I did set up LUNs to identify and discover via port WWN rather than node WWN, to avoid seeing duplicate LUNs.

Are you plugging in more than 1 fiber link per controller? And if so are you seeing more than ~380MB/s per controller?

Our controllers are active/active controllers, with two 4Gb ports each (four ports total per chassis). I've got every port connected to the fabric. Can't really say that I'm seeing more than 380MB/s per controller, because we've got eight controllers total. I know that the theoretical limit for my SAN is about 2400MB/s, which is far below 8*380MB/s. For a single 4Gb FC port, 480MB/s is about what I'd expect, actually. Someone smarter than me can confirm.

When I said my MDCs were complaining about duplicate LUNs, I meant they were complaining about duplicate LUN labels. Each LUN retained its label following my RAID controller update, but the WWN changed. Both the MDCs retained records of the "old" LUNs, causing some confusing displays in the Xsan Admin GUI. Disconnecting and reconnecting the GUI to the SAN on both MDCs cleared things right up. There were no apparent issues with starting the volume throughout this.

DLpres wrote:

[quote]For a single 4Gb FC port, 480MB/s is about what I'd expect, actually.[/quote]
I suppose you meant 380MB/s? That's the well-known practical limit. 4Gbps divided by 10 (8b/10b encoding puts every 8 data bits on the wire as 10 bits) gives the 400MB/s theoretical limit, and in practice you get somewhere around 380MB/s.
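The arithmetic above as a small sketch. The function names are illustrative only; the 380MB/s practical figure is the one quoted in this post, not a measurement.

```python
def payload_mb_per_s(line_rate_gbit, wire_bits_per_byte=10):
    """Payload rate of a link where each data byte costs 10 bits on the wire."""
    return line_rate_gbit * 1e9 / wire_bits_per_byte / 1e6


def efficiency(practical_mb_s, theoretical_mb_s):
    """Fraction of the theoretical payload rate actually achieved."""
    return practical_mb_s / theoretical_mb_s


nominal = payload_mb_per_s(4.0)  # the "4Gb" marketing rate
print(f"theoretical: {nominal:.0f} MB/s")
print(f"380 MB/s is {efficiency(380, nominal):.0%} of that")

# Note: 4Gb FC actually signals at 4.25 Gbaud, which gives 425 MB/s before
# framing overhead; 400 MB/s is the commonly quoted nominal payload figure.
print(f"at 4.25 Gbaud: {payload_mb_per_s(4.25):.0f} MB/s")
```

Either way, a single 4Gb port cannot reach 480MB/s, so a 480MB/s read has to be coming across more than one path.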

pgsengstock wrote:
[quote]Can't really say that I'm seeing more than 380MB/s per controller[/quote]
That always got me confused. I was told by several manufacturers that the actual throughput of a typical 4Gb FC RAID controller should be 550-600MB/s - which you don't see unless you multipath both ports.

[quote]I know that the theoretical limit for my SAN is about 2400MB/s[/quote]
Do you have any guidelines (or config guide) that you're basing that off? I was going to post my own closely-related question about bandwidth. Basically, if you have 8 controllers, 16 x 4Gb links, and 64 spindles I'd expect your total SAN bandwidth to be nearly 4,600MB/s with the bottleneck being the controllers (8x575MB/s). What am I getting wrong?

In particular, I'm confused by numbers I often see about # of spindles vs. speed. Any decent modern drive pushes over 130MB/s sustained, so 5 drives should be enough to saturate a controller. Why is it that "you can add 2 expansion chassis to an E-class before speed stops improving"? 3x16 spindles can push over 6,000MB/s from the drives' perspective. I don't understand why so many spindles are needed for speed benefits in video applications with long sustained reads (databases and image sequences are a different story, of course).
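To make the reasoning above concrete, here is a minimal sketch of the "weakest stage wins" estimate, using only the per-unit figures quoted in this post (130MB/s per drive, 575MB/s per controller, 380MB/s per 4Gb link). It is a naive upper bound under those assumptions: it ignores RAID5 parity, stripe layout, and host-side limits such as the HBA's PCIe slot, and the function names are made up for illustration.

```python
def stage_ceilings(spindles, controllers, links,
                   drive_mb_s=130, controller_mb_s=575, link_mb_s=380):
    """Naive aggregate ceiling of each stage between platters and fabric."""
    return {
        "drives": spindles * drive_mb_s,
        "controllers": controllers * controller_mb_s,
        "fc_links": links * link_mb_s,
    }


def bottleneck(ceilings):
    """Return (stage name, MB/s) for the stage with the lowest ceiling."""
    name = min(ceilings, key=ceilings.get)
    return name, ceilings[name]


san = stage_ceilings(spindles=64, controllers=8, links=16)
print(san)
print(bottleneck(san))

# The "3x16 spindles" case from the paragraph above, drives only:
print(stage_ceilings(spindles=48, controllers=1, links=2)["drives"])
```

With these inputs the controllers come out as the limiting stage at 4,600MB/s, which is where my estimate comes from. The gap between that and the quoted 2400MB/s is exactly what I'm asking about.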

About retaining phantom records of the old LUNs - makes sense.