Bulletproof Zoning on Xsan


The most mystifying component of our SANs has got to be the fibre channel network. With heavy black boxes, bright orange cables, even lasers, it sends our PKE meter off the scale. Even the management software manages to either look like a Windows program or require -- ack -- Internet Explorer!

This article may help shed just a little light on your fabric. The idea is to set up zones -- lots of zones -- to make your SAN behave more predictably. It can be a lot of work at the beginning, but you get two important benefits.

  • Fabric notifications such as RSCNs are isolated between clients, minimizing client-to-client conflicts
  • New devices are effectively invisible until deliberately added to a zone

The first benefit takes care of RSCN suppression, so you won't need to worry about I/O StreamGuard (on QLogic switches) or Stealth (on Emulex switches). The second benefit is useful when you add new storage to your SAN: without appropriate zoning, the new LUN may attach itself to a random host before you can label it and add it to Xsan.

I focus on QLogic switches (over three-quarters of you use them), but the concepts should translate to other switches as well. Also, we're setting up a sort of modified single-initiator zoning. There are other zoning schemes out there (common host is probably what you were using before), but for me this scheme has the right balance of reliability and manageability.

Background: Zones, Zone Sets, and Aliases

When you first plug them in, fibre channel switches allow every device to see every other device. Normally, that's a good thing. It lets your computer see all the storage you attach, for example.

But what if you don't want a particular device to be visible to every other? What if you have a tape library that is used by only one server? Or if you have a RAID that you want "direct-attached" to a particular host? For that, you'll want to begin partitioning your fabric into zones.

A zone is a group of devices on your SAN. The devices in that group can see each other and exchange data. A device in a zone cannot see or exchange data with devices outside its zone. Zones are roughly analogous to VLANs on Ethernet switches.

When you plan your zones, you'll inevitably want to use more than one at the same time: one zone containing that tape library and its host server, another containing the direct-attached storage and its edit station. A group of zones active at the same time is called a zone set, and only one zone set can be active at a time.

Nothing prevents you from adding a device to more than one zone. So if you zone A to B, and A to C, then A can see both B and C. (But B and C will be like estranged siblings, and won't communicate with each other.)
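To make the overlap rule concrete, here is a toy model in Python. It is only an illustration, not anything the switch itself runs: a zone set is assumed to be a dictionary mapping zone names to sets of devices, and two devices can talk only if at least one zone contains both. It deliberately ignores what the switch does with devices that are in no zone at all.

```python
from itertools import combinations

def can_communicate(zone_set, a, b):
    """Return True if some zone in the zone set contains both a and b."""
    return any(a in members and b in members for members in zone_set.values())

# The example from the text: A is zoned with B, and separately with C.
zone_set = {
    "zone_AB": {"A", "B"},
    "zone_AC": {"A", "C"},
}

for x, y in combinations(["A", "B", "C"], 2):
    print(x, y, can_communicate(zone_set, x, y))
# A B True
# A C True
# B C False
```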

If you add the same set of devices again and again to zones -- say adding shared storage to each client zone -- you may find yourself wanting to describe that set of devices in a shorthand way. An alias is just that -- a shorthand for a set of devices that can be added to multiple zones.
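An alias behaves like a shorthand that gets expanded wherever it appears. The sketch below assumes a zone's member list can mix plain device identifiers with alias names; the names XSAN_STORAGE, raid1_top, and so on are hypothetical placeholders for your own alias and WWNs.

```python
def expand_zone(members, aliases):
    """Return the set of devices a zone contains once alias names are resolved."""
    devices = set()
    for member in members:
        if member in aliases:
            devices |= aliases[member]  # alias: add every device it stands for
        else:
            devices.add(member)         # plain device identifier (e.g. a WWN)
    return devices

# Hypothetical alias and zones; replace the placeholders with real WWNs.
aliases = {"XSAN_STORAGE": {"raid1_top", "raid1_bottom"}}
zones = {
    "Z_edit01": ["XSAN_STORAGE", "edit01_port1"],
    "Z_edit02": ["XSAN_STORAGE", "edit02_port1"],
}

for name, members in zones.items():
    print(name, sorted(expand_zone(members, aliases)))
```

Because every client zone refers to the alias by name, adding a RAID controller to the alias once changes what all of those zones contain.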

Zone by Port, FC Address, or WWN?

That's a conceptual definition of zones. More practically, when you "add a device to a zone," you have a choice. How do you identify that device? On QLogic switches, you can identify by port, by FC address, or by World-Wide Name (WWN).

Zoning by port is very common on smaller SANs. It's an easy way to get more from your fibre switch by making it do double or triple duty. "I want ports 0-3 together for LUN-masked storage, ports 4-7 to share a tape library." Unfortunately, by-port zoning breaks down for larger installations, when relatively frequent device moves are sure to drive you nuts.

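The FC Address is the 24-bit address the switch assigns internally to each device port when that port logs in to the fabric. Every port gets its own address, so a metadata controller cabled with two ports shows up with two FC addresses, and because the address reflects where the device is plugged in, it typically changes when you move a cable. I've never seen this used for zoning, but please comment if you have.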
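For the curious, a fabric address is conventionally read as three bytes: domain (which switch), area, and port. On many switches the area byte corresponds to the physical switch port, which is why this kind of address follows the cable rather than the device. A small Python helper, assuming you have the address as six hex digits such as 010400:

```python
def split_fc_address(fc_id):
    """Split a 24-bit FC address into its (domain, area, port) bytes."""
    if not 0 <= fc_id <= 0xFFFFFF:
        raise ValueError("an FC address must fit in 24 bits")
    domain = (fc_id >> 16) & 0xFF  # identifies the switch within the fabric
    area = (fc_id >> 8) & 0xFF     # on many switches, the physical port
    port = fc_id & 0xFF            # often 0x00 for a device on a fabric port
    return domain, area, port

print(split_fc_address(int("010400", 16)))  # (1, 4, 0)
```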

I recommend using WWNs to set up your zones. A WWN is a 64-bit address assigned to each fibre channel device by the manufacturer. It's called a "World-wide name" because it's supposed to be unique in the world. Like Ethernet MAC addresses, each manufacturer is assigned a prefix, and is supposed to ensure that each device gets a unique suffix. It's even possible to identify devices based on the prefix. (Security experts say WWN zoning isn't the most secure, which is true. But for our Xsan we're looking for stability and ease of configuration, not top security.)
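The manufacturer prefix is an IEEE OUI, and where it sits depends on the WWN's format, which the first hex digit (the NAA type) announces. The sketch below handles the three formats you are most likely to meet in a 64-bit WWN (NAA 1, 2, and 5) and just returns the OUI; looking that OUI up in the IEEE registry to get a vendor name is left out. The example WWNs are made up for illustration.

```python
import re

WWN_RE = re.compile(r"^[0-9a-f]{2}(?::[0-9a-f]{2}){7}$")

def wwn_prefix(wwn):
    """Return (NAA type, OUI) for a 64-bit WWN written as 8 colon-separated bytes."""
    wwn = wwn.strip().lower()
    if not WWN_RE.match(wwn):
        raise ValueError(f"not a 64-bit WWN: {wwn!r}")
    digits = wwn.replace(":", "")  # 16 hex digits
    naa = int(digits[0], 16)       # first hex digit: Network Address Authority type
    if naa in (1, 2):
        oui = digits[4:10]         # NAA 1 and 2: OUI occupies bytes 2 through 4
    elif naa == 5:
        oui = digits[1:7]          # NAA 5: OUI follows immediately after the NAA digit
    else:
        raise ValueError(f"NAA type {naa} is not handled in this sketch")
    return naa, ":".join(oui[i:i + 2] for i in range(0, 6, 2))

print(wwn_prefix("21:00:00:E0:8B:01:02:03"))  # (2, '00:e0:8b')
print(wwn_prefix("50:06:0B:00:00:C2:62:00"))  # (5, '00:60:b0')
```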

The big benefit to zoning by WWN is that you are able to move hosts from cable to cable, port to port, and you won't need to modify the zones. But if you add new storage or clients, or if you swap a RAID controller or a PCI Fibre Channel card, you will need to update the zone set and add the new WWNs appropriately. Otherwise you'll be left wondering why the new device can't communicate with anything else!
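Here is the difference in miniature. In this hypothetical example, an edit station moves from switch port 4 to port 9: a zone defined by ports silently loses it, while a zone defined by WWNs still names the same two devices. The port numbers and WWN labels are invented for illustration.

```python
def members_by_port(cabling, ports):
    """Devices a port-based zone actually contains, given which port each WWN uses."""
    return {wwn for wwn, port in cabling.items() if port in ports}

# Hypothetical cabling: WWN label -> switch port, before and after a move.
before = {"edit01_wwn": 4, "raid_wwn": 0}
after = {"edit01_wwn": 9, "raid_wwn": 0}

port_zone = {0, 4}                     # zone defined by switch port numbers
wwn_zone = {"edit01_wwn", "raid_wwn"}  # zone defined by WWNs

print(members_by_port(before, port_zone) == wwn_zone)  # True: both devices
print(members_by_port(after, port_zone))               # {'raid_wwn'}: edit01 fell out
print(wwn_zone <= set(after))                          # True: both WWNs still present
```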

Setting up your Zone Set

Let's get to work. Open up the SANsurfer Switch Manager or point your web browser to your switch and log in. Wait a few seconds until all the icons on the left turn green, indicating good communication. If you have only a single fibre channel switch, click its icon in the left part of the window. But if you have more than one, you'll need to select the line that says "Stack." After a few more moments, the "Zoning" button in the toolbar will become enabled. Click it, and the "Edit Zoning" window will open.

The left half of this window lists all your zone sets, zones, and aliases. The right half lists every port, FC Address and WWN on your switch. You probably have an "ORPHAN ZONE SET" already defined -- this is a built-in zone set that adopts any user-defined zones which are not members of other zone sets.

So here's the first big step: Using the "Zone Set+", "Zone+", and "Alias+" buttons, make a new zone set, zones for each client, and an alias that look like the following:

You should create a well-named zone for each client on your system. Make a zone for each dedicated metadata controller as well. Don't make a zone for each storage device; we'll put those into the alias.
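If you keep your client list in a spreadsheet or script, you can plan the zone set before touching the switch. This sketch assumes a naming convention of my own choosing (Z_ plus the host name, and an alias called XSAN_STORAGE) and placeholder WWNs; it only produces a plan to enter by hand in the zone editor, it does not talk to the switch.

```python
def plan_zone_set(hosts, alias_name="XSAN_STORAGE"):
    """Build one zone per client or MDC: the shared storage alias plus that host's WWNs."""
    zones = {}
    for host, wwns in hosts.items():
        if not wwns:
            raise ValueError(f"{host} has no WWNs recorded yet")
        zones[f"Z_{host}"] = [alias_name, *wwns]
    return zones

# Placeholder hosts and WWNs; one or two WWNs per host depending on cabling.
hosts = {
    "mdc1": ["10:00:00:00:00:00:00:01", "10:00:00:00:00:00:00:02"],
    "edit01": ["10:00:00:00:00:00:00:11"],
}
for zone, members in plan_zone_set(hosts).items():
    print(zone, members)
```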

Populating the Alias

Next, add the WWNs of your storage controllers to your storage alias. How do you identify them? QLogic does a good job of identifying Xserve RAID controllers itself (they appear as "APPLE Xserve RAID"), as you can see two pictures above. If your switch doesn't clearly identify your storage, use your storage management software to read the WWNs, and search for them in the list. Include controllers that handle only metadata as well. The QLogic zone editor works with drag-and-drop.

When you have all the storage accounted for (double-check!), you can add your storage alias to all client zones, again using drag-and-drop. Expand the client zones to ensure that you remembered to add the alias to each client (triple-check!).

Note: On boot, an Xsan client scans to make sure all storage pools are visible, including storage pools dedicated to metadata and journaling. If you keep your metadata LUNs out of your client zones, you will get errors in the system.log and cvlog.

Adding WWNs to Zones

Make sure all your clients are on, so the switch learns the WWNs. You may have to apply your changes (don't activate a zone set yet!) and reopen the zone editor to see new clients.

The tricky part here is to identify which client is which.

  • If you have a good OmniGraffle diagram of your network, you can add clients by port.
  • If you don't, or if you want to double-check (a great idea), you can look up WWNs in each client's System Profiler.
  • Via SSH, you can look up WWNs with this command: system_profiler | grep "10:00:00"
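If you'd rather collect the WWNs with a script than eyeball grep output, a regular expression for eight colon-separated hex bytes does the job without depending on a particular prefix. This is a sketch: the sample text below is invented and is not real System Profiler output, and a card may report more than one WWN-shaped value (for example a node name and a port name), so compare the results against the switch's list before dragging anything.

```python
import re

# Eight colon-separated hex bytes, not embedded in a longer colon/hex run.
WWN_PATTERN = re.compile(
    r"(?<![0-9A-Fa-f:])(?:[0-9A-Fa-f]{2}:){7}[0-9A-Fa-f]{2}(?![0-9A-Fa-f:])"
)

def find_wwns(text):
    """Return the distinct WWN-shaped strings in text, lowercased, in first-seen order."""
    seen = []
    for match in WWN_PATTERN.findall(text):
        wwn = match.lower()
        if wwn not in seen:
            seen.append(wwn)
    return seen

# Invented sample; the MAC address (6 bytes) is deliberately not matched.
sample = (
    "Example node WWN: 20:00:00:11:22:33:44:55\n"
    "Example port WWN: 21:00:00:11:22:33:44:55\n"
    "Example MAC: 00:11:22:33:44:55\n"
    "Port WWN again: 21:00:00:11:22:33:44:55\n"
)
print(find_wwns(sample))
```

On a client you could feed it real output, for example subprocess.run(["system_profiler"], capture_output=True, text=True).stdout.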

Depending on how many fibre channel cables you use, drag one or two WWNs from the list on the right side to the appropriate zone on the left.

Notice in the right-hand pane: a WWN that belongs to a zone is bold and italic. This makes it easy to identify the WWNs that remain. When you are finished, every WWN should be bold and italic. If not, the device behind any leftover WWN won't be able to communicate once you activate the zone set!

Pre-activation Checklist

  1. Is there a zone for each client and MDC?
  2. Does each zone contain a copy of the storage alias?
  3. Does the storage alias list the correct number of RAID controllers?
  4. Are there any WWNs that are not bold and italic?
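If you've kept the plan in a script, the four checklist questions translate directly into set operations. The sketch below is hypothetical: zones maps zone names to member lists, aliases maps alias names to sets of WWNs, hosts maps each client or MDC to its WWNs, and logged_in stands in for the WWNs the switch lists on the right-hand side. It complements, rather than replaces, a careful look at the zone editor itself.

```python
def preactivation_problems(zones, aliases, hosts, logged_in,
                           storage_alias, expected_controllers):
    """Run the four checklist questions; an empty list means every check passed."""
    problems = []

    # 1. Is there a zone for each client and MDC (holding all of its WWNs)?
    for host, wwns in hosts.items():
        if not any(set(wwns) <= set(members) for members in zones.values()):
            problems.append(f"no zone holds all WWNs of {host}")

    # 2. Does each zone contain the storage alias?
    for name, members in zones.items():
        if storage_alias not in members:
            problems.append(f"zone {name} is missing {storage_alias}")

    # 3. Does the alias list the expected number of RAID controllers?
    count = len(aliases.get(storage_alias, set()))
    if count != expected_controllers:
        problems.append(f"{storage_alias} lists {count} controllers, "
                        f"expected {expected_controllers}")

    # 4. Is any WWN the switch sees left out of every zone (not bold and italic)?
    zoned = set()
    for members in zones.values():
        for member in members:
            zoned |= aliases.get(member, {member})
    for wwn in sorted(set(logged_in) - zoned):
        problems.append(f"WWN {wwn} is not in any zone")

    return problems

# Hypothetical data with two deliberate mistakes: edit02 has no zone, and
# Z_edit01 forgot the storage alias.
aliases = {"XSAN_STORAGE": {"raid_a", "raid_b"}}
zones = {"Z_mdc1": ["XSAN_STORAGE", "mdc1_p1"], "Z_edit01": ["edit01_p1"]}
hosts = {"mdc1": ["mdc1_p1"], "edit01": ["edit01_p1"], "edit02": ["edit02_p1"]}
logged_in = ["mdc1_p1", "edit01_p1", "edit02_p1", "raid_a", "raid_b"]

for problem in preactivation_problems(zones, aliases, hosts, logged_in,
                                      "XSAN_STORAGE", expected_controllers=2):
    print(problem)
```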

Activation

When you are ready to activate the first time, I strongly recommend you have the clients and MDCs shut down. Even little mistakes are effectively the same as taking a wire cutter to your optical cables.

When you click the "Apply" button, you can perform an error check. Then click "Save Zoning." The software will ask if you want to activate a zone set. (Clicking "No" is a good way to save your changes without affecting your fabric.) If you do want to activate your new zone set, make sure you select it in the popup window.

After you've applied your changes and activated the zone set, boot your MDC, checking its system log for any warnings of down stripe groups. If everything looks OK, bring up clients one by one.

Making Changes

Once you've done the work to create the zoning, maintenance is pretty easy. You can safely plug new devices into the switch or stack before configuring zones for them.

  • If you swap a RAID controller, remove the old WWN from the storage alias (it will be grayed out), and drag in the new WWN from the right pane (it will be the one not in bold italics).
  • If you add a new client, create a new zone, then add the storage alias and the client's fibre channel WWNs.
  • If you add new storage, add its WWN to your active MDC's zone. After you label the LUN(s), remove the WWN from the MDC's zone, and add it to the storage alias. Then continue building your storage pool.
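The new-storage steps can be sketched the same way: first the new controller is visible only through the MDC's zone, then, once labeled, it moves into the shared alias and every client zone picks it up. This models the bookkeeping with hypothetical names; it is not a script that drives the switch, and you still save and activate the zone set after each step.

```python
def zones_containing(zones, aliases, wwn):
    """Names of zones whose members, after alias expansion, include wwn."""
    hits = set()
    for name, members in zones.items():
        expanded = set()
        for member in members:
            expanded |= aliases.get(member, {member})
        if wwn in expanded:
            hits.add(name)
    return hits

def stage_new_storage(zones, mdc_zone, new_wwn):
    """Step 1: put the new controller in the active MDC's zone only, for labeling."""
    zones[mdc_zone].append(new_wwn)

def promote_new_storage(zones, aliases, mdc_zone, storage_alias, new_wwn):
    """Step 2, after labeling: move the controller from the MDC zone into the alias."""
    zones[mdc_zone].remove(new_wwn)
    aliases[storage_alias].add(new_wwn)

# Hypothetical starting point: one RAID controller, one MDC, one client.
aliases = {"XSAN_STORAGE": {"raid1"}}
zones = {"Z_mdc1": ["XSAN_STORAGE", "mdc1"], "Z_edit01": ["XSAN_STORAGE", "edit01"]}

stage_new_storage(zones, "Z_mdc1", "raid2")
print(sorted(zones_containing(zones, aliases, "raid2")))  # ['Z_mdc1']
# ...save, activate, and label the new LUN(s) from the MDC...
promote_new_storage(zones, aliases, "Z_mdc1", "XSAN_STORAGE", "raid2")
print(sorted(zones_containing(zones, aliases, "raid2")))  # ['Z_edit01', 'Z_mdc1']
```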

After any modification, make sure you activate the zone set after saving changes. Otherwise your changes will be saved but not applied.

Conclusion

I've successfully implemented this zoning in a 31-seat Xsan. Even though there was enormous initial effort getting the scheme up and running, the SAN is now much easier to manage. Plus, it was an opportunity to get my port maps and client IPs 100% up to date.

Please share your own zoning experiences in the comments below.

--
Aaron Freimark
Tekserve Professional Services, (212) 929-3645 x301

Comments

IHateFriction

Great article, Aaron! I followed your instructions, and everything worked great. Thanks again!

mnouwens

Great article!

We went 1 step further...

There is no need for clients to see the Metadata lun, therefore we did not add the metadata lun to the alias (for user storage)

Instead we have only attached the Metadata lun to the metadata controllers...

Also remember to correctly set Devicescan property to be disabled on initiator ports.

Gr. Michiel Nouwens

---

Micom Service Point
Apple Consultant, troubleshooter
http://www.micomservicepoint.nl

aaron

"There is no need for clients to see the Metadata lun, therefore we did not add the metadata lun to the alias (for user storage)"

That's what I had thought, too. But then my clients began complaining that "Stripe group 0 is DOWN" when booting. Things were otherwise working, but I chose to get rid of the logged errors by including metadata LUNs in the shared alias.

---
Aaron Freimark
http://www.tekserve.com/vcard/af.vcf

Aaron Freimark
CTO, Tekserve

proton

What about later additions to the zone set? For example, when I'm adding a new client, do I need to turn off all servers and clients before activating the zone changes? Or must this be done only the first time you create the zones?

JonThompson

Ok, here's the $10,000 question...

Is it safe to assign an Xsan storage zone to the same port on the Xserve as your Tape Library zone?

Right now, I have them broken up, but I could conceivably get twice the bandwidth if I did it that way.

JonThompson

Wow, almost a year without an answer. Bueller? Bueller?
