Issues with Xsan... Anyone care to comment?

brett's picture

1. I believe there is an obvious bug in the Xsan Admin application when setting up a 'client-only' (non-controller) host via the GUI.

Scenario: We have a functional dedicated Xserve G5 metadata controller (MDC) and a dedicated Xserve G4 backup MDC, both members of a private GbE network and also connected to our public LAN/intranet. When I try to add an additional host to the SAN as 'client-only', the option to force the client to talk to the SAN over the private network is grayed out and cannot be selected. The client defaults to our public LAN/intranet, which causes the volumes to disconnect erratically. I have tried setting the network ports in different priority orders, and neither arrangement resolves the problem. The only way I currently see to force the intended 'client-only' connection is to configure the host as a 'Controller' with priority 'Low'; the private GbE network then becomes selectable.

Ex:
Xsan MDC: Controller, priority 'High'; public Ethernet 172.16.1.81; private Ethernet 192.168.10.1
Xsan backup MDC: Controller, priority 'Medium'; public Ethernet 172.16.1.60; private Ethernet 192.168.10.2
Xsan client: Client (pop-up menu to select the private network address is grayed out); public Ethernet 172.16.1.71; private Ethernet 192.168.10.3

There needs to be a way to bind a 'client-only' Xsan host to the private network, and currently there is not. Unless I am misinterpreting the configuration, this is a severe problem, and I have seen others mention it on discussion boards, both on Apple's website and on sites that focus on Xsan.

2. I believe there is a definite oversight in the Xsan Admin application when it comes to assigning LUNs to storage pools to create Xsan volumes: it does not disable already-used LUNs when multiple SANs share a storage fabric.

Scenario: I have set up two separate, independent SANs that share the same Fibre Channel switch fabric. I have not set up any FC switch zoning to isolate the storage, so both SANs see all available LUNs. There are two separate private metadata networks to handle the metadata traffic. I have discovered that when creating a new volume in Xsan Admin, a LUN already used in an existing pool is still selectable for the other SAN's storage pool, which can have instant impact or lead to instant data loss. This forces the system administrator to manually track which LUNs belong to which SAN. I believe this to be a bug: regardless of whether the FC fabric is zoned, Xsan Admin should show a used LUN with a status of 'used', or gray it out of the list of selectable raw Xserve RAID LUNs.
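In the meantime, the only safeguard seems to be checking LUN labels by hand before assigning anything. Assuming the default install path, Xsan's bundled cvlabel tool will list every labeled LUN a host can see:

sudo /Library/Filesystems/Xsan/bin/cvlabel -l

Running that on a controller from each SAN and comparing the lists is a crude way to spot a LUN that is already claimed.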

3. With an Xsan volume created as a project-archive repository containing QuarkXPress files: if an end user accesses the storage via an AFP share (an Xserve front end to the Xsan volume on the back end), opens a Quark file directly from the server over the network, and tries to save the file, either overwriting it or saving to a new location on the server, then the client, the AFP server, the MDC, and the backup MDC all lock up and must be rebooted to regain stability. This problem is an administrative nightmare.

dgf's picture

As for issue #1: while this may be considered a bug in the Xsan Admin GUI, you can force the client to contact the private IP by modifying

/Library/Filesystems/Xsan/config/fsnameservers

This is the file the SAN file system references for the metadata controllers' IPs.

Use vi, pico, or emacs to replace the public LAN IP of the MDC(s) with that of the private LAN interface.

You will need to be root to modify the file.
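For example, assuming pico:

sudo pico /Library/Filesystems/Xsan/config/fsnameservers

You may need to unmount and remount the SAN volume, or reboot the client, for the change to take effect.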

With regard to issue #3: which version of Xsan are you running? Xsan 1.1 is MUCH better at sharing the SAN volume over AFP, among other improvements.

brett's picture

DGF:

Are you saying that on an Xsan 'client' (non-MDC) the private IPs can be entered in this fsnameservers file, and the client will then obey the private IP assignment instead of switching to the public/intranet address as it does now because of the grayed-out pop-up menu?

I just installed 1.1 today and am still having all three problems (1, 2, and 3).

dgf's picture

The fsnameservers file in your case should read:

192.168.10.1
192.168.10.2

This is the list of MDC IPs to contact for metadata requests.

(Just a note: I typically don't assign .1 as the last octet of a host's IP; that's usually reserved for gateways. But I digress.)

Xsan Admin, in attempting to automatically configure your host's metadata LAN, may have incorrectly populated the client's fsnameservers file with the "public" LAN interface IPs of the MDCs:

172.16.1.81
172.16.1.60

or a mixture of both. If so, you can manually edit the file to match the first example. Your client will then contact the MDCs via the private LAN (192.168.10.x) instead of the public LAN (172.16.1.x).
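A quick way to check what a client currently has (again assuming the default install path):

cat /Library/Filesystems/Xsan/config/fsnameservers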

Just to clarify: is the metadata LAN on a separate physical interface (PCI NIC), or are you multi-homing the built-in Ethernet? I would recommend a separate interface on a separate physical network.

brett's picture

I checked the fsnameservers file and it reads correctly, as you referenced.

Additionally, to answer your question, I do have two independent ports per machine.

Xserve G5:
Port 1 used for a statically assigned public/intranet IP (172.16.1.x)
Port 2 used for a statically assigned private IP (192.168.10.x)

I am still very confused about how a 'client' (non-MDC) host is supposed to participate and be seen in Xsan Admin. When I look at the info for the client, it shows the public/intranet IP address, not the assigned private metadata IP address. Is this right? What is the point of a private network connection for a 'client-only' host if it shows the public address? Are we supposed to take a leap of faith that the client will just know which interface to use? The diagrams Apple provides with the product always show every Xsan host, the MDC and backup MDC as well as the plain old 'client', connected to both the private and public networks. Why is the option to pick a network interface grayed out for a 'client-only' connection?

dgf's picture

Quote:
I checked the fsnameservers file and it reads correctly, as you referenced.

Good. That means your client(s) are making metadata requests over the appropriate network.

Quote:
I am still very confused about how a 'client' (non-MDC) host is supposed to participate and be seen in Xsan Admin. When I look at the info for the client, it shows the public/intranet IP address, not the assigned private metadata IP address. Is this right?

Xsan Admin uses Rendezvous (Bonjour) to discover Xsan clients for configuration purposes. Whichever interface answers first is the one that is used. I get around this by using a proxy Xsan Admin box (not an Xsan client; only Xsan Admin is installed) with only a "public" LAN interface. That way all administration is forced over the 'public' LAN.

Quote:
Are we supposed to take a leap of faith that the client will just know which interface to use?

Perhaps. Or just check the fsnameservers file.

Quote:
Why is the option to pick a network interface grayed out for a 'client-only' connection?

Dunno. I think the idea may be that the fsnameservers entries on a client-only host are pushed down automatically and aren't meant to be configured manually. However, I have seen Xsan Admin populate the fsnameservers file incorrectly.

aaron's picture

In my experience, you can't trust Xsan Admin to report the client IPs accurately. That's not to say the clients connect on the incorrect interface, only that Xsan Admin may not show the one actually in use.

To tell where the metadata is traveling, issue this command from your primary MDC:

sudo lsof -ni4 | grep fsm

You'll get a list of all the clients connected to the MDC. If the IP addresses are in your private range, then all is good.
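The output will look something like this (the PIDs and addresses here are just illustrative):

fsm  407 root  14u  IPv4 0x02f7a268  0t0  TCP 192.168.10.1:49170->192.168.10.3:50212 (ESTABLISHED)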

Finally, you may consider disconnecting your MDCs from the public network completely. Then you'll sleep better at night.
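If physically unplugging them isn't practical, you can at least bring the public interface down from the command line (assuming the public side is en0; substitute the correct interface):

sudo ifconfig en0 down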

Aaron Freimark
Tekserve Professional Services
(212) 929-3645 x301

brett's picture

aaron,

I tried your command in Terminal and it gives an error: command not found. Please advise.

aaron's picture

The font is poor, but the "lsof" is all letters (LiSt Open Files). Try again, perhaps by copying and pasting.
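Here it is again for copying and pasting:

sudo lsof -ni4 | grep fsm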

Aaron Freimark
CTO, Tekserve

xorro's picture

I've looked over this thread carefully. Here's what we have:
Tiger 10.4.1/Xsan 1.1 MDC (dual 1.8 GHz G5, 2 GB RAM)
2 Tiger/Xsan 1.1 clients
2 Panther/Xsan 1.1 clients
All use Apple PCI-X NICs

We are experiencing the following:
Whenever the "public" interface is enabled and up on the MDC, all bets are off and client connectivity becomes very unstable. For the testing described below, the MDC is live only on the built-in en0 at 10.1.1.1.

Whenever a "public" interface is enabled (DHCP or fixed) on EITHER the built-in or the PCI NIC, previously sound communications with the client fail with a 311 communications error. Xsan Admin (running from either the MDC or a monitoring laptop) reports the client changing from, for example, 10.1.1.3/green to 192.168.1.3/red, and communications fail. Depending on circumstances, the SAN volume either stays mounted or becomes unresponsive.

The only way to recover the previous client state is to disable the "offending" Ethernet interface and reboot the client. In most cases Xsan Admin must be quit and relaunched.

Following the instructions in this thread (fsnameservers, etc.), we can ascertain that the clients are on the MDN when this occurs.

Given that scenario, disabling the "public" interface on the MDC does not solve the problem described by Brett and (I think) echoed here. Even so, this has implications: with nothing but the isolated metadata network live, we cannot bind to a directory server. I have been forced to run things as a local NetInfo SAN.

Grateful for any input.

dgf's picture

Is your metadata network a separate physical network from the public network? I have my metadata network on a separate gig-e switch with no physical connectivity to the public LAN and have never experienced any of these issues. But, I'm also running 10.3.9 and Xsan 1.1 exclusively. What type of switch(es) (brand, model, managed, unmanaged) are you using?

xorro's picture

The networks are completely isolated:

Public/outer LAN: Asante 10/100, 24-port
MDN: 3Com gigabit, 24-port

We have alternated these to see if it had any effect, running the MDN through the Asante, etc. No change.

We will be swapping out both switches for replacements after the holiday weekend to see if that changes anything.

It's worth noting that we tested the same rig under Xsan 1.0 and 1.0.1 and had the SAN up with no problems. We suspect the introduction of 1.1.

xorro's picture

Swapping in two new switches had no effect. The networks are fully isolated. The MDN is 10.1.1.x, static. The outer network is 192.168.1.x, dynamic.

Introducing a public LAN on the MDC (10.4.1/Xsan 1.1) with a fixed 192.168.1.x address causes havoc. Specifically, in the Xsan Admin client views the clients switch over to the 192 range, turn to red buttons, and become unresponsive with the 311 error.

With the MDC isolated, we can reproduce the effect on a Tiger client by activating the PCI NIC with DHCP.

We are rolling back to 1.0.1 to test further.

xorro's picture

I'm beginning to suspect that this phenomenon has to do with using a Power Mac G5 (dual 2 GHz, 1.5 GB RAM) as an MDC. Does anyone have a working Power Mac G5 SAN?

Thanks!

dgf's picture

Quote:
Does anyone have a working Power Mac G5 SAN?

Haven't tried. Is it running Server?

Another thought: is your 10.1.1/24 network the only one with a gateway entered? If the 192.168.1/24 interface has a gateway as well, perhaps it is trying to route metadata traffic elsewhere.
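You can check from the MDC with:

netstat -rn

There should be only one default route, and the metadata network should show up as a directly connected route with no gateway.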

bforcier's picture

Please advise of your results regarding a Power Mac G5 as MDC/sMDC. I have a client who absolutely wants to do exactly that for the sMDC, and I don't want to run into too many hassles...

Thanks

xorro's picture

Using a gateway on the 10.x.x.x network did not improve this, although I thought it was a smashing idea.

So, it would appear this is not an isolated phenomenon. See this link in Apple's Discussions:

http://discussions.info.apple.com/webx?14@397.IzNDaxXHR5c.1@.68b2d089

The solution recommended by some is to configure clients as low-priority controllers, which lets the administrator select the NIC used for the MDN.

xorro's picture

Our lab SAN is now stable with a dual 1.8 GHz G5 as MDC. We have destroyed and remade the SAN as 10.3.8/1.0.1, 10.3.9/1.1, and 10.4.1/1.1.

Is anyone able to comment on the wisdom or implications of the client-as-low-priority-controller approach?