MDCs both in standby, volume won't mount

Had some serious funkiness with our SAN volume, with lots of inode issues. I ran cvfsck -j, then -nv, then -wv, which seems to have cleaned up all the errors. After trying to start the volume again, it immediately went into an RPL_upgrade process. A couple of times, that failed near the end (the fsm process crashed), then the volume would fail over to the other MDC and start the whole RPL process again. I've seen a "completed" message at least three times now, but the volume still won't mount correctly. Both MDCs have their fsm processes running in standby mode now, though the volume appears to be started and hosted by the master in Xsan Admin. Mounting the volume anywhere just causes another crash.

After running cvadmin activate on the master MDC, it eventually tells me:

Admin Tap Connection to FSM failed

This also kicks off another round of RPL upgrades.

I'm hoping I don't have to completely restore from our backup, because that'll take days… Any suggestions greatly appreciated. Thank you!

hernand's picture

Xsan displays incorrect LUN size

I have a problem: Xsan displays an incorrect LUN size. As you can see, the Disk Utility app shows the correct configuration, but Xsan Admin displays one LUN of 20TB (I don't know where it came from) and another of 42TB (XsanLUN1).
I think it should show 2 LUNs of 20TB each, according to what the Disk Utility app shows.
The drive began experiencing read/write problems, and I think that may be part of this issue.
Can you help me?




thomasb's picture

Qlogic getting out of the Fibre Channel switch business


Did anybody else notice this news?

QLogic's save-the-biz pitch: Ditch the switch glitch - get rich
Server adapter firm to focus on, er, server adapters...

We've had three Qlogic 5800 switches "bricked" in under a year at work. Two of them died because of a power failure, and one simply wouldn't turn on again after being switched off. But isn't a power failure similar to turning it off? I know there is a shutdown command in the command-line interface, but it's very well hidden in some old manuals for the SANbox switches. We have always simply cut the power to turn off Qlogic 5600/5800 switches without issues, and suddenly this year three 5800 switches die when the power is cut. Not good.

Anybody else had issues with 5800-switches going into "brick mode" with the heartbeat LED constantly lit? Pushing the maintenance mode button doesn't help. It just tries to reload, and then it's stuck again.

The 5600 switches we have, though, have been running solid since 2006, without any serious issues.

Time to move on I guess.

ygini's picture

AFP and Barracuda LB 340, not stable?



I have an Xsan setup with two 10.8 servers used for AFP (and, for a few systems, SMB), and I'm trying to use a Barracuda LB 340 to meet our HA requirements.

We don't need a lot of bandwidth, but we do need an HA system that's easy for end users.

The LAN is a flat network, so I've configured the LB 340 in TCP proxy mode.

Most of the time it works, but users connected to the AFP share through the LB experience problems: files opening read-only, being unable to copy files from one share to another, etc.

The same users connected directly to one of the OS X servers don't have these problems.

So I'm wondering: how do you handle your HA setup, and if you use Barracuda products, how do you configure them?

matx's picture

Promise SANLink2

Just noticed that Promise has released information about their Thunderbolt 2 based products, the SANLink2 fibre channel adapter and their Pegasus2 RAIDs.

Not sure when they'll be officially released to market, but we finally have a dual 8Gb Thunderbolt-to-Fibre-Channel adapter from Promise.

Xsan AFP reshare problems

Hi all,

Firstly, sorry for cross-posting, but I did already post this over on [url=]Creative Cow[/url].

I've been hitting a huge problem with our Xsan reshare this week. I'm getting to the end of my rope pretty fast, so any insights are greatly appreciated.

Our setup:

Xsan 2.3
2x MDCs @ 10.7.5
4x (ActiveStorage) head raids + 1x head chassis for metadata and other data
Qlogic 5600 switches
1x beefy Mac Pro AFP reshare @ 10.7.5
- 12-core @ 2.4GHz
- 64GB RAM
- 6x 4Gb FC link
- 2x 10Gb Ethernet bonded link via SmallTree card (lab1)
- 6x 1Gb Ethernet bonded link via SmallTree card (lab2)
Lab 1 = 40x 2013 iMacs @ 10.8.4
Lab 2 = 17x 2012 iMacs @ 10.8.4

The media that the students are using is primarily old DV-NTSC footage or AVCHD right off our new cameras. We also have several students building simple websites, and they store all their source files and images on this volume. The volume is definitely able to handle these data rates, so the problem must be with my reshare configuration. After a server reboot, the labs all perform fine, but as the day wears on, we eventually reach a point where new users can no longer connect to the AFP reshare. One class arrives, logs in, edits, logs out, (repeat), but by the fourth class, the students get a nasty prompt that the server is unavailable. Users who are already connected can continue working just fine.

In addition to that, we occasionally have users experience choppy connections to the server. Their footage will suddenly just pause for about 30 seconds. A reboot of the client workstation usually clears this up, but not always (it'll crop up again). When I check the AFP connection list on the server, I usually see two or more duplicate connections from the workstation in question, the older "stale" ones all with increasing idle times while the active connection shows 0 idle time.

When new connections can no longer be made to the server, that's usually the point where I need to run and forcibly shut it off. Even after a fresh reboot of the server and the clients, things aren't perfect. I'm now seeing constant metadata traffic between my reshare server and the acting MDC. It's all very small data (<1MB/s), but it's constant, even when no users are connected via AFP. My acting MDC also has >100% CPU load for the fsm process, which currently boasts 299 threads (is this normal?). The kernel process on the reshare server is constantly at about 20% CPU, and the AppleFileServer process always has about 204 threads (seems high, right?).

Some frequent log messages I'm seeing on the reshare server are:

Oct 24 18:10:39 reshare-server kernel[0]: add_fsevent: unable to get path for vp 0xffffff80881e67c0 (-UNKNOWN-FILE; ret 22; type 2)
Oct 24 18:10:39 reshare-server kernel[0]: add_fsevent: unable to get path for vp 0xffffff80881e67c0 (-UNKNOWN-FILE; ret 22; type 4)
Oct 24 19:03:51: --- last message repeated 1 time ---

Oct 24 19:55:23 reshare-server AppleFileServer[958]: received message with invalid client_id 1583
Oct 24 19:55:25 reshare-server AppleFileServer[958]: _Assert: /SourceCache/afpserver/afpserver-585.7/afpserver/AFPRequest.cpp, 2006
Oct 24 19:55:25: --- last message repeated 2 times ---

Oct 24 20:33:01 reshare-server AppleFileServer[958]: MDSChannelPeerCreate: (os/kern) invalid argument
Oct 24 20:33:51: --- last message repeated 1 time ---

The first group of log messages above seems to relate to when someone has experienced a bad (stale) connection, though I can't confirm this. After said client reboots, these message cease. I was really happy when I made it the whole morning without seeing any of them, but then they started cropping up again. All these messages tend to increase with more use of the client systems, but I can't pinpoint the cause.
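If it helps anyone correlate these, one quick way to see whether a single stuck vnode (i.e. one client's stale connection) dominates those add_fsevent errors is to tally them by vnode pointer. A minimal sketch; the function name is made up, and the log file is whatever holds your kernel messages:

```shell
# Hypothetical helper: count add_fsevent errors per vnode pointer in a log
# file, most frequent first. A pointer that repeats heavily suggests one
# stuck vnode rather than a general filesystem problem.
tally_fsevent_vps() {
  grep 'add_fsevent: unable to get path' "$1" \
    | grep -o 'vp 0x[0-9a-f]*' \
    | sort | uniq -c | sort -rn
}

# Usage: tally_fsevent_vps /var/log/system.log
```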

I've been debating converting this setup to NFS instead of AFP, but several of our workflows rely pretty heavily on the ACLs that are in place on this volume. My understanding is that Apple's nfsd doesn't pass the ACLs along to the clients. We tried NFS for a small subset of machines for a semester, but AFP was much easier and gave us our ACLs. I also have another almost identical reshare server ready and waiting to go in the rack, but I'd like to make sure I'm heading in the right direction before I throw more hardware at the situation.

Thank you!

Sam Edwards's picture

Any Mavericks out there?

Hey Folks,
I see Apple advertises Xsan with the new Server v3. Has anybody tried it? Sorry if I missed another thread on this subject.

Chanmax07's picture

Xsan extend volume problem with total free space


I have an Xsan with 2 MDCs (Mountain Lion), 1 Promise E-Class, 1 J-Class, and 2 volumes.

Recently I needed to add space and more bandwidth.

I added 1 E-Class and 1 J-Class to my fabric switch, with four LUNs.

I extended the first volume in my SAN. After that, my volume is up to 32TB.

The free space of the volume is wrong: I see 9TB of free space in Xsan Admin, but I have only 9TB of data in my volumes.

I think the volume extension created a mirror of my LUNs.

Can you confirm that? And what is the method to expand a volume with a new controller and more space?

Kind regards

Gerard's picture

StorNext Reshare (SMB vs NFS)



In my current environment, we are running
Xsan 3.0
Few fiber switches (5600s)
Few RAIDs (Promise E & Js)
Two MDCs (Xserves)
MDCs and fiber-clients are running OSX 10.6.8

We are looking into several new solutions, and StorNext is one of them. A lot of our users reshare, via AFP, from an Xserve that has the Xsan volume attached.

If we proceed with StorNext, we can either go with a SMB reshare over a Windows Server (Windows 7 Server) or a NFS reshare over a Linux server (Red Hat Enterprise).

In my experience, SMB is more stable since it will group multiple requests together, reducing network overhead, while NFS will offer better speeds.

With a large group of reshare users (fifty) accessing large Adobe files (Photoshop, InDesign, etc.) at 4Gb/s and up, which in your opinion would be the best reshare protocol to use?
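To frame the NFS option a bit: on the Red Hat side, the reshare itself would just be an /etc/exports entry over the mounted StorNext volume. A minimal sketch, assuming the volume is mounted at /stornext/snfs1 and the clients sit on 10.0.0.0/24 (both values are placeholders, and ACL behavior over NFS would need testing in your environment):

```
/stornext/snfs1  10.0.0.0/24(rw,sync,no_subtree_check)
```

sync trades some write speed for safety, and subtree checking is usually disabled for whole-filesystem exports; whether you also need options like no_root_squash depends on your workflow.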


Disaster recovery using cvfsdb (spaces issues)

Hi All

Have a customer with an Xsan volume that went bad; it won't mount anymore.

We can retrieve files using the method described on this page.

Running commands like this works fine:

[code]echo "save 0x3c4de8 /Volumes/restore/Projects/moonshine" | cvfsdb Testvol[/code]

My main issue is with folder paths that have spaces in them (tell me about naming guidelines!).

If the destination has a space in the folder name, it doesn't work

So if we have a directory structure like VolumeRoot/Face/To Sound/

I can't retrieve files to it

running this

[code]echo "save 0x357c51f /Volumes/Restore1/Face/To\ Sound/test2.dv" | cvfsdb XSAN[/code]

gives: [code]syntax error. Enter "help" or "?" for help[/code].

While if I eliminate the space, it works fine like below

[code]echo "save 0x357c51f /Volumes/Restore1/Face/To-Sound/test2.dv" | cvfsdb XSAN[/code]

I've tried various options for the syntax, but none seems to work with the spaces

Needless to say, the volume structure is full of spaces (as well as &, /, and non-Latin characters!), so it's close to impossible to rename all such folders.

Any help with including non-Latin characters in this command syntax would also be appreciated.
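One workaround that sidesteps the quoting problem entirely: save each inode to a space-free scratch path first, then mv the file into the real destination, since mv has no trouble with spaces or non-Latin names. A minimal sketch; the function name, inode, and paths are illustrative, and it assumes cvfsdb accepts the plain space-free path exactly as in the working example above:

```shell
# Hypothetical wrapper: restore an inode via cvfsdb using a temporary
# space-free path, then move the result into place. cvfsdb never sees
# the spaces; mv handles any characters in the final name.
restore_inode() {
  inode="$1"; volume="$2"; dest="$3"
  tmp="$(mktemp -u /tmp/cvfsdb-restore.XXXXXX)"   # space-free scratch file
  echo "save $inode $tmp" | cvfsdb "$volume" || return 1
  mkdir -p "$(dirname "$dest")"                   # target dirs may contain spaces
  mv "$tmp" "$dest"
}

# Example (illustrative values):
# restore_inode 0x357c51f XSAN "/Volumes/Restore1/Face/To Sound/test2.dv"
```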


