Andrew Allen's picture

Unable to Demote or Make Metadata Controllers

I have recently taken over managing an Xsan system. The system is composed of 2 (soon to be 3) SANs, 3 metadata controllers and 11 client machines. MDC1 (MetaData Controller 1) hosts SAN1, and MDC2 hosts SAN2. They fail over to each other. The third controller is also the Final Cut Server. It is the last failover. The SANs, clients and MDCs are all connected via a QLogic fibre switch. The switch is zoned such that each client is in its own zone with aliases for the 2 (soon to be 3) SANs.

This Xsan system was incredibly messy. The clients were running 4 different versions of Xsan (2.1, 2.2, 2.2.2, and 3.0 on one client). Two of the 3 controllers were running Mountain Lion to match the one edit system running Mountain Lion. However, the third controller was never updated from Snow Leopard 10.6.4. The rest of the clients were running various flavors of Snow Leopard.

We need to upgrade to Mavericks to use the newest Adobe Creative Cloud. There are two workstations we want to run Mavericks so they can use the newest CC. We want to keep the rest of the workstations on Snow Leopard and Final Cut 7. We have updated all of the client systems to Snow Leopard 10.6.8 and Xsan 2.2.2. One of these clients will be moved to Mavericks eventually. There is also a 10.8.5 Mountain Lion machine that will be moved to Mavericks.

In order to move to Mavericks we, of course, need to upgrade our MDCs to Mavericks and Xsan 3.1. We have done this as instructed. While I'm sure such a ragtag mixture of OS and Xsan versions is not ideal, the system has functioned this way for years.

The problem we have is that we need a new MDC to host SAN3. We have upgraded one of the clients to Mavericks in hopes of making it an MDC and demoting the old Final Cut Server (which we do NOT want to upgrade to Mavericks).

However, the action options to Remove a Computer from the SAN, Promote to Metadata Controller, Make Client and . . . something else, are all grayed out.

I am well aware that in order to perform these actions, all of the clients and metadata controllers must be turned on and connected to the SAN. They are! The existing SANs are visible and functional on each and every client. Xsan seems to think, however, that some machine isn't on the network. I assume that's the problem, because it's acting as if one of the machines is not connected to the SAN and is not allowing us to perform the actions that require all the MDCs/clients to be online.

 

My hunch is that the problem is the Final Cut Server. It is a 10.6.8 MDC, while MDC1 and MDC2 are Mavericks controllers. Surely this is causing a problem, but we don't want to upgrade the Final Cut Server to Mavericks because it's the Final Cut Server and we use Final Cut 7. (Can anyone confirm that Final Cut Server 7 works properly on Mavericks?)

So we're in a pickle. We can't demote the Final Cut Server or make a new MDC. We can't remove it from the SAN either, because the option to do so is greyed out.

We have not actually added the third SAN in Xsan yet, but the volumes are visible in Xsan and Disk Utility and are properly configured for our setup. I just haven't made a third SAN in Xsan yet because we wanted to have the new Mavericks MDC in place to host it.

What can be done to get us out of this position? I've been trying to integrate this new SAN all week and we've been hung up on this for a long time.

Any thoughts and help are much appreciated.

aaron's picture

Mac Pro Tricks

A Mac Pro Easter egg discovered by Robert Hammen.

 

abstractrude's picture

Maya and 2013 Mac Pro

Some articles have been floating around implying that the new Mac Pro GPUs were underperforming. A new article from Ars Technica shows that the issue lies with Maya and not the new Mac Pro.

While this issue still needs to be resolved for some programs like modo, some quick sleuthing by Apple and AMD shows that the problem was on the Autodesk side. Maya was simply not querying the GPU RAM.

More at this link:

http://arstechnica.com/apple/2014/02/2013-mac-pro-firepro-d700-opengl-is-better-than-we-thought-it-was/

 

Apple Knowledge Base's picture

AirPlay mirroring freezes or drops connections on an 802.11 b/g network (Apple KB)

When using a MacBook Pro (Late 2013) or Mac Pro (Late 2013) with AirPlay mirroring on an 802.11 b or g network, the TV image might freeze or the connection might be dropped.

Read more: http://support.apple.com/kb/TS5316

DanRDT's picture

Thunderbolt Rack

We have just purchased our first 16-bay Thunderbolt rack for a customer project.

The unit was easy to set up, and testing on it worked perfectly.

The unit will be upgraded to TB2 in Q2 of this year, so this will bring extreme speeds for data transfer.

Looking forward to using more. 

Apple Knowledge Base's picture

OS X: When your Mac doesn't sleep or wake (Apple KB)

Some features are designed to prevent your Mac from going to sleep. If your Mac doesn't sleep or wake as expected, review the steps in this article.

Read more: http://support.apple.com/kb/TS5357

Apple Knowledge Base's picture

Boot Camp: Press any key message appears while installing Windows 8 using DVD media (Apple KB)

When installing Windows 8 on an iMac (Late 2013) or Mac Pro (Late 2013), installation may not complete if you use an optical disc for installation. Instead, you are prompted to "press any key" or the computer restarts back to OS X.

Read more: http://support.apple.com/kb/TS5373

vijay-kumar's picture

All files and real folders disappear after running cvfsck -nv and -wv commands

Can anyone help me out with this problem and recover the volume data?

serged's picture

Xsan on Mavericks: FSMPM dies randomly on 2 MDCs

We have a new install with 2 Mac minis, Promise Thunderbolt storage and OS X 10.9.1.

Randomly, the fsmpm seems to die and restart. This happens on both MDCs, and a failover is triggered if the one failing owns the volume.

First Mavericks install...

THX
 

Serge.

 xsand[30]: Unable to connect to local FSMPM

 kernel[0]: Reconnecting to local portmapper on host '127.0.0.1'

[20140130 13:48:07] 0x7fff777e6310 (debug) PortMapper: FSD on port 49170 disconnected.
[20140130 13:48:07] 0x7fff777e6310 (debug) PortMapper: FSS 'SANVOL01' disconnected.
[20140130 13:48:07] 0x7fff777e6310 (debug) PortMapper: kicking diskscan_thread 4389867520.
[20140130 13:48:07] 0x7fff777e6310 (debug) FSS: State Change 'SANVOL01' REGISTERED: (no substate) -> DYING: (no substate) , next event in 60s (/SourceCache/XsanFS/XsanFS-508/snfs/fsmpm/fsmpm.c#5597)
[20140130 13:48:07] 0x105a81000 INFO Starting Disk rescan
[20140130 13:48:07] 0x105a81000 (debug) Disk rescan delay completed
[20140130 13:48:07] 0x7fff777e6310 (debug) PortMapper: new_input authenticating protocol type(134) pmt_type(0) FSD(127.0.0.1)...
[20140130 13:48:07] 0x7fff777e6310 (debug) PortMapper: Local FSD client is registered, on port 49170.
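
When a log like this runs to thousands of lines, it can help to pull out only the FSS state transitions (for example REGISTERED -> DYING) and their timestamps. What follows is a minimal Python sketch, not an Apple or Quantum tool. The line layout it expects is inferred only from the excerpt above, and the names `parse_line` and `state_changes` are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout, inferred only from the excerpt above:
#   [YYYYMMDD HH:MM:SS] <thread id> <(debug) or INFO> <message>
# The leading "[" is optional because pasted logs sometimes lose it.
LINE_RE = re.compile(
    r"^\[?(?P<ts>\d{8} \d{2}:\d{2}:\d{2})\]\s+"
    r"(?P<thread>0x[0-9a-fA-F]+)\s+"
    r"(?P<level>\(\w+\)|[A-Z]+)\s+"
    r"(?P<msg>.*)$"
)

# Matches messages such as:
#   FSS: State Change 'SANVOL01' REGISTERED: (no substate) -> DYING: (no substate) ...
STATE_RE = re.compile(
    r"FSS: State Change '(?P<volume>[^']+)' "
    r"(?P<old>\w+): .*? -> (?P<new>\w+):"
)


def parse_line(line):
    """Return a dict for one log line, or None if the line has another shape."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m.group("ts"), "%Y%m%d %H:%M:%S"),
        "thread": m.group("thread"),
        "level": m.group("level").strip("()").lower(),
        "message": m.group("msg"),
    }


def state_changes(lines):
    """Collect (time, volume, old_state, new_state) for every state change."""
    changes = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        m = STATE_RE.search(entry["message"])
        if m:
            changes.append(
                (entry["time"], m.group("volume"), m.group("old"), m.group("new"))
            )
    return changes


# Lines copied from the excerpt above.
SAMPLE = [
    "[20140130 13:48:07] 0x7fff777e6310 (debug) PortMapper: FSS 'SANVOL01' disconnected.",
    "[20140130 13:48:07] 0x7fff777e6310 (debug) FSS: State Change 'SANVOL01' "
    "REGISTERED: (no substate) -> DYING: (no substate) , next event in 60s "
    "(/SourceCache/XsanFS/XsanFS-508/snfs/fsmpm/fsmpm.c#5597)",
    "[20140130 13:48:07] 0x105a81000 INFO Starting Disk rescan",
]

if __name__ == "__main__":
    for when, volume, old, new in state_changes(SAMPLE):
        print(f"{when:%Y-%m-%d %H:%M:%S}  {volume}: {old} -> {new}")
```

Feeding it the complete log from each MDC and comparing the transition times against the moments fsmpm restarts may show whether the volume goes DYING before or after the local portmapper connection drops.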

Mavericks, MDS, random failover

Hi all,

I was awoken abruptly by a message stating that my Xsan volume had failed over. I got up to investigate, but can't find any telltale signs, other than some Spotlight oddness. I recently rebuilt our SAN volume fresh under Mavericks. Prior to this, I had always disabled Spotlight, but I read over at Krypted.com that Spotlight has drastically improved for Xsan 3, and that there should be no reason not to enable it.

Checking my server stats and logs, I see that my acting MDC ramped up to a steady 20% CPU a few days ago. That didn't subside until the failover this morning. Looking at the logs, I can't see anything that corresponds with that much CPU usage. The secondary MDC (now hosting the volume) also had some serious CPU usage following the failover, caused mostly by Spotlight processes. They were SERIOUSLY kicking the CPU. We're talking total CPU usage in the neighborhood of 60% for the entire box. Eventually, mds subsided and things are back to normal on the secondary, but I'll be damned if I can make sense of this.

Think I should just disable Spotlight on this volume and rest easy? Some logs are below. I'm particularly concerned about the inode errors at the end of it all.

As always, thanks for any input!

Pete

Primary MDC (during the failover, nothing of note before this):

1/31/14 3:41:54.000 AM kernel[0]: Reconnecting to local portmapper on host '127.0.0.1'
1/31/14 3:41:54.000 AM kernel[0]: Local portmapper OK
1/31/14 3:41:54.269 AM KernelEventAgent[70]: tid 54485244 received event(s) VQ_NOTRESP (1)
1/31/14 3:41:54.269 AM KernelEventAgent[70]: tid 54485244 type 'acfs', mounted on '/Volumes/Xsan', from '/dev/disk14', not responding
1/31/14 3:41:54.270 AM KernelEventAgent[70]: tid 54485244 found 1 filesystem(s) with problem(s)
1/31/14 3:41:55.000 AM kernel[0]: Reconnecting to FSS 'Xsan'
1/31/14 3:41:55.269 AM fsmpm[329]: PortMapper: Initiating activation vote for FSS 'Xsan'.
1/31/14 3:41:56.800 AM fsmpm[329]: PortMapper: Starting FSS service 'Xsan[0]' on crosby.commarts.wisc.edu.
1/31/14 3:41:56.800 AM fsmpm[329]: PortMapper: Started FSS service 'Xsan' pid 70870.
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001440b6b lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001440b70 lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001440b7f lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001439f6c lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001439f5f lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001439f5b lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001439e8a lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001439e83 lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x10000014321cc lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x10000014321ce lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:03.000 AM kernel[0]: Reconnect successful to FSS 'Xsan' on host '10.1.226.66'.
1/31/14 3:42:03.000 AM kernel[0]: Using v2 readdir for 'Xsan'
1/31/14 3:42:03.195 AM fsmpm[329]: PortMapper: Reconnect Event for /Volumes/Xsan
1/31/14 3:42:03.195 AM fsmpm[329]: PortMapper: Requesting MDS recycle of /Volumes/Xsan
1/31/14 3:42:03.195 AM KernelEventAgent[70]: tid 54485244 received event(s) VQ_NOTRESP (1)
1/31/14 3:42:43.330 AM mds[63]: XSANFS_FSCTL_SpotlightRPC fsctl failed (errno = 12)
1/31/14 3:42:43.330 AM mds[63]: ERROR: _MDSChannelInitForXsan: _XsanCreateMDSChannel failed: 12
1/31/14 3:42:43.340 AM mds[63]: (Warning) Volume: vsd:0x7fa0a38b5e00 Open failed. failureCount:0 (null)

Secondary MDC (during failover):

1/31/14 3:26:32.534 AM secd[503]: SecErrorGetOSStatus unknown error domain: com.apple.security.sos.error for error: The operation couldn’t be completed. (com.apple.security.sos.error error 2 - Public Key not available - failed to register before call)
1/31/14 3:26:32.534 AM secd[503]: securityd_xpc_dictionary_handler EscrowSecurityAl[1230] DeviceInCircle The operation couldn’t be completed. (com.apple.security.sos.error error 2 - Public Key not available - failed to register before call)
1/31/14 3:41:53.864 AM KernelEventAgent[71]: tid 54485244 received event(s) VQ_NOTRESP (1)
1/31/14 3:41:53.864 AM KernelEventAgent[71]: tid 54485244 type 'acfs', mounted on '/Volumes/Xsan', from '/dev/disk14', not responding
1/31/14 3:41:53.865 AM KernelEventAgent[71]: tid 54485244 found 1 filesystem(s) with problem(s)
1/31/14 3:41:54.000 AM kernel[0]: Reconnecting to FSS 'Xsan'
1/31/14 3:41:54.864 AM fsmpm[332]: PortMapper: Initiating activation vote for FSS 'Xsan'.
1/31/14 3:42:01.000 AM kernel[0]: Reconnect successful to FSS 'Xsan' on host '10.1.226.66'.
1/31/14 3:42:01.000 AM kernel[0]: Using v2 readdir for 'Xsan'
1/31/14 3:42:01.578 AM mds[64]: XSANFS_FSCTL_SpotlightRPC fsctl failed (errno = 35)
1/31/14 3:42:01.578 AM fsmpm[332]: PortMapper: Reconnect Event for /Volumes/Xsan
1/31/14 3:42:01.578 AM mds[64]: ERROR: _MDSChannelXsanFetchAccessTokenForUID: _XsanFetchAccessToken failed: 35
1/31/14 3:42:01.578 AM KernelEventAgent[71]: tid 54485244 received event(s) VQ_NOTRESP (1)
1/31/14 3:42:01.578 AM fsmpm[332]: PortMapper: Requesting MDS recycle of /Volumes/Xsan
1/31/14 3:42:01.578 AM mds[64]: (Error) Message: MDSChannel RPC failure (fetchQueryResultsForContext:) [no channelAccessToken]
1/31/14 3:42:01.579 AM mds[64]: (Error) Store: {channel:0x7fb209709ef0 localPath:'/Volumes/Xsan'} MDSChannel failed -- initiating recovery
1/31/14 3:42:01.580 AM fsm[334]: Xsan FSS 'Xsan[1]': Node 10.1.226.67 [1] does not support Directory Quotas. DQ limits will not be enforced on this client.
1/31/14 3:42:01.581 AM fsm[334]: Xsan FSS 'Xsan[1]': Node 10.1.226.139 [3] does not support Directory Quotas. DQ limits will not be enforced on this client.
1/31/14 3:42:01.581 AM fsm[334]: Xsan FSS 'Xsan[1]': Node 10.1.226.61 [4] does not support Directory Quotas. DQ limits will not be enforced on this client.
1/31/14 3:42:41.686 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:41.686 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:41.719 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:41.720 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:41.721 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:41.721 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:41.729 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:41.729 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:41.754 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:41.755 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:42.480 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:42.480 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:42.923 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:42.923 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:43.235 AM mds[64]: (Warning) DiskStore: vsd:0x7fb20c01f600 Reindexing /Volumes/Xsan because the volume UUID (B47765A5-AEF7-4E4F-81C6-4AF9905FEAF6) is not the expected UUID (9E593104-F333-4734-8FBA-92E75B5D59B4)

Then tons of errors similar to the following:

...
1/31/14 3:56:21.000 AM kernel[0]: Sandbox: mdworker(66241) deny file-write-create /Volumes/Xsan/Users/Staff/joeuser/iPhoto Library S11/.ipspot_update.sb-3f48c7b2-0842vk
...
1/31/14 4:01:32.560 AM mdworker[66185]: (Normal) Import: Using too many resources after 8640 files (wired: 0 resident: 43242 swapped: 0 regions: 2078), hit usage threshold importing /Volumes/Xsan/Users/Staff/joeuser/WFF Archive/WFF 2010 Spot/Digidesign Databases, exiting to clean up now.
1/31/14 4:01:32.643 AM mdworker[66184]: (Normal) Import: Using too many resources after 8576 files (wired: 0 resident: 35498 swapped: 0 regions: 2077), hit usage threshold importing /Volumes/Xsan/Users/Staff/joeuser/WFF Archive/WFF SPOT 08/Web stills/Web icons, exiting to clean up now.

Eventually wrapping up with:

1/31/14 4:11:13.637 AM mdworker[66963]: (Normal) Import: Using too many resources after 1984 files (wired: 0 resident: 19441 swapped: 0 regions: 2072), hit usage threshold importing /Volumes/Xsan/Users/Grads/joeuser2/poster.tif, exiting to clean up now.
1/31/14 4:11:27.366 AM mdworker[66891]: (Normal) Import: Using too many resources after 2048 files (wired: 0 resident: 3588 swapped: 0 regions: 2072), hit usage threshold importing /Volumes/Xsan/Users/Undergrads/joeuser3/Adobe Media Cache/Media Cache Files/301B-CAR 48000.pek, exiting to clean up now.
1/31/14 4:13:22.449 AM fsm[334]: Xsan FSS 'Xsan[1]': _Inodelookup invalid inode [0x0]
1/31/14 4:13:22.449 AM fsm[334]: Xsan FSS 'Xsan[1]': _Inodelookup invalid inode [0x0]
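
To see which process was busiest around a failover, Console output in this format can be split on its timestamps and tallied per process. The sketch below is a rough Python helper rather than anything shipped with Xsan. The entry layout (`M/D/YY H:MM:SS.mmm AM process[pid]: message`) is assumed from the log lines in this post, and the names `split_entries` and `count_by_process` are made up for illustration. It also copes with pasted text that has lost its line breaks, since it keys on the timestamp and not on newlines.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed entry header, based on the Console lines in this post:
#   1/31/14 3:41:54.269 AM KernelEventAgent[70]: <message>
ENTRY_RE = re.compile(
    r"(?P<ts>\d{1,2}/\d{1,2}/\d{2} \d{1,2}:\d{2}:\d{2}\.\d{3} [AP]M) "
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: "
)


def split_entries(text):
    """Split pasted Console text into one dict per log entry.

    Anything before the first timestamp is ignored. Free text that follows
    an entry without a timestamp of its own stays attached to that entry.
    """
    matches = list(ENTRY_RE.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append({
            "time": datetime.strptime(m.group("ts"), "%m/%d/%y %I:%M:%S.%f %p"),
            "process": m.group("proc"),
            "pid": int(m.group("pid")),
            "message": text[m.end():end].strip(),
        })
    return entries


def count_by_process(entries):
    """Count how many entries each process logged."""
    return Counter(entry["process"] for entry in entries)


# A few entries copied from the primary MDC log in this post, run together
# on purpose to show that line breaks are not required.
SAMPLE = (
    "1/31/14 3:41:54.000 AM kernel[0]: Reconnecting to local portmapper on host '127.0.0.1' "
    "1/31/14 3:41:54.000 AM kernel[0]: Local portmapper OK "
    "1/31/14 3:41:54.269 AM KernelEventAgent[70]: tid 54485244 received event(s) VQ_NOTRESP (1) "
    "1/31/14 3:41:55.269 AM fsmpm[329]: PortMapper: Initiating activation vote for FSS 'Xsan'."
)

if __name__ == "__main__":
    for process, count in count_by_process(split_entries(SAMPLE)).most_common():
        print(f"{process}: {count}")
```

Run over the full logs from both controllers, a tally like this should make it easier to tell whether mds and mdworker dominate only after the failover or were already noisy beforehand.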
