Mavericks, MDS, random failover

Hi all,

I was awoken abruptly by a message stating that my Xsan volume had failed over. I got up to investigate, but can't find any telltale signs, other than some Spotlight oddness.

I recently rebuilt our SAN volume fresh under Mavericks. Prior to this, I had always disabled Spotlight, but I read over at Krypted.com that Spotlight has drastically improved for Xsan 3, and that there should be no reason to not enable it.

Checking my server stats and logs, I see that my acting MDC ramped up to a steady 20% CPU a few days ago. That didn't subside until the failover this morning. Looking at the logs, I can't see anything that corresponds with that much CPU usage. The secondary MDC (now hosting the volume) also had some serious CPU usage following the failover, caused mostly by Spotlight processes. They were SERIOUSLY kicking the CPU. We're talking total CPU usage in the neighborhood of 60% for the entire box. Eventually, mds subsided and things are back to normal on the secondary, but I'll be damned if I can make sense of this.

Think I should just disable Spotlight on this volume and rest easy? Some logs are below. I'm particularly concerned about the inode errors at the end of it all.

As always, thanks for any input!

Pete

Primary MDC (during the failover, nothing of note before this):

1/31/14 3:41:54.000 AM kernel[0]: Reconnecting to local portmapper on host '127.0.0.1'
1/31/14 3:41:54.000 AM kernel[0]: Local portmapper OK
1/31/14 3:41:54.269 AM KernelEventAgent[70]: tid 54485244 received event(s) VQ_NOTRESP (1)
1/31/14 3:41:54.269 AM KernelEventAgent[70]: tid 54485244 type 'acfs', mounted on '/Volumes/Xsan', from '/dev/disk14', not responding
1/31/14 3:41:54.270 AM KernelEventAgent[70]: tid 54485244 found 1 filesystem(s) with problem(s)
1/31/14 3:41:55.000 AM kernel[0]: Reconnecting to FSS 'Xsan'
1/31/14 3:41:55.269 AM fsmpm[329]: PortMapper: Initiating activation vote for FSS 'Xsan'.
1/31/14 3:41:56.800 AM fsmpm[329]: PortMapper: Starting FSS service 'Xsan[0]' on crosby.commarts.wisc.edu.
1/31/14 3:41:56.800 AM fsmpm[329]: PortMapper: Started FSS service 'Xsan' pid 70870.
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001440b6b lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001440b70 lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001440b7f lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001439f6c lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001439f5f lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001439f5b lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001439e8a lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x1000001439e83 lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x10000014321cc lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:02.000 AM kernel[0]: Cookie/0x10000014321ce lsn 0x0 got ESTALE for reopen, about to manually close
1/31/14 3:42:03.000 AM kernel[0]: Reconnect successful to FSS 'Xsan' on host '10.1.226.66'.
1/31/14 3:42:03.000 AM kernel[0]: Using v2 readdir for 'Xsan'
1/31/14 3:42:03.195 AM fsmpm[329]: PortMapper: Reconnect Event for /Volumes/Xsan
1/31/14 3:42:03.195 AM fsmpm[329]: PortMapper: Requesting MDS recycle of /Volumes/Xsan
1/31/14 3:42:03.195 AM KernelEventAgent[70]: tid 54485244 received event(s) VQ_NOTRESP (1)
1/31/14 3:42:43.330 AM mds[63]: XSANFS_FSCTL_SpotlightRPC fsctl failed (errno = 12)
1/31/14 3:42:43.330 AM mds[63]: ERROR: _MDSChannelInitForXsan: _XsanCreateMDSChannel failed: 12
1/31/14 3:42:43.340 AM mds[63]: (Warning) Volume: vsd:0x7fa0a38b5e00 Open failed. failureCount:0 (null)

Secondary MDC (during failover):

1/31/14 3:26:32.534 AM secd[503]: SecErrorGetOSStatus unknown error domain: com.apple.security.sos.error for error: The operation couldn’t be completed. (com.apple.security.sos.error error 2 - Public Key not available - failed to register before call)
1/31/14 3:26:32.534 AM secd[503]: securityd_xpc_dictionary_handler EscrowSecurityAl[1230] DeviceInCircle The operation couldn’t be completed. (com.apple.security.sos.error error 2 - Public Key not available - failed to register before call)
1/31/14 3:41:53.864 AM KernelEventAgent[71]: tid 54485244 received event(s) VQ_NOTRESP (1)
1/31/14 3:41:53.864 AM KernelEventAgent[71]: tid 54485244 type 'acfs', mounted on '/Volumes/Xsan', from '/dev/disk14', not responding
1/31/14 3:41:53.865 AM KernelEventAgent[71]: tid 54485244 found 1 filesystem(s) with problem(s)
1/31/14 3:41:54.000 AM kernel[0]: Reconnecting to FSS 'Xsan'
1/31/14 3:41:54.864 AM fsmpm[332]: PortMapper: Initiating activation vote for FSS 'Xsan'.
1/31/14 3:42:01.000 AM kernel[0]: Reconnect successful to FSS 'Xsan' on host '10.1.226.66'.
1/31/14 3:42:01.000 AM kernel[0]: Using v2 readdir for 'Xsan'
1/31/14 3:42:01.578 AM mds[64]: XSANFS_FSCTL_SpotlightRPC fsctl failed (errno = 35)
1/31/14 3:42:01.578 AM fsmpm[332]: PortMapper: Reconnect Event for /Volumes/Xsan
1/31/14 3:42:01.578 AM mds[64]: ERROR: _MDSChannelXsanFetchAccessTokenForUID: _XsanFetchAccessToken failed: 35
1/31/14 3:42:01.578 AM KernelEventAgent[71]: tid 54485244 received event(s) VQ_NOTRESP (1)
1/31/14 3:42:01.578 AM fsmpm[332]: PortMapper: Requesting MDS recycle of /Volumes/Xsan
1/31/14 3:42:01.578 AM mds[64]: (Error) Message: MDSChannel RPC failure (fetchQueryResultsForContext:) [no channelAccessToken]
1/31/14 3:42:01.579 AM mds[64]: (Error) Store: {channel:0x7fb209709ef0 localPath:'/Volumes/Xsan'} MDSChannel failed -- initiating recovery
1/31/14 3:42:01.580 AM fsm[334]: Xsan FSS 'Xsan[1]': Node 10.1.226.67 [1] does not support Directory Quotas. DQ limits will not be enforced on this client.
1/31/14 3:42:01.581 AM fsm[334]: Xsan FSS 'Xsan[1]': Node 10.1.226.139 [3] does not support Directory Quotas. DQ limits will not be enforced on this client.
1/31/14 3:42:01.581 AM fsm[334]: Xsan FSS 'Xsan[1]': Node 10.1.226.61 [4] does not support Directory Quotas. DQ limits will not be enforced on this client.
1/31/14 3:42:41.686 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:41.686 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:41.719 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:41.720 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:41.721 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:41.721 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:41.729 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:41.729 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:41.754 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:41.755 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:42.480 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:42.480 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:42.923 AM fsm[334]: MDSChannelPeerRef MDSChannelPeerCreate(CFAllocatorRef, CFDictionaryRef): (os/kern) invalid argument
1/31/14 3:42:42.923 AM fsm[334]: Xsan FSS 'Xsan[1]': XsanSpotlightRpc_ChannelCreate: MDSChannelPeerCreate failed
1/31/14 3:42:43.235 AM mds[64]: (Warning) DiskStore: vsd:0x7fb20c01f600 Reindexing /Volumes/Xsan because the volume UUID (B47765A5-AEF7-4E4F-81C6-4AF9905FEAF6) is not the expected UUID (9E593104-F333-4734-8FBA-92E75B5D59B4)

Then tons of errors similar to the following:

...
1/31/14 3:56:21.000 AM kernel[0]: Sandbox: mdworker(66241) deny file-write-create /Volumes/Xsan/Users/Staff/joeuser/iPhoto Library S11/.ipspot_update.sb-3f48c7b2-0842vk
...
1/31/14 4:01:32.560 AM mdworker[66185]: (Normal) Import: Using too many resources after 8640 files (wired: 0 resident: 43242 swapped: 0 regions: 2078), hit usage threshold importing /Volumes/Xsan/Users/Staff/joeuser/WFF Archive/WFF 2010 Spot/Digidesign Databases, exiting to clean up now.
1/31/14 4:01:32.643 AM mdworker[66184]: (Normal) Import: Using too many resources after 8576 files (wired: 0 resident: 35498 swapped: 0 regions: 2077), hit usage threshold importing /Volumes/Xsan/Users/Staff/joeuser/WFF Archive/WFF SPOT 08/Web stills/Web icons, exiting to clean up now.

Eventually wrapping up with:

1/31/14 4:11:13.637 AM mdworker[66963]: (Normal) Import: Using too many resources after 1984 files (wired: 0 resident: 19441 swapped: 0 regions: 2072), hit usage threshold importing /Volumes/Xsan/Users/Grads/joeuser2/poster.tif, exiting to clean up now.
1/31/14 4:11:27.366 AM mdworker[66891]: (Normal) Import: Using too many resources after 2048 files (wired: 0 resident: 3588 swapped: 0 regions: 2072), hit usage threshold importing /Volumes/Xsan/Users/Undergrads/joeuser3/Adobe Media Cache/Media Cache Files/301B-CAR 48000.pek, exiting to clean up now.
1/31/14 4:13:22.449 AM fsm[334]: Xsan FSS 'Xsan[1]': _Inodelookup invalid inode [0x0]
1/31/14 4:13:22.449 AM fsm[334]: Xsan FSS 'Xsan[1]': _Inodelookup invalid inode [0x0]
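For anyone triaging a similar excerpt, here is a rough sketch (not part of the original post) that tallies log entries per process, which can help show which daemon was noisiest around a failover. It assumes each entry sits on its own line in the Console-style shape seen above (date, time, AM/PM, process[pid]: message). The names LOG_LINE and tally_by_process are made up for illustration and are not part of any Xsan or Apple tool.

```python
import re
from collections import Counter

# Assumed line shape, modeled on the excerpts above:
#   1/31/14 3:42:43.330 AM mds[63]: message text
LOG_LINE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2}) "
    r"(?P<time>\d{1,2}:\d{2}:\d{2}\.\d{3}) (?P<ampm>[AP]M) "
    r"(?P<process>[^\[\s]+)\[(?P<pid>\d+)\]: (?P<message>.*)$"
)


def tally_by_process(lines):
    """Count log entries per process name, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.group("process")] += 1
    return counts


# Three entries copied from the logs in this post.
sample = [
    "1/31/14 3:42:43.330 AM mds[63]: XSANFS_FSCTL_SpotlightRPC fsctl failed (errno = 12)",
    "1/31/14 3:42:43.330 AM mds[63]: ERROR: _MDSChannelInitForXsan: _XsanCreateMDSChannel failed: 12",
    "1/31/14 4:13:22.449 AM fsm[334]: Xsan FSS 'Xsan[1]': _Inodelookup invalid inode [0x0]",
]

for process, count in tally_by_process(sample).most_common():
    print(process, count)
```

Lines that don't fit the assumed shape (such as the "..." separators) are simply ignored, so the counts only reflect well-formed entries.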

ChrisS's picture

http://support.apple.com/kb/HT3749

 

Content indexing of video files is too problematic.

 

searchFS will just index file names.

Awesome. Thanks for the tip, Chris. I'll give it a try and see how things go. Since the last index finished, the volume has been pretty stable, but I'd rather not take any chances later on down the line.

abstractrude's picture

I have never used Spotlight on production Xsan volumes. It is not worth the trouble; if your workflow is designed correctly, you shouldn't need to search anyway. That said, there are times for search, and that is where EasyFind comes in. It works amazingly well and does things Spotlight has never done. Best of all, it is free!

https://itunes.apple.com/us/app/easyfind/id411673888?mt=12

-Trevor Carlson
THUMBWAR