Xsanity Sanity for Apple's Xsan and Final Cut Server.
  
Monday, May 20 2013 @ 06:32 PM EDT
Topics
Storage (39)
People (1)
Xsan (103)
How To (26)
User Functions
Username:

Password:

Don't have an account yet? Sign up as a New User
Who's Online
Guest Users: 8
Sponsorship

Xsanity is proudly sponsored by:

Tekserve
The Old Reliable Mac Shop

Volume stops with activity

 
Post new topic   Reply to topic    Xsanity Forums Forum Index -> Troubleshooting
View previous topic :: View next topic  
Author Message
geoffk
partially protected
partially protected


Joined: 23 Nov 2009
Posts: 7

PostPosted: Mon Nov 23, 2009 12:02 pm    Post subject: Volume stops with activity Reply with quote

Hi there. I had a SAN fall into my lap pretty recently, and I am pretty new at it. I am having a strange issue with one volume. It is an older xserve RAID (ca. 2006 i think).

Basically, I can get the volume to start, and mount. But once I try to move data to it, or delete files from it, the Volume stops, and then sometimes can be restarted, and sometimes not.

I have a primary and backup MDC, both on Snow Leopard 10.6.1, and Xsan is version 2.

I know this is not much info to go on, but if this sounds familiar to anyone, some help would be much appreciated!

the volume was empty till now, so I would not be opposed to destroying it and starting over. I would rather understand and fix the problem though, if possible Cool

Thanks!
Back to top
View user's profile Send private message
lotte
Xsan Master
Xsan Master


Joined: 11 Dec 2008
Posts: 190

PostPosted: Mon Nov 23, 2009 12:31 pm    Post subject: Reply with quote

Please post the output of /var/log/system.log corresponding to the failure.

Lotte
Back to top
View user's profile Send private message
geoffk
partially protected
partially protected


Joined: 23 Nov 2009
Posts: 7

PostPosted: Mon Nov 23, 2009 12:51 pm    Post subject: Reply with quote

Code:
Nov 23 12:17:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:18:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:19:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:20:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:21:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:22:13 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:23:13 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:24:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:25:08 mdc2 org.clamav.clamd[66]: SelfCheck: Database status OK.
Nov 23 12:25:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:25:23 mdc2 com.apple.wikid.mailinglists[69284]: mail:status = "list-updated"
Nov 23 12:26:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:27:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:28:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:29:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:30:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:31:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:32:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:33:09 mdc2 Xsan Admin[69944]: ERROR: No authenticated MDCs! moc = <NSManagedObjectContext: 0x1006224a0>, returning nil
Nov 23 12:33:09 mdc2 Xsan Admin[69944]: ERROR: No session for computer mdc2 (CEEEED40-76D8-49D6-9443-55A2A71FBB58)
Nov 23 12:33:09 mdc2 Xsan Admin[69944]: ERROR: No session for computer mdc1 (1B2C573B-ED8F-408C-8196-153AF072A7A4)
Nov 23 12:33:09: --- last message repeated 1 time ---
Nov 23 12:33:09 mdc2 Xsan Admin[69944]: ERROR: Error getting list of volumes: kOfflineError (0)
Nov 23 12:33:09: --- last message repeated 1 time ---
Nov 23 12:33:09 mdc2 Xsan Admin[69944]: ERROR: Error reading volume statistics for xsan: kOfflineError (0)
Nov 23 12:33:13 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:33:20 mdc2 servermgrd[52520]: Got error -9806 for SSLHandshake  remote address is 10.0.10.201:11980
Nov 23 12:33:20 mdc2 servermgrd[52520]: Exception in threadListen: Socket: Connect failed
Nov 23 12:33:20 mdc2 fsmpm[198]: PortMapper: Starting FSS service 'Storage[1]' on mdc2.comcast.private.
Nov 23 12:33:21 mdc2 fsmpm[198]: PortMapper: FSS 'Storage'[1] (pid 69976) at port 50988 is registered.
Nov 23 12:33:34 mdc2 fsmpm[198]: PortMapper: Mount Event for /Volumes/Storage on /dev/disk16
Nov 23 12:33:34 mdc2 fsmpm[198]: PortMapper: Setting permissions on trash directory: /Volumes/Storage/.Trashes
Nov 23 12:33:34 mdc2 com.apple.xsan[76]: Filesystem Storage mounted on /Volumes/Storage
Nov 23 12:33:34 mdc2 servermgrd[52520]: xsan: [52520/6045800] ERROR: -[SANFilesystem mountVolumeNamed:writable:withOptions:]: mount of 'Storage' failed: Volume `Storage' is already mounted
Nov 23 12:33:34 mdc2 Xsan Admin[69944]: ERROR: Error mounting volume…: The operation couldn’t be completed. (SANTransactionErrorDomain error 100007.) (100007)
Nov 23 12:35:23 mdc2 com.apple.wikid.mailinglists[70175]: mail:status = "list-updated"
Nov 23 12:37:32 mdc2 bootpd[70366]: server name mdc2.comcastvst.com
Nov 23 12:37:32 mdc2 bootpd[70366]: interface en0: ip 10.0.1.202 mask 255.255.255.0
Nov 23 12:37:32 mdc2 bootpd[70366]: interface en1: ip 10.0.10.202 mask 255.255.255.0
Nov 23 12:37:32 mdc2 bootpd[70366]: bsdpd: re-reading configuration
Nov 23 12:37:32 mdc2 bootpd[70366]: bsdpd: shadow file size will be set to 48 megabytes
Nov 23 12:37:32 mdc2 bootpd[70366]: bsdpd: age time 00:15:00
Nov 23 12:37:32 mdc2 bootpd[70366]: DHCP REQUEST [en0]: 1,0:25:0:4d:27:e7 <MBPro-2>
Nov 23 12:37:32 mdc2 bootpd[70366]: ACK sent MBPro-2 10.0.1.244 pktsize 369
Nov 23 12:38:32 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:38:32 mdc2 KernelEventAgent[96]: tid 00000000 type 'acfs', mounted on '/Volumes/Storage', from '/dev/disk16', not responding
Nov 23 12:38:32 mdc2 KernelEventAgent[96]: tid 00000000 found 1 filesystem(s) with problem(s)
Nov 23 12:38:33 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:38:37 mdc2 org.clamav.clamd[66]: SelfCheck: Database status OK.
Nov 23 12:38:38 mdc2 com.apple.xsan[76]: xsan:perfDispatchMicroseconds = 612802
Nov 23 12:38:38 mdc2 com.apple.xsan[76]: xsan:perfFunctionMicroseconds = 613187
Nov 23 12:38:38 mdc2 fsm[69976]: Xsan FSS 'Storage[1]': Windows Security has been turned off in config file but clients have been requested to enforce ACLs.  Windows Security remains in effect.
Nov 23 12:38:38 mdc2 fsmpm[198]: PortMapper: Reconnect Event for /Volumes/Storage
Nov 23 12:38:38 mdc2 fsmpm[198]: PortMapper: Requesting MDS recycle of /Volumes/Storage
Nov 23 12:38:38 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:38:38 mdc2 fsm[69976]: Xsan FSS 'Storage[1]': PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "next_inode != L64(0)" file fsm_xattr.c, line 144
Nov 23 12:38:38 mdc2 fsm[69976]: PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "next_inode != L64(0)" file fsm_xattr.c, line 144
Nov 23 12:38:38 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:38:38 mdc2 KernelEventAgent[96]: tid 00000000 type 'acfs', mounted on '/Volumes/Storage', from '/dev/disk16', not responding
Nov 23 12:38:38 mdc2 KernelEventAgent[96]: tid 00000000 found 1 filesystem(s) with problem(s)
Nov 23 12:38:38 mdc2 fsm[69976]: Xsan FSS 'Storage[1]': PANIC: wait 3 secs for journal to flush
Nov 23 12:38:38 mdc2 fsm[69976]: Xsan FSS 'Storage[1]': PANIC: aborting threads now.
Nov 23 12:38:44 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:38:44 mdc2 bootpd[70366]: DHCP REQUEST [en0]: 1,0:16:1:a5:89:58 <TS-HTGL958>
Nov 23 12:38:44 mdc2 bootpd[70366]: ACK sent TS-HTGL958. 10.0.1.239 pktsize 300
Nov 23 12:38:47 mdc2 fsmpm[198]: PortMapper: Reconnect Event for /Volumes/Storage
Nov 23 12:38:47 mdc2 fsmpm[198]: PortMapper: Requesting MDS recycle of /Volumes/Storage
Nov 23 12:38:47 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:38:47: --- last message repeated 1 time ---
Nov 23 12:38:47 mdc2 KernelEventAgent[96]: tid 00000000 type 'acfs', mounted on '/Volumes/Storage', from '/dev/disk16', not responding
Nov 23 12:38:47 mdc2 KernelEventAgent[96]: tid 00000000 found 1 filesystem(s) with problem(s)
Nov 23 12:38:48 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:38:50 mdc2 fsmpm[198]: PortMapper: FSS 'Storage' disconnected.
Nov 23 12:38:50 mdc2 fsmpm[198]: PortMapper: kicking diskscan_thread 4338487296.
Nov 23 12:38:50 mdc2 com.apple.ReportCrash.Root[70505]: 2009-11-23 12:38:50.454 ReportCrash[70505:2803] Saved crash report for fsm[69976] version ??? (???) to /Library/Logs/DiagnosticReports/fsm_2009-11-23-123850_localhost.crash
Nov 23 12:38:50 mdc2 fsmpm[198]: Portmapper: FSS 'Storage' (pid 69976) exited on signal 6
Nov 23 12:39:00 mdc2 fsmpm[198]: PortMapper: RESTART FSS service 'Storage[1]' on host mdc2.comcast.private.
Nov 23 12:39:00 mdc2 fsmpm[198]: PortMapper: Starting FSS service 'Storage[1]' on mdc2.comcast.private.
Nov 23 12:39:00 mdc2 fsmpm[198]: PortMapper: FSS 'Storage'[1] (pid 70519) at port 51279 is registered.
Nov 23 12:39:00 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:39:05: --- last message repeated 1 time ---
Nov 23 12:39:05 mdc2 com.apple.xsan[76]: xsan:perfDispatchMicroseconds = 421031
Nov 23 12:39:05 mdc2 com.apple.xsan[76]: xsan:perfFunctionMicroseconds = 421487
Nov 23 12:39:06 mdc2 fsm[70519]: Xsan FSS 'Storage[1]': Windows Security has been turned off in config file but clients have been requested to enforce ACLs.  Windows Security remains in effect.
Nov 23 12:39:06 mdc2 fsmpm[198]: PortMapper: Reconnect Event for /Volumes/Storage
Nov 23 12:39:06 mdc2 fsmpm[198]: PortMapper: Requesting MDS recycle of /Volumes/Storage
Nov 23 12:39:06 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:39:06 mdc2 fsm[70519]: Xsan FSS 'Storage[1]': PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "next_inode != L64(0)" file fsm_xattr.c, line 144
Nov 23 12:39:06 mdc2 fsm[70519]: PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "next_inode != L64(0)" file fsm_xattr.c, line 144
Nov 23 12:39:06 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:39:06 mdc2 KernelEventAgent[96]: tid 00000000 type 'acfs', mounted on '/Volumes/Storage', from '/dev/disk16', not responding
Nov 23 12:39:06 mdc2 KernelEventAgent[96]: tid 00000000 found 1 filesystem(s) with problem(s)
Nov 23 12:39:06 mdc2 fsm[70519]: Xsan FSS 'Storage[1]': PANIC: wait 3 secs for journal to flush
Nov 23 12:39:06 mdc2 fsm[70519]: Xsan FSS 'Storage[1]': PANIC: aborting threads now.
Nov 23 12:39:11 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:39:13 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsmvol_at_index: Could not connect to FSM because File System Manager "Storage" on mdc2.comcast.private is on standby.
Nov 23 12:39:13 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:39:13 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsmvol_at_index: Could not connect to FSM because File System Manager "Storage" on mdc2.comcast.private is on standby.
Nov 23 12:39:13 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_quotas_for_fsmvol_named(Storage): Could not connect to FSM because File System Manager "Storage" on mdc2.comcast.private is on standby.
Nov 23 12:39:15 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:39:16 mdc2 com.apple.ReportCrash.Root[70505]: 2009-11-23 12:39:16.210 ReportCrash[70505:2b03] Saved crash report for fsm[70519] version ??? (???) to /Library/Logs/DiagnosticReports/fsm_2009-11-23-123916_localhost.crash
Nov 23 12:39:16 mdc2 fsmpm[198]: PortMapper: FSS 'Storage' disconnected.
Nov 23 12:39:16 mdc2 fsmpm[198]: PortMapper: kicking diskscan_thread 4338487296.
Nov 23 12:39:16 mdc2 fsmpm[198]: Portmapper: FSS 'Storage' (pid 70519) exited on signal 6
Nov 23 12:39:16 mdc2 fsmpm[198]: FSS 'Storage' appears unstable (2 failures in 60 minutes or less), halting restarts
Nov 23 12:39:17 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:39:20 mdc2 fsmpm[198]: PortMapper: Reconnect Event for /Volumes/Storage
Nov 23 12:39:20 mdc2 fsmpm[198]: PortMapper: Requesting MDS recycle of /Volumes/Storage
Nov 23 12:39:20 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:39:20: --- last message repeated 1 time ---
Nov 23 12:39:20 mdc2 KernelEventAgent[96]: tid 00000000 type 'acfs', mounted on '/Volumes/Storage', from '/dev/disk16', not responding
Nov 23 12:39:20 mdc2 KernelEventAgent[96]: tid 00000000 found 1 filesystem(s) with problem(s)
Back to top
View user's profile Send private message
abstractrude
Xsan Master
Xsan Master


Joined: 13 Mar 2008
Posts: 860

PostPosted: Mon Nov 23, 2009 2:21 pm    Post subject: Reply with quote

did it always work correctly and did events occur ? power problems, etc. Have you done a cvfsck -nv of the volume yet? make sure its stopped when you do.
Back to top
View user's profile Send private message
geoffk
partially protected
partially protected


Joined: 23 Nov 2009
Posts: 7

PostPosted: Mon Nov 23, 2009 3:09 pm    Post subject: Reply with quote

there were no events that I was aware of. The problem first manifest itself when I did a mass copy of our working san volume to this one. I just ran a cvfsck -nv as you suggested.

Code:
BUILD INFO:
#!@$ Server Revision 3.5.0 Build 7443 Branch branches_35X (412)
#!@$ Built for Darwin 10.0 i386
#!@$ Created on Wed Sep  9 15:13:36 PDT 2009


Created directory /tmp/cvfsck83416a for temporary files.
Creating MetadataAndJournal allocation check file.
Creating Storage allocation check file.

** NOTE ** Read Only Check.
File system journal will not be recovered.
The results may be inconsistent and mis-leading.


Super Block information.
  FS Created On               : Thu Sep 17 21:40:27 2009
  Inode Version               : '2.5'
  File System Status          : Clean
  Allocated Inodes            : 246784
  Free Inodes                 : 17062
  FL Blocks                   : 29
  Next Inode Chunk            : 0x18ae1
  Metadump Seqno              : 0
  Restore Journal Seqno       : 0
  Windows Security Indx Inode : 0x5
  Windows Security Data Inode : 0x6
  Quota Database Inode        : 0x7
  ID Database Inode           : 0xb
  Client Write Opens Inode    : 0x8

Stripe Group MetadataAndJournal             (  0) 0x17497b0 blocks.
Stripe Group Storage                        (  1) 0x311f4b10 blocks.

Building Inode Index Database 246784 (100%).         

Verifying NT Security Descriptors
Found 247 NT Security Descriptors: all are good


Verifying Free List Extents.

Scanning inodes 235520 ( 95%).       
*Error*: XATTR ERROR detected for inode 0x621e604 (0x621e606)
*Error*: *** RESETTING RPL DATA FOR FILE SYSTEM ***
Scanning inodes 246784 (100%).         

Sorting extent list for MetadataAndJournal pass 1/1
Updating bitmap for MetadataAndJournal extents 83468 ( 30%).                   
Sorting extent list for Storage pass 1/1
Updating bitmap for Storage extents 277807 (100%).                   

Checking for dead inodes 235520 ( 95%).       
Clearing  inode 0x621e605.
Clearing  inode 0x621e607.
Checking for dead inodes 246784 (100%).         

Checking directories 83431 (100%).         

Scanning for orphaned inodes 246784 (100%).       

Verifying link & subdir counts 246784 (100%).         

Super Block -  Free Inodes of 0x42a6 is wrong! Should be 0x42a9.

Repairing free list.

Checking pending free list.                       

Checking Arbitration Control Block.

Checking MetadataAndJournal allocation bit maps (100%).         
Checking Storage allocation bit maps ( 40%).         
Stripe Group 'Storage' block 0x16cbd70f differs - ondisk/Allocated 0x00 incore/Free 0x01 byteoffs/47807201 .
Map file 'Storage' is incorrect. Updating.

File system 'Storage' was modified.

File system 'Storage'. Blocks-824134416 free-424138693 Inodes-246784 free-17065.

File System Check finished - file system modified.
Back to top
View user's profile Send private message
abstractrude
Xsan Master
Xsan Master


Joined: 13 Mar 2008
Posts: 860

PostPosted: Mon Nov 23, 2009 8:14 pm    Post subject: Reply with quote

ok now you need to run a cvfsck -wv


stop the volume do
cvfsck -j VOLNAME
cvfsck -wv VOLNAME
cvfsck -nv VOLNAME

it should NOT say file system was modified after you finish.
Back to top
View user's profile Send private message
johnoag79
partially protected
partially protected


Joined: 21 Jul 2009
Posts: 7

PostPosted: Tue Nov 24, 2009 10:35 am    Post subject: Reply with quote

I had this same problem, ended up needing cvfsck -X to clear the XATTR errors.
Back to top
View user's profile Send private message
abstractrude
Xsan Master
Xsan Master


Joined: 13 Mar 2008
Posts: 860

PostPosted: Tue Nov 24, 2009 2:59 pm    Post subject: Reply with quote

i dont usually use -X but hey it doesnt work right
Back to top
View user's profile Send private message
cedge318
Could work for Apple
Could work for Apple


Joined: 12 Jan 2009
Posts: 47

PostPosted: Tue Nov 24, 2009 8:48 pm    Post subject: ? Reply with quote

Wait, did Johnoag79 say it did work? I haven't actually had the -X function properly for me and would be curious if there was something specific that made it work?
Back to top
View user's profile Send private message Visit poster's website
johnoag79
partially protected
partially protected


Joined: 21 Jul 2009
Posts: 7

PostPosted: Wed Nov 25, 2009 10:13 am    Post subject: Reply with quote

I used it as a last resort, as nothing else would get rid of the XATTR errors reported with -vw.

It did result in an RPL_Upgrade taking place afterwards, which takes time...
Back to top
View user's profile Send private message
geoffk
partially protected
partially protected


Joined: 23 Nov 2009
Posts: 7

PostPosted: Sun Nov 29, 2009 10:04 pm    Post subject: Reply with quote

i had gotten to cvfsck on my own, but wasnt really sure what to do with it. what exactly did we do here?

this seems to have done the trick. I was able to empty trash without the volume going offline. Going to do a full ~15 TB SAN dump to it later this week, shall see how it holds up!

Thanks lotte and abstractrude!
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    Xsanity Forums Forum Index -> Troubleshooting All times are GMT - 5 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group
Best Viewed on a Mac | Suggested Browser: Whatever floats yer boat.