| View previous topic :: View next topic |
| Author |
Message |
geoffk partially protected

Joined: 23 Nov 2009 Posts: 7
|
Posted: Mon Nov 23, 2009 12:02 pm Post subject: Volume stops with activity |
|
|
Hi there. I had a SAN fall into my lap pretty recently, and I am pretty new at it. I am having a strange issue with one volume. It is an older xserve RAID (ca. 2006 i think).
Basically, I can get the volume to start, and mount. But once I try to move data to it, or delete files from it, the Volume stops, and then sometimes can be restarted, and sometimes not.
I have a primary and backup MDC, both on Snow Leopard 10.6.1, and Xsan is version 2.
I know this is not much info to go on, but if this sounds familiar to anyone, some help would be much appreciated!
the volume was empty till now, so I would not be opposed to destroying it and starting over. I would rather understand and fix the problem though, if possible
Thanks! |
|
| Back to top |
|
 |
lotte Xsan Master

Joined: 11 Dec 2008 Posts: 190
|
Posted: Mon Nov 23, 2009 12:31 pm Post subject: |
|
|
Please post the output of /var/log/system.log corresponding to the failure.
Lotte |
|
| Back to top |
|
 |
geoffk partially protected

Joined: 23 Nov 2009 Posts: 7
|
Posted: Mon Nov 23, 2009 12:51 pm Post subject: |
|
|
| Code: | Nov 23 12:17:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:18:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:19:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:20:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:21:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:22:13 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:23:13 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:24:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:25:08 mdc2 org.clamav.clamd[66]: SelfCheck: Database status OK.
Nov 23 12:25:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:25:23 mdc2 com.apple.wikid.mailinglists[69284]: mail:status = "list-updated"
Nov 23 12:26:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:27:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:28:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:29:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:30:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:31:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:32:12 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:33:09 mdc2 Xsan Admin[69944]: ERROR: No authenticated MDCs! moc = <NSManagedObjectContext: 0x1006224a0>, returning nil
Nov 23 12:33:09 mdc2 Xsan Admin[69944]: ERROR: No session for computer mdc2 (CEEEED40-76D8-49D6-9443-55A2A71FBB58)
Nov 23 12:33:09 mdc2 Xsan Admin[69944]: ERROR: No session for computer mdc1 (1B2C573B-ED8F-408C-8196-153AF072A7A4)
Nov 23 12:33:09: --- last message repeated 1 time ---
Nov 23 12:33:09 mdc2 Xsan Admin[69944]: ERROR: Error getting list of volumes: kOfflineError (0)
Nov 23 12:33:09: --- last message repeated 1 time ---
Nov 23 12:33:09 mdc2 Xsan Admin[69944]: ERROR: Error reading volume statistics for xsan: kOfflineError (0)
Nov 23 12:33:13 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsm_process_stats(Storage): Unable to find pid of fsm
Nov 23 12:33:20 mdc2 servermgrd[52520]: Got error -9806 for SSLHandshake remote address is 10.0.10.201:11980
Nov 23 12:33:20 mdc2 servermgrd[52520]: Exception in threadListen: Socket: Connect failed
Nov 23 12:33:20 mdc2 fsmpm[198]: PortMapper: Starting FSS service 'Storage[1]' on mdc2.comcast.private.
Nov 23 12:33:21 mdc2 fsmpm[198]: PortMapper: FSS 'Storage'[1] (pid 69976) at port 50988 is registered.
Nov 23 12:33:34 mdc2 fsmpm[198]: PortMapper: Mount Event for /Volumes/Storage on /dev/disk16
Nov 23 12:33:34 mdc2 fsmpm[198]: PortMapper: Setting permissions on trash directory: /Volumes/Storage/.Trashes
Nov 23 12:33:34 mdc2 com.apple.xsan[76]: Filesystem Storage mounted on /Volumes/Storage
Nov 23 12:33:34 mdc2 servermgrd[52520]: xsan: [52520/6045800] ERROR: -[SANFilesystem mountVolumeNamed:writable:withOptions:]: mount of 'Storage' failed: Volume `Storage' is already mounted
Nov 23 12:33:34 mdc2 Xsan Admin[69944]: ERROR: Error mounting volume…: The operation couldn’t be completed. (SANTransactionErrorDomain error 100007.) (100007)
Nov 23 12:35:23 mdc2 com.apple.wikid.mailinglists[70175]: mail:status = "list-updated"
Nov 23 12:37:32 mdc2 bootpd[70366]: server name mdc2.comcastvst.com
Nov 23 12:37:32 mdc2 bootpd[70366]: interface en0: ip 10.0.1.202 mask 255.255.255.0
Nov 23 12:37:32 mdc2 bootpd[70366]: interface en1: ip 10.0.10.202 mask 255.255.255.0
Nov 23 12:37:32 mdc2 bootpd[70366]: bsdpd: re-reading configuration
Nov 23 12:37:32 mdc2 bootpd[70366]: bsdpd: shadow file size will be set to 48 megabytes
Nov 23 12:37:32 mdc2 bootpd[70366]: bsdpd: age time 00:15:00
Nov 23 12:37:32 mdc2 bootpd[70366]: DHCP REQUEST [en0]: 1,0:25:0:4d:27:e7 <MBPro-2>
Nov 23 12:37:32 mdc2 bootpd[70366]: ACK sent MBPro-2 10.0.1.244 pktsize 369
Nov 23 12:38:32 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:38:32 mdc2 KernelEventAgent[96]: tid 00000000 type 'acfs', mounted on '/Volumes/Storage', from '/dev/disk16', not responding
Nov 23 12:38:32 mdc2 KernelEventAgent[96]: tid 00000000 found 1 filesystem(s) with problem(s)
Nov 23 12:38:33 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:38:37 mdc2 org.clamav.clamd[66]: SelfCheck: Database status OK.
Nov 23 12:38:38 mdc2 com.apple.xsan[76]: xsan:perfDispatchMicroseconds = 612802
Nov 23 12:38:38 mdc2 com.apple.xsan[76]: xsan:perfFunctionMicroseconds = 613187
Nov 23 12:38:38 mdc2 fsm[69976]: Xsan FSS 'Storage[1]': Windows Security has been turned off in config file but clients have been requested to enforce ACLs. Windows Security remains in effect.
Nov 23 12:38:38 mdc2 fsmpm[198]: PortMapper: Reconnect Event for /Volumes/Storage
Nov 23 12:38:38 mdc2 fsmpm[198]: PortMapper: Requesting MDS recycle of /Volumes/Storage
Nov 23 12:38:38 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:38:38 mdc2 fsm[69976]: Xsan FSS 'Storage[1]': PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "next_inode != L64(0)" file fsm_xattr.c, line 144
Nov 23 12:38:38 mdc2 fsm[69976]: PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "next_inode != L64(0)" file fsm_xattr.c, line 144
Nov 23 12:38:38 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:38:38 mdc2 KernelEventAgent[96]: tid 00000000 type 'acfs', mounted on '/Volumes/Storage', from '/dev/disk16', not responding
Nov 23 12:38:38 mdc2 KernelEventAgent[96]: tid 00000000 found 1 filesystem(s) with problem(s)
Nov 23 12:38:38 mdc2 fsm[69976]: Xsan FSS 'Storage[1]': PANIC: wait 3 secs for journal to flush
Nov 23 12:38:38 mdc2 fsm[69976]: Xsan FSS 'Storage[1]': PANIC: aborting threads now.
Nov 23 12:38:44 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:38:44 mdc2 bootpd[70366]: DHCP REQUEST [en0]: 1,0:16:1:a5:89:58 <TS-HTGL958>
Nov 23 12:38:44 mdc2 bootpd[70366]: ACK sent TS-HTGL958. 10.0.1.239 pktsize 300
Nov 23 12:38:47 mdc2 fsmpm[198]: PortMapper: Reconnect Event for /Volumes/Storage
Nov 23 12:38:47 mdc2 fsmpm[198]: PortMapper: Requesting MDS recycle of /Volumes/Storage
Nov 23 12:38:47 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:38:47: --- last message repeated 1 time ---
Nov 23 12:38:47 mdc2 KernelEventAgent[96]: tid 00000000 type 'acfs', mounted on '/Volumes/Storage', from '/dev/disk16', not responding
Nov 23 12:38:47 mdc2 KernelEventAgent[96]: tid 00000000 found 1 filesystem(s) with problem(s)
Nov 23 12:38:48 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:38:50 mdc2 fsmpm[198]: PortMapper: FSS 'Storage' disconnected.
Nov 23 12:38:50 mdc2 fsmpm[198]: PortMapper: kicking diskscan_thread 4338487296.
Nov 23 12:38:50 mdc2 com.apple.ReportCrash.Root[70505]: 2009-11-23 12:38:50.454 ReportCrash[70505:2803] Saved crash report for fsm[69976] version ??? (???) to /Library/Logs/DiagnosticReports/fsm_2009-11-23-123850_localhost.crash
Nov 23 12:38:50 mdc2 fsmpm[198]: Portmapper: FSS 'Storage' (pid 69976) exited on signal 6
Nov 23 12:39:00 mdc2 fsmpm[198]: PortMapper: RESTART FSS service 'Storage[1]' on host mdc2.comcast.private.
Nov 23 12:39:00 mdc2 fsmpm[198]: PortMapper: Starting FSS service 'Storage[1]' on mdc2.comcast.private.
Nov 23 12:39:00 mdc2 fsmpm[198]: PortMapper: FSS 'Storage'[1] (pid 70519) at port 51279 is registered.
Nov 23 12:39:00 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:39:05: --- last message repeated 1 time ---
Nov 23 12:39:05 mdc2 com.apple.xsan[76]: xsan:perfDispatchMicroseconds = 421031
Nov 23 12:39:05 mdc2 com.apple.xsan[76]: xsan:perfFunctionMicroseconds = 421487
Nov 23 12:39:06 mdc2 fsm[70519]: Xsan FSS 'Storage[1]': Windows Security has been turned off in config file but clients have been requested to enforce ACLs. Windows Security remains in effect.
Nov 23 12:39:06 mdc2 fsmpm[198]: PortMapper: Reconnect Event for /Volumes/Storage
Nov 23 12:39:06 mdc2 fsmpm[198]: PortMapper: Requesting MDS recycle of /Volumes/Storage
Nov 23 12:39:06 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:39:06 mdc2 fsm[70519]: Xsan FSS 'Storage[1]': PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "next_inode != L64(0)" file fsm_xattr.c, line 144
Nov 23 12:39:06 mdc2 fsm[70519]: PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "next_inode != L64(0)" file fsm_xattr.c, line 144
Nov 23 12:39:06 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:39:06 mdc2 KernelEventAgent[96]: tid 00000000 type 'acfs', mounted on '/Volumes/Storage', from '/dev/disk16', not responding
Nov 23 12:39:06 mdc2 KernelEventAgent[96]: tid 00000000 found 1 filesystem(s) with problem(s)
Nov 23 12:39:06 mdc2 fsm[70519]: Xsan FSS 'Storage[1]': PANIC: wait 3 secs for journal to flush
Nov 23 12:39:06 mdc2 fsm[70519]: Xsan FSS 'Storage[1]': PANIC: aborting threads now.
Nov 23 12:39:11 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:39:13 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsmvol_at_index: Could not connect to FSM because File System Manager "Storage" on mdc2.comcast.private is on standby.
Nov 23 12:39:13 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:39:13 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_fsmvol_at_index: Could not connect to FSM because File System Manager "Storage" on mdc2.comcast.private is on standby.
Nov 23 12:39:13 mdc2 servermgrd[52520]: xsan: [52520/211180] ERROR: get_quotas_for_fsmvol_named(Storage): Could not connect to FSM because File System Manager "Storage" on mdc2.comcast.private is on standby.
Nov 23 12:39:15 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:39:16 mdc2 com.apple.ReportCrash.Root[70505]: 2009-11-23 12:39:16.210 ReportCrash[70505:2b03] Saved crash report for fsm[70519] version ??? (???) to /Library/Logs/DiagnosticReports/fsm_2009-11-23-123916_localhost.crash
Nov 23 12:39:16 mdc2 fsmpm[198]: PortMapper: FSS 'Storage' disconnected.
Nov 23 12:39:16 mdc2 fsmpm[198]: PortMapper: kicking diskscan_thread 4338487296.
Nov 23 12:39:16 mdc2 fsmpm[198]: Portmapper: FSS 'Storage' (pid 70519) exited on signal 6
Nov 23 12:39:16 mdc2 fsmpm[198]: FSS 'Storage' appears unstable (2 failures in 60 minutes or less), halting restarts
Nov 23 12:39:17 mdc2 fsmpm[198]: PortMapper: Initiating activation vote for FSS 'Storage'.
Nov 23 12:39:20 mdc2 fsmpm[198]: PortMapper: Reconnect Event for /Volumes/Storage
Nov 23 12:39:20 mdc2 fsmpm[198]: PortMapper: Requesting MDS recycle of /Volumes/Storage
Nov 23 12:39:20 mdc2 KernelEventAgent[96]: tid 00000000 received event(s) VQ_NOTRESP (1)
Nov 23 12:39:20: --- last message repeated 1 time ---
Nov 23 12:39:20 mdc2 KernelEventAgent[96]: tid 00000000 type 'acfs', mounted on '/Volumes/Storage', from '/dev/disk16', not responding
Nov 23 12:39:20 mdc2 KernelEventAgent[96]: tid 00000000 found 1 filesystem(s) with problem(s) |
|
|
| Back to top |
|
 |
abstractrude Xsan Master

Joined: 13 Mar 2008 Posts: 860
|
Posted: Mon Nov 23, 2009 2:21 pm Post subject: |
|
|
| did it always work correctly and did events occur ? power problems, etc. Have you done a cvfsck -nv of the volume yet? make sure its stopped when you do. |
|
| Back to top |
|
 |
geoffk partially protected

Joined: 23 Nov 2009 Posts: 7
|
Posted: Mon Nov 23, 2009 3:09 pm Post subject: |
|
|
there were no events that I was aware of. The problem first manifest itself when I did a mass copy of our working san volume to this one. I just ran a cvfsck -nv as you suggested.
| Code: | BUILD INFO:
#!@$ Server Revision 3.5.0 Build 7443 Branch branches_35X (412)
#!@$ Built for Darwin 10.0 i386
#!@$ Created on Wed Sep 9 15:13:36 PDT 2009
Created directory /tmp/cvfsck83416a for temporary files.
Creating MetadataAndJournal allocation check file.
Creating Storage allocation check file.
** NOTE ** Read Only Check.
File system journal will not be recovered.
The results may be inconsistent and mis-leading.
Super Block information.
FS Created On : Thu Sep 17 21:40:27 2009
Inode Version : '2.5'
File System Status : Clean
Allocated Inodes : 246784
Free Inodes : 17062
FL Blocks : 29
Next Inode Chunk : 0x18ae1
Metadump Seqno : 0
Restore Journal Seqno : 0
Windows Security Indx Inode : 0x5
Windows Security Data Inode : 0x6
Quota Database Inode : 0x7
ID Database Inode : 0xb
Client Write Opens Inode : 0x8
Stripe Group MetadataAndJournal ( 0) 0x17497b0 blocks.
Stripe Group Storage ( 1) 0x311f4b10 blocks.
Building Inode Index Database 246784 (100%).
Verifying NT Security Descriptors
Found 247 NT Security Descriptors: all are good
Verifying Free List Extents.
Scanning inodes 235520 ( 95%).
*Error*: XATTR ERROR detected for inode 0x621e604 (0x621e606)
*Error*: *** RESETTING RPL DATA FOR FILE SYSTEM ***
Scanning inodes 246784 (100%).
Sorting extent list for MetadataAndJournal pass 1/1
Updating bitmap for MetadataAndJournal extents 83468 ( 30%).
Sorting extent list for Storage pass 1/1
Updating bitmap for Storage extents 277807 (100%).
Checking for dead inodes 235520 ( 95%).
Clearing inode 0x621e605.
Clearing inode 0x621e607.
Checking for dead inodes 246784 (100%).
Checking directories 83431 (100%).
Scanning for orphaned inodes 246784 (100%).
Verifying link & subdir counts 246784 (100%).
Super Block - Free Inodes of 0x42a6 is wrong! Should be 0x42a9.
Repairing free list.
Checking pending free list.
Checking Arbitration Control Block.
Checking MetadataAndJournal allocation bit maps (100%).
Checking Storage allocation bit maps ( 40%).
Stripe Group 'Storage' block 0x16cbd70f differs - ondisk/Allocated 0x00 incore/Free 0x01 byteoffs/47807201 .
Map file 'Storage' is incorrect. Updating.
File system 'Storage' was modified.
File system 'Storage'. Blocks-824134416 free-424138693 Inodes-246784 free-17065.
File System Check finished - file system modified.
|
|
|
| Back to top |
|
 |
abstractrude Xsan Master

Joined: 13 Mar 2008 Posts: 860
|
Posted: Mon Nov 23, 2009 8:14 pm Post subject: |
|
|
ok now you need to run a cvfsck -wv
stop the volume do
cvfsck -j VOLNAME
cvfsck -wv VOLNAME
cvfsck -nv VOLNAME
it should NOT say file system was modified after you finish. |
|
| Back to top |
|
 |
johnoag79 partially protected

Joined: 21 Jul 2009 Posts: 7
|
Posted: Tue Nov 24, 2009 10:35 am Post subject: |
|
|
| I had this same problem, ended up needing cvfsck -X to clear the XATTR errors. |
|
| Back to top |
|
 |
abstractrude Xsan Master

Joined: 13 Mar 2008 Posts: 860
|
Posted: Tue Nov 24, 2009 2:59 pm Post subject: |
|
|
| i dont usually use -X but hey it doesnt work right |
|
| Back to top |
|
 |
cedge318 Could work for Apple

Joined: 12 Jan 2009 Posts: 47
|
Posted: Tue Nov 24, 2009 8:48 pm Post subject: ? |
|
|
| Wait, did Johnoag79 say it did work? I haven't actually had the -X function properly for me and would be curious if there was something specific that made it work? |
|
| Back to top |
|
 |
johnoag79 partially protected

Joined: 21 Jul 2009 Posts: 7
|
Posted: Wed Nov 25, 2009 10:13 am Post subject: |
|
|
I used it as a last resort, as nothing else would get rid of the XATTR errors reported with -vw.
It did result in an RPL_Upgrade taking place afterwards, which takes time... |
|
| Back to top |
|
 |
geoffk partially protected

Joined: 23 Nov 2009 Posts: 7
|
Posted: Sun Nov 29, 2009 10:04 pm Post subject: |
|
|
i had gotten to cvfsck on my own, but wasnt really sure what to do with it. what exactly did we do here?
this seems to have done the trick. I was able to empty trash without the volume going offline. Going to do a full ~15 TB SAN dump to it later this week, shall see how it holds up!
Thanks lotte and abstractrude! |
|
| Back to top |
|
 |
|