rberd Been around the blocks

Joined: 05 Oct 2009 Posts: 22
Posted: Tue May 01, 2012 6:30 pm Post subject: File opened via AFP Xsan re-share causes XSAN restart/lockup
Wondering if anyone has seen this before or has any ideas on this…
Having an issue when a Microsoft PowerPoint or Word document (version 2011 for Mac) is opened on a client that is accessing the Xsan through an AFP re-share. I know, I know, these kinds of files should not be stored on the SAN, but scripts and things tend to get put with the rest of the project media files. Anyway, when the Microsoft document is opened, the client computer freezes (spinning beach ball) and then the Xsan volume disconnects from all the other clients (both AFP and Fibre Channel connected). From what I can tell, the Xsan volume tries to fail over to the Backup MDC, but the Backup MDC also freezes, which locks up all network accounts.
This has happened multiple times now, each time triggered by opening a Microsoft document. The only way I have been able to get things going again is to shut down or hard power-cycle both servers, bring them up again, and then test the failover between the Primary MDC and Backup MDC. I'm wondering if this is some issue with the AFP re-share that is causing larger problems overall. I wouldn't think that a simple PowerPoint document could cause something so major, but I can't find any other explanation.
Here is the setup:
• 2 Xserves running 10.6.8
  - Primary Xsan MDC, plus the following services:
    - Open Directory Replica
    - Secondary DNS
  - Backup Xsan MDC, plus the following services:
    - Open Directory Master
    - DHCP
    - Primary DNS
    - AFP
    - SMB
    - Groupware (iCal, iChat…)
• Xsan 2.2.2
• Separate Ethernet networks for public & metadata traffic
• QLogic SANbox 5600 & 5602 Fibre Channel switches with stacking ports
• Promise VTrak E/J-Class storage units, about 25 TB of storage
• 9 Fibre Channel clients, all 10.6.8
• 5 AFP clients, all 10.6.8
Here is an excerpt of the logs from both Xserves:
Primary Xsan MDC:
7:56:54 AM fsm[94508] Xsan FSS 'SAN[0]': PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "rangep->headp == NULL" file range_ops.c, line 387
7:56:54 AM KernelEventAgent[74] tid 00000000 received event(s) VQ_NOTRESP (1)
7:56:54 AM fsm[94508] PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "rangep->headp == NULL" file range_ops.c, line 387
7:56:54 AM KernelEventAgent[74] tid 00000000 type 'acfs', mounted on '/Volumes/SAN', from '/dev/disk9', not responding
7:56:54 AM KernelEventAgent[74] tid 00000000 found 1 filesystem(s) with problem(s)
7:56:54 AM fsm[94508] Xsan FSS 'SAN[0]': PANIC: wait 3 secs for journal to flush
7:56:54 AM fsm[94508] Xsan FSS 'SAN[0]': PANIC: aborting threads now.
7:56:55 AM fsmpm[94505] PortMapper: Initiating activation vote for FSS 'SAN'.
7:56:56 AM kernel Reconnecting to FSS 'SAN'
7:56:59 AM fsmpm[94505] PortMapper: Reconnect Event for /Volumes/SAN
7:56:59 AM fsmpm[94505] PortMapper: Requesting MDS recycle of /Volumes/SAN
7:56:59 AM KernelEventAgent[74] tid 00000000 received event(s) VQ_NOTRESP (1)
7:57:00 AM kernel Reconnect successful to FSS 'SAN' on host '10.0.0.7'.
7:57:00 AM kernel Using v2 readdir for 'SAN'
7:57:16 AM fsmpm[94505] PortMapper: FSS 'SAN' disconnected.
7:57:16 AM fsmpm[94505] PortMapper: kicking diskscan_thread 4338487296.
7:57:16 AM fsmpm[94505] Portmapper: FSS 'SAN' (pid 94508) exited on signal 6
7:57:16 AM com.apple.ReportCrash.Root[91536] 2012-01-30 07:57:16.792 ReportCrash[91536:2a03] Saved crash report for fsm[94508] version ??? (???) to /Library/Logs/DiagnosticReports/fsm_2012-01-30-075716_localhost.crash
7:57:19 AM servermgrd[94512] xsan: [94512/2112D0] ERROR: get_fsm_process_stats(SAN): Unable to find pid of fsm
7:57:24 AM KernelEventAgent[74] tid 00000000 received event(s) VQ_NOTRESP (1)
7:57:24 AM KernelEventAgent[74] tid 00000000 type 'acfs', mounted on '/Volumes/SAN', from '/dev/disk9', not responding
7:57:24 AM KernelEventAgent[74] tid 00000000 found 1 filesystem(s) with problem(s)
7:57:26 AM kernel Reconnecting to FSS 'SAN'
7:57:26 AM kernel No FSS registered with PortMapper on host 10.0.0.7, retrying...
7:57:26 AM fsmpm[94505] PortMapper: RESTART FSS service 'SAN[0]' on host primary.company.lan.
7:57:26 AM fsmpm[94505] PortMapper: Starting FSS service 'SAN[0]' on primary.company.lan.
7:57:26 AM fsmpm[94505] PortMapper: FSS 'SAN'[0] (pid 91550) at port 58485 is registered.
7:57:27 AM fsmpm[94505] PortMapper: Initiating activation vote for FSS 'SAN'.
7:57:27 AM fsmpm[94505] NSS: Could not elect an FSS for 'SAN' - vote aborted.
7:57:29 AM fsmpm[94505] PortMapper: Initiating activation vote for FSS 'SAN'.
Backup Xsan MDC:
7:56:54 AM KernelEventAgent[75] tid 00000000 received event(s) VQ_NOTRESP (1)
7:56:54 AM KernelEventAgent[75] tid 00000000 type 'acfs', mounted on '/Volumes/SAN', from '/dev/disk8', not responding
7:56:54 AM KernelEventAgent[75] tid 00000000 found 1 filesystem(s) with problem(s)
7:56:55 AM fsmpm[262] PortMapper: Initiating activation vote for FSS 'SAN'.
7:56:56 AM kernel Reconnecting to FSS 'SAN'
7:56:58 AM com.apple.xsan[57] xsan:perfDispatchMicroseconds = 788625
7:56:58 AM com.apple.xsan[57] xsan:perfFunctionMicroseconds = 788874
7:56:59 AM fsm[55662] Xsan FSS 'SAN[1]': Windows Security has been turned off in config file but clients have been requested to enforce ACLs. Windows Security remains in effect.
7:57:00 AM kernel Failed to re-open cookie/0x180000230b1b1 error/2
Last edited by rberd on Wed May 02, 2012 3:00 pm; edited 1 time in total
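When comparing two MDC logs like the excerpt above, it can help to pull out only the failover-related events (the PANIC/ASSERT, the activation votes, the reconnects, the FSS restart) and read them as a short timeline. Below is a minimal Python sketch of such a filter. It assumes each log has been saved as plain text, one entry per line, in the `time process[pid] message` layout shown in this post; the regular expression, the `KEYWORDS` list and the `failover_events` name are my own assumptions, not anything defined by Xsan.

```python
import re
from typing import Iterable, List, Tuple

# Layout assumed from the excerpt in this post, e.g.
#   "7:56:54 AM fsm[94508] Xsan FSS 'SAN[0]': PANIC: ..."
#   "7:56:56 AM kernel Reconnecting to FSS 'SAN'"   (no [pid])
LINE_RE = re.compile(r"^(\d{1,2}:\d{2}:\d{2} [AP]M) (\S+?)(?:\[(\d+)\])? (.*)$")

# Hypothetical keyword list for the panic/failover sequence
KEYWORDS = ("PANIC", "ASSERT", "vote", "Reconnect", "RESTART",
            "exited on signal", "not responding")


def failover_events(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (time, process, message) for log lines that mention a keyword."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip headings such as "Backup Xsan MDC:" and wrapped text
        time, process, _pid, message = m.groups()
        if any(word in message for word in KEYWORDS):
            events.append((time, process, message))
    return events
```

Running it over each MDC's saved log (for example a hypothetical `primary.log` and `backup.log`) gives two short event lists that are easier to compare than the full console output.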
Sirsloth fully protected

Joined: 04 May 2009 Posts: 14
Posted: Wed May 02, 2012 2:51 pm Post subject:
I had a similar issue with Xsan 2.2.1 on an Xsan-connected client. Updating the Xsan controllers and the client to 2.2.2 cleared it. One thing to check is whether opening this file directly on the SAN re-share server gives the same result. That would give you a clue whether the AFP client or the Xsan-connected re-share machine is causing the issue.
rberd Been around the blocks

Joined: 05 Oct 2009 Posts: 22
Posted: Wed May 02, 2012 2:59 pm Post subject:
My apologies, I just double-checked and am running 2.2.2 on both MDCs and all clients. I will have to find some after-hours time to try opening the file on the re-share server as you suggest, in case it breaks anything. I will mention, though, that the problem has happened with two different AFP client machines. Thanks for your thoughts, Sirsloth.
singlemalt Xsan Master

Joined: 27 Feb 2009 Posts: 109
Posted: Wed May 02, 2012 3:06 pm Post subject:
In general, any time a volume panics there's a very good chance it's due to
metadata corruption. You should repair the volume with cvfsck, and probably
run cvfsck in read-only mode after the repair to make sure it was fully repaired.
rberd Been around the blocks

Joined: 05 Oct 2009 Posts: 22
Posted: Wed May 09, 2012 7:35 pm Post subject:
I was able to run cvfsck -nv this evening and it reported that the file system status was clean and that the read-only check completed successfully, with no errors. So the volume seems to be OK. Anything else I should try there? I'm thinking that maybe it is something on the AFP re-share server. I hate to create more problems by trying to reproduce the issue by opening the file on the re-share server.
rberd Been around the blocks

Joined: 05 Oct 2009 Posts: 22
Posted: Thu May 10, 2012 6:12 pm Post subject:
This happened again today when a staff member tried to open a Microsoft Word document from the SAN on an iMac connected via the AFP re-share. Again, I fail to see why something like this could cause such a catastrophic problem, forcing both the Primary and Backup Xsan MDCs to lock up completely and requiring a hard shutdown. If anyone has any idea, it would be greatly appreciated. If anyone would like to look at any of the logs, please PM me; I will be glad to share at this point.
digitaldesktop partially protected

Joined: 21 Nov 2008 Posts: 5
Posted: Fri Jun 08, 2012 12:35 am Post subject:
We are experiencing the exact same issue with the exact same error message. Very frustrating. I have turned off AFP. My gut tells me there is some weird file byte-range locking issue that is bringing down the whole ball of wax.
Jeff
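If the byte-range locking theory is right, one way to test it without Office would be to take a lock on a copy of an affected document directly on the re-share server, which also follows Sirsloth's suggestion of ruling the AFP clients in or out. Below is a minimal Python sketch that uses the standard `fcntl.lockf` call to take and immediately release an exclusive POSIX byte-range lock. Whether Office 2011 actually uses byte-range locks on these files, and whether AFP client locks reach the Xsan volume this way, are assumptions nobody in this thread has confirmed; the function name `probe_byte_range_lock` is hypothetical. Given what happened here, only try it after hours on a scratch copy.

```python
import errno
import fcntl
import os


def probe_byte_range_lock(path: str, offset: int, length: int) -> bool:
    """Try to take, then release, an exclusive POSIX byte-range lock on `path`.

    Returns True if the range could be locked, False if another process
    already holds a conflicting lock on it. Other errors are re-raised.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # LOCK_NB: fail immediately rather than blocking on a held range
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, offset, os.SEEK_SET)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return False  # conflicting lock held elsewhere
            raise
        # Unlock exactly the range that was locked
        fcntl.lockf(fd, fcntl.LOCK_UN, length, offset, os.SEEK_SET)
        return True
    finally:
        os.close(fd)
```

For example, `probe_byte_range_lock("/Volumes/SAN/scratch/copy.docx", 0, 1)` (a hypothetical path) locks and unlocks the first byte of the copy.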
rberd Been around the blocks

Joined: 05 Oct 2009 Posts: 22
Posted: Fri Jun 08, 2012 11:06 am Post subject:
Hey Jeff, sorry to hear you are having the same issue. We are still sharing our SAN via AFP, but I have made it very clear to our staff that they must not do anything with Microsoft documents on the SAN for now. Not a great fix, but it is working until I have time to do further troubleshooting.
I posted the same topic over on Creative Cow and it got a bit more action there. Check it out: http://forums.creativecow.net/thread/180/857596
The latest suggestion is to try to dtruss the Microsoft application as it opens the file, to see if there is anything useful there. I am not very familiar with this procedure and just haven't had the time to try it. Let me know if you make any headway or figure anything out! Would love to get this resolved eventually.
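Once a dtruss capture exists (for example from running `dtruss -p` as root against the PID of Word while it opens the file, with the output saved to a text file), the interesting part is likely to be buried in thousands of lines. The minimal Python sketch below counts lock-related calls and calls that returned -1. It assumes each traced call appears as `name(args) = retval errno`, optionally prefixed with `pid/0xtid:` when `-f` is used; check that against your own capture. The set in `LOCK_CALLS` and the `summarize_dtruss` name are hypothetical starting points, not a complete list.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed dtruss line shape: optional "pid/0xtid:" prefix (with -f), then
# "syscall(args) = retval errno". Verify against your own capture.
CALL_RE = re.compile(r"^\s*(?:\d+/0x[0-9a-fA-F]+:\s+)?(\w+)\((.*)\)\s+=\s+(-?\d+)")

# Hypothetical focus list: calls that can take or query file locks
LOCK_CALLS = {"fcntl", "fcntl_nocancel", "flock"}


def summarize_dtruss(lines: Iterable[str]) -> Counter:
    """Count lock-related syscalls and failed (-1) calls in dtruss output."""
    counts: Counter = Counter()
    for line in lines:
        m = CALL_RE.match(line)
        if not m:
            continue  # header row, blank lines, non-syscall output
        name, _args, ret = m.groups()
        if name in LOCK_CALLS:
            counts[name] += 1
        if int(ret) == -1:
            counts[f"{name} failed"] += 1
    return counts
```

A burst of failed `fcntl` calls on the document's file descriptor just before the hang would at least support the locking theory; a clean trace would point elsewhere.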
singlemalt Xsan Master

Joined: 27 Feb 2009 Posts: 109
Posted: Fri Jun 08, 2012 12:24 pm Post subject:
Just out of curiosity, you're not sharing the same folder from two different
servers at the same time, are you? If you are, that may be causing the problem.
The servers won't be aware of each other's file locks on documents, so that could
cause some problems. I would expect that to cause file corruption/overwriting, though,
not a volume lockup/failover.
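The point about two servers not seeing each other's locks can be illustrated with a toy model in which each share server keeps its lock table only in its own memory. This is a deliberately simplified Python sketch: `ShareServer` and `open_exclusive` are hypothetical names, and this is not how AFP or Xsan implement locking internally. It only shows why two servers re-sharing the same folder could both hand out an "exclusive" lock on one document.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ShareServer:
    """Toy share server: locks live only in this server's memory (hypothetical model)."""
    name: str
    locks: Dict[str, str] = field(default_factory=dict)  # path -> client holding it

    def open_exclusive(self, path: str, client: str) -> bool:
        """Grant an exclusive lock unless THIS server already granted one for path."""
        if path in self.locks:
            return False
        self.locks[path] = client
        return True


# Two servers re-sharing the same folder on the same volume
server_a = ShareServer("server-a")
server_b = ShareServer("server-b")
doc = "/Volumes/SAN/Project/script.docx"  # hypothetical path

print(server_a.open_exclusive(doc, "client-1"))  # True: first lock on server A
print(server_a.open_exclusive(doc, "client-2"))  # False: server A sees the conflict
print(server_b.open_exclusive(doc, "client-2"))  # True: server B has no idea A granted it
```

Both clients now believe they hold the document exclusively, which fits the overwriting/corruption symptom described above rather than a volume failover.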
rberd Been around the blocks

Joined: 05 Oct 2009 Posts: 22
Posted: Fri Jun 08, 2012 12:41 pm Post subject:
Hey. No, only sharing from one server.