Xsanity: Sanity for Apple's Xsan and Final Cut Server.
  
Saturday, May 25 2013 @ 01:32 AM EDT
File opened via AFP Xsan re-share causes XSAN restart/lockup

 
rberd
Been around the blocks


Joined: 05 Oct 2009
Posts: 22

Posted: Tue May 01, 2012 6:30 pm    Post subject: File opened via AFP Xsan re-share causes XSAN restart/lockup

Wondering if anyone has seen this before or has any ideas on this…

Having an issue when a Microsoft PowerPoint or Word document (version 2011 for Mac) is opened on a client that is accessing the Xsan through an AFP re-share. I know, I know - these kinds of files should not be stored on the SAN… but scripts and things tend to get put with the rest of the project media files. Anyway, when the Microsoft document is opened, the computer freezes (spinning beach ball) and then the Xsan volume disconnects from all the other clients (both AFP and Fibre connected). From what I can tell, the Xsan volume tries to fail over to the Backup MDC, but the Backup MDC also freezes, which locks up all network accounts.

This has happened multiple times now - each time it is triggered by opening a Microsoft document. The only way I have been able to get things going again is to shut down/hard power both servers and bring them up again… testing the failover between both Primary MDC and Backup MDC. Wondering if this is some issue with the AFP re-share that is causing larger issues overall. I wouldn't think that a simple PowerPoint document could cause something so major, but I can't seem to find any other explanation.


Here is the setup:

• 2 Xserves running 10.6.8
   - Primary Xsan MDC, plus the following services:
      - Open Directory Replica
      - Secondary DNS
   - Backup Xsan MDC, plus the following services:
      - Open Directory Master
      - DHCP
      - Primary DNS
      - AFP
      - SMB
      - Groupware (iCal, iChat…)
• Xsan 2.2.2
• Separate ethernet networks for Public & Metadata
• Qlogic SanBox 5600 & 5602 Fibre switches with stacking ports
• Promise Vtrak E/J-Class Storage Units - about 25TB storage
• 9 Fibre clients - all 10.6.8
• 5 AFP clients - all 10.6.8


Here is an excerpt of the logs from both Xserves:

Primary Xsan MDC:
7:56:54 AM fsm[94508] Xsan FSS 'SAN[0]': PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "rangep->headp == NULL" file range_ops.c, line 387
7:56:54 AM KernelEventAgent[74] tid 00000000 received event(s) VQ_NOTRESP (1)
7:56:54 AM fsm[94508] PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "rangep->headp == NULL" file range_ops.c, line 387
7:56:54 AM KernelEventAgent[74] tid 00000000 type 'acfs', mounted on '/Volumes/SAN', from '/dev/disk9', not responding
7:56:54 AM KernelEventAgent[74] tid 00000000 found 1 filesystem(s) with problem(s)
7:56:54 AM fsm[94508] Xsan FSS 'SAN[0]': PANIC: wait 3 secs for journal to flush
7:56:54 AM fsm[94508] Xsan FSS 'SAN[0]': PANIC: aborting threads now.
7:56:55 AM fsmpm[94505] PortMapper: Initiating activation vote for FSS 'SAN'.
7:56:56 AM kernel Reconnecting to FSS 'SAN'
7:56:59 AM fsmpm[94505] PortMapper: Reconnect Event for /Volumes/SAN
7:56:59 AM fsmpm[94505] PortMapper: Requesting MDS recycle of /Volumes/SAN
7:56:59 AM KernelEventAgent[74] tid 00000000 received event(s) VQ_NOTRESP (1)
7:57:00 AM kernel Reconnect successful to FSS 'SAN' on host '10.0.0.7'.
7:57:00 AM kernel Using v2 readdir for 'SAN'
7:57:16 AM fsmpm[94505] PortMapper: FSS 'SAN' disconnected.
7:57:16 AM fsmpm[94505] PortMapper: kicking diskscan_thread 4338487296.
7:57:16 AM fsmpm[94505] Portmapper: FSS 'SAN' (pid 94508) exited on signal 6
7:57:16 AM com.apple.ReportCrash.Root[91536] 2012-01-30 07:57:16.792 ReportCrash[91536:2a03] Saved crash report for fsm[94508] version ??? (???) to /Library/Logs/DiagnosticReports/fsm_2012-01-30-075716_localhost.crash
7:57:19 AM servermgrd[94512] xsan: [94512/2112D0] ERROR: get_fsm_process_stats(SAN): Unable to find pid of fsm
7:57:24 AM KernelEventAgent[74] tid 00000000 received event(s) VQ_NOTRESP (1)
7:57:24 AM KernelEventAgent[74] tid 00000000 type 'acfs', mounted on '/Volumes/SAN', from '/dev/disk9', not responding
7:57:24 AM KernelEventAgent[74] tid 00000000 found 1 filesystem(s) with problem(s)
7:57:26 AM kernel Reconnecting to FSS 'SAN'
7:57:26 AM kernel No FSS registered with PortMapper on host 10.0.0.7, retrying...
7:57:26 AM fsmpm[94505] PortMapper: RESTART FSS service 'SAN[0]' on host primary.company.lan.
7:57:26 AM fsmpm[94505] PortMapper: Starting FSS service 'SAN[0]' on primary.company.lan.
7:57:26 AM fsmpm[94505] PortMapper: FSS 'SAN'[0] (pid 91550) at port 58485 is registered.
7:57:27 AM fsmpm[94505] PortMapper: Initiating activation vote for FSS 'SAN'.
7:57:27 AM fsmpm[94505] NSS: Could not elect an FSS for 'SAN' - vote aborted.
7:57:29 AM fsmpm[94505] PortMapper: Initiating activation vote for FSS 'SAN'.

Backup Xsan MDC:
7:56:54 AM KernelEventAgent[75] tid 00000000 received event(s) VQ_NOTRESP (1)
7:56:54 AM KernelEventAgent[75] tid 00000000 type 'acfs', mounted on '/Volumes/SAN', from '/dev/disk8', not responding
7:56:54 AM KernelEventAgent[75] tid 00000000 found 1 filesystem(s) with problem(s)
7:56:55 AM fsmpm[262] PortMapper: Initiating activation vote for FSS 'SAN'.
7:56:56 AM kernel Reconnecting to FSS 'SAN'
7:56:58 AM com.apple.xsan[57] xsan:perfDispatchMicroseconds = 788625
7:56:58 AM com.apple.xsan[57] xsan:perfFunctionMicroseconds = 788874
7:56:59 AM fsm[55662] Xsan FSS 'SAN[1]': Windows Security has been turned off in config file but clients have been requested to enforce ACLs. Windows Security remains in effect.
7:57:00 AM kernel Failed to re-open cookie/0x180000230b1b1 error/2
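
When wading through excerpts like the ones above, it can help to pull out just the failed assertions. What follows is a minimal Python sketch, not an Xsan tool. It assumes log lines shaped like this excerpt (time, process[pid], message), and the names `LINE_RE`, `ASSERT_RE`, and `find_asserts` are made up for illustration. The sample lines are copied from the Primary Xsan MDC log.

```python
import re

# Assumed line shape, based only on the excerpt above:
#   "7:56:54 AM fsm[94508] message text"
LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<proc>[^\s\[]+)(?:\[(?P<pid>\d+)\])? "
    r"(?P<msg>.*)$"
)
# Matches the ASSERT text that fsm logs when it panics.
ASSERT_RE = re.compile(
    r'ASSERT failed "(?P<expr>[^"]+)" file (?P<file>\S+), line (?P<line>\d+)'
)


def find_asserts(lines):
    """Return one dict per log line that records a failed ASSERT."""
    hits = []
    for raw in lines:
        entry = LINE_RE.match(raw.strip())
        if entry is None:
            continue  # skip lines that do not fit the assumed shape
        found = ASSERT_RE.search(entry.group("msg"))
        if found:
            hits.append({
                "time": entry.group("time"),
                "proc": entry.group("proc"),
                "pid": entry.group("pid"),
                "expr": found.group("expr"),
                "file": found.group("file"),
                "line": int(found.group("line")),
            })
    return hits


# Three lines copied from the Primary Xsan MDC excerpt.
SAMPLE = '''
7:56:54 AM fsm[94508] Xsan FSS 'SAN[0]': PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "rangep->headp == NULL" file range_ops.c, line 387
7:56:54 AM KernelEventAgent[74] tid 00000000 received event(s) VQ_NOTRESP (1)
7:56:54 AM fsm[94508] PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "rangep->headp == NULL" file range_ops.c, line 387
'''.strip().splitlines()

for hit in find_asserts(SAMPLE):
    print(hit["time"], hit["proc"], hit["pid"], hit["expr"], hit["file"], hit["line"])
```

Both hits here point at the same assertion in range_ops.c, which at least makes it easy to compare against other people's logs.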


Last edited by rberd on Wed May 02, 2012 3:00 pm; edited 1 time in total
Sirsloth
fully protected


Joined: 04 May 2009
Posts: 14

Posted: Wed May 02, 2012 2:51 pm

I had a similar issue with 2.2.1 on an Xsan-connected client. Updating the Xsan controller and the client to 2.2.2 cleared it. One thing to check is whether opening this file directly on the SAN re-share server has the same result. That would give you a clue as to whether the AFP client or the Xsan-connected re-share machine is causing the issue.
rberd
Been around the blocks


Joined: 05 Oct 2009
Posts: 22

Posted: Wed May 02, 2012 2:59 pm

My apologies, just double checked and am running 2.2.2 on both MDCs and all clients. Will have to find some after-hours time to try opening the file on the re-share server as you suggest... in case it breaks anything. I will mention, though, that the problem has happened with two different AFP client machines. Thanks for your thoughts, Sirsloth.
singlemalt
Xsan Master


Joined: 27 Feb 2009
Posts: 109

Posted: Wed May 02, 2012 3:06 pm

In general, any time a volume panics there's a very good chance it's due to
metadata corruption. You should repair the volume with cvfsck, and probably
run cvfsck in read-only mode after the repair to make sure it was fully repaired.
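
The repair-then-verify order described above can be written down as a short plan. This is only a sketch under stated assumptions: it prints commands and runs nothing, the volume name `SAN` is taken from the logs earlier in the thread, the install path is assumed to match the `fsm` path in those logs, and the initial `-j` journal replay comes from Apple's general cvfsck guidance rather than from this thread. The helper name `cvfsck_plan` is hypothetical. A repair is normally done with the volume stopped, so check Apple's documentation for your Xsan version before running any of it.

```python
import shlex

# Assumption: cvfsck sits next to the fsm binary seen in the logs on Xsan 2.x.
CVFSCK = "/Library/Filesystems/Xsan/bin/cvfsck"


def cvfsck_plan(volume):
    """Return the check/repair/verify commands as argument lists (nothing is run)."""
    return [
        ["sudo", CVFSCK, "-j", volume],   # replay the journal first (assumption, see lead-in)
        ["sudo", CVFSCK, "-nv", volume],  # read-only, verbose check
        ["sudo", CVFSCK, "-wv", volume],  # repair: this one writes to the volume
        ["sudo", CVFSCK, "-nv", volume],  # read-only again to confirm the repair
    ]


# Dry run: print the commands so they can be reviewed before anyone types them.
for argv in cvfsck_plan("SAN"):
    print(shlex.join(argv))
```

Keeping the plan as data and printing it means the destructive `-wv` step never runs by accident while you are still deciding whether a repair is needed.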
rberd
Been around the blocks


Joined: 05 Oct 2009
Posts: 22

Posted: Wed May 09, 2012 7:35 pm

I was able to run cvfsck -nv this evening and it reported that the file system status was clean, and that the file system read-only check completed successfully... no errors. So I guess the volume seems to be ok. Anything else I should try there? I'm thinking that maybe it is something on the AFP re-share server. I hate to create more problems by trying to recreate the issue by opening the file on the re-share server.
rberd
Been around the blocks


Joined: 05 Oct 2009
Posts: 22

Posted: Thu May 10, 2012 6:12 pm

This has happened again today when a staff member tried to open a Microsoft Word document from the SAN on an iMac connected via AFP re-share. Again, I fail to see why something like this could cause such a catastrophic problem - forcing both Primary & Backup Xsan MDCs to completely lock up, forcing a hard shutdown. If anyone has any idea it would be greatly appreciated. If anyone would like to look at any of the logs please PM me, I will be glad to share at this point.
digitaldesktop
partially protected


Joined: 21 Nov 2008
Posts: 5

Posted: Fri Jun 08, 2012 12:35 am

We are experiencing the exact same issue with the exact same error message. Very frustrating. I have turned off AFP. My gut tells me there is some weird file byte locking issue that is bringing down the whole ball of wax.

Jeff
rberd
Been around the blocks


Joined: 05 Oct 2009
Posts: 22

Posted: Fri Jun 08, 2012 11:06 am

Hey Jeff, sorry to hear you are having the same issue. We are still sharing our SAN via AFP, but have made it very clear to our staff that they must not do anything with Microsoft documents on the SAN for now. Not a great fix, but it is working until I have time to do further troubleshooting.

I posted the same topic over on Creative Cow and it got a bit more action there. Check it out: http://forums.creativecow.net/thread/180/857596

The latest suggestion is to dtruss the Microsoft application as it opens the file, to see if there is anything useful there. I am not incredibly familiar with this procedure and just haven't had the time to try it. Let me know if you make any headway or figure anything out! Would love to get this resolved eventually.
singlemalt
Xsan Master


Joined: 27 Feb 2009
Posts: 109

Posted: Fri Jun 08, 2012 12:24 pm

Just out of curiosity, you're not sharing the same folder from two different
servers at the same time, are you? If you are, that may be causing the problem.
The servers won't be aware of each other's file locks on documents, so that could
cause some problems. I would expect that to cause file corruption/overwriting though,
not a volume lockup/failover.
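
For anyone unfamiliar with the byte-range locks being discussed, here is a minimal Python sketch of how they behave on an ordinary local file. It says nothing about Xsan or AFP internals, and whether Office actually takes such locks here is not confirmed anywhere in this thread. The helper names `try_lock`, `child_can_lock`, and `demo` are made up for illustration, and the sketch assumes a Unix-like system where `os.fork` and POSIX `fcntl` locks are available.

```python
import fcntl
import os
import tempfile


def try_lock(fd, start, length):
    """Try a non-blocking exclusive lock on bytes [start, start + length)."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        return True
    except OSError:
        return False  # another process already holds a conflicting lock


def child_can_lock(path, start, length):
    """Fork a child that opens the file itself and reports whether it got the lock."""
    pid = os.fork()
    if pid == 0:
        fd = os.open(path, os.O_RDWR)
        os._exit(0 if try_lock(fd, start, length) else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def demo():
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * 200)
        tmp.flush()
        parent_has = try_lock(tmp.fileno(), 0, 100)   # parent locks bytes 0-99
        overlap = child_can_lock(tmp.name, 50, 10)    # bytes 50-59 overlap the parent's range
        disjoint = child_can_lock(tmp.name, 150, 10)  # bytes 150-159 do not overlap
        return parent_has, overlap, disjoint


print(demo())
```

On a single machine the operating system arbitrates these requests, so the overlapping one is refused while the disjoint one succeeds. Two servers re-sharing the same folder have no such shared arbiter at the file-sharing layer, which is the concern raised above.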
rberd
Been around the blocks


Joined: 05 Oct 2009
Posts: 22

Posted: Fri Jun 08, 2012 12:41 pm

Hey. No, only sharing from one server.
All times are GMT - 5 Hours