Xsanity Sanity for Apple's Xsan and Final Cut Server.
  
Friday, September 03 2010 @ 05:37 AM EDT
"OpHangLimitSecs exceeded VOP-Setattr 183.18 secs"
MaXi-XCeL
fully protected
Joined: 25 Oct 2007
Posts: 10

Posted: Thu Oct 25, 2007 8:04 am    Post subject: "OpHangLimitSecs exceeded VOP-Setattr 183.18 secs"

Hi guys,

My Xsan setup is driving me nuts... My MDCs keep crashing with the following error message:

Code:

Oct 25 14:38:08 xsan02 fsm[348]: Xsan FSS 'StudioWorkDisc[1]': PANIC: /Library/Filesystems/Xsan/bin/fsm "OpHangLimitSecs exceeded VOP-Setattr 183.18 secs Conn[1] Thread-0x187fc00 Pqueue-0x404b78 Workp-0x1cf9218 MsgQ-0x1cf9208 Msg-0x1cf9264 now 0x43d5081a1bb34 started 0x43d5076b6961b limit 180 secs. " file queues.c, line 612
Oct 25 14:38:08 xsan02 fsm[348]: PANIC: /Library/Filesystems/Xsan/bin/fsm "OpHangLimitSecs exceeded VOP-Setattr 183.18 secs Conn[1] Thread-0x187fc00 Pqueue-0x404b78 Workp-0x1cf9218 MsgQ-0x1cf9208 Msg-0x1cf9264 now 0x43d5081a1bb34 started 0x43d5076b6961b limit 180 secs.\n" file queues.c, line 612\n
Oct 25 14:38:08 xsan02 fsm[348]: Xsan FSS 'StudioWorkDisc[1]': PANIC: wait 3 secs for journal to flush
Oct 25 14:38:08 xsan02 fsm[348]: Xsan FSS 'StudioWorkDisc[1]': PANIC: aborting threads now.
Oct 25 14:38:17 xsan02 fsmpm[274]: Portmapper: FSS 'StudioWorkDisc' (pid 348) exited on signal 4
Oct 25 14:59:11 xsan02 crashdump[671]: fsm crashed
Oct 25 14:59:11 xsan02 crashdump[671]: crash report written to: /Library/Logs/CrashReporter/fsm.crash.log


My current configuration is:
2 MDCs
3 Xserve RAIDs
6 LUNs

Any ideas? I've already checked all the cabling...
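
For anyone collecting these panics across several logs, the PANIC text quoted above can be split into its parts with a small script. This is only a sketch: the regular expression is fitted to the lines posted in this thread, the field names (op, elapsed_secs, conn, limit_secs) are labels made up here, and the sample line is abridged from the log above.

```python
import re

# Pattern fitted to the fsm PANIC text quoted in this thread.
# The group names are illustrative labels, not Xsan terminology.
PANIC_RE = re.compile(
    r"OpHangLimitSecs exceeded (?P<op>\S+) (?P<elapsed>\d+(?:\.\d+)?) secs"
    r" Conn\[(?P<conn>\d+)\].*?limit (?P<limit>\d+) secs"
)

def parse_ophang(line):
    """Return a dict of fields from an OpHangLimitSecs PANIC line, or None."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    return {
        "op": m.group("op"),
        "elapsed_secs": float(m.group("elapsed")),
        "conn": int(m.group("conn")),
        "limit_secs": int(m.group("limit")),
    }

# Sample abridged from the syslog excerpt in the first post.
sample = (
    'Oct 25 14:38:08 xsan02 fsm[348]: PANIC: /Library/Filesystems/Xsan/bin/fsm '
    '"OpHangLimitSecs exceeded VOP-Setattr 183.18 secs Conn[1] '
    'Thread-0x187fc00 Pqueue-0x404b78 Workp-0x1cf9218 MsgQ-0x1cf9208 '
    'Msg-0x1cf9264 now 0x43d5081a1bb34 started 0x43d5076b6961b '
    'limit 180 secs." file queues.c, line 612'
)

info = parse_ophang(sample)
print(info)
```

Run over every line of a saved log, this would show which operation (VOP-Setattr here) and which connection number were stuck each time, which may help when comparing panics between sites.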
ACSA
Xsan Master
Joined: 28 Jan 2007
Posts: 91

Posted: Thu Oct 25, 2007 10:45 am

Hi, can you give us more info than you have provided so far?


What OS are the MDCs running?
What version of Xsan?
What are the firmware versions of the RAIDs?
What versions are on the clients?
Is the metadata LUN separate from the data LUNs?

etc.

Then perhaps we'll know what to do.
MaXi-XCeL
fully protected
Joined: 25 Oct 2007
Posts: 10

Posted: Fri Oct 26, 2007 2:34 am

MDCs: Mac OS X Server 10.4.10
XSAN: Version 1.4.1
XRAID FIRMWARE: 1.5/1.50f

Metadata is separate from the data luns.

XRAID 1:
RAID 1 Metadata LUN
RAID 5 Data LUN

XRAID 2:
RAID 5 Data LUN
RAID 5 Data LUN

XRAID 3:
RAID 5 Data LUN
RAID 5 Data LUN

StudioWorkDisc (13.65 TB)
-> metadatapool
- XSERVE1-LUN1-METADATA 465.73 GB
-> storagepool
- XSERVE1-LUN2-RAID5 2.73 TB
-> Any
- XSERVE2-LUN1-RAID5 2.73 TB
- XSERVE2-LUN2-RAID5 2.73 TB
-> Last
- XSERVE2-LUN1-RAID5 2.73 TB
- XSERVE2-LUN2-RAID5 2.73 TB

I know my configuration and choice of storage pools is not how Xsan is meant to be used, but that is for historical reasons.
donald
Xsan Master
Joined: 25 Jun 2007
Posts: 67

Posted: Fri Oct 26, 2007 6:41 am    Post subject: check from CLI

From the log it seems you have trouble with the metadata.

With cvlabel -l -s you can check whether the metadata LUN is available.
From cvadmin you can check the storage pools (a.k.a. stripe groups) with the command show long.
ipott
RAID 5
Joined: 27 Oct 2007
Posts: 18

Posted: Sat Oct 27, 2007 5:41 am

Does the OpHangLimitSecs message appear BEFORE the "fsm crashed" message or after it?

Do you have file sharing active for the volume on more than one node?
jordan
Been around the blocks
Joined: 12 Jan 2006
Posts: 24

Posted: Wed Nov 07, 2007 7:37 pm

Maxi,

Have you found a fix for this one?
BenT
Knows DNS is the answer
Joined: 12 Jun 2007
Posts: 38

Posted: Tue Nov 20, 2007 4:48 am    Post subject: We are seeing this problem also

We have recently started seeing this same behaviour:
XSAN 1.4.1
25 FC-connected users
24TB volume approx 92% full
OSX 10.4.9
MDC is Intel XServe with 2GB RAM
All clients and servers dual-FC connected
The FSM process on the MDC sits at very high CPU usage. When it gets to 100% usage, the FSM process dies with the error:

0x180ac00 (**FATAL**) PANIC: /Library/Filesystems/Xsan/bin/fsm "OpHangLimitSecs exceeded VOP-VopLookupV4 183.31 secs Conn[71] Thread-0x1881200 Pqueue-0x405018 Workp-0x959f618 MsgQ-0x959f608 Msg-0x959f664 now 0x43e4a1e3e8923 started 0x43e4a13517fd5 limit 180 secs." file queues.c, line 612

Has anyone else found a solution to this problem?
We are going to increase the MDC's RAM, upgrade to 10.4.10 and Xsan 1.4.2, and then maybe try an MDC with faster CPUs.
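
One small cross-check on the two PANIC lines posted so far: the hex "now" and "started" values look like microsecond counters, because their difference lines up with the seconds figure printed in the same message. That reading is an assumption drawn only from the arithmetic below, not from any Xsan documentation. A quick sketch:

```python
# Hex "now"/"started" values copied from the two PANIC lines in this thread.
# Assumption: both are counters in microseconds (inferred, not documented).
panics = [
    ("VOP-Setattr", 0x43d5081a1bb34, 0x43d5076b6961b, 183.18),
    ("VOP-VopLookupV4", 0x43e4a1e3e8923, 0x43e4a13517fd5, 183.31),
]

def elapsed_secs(now, started):
    """Difference between the two counters, interpreted as microseconds."""
    return (now - started) / 1_000_000

for op, now, started, reported in panics:
    secs = elapsed_secs(now, started)
    print(f"{op}: computed {secs:.2f} s, log reports {reported} s")
```

In both cases the computed value agrees with the logged one to two decimals, so the request really did sit for just over the 180 second limit before fsm gave up on it.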
drocamor
RAID 5
Joined: 30 Apr 2007
Posts: 18

Posted: Tue Nov 20, 2007 11:42 am

Hey gang,

I'm working with romannumeral5 on his Xsan. We see the "OpHangLimitSecs exceeded" message after the SAN has already frozen for the users (approximately 180 seconds later).

We've escalated up through AppleCare, and they tell us the message means that the MDC was unable to write to the metadata LUN for that period of time. When this happens, fsm panics and attempts to shut itself down. In our situation the process does not exit and a manual failover must be forced.

We have seen some high latency on the fibre channel and a handful of IO errors in the system.log on the hosting MDC every once in a while. We've replaced the fiber cable, switch port, SFPs, RAID controller, and one disk in the metadata LUN while trying to fix this. Our next steps are to replace the second disk in the LUN and then move the disks and controllers to a new XServe RAID.

We upgraded to 1.4.2 in an attempt to resolve the issue, but this just brought us to a situation where the manual failover does not work reliably. During testing we were unable to fail the volume over with just the two MDCs online, and since we cannot stop production outside of our downtime window, we brought the volume back up without knowing whether we would be able to fail over. There was a panic yesterday and we were able to fail over, so I attribute the earlier failure to bad luck when we were testing. We've got a ticket open with AppleCare for this as well as one for the original panics.

BenT, I would hope that you're not having a similar experience to ours. Does your volume fail over after the panic? I would check all the fiber connections and look for errors on the Xserve RAID that hosts your metadata LUN. You should also probably have more than 2 GB of RAM in your MDC because of the size of your volume. Your volume is also mad full. In my experience this just makes any Xsan problem worse.

If anyone has any other ideas about things to try please let me know.

Thanks,
Dave
ipott
RAID 5
Joined: 27 Oct 2007
Posts: 18

Posted: Mon Nov 26, 2007 4:14 am

We have the same issue: fsm goes up to 350% CPU and then crashes.
The failover hangs most of the time. Sometimes we have to fail over several times.

We opened a ticket with AppleCare 6 (!!!) months ago. So far they have not come up with any solution, telling us we are the only customer with this kind of problem.

We will throw the Xsan out in the next few weeks.
ipott
RAID 5
Joined: 27 Oct 2007
Posts: 18

Posted: Mon Nov 26, 2007 4:24 am    Post subject: by the way

Our MDC has 4 cores and 16 GB RAM, and the filesystem is filled to 75%.
Mac OS X 10.4.11 and Xsan 1.4.2.
drocamor
RAID 5
Joined: 30 Apr 2007
Posts: 18

Posted: Mon Nov 26, 2007 9:27 am

ipott,

Sorry to hear you are having issues. How many clients do you have? What are they doing? How large is your volume?
ipott
RAID 5
Joined: 27 Oct 2007
Posts: 18

Posted: Mon Nov 26, 2007 9:39 am

Hi,

The volume is 17 TB, with 5 Xsan clients and SMB + AFP file sharing from 2 of the clients for the render farm.

We have tried a lot to solve the issue, but currently even a simple "find" command crashes the volume after a few minutes.
drocamor
RAID 5
Joined: 30 Apr 2007
Posts: 18

Posted: Mon Nov 26, 2007 9:42 am

What kind of renderfarm software is accessing the Xsan over the AFP and SMB share?

I wonder if that could be an issue. We are resharing to several machines that are transcoders and to others that do some heavy searching and indexing of certain folders.

Is everyone else having this problem also resharing the volume over AFP, SMB, or NFS?
ipott
RAID 5
Joined: 27 Oct 2007
Posts: 18

Posted: Mon Nov 26, 2007 2:20 pm

It's a Royal Render render farm with about 25 nodes. But I don't think that is the problem, because even with file sharing turned off I can crash the volume by typing a simple "find" command, or by running snfsdefrag -r on the volume.

We are currently thinking about tuning volume parameters like buffercache, inodecachesize, and threadpoolsize. ADIC's documentation says these parameters increase metadata performance...

We also had 3 weeks without a crash some months ago, while people were rendering a lot on the volume. That was after we installed more memory.

We are trying some things at this stage and I will post the results here.
drocamor
RAID 5
Joined: 30 Apr 2007
Posts: 18

Posted: Mon Nov 26, 2007 2:26 pm

To help me track down my own problems and maybe help you out too, do you think you could send me some of the system.logs, cvlogs, and the volume configuration for your Xsan?

I'm looking for possible issues with my volume involving latency and disk io. If you could send me those files I'd like to compare them to my own. Would that be alright? Maybe we can help each other.

Thanks,
Dave
All times are GMT - 5 Hours
Page 1 of 5