robert.isherwood JBOD

Joined: 26 Apr 2010 Posts: 3
Posted: Mon Apr 26, 2010 8:44 am Post subject: fsm ASSERT failed "IP_XATTR_INODE(ip)" - panic, wo
We're having a very puzzling problem. Our volume won't mount: the volume log shows the error fsm ASSERT failed "IP_XATTR_INODE(ip)", then the FSM panics and dies.
cvfsck -wv runs all the way through and reports only three problems, all extent issues and always the same three.
Suggestions? We're thinking about cvfsck -X, but we're concerned about data loss. Losing ACLs is fine; they're easy to fix on this data. Losing the data would be a huge problem.
Here's the relevant log:
[0426 04:01:14] 0x7fff70884be0 (Info) Server Revision 3.5.0 Build 7443
Branch branches_35X (412.3)
[0426 04:01:14] 0x7fff70884be0 (Info) Built for Darwin 10.0 i386
[0426 04:01:14] 0x7fff70884be0 (Info) Created on Mon Dec 7 12:52:39 PST 2009
[0426 04:01:14] 0x7fff70884be0 (Info) Built in /SourceCache/XsanFS/XsanFS-412.3
[0426 04:01:14] 0x7fff70884be0 (Info)
Configuration:
DiskTypes-6
Disks-6
StripeGroups-4
MaxConnections-139
ThreadPoolSize-256
StripeAlignSize-32
FsBlockSize-16384
BufferCacheSize-128M
InodeCacheSize-32768
RestoreJournal-Disabled
RestoreJournalDir-None
[0426 04:01:14] 0x7fff70884be0 (Info) Self (superglue.22squared.com)
IP address is 10.1.201.8.
[0426 04:01:14.448465] 0x7fff70884be0 (Debug) No fsports file - port
range enforcement disabled.
[0426 04:01:14] 0x7fff70884be0 (Info) Listening on TCP socket
superglue.22squared.com:49358
[0426 04:01:14] 0x7fff70884be0 (Info) Node [0]
[superglue.22squared.:49358] File System Manager Login.
[0426 04:01:14] 0x7fff70884be0 (Info) ForceStripeAlignment is enabled.
[0426 04:01:14] 0x7fff70884be0 (Info) Service standing by on host
'superglue.22squared.com:49358'.
[0426 04:01:15.530758] 0x7fff70884be0 (Debug) Standby service - NSS
ping from nailgun.atlxsan.net:55488.
[0426 04:01:15.530780] 0x7fff70884be0 (Debug) Vote count is 2
[0426 04:01:15.530955] 0x7fff70884be0 (Debug) FOUsurpCheck: read ARB
info (pass 1): host (10.1.201.8:49173) conns 0 age 1272268851.00 secs
his delta 0.00 secs my delta 0.00 secs.
[0426 04:01:15.530960] 0x7fff70884be0 (Debug) FOUsurpCheck: polling
ARB block to check for active peer (pass 1).
[0426 04:01:16.531160] 0x7fff70884be0 (Debug) FOUsurpCheck: read ARB
info (pass 2): host (10.1.201.8:49173) conns 0 age 1272268851.00 secs
his delta 0.00 secs my delta 1.00 secs.
[0426 04:01:16.531168] 0x7fff70884be0 (Debug) FOUsurpCheck: ARB is already mine.
[0426 04:01:16] 0x7fff70884be0 (Info) Branding Arbitration Block
(attempt 1) votes 2.
[0426 04:01:18.532217] 0x7fff70884be0 (Debug) Cannot find fail over
script [/Library/Filesystems/Xsan/bin/cvfail.superglue.22squared.com]
- looking for generic script.
[0426 04:01:18] 0x7fff70884be0 (Info) Launching fail over script
["/Library/Filesystems/Xsan/bin/cvfail" superglue.22squared.com 49358
TestVol]
[0426 04:01:18.541537] 0x7fff70884be0 (Debug) Starting journal log recovery.
[0426 04:01:18.666037] 0x7fff70884be0 (Debug) Completed journal log recovery.
[0426 04:01:18.666279] 0x7fff70884be0 (Debug)
Inode_init_post_activation: FsStatus 0x2d27, Brl_ResyncState 1
[0426 04:01:18] 0x11f588000 (Info) FSM Alloc: Loading Stripe Group
"MetadataAndJournal". 698.48 GB.
[0426 04:01:18] 0x11fa0c000 (Info) FSM Alloc: Loading Stripe Group
"Data1". 6.82 TB.
[0426 04:01:18] 0x11fe90000 (Info) FSM Alloc: Loading Stripe Group
"Data2". 6.82 TB.
[0426 04:01:18] 0x11ff13000 (Info) FSM Alloc: Loading Stripe Group
"Data3". 2.05 TB.
[0426 04:01:18] 0x11f588000 (Info) FSM Alloc: Stripe Group
"MetadataAndJournal" active.
[0426 04:01:18] 0x11ff13000 (Info) FSM Alloc: free blocks 89669074
with 0 blocks currently reserved for client delayed buffers.Reserved
blocks may change with client activity.
[0426 04:01:18] 0x11ff13000 (Info) FSM Alloc: Stripe Group "Data3" active.
[0426 04:01:19] 0x11fa0c000 (Warning) FSM Alloc: Stripe Group "Data1"
237190901 free blocks in 173554 fragments inserted.
[0426 04:01:19] 0x11fa0c000 (Warning) FSM Alloc: Stripe Group "Data1"
360973 free blocks in 35281 fragments ignored.
[0426 04:01:19] 0x11fa0c000 (Info) FSM Alloc: free blocks 237190901
with 0 blocks currently reserved for client delayed buffers.Reserved
blocks may change with client activity.
[0426 04:01:19] 0x11fa0c000 (Info) FSM Alloc: Stripe Group "Data1" active.
[0426 04:01:20] 0x11fe90000 (Warning) FSM Alloc: Stripe Group "Data2"
121524364 free blocks in 343051 fragments inserted.
[0426 04:01:20] 0x11fe90000 (Warning) FSM Alloc: Stripe Group "Data2"
1074777 free blocks in 139040 fragments ignored.
[0426 04:01:20] 0x11fe90000 (Info) FSM Alloc: free blocks 121524364
with 0 blocks currently reserved for client delayed buffers.Reserved
blocks may change with client activity.
[0426 04:01:20] 0x11fe90000 (Info) FSM Alloc: Stripe Group "Data2" active.
[0426 04:01:20.017067] 0x7fff70884be0 (Debug) FSUUID_init: found
`FSUUID' xattr on root inode: d7972700-9f04-4605-b58e-11d31432982a
[0426 04:01:20] 0x7fff70884be0 (Info) File system 'TestVol' requires
UTF8-NFC file names
[0426 04:01:20] 0x7fff70884be0 (Info) File system 'TestVol' supports
named streams
[0426 04:01:20] 0x7fff70884be0 (Info) File System Service 'TestVol[0]'
now active on host 'superglue.22squared.com:49358'.
[0426 04:01:20] 0x11f20a000 (Info) Inode_fl_scan: starting scan
[0426 04:01:20.033522] 0x7fff70884be0 (Debug) Node [1]
[ducttape.atlxsan.net:55480] connected.
[0426 04:01:20.034105] 0x7fff70884be0 (Debug) Active service - NSS
ping from ducttape.atlxsan.net:55479.
[0426 04:01:20.034953] 0x7fff70884be0 (Debug) Node [2]
[chewinggum.atlxsan.net:50678] connected.
[0426 04:01:20.035444] 0x7fff70884be0 (Debug) Active service - NSS
ping from chewinggum.atlxsan.net:50677.
[0426 04:01:20.036326] 0x7fff70884be0 (Debug) Node [3]
[superglue.atlxsan.net:49365] connected.
[0426 04:01:20.036896] 0x7fff70884be0 (Debug) Active service - NSS
ping from superglue.atlxsan.net:49364.
[0426 04:01:20.039182] 0x7fff70884be0 (Debug) Node [4]
[chewinggum.atlxsan.net:50676] connected.
[0426 04:01:20] 0x11f20a000 (**Error**) add_to_free_list: inode
0x7f800000babf43 failed lookup
[0426 04:01:20] 0x11f20a000 (Info) Inode_fl_scan: scan aborted
[0426 04:01:20] 0x11f20a000 (Info) Inode_fl_scan final tally: 6978
inodes scanned, 0 pending added, 0 free added.
[0426 04:01:20.039720] 0x10b14e000 (Debug) FSM received client
capabilities: capsClient = 0xdea0eefed
[0426 04:01:20.039730] 0x10b14e000 (Debug) CvRootDir is /: CvRootCookie is 0x2
[0426 04:01:20.040165] 0x10b1d1000 (Debug) FSM received client
capabilities: capsClient = 0xdea0eefed
[0426 04:01:20.040183] 0x10b1d1000 (Debug) CvRootDir is /: CvRootCookie is 0x2
[0426 04:01:20.040657] 0x10b254000 (Debug) FSM received client
capabilities: capsClient = 0xdea0eefed
[0426 04:01:20.040673] 0x10b254000 (Debug) CvRootDir is /: CvRootCookie is 0x2
[0426 04:01:20.041140] 0x7fff70884be0 (Debug) Active service - NSS
ping from superglue.atlxsan.net:49363.
[0426 04:01:20] 0x10b14e000 (Info) Node [2]
[chewinggum.atlxsan.n:50678] Client Login (active 1).
[0426 04:01:20] 0x10b1d1000 (Info) Node [1]
[ducttape.atlxsan.net:55480] Client Login (active 2).
[0426 04:01:20] 0x10b254000 (Info) Node [3]
[superglue.atlxsan.ne:49365] Client Login (active 3).
[0426 04:01:20.043448] 0x7fff70884be0 (Debug) Node [5]
[netvault.atlxsan.net:63468] connected.
[0426 04:01:20.043985] 0x7fff70884be0 (Debug) Active service - NSS
ping from netvault.atlxsan.net:63467.
[0426 04:01:20.044705] 0x7fff70884be0 (Debug) Node [6]
[nailgun.atlxsan.net:55490] connected.
[0426 04:01:20.045133] 0x7fff70884be0 (Debug) Active service - NSS
ping from nailgun.atlxsan.net:55489.
[0426 04:01:20.045527] 0x7fff70884be0 (Debug) Active service - NSS
ping from netvault.atlxsan.net:63466.
[0426 04:01:20.045915] 0x7fff70884be0 (Debug) Active service - NSS
ping from ducttape.atlxsan.net:55478.
[0426 04:01:20.101731] 0x10dfdf000 (Debug) FSM received client
capabilities: capsClient = 0xdea0eefed
[0426 04:01:20.101751] 0x10dfdf000 (Debug) CvRootDir is /: CvRootCookie is 0x2
[0426 04:01:20] 0x10dfdf000 (Info) Node [6]
[nailgun.atlxsan.net:55490] Client Login (active 4).
[0426 04:01:20.104919] 0x10e062000 (Debug) FSM received client
capabilities: capsClient = 0xdea0eefed
[0426 04:01:20.104940] 0x10e062000 (Debug) CvRootDir is /: CvRootCookie is 0x2
[0426 04:01:20] 0x10e062000 (Info) Node [5]
[netvault.atlxsan.net:63468] Client Login (active 5).
[0426 04:01:20] 0x10b6ef000 (**FATAL**) PANIC:
/Library/Filesystems/Xsan/bin/fsm ASSERT failed "IP_XATTR_INODE(ip)"
file fsm_xattr.c, line 736
[0426 04:01:20] 0x10b6ef000 (**FATAL**) PANIC: wait 3 secs for journal to flush
[0426 04:01:20.218464] 0x103a9e000 (Debug)
timed_free_pending_inode_thread: flushing journal.
[0426 04:01:20.218487] 0x103a9e000 (Debug)
timed_free_pending_inode_thread: journal flush complete.
[0426 04:01:20] 0x10b6ef000 (**FATAL**) PANIC: aborting threads now.
Logger_thread: sleeps/12 signals/0 flushes/7 writes/7 switches 0
Logger_thread: logged/80 clean/80 toss/0 signalled/0 toss_message/0
Logger_thread: waited/0 awakened/0
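A log like the one above is easier to triage once the hard-wrapped lines are joined back into records and filtered by severity. The following is a minimal sketch, assuming the record layout seen in this paste (bracketed timestamp, thread id, level in parentheses); the function names are made up for illustration and the sample is copied from the log above. Lines that do not begin with a timestamp are simply appended to the previous record.

```python
import re

# Start of a record as it appears in the pasted log, e.g.
# "[0426 04:01:20] 0x10b6ef000 (**FATAL**) PANIC:"
RECORD_START = re.compile(
    r"^\[(?P<stamp>\d{4} \d{2}:\d{2}:\d{2}(?:\.\d+)?)\]\s+"
    r"(?P<thread>0x[0-9a-f]+)\s+"
    r"\((?P<level>[^)]*)\)\s*(?P<text>.*)$"
)
# The assert message format is taken from the FATAL record in this thread.
ASSERT = re.compile(
    r'ASSERT failed "(?P<expr>[^"]+)"\s+file (?P<file>[^\s,]+), line (?P<line>\d+)'
)


def join_records(lines):
    """Merge hard-wrapped continuation lines into the record they belong to."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = RECORD_START.match(line)
        if match:
            records.append(match.groupdict())
        elif records and line.strip():
            joined = records[-1]["text"] + " " + line.strip()
            records[-1]["text"] = joined.strip()
    return records


def serious(records):
    """Keep only records whose level mentions Error or FATAL."""
    return [r for r in records if "Error" in r["level"] or "FATAL" in r["level"]]


def find_assert(records):
    """Return (expression, source file, line number) of the first failed assert."""
    for record in records:
        match = ASSERT.search(record["text"])
        if match:
            return match.group("expr"), match.group("file"), int(match.group("line"))
    return None


# Sample lines copied verbatim from the log in this post.
SAMPLE = """\
[0426 04:01:20] 0x11f20a000 (**Error**) add_to_free_list: inode
0x7f800000babf43 failed lookup
[0426 04:01:20] 0x11f20a000 (Info) Inode_fl_scan: scan aborted
[0426 04:01:20] 0x10b6ef000 (**FATAL**) PANIC:
/Library/Filesystems/Xsan/bin/fsm ASSERT failed "IP_XATTR_INODE(ip)"
file fsm_xattr.c, line 736
"""

records = join_records(SAMPLE.splitlines())
for record in serious(records):
    print(record["stamp"], record["level"], record["text"])
print(find_assert(records))
```

On the full log this surfaces the add_to_free_list lookup failure that aborts the inode free-list scan, followed by the xattr assert that kills the FSM.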
cthomasquinlan Been around the blocks

Joined: 20 Jan 2010 Posts: 21
Posted: Mon Apr 26, 2010 3:00 pm
Seems like more and more people are encountering this issue. We ran into it a few weeks back and were able to solve it. First off, what version of Xsan are you running? And what OS are your MDCs/clients at?
Check this posting for someone running into a very similar issue:
http://www.xsanity.com/forum/viewtopic.php?t=7983&postdays=0&postorder=asc&start=0
Our solution (after many failed attempts with cvfsck -j, cvfsck -wv, and such) was to run cvfsck -C on the volume for the free inode errors, followed by cvfsck -X for the xattr errors. I know you're concerned about data loss, but you'll soon run out of options.
Once we had cleared the xattr errors, we still couldn't mount the volume: it would panic with a reference to a line in xattr.c, as you're encountering. What got us past that was starting the volume from a temporary 10.6.2 MDC we added. We were all on 10.5.8, so we edited the failover priority for the volume so that when we started it from cvadmin, the temporary 10.6.2 server would take control. This allowed the volume to get past the fsm panic and mount. We then failed it over to the PMDC, and everything returned to normal.
If you post some more info, we could try to go through it, as we just had to deal with this.
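The repair order described above (cvfsck -C, then cvfsck -X) can be written down as a dry-run plan before anything touches the volume. This is only a sketch under assumptions: the cvfsck path is guessed from the fsm path shown in the panic log, the volume name TestVol comes from the same log, and passing the volume name as the last argument is assumed rather than confirmed in this thread. Nothing is executed unless execute=True is passed, and the volume should presumably be stopped first.

```python
import shlex
import subprocess

# Assumption: cvfsck sits in the same directory as fsm, whose path appears in the log.
CVFSCK = "/Library/Filesystems/Xsan/bin/cvfsck"

# Passes in the order given in the post above. What each flag does is
# deliberately not restated here; check the cvfsck man page before running them.
REPAIR_PASSES = ["-C", "-X"]


def build_plan(volume, passes=REPAIR_PASSES):
    """Return one argv list per pass, in order."""
    return [[CVFSCK, flag, volume] for flag in passes]


def run_plan(plan, execute=False):
    """Print each command; run it only when execute=True, stopping at the first failure."""
    shown = []
    for argv in plan:
        command = " ".join(shlex.quote(arg) for arg in argv)
        print("+", command)
        shown.append(command)
        if execute:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(argv, check=True)
    return shown


plan = build_plan("TestVol")
run_plan(plan)  # dry run: prints the two commands and does nothing else
```

Keeping the default as a dry run is a deliberate choice here, since both passes modify metadata and the thread itself calls them dangerous.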
robert.isherwood JBOD

Joined: 26 Apr 2010 Posts: 3
Posted: Mon Apr 26, 2010 3:41 pm
cthomasquinlan wrote: "First off, what version of Xsan are you running? And what OS are your MDCs/clients at?"
Yep. Xsan 2.2.1 with 10.6 MDCs and clients. Everything's up to date.
Great post. We did review it and followed the suggestions. Still no joy. Running out of options is where we are.
We did run cvfsck -C, and it claimed to have cleared up a lot of errors. Still no mount. Running cvfsck -wv multiple times shows everything is good except for three extent issues that come up every time.
Next step looks like cvfsck -X. That should clear the extent issues, but will it help with the odd xattr fsm panics?
And then I'm really hoping that I don't have to actually understand this excellent posting:
http://www.xsanity.com/article.php?story=20080327101938881&query=cvfsdb
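To confirm that it really is the same three extent issues on every pass, the saved output of each cvfsck run can be compared. The sketch below is a rough illustration only: the actual cvfsck output format is not shown in this thread, so the sample lines are obvious placeholders, and the keyword filter ("extent") is an assumption about how such findings would be worded.

```python
def persistent_findings(runs, keyword="extent"):
    """Return lines containing `keyword` that appear in every run, in first-run order."""

    def matching(text):
        needle = keyword.lower()
        return [ln.strip() for ln in text.splitlines() if needle in ln.lower()]

    if not runs:
        return []
    common = set(matching(runs[0]))
    for text in runs[1:]:
        common &= set(matching(text))
    ordered = []
    for line in matching(runs[0]):
        if line in common and line not in ordered:
            ordered.append(line)
    return ordered


# Placeholder text, NOT real cvfsck output: one string per saved run.
RUN_1 = """\
PLACEHOLDER extent problem A
PLACEHOLDER extent problem B
PLACEHOLDER extent problem C
PLACEHOLDER unrelated message
"""
RUN_2 = """\
PLACEHOLDER extent problem A
PLACEHOLDER extent problem B
PLACEHOLDER extent problem C
PLACEHOLDER extent problem seen only once
"""

for finding in persistent_findings([RUN_1, RUN_2]):
    print(finding)
```

Findings that survive every pass are the ones a plain cvfsck -wv is evidently not repairing, which is the situation described in this post.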
cthomasquinlan Been around the blocks

Joined: 20 Jan 2010 Posts: 21
Posted: Mon Apr 26, 2010 5:12 pm
cvfsck -X cleared up the extent errors for us, but we were still unable to start the volume without it panicking after ten or so seconds with the same line you've encountered.
We had 10.5.8 MDCs and neither would start the volume, and adding another 10.5.8 server didn't change anything. So we added a temporary 10.6.2 MDC to see if it would happen with Snow Leopard (SL), edited failover, and started the volume with that server, which worked immediately... so fast it was barely cathartic. If your MDCs are already on 10.6.2, maybe try promoting a client to an MDC temporarily and starting the volume with it. For us, SL server was the magic change that let us start the FS after clearing the errors reported by cvfsck.
And yeah, you might be looking at some fun with cvfsdb. The main goal, obviously, would be just to be able to start the volume, and hopefully it'll be somewhat obvious if things are missing or corrupt.
robert.isherwood JBOD

Joined: 26 Apr 2010 Posts: 3
Posted: Tue Apr 27, 2010 2:07 pm Post subject: And we're back - so far
Interestingly, running cvfsck -X, then cvfsck -C, then cvfsck -w several times seems to have healed it. Good news, so far at least. And being on 10.6.2 is a good thing.
I really appreciate your suggestions.
cthomasquinlan Been around the blocks

Joined: 20 Jan 2010 Posts: 21
Posted: Tue Apr 27, 2010 2:22 pm
It took both -C and -X for us as well. Dangerous, but no issues since.
Out of curiosity, did you start out on 2.2 and upgrade to 2.2.1, or did you start at 2.2.1?
For some reason, it seems like running Xsan 2.2, or starting out on 2.2 and upgrading, is the common denominator for this issue, whereas if you start at 2.2.1 it doesn't occur... It would be nice to find the pattern. For the time being we run either 10.5.8 with 2.1.1, which was very stable for us for a long time, or 10.6.3 with 2.2.1 (10.6.3 seems to have resolved some of the Finder issues).
Did you have any data loss? Regardless, if I were you, I'd start backing up anything that isn't backed up yet in case the issue returns.