remdem partially protected

Joined: 24 Jan 2011 Posts: 6
|
Posted: Tue Aug 23, 2011 7:00 am Post subject: Volume won't mount after power failure |
|
|
Hi all
Please excuse my English, I'm French!
I'm using Xsan 2.2 on Leopard with two Promise RAIDs.
I have 2 volumes: one for video editing and production, which is mounted on Mac Pros, and one for network homes and sharing, which is shared via AFP by an Xserve.
2 days ago we had a power failure and the UPS went down.
On Monday morning one of the two Promise units showed red, and I had to rebuild the data on one of the disks.
Now one of my two Xsan volumes isn't mounting anymore.
In Xsan Admin it appears with a red X on the volume tab.
If I try to start the volume, it comes up for a few seconds, then a yellow I appears on the volume, and then a red X again.
When I try cvfsck -wv, it crashes after a while, saying:
……
*Error*: Cannot map NT security descriptor block for NT Security descriptor: = 0x14498a8a450
Removing Bad NT Security descriptor: = 0x14498a8a450
*Error*: Cannot map NT security descriptor block for NT Security descriptor: = 0x1449c16609e
Removing Bad NT Security descriptor: = 0x1449c16609e
*Error*: Cannot map NT security descriptor block for NT Security descriptor: = 0x1449f63bb99
Removing Bad NT Security descriptor: = 0x1449f63bb99
*Fatal*: Fatal error attempting to verify NTSD's
*Error*: Fatal error checking NTSD's
*Critical*: File System Check finished - with errors.
I've already tried disabling ACLs on this volume, without any improvement.
I still cannot run cvfsck to completion.
The only hope I have is that cvfsck -x XSAN_1 still prints a list of the files that were on the volume.
Do you have any idea of something I could try?
Thanks a lot.
Rémi |
singlemalt Xsan Master

Joined: 27 Feb 2009 Posts: 109
|
Posted: Tue Aug 23, 2011 9:23 am Post subject: |
|
|
Hi,
Are you really running just 2.2 and not 2.2.1? There were some improvements in cvfsck that came with 2.2.1. If you run
/Library/Filesystems/Xsan/bin/cvversions
on the MDC, it will print which version/build you're running.
411.3 9M1207 is 2.2.1 for Leopard
412.3 10M310 is 2.2.1 for Snow Leopard
411 9M1079 is 2.2 for Leopard
412 10M220 is 2.2 for Snow Leopard |
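The build list above can be kept as a small lookup table. The sketch below is only an illustration: the helper name `identify_xsan` is hypothetical, and it assumes the build number shows up in parentheses at the end of a cvversions line, as in the "Server Revision ... (411.3)" line quoted later in this thread. Only the four builds listed in this post are covered.

```python
import re

# Build numbers exactly as listed in the post above; nothing else is assumed.
XSAN_BUILDS = {
    "411.3": ("2.2.1", "Leopard"),
    "412.3": ("2.2.1", "Snow Leopard"),
    "411": ("2.2", "Leopard"),
    "412": ("2.2", "Snow Leopard"),
}


def identify_xsan(cvversions_line):
    """Return (xsan_version, os_name) for one cvversions line, or None.

    Hypothetical helper: it looks for a parenthesised build number such as
    "(411.3)" and maps it through the table above.
    """
    found = re.findall(r"\((\d+(?:\.\d+)?)\)", cvversions_line)
    if not found:
        return None
    # Use the last parenthesised number on the line.
    return XSAN_BUILDS.get(found[-1])


if __name__ == "__main__":
    line = "Server Revision 3.5.0 Build 7443 Branch branches_35X (411.3)"
    print(identify_xsan(line))
```

An unknown build returns None rather than a guess, so a version outside this list is not misreported.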
remdem partially protected

Joined: 24 Jan 2011 Posts: 6
|
Posted: Tue Aug 23, 2011 9:44 am Post subject: |
|
|
You're right, I'm running 2.2.1:
Server Revision 3.5.0 Build 7443 Branch branches_35X (411.3)
Not much improvement since my last message: cvfsck still doesn't run to the end.
I have been able to get the list of the files on the volume and to restore some of these files using cvfsdb, but the volume is still not mounting.
Thanks for your help! |
morphenine Xsan Master

Joined: 22 Dec 2008 Posts: 126
|
Posted: Tue Aug 23, 2011 7:23 pm Post subject: |
|
|
Try a cvfsck -jv to repair dirty journals.
Another thing: I always run cvfsck -nv (read-only) first, because if -wv crashes it can do more damage. If -nv crashes, you can still track down the problem without creating more. I make sure -nv finishes before I run -wv. |
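The "read-only first" habit described above could be scripted roughly as follows. This is a sketch under assumptions, not a tested Xsan procedure: the function name `check_then_repair` is hypothetical, the volume name is passed after the flags as in the `cvfsck -x XSAN_1` example earlier in the thread, and a non-zero exit status from `cvfsck -nv` is treated as "did not finish cleanly" (that exit-code behaviour is an assumption, so check it on your own system before relying on it).

```python
import subprocess


def check_then_repair(volume, run=subprocess.call, cvfsck="cvfsck"):
    """Run the read-only check first; only attempt a repair if it succeeds.

    `run` is any callable that takes an argument list and returns an exit
    status, so the ordering logic can be exercised without touching a volume.
    ASSUMPTION: a non-zero status from `cvfsck -nv` means it did not finish.
    """
    status = run([cvfsck, "-nv", volume])  # read-only pass, makes no changes
    if status != 0:
        # Stop here: a read-write pass that crashes could do more damage.
        return ("read-only check failed", status)
    status = run([cvfsck, "-wv", volume])  # read-write pass
    return ("repair attempted", status)


if __name__ == "__main__":
    # Dry run with a stand-in runner that only prints the commands.
    def show(cmd):
        print(" ".join(cmd))
        return 0

    print(check_then_repair("XSAN_1", run=show))
```

Passing the runner in as a parameter keeps the dry run harmless; swapping in the default `subprocess.call` would execute the real commands.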
remdem partially protected

Joined: 24 Jan 2011 Posts: 6
|
Posted: Wed Aug 24, 2011 2:52 am Post subject: |
|
|
Thanks!
So cvfsck -jv seems to finish nicely:
Attempting to acquire arbitration block... successful.
Creating MetadataAndJournal allocation check file.
Creating Data-1 allocation check file.
Creating Data-2 allocation check file.
Recovering Journal Log.
File System Journal Recovery completed successfully.
But when I try to run the read-only test using -nv, it stops with the same error message as the read-write one:
Removing Bad NT Security descriptor: = 0x108006d0bdb
*Fatal*: Fatal error attempting to verify NTSD's
*Error*: Fatal error checking NTSD's |
remdem partially protected

Joined: 24 Jan 2011 Posts: 6
|
Posted: Thu Aug 25, 2011 9:41 am Post subject: |
|
|
OK, so I'm still stuck with my Xsan problem.
I've tried some other commands like
cvfsck -vCw
or
cvfsck -cKw
But with no luck, and always the same error message about NT security descriptors.
Do you know what these bad NT security descriptors are, and if there is a way to force delete them, or something?
I'm beginning to think there is no way to get my volume working again, so is there a hack or any piece of software I can try to copy some data to another volume?
If you have anything that can help....
Thanks!
Rémi |
sf809 Been around the blocks

Joined: 21 Jan 2010 Posts: 27
|
remdem partially protected

Joined: 24 Jan 2011 Posts: 6
|
Posted: Fri Aug 26, 2011 5:05 am Post subject: |
|
|
Thanks,
I will give it a try.
I'm still hoping for a way to get the volume working again, but this hack can be a good workaround.
Thanks again! |
ravi Xsan Master

Joined: 06 Mar 2008 Posts: 149
|
Posted: Fri Aug 26, 2011 8:20 am Post subject: |
|
|
| remdem wrote: | Thanks,
I will give it a try.
I'm still hoping for a way to get the volume working again, but this hack can be a good workaround.
Thanks again! |
Hi
There is another way: you can potentially "zero out" the security descriptors. It consists of replacing the contents of the current NT security descriptor inode with something else using cvfsdb, and then running cvfsck to get past the problem. It is a dangerous technique, though. |
remdem partially protected

Joined: 24 Jan 2011 Posts: 6
|
Posted: Sat Aug 27, 2011 3:53 am Post subject: |
|
|
Thanks Ravi.
It sounds like a good idea to me, as the security descriptors really seem to be the problem for cvfsck.
But I took a look at the help page of cvfsdb and I'm not sure how to do it.
Is the command something like ondisk, or inode?
If you have further info, I'm interested!
Thanks again.
Rémi |
ravi Xsan Master

Joined: 06 Mar 2008 Posts: 149
|
Posted: Sun Aug 28, 2011 1:23 pm Post subject: |
|
|
| remdem wrote: | Thanks Ravi.
It sounds like a good idea to me, as the security descriptors really seem to be the problem for cvfsck.
But I took a look at the help page of cvfsdb and I'm not sure how to do it.
Is the command something like ondisk, or inode?
If you have further info, I'm interested!
Thanks again.
Rémi |
Hi
I should send a detailed article to Aaron/MattG about this but haven't gotten around to doing it, so here is the basic outline. Again, the usual disclaimers apply. Caution, caution, backup, backup! I am not responsible, and neither is the company I work for!
(1) Use cvfsdb to save the contents of the current NT idx inode (sb_NTSecurityIdxInode). You can use cvfsdb <volume name> and then the interactive command "show sb" to find the inode number [ see http://www.xsanity.com/article.php?story=20080327101938881 ]
Assuming the inode number in this example is 0x5, you can use the interactive command
save 0x5 /somewhere/current_idx
You should see something like the message
Saved 8 of 8 blocks (100%)
IF IT IS NOT 8 blocks, STOP, do not proceed further.
(2) Create an Xsan volume with USB LUNs/dmg LUNs/fibre LUNs etc with the same file system block size as the damaged volume. Enable ACLs, apply something very simple for this volume, say admin full control.
(3) Use cvfsdb to save the contents of the NT idx inode of this new volume as outlined in (1), e.g., to /somewhere/new_idx
(4) Make sure the sizes of current_idx and new_idx are the same. If you are using a 16K block size, the size should be 131072 (8 blocks).
(5) Use cvfsdb with the damaged volume to replace 0x5 with the contents of the new idx:
replace 0x5 /somewhere/new_idx
[You should see an English impaired message
Restoring data from inode 0x5 to file "/somewhere/new_idx"
Replaced 8 of 8 blocks (100%)
]
(6) Now run cvfsck on the damaged volume. This should get rid of the NT security descriptor problem, and the file system check should then proceed and work as usual. You will lose all the ACLs for the current volume, however, and may have to recreate them.
[You may very well have to run cvfsck -X too, to clear the extended attribute chain, but that might result in an RPL update that could take forever (days, weeks, etc.) depending on how many files you have, since it needs to deal with every single file.] |
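The sanity checks in steps (1) and (4) are easy to get wrong by eye, so here is a minimal sketch of them. The function names are hypothetical and nothing here calls cvfsdb; it only checks the "N of M blocks" message text quoted above and compares the sizes of the two saved files against 8 times the file system block size (131072 bytes for a 16K block size, as stated in step 4).

```python
import os
import re

BLOCKS_EXPECTED = 8  # from the post: stop if the idx inode is not 8 blocks


def expected_idx_size(fs_block_size):
    """Size in bytes that a saved idx file should have."""
    return BLOCKS_EXPECTED * fs_block_size


def saved_all_blocks(message):
    """True only for a message like 'Saved 8 of 8 blocks (100%)'."""
    match = re.search(r"(\d+) of (\d+) blocks", message)
    if not match:
        return False
    done, total = int(match.group(1)), int(match.group(2))
    return done == total == BLOCKS_EXPECTED


def idx_files_compatible(current_path, new_path, fs_block_size):
    """True if both saved idx files have exactly the expected size."""
    want = expected_idx_size(fs_block_size)
    return (os.path.getsize(current_path) == want
            and os.path.getsize(new_path) == want)


if __name__ == "__main__":
    print(expected_idx_size(16 * 1024))
    print(saved_all_blocks("Saved 8 of 8 blocks (100%)"))
```

If either check fails, the outline above says to stop rather than go on to the replace in step (5).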