Xsan Volume Failed to start after a power failure.

mahmoud

Hello,
Yesterday we had a power failure, and unfortunately the UPS failed at the same time, so the MDC, the Promise and the Xserve RAID all lost power.
We use the Promise for a volume called Media_HD
and the Xserve RAID for a volume called Media.
After the power came back, Media_HD was working fine, but Media was stopped and refused to start again. I tried to repair it using cvfsck, and here is what I got:

After cvfsck -j:

** WARNING ** This file system check may modify the meta-data of the

--- [MEDIA] ---

file system. This procedure cannot be un-done!

Do you want to proceed? (Y/N) -> y

Created directory /tmp/cvfsck1734a for temporary files.

Attempting to acquire arbitration block... successful.

Creating MetadataAndJournal allocation check file.
Creating Video allocation check file.
Creating Audio allocation check file.

Recovering Journal Log.

File System Journal Recovery completed successfully.

Then cvfsck -nv:
Created directory /tmp/cvfsck1768a for temporary files.
Creating MetadataAndJournal allocation check file.
Creating Video allocation check file.
Creating Audio allocation check file.

** NOTE ** Read Only Check.

File system journal will not be recovered.
The results may be inconsistent and mis-leading.

Super Block information.
FS Created On : Mon Feb 4 21:07:08 2013
Inode Version : '2.7' - 4.0 big inodes + NamedStreams (0x207)
File System Status : Clean
Allocated Inodes : 52736
Free Inodes : 45411
FL Blocks : 9
Next Inode Chunk : 0x15e4
Metadump Seqno : 0
Restore Journal Seqno : 0
Windows Security Indx Inode : 0x6
Windows Security Data Inode : 0x7
Quota Database Inode : 0x8
ID Database Inode : 0xb
Client Write Opens Inode : 0x9

Stripe Group MetadataAndJournal ( 0) 0xaea760 blocks.
Stripe Group Video ( 1) 0x2baa060 blocks.
Stripe Group Audio ( 2) 0x36948e0 blocks.

Inode block size is 1024

Building Inode Index Database 52736 (100%).
52736 inodes found out of 52736 expected.

Verifying NT Security Descriptors
Found 110 NT Security Descriptors: all are good

Verifying Free List Extents.

Scanning inodes 10240 ( 19%).
Xattr inode flag not set in 0x11b7c0.
Clearing xattr chain for inode 0x11b7bf.
Scanning inodes 52736 (100%).

Sorting extent list for MetadataAndJournal pass 1/1
Updating bitmap for MetadataAndJournal extents 937 ( 7%).
Sorting extent list for Video pass 1/1
Updating bitmap for Video extents 7020 ( 54%).
Sorting extent list for Audio pass 1/1
Updating bitmap for Audio extents 12779 (100%).

Checking for dead inodes 10240 ( 19%).
Clearing inode 0x100000011b7c0.
Checking for dead inodes 52736 (100%).

Checking directories Directory entry Document Templates (p=0x1000000115fe5 i=0x100000011ed54 flags=0x23) is an orphan. Removed.
Directory entry Frameworks (p=0x1000000115fe8 i=0x1000000116e41 flags=0x23) is an orphan. Removed.
Dot error! dirent inode: 0x18b653 db inode: 0x100000011c843
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry mc_mux_mp4.framework (p=0x100000011c843 i=0x100000011c9c2 flags=0x23) is an orphan. Removed.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry mc_mux_mxf.framework (p=0x100000011c843 i=0x100000011c9ce flags=0x23) is an orphan. Removed.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.

*Error*: Cannot do lookup of '' entry in inode 0x100000011c843.
*Fatal*: Removal of file failed!
*Warning*: Cannot remove temp dir '/tmp/cvfsck1768a' - Directory not empty.
*Critical*: File System Read-Only Check finished - with errors.

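The read-only check output above repeats the same two messages for one parent inode many times, which makes it hard to see how many distinct directories are affected. The following is only a rough sketch of how such output could be summarized; it assumes the log has been saved as plain text, and the `summarize` helper and its regular expression are illustrative, not part of cvfsck.

```python
import re
from collections import Counter

# Matches lines such as:
#   Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
#   Directory entry Frameworks (p=0x... i=0x... flags=0x23) is an orphan. Removed.
ENTRY_RE = re.compile(
    r"Directory entry\s*(?P<name>.*?)\s*"
    r"\(p=(?P<parent>0x[0-9a-f]+) i=(?P<inode>0x[0-9a-f]+)[^)]*\)\s+"
    r"(?P<problem>has an invalid inode|is an orphan)"
)


def summarize(log_text):
    """Count directory-entry problems per (parent inode, problem) pair."""
    counts = Counter()
    for match in ENTRY_RE.finditer(log_text):
        counts[(match.group("parent"), match.group("problem"))] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the output above, used as sample input.
    sample = """\
Directory entry Frameworks (p=0x1000000115fe8 i=0x1000000116e41 flags=0x23) is an orphan. Removed.
Dot error! dirent inode: 0x18b653 db inode: 0x100000011c843
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Directory entry mc_mux_mp4.framework (p=0x100000011c843 i=0x100000011c9c2 flags=0x23) is an orphan. Removed.
"""
    for (parent, problem), n in sorted(summarize(sample).items()):
        print(f"{parent}: {n} x {problem}")
```

Run against the full log, a summary like this should make it obvious that nearly all of the invalid-inode messages point at the single parent inode 0x100000011c843.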

Finally cvfsck -wv:

Created directory /tmp/cvfsck1782a for temporary files.

Attempting to acquire arbitration block... successful.

Creating MetadataAndJournal allocation check file.
Creating Video allocation check file.
Creating Audio allocation check file.

Recovering Journal Log.

Super Block information.
FS Created On : Mon Feb 4 21:07:08 2013
Inode Version : '2.7' - 4.0 big inodes + NamedStreams (0x207)
File System Status : Clean
Allocated Inodes : 52736
Free Inodes : 45411
FL Blocks : 9
Next Inode Chunk : 0x15e4
Metadump Seqno : 0
Restore Journal Seqno : 0
Windows Security Indx Inode : 0x6
Windows Security Data Inode : 0x7
Quota Database Inode : 0x8
ID Database Inode : 0xb
Client Write Opens Inode : 0x9

Stripe Group MetadataAndJournal ( 0) 0xaea760 blocks.
Stripe Group Video ( 1) 0x2baa060 blocks.
Stripe Group Audio ( 2) 0x36948e0 blocks.

Inode block size is 1024

Building Inode Index Database 52736 (100%).
52736 inodes found out of 52736 expected.

Verifying NT Security Descriptors
Found 110 NT Security Descriptors: all are good

Verifying Free List Extents.

Scanning inodes 10240 ( 19%).
Xattr inode flag not set in 0x11b7c0.
Clearing xattr chain for inode 0x11b7bf.
Scanning inodes 52736 (100%).

Sorting extent list for MetadataAndJournal pass 1/1
Updating bitmap for MetadataAndJournal extents 937 ( 7%).
Sorting extent list for Video pass 1/1
Updating bitmap for Video extents 7020 ( 54%).
Sorting extent list for Audio pass 1/1
Updating bitmap for Audio extents 12779 (100%).

Checking for dead inodes 10240 ( 19%).
Clearing inode 0x100000011b7c0.
Checking for dead inodes 52736 (100%).

Checking directories Directory entry Document Templates (p=0x1000000115fe5 i=0x100000011ed54 flags=0x23) is an orphan. Removed.
Directory entry Frameworks (p=0x1000000115fe8 i=0x1000000116e41 flags=0x23) is an orphan. Removed.
Dot error! dirent inode: 0x18b653 db inode: 0x100000011c843
Directory entry (p=0x100000011c843 i=0x100000011c843) has an invalid inode. Placing entry in bad inode list.
Directory entry (p=0x100000011c843 i=0x0) has an invalid inode. Removing entry.
Dot error! dirent inode: 0x18b653 db inode: 0x100000011c843

*Critical*: File System Check finished - with errors.


I don't know what else I can do. I hope you can help me with this issue, as I'm in big trouble now.

abstractrude

In this scenario I would restore from backup. Does your Xserve RAID still have working battery modules? Or did the power go out while things were being written with no battery support?

I guess you can also try a -C first, then a -X, even though I think that's headed in the wrong direction (it doesn't seem to be an issue with the free inode list). I have seen clobbers make a normally failing cvfsck -wv finish repairs and at least get the system to mount for data recovery.

-Trevor Carlson
THUMBWAR

mahmoud

abstractrude,
Thanks a lot for your reply; I really appreciate your help.
Actually, I think the Xserve RAID battery is the problem, as we have never replaced it and I read that it lasts for a maximum of 2 years.
We are now working on recovering the data from backup storage.
Regarding the cvfsck -C command: does it have any effect on the working volume (Media_HD), or can I select the volume to apply it to?

abstractrude

mahmoud, you always want those battery backups for RAID controllers, as they will save you during power outages. Once you get power back, fire up the RAID first and let it finish the transactions that were held in the battery-backed cache when the power went out. You usually have about 72 hours before the battery also loses its available power. But I would bring in a UPS and power up the RAID via battery to close transactions. Another technique, if you have a dying RAID battery, is to use a simple signaling protocol via a serial cable.

This will cause your RAIDs to switch into a write-to-disk mode that keeps transactions from failing to reach the disk.

As for the clobber, you will want to use it on a per-volume basis: cvfsck -C Volumename

I would run a -j first, just in case.
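As a rough sketch of the order discussed in this thread, the snippet below only prints the commands and does not run them. The flags are the ones already mentioned above; the `cvfsck_plan` helper and the ordering of the two checks after the clobber are assumptions for illustration, and "Media" is simply the affected volume from the first post.

```python
import shlex


def cvfsck_plan(volume):
    """Return the cvfsck invocations discussed in this thread, in order."""
    steps = [
        ("-j", "replay the journal first"),
        ("-C", "clobber, applied to this one volume only"),
        ("-nv", "read-only verbose check to see what is still wrong"),
        ("-wv", "verbose check that is allowed to write repairs"),
    ]
    # Each command names the volume explicitly, so other volumes are untouched.
    return [(["cvfsck", flag, volume], why) for flag, why in steps]


if __name__ == "__main__":
    # Dry run: print the commands for review instead of executing them.
    for argv, why in cvfsck_plan("Media"):
        print(f"{shlex.join(argv)}    # {why}")
```

Because every command takes the volume name as an argument, nothing here touches Media_HD.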


mahmoud

abstractrude,
First of all, I really want to thank you for your help, and sorry for my late reply; I couldn't post earlier because of a problem browsing the site.
I tried the command on the volume and it worked. cvfsck -nv still reports errors and -wv still cannot repair them, but I can fully access the volume. In conclusion, yes, it was a battery problem: I found that there was no battery at all, so your explanation was absolutely right. I will try to buy the modules with new batteries, but they may decide to buy new storage instead.

Anyway, thanks again for your help.

abstractrude

No problem. Yeah, those cache batteries are crucial in power failure scenarios. I would buy new storage. The idea that you are using 7-year-old EOL equipment is scary. We can help you spec out some new storage here as well. Let us know if you need help.


mahmoud

I couldn't agree more; yes, they have to buy new storage.
At this point we have two options:
1- Continue using the Xsan structure and buy some kind of 8Gbps Fibre Channel storage (Promise Ex30, Galaxy).

2- Change to a turnkey system like TerraBlock or the new Promise VTrak A-Class.

What would you recommend?

abstractrude

Why don't you tell us more about your workflow, users, video, etc.?


mahmoud

It is a video workflow based on Mac systems, with a minimum of 5 workstations and HD up to RED 5K material. In the coming days I am thinking about adding other systems like Windows and Linux, to take advantage of Lustre or low-cost Windows workstations.

abstractrude

Xsan is about as low cost as you can go for a shared filesystem. Sounds like you are already on a tight budget. I would recommend some modern storage, SAN controllers and infrastructure. Let's come up with a budget and workflow needs (straight bandwidth requirement).
