Xsan administration using cvadmin

beau@318's picture

Of all the command line tools available to assist you in the management of an Xsan volume, cvadmin serves as the primary command line tool to interface with, troubleshoot, and ultimately administer your Xsan. It provides for capabilities beyond what is available through Apple-provided GUI tools, which can be extremely useful for resolving the more complex problems that can crop up. Given the reported issues with Xsan Admin occasionally misreporting information, cvadmin is one of the tools that can help to make your experience administering an Xsan much less frustrating.

Read more below...

The basics:
Located at /Library/Filesystems/Xsan/bin/cvadmin, this tool can be run either interactively or non-interactively with the '-e' flag. For the purposes of this article, we'll be using primarily the interactive function of cvadmin. In it's most basic form, cvadmin is started with no arguments, but must run as root. If you fire up cvadmin from a different user account, the tool will run, but you will not be able to access your filesystems. To get started we'll launch the program with elevated privileges:

fig. 1
$ sudo /Library/Filesystems/Xsan/bin/cvadmin
Xsan Administrator

Enter command(s)
For command help, enter "help" or "?".

List FSS

File System Services (* indicates service is in control of FS):
1>*MyVolume[0] located on 192.168.56.5:51520 (pid 512)
2> MyVolume[1] located on 192.168.56.6:51520 (pid 509)

Select FSM "MyVolume"

Created : Tue Jan 13 15:33:57 2009
Active Connections: 1
Fs Block Size : 16K
Msg Buffer Size : 4K
Disk Devices : 2
Stripe Groups : 2
Fs Blocks : 61277616 (935.02 GB)
Fs Blocks Free : 61006893 (930.89 GB) (99%)


Xsanadmin (MyVolume) >


When cvadmin is first invoked it displays all of the valid File System Services (which in this context means volumes per metadata controller) and selects our only volume. In this particular instance, you see that there are two entries for "MyVolume". This is completely normal as you should see one entry per volume per metadata controller. In this case, we have one volume and 2 metadata controllers, so we have 2 entries. The asterisk denotes the active FSS (or active metadata controller), in this case 192.168.56.5.

In order to perform any worth while tasks using these tools, we next need to 'select' a volume. In this particular instance, there is only one volume and so cvadmin 'selected' the active volume and displayed the statistics for that volume. But in environments where there are multiple volumes you will need to 'select' a volume before you proceed with using cvadmin. The cvadmin prompt will always display the active volume (in this case 'MyVolume'), so there is no confusion as to which is being administered:

Xsanadmin (MyVolume) >

If more than one volume existed, we may desire to operate on a different volume, to do so we can simply run the command ">select volumeName" which would then select the volume "volumeName" and output statistics for it. Alternatively, the select command can be run with no arguments to output a list of all availableFile System Services.

fig 2.
Xsanadmin (MyVolume) > select
List FSS

File System Services (* indicates service is in control of FS):
1>*MyVolume[0] located on 192.168.56.5:51520 (pid 512)
2> MyVolume[1] located on 192.168.56.6:51520 (pid 509)


Practical Administration:
Ok, so we now know how to run the tool and select a particular volume, but what else can we do with it? Well, the answer is "quite a bit"; for the most part, my day-to-day Xsan administration happens only in cvadmin, the only reason that I typically fire up Xsan Admin is to help end users who rely on it. A full list of commands is available in the cvadmin man pages, or by typing "help" in an interactive cvadmin session. That being said, the commands listed below are my most frequently used:

>who
List all metadata controllers, client, and administration sessions open relating to this volume. Nodes with the volume mounted will be indicated with a [CLI] entry. metadata controllers will be indicated with

>stats
Prints out volume statistics.

>stop/start MyVolume
>stop/start MyVolume on 192.168.5.5
The stop and start commands are equivalent to starting or stopping the volume in Xsan. However, by specifying a hostname/ip, we can stop file system services only on that particular MDC, which can be handy for maintenance purposes.

>fail MyVolume
This will failover the volume "MyVolume" and initiate a FSS vote amongst your metadata controllers. The metadata controller providing services for this volume with the highest failover priority should win this election. If no failover is available, the volume will failback to the original host that was hosting the volume.

>fsmlist
This will output a list of FSM processes on the machine that is selected, which is useful when determining which volumes it is capable of hosting as a metadata controller.

>repof
This will generate an open file report to /Library/Filesystems/Xsan/data/MyVolume/open_file_report.txt. This report includes a slew of information, but noticeably absent is the actual file name. Arg! It does offer an inode number for the file in question though, so you can use a command such as `find /Volumes/MyVolume -inum X` to determine the actual file from the published inode number. The >repof command can be very useful when attempting to determine why a client will not unmount a volume.

Troubleshooting:
There are also a number of options to help with more complex troubleshooting.

>show
>show long
This will output information about the stripe groups/storage pools used by this volume. It is useful for cross referencing index numbers outputted in system.log to human readable storage pool names. It also provides various statistics and configuration, such as stripe group role, corresponding LUNs, affinity tags, multipath method, and other useful bits of information.

>paths
This will output a list of LUNs visible to the node and the corresponding HBA port used to communicate with that LUN. This option can be helpful when you are getting those pesky "stripe group not found" errors.

>debug 0x01000000
This will output IO latency numbers to /Library/Filesystems/Xsan/data/MyVolume/log/cvlog immediately (which is otherwise done every hour). The key figure to look at in this output is the "sysavg" number for "PIO HiPriWr SUMMARY". If your metadata is hosted on an Xserve RAID volume, this number should be below 500ms. If you're using Promise based storage, the active/active controller setup introduces additional latency, you will see numbers in the 805-1000ms range.

>latency-test
Use this command to run latency tests between the FSM and clients. It can be used to isolate problematic clients on the SAN.

>activate MyVolume 192.168.56.5
Use this command to activate FSS services for the specified IP/hostname. Alternatively you can leave off the IP and it will activate the local server (if applicable). This command can be run on an MDC if it is not showing appropriate FSS services available. If you see errors to the effect that an MDC is on 'standby', activating the volume on the respective server will often address this issue.

When to Use Another Tool:
The cvadmin tool is very useful when troubleshooting metadata controller behavior. But cvadmin isn't used when you want to perform Xsan setup or client management operations. To label LUNs you would use cvlabel. To mount and unmount volumes you would likely use the new xsanctl tool or 'mount -t acfs'. To perform defrag operations and volume maintenance you would use the snfsdefrag and cvfsck tools, respectively. While you can add serial numbers and create volumes from the command line, it is probably much easier to continue performing these operations through the Xsan Admin GUI tool.

All in all, cvadmin can be a very useful tool. It is fast and can help to ease the administrative burden of managing an Xsan once you get used to it. Unlike the GUI, the information that it displays can always be trusted to be an accurate representation of the state of your SAN. It can also provide more insight when you're troubleshooting and help you to become a more lucid Xsan administrator.

Comments

2
aaron's picture

This is great stuff. Thanks.

---
Aaron Freimark
http://www.tekserve.com/vcard/af.vcf

Aaron Freimark
CTO, Tekserve

JesusAli's picture

Two Notes:

Under "Practical Administration" it seems that the descriptive text gets cut off
mid-sentance.

In the next paragraph, the Command "stats" does not work for me (xsan 2.1.1
2009) but "stat" singular does.

One question:

when I run "latency-test" I get results like "... latency 249us"

what increment is "us" ?