XSAN Controller

Nolf's picture

Good afternoon. I'm facing a problem where the Xsan controller keeps dropping out: it is Ready for 2 minutes, then Offline for 5 minutes. While this is happening, the logs show the following:
servermgrd [910] xsan: [910/2112D0] ERROR: get_fsmvol_at_index: Could not connect to FSM because Admin Tap Connection to FSM failed: [errno 60]: Operation timed out
servermgrd [910] xsan: [910/6033400] ERROR: get_fsmvol_at_index: Could not connect to FSM because Admin Tap Connection to FSM failed: [errno 60]: Operation timed out

The volume mounts only on the controller. When the controller is online and a client tries to mount the volume, the client logs the following error:
kernel Could not mount filesystem prod_f, cvfs error 'Timeout' (25)
kernel Could not mount filesystem prod_f, cvfs error 'Timeout' (25)
kernel Could not mount filesystem prod_f, cvfs error 'Timeout' (25)
kernel Could not mount filesystem prod_f, cvfs error 'Timeout' (25)
postfix / master [3029] master exit time has arrived
kernel Could not mount filesystem prod_f, cvfs error 'Timeout' (25)
com.apple.xsan [2774] mount_acfs: Operation timed out [2774] mount of volume 'prod_f' failed (exit code = 22)
servermgrd [53] xsan: [53/3199F60] ERROR: -[SANFilesystem mountVolumeNamed:writable:withOptions:]: mount of 'prod_f' failed: Unable to mount volume 'prod_f'

Please help me track down and fix the problem. Thanks in advance!

arls1481's picture

You need to check DNS first: make sure the clients and the MDCs can resolve their own network names.
If an MDC can't ping itself or the other MDCs by name, that's where you need to start in order to sort this out.
sudo changeip -checkhostname
will help you sort out any issues with DNS.
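
The same forward-and-reverse check can be scripted to run against a list of hosts. The following is only a minimal sketch, not part of Xsan: the function name `check_name_resolution` and its injectable `resolve`/`reverse` parameters are made up for illustration. It uses the system resolver, so it also sees /etc/hosts entries and is no substitute for querying each DNS server directly.

```python
import socket


def check_name_resolution(hostname,
                          resolve=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return (ok, detail) for a forward-then-reverse lookup of hostname.

    resolve and reverse default to the system resolver; they are
    parameters only so the function can be exercised without a network.
    """
    try:
        ip = resolve(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup of {hostname} failed: {exc}"
    try:
        name, _aliases, _addresses = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup of {ip} failed: {exc}"
    # Compare case-insensitively and ignore a trailing dot.
    if name.rstrip(".").lower() != hostname.rstrip(".").lower():
        return False, f"{hostname} -> {ip} -> {name} (names differ)"
    return True, f"{hostname} -> {ip} -> {name}"
```

Call it once per MDC and client hostname and print the `detail` string for anything that comes back with `ok` set to `False`.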

Also, check your stripe groups for a failed array.
From cvadmin:
select prod_f
show

Look for any groups that report a status of 'down'.
If you find any, go to the admin interface on your storage arrays and fix the fault there.
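
If you save the output of `show` to a file, a short script can pick out the stripe groups that are not up. This is a rough sketch: the function name `stripe_groups_not_up` is made up, and the regular expression assumes the `Stripe Group N [Name] Status:...` line layout. That a failed group prints `Status:Down` in the same position is an assumption.

```python
import re

# Matches header lines such as:
#   Stripe Group 1 [Data] Status:Up
# The status is the first word after "Status:", so extra flags like
# ",MetaData,Journal,Exclusive" are ignored.
STRIPE_GROUP_RE = re.compile(
    r"^\s*Stripe Group (\d+) \[([^\]]+)\]\s+Status:(\w+)"
)


def stripe_groups_not_up(show_output):
    """Return (index, name, status) for each stripe group whose status is not 'Up'."""
    problems = []
    for line in show_output.splitlines():
        match = STRIPE_GROUP_RE.match(line)
        if match and match.group(3).lower() != "up":
            problems.append((int(match.group(1)), match.group(2), match.group(3)))
    return problems
```

An empty list means every stripe group header that was found reported `Up`.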

Nolf's picture

Thank you very much for your reply.
The DNS check on both the controller and the client reports that everything is OK:

The names match. There is nothing to change.
dirserv:success = "success"

But cvadmin gives an error at startup:

admin$ sudo cvadmin
Password:
Xsan Administrator

Enter command(s)
For command help, enter "help" or "?".

List FSS

File System Services (* indicates service is in control of FS):
1>*prod_f[0] located on 10.17.18.22:49165 (pid 116)

Select FSM "prod_f"

Admin Tap Connection to FSM failed: [errno 60]: Operation timed out
Cannot select FSS "prod_f"
Xsanadmin> select prod_f
Select FSM "prod_f"

Admin Tap Connection to FSM failed: [errno 60]: Operation timed out
Cannot select FSS prod_f

Xsanadmin> select prod_f
Select FSM "prod_f"

Admin Tap Connection to FSM failed: [errno 60]: Operation timed out
Cannot select FSS prod_f

Xsanadmin>

What should I do in this situation?

-=Learning is never too late=-

csanchez's picture

changeip -checkhostname only checks your public IP. I suggest you check that all DNS servers answer promptly for PTR lookups of your metadata IP addresses, in particular that of the client trying to mount the volume. Test with:

dig -x metadata_ip_address_of_client @ip_of_dns_server

Do that for each nameserver IP address visible in output of scutil --dns. They must all respond immediately. Adding /etc/hosts file entries is not a fix.
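
To avoid missing a nameserver, the list can be pulled out of the `scutil --dns` output and turned into one `dig -x` command per server. A minimal sketch, assuming nameserver lines look like `nameserver[0] : 10.0.0.1`; the helper names are made up, and the `+time=2 +tries=1` options are my own addition so that a slow server fails quickly instead of being retried.

```python
import re

# Matches lines such as "  nameserver[0] : 10.0.0.1"
NAMESERVER_RE = re.compile(r"^\s*nameserver\[\d+\]\s*:\s*(\S+)")


def nameservers_from_scutil(scutil_output):
    """Return the unique nameserver addresses, in order of first appearance."""
    seen = []
    for line in scutil_output.splitlines():
        match = NAMESERVER_RE.match(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def dig_commands(metadata_ip, scutil_output):
    """Build one `dig -x` argument list per nameserver found in scutil_output."""
    return [
        # +time / +tries are an assumption of this sketch, not from the thread.
        ["dig", "-x", metadata_ip, f"@{server}", "+time=2", "+tries=1"]
        for server in nameservers_from_scutil(scutil_output)
    ]
```

Each list can be passed to `subprocess.run` as is, or joined with spaces and pasted into a terminal.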

Also, check that you don't have packet loss across your metadata network between client and MDCs. Try pinging the IP of the MDC from a client for a minute or so. Any amount of packet loss could be problematic. If the problem affects all clients but the MDC, start with the MDC's metadata network link.
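
For the ping test, the number to look at is the loss percentage in the summary line that ping prints when it finishes. A small sketch that extracts it from saved ping output; the function name `packet_loss_percent` is made up, and it assumes the summary contains the usual `N% packet loss` text.

```python
import re

# Matches the loss figure in a summary line such as:
#   60 packets transmitted, 59 packets received, 1.7% packet loss
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def packet_loss_percent(ping_output):
    """Return the packet loss percentage as a float, or None if no summary is found."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None
```

Feed it the output of something like `ping -c 60` against the MDC's metadata IP, run from a client. Anything other than 0.0 deserves a closer look at the metadata network.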

Nolf's picture

Both the controller and the clients are on 2 networks:
1. Production
2. Metadata
Only the production network had been added to the DNS server. Once I added records for the metadata network, it worked! I can hardly believe it!
Everything is great, thank you very much for your help!


Nolf's picture

Now I was able to run show:

Xsanadmin (prod_f) > show
Show stripe groups (File System "prod_f")

Stripe Group 0 [MetadataAndJournal] Status:Up,MetaData,Journal,Exclusive
Total Blocks:12206080 (186.25 GB) Reserved:0 (0.00 B) Free:12163800 (185.60 GB) (99%)
MultiPath Method:Rotate
Primary Stripe [MetadataAndJournal] Read:Enabled Write:Enabled

Stripe Group 1 [Data] Status:Up
Total Blocks:1696776192 (25.28 TB) Reserved:270720 (4.13 GB) Free:295076480 (4.40 TB) (17%)
MultiPath Method:Rotate
Primary Stripe [Data] Read:Enabled Write:Enabled
