Xsan AFP reshare problems

Hi all,

Firstly, sorry for cross-posting, but I did already post this over on [url=http://forums.creativecow.net/readpost/180/857759]Creative Cow[/url].

I've been hitting a huge problem with our Xsan reshare this week. I'm getting to the end of my rope pretty fast, so any insights are greatly appreciated.

Our setup:

Xsan 2.3
2x MDCs @ 10.7.5
4x (ActiveStorage) head raids + 1x head chassis for metadata and other data
Qlogic 5600 switches
1x beefy Mac Pro AFP reshare @ 10.7.5
- 12-core @ 2.4GHz
- 64GB RAM
- 6x 4Gb FC link
- 2x 10Gb Ethernet bonded link via SmallTree card (lab1)
- 6x 1Gb Ethernet bonded link via SmallTree card (lab2)
Lab 1 = 40x 2013 iMacs @ 10.8.4
Lab 2 = 17x 2012 iMacs @ 10.8.4

The media that the students are using is primarily old DV-NTSC footage or AVCHD right off our new cameras. We also have several students building simple websites, and they store all their source files and images on this volume. The volume is definitely able to handle these data rates, so the problem must be with my reshare configuration. After a server reboot, the labs all perform fine, but as the day wears on, we eventually reach a point where new users can no longer connect to the AFP reshare. One class arrives, logs in, edits, logs out, (repeat), but by the fourth class, the students get a nasty prompt that the server is unavailable. Users who are already connected can continue working just fine.

In addition to that, we occasionally have users experience choppy connections to the server. Their footage will suddenly just pause for about 30 seconds. A reboot of the client workstation usually clears this up, but not always (it'll crop up again). When I check the AFP connection list on the server, I usually see two or more duplicate connections from the workstation in question: the older "stale" ones all show increasing idle times, while the active connection shows 0 idle time.

When new connections can no longer be made to the server, that's usually the point where I need to run and forcibly shut it off. Even after a fresh reboot of the server and the clients, things aren't perfect. I'm now seeing constant metadata traffic between my reshare server and the acting MDC. It's all very small data (<1MB/s), but it's constant, even when no users are connected via AFP. My acting MDC also has >100% CPU load for the fsm process, which currently boasts 299 threads (is this normal?). The kernel process on the reshare server is constantly at about 20% CPU, and the AppleFileServer process always has about 204 threads (seems high, right?).

Some frequent log messages I'm seeing on the reshare server are:

Oct 24 18:10:39 reshare-server kernel[0]: add_fsevent: unable to get path for vp 0xffffff80881e67c0 (-UNKNOWN-FILE; ret 22; type 2)
Oct 24 18:10:39 reshare-server kernel[0]: add_fsevent: unable to get path for vp 0xffffff80881e67c0 (-UNKNOWN-FILE; ret 22; type 4)
Oct 24 19:03:51: --- last message repeated 1 time ---

Oct 24 19:55:23 reshare-server AppleFileServer[958]: received message with invalid client_id 1583
Oct 24 19:55:25 reshare-server AppleFileServer[958]: _Assert: /SourceCache/afpserver/afpserver-585.7/afpserver/AFPRequest.cpp, 2006
Oct 24 19:55:25: --- last message repeated 2 times ---

Oct 24 20:33:01 reshare-server AppleFileServer[958]: MDSChannelPeerCreate: (os/kern) invalid argument
Oct 24 20:33:51: --- last message repeated 1 time ---

The first group of log messages above seems to relate to when someone has experienced a bad (stale) connection, though I can't confirm this. After said client reboots, these messages cease. I was really happy when I made it the whole morning without seeing any of them, but then they started cropping up again. All these messages tend to increase with more use of the client systems, but I can't pinpoint the cause.
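One way to check whether these messages really track client usage is to tally them per hour and compare the counts against class times. This is only a minimal sketch: it assumes syslog-style lines like the ones quoted above, and the function name and the log path in the usage comment are illustrative, not something taken from the server.

```shell
#!/bin/sh
# Tally the three message families quoted above, grouped by day and hour.
# "last message repeated N times" lines are not counted, so totals are a
# lower bound.
# Usage (log path is an assumption): tally_afp_messages /var/log/system.log
tally_afp_messages() {
    awk '
    {
        kind = ""
        if ($0 ~ /add_fsevent: unable to get path/) kind = "add_fsevent"
        else if ($0 ~ /invalid client_id/)          kind = "invalid_client_id"
        else if ($0 ~ /MDSChannelPeerCreate/)       kind = "MDSChannelPeerCreate"
        if (kind == "") next

        # $1 $2 $3 are month, day, HH:MM:SS in "Oct 24 18:10:39 host ..."
        split($3, t, ":")
        count[$1 " " $2 " " t[1] ":00 " kind]++
    }
    END { for (k in count) print k, count[k] }
    ' "$1" | sort
}
```

Each output line has the form "Oct 24 18:00 add_fsevent 2", which makes it easy to line up bursts of messages with the times a class logged in or out.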

I've been debating converting this setup to NFS instead of AFP, but several of our workflows rely pretty heavily on the ACLs that are in place on this volume. My understanding is that Apple's nfsd doesn't pass the ACLs along to the clients. We tried NFS for a small subset of machines for a semester, but AFP was much easier and gave us our ACLs. I also have another almost identical reshare server ready and waiting to go in the rack, but I'd like to make sure I'm heading in the right direction before I throw more hardware at the situation.

Thank you!

Sam Edwards's picture

Any Mavericks out there?

Hey Folks,
I see Apple advertises Xsan with the new Server v3. Has anybody tried it? Sorry if I missed another thread on this subject.
thx
Sam

Chanmax07's picture

Xsan extend volume problem with total free space

Hello

I have an Xsan with 2 MDCs (Mountain Lion), 1 Promise E-class, 1 J-class, and 2 volumes.

Recently I needed to add space and more bandwidth.

I added 1 E-class and 1 J-class to my fabric switch, with four LUNs.

I extended the first volume in my SAN. After that, my volume is up to 32 TB.

The free space of the volume is wrong. Xsan Admin shows 9 TB of free space, but I have only 9 TB of data in my volumes.

I think the volume extension created a mirror of my LUNs.

Can you confirm that? And what is the method to expand a volume with a new controller and more space?

Kind regards

Gerard's picture

StorNext Reshare (SMB vs NFS)


Hello,

In my current environment, we are running
Xsan 3.0
Few fiber switches (5600s)
Few RAIDs (Promise E & Js)
Two MDCs (Xserves)
MDCs and fiber-clients are running OSX 10.6.8

We are looking into several new solutions, and StorNext is one of them. A lot of our users connect to a reshare, via AFP, from an Xserve that has the Xsan volume attached.

If we proceed with StorNext, we can either go with an SMB reshare over a Windows Server (Windows 7 Server) or an NFS reshare over a Linux server (Red Hat Enterprise).

In my experience, SMB is more stable, since it groups multiple requests together, reducing network overhead. NFS, on the other hand, offers better speeds.

With a large group of reshare users (fifty) accessing large Adobe files (Photoshop, InDesign, etc.) at 4Gbs and up, which in your opinion would be the best reshare protocol to use?

Thanks.

Moe's picture

Disaster recovery using cvfsdb ( spaces issues )

Hi All

Have a customer with an Xsan volume that got nasty, it won't mount anymore

We can retrieve files using the method described in this page

http://www.xsanity.com/article.php/20080327101938881

running commands like this works fine

[code]echo "save 0x3c4de8 /Volumes/restore/Projects/moonshine" | cvfsdb Testvol[/code]

My main issue is with folder paths that have spaces in them ( tell me about naming guidelines ! )

If the destination has a space in the folder name, it doesn't work

So if we have a directory structure like VolumeRoot/Face/To Sound/

I can't retrieve files to it

running this

[code]echo "save 0x357c51f /Volumes/Restore1/Face/To\ Sound/test2.dv" | cvfsdb XSAN[/code]

gives: [code]syntax error. Enter "help" or "?" for help[/code].

While if I eliminate the space, it works fine like below

[code]echo "save 0x357c51f /Volumes/Restore1/Face/To-Sound/test2.dv" | cvfsdb XSAN[/code]

I've tried various options for the syntax, but none seems to work with the spaces

Needless to say the volume structure is full of spaces ( as well as & , / and non-latin characters too! ) , so it's close to impossible to rename all such folders

Any help about including non-latin characters in this command syntax would be appreciated
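Since the save command does work when the destination has no spaces, one possible workaround is to save each file under a space-free staging name and then move it into place with an ordinary mv, which has no trouble with spaces or non-Latin characters. The sketch below does not fix the quoting inside cvfsdb; it only sidesteps it. The helper name, the staging directory, and the CVFSDB override variable are all hypothetical, and it assumes the cvfsdb invocation behaves as in the working example above.

```shell
#!/bin/sh
# Hypothetical helper: restore via a space-free staging name, then move.
# CVFSDB defaults to the cvfsdb command used above; override it for a dry run.
CVFSDB=${CVFSDB:-cvfsdb}

restore_with_spaces() {
    inode=$1        # e.g. 0x357c51f
    volume=$2       # e.g. XSAN
    dest=$3         # final path, may contain spaces or non-Latin characters
    staging_dir=$4  # existing directory whose path contains no spaces

    case $staging_dir in
        *" "*) echo "staging dir must not contain spaces" >&2; return 1 ;;
    esac

    # Stage under a name built from the inode number only.
    staged="$staging_dir/restore.$inode"
    echo "save $inode $staged" | "$CVFSDB" "$volume" || return 1

    # Create the real destination folder and move the file into it.
    mkdir -p "$(dirname "$dest")" && mv "$staged" "$dest"
}
```

A call would look like: restore_with_spaces 0x357c51f XSAN "/Volumes/Restore1/Face/To Sound/test2.dv" /Volumes/Restore1. Keeping the staging directory on the same restore disk means the mv is a rename rather than a second copy.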

Regards

tonyswu's picture

Volume Can't Be Opened Because You Don't Have Permission

Hi,

We have an Xsan environment with 2 Mac minis running 10.8.4 as metadata controllers, a Promise x30 RAID, and 5 Mac Pro clients. Today, all of a sudden, Xsan stopped working. The volume now has a lock on it, and when I double-click it, it says:

The folder cannot be opened because you don't have permission to see its contents.

I opened up Xsan Admin and tried to change the permissions and propagate them, but it gave me a strange error saying permission denied. I then noticed that the option to stop or restart the volume was grayed out.

If I try sudo chown or sudo chmod, I still get permission denied. If I sudo cd into the directory, I still seem to be able to access the files.

Can someone offer some advice? Thanks.

marook's picture

Xsan Volume will not allow access - RW mount is Locked

Hi All,

I have a client with an Xsan 2.2.2/10.6.8 system where one volume has started to act up.

I have run cvfsck -j & -vw and all seems fine.
I'm running snfsdefrag -vr on the volume, and a lot of files are being moved.

The volume will mount on the MDCs and clients, but we cannot get access to it in Finder.
The volume icon has a lock on it in Xsan Admin and in Finder.
Trying to change permissions in Xsan Admin throws error 100069 (CANNOT_SET_ACL_FILE_ERR).
I can su to root and access the volume in Terminal, delete files and such OK.
I ran rm -f .DS_Store on the root of the volume; it did not help.

I can't see any extended attributes on the volume...

Anyone have a clue what is going on here??

TIA,

Nolf's picture

Error creating Open Directory Replica OS X 10.8.5


Good afternoon. I need help creating an Open Directory replica. At the moment there are 2 servers (OS X 10.8.5). One server, 10.0.0.1, is the OD master; I want to set up the second, 10.0.0.2, as an Open Directory replica.

slapconfig -ver (OD master):
[code]admin $ sudo slapconfig -ver
2013-10-11 08:43:34 +0000 command: /usr/libexec/slapd -T cat -c -f /etc/openldap/slapd.conf -s ou=macosxodconfig,cn=config,dc=test249,dc=home
2013-10-11 08:43:34 +0000 Error execing slapcat: slapcat: slap_init no backend for "ou=macosxodconfig,cn=config,dc=test249,dc=home"
LDAP Setup Tool (slapconfig), Apple, Inc., Version 1.2[/code]
slapconfig -ver (future OD replica):
[code]admin $ sudo slapconfig -ver
2013-10-11 08:43:34 +0000 command: /usr/libexec/slapd -T cat -c -f /etc/openldap/slapd.conf -s ou=macosxodconfig,cn=config,dc=test249,dc=home
2013-10-11 08:43:34 +0000 Error execing slapcat: slapcat: slap_init no backend for "ou=macosxodconfig,cn=config,dc=test249,dc=home"
LDAP Setup Tool (slapconfig), Apple, Inc., Version 1.2
[/code]
changeip -checkhostname (OD master):
[code]Primary address = 10.0.0.1

Current HostName = tech-p**.pr***
DNS HostName = tech-p**.pr***

The names match. There is nothing to change.
dirserv: success = "success"
[/code]
changeip -checkhostname (future OD replica):
[code]Primary address = 10.0.0.2

Current HostName = tech-s**.pr***
DNS HostName = tech-s**.pr***

The names match. There is nothing to change.
dirserv: success = "success"[/code]

Configuring the master node went without problems, but when I tried to connect a replica I got this error:
[code]admin $ sudo /usr/sbin/slapconfig -preflightreplica tech-p**.pr*** diradmin
master.net Password:
2013-10-11 08:52:06 +0000 NSMutableDictionary * _getRootDSE (const char *): rootDSE not found
2013-10-11 08:52:06 +0000 Error: Unable to determine the master's software version.[/code]

tonyswu's picture

How to Patch In to Fiber Switch

Hi,

Suppose you have 2 fiber switches, and your servers and storage devices all have 2 fiber connectors. Do you connect 1 to each switch, or do you connect both to 1 switch and rely on the switch uplink?

Just being curious.

AnMAX's picture

NEED HELP Xsan volume is not mounted (strange problem)

Hi Xsan gurus :)
I am asking for help in solving a volume mount problem (sorry in advance for my English).
It all began when the MDC rebooted and the Xsan labels on two LUNs were gone. I relabeled these LUNs using the commands:
[code]
cvlabel -c >label_list[/code]
Then, in the label_list file, I replaced the unknown disk entries with the same label names they had before. Then I ran the [i]cvlabel label_list[/i] command. Finally I got the correct labels on all drives.

[img]https://dl.dropboxusercontent.com/u/5920358/LUNS.png[/img]

[code]

# cvlabel -l

/dev/rdisk14 [Raidix meta_i 3365] acfs-EFI "META_I"Sectors: 3906830002. Sector Size: 512. Maximum sectors: 3906975711.
/dev/rdisk15 [Raidix QSAN_I 3365] acfs-EFI "QSAN_I"Sectors: 7662714619. Sector Size: 4096. Maximum sectors: 7662714619.
/dev/rdisk16 [Raidix meta_ii 3365] acfs-EFI "META_II"Sectors: 3906830002. Sector Size: 512. Maximum sectors: 3906975711.
/dev/rdisk17 [Raidix 2k_I 3365] acfs-EFI "2K_I"Sectors: 31255934943. Sector Size: 512. Maximum sectors: 31255934943.
/dev/rdisk18 [Raidix 2k_II 3365] acfs-EFI "2K_II"Sectors: 31255934943. Sector Size: 512. Maximum sectors: 31255934943.
/dev/rdisk19 [Raidix QSAN_II 3365] acfs-EFI "QSAN_II"Sectors: 7662714619. Sector Size: 4096. Maximum sectors: 7662714619.
[/code]

The volume [2K] starts successfully.

[img]https://dl.dropboxusercontent.com/u/5920358/VOLUME.png[/img]

but it does not mount on the MDC or the client.
I ran a volume check:
[code]
sh-3.2# cvfsck -wv 2K
Checked Build disabled - default.

BUILD INFO:

#!@$ Revision 4.2.2 Build 7443 (480.8) Branch Head
#!@$ Built for Darwin 12.0
#!@$ Created on Mon Jul 29 17:01:44 PDT 2013

Created directory /tmp/cvfsck3929a for temporary files.

Attempting to acquire arbitration block... successful.

Creating MetadataAndJournal allocation check file.
Creating Video allocation check file.
Creating Data allocation check file.

Recovering Journal Log.

Super Block information.
FS Created On : Wed Oct 2 23:59:20 2013
Inode Version : '2.7' - 4.0 big inodes + NamedStreams (0x207)
File System Status : Clean
Allocated Inodes : 4022272
Free Inodes : 16815
FL Blocks : 79
Next Inode Chunk : 0x51a67
Metadump Seqno : 0
Restore Journal Seqno : 0
Windows Security Indx Inode : 0x5
Windows Security Data Inode : 0x6
Quota Database Inode : 0x7
ID Database Inode : 0xa
Client Write Opens Inode : 0x8

Stripe Group MetadataAndJournal ( 0) 0x746ebf0 blocks.
Stripe Group Video ( 1) 0x746ffb60 blocks.
Stripe Group Data ( 2) 0xe45dfb60 blocks.

Inode block size is 1024

Building Inode Index Database 4022272 (100%).
4022272 inodes found out of 4022272 expected.

Verifying NT Security Descriptors
Found 13 NT Security Descriptors: all are good

Verifying Free List Extents.

Scanning inodes 4022272 (100%).

Sorting extent list for MetadataAndJournal pass 1/1
Updating bitmap for MetadataAndJournal extents 21815 ( 0%).
Sorting extent list for Video pass 1/1
Updating bitmap for Video extents 3724510 ( 91%).
Sorting extent list for Data pass 1/1
Updating bitmap for Data extents 4057329 (100%).

Checking for dead inodes 4022272 (100%).

Checking directories 11136 (100%).

Scanning for orphaned inodes 4022272 (100%).

Verifying link & subdir counts 4022272 (100%).

Checking free list. 4022272 (100%).
Checking pending free list.

Checking Arbitration Control Block.

Checking MetadataAndJournal allocation bit maps (100%).
Checking Video allocation bit maps (100%).
Checking Data allocation bit maps (100%).

File system '2K'. Blocks-5784860352 free-3674376793 Inodes-4022272 free-16815.

File System Check completed successfully.
[/code]
The check did not help :(
[code]sh-3.2# cvadmin
Xsan Administrator

Enter command(s)
For command help, enter "help" or "?".

List FSS

File System Services (* indicates service is in control of FS):
1>*2K[0] located on big.local:64844 (pid 5217)

Select FSM "2K"

Created : Wed Oct 2 23:59:20 2013
Active Connections: 0
Fs Block Size : 16K
Msg Buffer Size : 4K
Disk Devices : 5
Stripe Groups : 3
Fs Blocks : 5784860352 (86.20 TB)
Fs Blocks Free : 3665561306 (54.62 TB) (63%)

Xsanadmin (2K) > show
Show stripe groups (File System "2K")

Stripe Group 0 [MetadataAndJournal] Status:Up,MetaData,Journal,Exclusive
Total Blocks:122088432 (1.82 TB) Reserved:0 (0.00 B) Free:121753961 (1.81 TB) (99%)
MultiPath Method:Rotate
Primary Stripe [MetadataAndJournal] Read:Enabled Write:Enabled

Stripe Group 1 [Video] Status:Up
Total Blocks:1953495904 (29.11 TB) Reserved:270720 (4.13 GB) Free:129179 (1.97 GB) (0%)
MultiPath Method:Rotate
Primary Stripe [Video] Read:Enabled Write:Enabled

Stripe Group 2 [Data] Status:Up
Total Blocks:3831364448 (57.09 TB) Reserved:270720 (4.13 GB) Free:3665432127 (54.62 TB) (95%)
MultiPath Method:Rotate
Primary Stripe [Data] Read:Enabled Write:Enabled

[/code]
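For anyone reading the numbers in this output: the TB figures follow from the block counts and the 16K Fs Block Size. Here is a small sketch of that arithmetic, assuming cvadmin reports binary units (1 TB = 1024^4 bytes), which is what makes the figures above agree. The function name is made up for illustration.

```shell
#!/bin/sh
# Convert a count of file-system blocks to TB (binary units).
# $1 = number of blocks, $2 = block size in KB (16 for this volume).
blocks_to_tb() {
    awk -v blocks="$1" -v kb="$2" \
        'BEGIN { printf "%.2f\n", blocks * kb / (1024 * 1024 * 1024) }'
}

blocks_to_tb 5784860352 16   # Fs Blocks            -> prints 86.20
blocks_to_tb 3665561306 16   # Fs Blocks Free       -> prints 54.62
blocks_to_tb 1953495904 16   # Video stripe group   -> prints 29.11
```

The same conversion applied to the Free figure of each stripe group shows how the free space is distributed between Video and Data.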
I checked the availability of the LUNs on the MDC and the client; everything is fine there too.
[img]https://dl.dropboxusercontent.com/u/5920358/disk_util.png[/img]

But, unfortunately, the Volume is not mounted :(

[code]sh-3.2# xsanctl mount k2
mount command failed: Unable to mount volume `k2' (error code: 3)
[/code]
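Note that the cvadmin listing earlier shows the file system as 2K, while the mount command here was given k2. Whether that is the cause of error code 3 is not established in this thread, but it is cheap to rule out. Below is a minimal sketch that checks a name against the "File System Services" lines before mounting. The function names are hypothetical, and the listing is read from standard input rather than by calling cvadmin, so how it gets fed is left to you.

```shell
#!/bin/sh
# Print the volume names from "File System Services" lines such as
#   1>*2K[0] located on big.local:64844 (pid 5217)
fss_names() {
    sed -n 's/^[[:space:]]*[0-9][0-9]*>[* ]*\([^[]*\)\[[0-9]*].*/\1/p'
}

# Succeed only if the requested name exactly matches a listed volume
# (the comparison is case-sensitive and order-sensitive).
volume_is_listed() {
    fss_names | grep -Fxq -- "$1"
}
```

For example, with the listing saved to a file, volume_is_listed 2K < listing.txt succeeds while volume_is_listed k2 < listing.txt fails.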

[b]Please help me figure out this situation. I will be grateful for any information.[/b]
Thanks
