10.7.3 Problems w/ ACLs

MattRK:

So I stupidly made the mistake of upgrading my relatively small Xsan to 10.7.3. Previously it was sitting at 10.7.2 and seemed fairly stable (the MDCs were on 10.7.2, but the clients were still on 10.5.8). We recently purchased a new J-class expansion array and grew our volume onto it. At that time, a decision was made to bring all the clients up to Lion and FCS3. We performed the grow and then upgraded all of the clients. Shortly afterwards, I realized the clients were all sitting at 10.7.3, because Software Update had pulled down the latest and greatest. With Xsan best practices in mind, I decided to go ahead and update my MDCs as well, so they would be at the same version as the highest client.

[u][b]Big mistake.[/b][/u]

The Xsan volume did not like this. After the upgrade, my volume stopped honoring ACLs and AD permissions. As of right now, ACLs are completely broken. If I put any kind of ACL on the volume, none of my clients logged in with their domain accounts can write to it. They can read the volume; they just can't write to it. If they try to edit, delete or create files or folders, they get an "Error code -43" message. If I remove the ACLs (clicking the minus sign on every ACL under "Set Permissions") and fall back to POSIX permissions alone ("everyone" set to read/write), the clients work just fine: they can read, write, edit, delete and change files all day long. But with ACLs on the volume, it's useless.

All of my machines are bound to AD, and the bindings all seem perfectly fine. I can log in with domain accounts without any problems. The machines all show a green light next to the domain in System Preferences. I can set permissions on local folders and everything seems great.

I've tried practically everything I can think of. I've removed clients from the SAN and re-added them. I've unbound machines from AD and rebound them. I've unbound both MDCs and rebound them. I wiped one of the clients and did a fresh Lion install from scratch. I've added a single AD account (instead of our normal security group) to the ACLs section, still with no luck. I still get that stupid -43 error. I've even turned ACLs off in Volume Settings and then back on. No luck. Today I tried rolling back one of the clients to 10.7.2 (leaving the MDCs at 10.7.3) and still had the same issue.

Looking through the logs, I'm not seeing anything suspicious, though I'll admit I'm fairly new to Mac logs. The only thing I see that might be of any concern is a few annoying Spotlight errors, even though Spotlight search is unchecked in Volume Settings.

At this point the only idea I have left is to roll back both my MDCs to 10.7.2. I'm planning to try that tomorrow evening to see if I have any luck. If anyone has any advice, I would greatly appreciate it. Lion still seems to be in beta, and I seem to be one of only a few beta testers.

Thanks again for all your help and for this site's forum!

brianwells:

Sometimes an erroneous ACL entry has caused us similar problems, which I've been able to fix in the Terminal with this command as an administrator:
[code]sudo chmod -R -N /Volumes/name[/code]
Replace [i]name[/i] with the name of your mounted Xsan volume. All ACL entries will be removed from the volume, including ones that the permissions dialog may not be displaying. Afterwards you can try setting permissions with ACLs.
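To confirm that the recursive `chmod -N` pass really left nothing behind, you can look at `ls -led /Volumes/name`: on macOS the `-e` flag makes `ls` print each ACL entry on its own indented line below the normal listing, numbered `0:`, `1:` and so on. The helper below is a small sketch that counts those lines; `count_acl_entries` is a made-up name, not an Apple tool.

```shell
#!/usr/bin/env bash
# count_acl_entries (hypothetical helper): reads the output of
# `ls -led PATH` on stdin and prints how many ACL entries it lists.
# macOS prints each entry as an indented " N: ..." line under the
# normal long listing, so counting those lines counts the entries.
count_acl_entries() {
  # grep -c prints 0 and exits 1 when nothing matches; `|| true`
  # keeps that case from being treated as a failure.
  grep -cE '^[[:space:]]*[0-9]+: ' || true
}
```

Usage: `ls -led /Volumes/name | count_acl_entries` should print `0` for the volume root once the cleanup has run.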

MattRK:

brianwells wrote:
[quote]Sometimes an erroneous ACL entry has caused us similar problems, which I've been able to fix in the Terminal with this command as an administrator:
[code]sudo chmod -R -N /Volumes/name[/code]
Replace [i]name[/i] with the name of your mounted Xsan volume. All ACL entries will be removed from the volume, including ones that the permissions dialog may not be displaying. Afterwards you can try setting permissions with ACLs.[/quote]

I just gave that a shot and still no luck. After the command completed, I opened Xsan Admin and added an ACL back on the volume root. Then I went to a client and was not able to create a directory or write a file. Still getting error code -43.

keithkoby:

Old Xsan rule: no ACLs on the root of the volume, just POSIX there. Start ACLs in your folders, [b]never on the root[/b].
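One way to apply that rule from the Terminal is to clear the ACL from the root only (no `-R`) and then add one inheritable entry per top-level folder with macOS `chmod +a`. The sketch below only prints the commands so they can be reviewed before running them with `sudo`; the volume path, group name and folder names are placeholders, and the permission list is an assumption about what "read/write" should mean on your volume.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints, but does not run, the macOS chmod
# commands for "POSIX only on the volume root, ACLs on the folders".
# Assumes no single quotes in the volume, group or folder names.

# Assumed read/write set for directories; file_inherit/directory_inherit
# make newly created files and subfolders pick the entry up.
RW_INHERIT="list,add_file,search,add_subdirectory,delete_child,delete,readattr,writeattr,readextattr,writeextattr,readsecurity,file_inherit,directory_inherit"

acl_plan() {
  local volume=$1 group=$2 folder
  shift 2
  # Remove every ACL entry from the root itself (deliberately not -R).
  printf "chmod -N '%s'\n" "$volume"
  # One inheritable allow entry per top-level folder.
  for folder in "$@"; do
    printf "chmod +a 'group:%s allow %s' '%s/%s'\n" \
      "$group" "$RW_INHERIT" "$volume" "$folder"
  done
}
```

Usage: `acl_plan /Volumes/Xsan editors Projects Media` prints three commands; once they look right, `acl_plan /Volumes/Xsan editors Projects Media | sudo sh` runs them. Items that already exist inside the folders are not changed; only newly created ones inherit the entry.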

MattRK:

I just got off the phone with Apple's Xsan tech support, and we got it fixed. A small bug caused this, so I'm going to document it here for future reference, or for anyone else on 10.7.3 who is having the same problem.

The problem turned out to be the POSIX group that owned the volume. Our volume was owned by a Unix group that existed only on the MDCs and not on the clients (we had changed it from wheel/admin a while back for various reasons). For some reason the clients were getting hung up and throwing errors because they had no idea what or who this group was. As soon as I set it back to admin/wheel, everything started working again, ACLs and all. The error code -43 went away.
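A quick way to spot this on each client: `ls -l` prints the group name when the group ID resolves on that machine and falls back to the bare number when it doesn't. The sketch below turns that into a yes/no check; `group_unresolved` is a made-up name, and group names containing spaces would throw off its simple field split.

```shell
#!/usr/bin/env bash
# group_unresolved (hypothetical helper): reads `ls -ld PATH` output on
# stdin and succeeds if the group column is all digits, i.e. this
# machine could not map the volume's numeric group ID to a name.
group_unresolved() {
  local mode links owner group rest
  read -r mode links owner group rest || return 1
  [[ $group =~ ^[0-9]+$ ]]
}
```

Usage, on every client and MDC: `ls -ld /Volumes/name | group_unresolved && echo "this Mac does not know the volume's group"`.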

The Apple engineer said the POSIX group owner needs to be set to something that both the clients AND the MDCs can recognize. He said I could either create this group on all the clients, set it back to admin/wheel, or, even better, set it to an Active Directory group that all the clients and MDCs recognize (which is what I did).
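Whichever group you pick, POSIX ownership is stored on disk as a numeric group ID (as on most Unix file systems), so the group should resolve to the same GID on every client and MDC. A rough way to check is to run `dscacheutil -q group -a name GROUPNAME` on each machine and compare the `gid:` lines. The helpers below, with hypothetical names, pull the gid out of that output and compare the values collected from each machine.

```shell
#!/usr/bin/env bash
# gid_of (hypothetical): reads `dscacheutil -q group -a name GROUP`
# output on stdin and prints the value of its "gid:" line, or nothing
# if the group is unknown on this machine.
gid_of() {
  awk -F': ' '$1 == "gid" { print $2; exit }'
}

# all_same (hypothetical): succeeds only if every argument is non-empty
# and identical, e.g. the gids gathered from each client and both MDCs.
all_same() {
  local first=$1 g
  [ -n "$first" ] || return 1
  for g in "$@"; do
    [ "$g" = "$first" ] || return 1
  done
}
```

Usage: collect `dscacheutil -q group -a name GROUPNAME | gid_of` from each machine, then `all_same 1025 1025 1025 && echo consistent`. An empty value means that machine does not know the group at all.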

So there you go. He did confirm that this is a bug in 10.7.3. By design, the clients aren't really supposed to care about POSIX permissions if ACLs exist; if Xsan encounters a POSIX group and/or owner ID that it doesn't recognize, it's not supposed to care. But for some reason 10.7.3 does care and fails with error code -43. He said a future release should fix this problem.

TRANSIT:

Does anybody know if this was fixed in 10.7.4? We just upgraded from 10.7.2 to 10.7.4, and we have been running into write issues across the board. Clients can read without a problem but can no longer write.

Basically, if I reassign ACLs and propagate them down, it works for a few hours, and then the client can no longer write to the directory. The only way to write to the drive is if they manually authenticate with an admin password in the Finder (I assume this enables some form of root access).

Oddly, this doesn't happen to the same client when it connects over AFP with the same OD login, but it does happen over Fibre Channel. Additionally, we have noticed that if the user goes into the Xsan pane in System Preferences and unmounts and remounts the volume, the ACLs are read correctly until the next reboot.

Any way around this? Is this a 10.7.3/10.7.4 issue?

Solidus:

Good piece of information.

Thanks for reporting back.

TRANSIT:

Having done a lot of tests, I think the reason this happens only on Fibre Channel clients is that it is the same 10.7.3 POSIX issue (clearly this wasn't corrected in 10.7.4), and I don't believe POSIX permissions travel over AFP, only over Fibre Channel.

Right now, under my POSIX permissions, only the owner (root) has read/write access. The other two classes (the group, wheel, and everyone else) are read-only. Should wheel be set to read/write as well?

lucasnap:

I also have a lot of ACL issues with a Lion Xsan setup. We are using 10.7.4 and ACLs don't really work. They do work with groups in the ACL but not with individual users. But then again, sometimes they do work. I can't put my finger on it, but it seems to be getting worse.

I thought I had it right: no ACL on the root, and a group everybody knows as the POSIX group, namely wheel (a local system group every Mac has).

It turns out the POSIX group should be an OD group (or AD, I don't know; we use OD).

I've now tested this setting for a few minutes and it seems to work well.

So this is what I think should work:

- No ACL on the root (of the Xsan volume)
- The root and the folders should have an OD group as the POSIX group
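Both rules can be checked on the volume root with a small sketch that reads `ls -led` output (the macOS `-e` flag adds the ACL entries to the listing). `audit_root` and the group name you pass it are placeholders; it checks only the root, not the folders below it, and it assumes the group name contains no spaces.

```shell
#!/usr/bin/env bash
# audit_root (hypothetical helper): reads `ls -led VOLUME` output on stdin,
# prints one line per problem found, and fails if there were any.
#   $1 = the OD/AD group you expect to own the volume root
audit_root() {
  local expected_group=$1 mode links owner group rest line status=0
  local acl_re='^[0-9]+: '   # ACL entries look like "0: group:x allow ..."
  read -r mode links owner group rest || return 1
  if [ "$group" != "$expected_group" ]; then
    echo "root group is $group, expected $expected_group"
    status=1
  fi
  # The remaining lines are ACL entries; `read` strips their indentation.
  while read -r line; do
    if [[ $line =~ $acl_re ]]; then
      echo "ACL entry on root: $line"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `ls -led /Volumes/Xsan | audit_root editors` prints nothing and succeeds when the root has no ACL and is owned by the group `editors`.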