Can a corrupt/buggy Xsan filesystem cause extremely low read/write speeds on a SAN?

Andrew Allen's picture

For the sake of brevity, I'll leave out the 3 months of background information on trying to resolve the problems we've had.

We have a SAN that used to have 450-550 MB/s read/write speeds. It performed this way for 4.5 years. Now it's writing and reading at much lower speeds: 40-150 MB/s write and 250-300 MB/s read. Yes, we've investigated all the likely things: SFPs, fibre cables, the fibre switch (thoroughly), the RAID controller, etc.

My question is simply this: Does anyone know if it's possible for a degraded/corrupted/buggy Xsan filesystem to cause such heavily reduced speeds? To our knowledge, we have never run a cvfsck on this Xsan volume, and it's been in heavy use for almost 5 years. I'm going to run one tomorrow. Could the Xsan filesystem be the cause of our degraded drive speeds?

Sirsloth's picture

Certainly over 5 years the filesystem will become fragmented. This is compounded if the filesystem is at 70 to 85 per cent capacity: you lose speed on the drive spindles because data is being read from and written to the inner part of the drive platters, where performance naturally degrades.

Free up space, defragment the volume and/or run an Xsan filesystem check. Also make sure write/read cache is still enabled on your storage controllers. Good luck.
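As a rough way to see where a volume sits relative to the capacity range mentioned above, here is a minimal Python sketch. The function name `capacity_report`, the 75% default threshold and the `"."` path are illustrative assumptions; point it at the volume's actual mount point. It only reports used space as the operating system sees it and says nothing about fragmentation.

```python
import shutil


def capacity_report(mount_point, warn_at=0.75):
    """Return (used_fraction, over_threshold) for the filesystem at mount_point.

    warn_at is an assumed threshold (75% here); adjust it to taste.
    """
    usage = shutil.disk_usage(mount_point)  # named tuple: total, used, free (bytes)
    used_fraction = usage.used / usage.total
    return used_fraction, used_fraction >= warn_at


if __name__ == "__main__":
    # "." is a stand-in; replace it with the mount point of the volume to check.
    fraction, over = capacity_report(".")
    print(f"{fraction:.1%} used, at or over threshold: {over}")
```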

bpolyak's picture

It sounds like you might also have a failing drive somewhere. Could you run a scrub on all disk groups? What sort of disks do you use? Desktop class SATA, "Enterprise"-class SATA, SAS?

Did you check that all of your RAID controllers use the write-back cache setting? It resets to write-through if something like a BBM failure or PDU failure occurs, and sometimes does not set itself back.

Issues like that could be caused by filesystem corruption. What speeds do you get with something like Blackmagic Disk Speed Test? If it's a fragmentation issue, the test might be unaffected.
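If a GUI speed test is not handy, a crude sequential test can be scripted. The sketch below is not a substitute for a real benchmark: the function name, file size and block size are arbitrary assumptions, and the read figure will be optimistic whenever the test file fits in the operating system's cache, so use a file much larger than RAM for a meaningful read number.

```python
import os
import time


def sequential_throughput(path, size_mb=256, block_mb=4):
    """Write then read a scratch file at path; return (write_MBps, read_MBps).

    The scratch file is deleted afterwards. Sizes are illustrative defaults.
    """
    block = os.urandom(block_mb * 1024 * 1024)
    blocks = size_mb // block_mb
    total_mb = blocks * block_mb
    try:
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # include the time needed to reach the storage
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(block)):
                pass
        read_seconds = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)
    return total_mb / write_seconds, total_mb / read_seconds


if __name__ == "__main__":
    # "speedtest.tmp" is a hypothetical scratch file; place it on the volume under test.
    write_rate, read_rate = sequential_throughput("speedtest.tmp")
    print(f"write: {write_rate:.0f} MB/s, read: {read_rate:.0f} MB/s")
```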

Gerard's picture

I've had the same exact thing happen to my Xsan environment as well.

The volume is about five or six years old. More users were added to the volume, more content was created, there were more reads/writes, etc. All of this can lead to bad performance over time. This is what I did for our Xsan.

- Go through your volume and archive/delete things you do not need. Xsan performance starts to dip at 75% capacity. I migrated terabytes off the volume and got capacity down to 56%.

- Check your data LUNs. After so many years, some of the drives might be stale now with longer than expected ping times.

- Defrag the volume. In my case, we had more extents than files, so the whole volume was affected. We had to get Apple to build a custom defrag script to resolve this. My issue with the built-in defrag in Xsan is that it isn't smart enough, so it'll defrag everything, even the good sectors. Plus, it won't optimize anything afterwards.
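For the first step in the list above, a small script can help find archive candidates. This is a minimal sketch, assuming that "large and not modified for a long time" is a reasonable first filter; the function name, the one-year cutoff and the `"."` path are illustrative, not anything specific to Xsan.

```python
import os
import time


def archive_candidates(root, min_age_days=365, top_n=20):
    """Return up to top_n (size_bytes, path) pairs for the largest files under
    root whose modification time is older than min_age_days."""
    cutoff = time.time() - min_age_days * 86400
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                info = os.stat(full_path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_mtime < cutoff:
                found.append((info.st_size, full_path))
    found.sort(reverse=True)  # largest first
    return found[:top_n]


if __name__ == "__main__":
    # "." is a stand-in; replace it with a folder on the volume.
    for size, path in archive_candidates("."):
        print(f"{size / 1e9:8.2f} GB  {path}")
```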

Hope this helps.

percisely's picture

+1 to the tip above to check your cache settings. Set incorrectly, those can cause the volume to slow way down.