fritz. Xsan Master

Joined: 03 Oct 2007 Posts: 177
Posted: Sun Apr 11, 2010 3:22 pm
Post subject: Performance issues on Xsan 2 following an upgrade
Hey guys -
I was hoping to get a few opinions on the behavior of a SAN volume. The facility has a 35TB Xsan volume in place, with 3 editors each working with several streams of DVCPRO HD 1080i60. The volume started out at around 12TB and was originally built from 2 Xserve G5s, two Qlogic 5200s, and several Xserve RAID chassis, running Xsan 1.4.2. Over the years a Promise RAID array was added, expanding it to around 20TB, still on Xsan 1.4, with the editors reporting no performance issues. A few months ago, they replaced their edit systems with new Mac Pros, added another Promise RAID, and upgraded the SAN volume to Xsan 2.1. The Xserve G5s and the Qlogic 5200 switches are still in place, along with the existing storage. Also, to save money, the old PCIe 2Gb Fibre Channel cards from the G5s were reused in the new Mac Pros. Since the MDCs are Xserve G5s, all the systems are running 10.5.8.
After the upgrade to Xsan 2, they started to see signs of decreased performance, such as audio drift and dropped frames during tape layoff. Restarting their systems would clear the problems for a few hours, but they would reappear the longer the editors continued to work.
A quick run of the AJA speedtest now reports about 65-71MB/sec read/write on the volume. I'm not sure what it ran at prior to the upgrades, but I can tell you that it was definitely not that slow.
While there are plenty of places to start looking, I am wondering whether the fact that the MDCs only have 2GB of RAM in them might be the culprit, owing to the slightly increased system requirements for Xsan 2. If I understand the system requirements correctly (2GB plus 2GB for each SAN volume), they are missing about HALF of what is needed for Xsan 2 to operate properly.
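For anyone checking the arithmetic, here is a minimal sketch of that reading of the requirement. The "2GB plus 2GB per volume" rule is taken as stated in the post and has not been checked against Apple's documentation; the function name and defaults are made up for illustration.

```python
def required_mdc_ram_gb(volume_count, base_gb=2, per_volume_gb=2):
    """RAM an MDC would need under the '2GB plus 2GB per SAN volume' reading."""
    if volume_count < 0:
        raise ValueError("volume_count must be non-negative")
    return base_gb + per_volume_gb * volume_count


installed_gb = 2                      # what each Xserve G5 MDC has today
needed_gb = required_mdc_ram_gb(1)    # this facility hosts a single SAN volume
shortfall_gb = needed_gb - installed_gb

print(needed_gb, shortfall_gb)  # 4 2
```

With one volume the MDCs would need 4GB, so the installed 2GB is half of the requirement under this reading.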
 |
MattG Xsan Master

Joined: 15 Apr 2005 Posts: 456
Posted: Sun Apr 11, 2010 8:16 pm
My suggestion would be to look at those old Fibre Channel cards, and especially at slot placement within the Mac Pro. Even if you could only borrow some later model Apple/LSI 4Gb cards, that would be a start. Those 2Gb cards are losing their fans at an alarming rate right now, which overheats the ASIC and leads to the kind of issues that you're having.
Look at your Qlogic switch port stats for the ports where those 2Gb cards currently come into the switch. Look for any kind of errors. Reset the baseline first, to make sure the errors you see are from the currently connected machines.
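If the switch counters can't be reset, the same baseline idea can be approximated by diffing two snapshots of the per-port counters taken some time apart. This is only a sketch: the counter names (`crc_errors`, `link_failures`), the port numbers, and the dictionary layout are hypothetical and are not actual Qlogic output, so the snapshots would have to be filled in by hand from whatever the switch reports.

```python
def new_errors(baseline, current):
    """Return {port: {counter: delta}} for every counter that grew since baseline."""
    grown = {}
    for port, counters in current.items():
        before = baseline.get(port, {})
        deltas = {}
        for name, value in counters.items():
            delta = value - before.get(name, 0)
            if delta > 0:
                deltas[name] = delta
        if deltas:
            grown[port] = deltas
    return grown


# Hypothetical snapshots: port 4 carries old errors only, port 5 is still accumulating.
baseline = {
    4: {"crc_errors": 12, "link_failures": 1},
    5: {"crc_errors": 0, "link_failures": 0},
}
current = {
    4: {"crc_errors": 12, "link_failures": 1},
    5: {"crc_errors": 37, "link_failures": 0},
}

print(new_errors(baseline, current))  # {5: {'crc_errors': 37}}
```

Only ports whose counters moved between snapshots show up, which separates errors left over from previously connected machines from errors the current cards are producing.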
 |
lucasnap Xsan Master

Joined: 05 Oct 2006 Posts: 107
Posted: Tue Apr 13, 2010 7:03 am
Some extra memory wouldn't hurt...
I was wondering how much space is used on the volume, and whether it was that full before?
 |
fritz. Xsan Master

Joined: 03 Oct 2007 Posts: 177
Posted: Fri Apr 16, 2010 12:16 pm
Thanks for the suggestions. Short of physical inspection, is there any app that would tip me off to a fan failure on the card?
 |
MattG Xsan Master

Joined: 15 Apr 2005 Posts: 456
Posted: Sat Apr 17, 2010 8:49 am
No, because there's no temperature sensor on the card. You can listen for failing fans, because they sometimes make a lot of noise, and then there's always the smell of frying electronics. So now you have three senses' worth of troubleshooting tips.
 |
fritz. Xsan Master

Joined: 03 Oct 2007 Posts: 177
Posted: Tue Apr 20, 2010 1:42 pm
I finally had a chance to spend some quality time with this volume, late Friday night after all the editors went home. There's a rather lengthy list of things that need to be straightened out, mostly thanks to years of different integrators and/or gnomes.
1. Nothing about the fibre channel cards seemed to be burning, hissing, whizzing, or otherwise indicating that they needed to be taken out and put down. Likewise, the switch reported no errors from any of them. I'm still going to go forward with borrowing a newer 4Gb card and seeing how it behaves.
2. All three systems follow a recommended slot installation, which is the fibre channel card in slot 4 and the video capture card in slot 3. There's also a Sonnet FW 400 card in slot 2 for these systems since they occasionally bring in FW decks for field captures, and these Mac Pros don't take kindly to FW 400 to 800 conversion cables. Although... I didn't test the systems with just the Fibre Channel card in place... hmmm.
3. Some prankster messed with the ethernet wiring so that the metadata switch is getting Internet traffic on it!! I've asked their IT department to come clean about the wiring to see what is being routed into it and why that is being done, as there's no excuse for that.
4. The allocation strategy on the SAN volume is currently set to "Balance" rather than "Round Robin". I suspect this was done because the (wait for it) Xserve RAIDs that make up the system are almost at capacity, with the Promise Drives relatively unused. It's high time that the volume goes through a bit of housecleaning at this point.
5. Both Promise drives could be upgraded to the latest firmware. The newest one is one release behind the latest version, and the other one is at the firmware version that fixed it so you didn't need to have the ethernet plugged in to play back DV in real time.
6. I noticed that the LUNs on the Promise storage are configured slightly differently from how the script from Apple sets them up. Normally, with a 2 LUN data configuration, you have a "left" and "right" LUN that each controller primarily deals with (e.g. one LUN is made up of drives 1,2,5,6,9), and then a scratch LUN and 2 spare drives. Both of these Promise systems appear to have been configured manually, with one controller taking the first 7 drives and the second controller working with drives 8-14. The last two, 15 and 16, are the spares. Odd, but not really what I think is causing this issue, since they had one of the Promises in place for about a year before they upgraded to Leopard and Xsan 2 and started seeing performance decrease.
When I cvfsck the volume, it comes back as clean. At this point I'd look into having them go through and simply clean house on the volume, as there is no doubt a tremendous amount of bloat on that thing. I'll keep you guys updated on any other fun things I uncover.
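To illustrate why item 4 matters, here is a toy sketch of the two allocation strategies as I understand them: "Round Robin" hands each new allocation to the next storage pool in turn, while "Balance" hands it to the pool with the most free space. The pool names, free-space figures, and write size are invented for illustration, and the model ignores striping and full pools entirely.

```python
from itertools import cycle


def simulate(strategy, free_tb, writes, size_tb):
    """Return the pool chosen for each write under the given strategy."""
    free = dict(free_tb)             # work on a copy so the caller's figures survive
    rotation = cycle(sorted(free))   # fixed rotation used by round robin
    placed = []
    for _ in range(writes):
        if strategy == "round_robin":
            pool = next(rotation)                     # next pool in turn, regardless of space
        elif strategy == "balance":
            pool = max(sorted(free), key=free.get)    # pool with the most free space
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        free[pool] -= size_tb
        placed.append(pool)
    return placed


# Hypothetical free space: nearly full Xserve RAIDs, mostly empty Promise storage.
free_tb = {"xserve_raid": 1.0, "promise": 10.0}

print(simulate("round_robin", free_tb, 4, 0.5))
# ['promise', 'xserve_raid', 'promise', 'xserve_raid']
print(simulate("balance", free_tb, 4, 0.5))
# ['promise', 'promise', 'promise', 'promise']
```

Under these made-up numbers "Balance" steers every new write to the Promise pool, which is consistent with the guess that it was chosen to keep data off the nearly full Xserve RAIDs.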
 |
snfsguy RAID 5

Joined: 02 Mar 2010 Posts: 17
Posted: Sun Apr 25, 2010 3:07 pm
If you're chasing performance issues, a snfsdefrag -er over the file system can be informative. Save the output to a file on the local file system and then you can grep for files that have a lot of fragments (say 5000 or more). If you have a lot of those, it's time to look at defragmenting, and at preventing fragmentation going forward.
A cvfsck -f to print out your free space chunks will also be informative. If your free space is fragmented, your files are going to be fragmented. Unfortunately, there isn't anything that will defragment free space, aside from moving all the files off of a stripe group (and then back on).
There are lots of other places to look at for performance problems, but let's find out what we're working with on the file system before we go looking everywhere (since "everywhere" is a big place to look).
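As a rough alternative to grep, the saved report could be filtered with a few lines of Python. This is a sketch under one big assumption: that each report line looks like `<path>: <N> extents`. That line format is a guess, not something confirmed against real snfsdefrag output, so the regular expression should be adjusted to whatever the saved file actually contains. The sample paths are made up.

```python
import re

# ASSUMED line format: "<path>: <N> extent(s)". Adjust to the real report.
LINE = re.compile(r"^(?P<path>.+):\s+(?P<count>\d+)\s+extents?\s*$")


def heavily_fragmented(lines, threshold=5000):
    """Return (extent_count, path) pairs at or above threshold, worst first."""
    hits = []
    for line in lines:
        match = LINE.match(line.rstrip("\n"))
        if match and int(match.group("count")) >= threshold:
            hits.append((int(match.group("count")), match.group("path")))
    return sorted(hits, reverse=True)


# Made-up sample lines standing in for the saved report.
sample = [
    "/Volumes/SAN/capture/a.mov: 7321 extents",
    "/Volumes/SAN/capture/b.mov: 12 extents",
    "/Volumes/SAN/capture/c.mov: 1 extent",
    "a line that does not match the assumed format",
]

print(heavily_fragmented(sample))
# [(7321, '/Volumes/SAN/capture/a.mov')]
```

For a real report, pass an open file instead of `sample`, for example `heavily_fragmented(open("report.txt"))`, where `report.txt` is whatever name the output was saved under.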