Xsanity Sanity for Apple's Xsan and Final Cut Server.
  
New XSAN speed issue - help please
Solidus (Xsan Master, joined 28 Jan 2011, 70 posts)
Posted: Tue Jan 10, 2012 11:36 am    Post subject: New XSAN speed issue - help please

Hello,

I have just configured a new 90TB Xsan volume with two MDCs and two clients resharing via AFP, running Mac OS X 10.6.8 and Xsan 2.2.2.

Extended attributes are off, as is Spotlight.

The setup uses four new Xserves, three stacked QLogic 5802 switches and three Active Storage RAIDs.

The SAN is fully operational but I am not getting the speed I think I should be seeing.

If I run a test on an Xsan client using "time mkfile", a 10GB file takes 3 minutes 52 seconds to create. Through AFP the timing is, oddly, better at 2 minutes 30 seconds.
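As a rough sanity check, those timings can be turned into throughput figures. The sketch below assumes the 10GB file was created with mkfile's "10g" size, taken here to mean 10 * 1024^3 bytes; the helper name is hypothetical.

```python
# Convert "time mkfile" timings into throughput figures.
# Assumption: the 10GB file was created as "mkfile 10g ...",
# taken here as 10 * 1024**3 bytes.
FILE_SIZE_BYTES = 10 * 1024**3


def throughput_mb_per_s(size_bytes: int, minutes: int, seconds: int) -> float:
    """Return throughput in decimal megabytes (10**6 bytes) per second."""
    elapsed = minutes * 60 + seconds
    return size_bytes / elapsed / 1e6


xsan_client = throughput_mb_per_s(FILE_SIZE_BYTES, 3, 52)  # 232 s on the Xsan client
afp_reshare = throughput_mb_per_s(FILE_SIZE_BYTES, 2, 30)  # 150 s through AFP
print(f"Xsan client: {xsan_client:.1f} MB/s")  # Xsan client: 46.3 MB/s
print(f"AFP reshare: {afp_reshare:.1f} MB/s")  # AFP reshare: 71.6 MB/s
```

On that assumption the client managed roughly 46 MB/s and the AFP path roughly 72 MB/s (about 44 and 68 MiB/s), both far short of 500 MB/s.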

I cannot see anything odd in the logs; the RAIDs look fine and nothing appears out of order.

DNS is working, with forward and full reverse lookups on both the main and the metadata networks. Ping times between clients are approximately 0.4 ms to either the metadata or the main addresses.

I have the SAN configured as follows:

RAID 1 - Metadata and journal on a RAID 1 LUN, plus two RAID 5 LUNs
RAID 2 - Two RAID 5 LUNs
RAID 3 - Two RAID 5 LUNs

These are split into three pools: the two LUNs from the first RAID form pool 1, and the second and third RAIDs each form their own pool.

I have the volume set to the Balance allocation strategy with a 16K block size, using the Xsan configuration for standard file storage.

What sort of speed should I expect to get from the SAN? Am I seeing the correct speed?

I was expecting something around 500MB/s, not the 4MB/s I seem to be seeing.

What is a suitable way to test this? Is "time mkfile" adequate?

If anybody can help or point me in the right direction, please do. This came to light because an application becomes unresponsive while reading data, which I believe is because it is getting the data too slowly.

Thank you in advance!

Solidus (Xsan Master, joined 28 Jan 2011, 70 posts)
Posted: Wed Jan 11, 2012 10:50 am

I have run the AJA System Test and am getting results along the lines of 150MB/s read and only 60MB/s write.

Does anyone have any ideas on this at all please?

abstractrude (Xsan Master, joined 13 Mar 2008, 860 posts)
Posted: Wed Jan 11, 2012 3:57 pm

Build one data pool with your four RAID 5 LUNs
(7-disk LUNs)

Build one metadata pool on RAID 1
(2-disk LUN)

Block size is fine. Set your stripe breadth to 64.

Unfortunately you have a bunch of wasted storage here if you want to keep things symmetrical. What bandwidth do you need to hit? You're really missing a fourth RAID here.
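To put the stripe breadth suggestion in numbers: assuming stripe breadth is counted in file system blocks (as Xsan Admin presents it), the amount written to one LUN before moving to the next is breadth times block size, and one full stripe across a pool is that times the number of LUNs. The sketch below compares a breadth of 32 with the suggested 64 for a 16K block size and a four-LUN pool; the function name is hypothetical.

```python
KIB = 1024


def stripe_sizes(block_size: int, stripe_breadth: int, luns: int) -> tuple[int, int]:
    """Return (bytes written to one LUN before moving on, bytes in one full stripe).

    Assumes stripe breadth is counted in file system blocks.
    """
    per_lun = block_size * stripe_breadth
    return per_lun, per_lun * luns


# 16K block size and a four-LUN data pool, as suggested above.
for breadth in (32, 64):
    per_lun, full_stripe = stripe_sizes(16 * KIB, breadth, luns=4)
    print(f"breadth {breadth}: {per_lun // KIB} KiB per LUN, "
          f"{full_stripe // KIB} KiB per full stripe")
# breadth 32: 512 KiB per LUN, 2048 KiB per full stripe
# breadth 64: 1024 KiB per LUN, 4096 KiB per full stripe
```

Whether a 1 MiB chunk per LUN suits these particular RAIDs depends on their own stripe or segment size, which the thread does not give.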

HeinerLesaar (fully protected, joined 10 Dec 2011, 11 posts)
Posted: Wed Jan 11, 2012 6:11 pm

Depending on what file types you want to store and how many machines are accessing your SAN, the configuration could be right or wrong.

As abstractrude suggested, building one storage pool with more of your LUNs is generally a good way to get better performance, but your figures suggest that something more serious is wrong.

Even though keeping your storage pool symmetrical is good practice, it wouldn't hurt much to build one pool with 6 data LUNs instead of 4 or 8. But before you start thinking about this, run another test to check your performance: "dd if=/dev/zero of=/Volumes/*name-of-your-filesystem*/test.dd bs=512k count=2000"
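For anyone who prefers a script, the sketch below approximates that dd run in Python: 2000 writes of a 512 KiB buffer of zeros, reporting bytes per second as dd does. One deliberate difference: it calls os.fsync before stopping the clock, so data still sitting in the client's cache is counted, which the dd command above does not request. The function name is hypothetical.

```python
import os
import time


def zero_write_test(path: str, block_size: int = 512 * 1024, count: int = 2000) -> float:
    """Write `count` blocks of zeros to `path` and return bytes per second.

    Roughly mirrors: dd if=/dev/zero of=PATH bs=512k count=2000
    The final fsync pushes cached data out to storage before timing stops.
    """
    block = bytes(block_size)  # zero-filled buffer, standing in for /dev/zero
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return block_size * count / elapsed
```

Call it with a path on the volume, for example zero_write_test("/Volumes/name-of-your-filesystem/test.dd"), and delete the test file afterwards.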

This will create a file on your SAN, and depending on how long it takes, we can see how bad your problem really is. Keep in mind that these values are far from real-life values, especially if you are working with special file types like DPX; there are loads of things to consider to get a good real-world benchmark.

You should also consider FC-related problems. Did you check the statistics counters on the QL5802s?
Lots of "other IO" or decode errors on the ports? I/O StreamGuard settings? Fixed port speeds, or ports configured as F-type?

Hit me up if you need any further help; if you like, you can drop me an email.

Solidus (Xsan Master, joined 28 Jan 2011, 70 posts)
Posted: Mon Jan 16, 2012 11:33 am

Thank you for your reply; it really is appreciated. Sorry for the delay in getting back to you. I currently have three pools: two identical pools of 3 x 8-disk RAID 5 LUNs, and one pool of 2 x 6-disk RAID 5 LUNs, all with a stripe breadth of 32.

The SAN is currently populated, so making changes is going to prove difficult. Can the stripe breadth be changed, or is that a destructive change?

What do you mean it will be wasted? Currently an affinity is set on the smaller pool so that it is used for office documents and small files. The rest of the SAN is for images ranging from 40-100MB, with retouched images up to a few GB in size.

Performance is slow when using Phase One's Capture One application. I really need to match the speed and performance you would get from an AFP server using DAS.

Solidus (Xsan Master, joined 28 Jan 2011, 70 posts)
Posted: Mon Jan 16, 2012 11:45 am

Hello,

Thanks for the reply. I thought this would be a suitable configuration and ran it past quite a few people before implementing it. The SAN currently has approximately 30 machines on it, but only about 10 really move data to it; the majority use small documents.

Would one pool with more LUNs increase speed? I always thought more pools increased speed.

This is what the test returned:

2000+0 records in
2000+0 records out
1048576000 bytes transferred in 8.965599 secs (116955487 bytes/sec)

That is approximately three times slower than DAS.
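The dd summary line above can be converted into more familiar units. The sketch below parses the BSD/macOS dd summary format shown above and reports decimal MB/s and binary MiB/s; the regular expression and function name are illustrative assumptions, and GNU dd prints a different format that this will not match.

```python
import re

# Matches the BSD/macOS dd summary, e.g.
# "1048576000 bytes transferred in 8.965599 secs (116955487 bytes/sec)"
DD_SUMMARY = re.compile(r"(\d+) bytes transferred in ([\d.]+) secs")


def dd_rate(summary: str) -> tuple[float, float]:
    """Return (MB/s, MiB/s) computed from a BSD dd summary line."""
    match = DD_SUMMARY.search(summary)
    if match is None:
        raise ValueError("not a BSD dd summary line")
    nbytes, secs = int(match.group(1)), float(match.group(2))
    rate = nbytes / secs  # bytes per second
    return rate / 1e6, rate / 2**20


line = "1048576000 bytes transferred in 8.965599 secs (116955487 bytes/sec)"
mb, mib = dd_rate(line)
print(f"{mb:.1f} MB/s ({mib:.1f} MiB/s)")  # 117.0 MB/s (111.5 MiB/s)
```

At roughly 117 MB/s, a DAS result about three times faster would be somewhere near 350 MB/s; that is an inference from the stated ratio, not a measurement.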

As I said in my earlier post, people are seeing issues when browsing the file system with Capture One. They open directories containing anywhere from 20 to 150 images and see very slow image refresh and poor responsiveness, something they did not see when using DAS on the Xserves.

Capture One stores a small file in each directory recording the changes made to the images, and this needs to be loaded along with the images themselves. I don't know if that helps you understand the issue.

The SAN is in full use now; however, I am resigned to the fact that I may need to destroy the volume should the settings I have chosen be the wrong ones, though restoring 50TB will obviously be a nightmare.

I am checking the fibre logs of the three switches now. I have set them up according to this document http://www.xsanity.com/article.php?story=20060312090411100&query=zoning

I have three switches, all linked by 10Gb connections. Every node has one connection to switch A and one to switch B for redundancy, while switch C will be used for additional clients.

It is also worth mentioning that a second volume is planned as a video server for Final Cut Pro.

Thank you, any insight or help you can offer is appreciated.

HeinerLesaar (fully protected, joined 10 Dec 2011, 11 posts)
Posted: Mon Jan 16, 2012 12:03 pm

More LUNs in a single pool generally means higher maximum performance. But depending on how your clients access the data (the mix of parallel read and write processes, etc.), it can be smarter to split your file system into more than one storage pool. Things like fragmentation also factor into those considerations; it's a very complex topic.

The fact is that each read/write process from a single client only ever gets the maximum performance of one storage pool, because each request is steered to a single pool (by default in a round-robin fashion).
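That point can be illustrated with a deliberately simplified model in which each new file is assigned whole to exactly one storage pool, either in turn (round robin) or to the pool with the most free space (Balance). Pool names and free-space figures are made up, and the real Xsan allocator considers more than this (affinities, for example).

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    free_gb: int  # illustrative units only


def place_files(pools: list[Pool], sizes_gb: list[int], strategy: str) -> list[str]:
    """Assign each new file to exactly one pool; return the chosen pool names.

    Simplified model, not Xsan's actual allocator:
      round_robin - pools are used in turn
      balance     - the pool with the most free space is chosen
    """
    placements = []
    for i, size in enumerate(sizes_gb):
        if strategy == "round_robin":
            pool = pools[i % len(pools)]
        elif strategy == "balance":
            pool = max(pools, key=lambda p: p.free_gb)  # first pool wins a tie
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        pool.free_gb -= size  # the whole file lands in this one pool
        placements.append(pool.name)
    return placements


def fresh_pools() -> list[Pool]:
    # Hypothetical free-space figures, for illustration only.
    return [Pool("pool_a", 100), Pool("pool_b", 80)]


print(place_files(fresh_pools(), [10, 10, 10], "round_robin"))  # ['pool_a', 'pool_b', 'pool_a']
print(place_files(fresh_pools(), [10, 10, 10], "balance"))      # ['pool_a', 'pool_a', 'pool_a']
```

Under either strategy every file sits entirely in one pool, so a single stream is limited to the LUNs of that pool.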

In short:

Stripe breadth can NOT be changed without destroying the volume.


Have you made sure the metadata network is up and running properly? Fixed speeds or auto-negotiation?

When browsing a single directory and listing its contents takes a very long time, it usually comes down to slow metadata operations (which can be disk- and/or network-related).
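One way to put a number on metadata performance is to time a directory listing plus one attribute lookup per entry, similar in spirit to a file browser opening a folder. This is a sketch under assumptions: the function name is hypothetical, and comparing against the same folder copied to local disk is only a rough guide.

```python
import os
import time


def time_directory_scan(path: str) -> tuple[int, float]:
    """List `path`, stat every entry, and return (entry count, seconds taken).

    On an Xsan volume, listings and attribute lookups are metadata
    operations, so a slow result here points at metadata (disk or
    network) rather than bulk data throughput. Run it twice: client-side
    caching can make the second pass much faster.
    """
    start = time.perf_counter()
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            entry.stat(follow_symlinks=False)  # one attribute lookup per entry
            count += 1
    return count, time.perf_counter() - start
```

Point it at a folder of images on the volume, for example time_directory_scan("/Volumes/YourVolume/SomeFolder") (a placeholder path), and compare with a local copy of the same folder.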

Solidus (Xsan Master, joined 28 Jan 2011, 70 posts)
Posted: Mon Jan 16, 2012 12:57 pm

I am seeing fewer than 100 decode errors on the fibre switch ports for the servers; the target ports show 4 or 5, a few in the 400s and two around the 1000 mark.

I have the fibre switch ports set to GL, and they have assigned F to the servers and FL to the RAIDs. The speed is set to auto and has negotiated 4G. Isn't FL correct?

HeinerLesaar (fully protected, joined 10 Dec 2011, 11 posts)
Posted: Mon Jan 16, 2012 1:02 pm

Depending on the RAID, it's correct. As I said, check the metadata performance as well. In general the dd test should have shown better results anyway, but let's look at that too.

Oh, and check your PM.

Solidus (Xsan Master, joined 28 Jan 2011, 70 posts)
Posted: Mon Jan 16, 2012 1:09 pm

I see. I currently have it set to Balance, so it should spread data between the two different pools it can use.

The metadata network all looks okay: 0.4 ms ping and fully resolving DNS for everything on the network.

It sits on its own gigabit switch, with IPv6 off and the speed set to auto, which has negotiated 1000 Mbit/s. For completeness, both AFP heads have 6-port Ethernet cards with link aggregation (LAG) for file-sharing traffic, but I don't see how that would slow down access to the BMDC, and I have tried one Xserve without them.

In my opinion the file system browses quickly in the Finder, so I don't think browsing is slow, and AFP transfers don't strike me as slow either.

It is only the raw test and Capture One's slowness (users complaining) that highlighted an issue.

Are there any other tests I can perform on the metadata network? I don't feel that is the issue, though. DNS is also fast and responsive.

Bad news about the need to destroy the volume to alter the stripe breadth.

Am I right in thinking I should be able to achieve much better speeds on two pools with two LUNs each than the pathetic speeds I was getting before?

HeinerLesaar (fully protected, joined 10 Dec 2011, 11 posts)
Posted: Mon Jan 16, 2012 1:22 pm

Balance only affects how data is assigned to the storage pools.
It does not affect how data is assigned to each LUN inside a storage pool. My guess is that you will have to consider a complete rebuild of the volume.
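Inside a pool, by contrast, a file's data is striped across that pool's LUNs in chunks of stripe breadth times block size. The sketch below maps a byte range to the LUNs it touches under that simple model; it assumes stripe breadth is counted in blocks and that striping starts at LUN 0, which the real file system need not do. The function name is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def luns_touched(offset: int, length: int, block_size: int,
                 stripe_breadth: int, luns: int) -> list[int]:
    """Return the LUN indices touched by `length` bytes starting at `offset`.

    Simplified model: the file is striped round-robin across the pool's
    LUNs in chunks of block_size * stripe_breadth bytes, starting at LUN 0.
    """
    chunk = block_size * stripe_breadth
    first = offset // chunk
    last = (offset + length - 1) // chunk
    return sorted({c % luns for c in range(first, last + 1)})


# 16K blocks, stripe breadth 32, four LUNs in one pool.
print(luns_touched(0, 256 * KIB, 16 * KIB, 32, 4))        # [0]
print(luns_touched(512 * KIB, 1 * MIB, 16 * KIB, 32, 4))  # [1, 2]
print(luns_touched(0, 100 * MIB, 16 * KIB, 32, 4))        # [0, 1, 2, 3]
```

Small requests land on a single LUN, while a large image read spreads across every LUN in its pool; Balance only decides which pool that is.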

Maybe have a look at the cache settings on the RAIDs, though they shouldn't affect performance that much. Did you try the dd test from different machines? Were they all approximately the same speed?

Solidus (Xsan Master, joined 28 Jan 2011, 70 posts)
Posted: Mon Jan 16, 2012 2:18 pm

Yes, they all came back at 8.6 seconds. What would you expect to see from a healthy SAN?

I can't see why this SAN has problems; I set up the box to use the general file storage configuration.

If I do have to rebuild the volume what would I change to improve the speed?

I can't yet see where my volume configuration is so wrong as to cause this.

Solidus (Xsan Master, joined 28 Jan 2011, 70 posts)
Posted: Mon Jan 16, 2012 6:19 pm

I am still working through this, as it is becoming more and more problematic for end users.

I am looking at the RAID firmware to see if there are any known issues on the earlier boxes (1 x 16TB Active and 2 x 48TB Active).

I cannot see any problems with the fibre network and am on a stable release for the switches.

Can anyone tell me if they think something is wrong with how I have set this SAN up? Does anyone have a test environment where they can replicate the pools and config to test for speed?

HeinerLesaar (fully protected, joined 10 Dec 2011, 11 posts)
Posted: Tue Jan 17, 2012 2:42 am

From three of those boxes you should expect something around 1200-1500MB/sec in total, not 115MB/sec.

I guess you will have to kill the volume and start again in small steps. I agree that there isn't anything completely wrong with your config; it may not be the best way to do it, but it is not as bad as your performance figures.

If you are able to kill the volume:
- use one of the RAID 5 LUNs as DAS and measure the performance
- build an Xsan volume with metadata and journaling on the same LUN and only two RAID 5 LUNs coming from one box, then measure again
- and so on

The idea is to get a feel for the raw performance of your components and to see at which point the slow figures appear.
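To keep that step-by-step approach organized, one option is to take the single-LUN DAS result as a baseline and express each later step as a fraction of ideal linear scaling. The sketch below does only that bookkeeping; the function names are hypothetical, real systems rarely scale perfectly, and the 0.7 floor is an arbitrary assumption rather than an Xsan guideline.

```python
def scaling_efficiency(baseline_mb_s: float, luns: int, measured_mb_s: float) -> float:
    """Measured throughput as a fraction of ideal linear scaling from one LUN."""
    return measured_mb_s / (baseline_mb_s * luns)


def report(baseline_mb_s: float, steps: list[tuple[str, int, float]],
           floor: float = 0.7) -> list[str]:
    """Summarise each test step and mark those below `floor` of linear scaling.

    baseline_mb_s: throughput of a single RAID 5 LUN measured as DAS.
    steps: (label, number of data LUNs, measured MB/s) for each later test.
    """
    lines = []
    for label, luns, measured in steps:
        eff = scaling_efficiency(baseline_mb_s, luns, measured)
        flag = "  <-- investigate this step" if eff < floor else ""
        lines.append(f"{label}: {measured:.0f} MB/s, {eff:.0%} of linear{flag}")
    return lines
```

Fill the steps list with your own measurements, for example one entry for the two-LUN Xsan volume built from a single box, and print the returned lines.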

Solidus (Xsan Master, joined 28 Jan 2011, 70 posts)
Posted: Tue Jan 17, 2012 6:23 am

Right, so I am well out of the ballpark on the speed figures then!

I have to treat killing the volume as a last-ditch resort, at least until I can be as sure as possible that I know what to change to fix the issue.

I see where you are coming from and agree.

I have two 32TB Active boxes and a late Apple Xserve RAID that I can set up as a test volume, using the same MDC and BMDC. So I am thinking of setting them up in the same manner, with the same settings, and seeing what I get from that volume.

I can then destroy that volume, change the stripe breadth to 64 and see if that makes any difference. Granted, it is not an identical setup, but it is very close.

It might shed some light if the volume I create with the same setup is fast. It will also let me experiment with the config in a safer manner.

If you had the following hardware available (three Actives: one 16TB and two 48TB), what configuration would you choose to achieve speeds above normal DAS via a reshare?

It's a long shot, but the stacked fibre switches aren't going to slow performance this much, are they?
Page 1 of 2. All times are GMT - 5 Hours.

 