Xsanity Sanity for Apple's Xsan and Final Cut Server.
  
Wednesday, September 08 2010 @ 09:27 PM EDT

Random Disk I/O Errors

 
Skaffen
Knows DNS is the answer


Joined: 03 Nov 2006
Posts: 31

Posted: Sat Feb 06, 2010 5:26 am    Post subject: Random Disk I/O Errors

Hi all,

Has anyone seen problems with 10.6.2 and Xsan, specifically seemingly random disk I/O errors? The setup is a Promise VTrak E-Class/J-Class with a QLogic 5602 switch. Neither the switch nor the Promise shows any errors in its event log, but every now and then the Xserve reports a "disk0 i/o error", with the affected disk varying among the Promise LUNs.

There are no other errors in the Console at the time of these disk I/O errors; they just appear on their own. With no errors on the switch or RAID, it doesn't look as if the fabric itself is dropping at any point.

Thanks.

EDIT: Just realised, the only thing I changed this week that may have set these off was turning Device Scan off on the initiator ports. I'd left it on for all ports, but after checking Apple's recommendations I turned it off. On further checking I think I was probably right the first time to leave it on, although I can't see how having it off could actually cause this kind of issue.

The other thing is: is it worth doing a hard reset on the switch in case it has an issue? I'm clutching at straws a bit here, but all the usual suspects check out fine.
Skaffen
Knows DNS is the answer


Joined: 03 Nov 2006
Posts: 31

Posted: Sun Feb 07, 2010 5:17 pm

Well, I turned Device Scan back on for the initiator (Xserve) ports and haven't had an I/O error for two days now. I'm not saying that's fixed it, and I should be able to confirm tomorrow, but it's looking like an odd coincidence.

How on earth would that cause a problem? I/O StreamGuard is set manually, so it's not as if it's set to auto and misbehaving with Device Scan turned off.
Skaffen
Knows DNS is the answer


Joined: 03 Nov 2006
Posts: 31

Posted: Mon Feb 08, 2010 10:46 am

Sadly, they've come back; I'm getting these again:

Code:
I/O path in error [hostid: 1 lun: 1 path </dev/rdisk6>]
quiesced for 300 seconds
disk6: I/O error.


I haven't come across anyone reporting similar issues; does anyone here have any ideas? I'm starting to think it must be a problem with the switch, as both the RAID and the Xserve were previously configured as DAS without any problems. I feel like I'm clutching at straws a little, though!
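To see whether the errors cluster on particular LUNs before blaming the switch, it can help to tally the messages per device. The sketch below assumes the log lines look exactly like the ones quoted in this thread (`disk6: I/O error.`, `disk0 i/o error`, and the `I/O path in error [...]` line); the regexes and the helper name `tally_io_errors` are hypothetical, not part of Xsan or Mac OS X.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Regexes modelled on the messages quoted in this thread (assumed format):
#   "disk6: I/O error."   and   "disk0 i/o error"
#   "I/O path in error [hostid: 1 lun: 1 path </dev/rdisk6>]"
DISK_ERROR = re.compile(r"\b(disk\d+):? I/O error", re.IGNORECASE)
PATH_ERROR = re.compile(
    r"I/O path in error \[hostid: (\d+) lun: (\d+) path </dev/r?(disk\d+)>\]"
)


def tally_io_errors(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Hypothetical helper: count I/O error messages per device.

    Returns (disk_errors, path_errors): disk_errors is keyed by device
    name, path_errors by (hostid, lun, device).
    """
    disk_errors: Counter = Counter()
    path_errors: Counter = Counter()
    for line in lines:
        m = DISK_ERROR.search(line)
        if m:
            disk_errors[m.group(1).lower()] += 1
        m = PATH_ERROR.search(line)
        if m:
            hostid, lun, device = m.groups()
            path_errors[(int(hostid), int(lun), device)] += 1
    return disk_errors, path_errors


if __name__ == "__main__":
    sample = [
        "I/O path in error [hostid: 1 lun: 1 path </dev/rdisk6>]",
        "quiesced for 300 seconds",
        "disk6: I/O error.",
    ]
    disks, paths = tally_io_errors(sample)
    print(disks.most_common())  # [('disk6', 1)]
    print(paths.most_common())  # [((1, 1, 'disk6'), 1)]
```

On a 10.6 machine these messages would normally end up in `/var/log/system.log`, so passing an open handle on that file is one way to feed it; adjust the patterns if your log lines differ from the ones quoted here.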
lotte
Xsan Master


Joined: 11 Dec 2008
Posts: 168

Posted: Mon Feb 08, 2010 3:10 pm

You may want to check the FC cabling and/or the GBICs. Also have a look at the error counters for the LUN's switch port in the Stats tab.

Does this appear on several Clients or just one?

Lotte
Skaffen
Knows DNS is the answer


Joined: 03 Nov 2006
Posts: 31

Posted: Mon Feb 08, 2010 4:03 pm

The cabling has been changed recently from Apple SFP copper cabling to Finisar transceivers, and the errors have remained the same, so I'm pretty sure it's not that.

There are a few errors listed in the port stats: none for AL Init Errors or Flow Errors, but a few Decode Errors. Is there an acceptable level of Decode Errors, or should there basically be none?
Skaffen
Knows DNS is the answer


Joined: 03 Nov 2006
Posts: 31

Posted: Wed Feb 10, 2010 1:29 pm

It looks like one of the controllers in the VTrak could be failing silently. After the last round of problems I noticed that all the affected LUNs had their affinity set to Controller 2. I changed the affinity of all LUNs to Controller 1 and it hasn't cropped up since. It's a little annoying that there's absolutely nothing in the logs on the Promise, but fingers crossed this has actually isolated the issue.
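The correlation described above (every affected LUN had its affinity on Controller 2) can be checked mechanically once you have per-device error counts and a note of each LUN's preferred controller. In this sketch both inputs are hypothetical: the device-to-controller mapping would have to be written down by hand from the VTrak's LUN affinity settings, since nothing here queries the array, and `errors_by_controller` is an illustrative name only.

```python
from collections import Counter
from typing import Dict, Mapping, Union

Controller = Union[int, str]


def errors_by_controller(
    error_counts: Mapping[str, int],
    affinity: Mapping[str, int],
) -> Dict[Controller, int]:
    """Hypothetical helper: total I/O errors per preferred controller.

    error_counts maps a device (e.g. "disk6") to its error count;
    affinity maps the same device to the controller its LUN prefers,
    as noted by hand from the array's LUN settings. Devices with no
    known affinity are grouped under "unknown".
    """
    totals: Counter = Counter()
    for device, count in error_counts.items():
        totals[affinity.get(device, "unknown")] += count
    return dict(totals)


if __name__ == "__main__":
    # Made-up example data, not taken from the thread.
    errors = {"disk4": 3, "disk6": 5}
    affinity = {"disk4": 2, "disk5": 1, "disk6": 2}
    print(errors_by_controller(errors, affinity))  # {2: 8}
```

A total concentrated on one controller is consistent with a controller-side fault but does not prove it; moving the affinity, as done here, and watching whether the errors stop is the more direct test.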
lotte
Xsan Master


Joined: 11 Dec 2008
Posts: 168

Posted: Thu Feb 11, 2010 5:12 pm

The latest firmware should have a fix for that. Also make sure both controllers are connected via Ethernet.


Lotte
Skaffen
Knows DNS is the answer


Joined: 03 Nov 2006
Posts: 31

Posted: Thu Feb 11, 2010 5:26 pm

Hi Lotte,

Both controllers are running the latest firmware (10.06.2270.00).

I'm not sure whether the second controller has Ethernet connected, but the config hasn't changed in four months or so, and this problem has only cropped up over the past two weeks. Why would an unplugged management port on that controller cause I/O problems?

Thanks.
vicpache
partially protected


Joined: 22 Oct 2008
Posts: 8

Posted: Fri Feb 12, 2010 12:36 am    Post subject: Random Disk I/O Errors

Skaffen,

WebPAM PROe on the VTrak E-Class lets you save a Subsystem report.
This report contains a wealth of information that will help Promise Support staff understand the problem.

From the report we can check Fibre Channel front-end port statistics, internal component status, and error counters for just about every component.
It also allows us to see which initiators are connected to which target port, and to match the LUN reporting the I/O error to the corresponding LUN on the storage (very useful for large SAN deployments).
It offers much more besides.

Please email me the following:

- The saved Subsystem report (instructions: http://kb.promise.com/KnowledgebaseArticle10095.aspx?Keywords=save+subsystem)
- System Profiler reports from the primary and secondary MDCs

See my contact information below. I am confident we will get this straightened out.
--
Best Regards,


Victor Pacheco
Manager, Field Application Engineering & Support
Promise Technology
580 Cottonwood Drive
Milpitas, CA 95035
Office (408) 228-1441
Mobile (408) 202-6808
Technical Support - (408) 228-1400
http://www.promise.com
Skaffen
Knows DNS is the answer


Joined: 03 Nov 2006
Posts: 31

Posted: Fri Feb 12, 2010 2:21 pm

Thanks Victor, I've sent you a PM. I'm not sure you can help directly, as we're UK-based, but if you think you can, PM me back with an email address and I'll send the details over.
All times are GMT - 5 Hours

 