Integration Issue

messenger82's picture

We have an Xserve RAID we would like to take out of mothballs to provide an extra backup volume, but we're stuck on this relatively straightforward project.

We cannot see any of the Xserve RAID LUNs in Xsan Admin. Our systems are Xsan 2.2, OS X 10.6.5, and QLogic 5600 switches. RAID Admin gives me green lights across the board. All of the drives have been formatted into the three LUNs we want to create, and formatting has completed. The 5600s show that the target is connected; however, we can't see any of the LUNs in Xsan Admin on either the lower or the upper controller.

Any thoughts? We have nothing at all in the console. We mothballed it when we moved from Xsan 1.4 to 2.0; is the Xserve RAID even compatible with versions later than 2.0 (I can't see why it wouldn't be)?

Any help getting me unstuck is appreciated.

singlemalt's picture

Do the devices show up in Disk Utility? If not, check for LUN masking in RAID Admin. If they do show up, whether they look formatted or not, I would suspect they were formatted at some point in the past. It used to be you could just re-partition and choose "Free Space" as the format, but since 10.5.6 (I think) Disk Utility won't overwrite the partition map.
See http://support.apple.com/kb/TS1370 for the dd command to zero out the partition map, then reboot.
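If it comes to that, the zeroing step is roughly along these lines (disk3 is only a placeholder; check diskutil list and the KB article first, because dd will happily wipe whichever disk you point it at):

diskutil list                                        # find the Xserve RAID LUN, e.g. /dev/disk3
diskutil unmountDisk /dev/disk3                      # make sure nothing on it is mounted
sudo dd if=/dev/zero of=/dev/disk3 bs=512 count=1    # zero out the partition map (destructive)

Then reboot so the system re-reads the now-blank partition map.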
Other than that I would suspect they're not in the right FC zone(s) on the switch.

messenger82's picture

No LUN visibility in Disk Utility (I should have put that in the initial post). Also, no LUN masking is set up. As a check, I plugged the storage in as DAS on one of my sandbox machines and the storage showed up.

The zoning is an interesting idea to look at. We use the default zoning with the 5600s, which, if memory serves, puts everything into a default orphan set. The device does detect as a target, should be in the orphan set, and has worked that way in the past. Do you think there is something else with zoning I'm missing?

brianwells's picture

I've always defined zones as described in this article and have never had issues with Fibre Channel devices not showing up:

[url=http://www.xsanity.com/article.php/20060312090411100]Bulletproof Zoning on Xsan[/url]

If you have zones with more than 64 devices, you will need to switch from hard zones to soft zones, although I have not yet had to do this:

[url=http://www.xsanity.com/article.php/20080816121818740]Necessary Zoning Changes with the QLogic 9000 Series FC Switch[/url]

singlemalt's picture

If both the target and the initiator are in the orphaned zone, then they should see each other (although if you had to put the initiator in a zone, you usually have to reboot before it sees anything). Since the unit was mothballed some time ago, maybe there's an old zone on the switch, no longer in use, that it was a part of. You hooked it back up and it got put back in that old zone again.

Are the RAID controllers set to Point to Point? Sometimes if they decide to go Arbitrated Loop because the other end is set to auto, that can cause them to not show up reliably.
It's good that it does show up as DAS on a different system; that pretty much narrows it down to something in the FC network rather than stuffed controller(s).

singlemalt's picture

Something else I remembered (it's been a while since I worked with the Xserve RAID): if you used RAID Admin to create arrays, or to destroy and then recreate them, the computer you ran RAID Admin on wouldn't see the new devices until you rebooted.

Something else I remember now. One of the AppleCare guys (Tony something) showed me a trick for when the controllers were exhibiting very odd behavior, where sometimes they were seen and sometimes not. IIRC the procedure was:
Put the RAID in standby mode.
Pull the plugs on both power supplies.
Pull the cache backup batteries (if you have them) out just enough to disengage them.
Have two paper clips ready to reset the controllers, i.e., straighten out one end of each.
Plug in one power supply and, as quickly as you can, reset both controllers at the same time.
Count to 10, then release the paper clips.
If the power supply is blinking VERY rapidly, pull the plug and do it again.
At this point both controllers have been hard reset to factory defaults, so the name and passwords have been reset, and if you assigned static IP addresses, they are back to DHCP.
Reconnect in RAID Admin and take the unit back out of standby mode.
Once the LUNs show back up in Disk Utility (and/or Xsan Admin), re-engage the backup batteries.

messenger82's picture

Singlemalt, thanks for all of the help on this.

I've run a hot reset on the switches and I do now get the LUNs, sort of. I still have problems with those LUNs, though. Xsan Admin sees the LUNs and allows me to label them, but Disk Utility does not see them. My log is being useless and not reporting anything noteworthy. Xsan Admin sees the LUNs as problematic (I have the yellow exclamation point over the icon). Xsan Admin also tells me that the LUNs are not visible to all metadata controllers (which makes sense since Disk Utility doesn't see them). So there's been progress, but I'm still stuck in the mud, just further down the road.

To recap:
Xsan 2.2, OS X 10.6.5, QLogic 5600 switches
No zoning on our switches, default "orphan zone" only; confirmed no zoning issues
Switch sees the Fibre Channel connection to the storage as a target and reports no errors
Xsan Admin sees the LUNs
Xsan Admin allows me to label the LUNs
Xsan Admin indicates a problem with the LUNs
Xsan Admin notes that the controllers cannot see the LUNs
Log file is useless
Disk Utility doesn't see the LUNs
I can confirm that the Xserve RAID sees good fibre links and no LUN masking is set

Any other suggestions?

nrausch's picture

I assume that all the other, newer storage chassis and computers are running 4Gb Fibre Channel cards to your 5600.

Is it possible that, because the Xserve RAID is 2Gb only, the ports on the switch need to be hard-coded to 2Gb rather than auto? It sounds like something intermittent may be going on, and I wonder if speed auto-negotiation plays a part...

singlemalt's picture

In addition to what nrausch said...
Hmmm... well, some system(s) are seeing the LUNs.
Are you sure the RAID controllers (in RAID Admin) are listed as Point to Point? The topology can be Automatic (PtP) or manually set to PtP, just not Arbitrated Loop (whether set manually or negotiated automatically). AL could cause only some, or no, systems to see the LUNs.

It would also probably be instructive to open Disk Utility on each MDC and client and see exactly who does and doesn't see the LUNs, or ssh into them and run diskutil list.
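Something like this would gather it quickly (mdc1, mdc2, and client1 are just placeholder hostnames, and admin a placeholder account; substitute your own):

for host in mdc1 mdc2 client1; do
  echo "== $host =="               # label each machine's output
  ssh admin@$host diskutil list    # list every disk that machine can see
done

Any machine whose output is missing the Xserve RAID LUNs is the one to dig into.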
Might be a good idea to bring the whole thing down and then back up again as per
http://support.apple.com/kb/HT4027
You'll probably also want to make sure that I/O Stream Guard is disabled on the storage ports and enabled on the initiator ports, and that the port type is GL while you're at it:
http://support.apple.com/kb/HT1084
Typically the ports should show as configured type GL, running type F.

You mentioned resetting the switch, but did you also hard reset the Xserve RAID controllers? If you're going to follow http://support.apple.com/kb/HT4027 to bring it all down and back up, you may as well hard reset them if you haven't already. Just remember that they'll go back to using DHCP, so you may need to connect to them directly and set the system you're running RAID Admin on to DHCP as well.

If I think of anything else I'll post it.

messenger82's picture

Almost there!

I/O Stream Guard was the issue; I must have overlooked it. One problem remains.

I am able to build volumes now and otherwise access the LUNs. However, one of the LUNs is showing a weird issue I haven't seen before: next to the LUN capacity there is red text with a strikethrough line through it. Again, all logs on both controllers and the Xserve RAID indicate no issue. I've built a volume with the LUNs and I can execute R/W operations on it without issue and with decent performance (given the equipment). Does anyone know exactly what the red struck-out text indicates, or better yet, how to fix it?

[img]http://uncwtv.uncw.edu/images/2011-01-04.png[/img]

matx's picture

messenger82 wrote:
Does anyone know exactly what the red struck-out text indicates, or better yet, how to fix it?

Usually the red strikethrough text indicates lost storage due to mismatched LUN sizes in the same storage pool.
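With hypothetical numbers: put a 3.03 TB LUN and a 3.02 TB LUN in the same storage pool, and Xsan stripes only up to the smaller size, so you get roughly 2 x 3.02 = 6.04 TB of usable space, and the leftover 0.01 TB on the larger LUN is what shows up in red with the strikethrough.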

But in this case it appears to be saying that 3 TB of space is being wasted by a 3 TB LUN!

messenger82's picture

I thought that might be it, but the strange thing is the strikeout appeared on the LUN even before I built the volume. What's more, the volume does correctly show the size as 6 TB. I more or less just built the volume to see if it would let me, figuring it would self-destruct in the process but give me an error indicating the true problem. I'm hesitant to use the volume to store data, even backup data, until I figure it out.

singlemalt's picture

oops!! posted the following before I saw your latest, messenger82.
Oh well. I'm going to leave it up in case somebody else runs into it.

Glad to hear you tracked it down!

Run sudo cvlabel -c > mylabels.txt (or some other meaningful name).
Compare the sector counts between the two LUNs. You'll probably find that one is a few MB smaller than the other. Usually this is due to different vendors' drives having slightly different sizes; in other words, one LUN is actually 3.02 TB and one is 3.03 TB.
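If you want a quick way to eyeball it (the exact format of the cvlabel output varies a bit between versions, so adjust the grep to whatever the file actually contains):

sudo cvlabel -c > mylabels.txt    # dump the current LUN labels to a file
grep -i sector mylabels.txt       # pull out the sector counts for a side-by-side look

If one LUN reports fewer sectors than the other, the difference times the sector size (usually 512 bytes) is the space that ends up struck out in Xsan Admin.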

Depending on how big the difference is and what you plan on using the volume for, it may or may not be worth it to put the LUNs in separate stripe groups.

thomasb's picture

messenger82 wrote:
However, one of the LUNs is showing a weird issue I haven't seen before: next to the LUN capacity there is red text with a strikethrough line through it. Again, all logs on both controllers and the Xserve RAID indicate no issue. I've built a volume with the LUNs and I can execute R/W operations on it without issue and with decent performance (given the equipment). Does anyone know exactly what the red struck-out text indicates, or better yet, how to fix it?

[img]http://uncwtv.uncw.edu/images/2011-01-04.png[/img]

Hi!

Did you manage to find the solution to this "red text with a stroke through it" issue? I see the same thing on two LUNs on an Xsan volume at work. Everything seems to work fine, and the volume size is correct.

xsanguy's picture

One LUN is (very) slightly smaller than the other.

thomasb's picture

Hmmm. Interesting. How can I check the size of each LUN in bytes?

xsanguy's picture

sudo cvlabel -l is a good way.
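The output looks something like this (made-up numbers, and the exact format varies a bit between Xsan versions):

sudo cvlabel -l
# /dev/rdisk3 [APPLE    Xserve RAID] acfs "BackupLUN1"  Sectors: 5859372032.  Sector Size: 512.
# /dev/rdisk4 [APPLE    Xserve RAID] acfs "BackupLUN2"  Sectors: 5859352576.  Sector Size: 512.

Multiply sectors by sector size to get bytes: 5859372032 x 512 = 2,999,998,480,384 versus 5859352576 x 512 = 2,999,988,518,912, so the first LUN here is roughly 10 MB bigger.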

xsanguy's picture

Also, one more thing: don't name your storage pools the exact same thing. Keep labels under 8 characters, and put a 1 or 2 at the end.