Render Glitches

mimsey's picture

I was just wondering if anyone has heard of any strange render issues associated with Xsan 1.3.
I'm having these problems only when rendering to my SAN; when I render locally, it doesn't happen.

Here are images of my glitch: http://www.mimsey.com/renderglitch.html

The render problem is intermittent, sometimes less and sometimes more, but always the same split screen with grey boxes.
It reproduces the same way on SD and HD; the pictures are DVCPRO HD.

Please let me know if anyone has seen similar.
Thanks
aj

ron's picture

Look at the performance settings in RAID Admin. Do you have "allow host cache flushing" unchecked?

Just a guess.

mimsey's picture

All my caching is enabled. DOH! Controller, Host and Drive. I do have a UPS connected, but the power is always whacked out in my office.

It's been this way since 1.1 and I hadn't experienced these problems, but then again we're talking about Xsan here.

Here's the million dollar question: if I modify those settings, do I lose the RAID, or is it just a preference?

Let me know if you have any insight.

Thanks Again
aj

mimsey's picture

So I've talked to some people and they reassured me all would be okay if I uncheck host flush. You never know what small detail will make the xsan turn from good to evil. I've unchecked and now I just need to test it out.

I'll let you know what I find.

aj

aaron's picture

A user on Apple's Xsan-Users list notes the same rendering problem (http://lists.apple.com/archives/xsan-users/2006/May/msg00043.html), so I think we can stop calling it a "glitch."

Aaron Freimark
CTO, Tekserve

mimsey's picture

Yeah, I switched off host cache flushing and no luck. So, has anyone had any problems downgrading to 1.2? Or has anyone done that yet? Or, as the song so nicely states, "Is there anybody out there?"

I'll try to grin and bear it another day and then I'll try to downgrade.

aj

aaron's picture

Let us know how it goes. I have not yet upgraded our customers, and am waiting for a response from Apple before we do. Thanks for keeping up the postings...

Aaron Freimark
CTO, Tekserve

mimsey's picture

I figure we're all in it together and the Apple boards are sure not helpful. I have to sort through 1000 posts that say, "I've got my Intel MacBook running as MDC and 3 Avid clients but I can't get Xsan to work... what am I doing wrong?"
So yeah, I'll keep the thread updated. I was thinking about rolling back my clients but leaving my MDCs on 1.3. I've read about people who have done it, but only on the Apple boards. I'm just worried because I've got four 2TB LUNs and I'm not sure if the 1.2 clients will play nicely.

Anyone running this config?
just wondering.
aj

boblenon's picture

I too appear to be suffering from this problem. Upgraded last week to 10.4.6 and 1.3. Everyone is fine, except for 2 FCP projects that generate grey boxes on renders. They appear to always be in the same spot on the exact same frame: the last frame of a transition between two images.

I'm going to downgrade one workstation to 1.2 today to see if that resolves the issues. mimsey, have you had any luck with this?

mimsey's picture

To be honest, I've been too scared to try it. I've got three jobs running right now and things are working to an extent. I'm worried that rocking the boat may induce a cataclysmic force that tears through the office like the giant "Lost" magnet.
I'd love to hear if you have any luck. I've got about 2 more days in the trench with these jobs and then I can experiment.

aj

boblenon's picture

Well ... it seems no one here has the time to try their project out on the 1.2 machine. I did get it downgraded with no problem, though.

And apparently over the weekend more projects started having issues with this.

Also, I forgot to mention that when I first upgraded to 1.3 I also had those cache settings enabled on my RAIDs, which I changed to no avail as well :)

MattG's picture

I can confirm that I have reconfigured at least 4 systems now back to 1.2, and in all cases, multiple issues surrounding writes have disappeared on the downgrade. I've seen panicking clients during ingest and I too have seen the grey box issue upon playback of rendered files.

I mention the phrase "back to" pointedly, since all of these examples were volumes originally created in 1.2 or earlier. 1.3 involves many new additions to the .cfg file, specifically having to do with UID mappings, so the following recipe only works for downgrading systems whose volumes were originally created in 1.2 or earlier:

In a dual MDC situation, one can uninstall 1.3 and reinstall 1.2 on the secondary, soft-fail the volume over to the secondary running 1.2, and then do the same to the primary. During this operation, all clients should be down. Downgrading the clients to 1.2 can be done in the same manner. You should of course back up the entire /L/F/X/config folder of the primary MDC before attempting this, and at all costs, a backup of that precious .cfg file should be somewhere off the system.
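
For anyone who wants the order of operations above as a checklist, here is a rough sketch in Python. It only builds and prints the steps as described in this post; it does not touch Xsan in any way. The function name and the host names (mdc1, mdc2, edit1, edit2) are made up for illustration.

```python
def downgrade_plan(primary, secondary, clients):
    """Return the downgrade steps described above as an ordered list of strings.

    Hypothetical helper: it only builds a checklist and runs nothing.
    Intended for volumes originally created in Xsan 1.2 or earlier.
    """
    steps = [
        f"Back up the entire config folder of {primary} and keep a copy of the .cfg file off the system",
        "Shut down all clients: " + ", ".join(clients),
        f"Uninstall Xsan 1.3 and reinstall 1.2 on {secondary}",
        f"Soft-fail the volume over to {secondary}",
        f"Uninstall Xsan 1.3 and reinstall 1.2 on {primary}",
    ]
    # Clients are downgraded in the same manner once both MDCs are on 1.2.
    steps += [f"Uninstall Xsan 1.3 and reinstall 1.2 on client {c}" for c in clients]
    return steps


if __name__ == "__main__":
    # Host names here are placeholders, not from any real setup.
    for n, step in enumerate(downgrade_plan("mdc1", "mdc2", ["edit1", "edit2"]), 1):
        print(f"{n}. {step}")
```

The backup is deliberately the first step, so nothing else happens before the .cfg file is safe somewhere off the system.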

As far as 1.3 volumes that were created "fresh" in 1.3, our technique so far has been to archive the data off the SAN, recreate the volume in 1.2, and copy the data back over.

boblenon's picture

MattG - has the downgrade only been successful when both the clients and the MDCs were downgraded?

One of my users has just started seeing the same problem with After Effects too.

MattG's picture

IMO, all nodes should downgrade to 1.2 for this issue.

billm's picture

We are seeing corruption when saving Photoshop files to Xsan.

Xsan 1.3, Mac OS X 10.4.6

MattG's picture

I believe the Photoshop issue mentioned above is directly addressed in the Mac OS X 10.4.7 update, so update all nodes to 10.4.7 to see if the issue is resolved.

We are slowly and carefully upgrading our Xsan systems to 10.4.7, and Xsan 1.3 along with it, and are seeing promising results with this new cocktail. Gone are the compressed write kernel panics and it seems the grey box render issue is gone as well.

I'd like to hear from other admins who are at 10.4.7 / 1.3 to corroborate this. Make sure to mention what kinds of data streams your systems are handling.

sundraghan's picture

I wouldn't recommend updating to 1.3 if you have a block size set higher than 16 on any of your Volumes. I documented my issues on Apple's discussion boards, and recommend reading a related thread that shows the issue:

http://discussions.apple.com/thread.jspa?threadID=550118&tstart=0

http://discussions.apple.com/thread.jspa?threadID=508995&tstart=15

I'd be curious whether other people who can confirm that their volume has a higher block size have also seen the same issue.

Brian Conner's picture

Matt,

sundraghan has posted some interesting information, and several others on those linked threads seem to be having success with a lower Block Allocation Size than what Apple recommends.

Also, in the first thread he linked to, the OP said their problems began when they updated to v10.4.7 on all clients.

I wonder if the order of updating is a factor. IOW, if you update to v10.4.7 on all clients first, then update to Xsan 1.3, the problems go away, but if you go to 1.3 first, then you still have the problems. Or not.

I have a new install that begins next Monday, so I'm keen to find a consensus on Block Size/Stripe Breadth. At the moment, I'm leaning hard toward 16/64.

Brian Conner

MattG's picture

Here I was thinking we were out of the woods...

I'd like to know what _kinds_ of renders are causing the lockups. My hunch is that it is rendering of compressed data streams such as DV25.

The fact is that many upgrades to 10.4.7 / 1.3 on systems that work only with uncompressed video are just plain fine.

But I do have a client in Colorado that is complaining of client freezes when rendering DV25 material to a volume that has 64K Blocks with 16 stripe.

Brian Conner's picture

This might be a question you can't answer, Matt, but is there any reason to not set up a new Final Cut oriented Xsan with 16KB blocks and Stripe Breadth of 64, even though it's not the Apple recommended setting for video?

I realize that after these current problems get sorted, the client will want to change back to 64/16, after backing up all the data on the volumes so they can copy it all back after the change.

sundraghan's picture

Just in response to MattG's question of what kind of renders caused the issues, we were seeing it across the board.

Any type of write operation where the app was writing data at a slower speed for a file larger than 1MB would cause the lock up. Thus saves from Photoshop, any render from FCP etc. were triggers for the issue.

Interestingly any operation which wrote data in large chunks and at a steady rate never caused the system lock ups. That includes capture for any codec and duplicating large chunks of data through Finder.
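
If anyone wants to try reproducing the slow-write pattern described above without opening Photoshop or FCP, something like this Python sketch might do it. It writes a file larger than 1MB in small chunks with a pause between each one. The chunk size, the delay, and the volume path are all guesses for illustration; nothing here is confirmed to trigger the lockup.

```python
import time


def slow_write(path, total_bytes=2 * 1024 * 1024, chunk_bytes=4096, delay_s=0.01):
    """Write total_bytes to path in small chunks, pausing between chunks.

    Returns the number of bytes written. The defaults are assumptions,
    chosen only so the file is larger than 1MB and is written slowly.
    """
    written = 0
    chunk = b"\0" * chunk_bytes
    with open(path, "wb") as f:
        while written < total_bytes:
            piece = chunk[: total_bytes - written]  # last piece may be shorter
            f.write(piece)
            f.flush()  # hand each small write to the OS instead of buffering in Python
            written += len(piece)
            time.sleep(delay_s)
    return written


if __name__ == "__main__":
    # Hypothetical path: point this at a scratch folder on the SAN volume.
    print(slow_write("/Volumes/MySAN/slow_write_test.bin"))
```

For comparison, setting chunk_bytes to a few megabytes and delay_s to 0 approximates the large, steady writes that did not cause problems for us.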

MattG's picture

Well, after some polling of my own, the write issues seem to come specifically from volumes that meet all of these conditions:

(a) newly created in 1.3
(b) block/stripe set to 64/16
(c) LUNs larger than 2TB.

Preexisting 64/16 volumes with systems that were upgraded to 1.3 don't seem to have the issue, and the difference there of course is that all LUNs are less than 2TB.
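
To make the three conditions above easy to check against a list of volumes, here is a minimal Python sketch. The Volume fields are invented for illustration and are not read from any Xsan config file, and treating "2TB" as decimal terabytes is an assumption. It flags a volume only when (a), (b) and (c) all match, which is how the pattern has looked so far.

```python
from dataclasses import dataclass
from typing import List

TWO_TB = 2 * 10**12  # assumption: "2TB" taken as decimal terabytes


@dataclass
class Volume:
    # Hypothetical record; fill it in by hand from your own setup.
    name: str
    created_in: str          # Xsan version the volume was created under
    block_size: int          # first number in "64/16"
    stripe_breadth: int      # second number in "64/16"
    lun_sizes_bytes: List[int]


def matched_conditions(v):
    """Return the letters of the conditions (a, b, c) that the volume matches."""
    matched = []
    if v.created_in == "1.3":
        matched.append("a")
    if (v.block_size, v.stripe_breadth) == (64, 16):
        matched.append("b")
    if any(size > TWO_TB for size in v.lun_sizes_bytes):
        matched.append("c")
    return matched


def at_risk(v):
    """True only when all three conditions match."""
    return matched_conditions(v) == ["a", "b", "c"]


if __name__ == "__main__":
    fresh = Volume("NewVol", "1.3", 64, 16, [3 * 10**12])
    upgraded = Volume("OldVol", "1.2", 64, 16, [10**12])
    for vol in (fresh, upgraded):
        print(vol.name, matched_conditions(vol), at_risk(vol))
```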

All I can say is that engineering has acknowledged this and hopefully we'll see a fix for it soon.

Kudos to sundraghan for creating that webliography of all related posts.

sundraghan's picture

Thanks Matt! Our SAN volume fell under conditions A, B, & C, so that explains it! It also helps explain why more people weren't seeing it.

Do you know if a volume created in Xsan 1.2 that is then modified in 1.3 to add storage pools and LUNs larger than 2TB will also exhibit the problem? Or does the volume have to be created in 1.3 from the start for the issue to present?