| View previous topic :: View next topic |
| Author |
Message |
abstractrude Xsan Master

Joined: 13 Mar 2008 Posts: 881
|
Posted: Tue Oct 05, 2010 12:29 pm Post subject: |
|
|
i totally believe him. are you using native extended attributes? i have about 70 hollywood movies at prores 422 hq in our SAN. they are usually about 125-175 gigs. Haven't been able to reproduce those issues...what are your metadata latency times like. can you give us a cvsummary.
Typical large prores file found on the big SAN.
Macgruber.mov: 8 extent(s)
137G Aug 6 15:33 Macgruber.mov
Have you thought about doing a perl script or even bash that does nightly extent scrubbing?? so doesnt solve the problem but it would be a great workaround. |
|
| Back to top |
|
 |
lucasnap Xsan Master

Joined: 05 Oct 2006 Posts: 107
|
Posted: Sat Oct 09, 2010 4:55 am Post subject: |
|
|
Latency is between 250-280 us. So that's OK.
Extended Attributes are turned off, you never know when they come with the question if they can hook up a Linux machine for this and that.
What do you mean by cvsummary? Do you mean a cvgather?
Does anyone have any news on the Apple bugreport?
Our extent counts are up to 6000, but even more files are holy. This is because of the way they work. They record a show with 3 HD camera's simultaneously ingesting on the Xsan Volume.
Our workaround is a big bummer: The ingest Mac's record on an extra internal disk and after that the file is copied (one by one) to the Xsan Volume.
I've played with snfsdefrag, but because of the hanging and instability of machines when only touching the corrupt file i'm not so keen on scripting on this subject. I rather see an update. |
|
| Back to top |
|
 |
abstractrude Xsan Master

Joined: 13 Mar 2008 Posts: 881
|
Posted: Mon Oct 11, 2010 11:23 am Post subject: |
|
|
| cvsummary is in the volume log. Also, how much free space do you usually have on your Xsan? cvadmin show for us? im trying to figure out why are you getting such high extent counts on your files. |
|
| Back to top |
|
 |
lucasnap Xsan Master

Joined: 05 Oct 2006 Posts: 107
|
Posted: Sat Oct 16, 2010 8:32 am Post subject: |
|
|
These are the summary's:
| Code: | [1016 14:00:00.485958] 0x108573000 (Debug) FSM memory SUMMARY virtual size 2683MB resident size 122MB.
[1016 14:00:00.485978] 0x108573000 (Debug) FSM I/O SUMMARY writes total/9075.
[1016 14:00:00.485984] 0x108573000 (Debug) FSM I/O SUMMARY writes journal/942 sb/0 buf/954 abm/0.
[1016 14:00:00.485996] 0x108573000 (Debug) FSM I/O SUMMARY writes inode/954 ganged/2160 (7.08%).
[1016 14:00:00.486001] 0x108573000 (Debug) FSM wait SUMMARY inode pool expand waits/0.
[1016 14:00:00.486006] 0x108573000 (Debug) FSM wait SUMMARY journal waits/0 max depth/0 max time/0
[1016 14:00:00.486011] 0x108573000 (Debug) FSM wait SUMMARY journal bytes used avg/443051 max/1230848.
[1016 14:00:00.486016] 0x108573000 (Debug) FSM wait SUMMARY journal rsv count/2160 avg/409600 bytes.
[1016 14:00:00.486020] 0x108573000 (Debug) FSM wait SUMMARY free buffer waits/0.
[1016 14:00:00.486025] 0x108573000 (Debug) FSM wait SUMMARY free inode waits/0.
[1016 14:00:00.486030] 0x108573000 (Debug) FSM wait SUMMARY revokes/0 avg/0 min/0 max/0.
[1016 14:00:00.486034] 0x108573000 (Debug) FSM wait SUMMARY revokes/0 avg/0 min/0 max/0.
[1016 14:00:00.486042] 0x108573000 (Debug) FSM Trans SUMMARY Reserve cnt/2160 avg/2 min/0 max/20.
[1016 14:00:00.486049] 0x108573000 (Debug) FSM Trans SUMMARY Commit cnt/2160 avg/23 min/4 max/176.
[1016 14:00:00.486054] 0x108573000 (Debug) FSM threads SUMMARY max busy hi-prio/6 lo-prio/3.
[1016 14:00:00.486059] 0x108573000 (Debug) FSM threads SUMMARY max busy dmig/0 events/0.
[1016 14:00:00.486064] 0x108573000 (Debug) FSM msg queue SUMMARY hi-prio now/0 min/0 max/4.
[1016 14:00:00.486069] 0x108573000 (Debug) FSM msg queue SUMMARY lo-prio now/0 min/0 max/2.
[1016 14:00:00.486073] 0x108573000 (Debug) FSM msg queue SUMMARY dmig now/0 min/0 max/0.
[1016 14:00:00.486078] 0x108573000 (Debug) FSM msg queue SUMMARY events now/0 min/0 max/0.
[1016 14:00:00.486084] 0x108573000 (Debug) FSM cache SUMMARY inode lookups/22500 misses/0 hits/100.00%.
[1016 14:00:00.486089] 0x108573000 (Debug) FSM cache SUMMARY free incore inodes now/8192 min/8189 max/8192.
[1016 14:00:00.486095] 0x108573000 (Debug) FSM cache SUMMARY buffer lookups/84626 misses/0 hits/100.00%.
[1016 14:00:00.486100] 0x108573000 (Debug) FSM cache SUMMARY free buffers now/2048 min/2045 max/2048.
[1016 14:00:00.486105] 0x108573000 (Debug) FSM cache SUMMARY attrs now/3726 min/3726 max/3726.
[1016 14:00:00.486110] 0x108573000 (Debug) FSM extent SUMMARY extent lookups/11695 misses/0 hits/100.00%.
[1016 14:00:00.486115] 0x108573000 (Debug) FSM extent SUMMARY hint tries/0 misses/0 hits/0.00%.
[1016 14:00:00.486124] 0x108573000 (Debug) VOP SUMMARY Setattr cnt/2160 avg/24+62 min/16+34 max/108+254.
[1016 14:00:00.486133] 0x108573000 (Debug) VOP SUMMARY Getquota cnt/120 avg/23+54 min/17+29 max/28+78.
[1016 14:00:00.486141] 0x108573000 (Debug) VOP SUMMARY VopLookupV4 cnt/8640 avg/27+44 min/16+19 max/123+199.
[1016 14:00:00.486151] 0x108573000 (Debug) TKN SUMMARY TokenChangeV5 cnt/2160 avg/20+18 min/10+4 max/107+166.
[1016 14:00:00.486165] 0x108573000 (Debug) SG SUMMARY WorkingMTD space total/1.82 TB free/1.82 TB (99.96%)
[1016 14:00:00.486174] 0x108573000 (Debug) SG SUMMARY WorkingMTD space minfree/1.82 TB (99.96%) maxfree/1.82 TB (99.96%)
[1016 14:00:00.486181] 0x108573000 (Debug) SG SUMMARY WorkingMTD alloc extent cnt/0 avgsize/0.00 B.
[1016 14:00:00.486186] 0x108573000 (Debug) SG SUMMARY WorkingMTD 19 btree free space fragments 35 splay tree fragments
[1016 14:00:00.486191] 0x108573000 (Debug) SG SUMMARY WorkingMTD perfectfit succeeded/0 failed/0
[1016 14:00:00.486196] 0x108573000 (Debug) SG SUMMARY WorkingMTD stripealign succeeded/0 failed/0
[1016 14:00:00.486203] 0x108573000 (Debug) SG SUMMARY WorkingData-1 space total/43.66 TB free/39.12 TB (89.61%)
[1016 14:00:00.486211] 0x108573000 (Debug) SG SUMMARY WorkingData-1 space minfree/39.12 TB (89.61%) maxfree/39.12 TB (89.61%)
[1016 14:00:00.486217] 0x108573000 (Debug) SG SUMMARY WorkingData-1 alloc extent cnt/0 avgsize/0.00 B.
[1016 14:00:00.486222] 0x108573000 (Debug) SG SUMMARY WorkingData-1 21402 btree free space fragments 42666 splay tree fragments
[1016 14:00:00.486227] 0x108573000 (Debug) SG SUMMARY WorkingData-1 perfectfit succeeded/0 failed/0
[1016 14:00:00.486232] 0x108573000 (Debug) SG SUMMARY WorkingData-1 stripealign succeeded/0 failed/0
[1016 14:00:00.486239] 0x108573000 (Debug) SG SUMMARY WorkingData-2 space total/43.66 TB free/38.53 TB (88.26%)
[1016 14:00:00.486246] 0x108573000 (Debug) SG SUMMARY WorkingData-2 space minfree/38.53 TB (88.26%) maxfree/38.53 TB (88.26%)
[1016 14:00:00.486252] 0x108573000 (Debug) SG SUMMARY WorkingData-2 alloc extent cnt/0 avgsize/0.00 B.
[1016 14:00:00.486257] 0x108573000 (Debug) SG SUMMARY WorkingData-2 19447 btree free space fragments 38732 splay tree fragments
[1016 14:00:00.486262] 0x108573000 (Debug) SG SUMMARY WorkingData-2 perfectfit succeeded/0 failed/0
[1016 14:00:00.486267] 0x108573000 (Debug) SG SUMMARY WorkingData-2 stripealign succeeded/0 failed/0
[1016 14:00:00.486273] 0x108573000 (Debug) PIO HiPriWr SUMMARY Working_MTD cnt/942 maxq/0.
[1016 14:00:00.486280] 0x108573000 (Debug) PIO HiPriWr SUMMARY Working_MTD avg/1691 min/479 max/14271.
[1016 14:00:00.486286] 0x108573000 (Debug) PIO HiPriWr SUMMARY Working_MTD sysavg/1690 sysmin/479 sysmax/14267.
[1016 14:00:00.486292] 0x108573000 (Debug) PIO HiPriWr SUMMARY Working_MTD avglen/1686 minlen/1024 maxlen/2048.
[1016 14:00:00.486298] 0x108573000 (Debug) PIO Read SUMMARY Working_MTD cnt/7179 maxq/1.
[1016 14:00:00.486304] 0x108573000 (Debug) PIO Read SUMMARY Working_MTD avg/161 min/111 max/10382.
[1016 14:00:00.486310] 0x108573000 (Debug) PIO Read SUMMARY Working_MTD sysavg/130 sysmin/99 sysmax/10363.
[1016 14:00:00.486315] 0x108573000 (Debug) PIO Read SUMMARY Working_MTD avglen/512 minlen/512 maxlen/512.
[1016 14:00:00.486321] 0x108573000 (Debug) PIO Write SUMMARY Working_MTD cnt/8133 maxq/1.
[1016 14:00:00.486327] 0x108573000 (Debug) PIO Write SUMMARY Working_MTD avg/1933 min/482 max/209779.
[1016 14:00:00.486333] 0x108573000 (Debug) PIO Write SUMMARY Working_MTD sysavg/1919 sysmin/472 sysmax/209766.
[1016 14:00:00.486339] 0x108573000 (Debug) PIO Write SUMMARY Working_MTD avglen/2373 minlen/512 maxlen/16384.
[1016 14:00:00.486351] 0x108573000 (Debug) Cache SUMMARY [homes.metadata.hmth] attrs now/2 min/2 max/2.
[1016 14:00:00.486357] 0x108573000 (Debug) RSVD SUMMARY [homes.metadata.hmth] reserved space max requested/282 MB accounted now/0 MB
[1016 14:00:00.486366] 0x108573000 (Debug) VOP SUMMARY [homes.metadata.hmth] Getquota cnt/120 avg/23+54 min/17+29 max/28+78.
[1016 14:00:00.486373] 0x108573000 (Debug) Cache SUMMARY [editset02.metadata.hmth] attrs now/1 min/1 max/1.
[1016 14:00:00.486378] 0x108573000 (Debug) RSVD SUMMARY [editset02.metadata.hmth] reserved space max requested/282 MB accounted now/0 MB
[1016 14:00:00.486387] 0x108573000 (Debug) Cache SUMMARY [editset03.metadata.hmth] attrs now/184 min/184 max/184.
[1016 14:00:00.486392] 0x108573000 (Debug) RSVD SUMMARY [editset03.metadata.hmth] reserved space max requested/282 MB accounted now/0 MB
[1016 14:00:00.486400] 0x108573000 (Debug) Cache SUMMARY [metadata02.metadata.hmth] attrs now/27 min/27 max/27.
[1016 14:00:00.486405] 0x108573000 (Debug) RSVD SUMMARY [metadata02.metadata.hmth] reserved space max requested/282 MB accounted now/282 MB
[1016 14:00:00.486414] 0x108573000 (Debug) VOP SUMMARY [metadata02.metadata.hmth] Setattr cnt/2160 avg/24+62 min/16+34 max/108+254.
[1016 14:00:00.486422] 0x108573000 (Debug) VOP SUMMARY [metadata02.metadata.hmth] VopLookupV4 cnt/8640 avg/27+44 min/16+19 max/123+199.
[1016 14:00:00.486428] 0x108573000 (Debug) Cache SUMMARY [metadata01.metadata.hmth] attrs now/3511 min/3511 max/3511.
[1016 14:00:00.486433] 0x108573000 (Debug) RSVD SUMMARY [metadata01.metadata.hmth] reserved space max requested/282 MB accounted now/0 MB
[1016 14:00:00.486442] 0x108573000 (Debug) Cache SUMMARY [editset01.metadata.hmth] attrs now/1 min/1 max/1.
[1016 14:00:00.486447] 0x108573000 (Debug) RSVD SUMMARY [editset01.metadata.hmth] reserved space max requested/282 MB accounted now/0 MB
[1016 14:00:00.486462] 0x108573000 (Debug) TKN SUMMARY [metadata02.metadata.hmth] TokenChangeV5 cnt/2160 avg/20+18 min/10+4 max/107+166.
[1016 14:00:00.486488] 0x108573000 (Debug) NETWORK SUMMARY [homes.metadata.hmth:50228]: QueuedInputMsgs max/2 QueuedOutputBytes max/0.
[1016 14:00:00.486494] 0x108573000 (Debug) NETWORK SUMMARY [editset02.metadata.h:49167]: QueuedInputMsgs max/2 QueuedOutputBytes max/0.
[1016 14:00:00.486500] 0x108573000 (Debug) NETWORK SUMMARY [editset03.metadata.h:49167]: QueuedInputMsgs max/1 QueuedOutputBytes max/0.
[1016 14:00:00.486506] 0x108573000 (Debug) NETWORK SUMMARY [metadata02.metadata.:49199]: QueuedInputMsgs max/3 QueuedOutputBytes max/0.
[1016 14:00:00.486511] 0x108573000 (Debug) NETWORK SUMMARY [metadata01.metadata.:49963]: QueuedInputMsgs max/1 QueuedOutputBytes max/0.
[1016 14:00:00.486516] 0x108573000 (Debug) NETWORK SUMMARY [editset01.metadata.h:49167]: QueuedInputMsgs max/0 QueuedOutputBytes max/0.
[1016 14:00:00.486522] 0x108573000 (Debug) NETWORK SUMMARY [editset01.metadata.h:49167]: QueuedInputMsgs max/2 QueuedOutputBytes max/0.
[1016 14:00:00.486529] 0x108573000 (Debug) FSM RSVD SPACE current 4230 MB actual 282 MB max since boot 4230 MB since last stats 4230 MB
[1016 14:00:00.486534] 0x108573000 (Debug) FSM RSVD clients with files open for write: now 5 max 6 last 5
[1016 14:00:00.486540] 0x108573000 (Debug) FSM RSVD client dirty in slot 4 with IP 192.168.0.202
[1016 14:00:00.486545] 0x108573000 (Debug) FSM RSVD clients writing: now 5 empty 4 max 6 last 5 |
Free space:
| Code: | Xsanadmin> select Working
Select FSM "Working"
Created : Tue Jul 20 14:54:49 2010
Active Connections: 6
Fs Block Size : 16K
Msg Buffer Size : 4K
Disk Devices : 9
Stripe Groups : 3
Fs Blocks : 5859369856 (87.31 TB)
Fs Blocks Free : 5210703728 (77.65 TB) (88%) |
The large files are made by Pictureready. Mostly 3 Mac's ingesting at the same time. The way Pictureready (and FCP) are ingesting is the reason why we get big amounts of extents: they just start writing. We've just started using MovieRecorder for our ingests and it does a much nicer job. It allocates the space on the volumes it needs (you have to give a max time), this results in one extent. So just like the Finder it preallocates the disk space.
Render files of a FCP project is now our main issue. We still have no nice workaround for this.
Instead of increasing the InodeExpand settings of the volume it sounds more logical to decrease the InodeExpandMin setting cause this will result in less inodes.. I will try this next time i'm there. |
|
| Back to top |
|
 |
Karepi JBOD

Joined: 08 Jul 2010 Posts: 4
|
Posted: Fri Oct 22, 2010 9:33 am Post subject: me too |
|
|
Wanted to add a me too, to this situation. Currently experiencing slow browsing just as described in this thread. Sometimes it will become unstuck but most of the times a client reboot will be necessary
So right now the possible causes are:
Cables.
Fragmentation.
Any other option that we might be missing?
Have a good day. |
|
| Back to top |
|
 |
jfw fully protected

Joined: 07 Mar 2010 Posts: 10
|
Posted: Fri Oct 22, 2010 10:23 pm Post subject: |
|
|
I have a client with these symptoms too.
I saw it on a directory without any files with extents problems, but perhaps they hit a file with the problem without noticing it before navigating to the directory that wouldn't show up at all.
Apparently once the Finder gets in this state, you can still do an "ls" in Terminal to see the contents of the directory. But you have to reboot to get the Finder to work again (for a while anyway).
I'd like to get an update from the original poster, dmastroluca, who hasn't posted since 3 weeks ago when he said (http://www.xsanity.com/forum/viewtopic.php?p=53703#53703) that essentially defragmenting the whole volume (by erasing it and recopying the files) didn't help. |
|
| Back to top |
|
 |
lucasnap Xsan Master

Joined: 05 Oct 2006 Posts: 107
|
Posted: Mon Oct 25, 2010 7:27 am Post subject: |
|
|
Nice to know there are more sites with this problem. Maybe we can try to figure out some more.
Do you guys use very large files as well? (around 100GB and up)
And how do these files get on the Xsan? Ingest, copy, FCP, FCSvr?
We have seen the same extreme amounts of extends on files who got picked up by FCSvr (Watch folder > copy and delete) so we need to make a work around on that as well. |
|
| Back to top |
|
 |
Karepi JBOD

Joined: 08 Jul 2010 Posts: 4
|
Posted: Tue Oct 26, 2010 12:00 pm Post subject: |
|
|
Very large files as well.
We get the files on the SAN through ingest.
And have some scripts moving files around from volume to volume.
Running snfsdefrag will get stuck while defragging, would seem like a latency problem the thing is if you test the whole fiber setup and network setup it seems ok.
If the Volume is clean there are no problems. It starts when the volume is around 50%.
Doing ls through the terminal works sometimes, but i bet if you do ls -ls it will get stuck.
My question is, is it a client-side problem or a server-side problem? Meaning what if the mdc's are hosted by a leopard instead of a snow leopard? |
|
| Back to top |
|
 |
lucasnap Xsan Master

Joined: 05 Oct 2006 Posts: 107
|
Posted: Wed Oct 27, 2010 4:11 am Post subject: |
|
|
If you run Leopard on the MDC you also need to run Leopard (or lower) on the clients.. So it certainly is a good question but it is a rhetorical one.
The MDC makes too many extends, that is the problem. So the solution will be either at the server (to make less) or at the client (read more). |
|
| Back to top |
|
 |
lucasnap Xsan Master

Joined: 05 Oct 2006 Posts: 107
|
Posted: Mon Nov 01, 2010 4:21 am Post subject: |
|
|
I posted a bugreport at radar.apple.com:
| Quote: | 25-Oct-2010 02:03 PM Lucas Nap:
Summary:
Large file (100GB+) will get holy (with holes) on an Xsan volume under some circumstances: a busy filesystem and a application who is not using correct memory allocation (such as Final Cut Pro and Quicktime).
Steps to Reproduce:
1. Setup an Xsan 2.2.1 Volume with default settings (I used the Xsan Admin for HiRes)
2. Connect multiple MacPro's with Mac OS 10.6.4 as an Xsan client
3. Ingest with Final Cut Pro 7.x 3 large movies (3 hours) ProRes (one stream per MacPro with AJA Kona card)
4. Check the files with snfsdefrag -c
5. The amount of extends will be up to 6000 or even "holy".
6. Check the files with the Finder (Command-I or dubble-click)
7. Finder will be unresponsive from this moment on, all the folder on the Xsan volume will seem empty until the Xsan client is restarted.
Expected Results:
Normal Finder behavior.
Actual Results:
Finder will crash, quicktime will crash trying to play the large movie.
Regression:
I'm using all up to date software:
- Mac OS 10.6.4 (for the MacPro's running Final Cut Studio 3)
- Mac OS 10.6.4 Server (for the Xserve's running as metadata controller and for the Xserve running Open Directory )
- Xsan 2.2.1
- 2 x Promise Vtrack E-class (firmware 10.6.2270.00)
- 2 x Promise Vtrack J-class
- 2 x Qlogic 5602 FC switch (firmware 7.4.0.16.00)
- Netgear gigabit switch for metadata
- Netgear gigabit switch for normal LAN
We did not had this problem with our old setup (differences):
- Mac OS 10.5.x
- Apple Xserve RAID
- Only one Qlogic FC switch
|
I hope to get a response soon but I had less productive results with radar before. |
|
| Back to top |
|
 |
AL RAID 5

Joined: 23 Sep 2009 Posts: 15
|
Posted: Mon Nov 01, 2010 8:01 am Post subject: |
|
|
Hello, I'm administering a system that's having similar problems. There's separate post about our issue (http://www.xsanity.com/forum/viewtopic.php?t=10603) but I think I should continue posting here.
We don't think we have any hardware issues but big problems with Finder when browsing the file system. I've been to contact with Apple's Xsan support and currently waiting information from their engineers of the collected log files. I suspect that the problem is caused by some the following or a combination of them: high fragmentation of the volume, high number of files and used inodes or/and volume settings. |
|
| Back to top |
|
 |
mcjunkie partially protected

Joined: 31 May 2005 Posts: 8
|
Posted: Thu Nov 11, 2010 6:37 am Post subject: |
|
|
Today Apple released Mac OS X 10.6.5.
There is an Xsan related note in update description. (http://support.apple.com/kb/HT4249)
Xsan
Improves reliability of fibre channel connections, resolving a potential Xsan volume availability problem.
What do you think about this?
I hope it may fix issue we've experienced long time.
Thanks. |
|
| Back to top |
|
 |
lucasnap Xsan Master

Joined: 05 Oct 2006 Posts: 107
|
Posted: Thu Nov 11, 2010 8:07 am Post subject: |
|
|
Doesn't sound like what we need... I was hoping more in the line of "improved reliability memory allocation/fragmentation".
I've seen this in the Pro App 2010-02 update:
Final Cut Pro "Several memory fragmentation issues have been resolved."
That sounds about right. Did anyone test this already? |
|
| Back to top |
|
 |
Karepi JBOD

Joined: 08 Jul 2010 Posts: 4
|
Posted: Thu Nov 11, 2010 10:39 am Post subject: |
|
|
| Still haven't tested 10.6.5. Anyone out there that has? So what's the current status of everybody? Over here still same old same old. Would moving to full fledged Stornext help? |
|
| Back to top |
|
 |
lucasnap Xsan Master

Joined: 05 Oct 2006 Posts: 107
|
Posted: Mon Nov 15, 2010 4:01 am Post subject: |
|
|
As promised I decreased the InodeExpandMin settings to 2 and 4 and interesting enough it did change the behavior of the volume. (default is 8 )
First with "2" the volume became pretty unresponsive, especially with writhing big files.
With the "4" setting I exported 2 150GB QT files with QT 7 Pro on two different Xsan clients. The result was extents of 7000 and 14000!!! That must be a record! But unfortunately still all the Finder's crashed trying to do something with the files.
Next test: 10.6.5 (I'm at the test environment at the end of the month, so we will have to wait). |
|
| Back to top |
|
 |
|