If you work with a storage system larger than 10 TB holding, say, hundreds of thousands of files or more, creating a backup or mirror of this data can be quite challenging. I know many systems that rely solely on RAID level 5 as their only safety net, which is somewhat frightening if you consider the value of your data.
In this article I will explain how Xsan 2.2.x's fsevents feature, in conjunction with Archiware PresSTORE's 4.x Synchronize module, can help you create a secure copy of your data during normal business hours without interrupting your real-time I/O operations.
A. The challenge
Why is it that so many people don't create copies of their online data?
- To copy a large volume's data to tape or disk, you need to spend money on the backup device, and nowadays most production environments need to be as cost-effective as possible. So it's tempting not to spend money on a backup which, in an ideal world, you are never going to use anyway.
- If the primary online volume fails, most people need to be able to work on the backup as soon as possible, probably with little or no interruption. That's why you usually need a copy of your online data on a second disk set in exactly the same form as on the primary volume. Using a proprietary backup format on the secondary disk set wouldn't help you much in a disaster scenario, because you would need to restore your data before you could actually use it. Considering that after some time you no longer need instant access to project data, this leads to a typical three-tier setup as a minimum environment: primary online volume, secondary online volume, and tape archive (holding data which can be removed from the disk sets). This setup can be complex and expensive.
- If you are willing to spend the money for a primary and secondary disk volume, as well as for a tape library, you need to make sure that the copy on your secondary disk set is as recent as possible. With very large volumes, scanning the volume for file changes takes a considerable amount of time (a couple of hours, say), which you probably don't have very often, possibly not even on the weekend. In addition, this scanning process can put so much load on your storage system that you can't scan your volumes during normal business operations without interrupting your real-time I/O. So with large volumes and constantly changing files, many environments simply never find a backup window to create regular copies of their data.
Regarding point 1, I'd like to say that yes, in an ideal world you would never need to restore data from your backup. In the real world, I'd say you can only do without your backup if you have a very good backup strategy; the moment you skip setting it up is the moment you will need it. I haven't seen a single system without some kind of backup, versioning, etc. where people didn't lose data sooner or later, whether through accidentally deleted files, file system corruption, or hardware issues with or without power failures. Ah, and did I mention that you absolutely need a UPS for your whole storage system?
So whatever you do, you need a copy of your data. Some people argue that they still have their footage on tape, Professional Disc, P2 media, etc. But what happens to your Photoshop files, FCP project files, Word documents, etc.? And how long would it take to re-ingest or re-transfer your footage to your storage system? And what about metadata? If you use an asset management system like Final Cut Server, you need to relink your re-ingested material to the metadata that now exists only in your asset management system's database. Not so easy, especially if you are in a hurry. I treat the copy of my XDCAM, P2, etc. files as the new originals, as in most environments the original media (e.g. the Professional Disc) would simply not be stored forever, or at least not in an environment which would guarantee media availability after some years.
So I would simply never recommend designing a production system which doesn't include any kind of backup.
How you deal with point 2 depends on your environment. If you never work on time-critical projects, you might back up your data from your online volume directly to tape. Depending on your tape library, this can be as fast as copying files over to a second storage system. If you think that in a disaster scenario you would still have enough time to restore your data before continuing to work on it, that's a good approach. Yet you still need to scan your online volume for file changes, and you would probably not do that during normal business operation, as this might interrupt your real-time I/O. So not having a second disk-based volume only works in environments which either store little data on their primary volume (so scanning the filesystem for file changes is superfast), or don't need recent backups and are fine with a nightly backup only. But even then you need to make sure that your backup window is large enough to scan your whole volume and copy the file changes to tape.
If your budget allows, you should try to get (at least) a setup like this one:
Let's have a look at some concepts which are going to help us implement the three-tier setup shown in the picture above.

1. Secondary Online Disk
In point A.2 I said that you would probably need to design a three-tier storage environment, with primary online storage, secondary online storage, and a tape-based archive. As the secondary online storage is supposed to be an up-to-date copy of the primary online storage, this raises the question of how to mirror the two volumes.
The first thing that comes to my mind is using a block-level hardware solution such as Cloverleaf's iSN. I've used it in a 24/7 broadcast environment and was not unhappy with it, but what happens if the file system breaks? This would affect both the primary and the secondary volume. To avoid this you could use snapshots, which are supported by iSN, but the whole thing gets really complex at some stage, and I'd like to keep things as simple as possible.
So I'd prefer to have two independent file systems and make sure that I only copy files from one file system to the other. If one system breaks, I could then simply repair or recreate it and copy my data onto it.
What happens if the primary volume really fails? How do I access the second volume?
Let's say your volumes are called XsanVolume1 and XsanVolume2, and they share the same size. If XsanVolume1 is mounted on all your servers and clients, XsanVolume2 needs to be mounted at least on your backup/synchronize machine, because at least one machine needs access to both volumes to copy data from volume 1 to volume 2. In case XsanVolume1 fails, you would use Xsan Admin to unmount XsanVolume1 from all servers and clients, mount XsanVolume2 on all servers and clients, and then use Remote Desktop or a similar tool to create a symlink on all machines from /Volumes/XsanVolume2 to /Volumes/XsanVolume1. Test this with your specific applications before you rely on it in a disaster scenario, but it will work with most apps.
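The symlink step of this failover could be sketched as a small helper script. Everything here (the paths, the `fail_over` helper name, the safety checks) is illustrative, and it assumes XsanVolume2 is already mounted while XsanVolume1 is not:

```python
#!/usr/bin/env python3
"""Hypothetical failover helper: after mounting XsanVolume2 everywhere,
run this (e.g. via Remote Desktop) on each machine so that applications
looking for /Volumes/XsanVolume1 transparently find the second volume."""
import os
import sys

FAILED = "/Volumes/XsanVolume1"   # the primary volume that went down
BACKUP = "/Volumes/XsanVolume2"   # the secondary volume, now mounted

def fail_over(failed: str, backup: str) -> None:
    if not os.path.isdir(backup):
        sys.exit(f"{backup} is not mounted - mount it via Xsan Admin first")
    if os.path.ismount(failed):
        sys.exit(f"{failed} is still mounted - unmount it before failing over")
    # Remove a stale symlink or leftover (empty) mount point, then link.
    if os.path.islink(failed):
        os.unlink(failed)
    elif os.path.isdir(failed):
        os.rmdir(failed)          # only succeeds if the leftover dir is empty
    os.symlink(backup, failed)
    print(f"{failed} -> {backup}")

if __name__ == "__main__" and os.path.isdir(BACKUP):
    fail_over(FAILED, BACKUP)
```

The checks at the top are deliberately paranoid: replacing a path that is still a live mount point would hide real data.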
You could of course fully automate this, but I would recommend preparing the tools for a manual failover. This way you would be able to investigate the reason for the failure first.

2. Filesystem Scan vs. Filesystem Events
The functionality we need to create a copy of our primary volume on the secondary volume is a synchronization function. If we use PresSTORE, we can still decide to support versioning as in a backup, and this would not prevent us from working on the secondary volume directly if we needed to. But right now we just need a way to synchronize our two volumes.
As I said, scanning our primary volume for file changes since the last synchronization puts a considerable amount of load on our system and would probably interrupt real-time I/O like ingest and playout, so a full scan of our volume should only happen at night, on the weekend, or whenever we find the time to do so. A full scan can easily take a couple of hours, depending on the number of files on your file system.
The solution to this sounds very easy: let's use filesystem events instead. This means that each time a file in our filesystem gets created, deleted, or modified, this event is written into a database. When my next synchronization starts, it just reads the database and copies the modified data to volume 2. No scanning necessary, and instead of, let's say, 5 hours of scanning, the process takes only a couple of seconds. Wow!
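Conceptually (and only conceptually; this is not PresSTORE's actual implementation, and every name below is made up for illustration), the event-driven approach looks like this: a listener appends changed paths to a journal, and the sync pass then touches only those paths instead of walking the whole volume.

```python
"""Sketch of event-driven synchronization: copy only what the journal
names, never scan the volume. Illustrative names throughout."""
import os
import shutil

JOURNAL = []  # stands in for the fsevents database on the MDC

def record_event(path: str) -> None:
    """Called whenever the filesystem reports a change to `path`."""
    JOURNAL.append(path)

def sync_from_journal(src_root: str, dst_root: str) -> int:
    """Mirror the journaled paths from src_root to dst_root."""
    copied = 0
    while JOURNAL:
        path = JOURNAL.pop()
        rel = os.path.relpath(path, src_root)
        dst = os.path.join(dst_root, rel)
        if os.path.isfile(path):
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(path, dst)       # preserves mtime, like a mirror
            copied += 1
        elif not os.path.exists(path) and os.path.exists(dst):
            os.remove(dst)                # deletion on the primary volume
    return copied
```

The work done per sync is proportional to the number of changes since the last run, not to the size of the volume, which is exactly why it finishes in seconds instead of hours.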
Since version 2.2, Xsan supports filesystem events. Apple doesn't really tell you about it directly, but you can infer it from their knowledge base articles once you understand that Spotlight uses filesystem events as its foundation. In Xsan environments, every file modification triggers a filesystem event on the machine which writes the file (if it's a Mac) as well as on the active metadata controller. The only machine which catches all filesystem events on your Xsan volume is the currently active metadata controller.

3. How PresSTORE Synchronize 4.x uses Xsan 2.2.x Filesystem Events
PresSTORE 3.x already supports filesystem events, so why do I need PresSTORE 4.x? In version 3, the filesystem event needed to happen on the machine on which PresSTORE was running. So with PresSTORE 3 you would need to run your synchronization software on the currently active metadata controller to copy data from volume 1 to volume 2. The data copy processes would put so much load on your MDC that its metadata access latency would increase until your clients might lose frames during ingest or playout. The idea in version 4 is to run the Synchronize module on a normal client system and have a little tool running on the MDC which does nothing but collect all the filesystem events of your active Xsan volumes. Whenever your Sync process starts on the PresSTORE machine, it will simply copy the minimal information from the MDC, check if it needs to move some data, and then do the dirty work without adding load to the active MDC. You could say that PresSTORE 3's fsevents implementation was designed to support local HFS+ volumes only, while version 4 now supports fsevents in Xsan environments, too.
What does it look like? You just install the PresSTORE client software on your MDCs (remember that you need a valid license for each of them), then you set up a synchronize plan which syncs XsanVolume1 to XsanVolume2. In the "Synchronize Options" section of the sync plan you now find the option to set up an fsevents server, which in Xsan environments has to be the currently active MDC:
In case your primary MDC goes down, just select the secondary MDC as the fsevents server for your sync plan.

4. Apple's Implementation of Filesystem Events in Xsan 2.2 (in my understanding) and how to deal with it
There's a File System Events Programming Guide available on the Apple website, which describes the general fsevents mechanisms Apple uses in HFS+ and Xsan 2.2. At this stage you might think that with fsevents support, PresSTORE would simply get a list of files which have changed and would then work through it. That would be great. If you work in the fsevents team at Apple and are reading this article, please add this feature; that would be so awesome.
Anyway. Assuming the Apple fsevents programming guide applies to the Xsan world too, what really happens is this:
- Instead of receiving an event that a file has changed, you get an event that something within a folder has changed. So your application (like PresSTORE) still needs to scan this folder for file changes. In video environments this could be a folder with an image sequence (150,000 files? No problem, that's just about 40 minutes at 60 frames per second), so scanning all these files would take some time. If you work with Final Cut Server, most of the assets will be stored on the root level of one folder, so you also end up with many files in one folder. Each time you upload a txt file into Final Cut Server, you get an fsevent telling you to scan the whole FCSvr Library. You can structure your Final Cut Server environment to work better with this fsevents strategy, but you can't rely on FCSvr's default settings. Be aware of this when you design your Xsan's folder structure.
- When too many fsevents occur at the same time or within a short time frame, the fsevents mechanism can merge these events and point your application to the events' common ancestor in the folder hierarchy. In the worst case this can mean that the fsevent says "Scan your whole Xsan volume". Your app could then still ignore the whole event, but hey, you'd like to protect your data, wouldn't you?
- All applications which register for fsevents share the same buffer. The first application to work on the fsevents list is Spotlight, which seems to deal with it pretty well. If you use other fsevents applications, they might slow down the process so much that new events don't fit into the buffer anymore. This means you will not catch all the events, which in turn means that a part of your data would not be protected. Still, you should not run any software on the MDC anyway, so chances are very, very high that you never run into this problem.
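The first point above means an fsevents consumer still does per-folder work. A minimal sketch of that step (the function name and the non-recursive policy are my own illustration, not how PresSTORE actually handles it): given the folder named in an event, list it and keep the files modified since the last sync.

```python
"""Per the File System Events Programming Guide, an event names a
*folder*; the application must then rescan that folder itself."""
import os

def changed_files_in(folder: str, last_sync: float) -> list:
    """Return files in `folder` modified after `last_sync` (epoch seconds)."""
    changed = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        # A 150,000-frame image sequence means 150,000 stat() calls
        # right here - exactly the cost described above.
        if os.path.isfile(path) and os.path.getmtime(path) > last_sync:
            changed.append(path)
    return changed
```

This is why the layout of your folders matters: the cost of every event is the size of the folder it points at.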
The main message here is that in very rare situations the fsevents mechanism might not catch all the file operations on your Xsan volume, so at least from time to time you still need a full scan of your volume. I would recommend using fsevents for your Synchronize module during the day and during normal business operation, and then running full scans as often as possible. If that is once a week, I would feel secure.
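For contrast with the event-driven path, a full scan is conceptually just a walk over the whole primary volume, copying anything missing or newer on the secondary. This is a bare sketch under my own assumptions (a newer-mtime policy with one second of slack; real tools also handle deletions, permissions, ACLs, and much more):

```python
"""Minimal full-scan mirror: the slow safety-net pass to run at night
or on the weekend, behind the fast fsevents-driven syncs."""
import os
import shutil

def full_scan_sync(src_root: str, dst_root: str) -> int:
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.join(dst_root, rel)
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            # Copy if missing, or if the source is newer (1 s slack for
            # filesystems with coarse timestamps).
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst) + 1):
                shutil.copy2(src, dst)
                copied += 1
    return copied
```

Note that the walk itself is the expensive part: every file gets at least one stat() call whether it changed or not, which is why this pass takes hours on a large volume.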
C. To Do
While Archiware's implementation of fsevents support in Xsan environments is simply great, the current version doesn't automatically deal with Xsan metadata controller failover. The perfect solution would be if, for any Xsan volume, you could enter multiple MDCs as the fsevents source in a PresSTORE Synchronize plan, and PresSTORE would automatically detect which one is the currently active server. As long as you don't have a split-brain problem with one of your volumes (or run software on one of the MDCs which writes data to your Xsan volume), there will always be a single MDC serving your fsevents, so this is something which would greatly improve the software. If you think that you really need automatic failover detection in PresSTORE 4, please go ahead and file a feature request with Archiware.