If you work with a storage system larger than 10 TB that holds, let’s say, hundreds of thousands of files or more, saving this data as a backup or mirror can be quite challenging.
I know many systems that rely on RAID level 5 as their only safety feature, which is rather frightening if you consider the value of your data.
In this article I will explain how Xsan 2.2.x’s fsevents feature, in conjunction with the Synchronize module of Archiware PresSTORE 4.x, can help you create a secure copy of your data during normal business hours without interrupting your real-time I/O operations.
A. The challenge
Why is it that so many people don’t create copies of their online data?
Regarding point 1, I’d like to say that yes, in an ideal world you would never need to restore data from your backup. In the real world, I’d say you can only expect not to need your backup if you have a very good backup strategy. As soon as you skip setting up a backup, you will probably need it. I haven’t seen a single system without some kind of backup, versioning, etc. where people haven’t lost data sooner or later, whether by accidentally deleting files, file system corruption, or hardware issues with or without power failures. Ah, and did I mention that you absolutely need a UPS for your whole storage system?
So whatever you do, you need a copy of your data. Some people argue that they still have their footage on tape, Professional Disc, P2 media, etc. But what happens to your Photoshop files, FCP project files, Word documents, etc.? And how long would it take to re-ingest or re-transfer your footage to your storage system? And what about metadata? If you use an asset management system like Final Cut Server, you need to relink your re-ingested material to metadata that exists only in your asset management’s database. Not so easy, especially if you are in a hurry. I treat the copy of my XDCAM, P2, etc. files as the new originals, because in most environments the original media (e.g. the Professional Disc) would simply not be stored forever, or at least not in an environment that would guarantee media availability after some years.
So I would never recommend designing a production system that doesn’t include some kind of backup.
How you deal with point 2 depends on your environment. If you never work on time-critical projects, you might back up your data from your online volume directly to tape. Depending on your tape library, this can be as fast as copying files over to a second storage system. If you think that in a disaster scenario you would still have enough time to restore your data before you continue working on it, that’s a good approach. Yet you still need to scan your online volume for file changes, and you would probably not do that during normal business operation, as this might interrupt your real-time I/O. So not having a second disk-based volume only works in environments that either don’t have much data on their primary volume (so scanning the filesystem for file changes would be superfast), or don’t need recent backups and are fine with a nightly backup only. But even then you need to make sure that your backup window is large enough to scan your whole volume and copy the file changes to tape.
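To make the backup window check concrete, here is a rough back-of-the-envelope sketch. The function name and all figures (file count, scan rate, change volume, tape throughput, window length) are hypothetical placeholders, not measurements; plug in numbers from your own system.

```python
def fits_backup_window(file_count, files_scanned_per_sec,
                       changed_bytes, copy_bytes_per_sec, window_hours):
    """Estimate whether a scan-then-copy backup fits into the nightly window.

    Returns (fits, total_hours). Assumes scan and copy run one after the other.
    """
    scan_seconds = file_count / files_scanned_per_sec
    copy_seconds = changed_bytes / copy_bytes_per_sec
    total_hours = (scan_seconds + copy_seconds) / 3600
    return total_hours <= window_hours, total_hours


# Hypothetical figures: 2 million files scanned at 200 files/s,
# 500 GB of changes written at 100 MB/s, 8-hour window.
fits, hours = fits_backup_window(2_000_000, 200, 500e9, 100e6, 8)
print(f"needs {hours:.1f} h, fits: {fits}")
```

If the estimate gets close to the window, the scan is usually the part that grows with the number of files rather than with the amount of changed data.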
If your budget allows for it, you should try to get (at least) a setup like this one:
B. Concepts
Let’s have a look at some concepts that are going to help us implement the 3 tier setup shown in the picture above.
1. Secondary Online Disk
In point A.2. I said that you would probably need to design a 3 tier storage environment, with primary online storage, secondary online storage, and a tape-based archive. As the secondary online storage is supposed to be a most recent copy of the primary online storage, this raises the question of how to mirror the two volumes.
The first thing that comes to my mind is using a block level hardware solution such as Cloverleaf’s iSN. I’ve used it in a 24/7 broadcast environment and was not unhappy with it, but what happens if the file system breaks? This would affect both the primary and the secondary volume. To avoid this you could use snapshots, which are supported by iSN, but the whole thing gets really complex at some stage, and I’d like to keep things as simple as possible.
So I’d prefer to have two independent file systems and make sure that I only copy files from one file system to the other. If one system breaks, I could then simply repair or recreate it and copy my data onto it.
What happens if the primary volume really fails? How do I access the second volume?
Let’s say your volumes are called XsanVolume1 and XsanVolume2. They share the same size. If XsanVolume1 is mounted on all your servers and your clients, XsanVolume2 needs to be mounted at least on your backup/synchronize machine, because at least one machine needs access to both volumes to copy data from volume 1 to volume 2. In case XsanVolume1 fails, you would use Xsan Admin to unmount XsanVolume1 from all servers and clients, mount XsanVolume2 on all servers and clients, and then use e.g. Remote Desktop or something similar to create a symlink on all machines at /Volumes/XsanVolume1 that points to /Volumes/XsanVolume2. Test this with your specific applications before you try it in a disaster scenario, but it will work with most apps.
You could of course fully automate this, but I would recommend preparing the tools for a manual failover. This way you would be able to investigate the reason for the failure first.
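The symlink step could be prepared as a small script that you push to all machines. The following is a minimal sketch, not a tested failover tool: the function name is made up, it assumes XsanVolume1 has already been unmounted and XsanVolume2 mounted via Xsan Admin, and on a real system it would have to run as root with the /Volumes paths instead of a scratch directory.

```python
import os
import tempfile


def point_failover_link(link_path, target_path):
    """Create link_path as a symlink to target_path without clobbering real data."""
    if not os.path.isdir(target_path):
        raise FileNotFoundError(f"secondary volume not mounted: {target_path}")
    if os.path.islink(link_path):
        os.unlink(link_path)  # stale link left over from an earlier failover
    elif os.path.exists(link_path):
        # A real directory or mount point is still there: do not touch it.
        raise FileExistsError(f"{link_path} still exists; unmount the primary first")
    os.symlink(target_path, link_path)


# Demonstration in a scratch directory instead of /Volumes.
with tempfile.TemporaryDirectory() as root:
    secondary = os.path.join(root, "XsanVolume2")
    primary = os.path.join(root, "XsanVolume1")
    os.mkdir(secondary)
    point_failover_link(primary, secondary)
    print(os.path.islink(primary))
```

The check for an existing non-link path is there so that the script refuses to run while the primary volume is still mounted.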
2. Filesystem Scan vs. Filesystem Events
The functionality that we need to create a copy of our primary volume on the secondary volume is a synchronization function. If we use PresSTORE, we can still decide to support versioning like in a backup, and this would not prevent us from working on the secondary volume directly if we needed to. But right now we just need a way to synchronize our two volumes.
As I said, scanning our primary volume for file changes since the last synchronization puts a considerable amount of load on our system and would probably interrupt real-time I/O like ingest and playout, so a full scan of our volume should only happen at night, on the weekend, or whenever we find the time to do so. A full scan can easily take a couple of hours, depending on the number of files on your file system.
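To see why a full scan is expensive, consider what even the simplest scanner has to do. The sketch below is a generic illustration, not PresSTORE’s actual scan logic; the function name and the size/modification-time comparison are assumptions. It detects new and changed files only, not deletions.

```python
import os


def scan_for_changes(src_root, dst_root):
    """Walk every file under src_root and return the relative paths
    that are missing on dst_root or differ in size or modification time."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            src_stat = os.stat(src)  # one metadata request per file
            try:
                dst_stat = os.stat(dst)
            except FileNotFoundError:
                changed.append(rel)  # file does not exist on the copy yet
                continue
            if (src_stat.st_size != dst_stat.st_size
                    or int(src_stat.st_mtime) > int(dst_stat.st_mtime)):
                changed.append(rel)
    return sorted(changed)
```

Every file costs at least one metadata lookup on the primary volume, whether it changed or not, so the run time grows with the total number of files and the metadata controller has to answer all of these requests.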
The solution to this sounds very easy: let’s use filesystem events instead. This means that each time a file in our filesystem gets created, deleted, or modified, this event will be written into a database. When my next synchronization starts, it just reads the database and copies the modified data to volume 2. No scanning necessary, and instead of let’s say 5 hours of scanning, the process takes a couple of seconds only. Wow!
Xsan has supported filesystem events since version 2.2. Apple doesn’t really tell you about it directly, but you can extract it from their knowledge base articles if you understand that Spotlight uses filesystem events as its foundation. In Xsan environments, every file modification will trigger a filesystem event on the machine which writes the file (if it’s a Mac) as well as on the active metadata controller. The only machine which catches all filesystem events on your Xsan volume is the currently active metadata controller.
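The idea in this paragraph can be sketched in a few lines. This is an idealized illustration only: the journal format (a list of `(kind, relative_path)` tuples) and the function name are assumptions, and it does not describe how Xsan or PresSTORE actually store or deliver events.

```python
import os
import shutil


def apply_event_journal(events, src_root, dst_root):
    """Replay (kind, relative_path) events onto dst_root.

    kind is "created", "modified" or "deleted". Only the last event per
    path matters. Returns the sorted list of paths that were processed.
    """
    last_kind = {}
    for kind, rel in events:
        last_kind[rel] = kind  # later events override earlier ones

    for rel, kind in last_kind.items():
        src = os.path.join(src_root, rel)
        dst = os.path.join(dst_root, rel)
        if kind == "deleted" or not os.path.exists(src):
            if os.path.exists(dst):
                os.remove(dst)  # mirror the deletion on the copy
        else:
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy data and timestamps
    return sorted(last_kind)
```

The work here depends on the number of recorded events, not on the number of files on the volume, which is where the difference between hours and seconds comes from.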
3. How PresSTORE Synchronize 4.x uses Xsan 2.2.x Filesystem Events
Even PresSTORE 3.x supports filesystem events, so why do I need PresSTORE 4.x? In version 3, the filesystem event needed to happen on the machine on which PresSTORE was running. So with PresSTORE 3 you would need to run your synchronization software on the currently active metadata controller to copy data from volume 1 to volume 2. The data copy processes would put so much load on your MDC that its latency during metadata access would increase until your clients might lose frames during ingest or playout. The idea in version 4 is to run the Synchronize module on a normal client system and have a little tool running on the MDC which does nothing but collect all the filesystem events of your active Xsan volumes. Whenever your sync process starts on the PresSTORE machine, it will simply copy the minimal information from the MDC, check if it needs to move some data, and then do the dirty work without impacting the load of the active MDC. You could say that PresSTORE 3’s fsevents implementation was supposed to support local HFS+ volumes only, while version 4 now supports fsevents in Xsan environments, too.
What does it look like? You just install the PresSTORE client software on your MDCs (remember that you need a valid license for each of them), then you set up a synchronize plan which syncs XsanVolume1 to XsanVolume2. In the “Synchronize Options” section of the sync plan you now find the option to set up an fsevents server, which in Xsan environments has to be the currently active MDC:
In case your primary MDC goes down, just select the secondary MDC as fsevents server for your sync plan.
4. Apple’s Implementation of Filesystem Events in Xsan 2.2 (in my understanding) and how to deal with it
There’s the File System Events Programming Guide available on the Apple website, which describes the general fsevents mechanisms Apple uses in HFS+ and Xsan 2.2. At this stage you might think that with fsevents support, PresSTORE would simply get a list of files which have changed and would then work through it. That would be great. If you work in the fsevents team at Apple and read this article, please add this feature – that would be so awesome.
Anyway. What really happens if the Apple fsevents programming guide applies to the Xsan world, too, is this:
The main message here is that in very rare situations the fsevents mechanism might not catch all the file operations on your Xsan volume, so at least from time to time you still need a full scan of your volume. I would recommend using fsevents for your Synchronize module during the day and during normal business operation, and then running full scans as often as possible. If this is once a week, I would feel secure.
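This recommendation can be written down as a small scheduling rule. The sketch below is only an illustration of the policy: the function name, the business hours (weekdays 8:00 to 20:00) and the weekly threshold are assumptions you would adapt, and it is not a PresSTORE feature.

```python
from datetime import datetime, timedelta


def plan_sync(now, last_full_scan, business_start=8, business_end=20,
              max_scan_age=timedelta(days=7)):
    """Return (mode, overdue) for a sync run starting at `now`.

    mode is "events" during business hours and "full_scan" otherwise.
    overdue is True when the last full scan is older than max_scan_age.
    """
    in_business_hours = (now.weekday() < 5
                         and business_start <= now.hour < business_end)
    mode = "events" if in_business_hours else "full_scan"
    overdue = now - last_full_scan > max_scan_age
    return mode, overdue


# A Wednesday morning, two weeks after the last full scan.
print(plan_sync(datetime(2024, 1, 3, 10, 0), datetime(2023, 12, 20)))
```

The `overdue` flag never forces a scan during business hours; it is meant as a warning that the next quiet period should be used for one.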
C. To Do
While Archiware’s implementation of fsevents support in Xsan environments is simply great, the current version doesn’t automatically deal with Xsan Metadata Controller failover. The perfect solution would be if for any Xsan volume you could enter multiple MDCs as fsevents source in a PresSTORE Synchronize plan, and then PresSTORE would automatically detect which one is the currently active server. If you don’t have a split-brain problem with one of your volumes (or run software on one of the MDCs which writes data to your Xsan volume), there will always be a single MDC serving your fsevents, so this is something which would greatly improve the software. If you think that you really need automatic failover detection in PresSTORE 4, please go ahead and file a feature request at Archiware.
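The detection logic I have in mind is simple. The following is a sketch of the wished-for behaviour, not of anything PresSTORE does today: `is_active` is a hypothetical probe that you would have to implement yourself (this article does not cover how to query an MDC), and the host names are placeholders.

```python
def pick_active_mdc(candidates, is_active):
    """Return the one host in candidates for which is_active(host) is true.

    Raises RuntimeError if no MDC or more than one MDC reports as active,
    since the latter would indicate a split-brain situation.
    """
    active = [host for host in candidates if is_active(host)]
    if len(active) != 1:
        raise RuntimeError(f"expected exactly one active MDC, found {active}")
    return active[0]


# Hypothetical probe result: the second controller has taken over.
status = {"mdc1.example.com": False, "mdc2.example.com": True}
print(pick_active_mdc(list(status), status.get))
```

Refusing to continue when zero or several controllers claim to be active is deliberate: in that situation a manual look at the SAN is safer than syncing from the wrong event source.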