InfoWorld magazine reported today on a leak from none other than Jonathan Schwartz (CEO of Sun Microsystems) that Apple would announce the use of Sun's ZFS technology in the next iteration of Mac OS X at the WWDC next week. (see here) In fact, he described it as integral to the new Time Machine feature built into the OS. This set me to thinking. What exactly is ZFS? Why would it be integral to Time Machine? How does Time Machine work?
The thought that ZFS would become the file system of choice in Leopard is not a new idea. When Steve Jobs demonstrated Time Machine at the WWDC last year, techie brains got to churning and reached this conclusion. It was quickly debunked by Apple. John Siracusa has an excellent write-up of all of the details and speculation concerning ZFS and Leopard at Ars Technica.
But if Apple has insisted that Time Machine does not rely on ZFS, how could it work? Though a superior file system, HFS+ has limitations that would make a product such as Time Machine seem nearly impossible. John points out several of the flaws of using Time Machine with large media files in his article.
The assumption is that Leopard keeps multiple copies of files so that the user can go back to a specific point in time and recover altered or deleted data. Keeping full copies this way could easily fill even the largest MacBook Pro drive. Is there a better way to achieve these results without sacrificing capacity? I think there is.
This is just a theory that I developed on my own. It has not been tested or verified. But what if Time Machine used a system similar to Aperture?
Aperture is a rather simple design in a feature-rich product. Photos can be directly imported in RAW format from a digital camera. These masters are locked and never touched. Any changes that the user makes to color balance, white balance, hue, saturation, etc. are stored in a separate file. This allows a photographer to revert to the original or create numerous versions of a photograph without requiring massive amounts of disk space for each one. The versions are merely scripts recording the settings that have been applied to each master. Because they do not include the original uncompressed image file, these script files are rather small.
Time Machine could easily apply the same technology to other files within the OS. When you create a Keynote presentation and save it for the first time, a full master of the presentation is created. It would be rather simple for the OS to record each later modification in a separate change file, which would act as a filter the next time the file was opened. Since a change file would not need to contain all of the original information, only the changes, it would stay small. When a user opens the document, the file system would look for any change files or filters and apply them, in order, on top of the master. To revert to a previous version, you would start peeling off filters until you reach the desired result.
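To make the idea concrete, here is a minimal sketch in Python of a master plus a stack of filters. It is only an illustration of my theory, not how Time Machine or Aperture is actually built. The class name VersionedDocument and its methods are hypothetical, and a document is modeled as a simple dictionary of properties.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedDocument:
    """Hypothetical model: a locked master plus a stack of small change files."""
    master: dict                                  # written once, never modified
    filters: list = field(default_factory=list)   # one entry per later save

    def save_changes(self, changes: dict) -> None:
        # Store only what changed, not a full copy of the document.
        self.filters.append(dict(changes))

    def open(self, version=None) -> dict:
        # Apply the first `version` filters on top of the master (all by default).
        count = len(self.filters) if version is None else version
        state = dict(self.master)
        for changes in self.filters[:count]:
            state.update(changes)
        return state

    def revert(self, steps: int = 1) -> None:
        # "Peel off" the most recent filters; the master is untouched.
        if steps > 0:
            del self.filters[-steps:]


doc = VersionedDocument(master={"title": "Q1", "slides": 10})
doc.save_changes({"slides": 12})
doc.save_changes({"title": "Q1 final"})
current = doc.open()             # master with both filters applied
earlier = doc.open(version=1)    # master with only the first filter applied
```

Each saved filter holds only the properties that changed, so the master is stored once no matter how many versions exist.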
Until now, providing this functionality was application specific. To make it work across applications, each would have to be developed with the technology. Alternatively, the operating system can be developed to make this a de facto behavior for file creation. Modifying the OS to perform the task extends the functionality beyond applications to file folders and OS specific behaviors.
We would still need to consider a limitation of most traditional file systems: a block on the hard drive can hold data from only one file, so every file - regardless of how small - occupies at least one full block. No other file can use the leftover space in that block. The thousands of small files created by the procedure described above could eventually choke a file system, each one consuming more space than it actually needs.
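A little arithmetic shows how quickly this adds up. The sketch below assumes a 4 KB block size and 200-byte change files; both numbers are my own assumptions for illustration, not measurements from any real volume.

```python
def allocated_bytes(file_size: int, block_size: int = 4096) -> int:
    """Space a file occupies when it must use whole blocks (assumed 4 KB each)."""
    blocks = (file_size + block_size - 1) // block_size  # round up to whole blocks
    return blocks * block_size


def slack_bytes(file_sizes, block_size: int = 4096) -> int:
    """Total space allocated but not used by actual file data."""
    return sum(allocated_bytes(size, block_size) - size for size in file_sizes)


# 10,000 hypothetical change files of 200 bytes each.
change_files = [200] * 10_000
data = sum(change_files)            # 2,000,000 bytes of real data
wasted = slack_bytes(change_files)  # 38,960,000 bytes of unused block space
```

Under these assumptions, about 2 MB of change data would tie up roughly 41 MB of disk, nearly all of it empty space inside blocks.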
We still have to consider Jonathan Schwartz's revelation. In a different manner than described above, ZFS can achieve the same results - and more. By design, ZFS stores small files more efficiently, without running into the one-file-per-block limitation. Whether the limitation is broken or the block size reduced, I do not know without doing further research. ZFS also introduces the concept of snapshots, which are like on-the-fly backups of file changes, so that files can be restored more easily without having to resort to recovery from tape backup.
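The snapshot idea can be pictured with a toy copy-on-write model. This is only a sketch of the general concept, not ZFS's on-disk design; the class SnapshotFS and its methods are hypothetical names of my own.

```python
class SnapshotFS:
    """Toy copy-on-write file table; not how ZFS is implemented internally."""

    def __init__(self):
        self._live = {}        # path -> contents for the current state
        self._snapshots = {}   # snapshot name -> frozen file table

    def write(self, path: str, data: str) -> None:
        # Build a new table instead of changing the old one in place,
        # so any snapshot still pointing at the old table is unaffected.
        table = dict(self._live)
        table[path] = data
        self._live = table

    def delete(self, path: str) -> None:
        table = dict(self._live)
        table.pop(path, None)
        self._live = table

    def snapshot(self, name: str) -> None:
        # Taking a snapshot copies no file data; it just keeps a reference.
        self._snapshots[name] = self._live

    def read(self, path: str, snapshot: str = None):
        table = self._live if snapshot is None else self._snapshots[snapshot]
        return table.get(path)


fs = SnapshotFS()
fs.write("talk.key", "draft")
fs.snapshot("monday")
fs.write("talk.key", "final")
fs.delete("notes.txt")
old_version = fs.read("talk.key", snapshot="monday")  # contents as of the snapshot
```

The point of the sketch is that a snapshot is nearly free to take, and older contents stay reachable only because later writes never overwrite them.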
By far one of the largest advantages of ZFS for me would be the ability to increase the capacity of the hard drive by adding more drives. Imagine having a laptop with only 40GB of data storage. You plug in an external 100GB drive and your capacity increases to 140GB. But rather than the traditional method of accessing each device separately, ZFS combines them into one large volume. A movie file, for example, could extend across devices, whereas in other file systems it must be stored in its entirety on one device. This allows you to use every available block on your storage media and makes the added capacity transparent to the user. Tremendous!
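Here is a rough sketch of the pooling idea using the 40GB and 100GB drives from the example. The StoragePool class is hypothetical, and its fill-the-first-drive-first placement is a deliberate simplification of mine, not ZFS's actual allocation strategy.

```python
class StoragePool:
    """Toy model of several drives presented as one volume (sizes in GB)."""

    def __init__(self):
        self.devices = {}  # device name -> free space in GB

    def add_device(self, name: str, capacity_gb: int) -> None:
        self.devices[name] = self.devices.get(name, 0) + capacity_gb

    @property
    def free_gb(self) -> int:
        # The user sees a single number, not one per drive.
        return sum(self.devices.values())

    def store(self, size_gb: int) -> dict:
        """Place a file, letting it span drives; returns GB used per drive."""
        if size_gb > self.free_gb:
            raise ValueError("not enough space in the pool")
        placement, remaining = {}, size_gb
        for name, free in self.devices.items():
            used = min(free, remaining)
            if used:
                placement[name] = used
                self.devices[name] -= used
                remaining -= used
        return placement


pool = StoragePool()
pool.add_device("internal", 40)
pool.add_device("external", 100)
total = pool.free_gb          # 140
movie = pool.store(60)        # spans both drives: 40 on internal, 20 on external
```

A 60GB movie would not fit on the 40GB internal drive alone, but in the pooled model it is simply split across both devices.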
I am waiting anxiously for the Steve Jobs keynote next week to hear the latest revelations from Apple. What great new things are on the horizon? Just tell me when and how much and I will be there with my checkbook.