On Hierarchical File Systems and Storage Location
2009-08-23
The problem with hierarchical file systems
It’s clear that hierarchical file systems are not good for organizing music or photos. The problem, in short, is that such collections have several attributes of interest (artist, album, genre and year for music; date, place and people for photos), and no single one of them is better suited than the others to provide the structure. The same problem applies to all kinds of documents, but there the needs and wants regarding meta-data vary more and are harder to tackle.
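To make the point concrete, here is a minimal Python sketch with made-up sample data. A hierarchical layout has to commit to one ordering of attributes (here artist/album/title, an arbitrary choice), while an attribute query can combine any of them. The `Track` class and the helpers `as_path`, `under` and `query` are hypothetical illustrations, not an existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    title: str
    artist: str
    album: str
    genre: str
    year: int


# Small illustrative library.
TRACKS = [
    Track("So What", "Miles Davis", "Kind of Blue", "Jazz", 1959),
    Track("Blue in Green", "Miles Davis", "Kind of Blue", "Jazz", 1959),
    Track("Giant Steps", "John Coltrane", "Giant Steps", "Jazz", 1960),
    Track("Paranoid Android", "Radiohead", "OK Computer", "Rock", 1997),
]


def as_path(track: Track) -> str:
    # A hierarchy forces one privileged order: artist, then album, then title.
    return f"/Music/{track.artist}/{track.album}/{track.title}"


def under(prefix: str, tracks=TRACKS) -> list:
    # Only questions that match the chosen order map to a single subtree.
    return [p for p in map(as_path, tracks) if p.startswith(prefix)]


def query(tracks=TRACKS, **criteria) -> list:
    # Attribute lookup: any combination of attributes, none of them privileged.
    return [t for t in tracks
            if all(getattr(t, key) == value for key, value in criteria.items())]


print(under("/Music/Miles Davis/"))            # one subtree: easy
print([t.title for t in query(genre="Jazz")])  # cuts across several artist folders
```

Finding all jazz tracks is trivial as a query, but in the path layout it means walking every artist directory. Choosing genre as the top level instead would only move the problem to a different question.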
Specialized library-management applications, one for each kind of media, lead to a fractured user experience and data islands. It should be possible to exchange objects with meta-data over the net, without the user having to take special care. There needs to be a generalized solution.
As long as hierarchical file systems remain as the bottom layer, there will be cases where users are exposed to them: installation, backup and restore, and generally every case that concerns the physical location of data. The result is a leaky abstraction, and the fewer concepts the user has to deal with, the better.
I think object storage with attributes belongs at the file system level, so that applications can build on a common foundation and no lower layer pokes through.
Storage location as attribute
With current file systems, storage location is implied by a file’s path. On Unix-family systems, with their single-rooted file system tree, it’s not even obvious which drive a given path maps to. One symptom of this is that dragging files in a file manager from one place to another either moves or copies them, depending on whether both places are on the same drive. A user needs quite a bit of background knowledge to understand this behavior.
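The move-or-copy behavior follows from how a program can tell whether two places share a drive: `os.stat()` reports the device a path lives on as `st_dev`, and a plain rename only works within one file system (across file systems, `os.rename` raises `OSError`). The sketch below assumes that common default; real file managers differ, and most let a modifier key override it. `same_device` and `drag_and_drop` are hypothetical helpers, not any file manager’s actual code.

```python
import os
import shutil


def same_device(path_a: str, path_b: str) -> bool:
    # st_dev identifies the device (file system) that holds each path.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def drag_and_drop(src: str, dst_dir: str) -> str:
    """Mimic a typical default: move within one file system, copy across.

    Returns "move" or "copy" so the caller can see which one happened.
    """
    target = os.path.join(dst_dir, os.path.basename(src))
    if same_device(src, dst_dir):
        os.rename(src, target)      # cheap: only the directory entry changes
        return "move"
    shutil.copy2(src, target)       # data has to be duplicated onto the other drive
    return "copy"
```

The path alone gives the user no hint about which branch will run; the answer depends on mount points that are invisible in a single-rooted tree.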
I think storage location should be dealt with explicitly. To do so, it could be presented to the user as just another attribute in the form of a list of one or more locations.
- Storage locations could be partitions on various media, or servers that do not reveal actual physical location.
- Locally, there would need to be a catalog of all known objects along with their lists of storage locations. It would have to be updated whenever remote locations change.
- Each location would have a catalog of its own content. However, I’m not sure whether these catalogs should also contain, for each object, the complete list of its other locations.
- To identify objects across several locations, each object would need a unique ID as its sole immutable attribute.
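A minimal sketch of the catalog idea, assuming simple in-memory structures: every object carries a random UUID as its only invariant attribute, storage locations are an ordinary set-valued attribute, and the local catalog is reconciled against the listing that each location reports about its own content. All class, method and location names (`StoredObject`, `LocalCatalog`, `apply_listing`, "usb-backup" and so on) are hypothetical. The open question of whether per-location catalogs should also list other locations is left open here as well; the sketch only uses each location’s own content.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    # The ID is the only invariant attribute; everything else may change.
    oid: uuid.UUID = field(default_factory=uuid.uuid4)
    attributes: dict = field(default_factory=dict)
    locations: set = field(default_factory=set)  # storage location as an attribute


class LocalCatalog:
    """All objects known on this machine, with the locations holding each one."""

    def __init__(self):
        self.objects = {}  # uuid.UUID -> StoredObject

    def add(self, obj: StoredObject) -> None:
        self.objects[obj.oid] = obj

    def apply_listing(self, location: str, present_ids: set) -> None:
        """Reconcile with the catalog a location reports for its own content."""
        for oid, obj in self.objects.items():
            if oid in present_ids:
                obj.locations.add(location)
            else:
                obj.locations.discard(location)
        # Objects first seen via this location; their other attributes are unknown.
        for oid in set(present_ids) - set(self.objects):
            self.objects[oid] = StoredObject(oid=oid, locations={location})

    def stored_at(self, location: str) -> list:
        return [o for o in self.objects.values() if location in o.locations]


catalog = LocalCatalog()
photo = StoredObject(attributes={"kind": "photo", "place": "Berlin"},
                     locations={"laptop-ssd"})
catalog.add(photo)
# A backup drive reports that it now holds the photo:
catalog.apply_listing("usb-backup", {photo.oid})
print(sorted(photo.locations))
```

Because objects are matched by ID rather than by path, renaming or re-tagging an object never breaks the link between the local catalog and the remote listings.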
Combined with versioning, this could be an elegant way to deal with:
- Downloads: store the origin of the initial version along with the date of retrieval.
- Uploads to file hosts or directly to contacts: this is just adding a location for a specific version.
- Constant mirroring of important data to another local location without any special tools.
- Tracking backups: keeping note of which versions of objects are stored on which removable media.
- Mirroring/synchronization. Remote locations could be coupled with a schedule for synchronization.
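A sketch of how versioning could combine with location attributes, under the same in-memory assumptions: locations are attached to individual versions, so a download records its origin and retrieval date, while uploads, mirroring and backups all reduce to adding a location to one version. The synchronization schedule is simplified to a fixed interval. All names, the URL and the media labels are made up for illustration.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Version:
    number: int
    locations: set = field(default_factory=set)
    origin: Optional[str] = None          # where a downloaded version came from
    retrieved: Optional[datetime] = None  # date of retrieval


@dataclass
class VersionedObject:
    oid: uuid.UUID = field(default_factory=uuid.uuid4)
    versions: list = field(default_factory=list)

    def new_version(self, location, origin=None, retrieved=None) -> Version:
        v = Version(len(self.versions) + 1, {location}, origin, retrieved)
        self.versions.append(v)
        return v

    def add_location(self, number: int, location: str) -> None:
        # Upload, mirror or backup: one more location for one specific version.
        self.versions[number - 1].locations.add(location)


def versions_on(objects, location: str) -> list:
    """Backup tracking: which object versions does a given medium hold?"""
    return [(o.oid, v.number) for o in objects
            for v in o.versions if location in v.locations]


@dataclass
class SyncSchedule:
    location: str
    interval: timedelta
    last_sync: datetime

    def is_due(self, now: datetime) -> bool:
        return now - self.last_sync >= self.interval


doc = VersionedObject()
doc.new_version("laptop-ssd", origin="https://example.com/paper.pdf",
                retrieved=datetime(2009, 8, 23))
doc.add_location(1, "dvd-2009-08")  # version 1 burned to a backup DVD
doc.new_version("laptop-ssd")       # local edit, not backed up yet
print(versions_on([doc], "dvd-2009-08"))
```

Keeping locations per version rather than per object is what lets the catalog answer "is my latest edit backed up anywhere?" without any special backup tool.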