[nepomuk-kde] Thoughts on Portable Meta Information
Wang Hoi
zealot.hoi at gmail.com
Wed Apr 1 08:19:17 CEST 2009
Hello, I am forwarding a post from my blog about portable
meta-information storage using file forks/resource forks.
Portable meta information has been discussed twice recently on Planet KDE:
http://www.kdedevelopers.org/node/3923
http://zwabel.wordpress.com/2009/03/29/portable-meta-information/
Portable meta information means that a file's meta information is not
stored centrally (though it may still be indexed in a central database
to speed up queries).
It brings some benefits:
1. You can move files around without losing their meta information.
For now I am only concerned with moving files locally.
2. When you reinstall your system or your central database breaks, the
meta information is not lost as long as your data files are still
there or have been backed up.
3. The meta information is a file, not something inside a database. As
long as we standardize the file format, apps/libraries don't need to
learn Nepomuk queries or manipulate the database to perform basic meta
info operations (backup, removal, etc.).
My thoughts, from a practical point of view:
1. Will Nepomuk really be more usable with portable/distributed
storage of meta information?
I don't know much about Nepomuk's server implementation. AFAIK it uses
a central database to store and index this information, so I have some
questions:
When a user moves a file to another local folder (maybe on another
filesystem), will the file's meta information get lost, be updated
later (at the next indexing run), or be updated instantly (using
inotify)?
Is there any meta info import/export support?
2. How to implement portable meta information?
Some people have suggested using a side-car file (like .aaa.meta),
others have suggested xattrs.
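
To make the side-car idea concrete, here is a minimal Python sketch. It
assumes the hidden ".<name>.meta" naming from the example above; the
helper names sidecar_path and move_with_sidecar are made up for
illustration and are not part of Nepomuk. It also shows the weak point:
the meta information only survives a move if the tool doing the move
knows about the side-car file.

```python
import os
import shutil
import tempfile


def sidecar_path(path):
    """Return the hidden side-car path for a data file, e.g. aaa -> .aaa.meta."""
    directory, name = os.path.split(path)
    return os.path.join(directory, "." + name + ".meta")


def move_with_sidecar(src, dst):
    """Move a data file and, if present, its side-car file along with it."""
    src_meta = sidecar_path(src)
    shutil.move(src, dst)
    if os.path.exists(src_meta):
        shutil.move(src_meta, sidecar_path(dst))


def demo():
    """Move a tagged file between two directories and read the tag back."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "a"))
        os.mkdir(os.path.join(root, "b"))
        src = os.path.join(root, "a", "aaa")
        dst = os.path.join(root, "b", "aaa")
        with open(src, "w") as f:
            f.write("data")
        with open(sidecar_path(src), "w") as f:
            f.write("tag=holiday")
        move_with_sidecar(src, dst)
        with open(sidecar_path(dst)) as f:
            return f.read()
```

A plain mv or cp of the data file alone would leave the side-car file
behind.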
There seems to be a better way to implement it: file forks/resource
forks (not process forks),
http://en.wikipedia.org/wiki/Fork_(filesystem)
http://en.wikipedia.org/wiki/Resource_fork
They were invented by Apple and can store a text file's encoding, a
music file's external lyrics/tags, an application's icons ...
Quote:
Apple's HFS, and the original Apple Macintosh file system MFS, were
designed to allow a file to have a resource fork to store metadata
that would be used by the system's graphical user interface (GUI),
such as a file's icon to be used by the Finder or the menus and dialog
boxes associated with an application.
A fork is tightly bound to (embedded in) its file, unlike a side-car
file, and it also bypasses the size limit of xattrs (extended file
attributes): a fork can be as large as the largest file you can
create, and you can also attach several meta info files to one file.
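
For comparison, this is roughly what the xattr route looks like from
user space on Linux today. It is only a sketch: os.setxattr and
os.getxattr are real Python calls but exist only on Linux, the
attribute name "user.encoding" is just an example, and the helper
tag_with_xattr is hypothetical. The call fails with OSError when the
filesystem has no xattr support or the value is too large, which is
exactly the limit forks would avoid.

```python
import os
import tempfile


def tag_with_xattr(path, name, value):
    """Try to store value (bytes) as xattr `name`; return True on success."""
    if not hasattr(os, "setxattr"):  # os.setxattr is only available on Linux
        return False
    try:
        os.setxattr(path, name, value)
        return True
    except OSError:
        # Raised when the filesystem lacks xattr support or the value is too big.
        return False


def demo():
    """Tag a temporary file and read the tag back, or return None if unsupported."""
    with tempfile.NamedTemporaryFile() as f:
        if tag_with_xattr(f.name, "user.encoding", b"utf-8"):
            return os.getxattr(f.name, "user.encoding")
        return None
```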
This method is proven in practice, but only modern (advanced)
filesystems support it: only Apple's HFS, Microsoft's NTFS and
Solaris's ZFS have full support now.
It is used extensively by Apple (to store all sorts of metadata) and
by Microsoft (to store system backup related information and security
control info).
It is worth mentioning that Mac OS X's Unix command line utilities
(cp, mv, ...) handle files with resource/file forks correctly.
Microsoft calls the feature "alternate data streams" (ADS).
You can refer to their developer docs to see how they designed the API.
Unfortunately the main Linux filesystems (ext2/3/4, XFS, JFS, ...)
don't support it well.
3. Is it possible/hard to implement under Linux? (You can skip the
following paragraphs if you're not interested in implementing such
things in the kernel.)
In my personal opinion it is not hard, if I wanted to do it :) and
assuming it is implemented at the VFS level (so that all the
filesystems beneath this level gain support).
The simplest way is to add a "dentry" to each "general inode", so that
from the filesystem's view (not visible to the user) a regular file
can also be associated with a "directory", and all its metadata files
reside under that "directory".
Like this:
/home/xx/xxx/a.jpeg ----> nepomuk.xml
                          user.encoding
                          user.img.source.url
                          ....
Then make the getxattr/setxattr system calls (the ones behind
getfattr/setfattr) look up that "dentry" too. This method remains
backward compatible with xattrs (extended file attributes) but removes
the size limit put on them.
Or we could add new system calls like getfmeta/setfmeta ....
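
The layout above can be imitated in user space to get a feel for the
interface. The following Python sketch is an emulation only, not kernel
code: the hidden ".<name>.forks" directory stands in for the extra
"dentry" and is a made-up convention, and getfmeta/setfmeta/listfmeta
simply borrow the hypothetical system call names from the text.

```python
import os
import tempfile


def _fork_dir(path):
    """Hypothetical hidden directory standing in for the per-inode "dentry"."""
    directory, name = os.path.split(path)
    return os.path.join(directory, "." + name + ".forks")


def setfmeta(path, name, value):
    """Store value (bytes) as the metadata file `name` attached to path."""
    if os.sep in name or name in ("", ".", ".."):
        raise ValueError("invalid metadata name: %r" % name)
    os.makedirs(_fork_dir(path), exist_ok=True)
    with open(os.path.join(_fork_dir(path), name), "wb") as f:
        f.write(value)


def getfmeta(path, name):
    """Return the bytes stored under `name` for path."""
    with open(os.path.join(_fork_dir(path), name), "rb") as f:
        return f.read()


def listfmeta(path):
    """List the metadata names attached to path (empty if there are none)."""
    try:
        return sorted(os.listdir(_fork_dir(path)))
    except FileNotFoundError:
        return []


def demo():
    """Attach a small and a large metadata file to one image."""
    with tempfile.TemporaryDirectory() as root:
        image = os.path.join(root, "a.jpeg")
        open(image, "wb").close()
        setfmeta(image, "user.encoding", b"utf-8")
        # 1 MiB, larger than the 64 KiB cap Linux places on one xattr value.
        setfmeta(image, "nepomuk.xml", b"x" * (1024 * 1024))
        return (listfmeta(image),
                getfmeta(image, "user.encoding"),
                len(getfmeta(image, "nepomuk.xml")))
```

Unlike a real fork, this emulation still has the side-car problem: the
hidden directory is not moved or copied together with the file.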
Of course, we need more analysis/profiling before saying anything
about memory/time performance.
Just as a prediction: the extra memory it requires (an extra "dentry"
pointer) is affordable, there is no or very little extra time needed
by regular system calls such as open/write, and it will be faster than
creating side-car files, since we don't need to create/open the
side-car files from user space and pollute a directory with one
side-car file per data file; the kernel handles it.
There are also security concerns to address when implementing it.
Anyway, to implement portable meta information, I think we need
support from the underlying library/filesystem/architecture.
4. Disadvantages of this method:
a. There is no POSIX standard defining the API, and Apple's HFS,
Microsoft's NTFS and Solaris's ZFS use different APIs for similar
functionality. We may need to abstract the API to keep KDE
cross-platform.
b. No major filesystem under Linux provides file fork / resource fork
support now. We would need to implement it or make a feature request
to the kernel people ....
c. We can't directly attach an XML file as a file's meta info: since
XML is not an appendable format, data corruption may happen if the
user ejects the source disk, the power suddenly goes off, etc. while
we are updating the file's meta info. These problems need to be solved
if we want to support portable/distributed meta information in
practice.
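
A minimal Python sketch of one possible workaround for (c): instead of
rewriting an XML document in place, keep the meta info as an
append-only log with one JSON record per line. The record layout, the
".metalog" file name and the names append_meta/load_meta are
assumptions for illustration, not an existing Nepomuk format. An
interrupted write can only damage the last line, which the reader
skips.

```python
import json
import os
import tempfile


def append_meta(log_path, key, value):
    """Append one key/value record; existing bytes are never rewritten."""
    record = json.dumps({"key": key, "value": value})
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()
        os.fsync(f.fileno())  # push the record to disk before returning


def load_meta(log_path):
    """Replay the log; later records win, unparsable lines are skipped."""
    meta = {}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            try:
                record = json.loads(line)
                meta[record["key"]] = record["value"]
            except (ValueError, KeyError, TypeError):
                continue  # e.g. a record cut short by power loss
    return meta


def demo():
    """Write two records, simulate a torn third write, then reload."""
    with tempfile.TemporaryDirectory() as root:
        log_path = os.path.join(root, "a.jpeg.metalog")
        append_meta(log_path, "rating", 3)
        append_meta(log_path, "rating", 5)
        with open(log_path, "a", encoding="utf-8") as f:
            f.write('{"key": "tag", "val')  # simulate a torn final write
        return load_meta(log_path)
```

A real implementation would also have to handle a torn line that is
followed by later appends (for example by truncating it when the log
is opened) and compact the log from time to time.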
---------------------------------------------------------------------------------------
Thanks for reading.
Comments are welcome,
Regards,