Introduction

There are a number of metadata frameworks and indexers, such as Beagle, Kat, Strigi and GLScube, as well as Tracker, a new freedesktop system that is based on this spec and is currently under development. These frameworks provide a rich source of metadata about files, such as the author of a document or the artist of an MP3 file. The purpose of this specification is to define a common metadata naming scheme that each framework can implement, so that applications can tap into this wealth of information. Interested applications include file managers that want to display this metadata and allow it to be edited, and that want to provide integrated search and virtual folders (i.e. folders whose contents are defined by metadata rather than physical location). This specification defines a common set of "well-known" metadata.

Also worth reading is Apple's Spotlight metadata attributes reference.

CommonExtendedAttributes describes common extended attributes that can be used by indexers when retrieving metadata from files.

Metadata

Metadata is usually defined as data about data. In our case the metadata describes data about files that is often user visible in file managers, office applications, document viewers and audio players. Metadata can typically be viewed or written by selecting "properties" from the file menu of one of these applications.

Whilst there are some standards for naming document metadata, like Dublin Core, most desktop applications use a proprietary set of metadata names. This specification will attempt to define a common set of metadata using a mixture of Dublin Core, ID3 for audio files and EXIF for image files, as well as application-specific metadata names. These common metadata names are intended not just for the benefit of metadata frameworks and search engines but also for standardising the display of metadata in all applications.

Metadata rules

The only requirement for metadata names is that they are unique and do not overload or cause confusion with each other. To make this possible, all metadata is namespaced by an appropriate class based on the type of the file or the application name (if the metadata is application specific).

This specification defines only a common subset of all possible metadata. It is not designed to limit what metadata any file can have, nor does it provide any formal names for custom or non-standard metadata other than a namespace class.

None of the metadata defined in this specification is mandatory and the existence of any metadata is dependent on the framework being used and the files being indexed.
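Because no metadata is mandatory, a consuming application has to tolerate missing items. The following is a minimal Python sketch of that idea, not part of the specification. It assumes a framework hands back metadata as a plain dictionary keyed by namespaced name; the function name and the "Audio.Album" name are hypothetical, chosen only for illustration.

```python
def format_properties(metadata, wanted):
    """Return display lines for the wanted names, skipping any that are absent.

    `metadata` is assumed to be a dict of namespaced name -> value, as a
    hypothetical framework might return it for one file.
    """
    lines = []
    for name in wanted:
        value = metadata.get(name)  # None when the framework did not supply it
        if value is None:
            continue  # nothing is mandatory, so absence is not an error
        lines.append(f"{name}: {value}")
    return lines


# The indexer supplied an artist but no album for this file.
indexed = {"Audio.Artist": "Example Artist"}
properties = format_properties(indexed, ["Audio.Artist", "Audio.Album"])
```

Here only the artist line is produced; the missing album is silently skipped rather than treated as a failure.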

All metadata defined here may be used in search strings.

Only metadata that is not derived from the file or file contents may be editable in the interface (applications that want to change non-writable metadata need to modify the embedded metadata in the file's contents themselves).
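The editability rule above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a prescribed implementation: the `MetadataItem` type, its `derived_from_file` flag, and the "Doc.Title" and "File.Keywords" names are all hypothetical. Note that the rule only says non-derived metadata may be editable, so a real framework could restrict editing further.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MetadataItem:
    name: str                # namespaced name, e.g. "Doc.Title" (hypothetical)
    value: str
    derived_from_file: bool  # True if extracted from the file or its contents


def is_editable(item):
    """Derived metadata is never editable through the interface."""
    return not item.derived_from_file


def set_value(item, new_value):
    """Return an updated copy, refusing to change derived metadata."""
    if not is_editable(item):
        # The application would have to rewrite the embedded metadata
        # in the file's contents itself instead.
        raise PermissionError(f"{item.name} is derived from the file")
    return replace(item, value=new_value)


title = MetadataItem("Doc.Title", "Draft", derived_from_file=True)
keywords = MetadataItem("File.Keywords", "work", derived_from_file=False)
updated = set_value(keywords, "work, report")
```

Calling `set_value(title, ...)` raises `PermissionError`, which mirrors the requirement that such changes go through the file's embedded metadata.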

Metadata Data Types

Metadata typically comes in a variety of formats and types. In order to facilitate efficient storage and querying, we need to define a group of data types and formats that all metadata we are interested in can conform to. The basic data types specified here are:

Metadata Namespaces

Each metadata item needs to be namespaced with its class type using a "." qualifier (e.g. Audio.Artist represents the metadata Artist for an audio class file). Metadata that is strictly application specific should use a namespace class based on the application name (e.g. "Nautilus.Window_Geometry").
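As an illustration of the "." qualifier, the Python sketch below splits a namespaced name into its class and item name. The helper names are hypothetical, and the set of built-in classes is an assumption limited to the four class prefixes named in this document ("File", "Audio", "Doc" and "Image"); the specification's full list may be longer.

```python
# Assumption: only the class prefixes named in this document's sections.
BUILTIN_CLASSES = {"File", "Audio", "Doc", "Image"}


def split_name(qualified):
    """Split 'Class.Name' at the first '.' into (class, name)."""
    cls, sep, name = qualified.partition(".")
    if not sep or not cls or not name:
        raise ValueError(f"not a namespaced metadata name: {qualified!r}")
    return cls, name


def is_builtin(qualified):
    """True if the name's class is one of the assumed built-in classes."""
    cls, _ = split_name(qualified)
    return cls in BUILTIN_CLASSES


audio = split_name("Audio.Artist")
app_specific = split_name("Nautilus.Window_Geometry")
```

Under these assumptions, "Audio.Artist" falls in a built-in class, while "Nautilus.Window_Geometry" is treated as application specific because its class is the application name.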

This specification defines the following built-in classes:

Generic File Metadata

Generic file metadata is applicable to all files regardless of their format. The specified metadata uses a few Dublin Core based types where applicable with the rest being custom ones. Generic file metadata types are namespaced with the "File" class. Only some of the generic metadata may be writable. Custom metadata not listed below that is generic and applies to all files should also be namespaced with the "File" class unless it is strictly application specific.

Audio Metadata

Audio metadata is based on the widespread ID3.1 tags embedded in mp3, ogg and similar files. These are already defined in that specification. All metadata in this section is prefixed with "Audio", and it is recommended that any other audio-related metadata not listed below also use this prefix (unless it is application specific).

Document Metadata

For documents, applications have typically used a mixture of Dublin Core types and proprietary types. To be consistent with them, we have based our metadata names on the same mixture, as well as on metadata names found in OpenOffice, MS Word and PDF documents. All metadata in this section is prefixed with the "Doc" class and any other document-based metadata should also have this prefix (unless it is application specific). None of the metadata here is editable through the interface, as all of it is derived from the file contents.

Image Metadata

Most image formats support the EXIF standard, so EXIF makes up the bulk of this section. SVG files have user-definable, non-standard metadata, so a subset of Dublin Core is also provided here. All metadata in this section is prefixed with the "Image" class and any other image-based metadata should also have this prefix (unless it is application specific). None of the metadata here is editable through the interface, as all of it is derived from the file contents.