Metadata in Greenstone

Metadata is data about data. In Greenstone, it is information about a document that may be separate from the document's content: typically title, author, creation date, and so on. Adding metadata to documents is an extremely important part of building digital collections. Metadata can help users navigate collections and find the information and documents they need. Metadata can also provide important contextual and provenance information about documents. This section explains how metadata is created, edited, assigned and retrieved, and how to use external metadata sources.

Adding Metadata Using GLI

The easiest way to add metadata is using the Greenstone Librarian Interface (GLI). In GLI, you create or open a collection, add documents to it, and add metadata to the documents. The Enriching Your Collection With Metadata section of the GLI help covers topics such as adding metadata to documents, importing documents that already have Greenstone metadata, editing metadata, and reviewing assigned metadata.

Metadata Sets

All metadata fields in Greenstone belong to a metadata set, which is simply a pre-defined collection of metadata fields. Because sets will often have metadata fields with the same name (for instance, most sets will have a 'Title' field), namespaces are used to distinguish between metadata from different sets. For instance, all metadata fields in Dublin Core are preceded by dc. (dc.Title, dc.Creator, etc.). Metadata sets are stored in the Librarian Interface's metadata folder and have the suffix ".mds".

The default metadata sets for new collections are Dublin Core (dc), the Greenstone Metadata Set (gs), and the Extracted Greenstone Metadata Set (ex). The Extracted set is unique because it contains metadata that is automatically generated during the collection building process. Because these values are extracted from the documents themselves, they cannot be edited. Metadata fields in the extracted set can also be referred to without a namespace (so referencing Title is the same as referencing ex.Title).
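The namespace rule described above can be sketched in a few lines of Python. This is only an illustration, not Greenstone's own code: the helper names split_metadata_name and is_editable are hypothetical, and the sketch assumes the namespace is whatever precedes the first dot in the field name.

```python
from typing import Tuple

# Per the text above, a name without a namespace refers to the Extracted set.
DEFAULT_NAMESPACE = "ex"


def split_metadata_name(name: str) -> Tuple[str, str]:
    """Hypothetical helper: return (namespace, field) for a metadata name.

    Assumption: the namespace is the part before the first dot.
    """
    namespace, separator, field = name.partition(".")
    if not separator:
        # No dot at all, so fall back to the Extracted set.
        return DEFAULT_NAMESPACE, name
    return namespace, field


def is_editable(name: str) -> bool:
    """Hypothetical helper: extracted metadata cannot be edited."""
    namespace, _ = split_metadata_name(name)
    return namespace != DEFAULT_NAMESPACE


print(split_metadata_name("dc.Title"))
print(split_metadata_name("Title"))
print(is_editable("dc.Creator"), is_editable("Title"))
```

With this rule, Title and ex.Title resolve to the same field, while dc.Title stays distinct from both.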

The Metadata sets section of GLI help explains how to manage metadata sets in GLI. The metadata sets page in the user guide looks at all the metadata sets currently defined for Greenstone. It also describes GEMS, the Greenstone Editor for Metadata Sets, which can be used to create new metadata sets or add new elements to an existing one.

Metadata database files

You may have metadata for your documents that is not in Greenstone metadata.xml form, for example, in MARC, OAI, or CSV (which can be created from a spreadsheet) format. Greenstone has to process this metadata before it can be associated with the documents in your collection. There are several different options for processing these metadata files, depending on whether you want to be able to view and/or edit the metadata in the GLI. You can read more about these different options on the metadata database files page.
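To make the idea of associating external metadata with documents more concrete, here is a minimal Python sketch that turns CSV rows into a metadata.xml-style structure. It is not one of Greenstone's processing options. The CSV column layout, the function name csv_to_directory_metadata, and the sample values are all hypothetical, and the element names (DirectoryMetadata, FileSet, FileName, Description, Metadata) are an assumption about the metadata.xml layout that should be checked against a file written by GLI.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical spreadsheet export: a "Filename" column plus one column
# per metadata field. The second row has no dc.Creator value.
SAMPLE_CSV = """Filename,dc.Title,dc.Creator
report.pdf,Annual Report,Jane Smith
notes.txt,Meeting Notes,
"""


def csv_to_directory_metadata(csv_text: str,
                              filename_column: str = "Filename") -> ET.Element:
    """Build one FileSet element per CSV row, skipping empty cells.

    Assumption: element and attribute names mirror metadata.xml, and
    file names are written literally, exactly as they appear in the CSV.
    """
    root = ET.Element("DirectoryMetadata")
    for row in csv.DictReader(io.StringIO(csv_text)):
        fileset = ET.SubElement(root, "FileSet")
        ET.SubElement(fileset, "FileName").text = row[filename_column]
        description = ET.SubElement(fileset, "Description")
        for field, value in row.items():
            if field == filename_column or not value:
                continue  # skip the file name column and empty cells
            meta = ET.SubElement(description, "Metadata",
                                 name=field, mode="accumulate")
            meta.text = value
    return root


tree = csv_to_directory_metadata(SAMPLE_CSV)
print(ET.tostring(tree, encoding="unicode"))
```

Each row becomes one file entry, and each non-empty cell becomes one namespaced metadata value attached to that file.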

Greenstone archive format

During collection "importing", all source documents are brought into the Greenstone system by converting them to a format known as the Greenstone Archive Format (alternatively, you can choose to use Greenstone's METS profile). This is an XML style that divides documents into sections, and can hold metadata at the document or section level. During collection "building" these archive documents are processed, and the content is indexed and classified. See the Greenstone metadata formats page for more information.
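The idea of sections that each carry their own metadata can be sketched as follows. The sample XML is a hypothetical, heavily simplified stand-in for an archive file (the element names Archive, Section, Description, Metadata and Content are assumptions here, and real files contain more detail), and walk_sections is an illustrative helper, not part of Greenstone.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified archive document: the outer Section holds
# document-level metadata, the nested Section holds section-level metadata.
SAMPLE_ARCHIVE = """<Archive>
  <Section>
    <Description>
      <Metadata name="Title">Annual Report</Metadata>
    </Description>
    <Content></Content>
    <Section>
      <Description>
        <Metadata name="Title">Introduction</Metadata>
      </Description>
      <Content>Welcome to the report.</Content>
    </Section>
  </Section>
</Archive>"""


def walk_sections(section, depth=0):
    """Yield (depth, metadata, content) for a section and its subsections."""
    metadata = {}
    # Only this section's own Description is read, not those of subsections.
    for item in section.findall("Description/Metadata"):
        metadata.setdefault(item.get("name"), []).append(item.text or "")
    content = (section.findtext("Content") or "").strip()
    yield depth, metadata, content
    for child in section.findall("Section"):
        yield from walk_sections(child, depth + 1)


root = ET.fromstring(SAMPLE_ARCHIVE)
sections = list(walk_sections(root.find("Section")))
for depth, metadata, content in sections:
    print("  " * depth + str(metadata), repr(content))
```

Depth 0 corresponds to document-level metadata and deeper levels to section-level metadata, which is the distinction the archive format preserves for the building step.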

Using metadata

You can make the most of the metadata you've added to your collection by using it in several different ways. You can create Browsing Classifiers to allow users to browse your collection by one or more metadata fields. You can create search indexes and partitions based on one or many metadata fields. Finally, when you are formatting your collection, you can decide which pieces of metadata will be displayed for each document (on browsing pages, on the document pages, and in search results) and how this metadata will be displayed.

Exporting metadata

Greenstone can export the contents and/or metadata of a collection to several standard formats, including METS, DSpace and MARCXML.

To export a collection, open the "File" menu and choose "Export…". You can choose which format to export to by selecting it in the "Export to" drop-down list. Specify a name for the directory where you want to put the exported files. The files will end up in <path to greenstone>/tmp/exported_xxx, where xxx is the name you specified. Select one collection in the list of available collections, then click "Export Collection".
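If you want to locate the exported files from a script, the directory naming rule above can be expressed as a small Python sketch. The function name exported_dir and the installation path /opt/greenstone are hypothetical; substitute your own Greenstone path.

```python
from pathlib import Path


def exported_dir(greenstone_home: str, export_name: str) -> Path:
    """Return <path to greenstone>/tmp/exported_<name> for a given export name."""
    return Path(greenstone_home) / "tmp" / f"exported_{export_name}"


# "/opt/greenstone" is a placeholder for your own installation path.
target = exported_dir("/opt/greenstone", "mycollection")
print(target.as_posix())
```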

There are other options specific to the various formats. You can specify XSLT files which will be applied to the resulting XML document(s) in order to customize the output format. Exporting to MARCXML uses a mapping file to map Greenstone metadata to MARC fields. The default mapping file maps only Dublin Core metadata. You can specify a custom mapping file to be used instead. Visit the exporting collections page for more information.

Additional resources