This page is in the 'old' namespace, and was imported from our previous wiki. We recommend checking for more up-to-date information using the search box.

Enriching Your Collection With Metadata

Having gathered several files into the collection, now enrich them with additional information called "metadata". This section explains how metadata is created, edited, assigned and retrieved, and how to use external metadata sources (also see Chapter 2 of the Greenstone Developer's Guide – Getting the most out of your documents).

The Enrich View

Use the Enrich view to assign metadata to the documents in the collection. Metadata is data about data – typically title, author, creation date, and so on. Each metadata item has two parts: the "Element" tells what kind of item it is (such as author), and the "Value" gives the value of that metadata element (such as the author's name).

On the left of the Enrich view is the Collection Tree. All the right-click functionality that was available for the Collection Tree in the Gather view is available here too. To the right is the Metadata Table, which shows metadata for any selected files or folders in the Collection Tree. Columns are named in black at the top, and can be resized by dragging the separating line. If several files or folders are selected, black text indicates that the value is common to all of the selected items, while grey text indicates that it is not. Editing grey values will only affect those documents with that metadata. Any new metadata values entered will be added to all selected items.
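The black/grey distinction amounts to a set intersection over the selected items. The following is a minimal Python sketch, assuming a hypothetical in-memory model in which each file maps to a list of (element, value) pairs; none of the names come from the Librarian Interface itself.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (element, value), e.g. ("dc.Creator", "Smith")


def classify_values(
    assignments: Dict[str, List[Pair]], selected: List[str]
) -> Tuple[Set[Pair], Set[Pair]]:
    """Return (common, partial) metadata for the selected files.

    "common" corresponds to the values shown in black (held by every
    selected item); "partial" corresponds to the values shown in grey.
    """
    per_file = [set(assignments.get(name, [])) for name in selected]
    if not per_file:
        return set(), set()
    common = set.intersection(*per_file)
    partial = set.union(*per_file) - common
    return common, partial


if __name__ == "__main__":
    demo = {
        "a.html": [("dc.Creator", "Smith"), ("dc.Title", "A")],
        "b.html": [("dc.Creator", "Smith"), ("dc.Title", "B")],
    }
    print(classify_values(demo, ["a.html", "b.html"]))
```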

A folder icon may appear beside some metadata entries. This indicates that the values are inherited from a parent (or ancestor) folder. Inherited metadata cannot be edited or removed, only appended to. Click on the folder icon to go immediately to the folder where the metadata is assigned.

Clicking on a metadata element in the table will display the existing values for that element in the "Existing values" area below the table. This "Value Tree" expands and collapses. Usually it is a list that shows all values entered previously for the selected element. Clicking an entry automatically places it into the value field. Conversely, typing in the value field selects the Value Tree entry that starts with the characters you have typed. Pressing [Tab] auto-completes the typing with the selected value.
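The auto-select behaviour can be pictured as a prefix search over the existing values. This is only an illustrative Python sketch: the function name is hypothetical, and the alphabetical ordering and case-sensitive comparison are assumptions, not documented behaviour.

```python
from typing import List, Optional


def select_by_prefix(values: List[str], typed: str) -> Optional[str]:
    """Return the first existing value (in alphabetical order) that starts
    with the typed characters, or None if nothing matches."""
    if not typed:
        return None
    for value in sorted(values):
        if value.startswith(typed):
            return value
    return None


if __name__ == "__main__":
    existing = ["Smith, John", "Smythe, Ann", "Jones, Bob"]
    # Pressing [Tab] would then replace the typed text with this value.
    print(select_by_prefix(existing, "Smy"))
```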

Metadata values can be organized into a hierarchy. This is shown in the Value Tree using folders for internal levels. Hierarchical values can be entered using the character "|" to separate the levels. For example, "Cards|Red|Diamonds|Seven" might be used in a hierarchy that represents a pack of playing cards. This enables values to be grouped together. Groups can also be assigned as metadata to files.
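To make the "|" convention concrete, the sketch below turns a list of hierarchical values into a nested tree, which is roughly what the Value Tree displays. It is an illustration in Python with a hypothetical function name, not code taken from Greenstone.

```python
from typing import Dict, List


def build_value_tree(values: List[str]) -> Dict[str, dict]:
    """Build a nested dict from values whose levels are separated by "|".

    Each key is one level of the hierarchy; a leaf maps to an empty dict.
    """
    tree: Dict[str, dict] = {}
    for value in values:
        node = tree
        for level in value.split("|"):
            node = node.setdefault(level, {})
    return tree


if __name__ == "__main__":
    cards = [
        "Cards|Red|Diamonds|Seven",
        "Cards|Red|Hearts|Ace",
        "Cards|Black|Spades|King",
    ]
    print(build_value_tree(cards))
```

Assigning a group as metadata corresponds to using a shorter value such as "Cards|Red", which names an internal node of this tree rather than a leaf.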

Greenstone extracts metadata automatically from documents into a metadata set whose elements are prefixed by "ex.". This extracted metadata has no value tree and cannot be edited.

Selecting Metadata Sets

Sets of predefined metadata elements are known as "metadata sets". An example is the Dublin Core metadata set. When you add a metadata set to your collection, its elements become available for selection. You can have more than one set; to prevent name clashes, a short identifier for the metadata set is prepended to the element name. For instance, the Dublin Core element Creator becomes "dc.Creator". Metadata sets are stored in the Librarian Interface's metadata folder and have the suffix ".mds".
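The prefixing scheme is easy to work with programmatically. The Python sketch below builds a qualified name and groups qualified names by their set identifier; both helper names are hypothetical and are used only for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def qualified_name(namespace: str, element: str) -> str:
    """Prefix an element name with its metadata set identifier."""
    return f"{namespace}.{element}"


def group_by_set(names: List[str]) -> Dict[str, List[str]]:
    """Group qualified element names by the identifier before the first dot.

    Names without a dot are filed under the empty identifier "".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        namespace, dot, element = name.partition(".")
        if not dot:
            namespace, element = "", name
        groups[namespace].append(element)
    return dict(groups)


if __name__ == "__main__":
    print(qualified_name("dc", "Creator"))
    print(group_by_set(["dc.Creator", "dc.Title", "ex.Source"]))
```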

When you create a new collection, the Dublin Core metadata set is added by default. You can change which metadata sets are used in a collection by clicking the "Manage Metadata Sets..." button underneath the Collection Tree in the Enrich view. This brings up a new window for managing the collection's metadata sets.

The list of assigned sets in this window shows which metadata sets the collection currently uses.

To use another metadata set with the loaded collection, click "Add…". A popup window shows you the default metadata sets that GLI knows about. To add one of these, select it from the list and click "Add". If you have defined your own metadata set, you can use the "Browse" button to locate the file on your file system.

To create a new metadata set, click "New…". This will launch the Greenstone Editor for Metadata Sets, GEMS. An initial popup window prompts you for the set name, namespace and description. You can also choose to base the new set on an existing one, in which case it will inherit all the elements from the specified set. Click OK. The main window shows the elements of the metadata set on the left-hand side, and some attributes for the set on the right-hand side. If you have based the set on an existing one, one or more elements will be displayed. Clicking one displays the attributes of that element on the right-hand side.

To add a new element, right-click on the name of the set and choose "Add Element". To add a new subelement, right-click on the element and choose "Add Subelement". Elements and subelements can be deleted by choosing "Delete (Sub)element" from the right-click menu.

Note: the Greenstone Editor for Metadata Sets can be run independently of GLI by selecting it from the Greenstone folder in the Start menu, or by running gems.sh or gems.bat in the gli folder of your Greenstone installation.

Sometimes two metadata sets have the same namespace. For example, Dublin Core and Qualified Dublin Core both use the namespace "dc". Such sets cannot be used in the collection at the same time. If you try to add a set with a namespace already used by the collection, a warning will be shown. If you go ahead, the existing set will be removed and the new one added. Any assigned metadata values will be transferred to the new set, provided those elements still exist.
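The transfer rule can be sketched as follows. This Python example assumes a hypothetical dictionary of assigned values keyed by qualified element name. The text does not say what happens to values whose element is missing from the new set, so the sketch simply reports them separately as "unmatched".

```python
from typing import Dict, List, Set, Tuple

Assigned = Dict[str, List[str]]  # qualified element name -> assigned values


def replace_set(
    assigned: Assigned, namespace: str, new_elements: Set[str]
) -> Tuple[Assigned, Assigned]:
    """Split assigned values into those that carry over to the replacement
    set and those whose element no longer exists in it."""
    prefix = namespace + "."
    transferred: Assigned = {}
    unmatched: Assigned = {}
    for name, values in assigned.items():
        if not name.startswith(prefix):
            transferred[name] = values  # belongs to a different set, untouched
        elif name[len(prefix):] in new_elements:
            transferred[name] = values  # element still exists in the new set
        else:
            unmatched[name] = values  # outcome not specified in the text
    return transferred, unmatched


if __name__ == "__main__":
    current = {
        "dc.Creator": ["Smith, John"],
        "dc.Localnote": ["checked"],  # hypothetical custom element
        "ex.Source": ["a.html"],
    }
    print(replace_set(current, "dc", {"Creator", "Title"}))
```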

With GEMS you can edit existing metadata sets as well as create new ones. Clicking the "Edit" button launches GEMS with the specified metadata set open. Once you have finished editing the set (as described above), save it (File→Save) and close GEMS.

If a collection no longer needs a metadata set, select it and press "Remove". If you have assigned any metadata to its elements you will be asked how to deal with this metadata when you next open the collection.

Appending New Metadata

We now add a metadata item – both element and value – to a file. First select the file from the Collection Tree on the left. This causes any metadata previously assigned to the file to appear in the table on the right.

Next select the metadata element you want to add by clicking its row in the table.

Type the value into the value field. Use the "|" character to add structure, as described in The Enrich View. Pressing the [Up] or [Down] arrow keys will save the metadata value and move the selection appropriately. Pressing [Enter] will save the metadata value and create a new empty entry for the metadata element, allowing you to assign multiple values to a metadata element.

You can also add metadata to a folder, or to several selected files at once. It is added to all files within the folder or selection, and to child folders. Keep in mind that if you assign metadata to a folder, any new files in it automatically inherit the folder's values.
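Folder inheritance can be modelled by walking from the top-level folder down to the file and collecting whatever is assigned at each level. The Python sketch below assumes a hypothetical dictionary keyed by relative path; it illustrates the idea only and does not describe how the Librarian Interface stores assignments.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (element, value)


def effective_metadata(
    assigned: Dict[str, List[Pair]], file_path: str
) -> List[Tuple[str, str, str]]:
    """Return (element, value, assigned_at) triples for a file, including
    values inherited from its ancestor folders."""
    path = PurePosixPath(file_path)
    # Ancestors from the outermost folder inwards, then the file itself.
    locations = list(reversed(path.parents)) + [path]
    result = []
    for location in locations:
        for element, value in assigned.get(str(location), []):
            result.append((element, value, str(location)))
    return result


if __name__ == "__main__":
    assigned = {
        "reports": [("dc.Publisher", "ACME")],
        "reports/2001/a.html": [("dc.Title", "Annual Report")],
    }
    for triple in effective_metadata(assigned, "reports/2001/a.html"):
        print(triple)
```

The third item of each triple plays the role of the folder icon: it records where an inherited value was actually assigned.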

Adding Previously Defined Metadata

To add metadata that has an existing value, first select the file, then select the metadata element that you are assigning to, then select the required value from the value tree, expanding hierarchy folders as necessary. The value of the selected entry automatically appears in the metadata field (alternatively, use the value tree's auto-select and auto-complete features).

The process of adding metadata with already-existing values to folders or multiple files is just the same.

Editing or Removing Metadata

To edit or remove a piece of metadata, first select the appropriate file, and then the metadata value from the table. Edit the value field, deleting all text if you wish to remove the metadata.

The process is the same when updating a folder with child folders or multiple files, but you can only update metadata that is common to all files/folders selected.

The value tree shows all currently assigned values as well as previous values for the current session, so changed or deleted values will remain in the tree. Closing the collection and then re-opening it will remove all values that are no longer assigned.

Reviewing Assigned Metadata

Sometimes you need to see the metadata assigned to many files at once – for instance, to determine how many files are left to work on, or to get some idea of the spread of dates.

Select the files in the Collection Tree you wish to examine, then right-click and choose "Assigned Metadata…". A window called "All Metadata", dominated by a large table with many columns, appears. The first column shows file names; the rows show all metadata values assigned to those files.

Drawing the table can take some time if many files are selected. You can continue to use the Librarian Interface while the "All Metadata" window is open.

When it gets too large, you can filter the "All Metadata" table by applying filters to the columns. As new filters are added, only those rows that match them remain visible. To set, modify or clear a filter, click on the "funnel" icon at the top of a column. You are prompted for information about the filter. Once a filter is set, the column header changes colour.

The filter prompt has a "Simple" and an "Advanced" tab. The Simple version filters columns so that they only show rows that contain a certain metadata value ("*" matches all values). You can select metadata values from the pull-down list. The Advanced version allows different matching operations: must start with, does not contain, alphabetically less than and is equal to. The value to be matched can be edited to be any string (including "*"), and you can choose whether the matching should be case insensitive. Finally, you can specify a second matching condition that you can use to specify a range of values (by selecting AND) or alternative values (by selecting OR). Below this area is a box that allows you to change the sort order (ascending or descending). Once you have finished, click "Set Filter" to apply the new filter to the column. Click "Clear Filter" to remove a current filter. Note that the filter details are retained even when the filter is cleared.

For example, to sort the "All Metadata" table, choose a column, select the default filter setting (a Simple filter on "*"), and choose ascending or descending ordering.
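The filter options described above can be sketched in Python as follows. The operation names mirror the ones listed in the text, but the data model (rows as dictionaries), the class and function names, and the choice to let "*" match every value under any operation are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

OPERATIONS = {
    "must start with": lambda cell, pattern: cell.startswith(pattern),
    "does not contain": lambda cell, pattern: pattern not in cell,
    "alphabetically less than": lambda cell, pattern: cell < pattern,
    "is equal to": lambda cell, pattern: cell == pattern,
}


@dataclass
class Condition:
    operation: str
    pattern: str

    def matches(self, cell: str, ignore_case: bool) -> bool:
        if self.pattern == "*":  # assumption: "*" always matches
            return True
        pattern = self.pattern
        if ignore_case:
            cell, pattern = cell.lower(), pattern.lower()
        return OPERATIONS[self.operation](cell, pattern)


def filter_rows(
    rows: List[Dict[str, str]],
    column: str,
    first: Condition,
    second: Optional[Condition] = None,
    combine: str = "AND",
    ignore_case: bool = True,
    descending: bool = False,
) -> List[Dict[str, str]]:
    """Keep the rows whose value in `column` satisfies the condition(s),
    then sort the survivors on that column."""

    def keep(row: Dict[str, str]) -> bool:
        cell = row.get(column, "")
        result = first.matches(cell, ignore_case)
        if second is not None:
            other = second.matches(cell, ignore_case)
            result = (result and other) if combine == "AND" else (result or other)
        return result

    kept = [row for row in rows if keep(row)]
    return sorted(kept, key=lambda row: row.get(column, ""), reverse=descending)


if __name__ == "__main__":
    table = [
        {"File": "a.html", "dc.Date": "1996"},
        {"File": "b.html", "dc.Date": "1990"},
        {"File": "c.html", "dc.Date": "2001"},
        {"File": "d.html", "dc.Date": "1993"},
    ]
    in_range = filter_rows(
        table,
        "dc.Date",
        Condition("must start with", "19"),
        Condition("alphabetically less than", "1995"),
        combine="AND",
    )
    print([row["File"] for row in in_range])
```

Calling filter_rows with the single condition Condition("is equal to", "*") keeps every row and therefore only sorts the table, which corresponds to the sorting example in the text.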

Importing Previously Assigned Metadata

This section describes how to import previously assigned metadata: metadata assigned to documents before they were added to the collection.

If metadata in a form recognized by the Librarian Interface has been previously assigned to a file – for example, when you choose documents from an existing Greenstone collection – it is imported automatically when you add the file. To do this, the metadata must be mapped to the metadata sets available in the collection.

The Librarian Interface prompts for the necessary information. The prompt gives brief instructions and then shows the name of the metadata element that is being imported, just as it appears in the source file. This field cannot be edited or changed. Next you choose what metadata set the new element should map to, and then the appropriate metadata element in that set. The system automatically selects the closest match, in terms of set and element, for the new metadata.
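The text does not say how the closest match is determined. As a stand-in only, the Python sketch below scores every available element by string similarity using the standard difflib module and picks the best one; the function name and the case-insensitive comparison are assumptions.

```python
import difflib
from typing import Dict, List, Optional, Tuple


def closest_element(
    source_element: str, sets: Dict[str, List[str]]
) -> Optional[Tuple[str, str]]:
    """Return the (set identifier, element) pair whose element name is most
    similar to the imported element name, or None if no sets are available."""
    best: Optional[Tuple[str, str]] = None
    best_score = -1.0
    for namespace, elements in sets.items():
        for element in elements:
            score = difflib.SequenceMatcher(
                None, source_element.lower(), element.lower()
            ).ratio()
            if score > best_score:
                best, best_score = (namespace, element), score
    return best


if __name__ == "__main__":
    available = {"dc": ["Title", "Creator"], "dls": ["Organization"]}
    print(closest_element("title", available))
```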

Having checked the mapping, you can choose "Add" to add the new metadata element to the chosen metadata set. (This is only enabled if there is no element of the same name within the chosen set.) "Merge" maps the new element to the one chosen by the user. Finally, "Ignore" does not import any metadata with this element name. Once you have specified how to import a certain piece of metadata, the mapping information is retained for the collection's lifetime.
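The three choices can be summarised in a short Python sketch. The data structures here (a dictionary of sets keyed by identifier, and a dictionary that remembers earlier decisions) are hypothetical and only illustrate the rules stated above.

```python
from typing import Dict, Optional, Set


def map_element(
    source_element: str,
    action: str,
    set_name: str,
    sets: Dict[str, Set[str]],
    remembered: Dict[str, Optional[str]],
    target_element: Optional[str] = None,
) -> Optional[str]:
    """Apply an "Add", "Merge" or "Ignore" decision and remember it.

    Returns the qualified element the imported metadata maps to, or None
    when the element is ignored.
    """
    if action == "Add":
        # Only allowed when the set has no element of the same name.
        if source_element in sets[set_name]:
            raise ValueError(f"{set_name} already has an element {source_element}")
        sets[set_name].add(source_element)
        mapped: Optional[str] = f"{set_name}.{source_element}"
    elif action == "Merge":
        if target_element not in sets[set_name]:
            raise ValueError(f"{set_name} has no element {target_element}")
        mapped = f"{set_name}.{target_element}"
    elif action == "Ignore":
        mapped = None
    else:
        raise ValueError(f"unknown action: {action}")
    remembered[source_element] = mapped  # reused for later imports
    return mapped


if __name__ == "__main__":
    sets = {"dc": {"Title", "Creator"}}
    remembered: Dict[str, Optional[str]] = {}
    print(map_element("Author", "Merge", "dc", sets, remembered, "Creator"))
    print(map_element("Shelfmark", "Add", "dc", sets, remembered))
    print(map_element("Junk", "Ignore", "dc", sets, remembered))
```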

For details on the metadata.xml files which Greenstone uses to store the metadata, see Chapter 2 of the Greenstone Developer's Guide – Getting the most out of your documents.
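As a rough illustration, the Python sketch below reads metadata from a small metadata.xml-style document. The element names used (DirectoryMetadata, FileSet, FileName, Description, Metadata with a name attribute) reflect the layout as recalled from the Developer's Guide and should be checked against it; the sample content and the function name are invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Invented sample; the layout is an assumption to verify against the guide.
SAMPLE = """<DirectoryMetadata>
  <FileSet>
    <FileName>report.html</FileName>
    <Description>
      <Metadata name="dc.Title">Annual Report</Metadata>
      <Metadata name="dc.Creator">Smith, John</Metadata>
    </Description>
  </FileSet>
</DirectoryMetadata>"""


def read_metadata_xml(xml_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each FileName entry to its list of (element, value) pairs."""
    root = ET.fromstring(xml_text)
    result: Dict[str, List[Tuple[str, str]]] = {}
    for file_set in root.findall("FileSet"):
        pairs = []
        description = file_set.find("Description")
        if description is not None:
            for item in description.findall("Metadata"):
                pairs.append((item.get("name"), (item.text or "").strip()))
        for file_name in file_set.findall("FileName"):
            key = (file_name.text or "").strip()
            result.setdefault(key, []).extend(pairs)
    return result


if __name__ == "__main__":
    print(read_metadata_xml(SAMPLE))
```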

old/enriching_your_collection_with_metadata.txt · Last modified: 2023/03/13 01:46 by 127.0.0.1